Educational journalism, not medical advice. Every claim here is checked against its cited sources by editor Tim Bunce, a health writer, not a physician. It isn’t specific to your situation: for health decisions, talk to your own clinician.
The 60-second version
The "AI personal trainer" pitch has been on every fitness app's marketing page since 2023. Most of what's actually shipped is a thin layer of personalization — a recommendation system more than a coach. The real progress has been quieter and is mostly happening in three specific places: load auto-regulation, recovery prediction, and program structure inference. While AI handles dose calculations and fatigue tracking better than subjective "feel," it still lacks the context to understand injuries, life stress, and long-horizon periodization. AI is currently an exceptional assistant for human coaches, rather than a replacement for them.
Three real things AI does well today
1. Real-time load auto-regulation
The least flashy and most useful application. Apps that integrate with smart barbell sensors (Vitruve, Output Sports) measure bar velocity in real time and prescribe the next set's load to keep velocity in a target range. This is velocity-based training — a well-established methodology — automated. After 6 to 8 sessions of training data, the regression models typically converge with a prediction error below 5 percent Spitz 2018.
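As a rough illustration of the idea, the sketch below fits a straight line to load and bar-velocity pairs and inverts it to pick the next set's load. It is a minimal sketch under stated assumptions, not how Vitruve or Output Sports actually work: the function names, the warm-up numbers, the 0.50 m/s target, and the 2.5 kg plate step are all hypothetical, and real systems may use different models.

```python
def fit_load_velocity(loads_kg, velocities_ms):
    """Least-squares fit of velocity = slope * load + intercept."""
    n = len(loads_kg)
    mean_x = sum(loads_kg) / n
    mean_y = sum(velocities_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in loads_kg)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(loads_kg, velocities_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def load_for_velocity(slope, intercept, target_velocity_ms, step_kg=2.5):
    """Invert the fitted line, then round to the nearest plate increment."""
    raw_load = (target_velocity_ms - intercept) / slope
    return round(raw_load / step_kg) * step_kg


# Hypothetical warm-up sets: heavier loads move more slowly.
loads = [60.0, 80.0, 100.0, 120.0]      # kg
velocities = [1.00, 0.80, 0.60, 0.40]   # mean bar velocity, m/s

slope, intercept = fit_load_velocity(loads, velocities)
next_load = load_for_velocity(slope, intercept, target_velocity_ms=0.50)
print(f"Suggested next load: {next_load} kg")
```

With more sessions logged, the fitted line has more points to work with, which is the intuition behind the convergence the paragraph above describes.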
2. Recovery prediction from wearable data
Garmin, Whoop, Oura, and Fitbit all attempt the same task: predict readiness based on HRV, resting heart rate, sleep architecture, and recent training load. These scores correlate with subjective readiness about as well as a coach's morning check-in, but they cannot yet prescribe what to train today based on that score. They provide a "go hard" or "go easy" verdict that often ignores the user's actual program.
3. Program structure inference
Systems like Hyperhuman and Hevy AI ingest months of training data and produce a recommended next mesocycle. The output is plausible, picking volumes and intensities that resemble competent human coaching. However, the AI still misses the "why" — it doesn't know if a plateau is due to a technical fault or if you have a specific competition goal 12 weeks out.
Where the AI fails
- Context outside the training log. The model only knows what you tell it. It can't see the stress of a newborn, a difficult week at work, or a nagging joint issue unless it's explicitly logged.
- Long-horizon planning. Most AI programs handle the next 4 weeks well but struggle with the periodization shapes (accumulation, intensification, peak, deload) needed for long-term athlete development.
- Transfer to non-numerical goals. "I want to get back to enjoying training" is a common human goal that AI cannot yet operationalize.
The medium-term horizon
Over the next 24 to 36 months, we expect to see multi-modal training models where vision sensors watch your squat form and feed quality scores back into the load-prescription engine. We will also see tighter integration between wearable recovery data and specific session adjustments — moving from "you are tired" to "reduce today’s squat volume by 2 sets."
The unchanging fundamentals
AI does not change the laws of physiology. You still need progressive overload, adequate protein, and consistent sleep. The biggest mistake the early AI fitness wave made was promising "smarter training" when adherence, not optimization, is the bottleneck for 95% of users. The smartest program in the world is useless if you don’t show up.
Practical takeaways
- For lifters: Use auto-regulating apps (like Hevy) if you’re past the beginner phase. They are a meaningful upgrade over fixed-load plans.
- For coaches: Use AI to handle the math (VBT, fatigue tracking, substitutions), so you can focus on the human context and technical coaching.
- Be skeptical of "AI" marketing: If an app doesn’t specify what it’s actually predicting or cite peer-reviewed performance data, it’s likely just a template with a new name.
What the wearable readiness score is actually measuring
The recovery and "readiness" scores at the heart of every AI training app lean heavily on two inputs: heart-rate variability (HRV — the beat-to-beat timing variation that loosely tracks your autonomic nervous system's recovery state) and automated sleep staging (how long the device thinks you spent in light, deep, and REM sleep). It is worth separating what these sensors can validly measure from what the marketing implies. For the simple binary of asleep-versus-awake, wrist and ring devices are genuinely good: a U.S. Navy laboratory study that tested seven consumer trackers against polysomnography — the clinical gold standard with scalp electrodes — found most performed as well as or better than research-grade actigraphy at detecting sleep versus wake Chinoy 2021. The trouble starts when the same device claims to tell you how much deep or REM sleep you got. A 2024 scoping review in npj Digital Medicine covering 35 studies and 62 wearable setups concluded that devices relying only on movement are effective for sleep/wake detection but "fall short" at distinguishing multiple sleep stages, and that even the better accelerometer-plus-optical-sensor combinations remain inconsistent at staging Birrer 2024. In practical terms: trust the trend in your total sleep and your resting heart rate, treat the nightly "2 h 14 m of deep sleep" figure as a rough estimate, and be aware that the readiness number an AI app feeds into tomorrow's load prescription is built partly on that softer data.
Does training to a recovery metric actually beat following a plan?
The deeper question for anyone paying a subscription is whether autoregulation — letting a daily readiness signal nudge your training up or down — produces better results than simply following a sensible pre-written program. Here the honest answer is that the effect is real but modest. A 2021 systematic review and meta-analysis pooled the randomized controlled trials comparing HRV-guided endurance training against predefined training and found the difference in maximal oxygen uptake (VO₂max) was small and not statistically significant (standardized effect of about 0.13), with a similarly small, non-significant edge for performance Manresa-Rocamora 2021. The authors' own summary is the line worth quoting to anyone selling an algorithm: if HRV-guided training is superior to a fixed plan for group-level fitness gains, "it is only by a small margin." The more reliable benefit the trials show is not a bigger average gain but fewer non-responders — the recovery signal seems to catch the individuals who would otherwise overreach and stall. That is a genuine, useful effect, but it is a far cry from the "unlock your potential" framing. The takeaway: the value of an AI readiness feature is mostly in flagging the bad days you would have pushed through, not in finding a hidden optimum on the good ones.
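To make the mechanism concrete, here is a minimal sketch of one kind of decision rule an HRV-guided plan can use: compare a recent rolling average of a log-transformed HRV measure against a band around the athlete's own baseline. Everything specific here is an assumption for illustration, including the function name, the half-standard-deviation band, the seven-day window, and the readings themselves. It is not the protocol of any particular trial or app.

```python
from statistics import mean, stdev


def hrv_guided_decision(baseline_ln_rmssd, recent_ln_rmssd, band_sd=0.5):
    """Compare the recent HRV average with a band around the baseline mean.

    Assumption: a recent average outside the band in either direction
    triggers an easier day. The band width (band_sd) is a tunable choice.
    """
    base_mean = mean(baseline_ln_rmssd)
    half_width = band_sd * stdev(baseline_ln_rmssd)
    recent_mean = mean(recent_ln_rmssd)
    if abs(recent_mean - base_mean) <= half_width:
        return "train as planned"
    return "reduce intensity"


# Hypothetical morning readings (natural log of RMSSD in ms).
baseline = [4.0, 4.2, 4.1, 3.9, 4.3, 4.0, 4.2, 4.1]
normal_week = [4.1, 4.0, 4.2, 4.1, 4.15, 4.05, 4.1]
suppressed_week = [3.9, 3.8, 3.9, 3.85, 3.8, 3.9, 3.85]

print(hrv_guided_decision(baseline, normal_week))
print(hrv_guided_decision(baseline, suppressed_week))
```

A rule like this only ever moves a session down from the plan, which fits the finding above: the benefit shows up as fewer stalled individuals, not as a higher average.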
The evidence behind velocity-based prescription
The smart-barbell systems described above as the most useful AI application rest on velocity-based training (VBT): prescribing the next set's load by how fast the bar moves rather than by a fixed percentage of your one-rep max. Because your true daily strength fluctuates, the logic is that velocity auto-corrects for the day you slept badly or the day you feel strong. The evidence supports a small advantage. A 2022 meta-analysis in the International Journal of Sports Medicine comparing VBT with traditional percentage-based training on maximal strength found a small but statistically significant benefit for VBT (effect size 0.26), rising to a slightly larger advantage (0.35) in trained athletes specifically Zhang 2022. This is one of the few places in consumer fitness AI where the underlying method is backed by randomized trials rather than vendor white papers. The same caveat applies as for HRV-guided training, though: the advantage is meaningful for competitive athletes chasing single-digit-percent gains, and largely irrelevant for a beginner whose progress is governed by simply showing up and adding weight over months. The algorithm is automating a method that works; it is not the algorithm itself that has been tested.
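For readers unsure what an effect size of 0.26 means in practice, the sketch below computes a standardized mean difference in its common Cohen's d form: the gap between two group means divided by their pooled standard deviation. The group figures are invented purely to produce a value of that magnitude and are not taken from Zhang 2022, and the meta-analysis may have used a slightly different estimator.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


# Hypothetical one-rep-max gains in kg (illustrative numbers only).
d = cohens_d(mean_a=12.6, sd_a=10.0, n_a=20,   # velocity-based group
             mean_b=10.0, sd_b=10.0, n_b=20)   # percentage-based group
print(f"Standardized effect: {d:.2f}")
```

In this made-up case the groups differ by 2.6 kg on average while individual gains vary by about 10 kg, so the two distributions overlap heavily. That overlap is why an effect of this size counts as small.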
The privacy cost most users never price in
One thing the optimization conversation almost always leaves out is what happens to the data the moment you grant the app access. Health and fitness apps sit at the center of a sprawling data-sharing ecosystem, and the sharing is routine rather than exceptional. A 2019 analysis published in The BMJ traced the data flows of 24 prominent health-related apps and found that user data was transmitted to a median of three external parties per app, that 55 distinct entities (owned by 46 parent companies) received or processed that data, and that the third parties in turn advertised the ability to pass it to more than 200 "fourth parties" — with Alphabet (Google) and Amazon receiving the largest volumes Grundy 2019. The risk is not hypothetical. In February 2023 the U.S. Federal Trade Commission took its first-ever enforcement action under the Health Breach Notification Rule against GoodRx, which had promised never to share users' personal health information with advertisers yet disclosed it to Facebook, Google, Criteo and others; the company agreed to a $1.5 million civil penalty and a ban on sharing health data for advertising FTC 2023. None of this means you should refuse to use these tools — it means you should treat your HRV, sleep, weight and menstrual-cycle data as the sensitive medical information it is. Read what the app says it shares, prefer products that process data on-device, and assume that a free "AI coach" is monetizing something, and that something is often you.
So who is this actually for?
Putting the evidence together gives a clearer buyer's guide than the marketing does. The lifter or endurance athlete already training hard and consistently is the person most likely to extract the small, real edge that VBT and HRV-guided autoregulation offer — they are operating close enough to their ceiling that catching an overreaching week or shaving a percent off a load actually matters. The beginner or intermittent exerciser is almost certainly paying for precision they cannot use: their results are dominated by adherence and progressive overload, not by tomorrow's readiness score, and the same money spent on a coach, a training partner, or simply a written plan will do more. As the validation literature makes plain, the sensors are best treated as trend-trackers rather than oracles — useful for spotting the bad night or the creeping fatigue, unreliable for the precise stage-by-stage numbers the dashboards display so confidently Birrer 2024. And because almost none of these apps have been tested as products in randomized trials — the evidence is for the underlying methods, not the specific algorithm in your pocket — a healthy skepticism toward any claim that an app will "optimize" your training is, on the current evidence, the correct default. If a recovery score and a coach disagree, the coach who can see your life still wins.
References
Spitz 2018: Spitz RW, Gonzalez AM, Willoughby DS, et al. Barbell Velocity: A Novel Training Tool for the 21st Century. IEEE. 2018.
Plews 2013: Plews DJ, Laursen PB, Stanley J, et al. Training adaptation and heart rate variability in elite endurance athletes: opening the door to effective monitoring. Sports Med. 2013;43(9):773-781.
Chinoy 2021: Chinoy ED, Cuellar JA, Huwa KE, et al. Performance of seven consumer sleep-tracking devices compared with polysomnography. Sleep. 2021;44(5):zsaa291. doi:10.1093/sleep/zsaa291.
Birrer 2024: Birrer V, Elgendi M, Lambercy O, Menon C. Evaluating reliability in wearable devices for sleep staging. npj Digital Medicine. 2024;7:74. doi:10.1038/s41746-024-01016-9.
Manresa-Rocamora 2021: Manresa-Rocamora A, Sarabia JM, Javaloyes A, Flatt AA, Moya-Ramon M. Heart Rate Variability-Guided Training for Enhancing Cardiac-Vagal Modulation, Aerobic Fitness, and Endurance Performance: A Methodological Systematic Review with Meta-Analysis. International Journal of Environmental Research and Public Health. 2021;18(19):10299. doi:10.3390/ijerph181910299.
Zhang 2022: Zhang M, Tan Q, Sun J, Ding S, Yang Q, Zhang Z, Liu J. Comparison of Velocity and Percentage-based Training on Maximal Strength: Meta-analysis. International Journal of Sports Medicine. 2022;43(12):981-995. doi:10.1055/a-1790-8546.
Grundy 2019: Grundy Q, Chiu K, Held F, Continella A, Bero L, Holz R. Data sharing practices of medicines related apps and the mobile ecosystem: traffic, content, and network analysis. BMJ. 2019;364:l920. doi:10.1136/bmj.l920.
FTC 2023: Federal Trade Commission. FTC Enforcement Action to Bar GoodRx from Sharing Consumers' Sensitive Health Info for Advertising. February 1, 2023.


