Skeletal muscle is critical for numerous functional and metabolic processes essential to good health. Resistance training (RT), muscle contraction against external weight, potently increases muscle strength and mass (hypertrophy), improves physical performance, provides a myriad of metabolic-health benefits and combats chronic disease risk.1–4 Although endogenous biological and physiological factors are pertinent to maximising RT-induced skeletal muscle adaptations,5 6 RT programming variables can affect RT adaptations.7–13 Therefore, an RT prescription (RTx) should be determined appropriately. Each RTx comprises a distinct combination of RT variables, and the most-studied RTx variables include the load lifted per repetition, sets per exercise (generally involving a single RT manoeuvre or muscle group) and weekly frequency (the number of RT sessions completed per week).
Guideline developers rely on systematic reviews and meta-analyses for determining recommendations, as these study designs are, in most cases, the most robust forms of evidence.14 Indeed, various meta-analyses have provided seminal evidence on the univariate impact of load,15–18 sets19–22 or frequency23–27 to improve muscle strength, mass and physical function. However, these univariate analyses limit RT guideline development because individual RT variables are neither mutually exclusive nor prescribed independently; rather, several variables are collectively inherent to any RTx. Comparisons between multivariate RT prescriptions are needed to advance optimal RTx guidelines.
Pairwise meta-analyses are methodologically constrained to only comparing two RTxs.28 Several RTxs are conceivable, and multiple pairwise meta-analyses are unlikely to yield congruent insights. Network meta-analysis (NMA) expands on pairwise meta-analysis by permitting the simultaneous comparison of multiple treatments.29 NMA leverages direct and indirect evidence to produce enhanced effect estimates between all treatments, even when some comparisons have never been tested in randomised trials.30 Additionally, NMA permits the rank-ordering of all included treatments and the incorporation of data from multi-arm trials.28 Within exercise science, NMA has been used to compare different types of exercise31–34; within RT, NMA has only been used to compare different load doses.35 Importantly, NMA can compare several multivariate RTxs.
The purpose of this systematic review and NMA was to determine how different RTxs affect muscle strength, hypertrophy and physical function in healthy adults. Specifically, we sought to compare distinct combinations of RTx variables—load, sets and frequency—and non-exercising control groups. For each outcome, we used NMA to integrate data from randomised trials.
Protocol and registration
This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension statement for network meta-analyses (PRISMA-NMA)36 and Cochrane Handbook for Systematic Reviews of Interventions.37 The PRISMA-NMA checklist is provided in online supplemental appendix 1. This review combines NMAs registered in the International Prospective Register of Systematic Reviews (https://www.crd.york.ac.uk/prospero/).
The eligibility criteria are detailed in table 1. Only trials that included healthy adults ≥18 years old, were randomised, compared at least 2 of 13 unique conditions (box 1), and measured muscle strength, size and/or physical function were included. Physical function was subdivided into three domains: mobility, the ability to physically move; balance, the ability to maintain a body position during a task; and gait speed, the time taken to locomote over a given distance. Trials that included athletes, persons with comorbidities or military persons; spanned <6 weeks; involved unsupervised RT (eg, home-based exercise); were reported in a non-English language; or were non-randomised were excluded.
Description of predefined conditions
Condition acronym – condition description
CTRL – non-exercise control.
LS1 – lower load, single set/exercise, 1 day/week resistance training.
LS2 – lower load, single set/exercise, 2 days/week resistance training.
LS3 – lower load, single set/exercise, ≥3 days/week resistance training.
LM1 – lower load, multiple sets/exercise, 1 day/week resistance training.
LM2 – lower load, multiple sets/exercise, 2 days/week resistance training.
LM3 – lower load, multiple sets/exercise, ≥3 days/week resistance training.
HS1 – higher load, single set/exercise, 1 day/week resistance training.
HS2 – higher load, single set/exercise, 2 days/week resistance training.
HS3 – higher load, single set/exercise, ≥3 days/week resistance training.
HM1 – higher load, multiple sets/exercise, 1 day/week resistance training.
HM2 – higher load, multiple sets/exercise, 2 days/week resistance training.
HM3 – higher load, multiple sets/exercise, ≥3 days/week resistance training.
Condition coding framework
Arms of included studies were classified as 1 of 12 RTxs or non-exercise control (CTRL). Each RTx was classified based on the load, set and frequency prescription (box 1). RTxs were denoted with a three-character acronym, XY#, where X is load (H, ≥80% one-repetition maximum (1RM); L, <80% 1RM); Y is sets (M, multiset; S, single-set); and # is the weekly frequency (3, ≥3 days/week; 2, 2 days/week; 1, 1 day/week). For example, HM2 denotes higher-load, multiset, twice-weekly RT within this framework. CTRL comprised subjects who received no intervention.
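The coding rule can be illustrated with a minimal sketch. The function name `classify_arm` and its arguments are hypothetical and are not taken from the review's published code; the sketch also assumes that "multiset" means two or more sets per exercise and that each arm reports a single load, set count and whole-number weekly frequency.

```python
def classify_arm(load_pct_1rm: float, sets_per_exercise: int, days_per_week: int) -> str:
    """Return the three-character condition code (XY#) for one exercising arm.

    Hypothetical helper: thresholds follow the framework described in the text
    (H if load >= 80% 1RM, otherwise L; frequency capped at 3 for >= 3 days/week).
    Assumption: "multiset" is taken to mean >= 2 sets per exercise.
    """
    if sets_per_exercise < 1 or days_per_week < 1:
        raise ValueError("an exercising arm needs at least one set and one session per week")
    load = "H" if load_pct_1rm >= 80 else "L"
    sets = "M" if sets_per_exercise >= 2 else "S"
    frequency = min(days_per_week, 3)  # 3 stands for ">= 3 days/week"
    return f"{load}{sets}{frequency}"


# Example: 85% 1RM, three sets per exercise, two sessions per week
print(classify_arm(85, 3, 2))
```

Non-exercising arms would be coded CTRL outside this function, and programmes whose loads span the 80% 1RM threshold cannot be assigned a single code under this scheme.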
MEDLINE, Embase, Emcare, SPORTDiscus, CINAHL and Web of Science were systematically searched until 7 February 2022. Multiple experts developed the search strategy, which included subject headings and keywords specific to the research question and each database. No language or study design limits were used in the search strategy. The complete search strategy is provided in online supplemental appendix 2. Relevant systematic reviews (online supplemental appendix 3) were manually selected, and the references were scrutinised for eligibility.
Study selection and data extraction
All records underwent title/abstract screening by two independent reviewers, with discrepancies resolved by a third reviewer. The full text of potentially eligible reports was then assessed for inclusion by two independent reviewers, with discrepancies resolved by a third reviewer. Reports deemed eligible for inclusion then underwent data extraction.
Data from included studies were extracted independently by pairs of reviewers, with any discrepancies resolved by consensus with a third reviewer (BSC or JCM). Extracted data included study and participant characteristics, RTx details and measurements of muscle strength and/or size (online supplemental appendix 4). Measures of mobility, balance and/or gait speed were extracted when the mean participant age was ≥55 years old. Authors of studies with missing data were contacted twice with a request for the missing data. The systematic review software Covidence (Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org) was used for record screening and data extraction.
Mean change from baseline and SD change (SDchange) from baseline were the outcomes of interest and extracted when reported. When unreported, SD was calculated with SEs, CIs, p values or t-statistics,37 and SDchange was imputed from pre-SD and post-SD values with a correlation coefficient of 0.5.35 RT loads reported as repetition maximum (RM) were converted to a percentage of one-repetition maximum (%1RM) with the equation: %1RM = 100 − (RM × 2.5).38 The highest-ranked measurement was extracted, per predetermined hierarchy (online supplemental appendix 5), when multiple measurements were reported for a single outcome (eg, MRI and ultrasonography for muscle size). The longest period during which all conditions were unchanged from baseline was analysed when the outcome(s) of interest were measured at multiple time points.37 Cohorts randomised separately but reported together (eg, young and old39) were analysed independently. Within-group outcomes reported by participant sex were grouped by condition.37 40
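The conversions described above can be sketched as follows. The function names are hypothetical. The SD-of-change formula shown is the standard imputation from pre and post SDs with an assumed pre-post correlation, which we take to be what is meant by imputing SDchange with a correlation coefficient of 0.5. The SE-to-SD conversion applies to the SE of a single group mean.

```python
import math


def rm_to_percent_1rm(rm: float) -> float:
    """Convert a repetition-maximum load to %1RM: %1RM = 100 - (RM x 2.5)."""
    return 100.0 - rm * 2.5


def sd_from_se(se: float, n: int) -> float:
    """Recover the SD of a single group from the SE of its mean."""
    return se * math.sqrt(n)


def sd_change(sd_pre: float, sd_post: float, r: float = 0.5) -> float:
    """Impute the SD of the change score from pre and post SDs.

    Assumption: r is the pre-post correlation (0.5, as stated in the text).
    """
    return math.sqrt(sd_pre**2 + sd_post**2 - 2.0 * r * sd_pre * sd_post)


# Example: a 10RM load, and a group with pre-SD 10 and post-SD 12
print(rm_to_percent_1rm(10))
print(round(sd_change(10.0, 12.0), 2))
```

Under this equation an 8RM load maps to exactly 80% 1RM, the boundary between the lower-load and higher-load categories.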
Risk of bias
Reviewers independently evaluated the within-study risk of bias using the Cochrane Risk of Bias V.2.0. tool.41 Signalling questions and criteria were followed to inform the risk of bias appraisals for the intention-to-treat effect. Articles were assessed in duplicate at the strength and hypertrophy outcome level for bias: (1) arising from the randomisation process, (2) due to deviations from intended interventions, (3) due to missing outcome data, (4) in the measurement of the outcome and (5) in the selection of reported result. Every domain was determined to be of high, moderate (some concerns) or low risk of bias, and studies were subsequently given an overall classification of high, moderate or low risk of bias. Any disagreement was resolved by consensus (BSC and JCM).
Standardised mean differences (SMD), adjusted for small-sample size bias,42 were calculated as the summary statistic because each outcome was measured with various tools.37 For mobility, gait speed and balance, the direction of effect was standardised so that desirable outcomes were signed consistently.43 When multiple studies compared two conditions, random-effects pairwise meta-analyses were conducted to identify comparison-level heterogeneity, publication bias, outliers and influential cases.40 44 To account for within-trial correlations in multi-arm trials (≥3 conditions), the SE in the base/reference arm was calculated as the square root of the covariance between calculated effects,45 assuming a correlation of 0.5 between effect sizes.46
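As an illustration of the summary statistic, the sketch below computes a small-sample-corrected SMD (Hedges' g) from the change scores of two arms, along with a reference-arm SE for a multi-arm trial. It is not the analysis code (the review used the R package 'esc'). The function names are hypothetical, and the variance formula is one common large-sample approximation. The reference-arm SE assumes two contrasts that share the reference arm, with covariance equal to 0.5 times the product of their SEs.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (g, se_g) for arm A versus arm B using the pooled SD.

    The correction factor J = 1 - 3 / (4 * df - 1) adjusts for small-sample bias.
    """
    df = n_a + n_b - 2
    sd_pooled = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / sd_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    var_d = (n_a + n_b) / (n_a * n_b) + d**2 / (2.0 * (n_a + n_b))
    return j * d, j * math.sqrt(var_d)


def reference_arm_se(se_1: float, se_2: float, rho: float = 0.5) -> float:
    """SE assigned to the reference arm of a multi-arm trial.

    Assumption: the covariance between the two contrasts is rho * se_1 * se_2.
    """
    return math.sqrt(rho * se_1 * se_2)


# Example: RT arm (mean change 5, SD 2, n=10) versus control (mean change 3, SD 2, n=10)
g, se = hedges_g(5.0, 2.0, 10, 3.0, 2.0, 10)
print(round(g, 3), round(se, 3))
```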
NMA integrated all direct evidence, with one network constructed for each outcome. NMA models were fitted within a Bayesian framework using Markov chain Monte Carlo methods.47 Four chains were run with non-informative priors. There were 50 000 iterations per chain; the first 20 000 were discarded as burn-in iterations. Values were collected with a thinning interval of 10. Convergence was evaluated by visual inspection of trace plots48 and the potential scale reduction factor. Both fixed-effects and random-effects models were fit, and the more parsimonious model was used for analysis.49 Model fit was assessed with the deviance information criterion (DIC) and posterior mean residual deviance.49 50 Heterogeneity was assessed by examining the between-study SD (τ) and 95% credible intervals (95% CrI). Global inconsistency was assessed by comparing model fit, DIC and variance parameters between the NMA model and an unrelated mean effects (UME) model.51 Local inconsistency was assessed with the node-splitting method,52 and inconsistency was considered to be detected when the Bayesian p value<0.05. Forest plots and league tables were generated to display relative effects. Surface under the cumulative ranking curve values were used to rank-order each condition from top-to-bottom; additionally, the probability of each condition ranking in the top three was calculated as a percentage of the area under the curve. NMA results were presented as posterior SMD and 95% CrI, interpreted as a range in which a parameter lies with a 95% probability.53
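The ranking summaries can be sketched from posterior draws as follows. This is a simplified illustration, not the 'multinma' implementation. The function names and input layout are hypothetical. Larger effects are assumed to be better, and the top-three probability is taken as the sum of the posterior probabilities of ranks 1 to 3, which may differ in detail from the calculation used in the review.

```python
def retained_draws(chains: int, iterations: int, burn_in: int, thin: int) -> int:
    """Number of posterior draws kept after burn-in and thinning, summed over chains."""
    return chains * ((iterations - burn_in) // thin)


def rank_summaries(draws, top_k=3):
    """Compute SUCRA and P(rank <= top_k) for each condition.

    draws: list of dicts, one per posterior draw, mapping condition -> effect
    (hypothetical layout; larger effect = better).
    """
    conditions = list(draws[0])
    n = len(conditions)
    rank_counts = {c: [0] * n for c in conditions}
    for draw in draws:
        ordered = sorted(conditions, key=lambda c: draw[c], reverse=True)
        for rank, c in enumerate(ordered):
            rank_counts[c][rank] += 1
    total = len(draws)
    summaries = {}
    for c in conditions:
        probs = [count / total for count in rank_counts[c]]
        cumulative, area = 0.0, 0.0
        for p in probs[:-1]:  # cumulative rank probabilities for ranks 1..n-1
            cumulative += p
            area += cumulative
        summaries[c] = {"sucra": area / (n - 1), "p_top_k": sum(probs[:top_k])}
    return summaries


# The settings reported in the text: 4 chains, 50 000 iterations, 20 000 burn-in, thinning of 10
print(retained_draws(4, 50_000, 20_000, 10))
```

A condition that ranks first in every draw has a SUCRA of 1, and one that always ranks last has a SUCRA of 0.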
Confidence in recommendations
The robustness of recommendations was assessed with threshold analysis.47 54 Several factors, including bias and sampling error, can influence NMA results. Threshold analysis determines how much the included evidence could change—for any reason—before treatment recommendations differ and identifies the subsequent treatment recommendation.55 Identifying the robustness of results with threshold analysis permits guideline developers to have appropriate confidence levels in the reported recommendations.
Sensitivity analysis and network meta-regression
Sensitivity analyses were conducted to explore the impact of outliers, influential cases and sources of network inconsistency on model fit, relative effects and treatment rankings. The first sensitivity analysis excluded studies identified during pairwise meta-analyses and node-splitting, and the second sensitivity analysis excluded node(s) comprised of only one study. Network meta-regression (NMR), assuming independent treatment interactions,56 was performed to determine if additional factors improved model fit and altered treatment effects. NMR covariates included age, training status, the proportion of females, duration, volitional fatigue, relative weekly volume load, outcome measurement tool, outcome measurement region and publication year. Missing data on covariates were managed through multivariate imputation by chained equations (n imputations=20).57 NMR is detailed in online supplemental appendix 12.
All analyses were performed in R V.4.0.4 using the packages: ‘esc’,58 to calculate SMD; ‘dmetar’,40 to conduct pairwise meta-analyses and assess comparison-level heterogeneity; ‘multinma’,47 to conduct NMA, NMR and consistency testing; ‘nmathresh’,54 to perform thresholding; and ‘mice’,59 to perform multiple imputation. Figures were created with multinma,47 metafor,60 ggplot2,61 and GraphPad Prism (V.9.1.0 for Windows, GraphPad Software, San Diego, California, USA, www.graphpad.com). All code was made publicly available (see Data Sharing Statement).
Equity, diversity and inclusion statement
Our author group comprises various disciplines, career stages and genders. Data collection, analysis and reporting methods were not altered based on regional, educational or socioeconomic differences of the community in which the included studies were conducted. The only consistently reported equity, diversity and inclusion-relevant variable on which we have analysed the data is biological sex.
The systematic search yielded 16 880 records after duplicates were removed. Following title/abstract screening, 1051 full texts were assessed for inclusion. A total of 192 articles were included in this review (figure 1). Characteristics of included studies are detailed in the online supplemental appendix 6.
Network geometry for strength is displayed in figure 2A. The strength NMA (178 studies, n=5097) included 13 conditions and 32 direct comparisons. The three largest nodes were CTRL (n=1321), LM3 (n=1133) and LM2 (n=710), and the three smallest nodes were HM1 (n=54), LS1 (n=34), and HS1 (n=13). The most common comparisons were LM3 versus CTRL (51 studies), HM3 versus LM3 (32 studies), HM3 versus CTRL (30 studies) and LM2 versus CTRL (30 studies).
Network geometry for hypertrophy is displayed in figure 2B. The hypertrophy NMA (119 studies, n=3364) included 11 conditions—no studies included HS1 or LS1—and 24 direct comparisons. The three largest nodes were CTRL (n=847), LM3 (n=810) and LM2 (n=548), and the three smallest nodes were HS3 (n=60), HS2 (n=21) and HM1 (n=11). The most common comparisons were LM3 versus CTRL (35 studies), HM3 versus LM3 (22 studies), LM2 versus CTRL (18 studies) and HM3 versus CTRL (17 studies).
Risk of bias
Within-study risk of bias was moderate–high for both strength and hypertrophy outcomes. In the strength network, 22%, 67% and 1% of studies had a high, moderate or low risk of bias, respectively. In the hypertrophy network, 18%, 82% and 0% of studies had a high, moderate or low risk of bias, respectively. Study-level risk of bias assessments for both strength and hypertrophy are detailed in online supplemental appendix 7.
RTxs versus CTRL
The relative effect of each RTx compared with CTRL on muscle strength is displayed in figure 3A. The posterior SMD for all prescriptions ranged from 0.75 to 1.60, with the largest relative effect from HM3 (1.60 (1.38 to 1.82)). Compared with CTRL, LS1 (0.75 (−0.16 to 1.68)) and HS1 (0.79 (−0.88 to 2.45)) were the only prescriptions for which the 95% CrI crossed zero.
The relative effect of each RTx compared with CTRL on muscle hypertrophy is displayed in figure 3B. The posterior SMD for all RTx ranged from 0.10 to 0.66, with the largest relative effect from HM2 (0.66 (0.47 to 0.85)). Compared with CTRL, HS2 (0.10 (−0.57 to 0.80)), HS3 (0.34 (−0.02 to 0.71)) and HM1 (0.40 (−0.35 to 1.17)) were the only prescriptions for which the 95% CrI crossed zero.
The relative effects from all 133 network comparisons for muscle strength and hypertrophy are displayed in table 2. For comparisons between RTxs (ie, not CTRL), the 95% CrI excluded zero for 13.6% (9/66) and 2.2% (1/45) of comparisons in the strength and hypertrophy NMA, respectively. For muscle strength, there was a 95% probability that HM2 yields a larger relative effect than LS1, LS2, LS3, LM2 and LM3 and that HM3 yields a larger relative effect than LS2, LS3, LM2 and LM3. There was a 95% probability for muscle hypertrophy that HM2 yields a larger relative effect than LS3.
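The comparison counts above follow from the number of conditions in each network, as the short check below shows. The function name is hypothetical. It assumes every pair of conditions is estimable in a connected network and that exactly one condition (CTRL) is not an RTx.

```python
from math import comb


def network_comparisons(n_conditions: int):
    """Return (all pairwise comparisons, comparisons between RTxs only)."""
    return comb(n_conditions, 2), comb(n_conditions - 1, 2)


strength_all, strength_rtx = network_comparisons(13)        # 13 conditions
hypertrophy_all, hypertrophy_rtx = network_comparisons(11)  # 11 conditions

print(strength_all + hypertrophy_all)                       # total network comparisons
print(round(100 * 9 / strength_rtx, 1), round(100 * 1 / hypertrophy_rtx, 1))
```

The two networks give 78 + 55 = 133 comparisons in total, of which 66 and 45 are between RTxs, matching the denominators reported above.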
Figure 4 displays the probability that each condition would rank in the top three best interventions for muscle strength and hypertrophy, such that scores closer to 100% indicate a greater chance of ranking in the top three. HM3 (85.5%), HM2 (83.5%) and HM1 (60.5%) were most likely to rank in the top three for muscle strength. HM2 (86.9%), LM1 (48.7%) and LM2 (48.3%) were most likely to rank in the top three for muscle hypertrophy. CTRL was the only condition with a 0% chance for strength and hypertrophy. Posterior rankings and distribution curves for all conditions are reported in the online supplemental appendix 8.
Model fit outputs and node-splitting plots are reported in the online supplemental appendix 9. In the strength network, the UME model (DIC=402.3) was not meaningfully different than the random-effects NMA model (DIC=400.8). Node-splitting was performed on 29 comparisons; the only significant difference was LM1 versus HM1 (p<0.01). In the hypertrophy network, the UME model (DIC=143.1) was meaningfully different than the random-effects NMA model (DIC=137.8). Node-splitting was performed on 22 comparisons; the only significant difference was LS2 versus CTRL (p<0.01).
Threshold analysis results for strength and hypertrophy are shown in online supplemental appendix 10. HM3 was the top-ranked condition for strength; however, 65 comparisons indicated some sensitivity to the level of uncertainty and potential biases in the evidence. The revised top-ranked strength condition was HM2 in 92% (60/65) or HM1 in 8% (5/65) of comparisons. HM2 was the top-ranked condition for hypertrophy, and this finding was robust. Two comparisons indicated some sensitivity to the level of uncertainty and potential biases in the evidence, and HM1 was the revised top-ranked condition in both cases.
Sensitivity analysis results are displayed in the online supplemental appendix 11. For both the strength and hypertrophy NMAs, the second sensitivity analysis (discussed herein) most improved model fit. The strength network included 155 studies (n=4397) and 11 conditions (LS1 and HS1 excluded). The relative effects for all RTx versus CTRL were tempered, such that posterior SMDs ranged from 0.77 to 1.49, with the largest relative effect from HM2 (1.49 (1.29 to 1.70)) and smallest from LS3 (0.77 (0.56 to 0.98)). The 95% CrI for each RTx versus CTRL excluded zero. There was a 95% probability that HM2 yields larger relative effects than LS2, LS3, LM1, LM2, LM3 and HS3; that HM3 was superior to LS2, LS3, LM1, LM2 and LM3; and that LM2 was superior to LS3. HM2 (99.9%) and HM3 (95.7%) remained most likely to rank in the top three for muscle strength.
The hypertrophy network included 115 studies (n=3240) and 9 conditions (HS2 and HM1 excluded). The relative effect for each RTx versus CTRL was roughly unchanged, with the largest relative effect from HM2 (0.59 (0.39 to 0.78)) and the smallest from HS3 (0.30 (−0.05 to 0.66)). Between prescriptions, there was a 95% probability that LM2 was superior to LS3. HM2 (82.8%) and LM2 (80.4%) remained most likely to rank in the top three for muscle hypertrophy.
Network meta-regression results are displayed in the online supplemental appendix 12. Model fit was not meaningfully different than the unadjusted model for all covariates, except relative weekly volume load, which worsened model fit. Age, training status, proportion of females, duration, volitional fatigue, relative weekly volume load, outcome measurement tool, outcome measurement region and publication year did not yield any obvious modifying effect on the relative effect for each RTx versus CTRL, and data-sparse nodes reduced estimate precision.
Physical function results are reported in the online supplemental appendix 13. Few studies assessed mobility (25 studies, n=859, age (mean)=68 years), gait speed (15 studies, n=488, 68 years) and balance/flexibility (11 studies, n=323, 68 years). Compared with CTRL, there was a 95% probability that LM2, LM3 and HM3 improved mobility and gait speed, while HM3 was the only condition that improved balance/flexibility (figure 5). No differences were found between RT prescriptions for any physical function outcome.
Twelve distinct RT prescriptions and non-exercising control groups were compared using network meta-analysis to determine their effect on gains in muscle strength, hypertrophy and improvements in physical function in healthy adults. Compared with no exercise, most load, sets and frequency combinations increased muscle strength and hypertrophy, indicating that several RTx resulted in beneficial skeletal muscle adaptations. RT with higher loads characterised the top-ranked strength prescriptions, and RT with multiple sets characterised the top-ranked hypertrophy prescriptions. A diverse range of RT prescriptions improved physical function, but evidence scarcity limited insights. Guideline developers and practitioners may consider these results when forming recommendations and prescribing RT for healthy adults.
Network meta-analysis has previously been used to compare different types of exercise31–34 and doses of RT load.35 In the NMA by Lopez et al,35 23 (n=582) and 24 (n=604) studies were included in the strength and hypertrophy networks, respectively. The present strength (178 studies, n=5097) and hypertrophy (119 studies, n=3364) networks were much larger, and this is likely attributable to Lopez et al35 excluding studies not including RT to momentary muscular failure and our more comprehensive search strategy (2629 records identified by Lopez et al35 vs 16 880 records identified here). This NMA, to our knowledge, represents the largest synthesis of RT data from randomised trials.
All loads, sets and frequency combinations increased muscle strength and size compared with CTRL. There was a 95% probability that RT with at least two sets or two sessions per week increased strength (figure 3A), and training with at least two sets and two sessions per week resulted in hypertrophy (figure 3B). Considering only the lower credible interval limit, each RTx induced at least a moderate (SMD>0.47) and small (SMD>0.16) increase in muscle strength and mass, respectively. Such certainty is not possible for all prescriptions, though, because the 95% CrI crossed zero for two RTx for strength (HS1 and LS1) and three RTx for hypertrophy (HM1, HS2 and HS3), meaning these prescriptions might increase, not change or decrease muscle strength and size. However, we posit that this is unlikely to represent an ineffectiveness of those particular RTx and that imprecise network estimates confound these findings. These strength (HS1 and LS1) and hypertrophy (HM1, HS2 and HS3) nodes included <60 participants and contributed little direct evidence (figure 2). Within each study testing these prescriptions, strength increased significantly compared with CTRL/baseline in all cases and hypertrophy increased from baseline in most cases. Those prescribing RT can be confident that all RTxs increased strength and hypertrophy compared with no exercise.
Network comparisons suggest that most RT prescriptions were comparable for strength and hypertrophy. The 95% CrI contained zero for a striking 91% (101/111) of all between-RTx comparisons (table 2). Nine of the 10 comparisons that did not contain zero were between HM2 or HM3 and a lower-load RTx for strength, suggesting higher-load, multiset programmes caused the largest strength gains. This result remained after sensitivity analyses (online supplemental appendix 11) and aligned with previous meta-analyses that found higher-load RT yields the largest strength gains.17 18 35 A critical point for practitioners is that lower-load RT prescriptions increase strength compared with no exercise. All RT prescriptions may comparably promote muscle hypertrophy, and the influence of load was less apparent. The lack of importance of load for hypertrophy is supported by other analyses,16 17 35 62 but performing RT to momentary muscular failure (fatigue) has been posited as a key component for RT-induced hypertrophy with lower loads.62 Network meta-regression for exercise ‘failure’ (fatigue) did not improve model fit nor substantially alter network estimates, suggesting that lifting to fatigue does not suitably explain the observed hypertrophic response. Our finding in this domain agrees with previous work,63 suggesting that untrained individuals still achieve large gains in skeletal muscle mass without performing RT to failure. Performing RT to momentary muscular failure may, however, be increasingly important for trained individuals.13 For both strength and hypertrophy, though, there was a large credible interval surrounding the non-significant effect estimate for many comparisons between RTxs, so a wide range of different effects are possible for these comparisons. 
The available evidence does not permit definitive, statistically valid conclusions about the equivalency of each RTx, despite most comparisons between RTxs not being statistically significantly different from each other.
Prescriptions for RT with higher loads were more likely to rank in the top three for strength than all lower-load prescriptions, and RT prescriptions with multiple sets per exercise were most likely to rank in the top three for hypertrophy (figure 4). Rankings are sensitive to uncertainties within the network,28 but posterior ranking credible intervals supported higher-load, multiset programmes being the highest-ranked for strength and multiple sets or multiple sessions being the highest-ranked for hypertrophy. Notably, sets and frequency are major components of RT volume, a key factor for hypertrophy.21 64–66 The probability of each condition ranking in the top three was calculated because the top-ranked RTx does not necessarily reflect the best intervention for all individuals.67 Personal preferences (eg, disliking higher loads) and time constraints (eg, an inability to train more than once weekly) can be accommodated while still benefiting from RT. In our view, especially given the low participation rates in RT, practitioners should not avoid prescribing non-top-ranked RTxs, nor should individuals be discouraged from completing them. While all prescriptions increased muscle strength and mass, the top-ranked prescriptions involved higher loads for strength and higher volume for hypertrophy. We do not know how these RTx affect relevant health outcomes. Some data suggest that health benefits exist with a low time commitment to RT (30–60 min/week) and that greater time commitment is associated with reduced health benefits.4 68
Ours is the first review to assess confidence in RTx recommendations with threshold analysis. Several factors can influence NMA results,55 and the robustness of treatment recommendations should be considered when interpreting results. Previous methods to evaluate the confidence of meta-analytical findings do not consider how potentially influencing factors can change treatment recommendations55 69 70 or are not yet developed for Bayesian NMA.71 Threshold analysis determines how much the available evidence could change before recommendations differ and identifies a new top-ranked treatment.54 55 Sixty-five direct comparisons were identified that could potentially impact the recommendation of HM3 as the top-ranked strength treatment; however, the revised treatment recommendation was HM2 in 60 of these cases and HM1 in the other five cases (online supplemental appendix 10), suggesting that performing RT with higher loads and multiple sets/exercise is a robust recommendation for optimising RT-induced strength gains. The top-ranked RTx for hypertrophy, HM2, was sensitive to the uncertainty of only two comparisons, and HM1 was the revised recommendation because both comparisons were from the same multi-arm study.72 Furthermore, 127 of the 161 direct comparisons would need to change by more than four SDs to alter HM2 as the top recommendation for hypertrophy. The optimised recommendations of higher load, multiple-set programmes for strength and HM2 for hypertrophy were extremely robust.
Current guidelines collectively advise healthy adults to complete RT at least twice weekly.10–12 73 The results herein support these recommendations and should not deter practitioners from promoting existing guidelines to improve strength and hypertrophy, nor do these results contradict the effectiveness of guidelines incorporating additional RTx variables, such as rest intervals and contraction type and velocity.10 12 However, our results also support RT at less than the often-cited recommended levels for enhancing strength and hypertrophy. Most individuals do not meet current guidelines, and RTx complexities may impede the adoption of RT. Minimal-dose approaches have been proposed to reduce barriers to RT,74 and our results strongly support the WHO’s claim, ‘Doing some activity is better than none’.73 While others attempt to optimise RTx,75 we propose that, for most adults, regularly engaging in any RTx is more important than training to optimise strength and hypertrophy outcomes. Our analysis found multiple RTx comparable for healthy adults to increase muscle strength and mass. Thus, adults should engage in RT, even if they cannot meet existing recommendations.
Risk of bias was frequently introduced by protocol deviations, randomisation procedures and selection of the reported result for both outcomes (online supplemental appendix 7). All three domains were regularly rated “Some concerns” because participants were aware of the intervention, appropriate analyses to estimate the effect of assignment were not performed, and randomisation, concealment and prespecified analysis procedures were rarely reported. Double-blinding RT is unfeasible, but the remaining issues are prevalent and recurring in RT research.76 Researchers should preregister analysis plans and report randomisation procedures to reduce bias.
Several limitations require acknowledgement and consideration when interpreting the findings of this review. Well-trained elite athletes/military persons and individuals with chronic disease were excluded, so the results should be translated to these populations with caution and additional insights.13 77–79 Mobility, gait speed and balance/flexibility findings should also be interpreted with caution due to the limited evidence available, which could be attributed to including only healthy older (>55 years) adults (eg, not frail). The coding framework for RT prescriptions prevented periodised RT programmes that overlapped conditions (eg, loads ranging from 60–90% 1RM) from being captured in the network. Initially, our objective was to further divide the load and set prescriptions; however, this yielded sparse, disconnected networks, violating a critical assumption of NMA.49 The continuous RTx variables investigated herein (load, sets, frequency) were classified categorically, so future work could use dose-response/model-based NMA methods to explore these RTx variables as continuous predictors.80 81 Several acute RT variables were not factored into the included RT prescriptions (eg, inter-set rest, time under tension, repetition velocity, volitional fatigue, tempo); where possible, NMR was used to explore if these factors improved model fit and altered effects. Results from NMR are correlative, however, and should be interpreted cautiously.82 Moreover, many variables (inter-set rest, tempo, time under tension) were reported too infrequently for inclusion as covariates. Calculating the relative weekly volume load (ie, load × repetitions/set × number of sets × number of exercises × weekly frequency), which should impact results,21 also required approximations that hindered model fit.
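The relative weekly volume load defined above is a simple product, sketched below. The function name is hypothetical, and the sketch assumes that load is expressed as %1RM and that a single representative value is available for each term, which is the kind of approximation the text notes was often required.

```python
def relative_weekly_volume_load(load_pct_1rm: float, reps_per_set: float,
                                sets: float, exercises: float,
                                days_per_week: float) -> float:
    """load x repetitions/set x number of sets x number of exercises x weekly frequency."""
    return load_pct_1rm * reps_per_set * sets * exercises * days_per_week


# Example: 80% 1RM, 8 repetitions, 3 sets, 5 exercises, 2 sessions per week
print(relative_weekly_volume_load(80, 8, 3, 5, 2))
```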
The principle of specificity17 (ie, the similarity between training and testing movement) and approximations of muscle mass (eg, lean mass)83 could infringe on transitivity assumptions37 when integrating results from multiple studies, and NMR with measurement tool and measurement region as covariates was an imperfect solution. Including one measurement per outcome for each study may limit the totality of evidence captured by this review, so future methodological work could explore the integration of multiple correlated effect sizes in NMA, as in recent pairwise meta-analyses.63 84 Increasingly, within-subject models are used due to their increased statistical power.85 To our knowledge, however, no methods are available to account for the additional correlation when including within-subject and between-subject comparisons in NMA. With consideration for these limitations, guideline developers and practitioners can obtain meaningful insights from this analysis.
This NMA represents the largest synthesis of RTx data from randomised trials. Most RTx increased muscle strength and mass compared with no exercise. Top-ranked prescriptions for muscle strength were characterised by lifting heavier loads, and multiple sets characterised top-ranked prescriptions for muscle hypertrophy. Guideline developers and practitioners should encourage the adoption of RT since all RTx can increase muscle strength and mass in healthy adults. The effects on health outcomes of various RTx remain largely unknown.