Most manuscripts submitted to medical journals do not correctly report statistics. That sentence sounds like an exaggeration, but the evidence behind it is consistent. A 2025 study reviewing 100 clinical medicine papers found that 65 percent of them needed revisions to adequately describe the purpose and application of their statistical tests, and 64 percent were missing effect size reporting entirely. A parallel study examining 150 editorial reviews across specialty journals found similar patterns. These are not obscure methodological edge cases. They are the core elements of any statistical results section, and most papers are submitted with them incomplete.
The reason this matters beyond methodology courses is straightforward. When statistical reporting is thin, readers cannot judge whether a finding is clinically meaningful. A P-value of 0.04 tells a cardiologist that the result was unlikely under the null hypothesis. It does not tell them whether the drug reduced mortality by 0.1 percent or 8 percent. The number needed to treat, the absolute risk reduction, the hazard ratio with its confidence interval: these are the numbers clinicians use to make decisions. When they are missing, the paper is less useful regardless of how sophisticated the underlying analysis was.
The SAMPL guidelines, developed in 2015 and since endorsed by a growing list of journals and publishers, offer a practical checklist for what statistical reporting should include at a minimum. They are not widely enforced at submission, which is part of why compliance remains low. But several major journals have tightened their own policies in ways that overlap directly with SAMPL requirements. If you are preparing a clinical study for submission in 2026, checking your statistics section against SAMPL before you send is one of the fastest ways to reduce revision cycles.
Working Principle
P-values answer the question “is this result unlikely under the null hypothesis?” Effect sizes answer the question “how large is this effect?” Most readers of clinical research need both, but only the second tells them whether the finding is meaningful in practice. Report both, with confidence intervals.
Where the SAMPL Guidelines Came From and What They Cover
The SAMPL guidelines, short for Statistical Analyses and Methods in the Published Literature, were formulated by Thomas Lang and Douglas Altman as a practical companion for biomedical researchers and journal editors. Published in 2015 as a chapter within a Wiley manual on reporting health research, they were designed to fill a gap that had frustrated statistical reviewers for years: the absence of a single accessible checklist covering the most common statistical methods used in biomedical studies. Where guidelines like CONSORT and STROBE addressed the overall structure of study reports, SAMPL focused specifically on how the numbers inside those reports should be presented.
SAMPL guidelines are organized around common analytical approaches rather than study designs, which makes them usable across almost any research type. They provide specific guidance for descriptive statistics (including means, medians, standard deviations, and interquartile ranges, and when each is appropriate), for inferential statistics (naming tests, explaining assumptions, and interpreting outputs), for estimates of association and effect (odds ratios, risk ratios, regression coefficients, correlation coefficients, and hazard ratios), and for confidence intervals, sample size justifications, and handling of subgroup analyses and multiplicity. A researcher who works through the SAMPL checklist for their specific analyses will catch most of the problems that statistical reviewers routinely flag.
The EQUATOR Network, which maintains a searchable library of reporting guidelines for health research, lists SAMPL alongside more widely known standards such as CONSORT, STROBE, and PRISMA. Several journals now reference SAMPL in their instructions for authors, though relatively few enforce it at the submission screen. The practical effect is that compliance is voluntary but increasingly expected at the review stage at journals with dedicated statistical reviewers, including JAMA, the BMJ, and titles in the Springer Nature medical portfolio.
The Evidence That Statistical Reporting Still Falls Short
The 2025 audit examined 100 clinical medicine articles drawn from biomedical databases and evaluated each against the SAMPL checklist. The authors then tracked which recommendations authors actually implemented during the revision process, allowing them to quantify both the frequency of deficiencies and the feasibility of correction. The results were discouraging but consistent with earlier studies from other groups.
The most common problem was inadequate description of the purpose and application of statistical tests, found in 65 percent of papers. Authors routinely named a test without explaining why they chose it or what assumptions it relies on. A paper might state that a Wilcoxon signed-rank test was used without noting whether the data failed a normality check, why the non-parametric approach was preferred over a paired t-test, or what the comparison was actually testing. Reviewers who know statistics can infer some of this, but most readers cannot, and the information should appear in the methods section regardless.
Effect size reporting was the second most common gap, affecting 64 percent of papers. The 2025 study found that most authors either omitted effect sizes entirely or reported P-values in their place, which is not equivalent. A P-value below 0.05 says only that the result was unlikely under the null hypothesis; an odds ratio of 1.04 with a confidence interval from 1.00 to 1.08 says that the effect, if real, is very small. Reported alone, the P-value hides that second fact.
Fifty-eight percent of papers had inadequate reporting of statistical assumptions. This includes noting whether continuous data were assessed for normality before applying a parametric test, or whether proportional hazards assumptions were verified in a Cox regression. Readers and reviewers cannot evaluate whether the choice of test was appropriate without this information. The 2025 study found that adding these disclosures was among the easiest corrections to implement once reviewers flagged them, suggesting that the barrier is habit and awareness rather than complexity.
Outlier handling was a less common problem than the others but still affected 34 percent of papers. In clinical data, outliers are frequent, and an undisclosed decision to exclude or retain extreme values can materially shift results. The SAMPL guidelines ask authors to state whether outliers were identified, how they were defined, and what was done about them.
P-Values and What Major Journals Now Require
The New England Journal of Medicine updated its statistical reporting guidelines in 2019 in a way that continues to shape expectations at high-impact medical journals. Under those guidelines, when neither the clinical trial protocol nor the statistical analysis plan specified a method for adjusting for multiplicity, reports of secondary and exploratory endpoints should be limited to point estimates of treatment effects with 95 percent confidence intervals. P-values should not serve as the primary summary for those analyses.
This change was significant because secondary endpoints in clinical trials are where P-value fishing and selective reporting have historically caused the most damage. The NEJM policy acknowledged explicitly that reporting 30 P-values from a single trial and highlighting the subset below 0.05 is misleading, because the chance of at least one false positive across that set of tests is not 5 percent but something considerably higher. By shifting emphasis to effect estimates and intervals for secondary endpoints, the journal was forcing authors to present findings as estimates with uncertainty rather than as binary significance decisions.
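The arithmetic behind that claim is short. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification (trial endpoints are usually correlated); the function name is illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n_tests
    independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative counts: 1 test, 13 endpoints, 30 reported P-values.
for k in (1, 13, 30):
    print(f"{k:>2} tests: {familywise_error_rate(k):.3f}")
```

Under these assumptions, 30 unadjusted tests carry roughly a 79 percent chance of at least one spurious "significant" result.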
The American Statistical Association has pushed in the same direction: its 2016 statement on P-values and statistical significance was followed by a 2019 special issue of The American Statistician and a 2021 task force statement on statistical significance and replicability. The core message was that statistical significance, defined as P below some threshold, should not serve as the primary basis for scientific conclusions. Effect sizes, confidence intervals, and pre-specified analysis plans provide a more honest summary of evidence. Most major medical publishers have updated their author guidelines in some form to reflect this, even though enforcement varies considerably across titles.
The BMJ has had a statistical checklist integrated into its review process for decades, and JAMA employs statistical editors who review manuscripts for quantitative adequacy alongside subject-matter peer reviewers. Both journals have explicit requirements for effect size reporting alongside P-values in original research. At journals that do not employ statistical editors, the SAMPL checklist becomes a form of self-review that catches the same category of errors before they reach a reviewer.
Reporting Effect Sizes: The Most Commonly Missing Element
Effect size is not a single number but a family of measures, each appropriate to different study designs. For clinical trials comparing two groups on a continuous outcome, common choices include the mean difference with its unit and confidence interval, or the standardized mean difference (Cohen's d) when scales differ across studies. For dichotomous outcomes, absolute risk reduction, relative risk reduction, and number needed to treat are the measures that carry the most clinical weight. Hazard ratios are standard for time-to-event data.
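For a continuous outcome, the two measures named above can be computed in a few lines. This is a minimal sketch using only the Python standard library; the data are hypothetical, the function name is illustrative, and the interval uses a normal approximation, so small samples would need a t quantile from a statistics package instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_summary(treated, control, level=0.95):
    """Mean difference with a large-sample CI, plus Cohen's d (pooled SD)."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    diff = mean(treated) - mean(control)
    se = sqrt(s1**2 / n1 + s2**2 / n2)          # unpooled standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return {
        "mean_difference": diff,
        "ci": (diff - z * se, diff + z * se),
        "cohens_d": diff / pooled_sd,
    }


# Hypothetical reductions in systolic blood pressure (mmHg).
treated = [12.0, 9.5, 14.0, 11.0, 8.5, 13.5]
control = [6.0, 7.5, 4.0, 8.0, 5.5, 6.5]
print(mean_difference_summary(treated, control))
```

The mean difference keeps the outcome's unit, which is usually what a clinician wants; Cohen's d is unitless and mainly useful when scales differ across studies.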
For observational studies, the odds ratio and relative risk are both common, but their difference matters. The odds ratio from a case-control study is not directly interpretable as a risk ratio unless disease prevalence is very low. A paper that presents only odds ratios when absolute risk reduction could be calculated is asking readers to do extra work, and many will not. SAMPL guidelines ask authors to choose the effect size measure appropriate to their study design and to report it with a confidence interval in every results table or figure where a test is mentioned.
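A small numerical sketch shows why the distinction matters. The counts are hypothetical and the function names are illustrative; in both scenarios the risk ratio is 2.0, but the odds ratio tracks it only when the outcome is rare.

```python
def risk_ratio(a, b, c, d):
    """a, b = exposed with / without the outcome; c, d = unexposed with / without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the same 2x2 layout."""
    return (a * d) / (b * c)


rare = (10, 990, 5, 995)        # outcome risk 1% vs 0.5%
common = (400, 600, 200, 800)   # outcome risk 40% vs 20%

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the odds ratio is about 2.01; with the common outcome it rises to about 2.67, which a reader could easily misread as a 2.67-fold increase in risk.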
Some specialty journals have their own conventions. Cardiology journals expect hazard ratios from Cox regression, usually presented alongside Kaplan-Meier curves. Psychiatry journals often expect standardized mean differences for psychometric outcomes. Oncology journals frequently require reporting in terms of time-to-progression or response rate rather than P-values alone. Before selecting an effect size measure, look at how recently published papers in the target journal present similar results. Matching the journal's preferred convention reduces reviewer friction, but do not omit the measure entirely simply because the convention is unclear: ask a statistician which is most appropriate, or consult the SAMPL guidance for your analysis type.
Common effect size measures by study design
- RCT, continuous outcome: Mean difference, standardized mean difference (Cohen's d), with 95% CI.
- RCT, binary outcome: Absolute risk reduction, relative risk reduction, number needed to treat.
- Survival analysis: Hazard ratio with 95% CI; median survival times with CI.
- Case-control study: Odds ratio with 95% CI; note that OR approximates RR only when disease is rare.
- Cohort study: Risk ratio or rate ratio, with 95% CI; absolute risk difference where meaningful.
- Correlation: Pearson or Spearman r with CI; note r-squared for proportion of variance explained.
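For the binary-outcome row, the three measures all derive from the two event proportions. The following is a rough sketch with hypothetical trial counts and an illustrative function name; the interval for the absolute risk reduction is a simple Wald interval, which is only one of several accepted methods.

```python
from math import sqrt
from statistics import NormalDist


def binary_effect_sizes(events_control, n_control, events_treated, n_treated, level=0.95):
    """Absolute risk reduction (with Wald CI), relative risk reduction, and NNT."""
    p_c = events_control / n_control
    p_t = events_treated / n_treated
    arr = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "arr": arr,
        "arr_ci": (arr - z * se, arr + z * se),
        "rrr": arr / p_c,
        "nnt": 1 / arr if arr != 0 else float("inf"),
    }


# Hypothetical trial: 100/1000 events on control, 80/1000 on treatment.
print(binary_effect_sizes(100, 1000, 80, 1000))
```

In this example a 20 percent relative reduction corresponds to an absolute reduction of 2 percentage points and a number needed to treat of 50. A confidence interval for the NNT can be obtained by inverting the limits of the ARR interval, but only when that interval excludes zero.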
Confidence Intervals, Precision, and What Readers Actually Need
A 95 percent confidence interval expresses the range of effect sizes reasonably compatible with the observed data under the assumed statistical model. Wide intervals indicate imprecision, not just uncertainty about whether an effect exists. A trial that finds a hazard ratio of 0.75 with a confidence interval from 0.30 to 1.88 is telling the reader that the data are compatible with meaningful benefit, with no effect, and possibly with harm. The interval is so wide that the trial is essentially uninformative about effect direction, regardless of what the P-value says.
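The link between an interval and its P-value can be made explicit. The sketch below assumes the reported interval is a Wald interval, symmetric on the log scale, which is typical for hazard ratios and odds ratios but is not guaranteed for every paper; the function name is illustrative.

```python
from math import log
from statistics import NormalDist


def p_value_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided P-value for a ratio measure (HR, OR, RR)
    from its point estimate and confidence limits."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (log(upper) - log(lower)) / (2 * z_crit)   # SE on the log scale
    z = log(estimate) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The hazard ratio example from the paragraph above.
print(round(p_value_from_ratio_ci(0.75, 0.30, 1.88), 2))
```

For the hazard ratio of 0.75 with limits 0.30 and 1.88, this works out to a P-value of roughly 0.54. The interval therefore carries everything the P-value does, plus the range of plausible effect sizes.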
Authors sometimes present borderline P-values (0.03, 0.04) without confidence intervals, hoping the reader will interpret the result as settled. This is misleading. A result with P = 0.04 and a confidence interval from 0.01 to 0.98 for an odds ratio is technically significant but clinically uninformative, because the data are compatible with anything from a near-total reduction in odds to almost no effect. The interval tells a different story than the P-value. Readers who understand this will look for the interval. Reviewers at JAMA, the Lancet, and the BMJ will flag its absence. Some journals now require 95 percent confidence intervals and will return manuscripts that omit them before peer review begins.
One practical rule: whenever you report a P-value for a primary or secondary outcome, the corresponding confidence interval should appear in the same sentence or in the same table cell. A results table that lists only P-values is a statistical reporting failure regardless of how well the methods section was written.
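One way to enforce that rule is to generate every results string from a single helper that cannot emit a P-value without its interval. This is a hypothetical helper, and the rounding conventions in it are assumptions; check the target journal's style for decimal places and P-value thresholds.

```python
def format_estimate(label, estimate, lower, upper, p_value, level=95):
    """Return an estimate, its confidence interval, and its P-value as one string."""
    if p_value < 0.001:
        p_text = "P < 0.001"
    elif p_value < 0.01:
        p_text = f"P = {p_value:.3f}"
    else:
        p_text = f"P = {p_value:.2f}"
    return f"{label} {estimate:.2f} ({level}% CI {lower:.2f} to {upper:.2f}; {p_text})"


print(format_estimate("HR", 0.75, 0.30, 1.88, 0.54))
# HR 0.75 (95% CI 0.30 to 1.88; P = 0.54)
```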
Multiple Testing, Subgroup Analyses, and Exploratory Claims
Multiple testing is the area where statistical reporting most often misleads rather than informs. A trial with one primary endpoint and twelve secondary outcomes that presents P-values for all thirteen comparisons without any adjustment has a family-wise false positive rate, meaning the chance of at least one spurious significant result, far exceeding 5 percent. This is not a mistake limited to junior researchers. Published meta-research has found inflated P-value distributions in high-impact medical journals, consistent with selective reporting of secondary analyses, which is the practice of running many comparisons and presenting only the ones that reached significance.
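When an adjustment is applied, the method should be named in the methods section. As an illustration, the sketch below implements the Holm step-down procedure, one common choice; neither SAMPL nor any journal mentioned here is claimed to require this particular method, and the P-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)        # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted P-values for four secondary endpoints.
raw = [0.012, 0.04, 0.03, 0.20]
print([round(p, 3) for p in holm_adjust(raw)])
```

Reporting both the unadjusted and adjusted values, with the method named, lets readers see how much of the apparent evidence survives the correction.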
The NEJM's 2019 statistical guidelines addressed this explicitly. The standard for 2026 is to prespecify primary and secondary endpoints in a clinical trial registry before data collection begins, and that registration number belongs in the manuscript. The endpoints in the paper should match those in the registry. When exploratory analyses are presented, they should be labeled as such, and no strong causal inference should be drawn from them without qualification.
Subgroup analyses present a related problem. They are almost always underpowered to detect true subgroup-specific effects, and they are prone to false positives through the same multiplicity mechanism. A subgroup analysis that produces a P-value of 0.03 for an interaction term sounds impressive until you realize it was one of eight subgroups tested without adjustment. The appropriate response is to present the interaction test result honestly, note the lack of multiplicity adjustment, and describe the finding as hypothesis-generating rather than confirmatory. SAMPL guidelines specifically ask authors to indicate whether subgroup analyses were pre-specified or post-hoc, and many journals now use this distinction as a condition for whether subgroup results appear in the abstract.
Statistical Assumptions and When You Must Report Them
Statistical tests carry assumptions, and reporting whether those assumptions were checked is part of what SAMPL requires. For parametric tests including t-tests, ANOVA, and linear regression, the relevant questions are whether continuous outcomes were assessed for normality, whether variances were checked for homogeneity, and whether corrections such as Welch's t-test were applied where appropriate. For regression models, relevant assumptions include the linearity of the relationship between predictors and outcome, independence of observations, absence of problematic collinearity among predictors, and, for logistic regression, absence of complete separation.
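To make the Welch correction concrete, here is a minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The data are hypothetical and the function name is illustrative; converting t and df to a P-value requires a t distribution, which the standard library does not provide, so in practice a statistics package would be used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    v_a = variance(sample_a) / n_a   # squared standard error, group A
    v_b = variance(sample_b) / n_b   # squared standard error, group B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))
    return t, df


# Hypothetical groups with visibly different spread.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A fractional degrees-of-freedom value in a results table is often the only visible sign that Welch's version was used, which is one reason to state it explicitly in the methods.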
None of these require paragraph-length descriptions in the methods section of most papers. A sentence noting that model assumptions were checked and that residual diagnostics confirmed adequate fit is usually sufficient. The problem identified in the 2025 audit was not that authors wrote too little about assumptions in their results; it was that they wrote nothing at all, leaving reviewers to wonder whether any checking was done. A statistical editor at a major journal will ask about this. A statistical reviewer at a specialty journal will ask about this. Writing it proactively removes a predictable question from the revision cycle.
For non-parametric tests, the methods section should explain why the parametric alternative was rejected. For survival analyses, proportional hazards assumptions should be stated as checked, with a note on the approach taken if they were violated. For mixed-effects models, the covariance structure chosen and the rationale for it should be mentioned. These are not requests for exhaustive statistical theory. They are documentation of decisions that readers need to evaluate the analysis.
Missing Data, Software, and the Rest of the Checklist
Two items from the SAMPL checklist that authors often forget: missing data handling and statistical software reporting. Missing data in clinical research is nearly universal, and how it is handled can meaningfully affect results. A paper that excludes cases with missing covariates without noting this, or that uses complete-case analysis without mentioning it, is omitting information that affects reproducibility. SAMPL guidelines ask authors to describe the proportion of missing data for key variables, whether the pattern was missing at random or not (even if only addressed briefly), and what approach was used. At a minimum, noting that multiple imputation or complete-case analysis was used with the proportion of cases affected is sufficient for most manuscripts.
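The descriptive part of that disclosure is easy to generate. The sketch below assumes records are dictionaries with `None` marking a missing value; the variable names and data are hypothetical, and it summarizes missingness only, without saying anything about the missingness mechanism.

```python
def missingness_report(records, variables):
    """Proportion missing per variable and the number of complete cases."""
    n = len(records)
    per_variable = {
        v: sum(1 for r in records if r.get(v) is None) / n for v in variables
    }
    complete = sum(
        1 for r in records if all(r.get(v) is not None for v in variables)
    )
    return {
        "n": n,
        "proportion_missing": per_variable,
        "complete_cases": complete,
        "proportion_complete": complete / n,
    }


# Hypothetical patient records.
patients = [
    {"age": 50, "sbp": 120},
    {"age": None, "sbp": 130},
    {"age": 61, "sbp": None},
    {"age": 70, "sbp": 140},
]
print(missingness_report(patients, ["age", "sbp"]))
```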
Statistical software and version should appear in the methods section for every paper. This is not ceremonial. Certain packages implement statistical tests differently, make different default assumptions about ties in non-parametric tests, or use different algorithms for mixed-effects model fitting. Readers who want to verify calculations, or who need to explain why their replication attempt produced a slightly different number, need the software name and version. R version 4.4.1 with specified packages, Stata 18, SAS 9.4, SPSS 29: a one-line entry in your statistical methods paragraph covers this completely.
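For analyses run in Python rather than the packages listed above, the equivalent one-line entry can be generated instead of typed from memory. This is a small sketch; the package names passed in are examples only, and the sentence wording is an assumption rather than a required format.

```python
import platform
from importlib import metadata


def software_statement(packages):
    """Build a methods-section sentence listing the interpreter and package versions."""
    parts = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            parts.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            parts.append(f"{name} (not installed)")
    return "Analyses were performed with " + ", ".join(parts) + "."


# Example package names; replace with those the analysis actually used.
print(software_statement(["numpy", "scipy"]))
```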
Pre-submission statistical checklist
- Have you named every statistical test and explained why it was chosen?
- Have you reported effect sizes, not just P-values, for primary and secondary outcomes?
- Have you included 95 percent confidence intervals for all estimates?
- Have you described how statistical assumptions were checked for each major analysis?
- Have you labeled subgroup and post-hoc analyses as exploratory?
- Have you noted how missing data were handled and in what proportion?
- Have you confirmed that the endpoints in your paper match those registered in the trial registry?
- Have you reported the statistical software name and version?
- Have you described the sample size calculation and the assumptions used?
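For the last item, the sample size statement should let a reader reproduce the number. As an illustration, the sketch below uses the standard normal-approximation formula for comparing two means with equal group sizes; the inputs are hypothetical, the function name is illustrative, and dedicated software using the t distribution will typically give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Participants per group to detect a mean difference `delta`
    given a common standard deviation `sd` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical design: detect a 5-point difference, SD of 10.
print(n_per_group_two_means(delta=5, sd=10))
```

With these inputs the formula gives 63 participants per group. The manuscript should state all four assumptions (difference, standard deviation, alpha, power) along with any allowance for dropout.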
Where Journals Are Heading on Statistical Standards
The trajectory in statistical reporting requirements is toward greater specificity and transparency. Several journals in the Springer Nature portfolio have adopted standards from the Center for Open Science's Transparency and Openness Promotion (TOP) guidelines, which include standards for statistical reporting alongside data sharing and preregistration. The EQUATOR Network lists SAMPL in its library of reporting guidelines, and SAMPL is cited in a growing number of journal author instructions. JMIR Publications, which publishes several large-volume open access journals covering digital health, has published explicit statistical reporting guidelines aligned with SAMPL.
The change is slow but visible. Journals that enforced weak statistical standards five years ago are increasingly adopting the practice of sending manuscripts to statistical reviewers alongside domain peer reviewers. That means statistical weaknesses that once survived review are now more likely to come back as major revision comments, sometimes after months in the system. An author who checks their statistics section against SAMPL before submission is not just following a best-practice checklist: they are getting ahead of a review cycle that is becoming more demanding.
A meta-research study posted to medRxiv in early 2026, tracing P-value reporting in biomedical literature from 1990 to 2025, found that despite years of advocacy for effect sizes and confidence intervals, P-value-only reporting has declined more slowly than advocates hoped. Old habits persist in part because researchers learn statistical reporting from papers they read, and the papers they read were themselves written before current standards took hold. Breaking that cycle requires active attention to current guidelines rather than pattern-matching to old papers in your field.
The practical implication for authors preparing manuscripts now is to build a statistical review step into the pre-submission process rather than treating it as something peer review will catch later. The 2025 audit data show that most papers still arrive with fixable statistical problems. Those problems delay publication, generate extra revision rounds, and occasionally lead to rejection at journals with strong statistical standards. Spending two hours checking your statistics section before you submit is one of the most efficient investments available at the end of a research project.
Written by Dr. Meng Zhao
Physician-Scientist · Founder, LabCat AI
MD · Former Neurosurgeon · Medical AI Researcher
Dr. Meng Zhao is a former neurosurgeon turned medical-AI researcher. After years in the operating room, he moved into applied AI for clinical workflows and now leads LabCat AI, a medical-AI company working on decision support and research tooling for clinicians. He built Journal Metrics as a free resource for researchers who need reliable journal metrics without paid database subscriptions.