Journals endorsing a reporting guideline and journals enforcing one are different things. For most of the decade following ARRIVE's original 2010 publication, the gap between those two positions was enormous. More than 1,000 journals put the Animal Research: Reporting of In Vivo Experiments (ARRIVE) checklist into their author instructions. Compliance surveys repeatedly found blinding described in roughly one in five papers, sample size justification appearing in fewer than one in ten, and animal characteristics missing in a substantial fraction. The guidelines were everywhere. The reporting they described was not.
ARRIVE 2.0, published simultaneously in PLOS Biology and several other journals in July 2020 by the UK-based National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), attempted to address the adoption problem by restructuring the checklist into two tiers. The Essential 10 sets a floor that journals can realistically demand as a minimum condition. The Recommended Set describes the fuller picture of best-practice reporting. The idea was that a tiered structure would be easier to enforce consistently than a monolithic list with no clear minimum threshold.
Five and a half years on, the tiered approach has helped. It has not solved the problem. Journals that mandate the Essential 10 as an editorial requirement, such as those in the Nature Publishing Group portfolio, see measurably better reporting quality in accepted papers. Journals that note the guideline exists without embedding it into their editorial workflow see little change. As of January 2025, the American Association for Laboratory Animal Science (AALAS) journals, including JAALAS and Comparative Medicine, formally adopted ARRIVE 2.0 for all animal research manuscripts. More journals are moving toward the same position. If you are submitting preclinical work, the Essential 10 is now the baseline expectation at a growing number of outlets, not optional guidance.
What ARRIVE 2.0 Is
ARRIVE 2.0 is a reporting guideline developed by NC3Rs for any publication describing in vivo animal experiments. It applies to original research articles and should be completed during manuscript preparation, not assembled retroactively from memory. The checklist, the companion Explanation and Elaboration document, and an adherence checker tool are freely available from the NC3Rs website without registration.
Why Reporting Quality Matters for Animal Research Specifically
The reproducibility crisis is not evenly distributed across biomedical research. Preclinical animal studies have been disproportionately difficult to replicate, and a consistent explanation across the literature is that published methods sections leave too much out for another team to follow. When key methodological decisions are unreported, readers cannot tell whether the findings are reliable or whether the design was structured in ways that would predictably inflate positive results.
Randomization and blinding are the clearest examples. In clinical trials, these have been mandatory disclosure items since the original CONSORT statement was published in the mid-1990s. In animal research, they remain poorly documented despite being just as methodologically important. A study where group allocation was not randomized, or where the investigator assessing outcomes knew which animals received treatment, carries a much higher risk of bias than its results section would suggest. Readers who cannot find a statement about blinding cannot make that risk assessment. They are being asked to evaluate the findings without one of the most relevant pieces of methodological information.
Sample size justification is the other major gap. Many animal research papers report the number of animals used without explaining how that number was determined. When a study is underpowered, a negative result may reflect inadequate sample size rather than absence of a true effect. When a study uses far more animals than a power calculation would have required, there are welfare implications alongside the statistical ones. Reviewers who see no power calculation or sample size rationale are being asked to trust the design without the information that would let them evaluate it. That is not a position any reviewer should accept uncritically, and increasingly they are not.
The ARRIVE 2.0 Structure: Essential 10 and Recommended Set
ARRIVE 2.0 divides its 21 items into two categories because the development team, working through a Delphi exercise with researchers, statisticians, and journal editors, recognized that demanding full compliance with a flat list had not worked. The Essential 10 represents the information without which a reader genuinely cannot assess a study's reliability. The Recommended Set covers methodological context that strengthens a paper but is not treated as a minimum threshold at most journals.
The Essential 10 covers: study design, the sample size and how it was determined, the criteria used to include and exclude animals from the study, the approach to randomization, the approach to blinding, the outcome measures assessed, the statistical methods used, a description of the experimental animals, a description of the experimental procedures, and reporting of the results including any animals excluded from analysis. That is a broad set, but none of it is unreasonable given what peer reviewers need to evaluate the findings presented.
The Recommended Set adds the remaining eleven items: the abstract, background, objectives, ethical statement, housing and husbandry, animal care and monitoring, interpretation and scientific implications, generalizability and translation, protocol registration, data access, and declaration of interests. These items are what distinguish a paper that clears the floor from one that handles a reviewer's likely questions before they are asked. Many journals that have endorsed ARRIVE 2.0 expect the Recommended Set items in practice even though they are not strictly mandatory checklist items.
The ARRIVE 2.0 Essential 10
- 1. Study design: the experimental groups, the unit of analysis, and any within-experiment controls.
- 2. Sample size: exact number per group and how the number was determined.
- 3. Inclusion and exclusion criteria: pre-specified criteria for including or excluding animals.
- 4. Randomization: whether it was used, and if so, what method generated the sequence.
- 5. Blinding: who was aware of group allocation at each stage (allocation, conduct, outcome assessment, data analysis).
- 6. Outcome measures: the specific endpoints assessed, and whether they were pre-specified or exploratory.
- 7. Statistical methods: the tests used for each analysis, including any correction for multiple comparisons.
- 8. Experimental animals: species, strain, sex, age, weight range, source, health status, and any prior procedures.
- 9. Experimental procedures: all procedures applied with doses, routes, timing, and relevant welfare details such as analgesia.
- 10. Results: data for all pre-specified outcome measures, including any animals excluded from analysis with reasons given.
The Compliance Data Is Stark
Multiple cross-sectional analyses of animal research publications have found the same pattern regardless of journal prestige or field. Randomization is described in approximately 30 to 40 percent of papers. Blinding appears in roughly 20 percent. Sample size justification is present in fewer than 10 percent. Animal characteristics, including basic details like sex and age, are missing in a substantial proportion. A 2024 cross-sectional study of 943 animal research publications from journals that had published either ARRIVE 1.0 or ARRIVE 2.0 found that while the ARRIVE 2.0 era showed improvement on about 40 percent of comparable items, a significant gap in full compliance persists across the checklist.
The most instructive single study on intervention is a randomised controlled trial conducted at PLOS ONE, designed to test whether requiring authors to submit a completed ARRIVE checklist at manuscript submission would improve reporting quality in published papers. The result was negative. Requesting the checklist at submission did not translate into better information appearing in the methods sections of accepted papers. This finding points to something uncomfortable: compliance is not primarily an awareness problem. Researchers often know what is supposed to be reported. The issue is whether journals make compliance a genuine condition of acceptance rather than a procedural formality that can be satisfied by submitting a checklist full of placeholder responses.
The Nature Publishing Group provides a contrasting example. When Nature journals mandated the ARRIVE checklist as part of editorial policy and directed managing editors to verify responses during desk review, reporting quality improved measurably. The distinction is operational. A journal that asks for a checklist and proceeds regardless is not enforcing the guideline. A journal that treats checklist compliance as a condition for sending the paper to external reviewers is. More journals are moving toward the second position, which is why authors who have previously treated the ARRIVE checklist as an afterthought should revisit that habit now.
What Journals Are Now Doing Differently
The most significant recent development is the AALAS adoption. Starting January 1, 2025, JAALAS and Comparative Medicine formally require ARRIVE 2.0 compliance for all manuscripts reporting animal research. AALAS journals serve the laboratory animal science community, so the signal reaches researchers most likely to be producing preclinical work at scale. Other journals across pharmacology, neuroscience, oncology, and physiology have issued similar updates to their author instructions in the same period. The question for authors is no longer whether their target journal has endorsed the guideline. It is whether that journal has embedded enforcement into its editorial workflow, and that fraction is growing.
Some journals now check ARRIVE compliance at the desk review stage, before the manuscript reaches external reviewers. This means incomplete or vague checklist responses can result in a return to authors without review. That is not a rejection on scientific grounds. It is an avoidable delay that signals to the editorial office that the submission was not carefully prepared, and those impressions linger when the revised version arrives. Treating the Essential 10 as a desk-rejection checklist, rather than a best-efforts aspiration, is the more realistic framing at journals with active enforcement.
How to Check Your Target Journal's ARRIVE Policy
- Search the journal's author instructions page for "ARRIVE". Most endorsing journals name it explicitly and link to the NC3Rs checklist.
- Check whether the checklist is described as required or recommended. That single word changes what happens at submission.
- Look for whether blinding and randomization are specifically called out as items editorial staff will verify at desk review.
- If the journal belongs to a Springer Nature, BMC, or Wiley imprint, check the publisher-level reporting standards page as well as the journal-level instructions, since requirements are sometimes distributed across both.
- If ARRIVE is not mentioned but the journal publishes animal research regularly, contact the editorial office before submitting. Many journals adopt policies before fully updating their public instructions text.
The Four Items That Authors Get Wrong Most Often
Across the Essential 10, four areas account for the largest share of missing or inadequate reporting. Understanding what adequate looks like in each area is the most direct preparation for submission.
Randomization requires more than stating that "animals were randomly allocated." The checklist asks for the method used to generate the random sequence. A valid response describes whether a random number generator was used, which tool produced it, whether the allocation sequence was concealed from the investigator distributing animals to groups, and whether any stratification was applied. "Animals were randomly assigned to treatment groups" is not sufficient. "Animals were allocated using a random number generator (R version 4.3, base package) stratified by baseline body weight" is. The difference matters to any reviewer trying to assess whether the randomization was genuine or nominal.
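One way to make a randomization statement reportable is to generate the allocation sequence from a script that records the tool, the seed, and the stratification variable. The sketch below is a minimal illustration in Python, not the R workflow quoted above. The function name `stratified_allocation`, the animal IDs, the weights, and the seed are all hypothetical.

```python
import random

def stratified_allocation(weights_by_id, groups, seed):
    """Allocate animals to groups in weight-ordered blocks.

    Hypothetical helper: animals are sorted by baseline body weight, cut into
    blocks of len(groups), and the group labels are shuffled within each block.
    """
    rng = random.Random(seed)  # fixed seed so the sequence can be regenerated
    ordered_ids = sorted(weights_by_id, key=weights_by_id.get)
    allocation = {}
    for start in range(0, len(ordered_ids), len(groups)):
        block = ordered_ids[start:start + len(groups)]
        labels = list(groups)
        rng.shuffle(labels)  # random order of group labels within this block
        for animal_id, label in zip(block, labels):
            allocation[animal_id] = label
    return allocation

# Illustrative data only: eight animals with baseline weights in grams.
weights = {"A1": 24.1, "A2": 26.3, "A3": 22.8, "A4": 25.0,
           "A5": 23.5, "A6": 27.2, "A7": 24.7, "A8": 25.9}
allocation = stratified_allocation(weights, ["vehicle", "treatment"], seed=20240115)
```

A methods sentence built from this would name the language and version, state that allocation was blocked by baseline body weight, and say who ran the script, since allocation concealment depends on that person not being the one who handles the animals.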
Blinding is more nuanced because complete blinding is not always possible in animal research. The checklist does not require that every stage was blinded. It requires a description of who knew group allocation at each of four stages: during allocation, during the conduct of the experiment, during outcome assessment, and during data analysis. If blinding was not possible at a particular stage, that should be stated with a brief reason. What creates the problem is the absence of any description. Without it, readers are left to assume the worst about bias risk, because they have no information on which to base a more favorable assessment.
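The four-stage structure lends itself to a simple record that a lab can fill in as the experiment runs. The following is a rough sketch under stated assumptions: the `blinding_statement` helper and the record layout are hypothetical, and the generated wording is only a starting point that should be edited to name who was actually aware of allocation.

```python
# The four stages at which awareness of group allocation should be described.
STAGES = ("allocation", "conduct of the experiment",
          "outcome assessment", "data analysis")

def blinding_statement(record):
    """Build one sentence per stage and flag any stage with no entry."""
    sentences = []
    for stage in STAGES:
        entry = record.get(stage)
        if entry is None:
            # Nothing recorded: surface the gap instead of staying silent.
            sentences.append(f"[MISSING: blinding status during {stage}]")
        elif entry["blinded"]:
            sentences.append(
                f"Personnel were blinded to group allocation during {stage}."
            )
        else:
            sentences.append(
                f"Personnel were not blinded during {stage} because {entry['reason']}."
            )
    return " ".join(sentences)

# Illustrative record only; one stage has deliberately been left out.
record = {
    "allocation": {"blinded": False,
                   "reason": "the technician assigning cages also prepared the dosing solutions"},
    "outcome assessment": {"blinded": True},
    "data analysis": {"blinded": True},
}
statement = blinding_statement(record)
```

The point of the placeholder is that an undocumented stage shows up in the draft as a visible gap, which is easier to fix than an omission nobody notices.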
Sample size justification is where many authors struggle most. A formal power calculation is the expected answer, but it is not the only acceptable one. If the sample size was determined by resource constraints, by regulatory requirements in the case of safety studies, or by the precedent set in a prior exploratory study that is being replicated, any of those explanations can be stated. The problem is the absence of any explanation at all, which leaves reviewers unable to assess whether the study had a reasonable chance of detecting a biologically meaningful effect. A sentence acknowledging the limitation is far better than silence.
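For a simple two-group comparison of means, the arithmetic behind a power calculation is short enough to show. The sketch below uses the standard normal approximation and only the Python standard library. The function name `n_per_group` and the example effect sizes are assumptions for illustration, and a t-distribution based calculation in dedicated power software will typically give a slightly larger number.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size (difference in means / SD).
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

n_large_effect = n_per_group(1.0)   # → 16 per group
n_medium_effect = n_per_group(0.5)  # → 63 per group
```

Whatever tool is used, the reportable parts are the inputs: the effect size and where it came from, the assumed variability, the significance level, and the target power.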
Inclusion and exclusion criteria trip up many authors because the most common failure is retrospective exclusion without documentation. When animals are removed from analysis after data collection, the criteria used to make that decision need to be stated clearly, ideally pre-specified in a protocol. If animals were excluded for humane endpoints, equipment failure, sample processing errors, or outlier criteria applied post-hoc, each of those decisions needs to appear in the results section with the numbers excluded and the specific reason. Failing to report this looks, to a careful reviewer, like selective reporting regardless of whether it actually was, because the information is not there to assess it either way.
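Keeping exclusions in a structured log during the study makes the results-section accounting nearly automatic. This is a minimal sketch, assuming a hypothetical log format with `animal_id`, `group`, and `reason` fields. The entries shown are invented for illustration.

```python
from collections import Counter

def summarize_exclusions(exclusion_log):
    """Count exclusions per (group, reason) and return one line per pair."""
    counts = Counter((entry["group"], entry["reason"]) for entry in exclusion_log)
    return [
        f"{group}: {n} excluded ({reason})"
        for (group, reason), n in sorted(counts.items())
    ]

# Illustrative entries only.
exclusion_log = [
    {"animal_id": "A3", "group": "treatment", "reason": "humane endpoint reached"},
    {"animal_id": "A7", "group": "vehicle", "reason": "equipment failure during recording"},
    {"animal_id": "A9", "group": "treatment", "reason": "humane endpoint reached"},
]
summary = summarize_exclusions(exclusion_log)
# → ["treatment: 2 excluded (humane endpoint reached)",
#    "vehicle: 1 excluded (equipment failure during recording)"]
```

Each summary line maps directly onto the numbers and reasons a reviewer expects to find in the results section.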
Tools That Make Compliance Practical
NC3Rs provides the ARRIVE 2.0 checklist as a downloadable PDF with specific prompts for each item and a column for authors to note the manuscript location of the relevant information. The companion Explanation and Elaboration document, published in PLOS Biology alongside the main guideline, provides worked examples for each item and addresses common edge cases. Both are freely available and worth reading before you draft the methods section, not after.
NC3Rs also developed an ARRIVE adherence checker tool designed to let authors review their manuscript text against checklist items. The tool is intended as a pre-submission verification step rather than an editorial screen, and using it before you finalize the methods section is more productive than using it after the manuscript is otherwise complete. Like any reporting checklist, it is most useful when consulted during writing.
For teams doing preclinical work regularly, building the ARRIVE Essential 10 into the methods template from the start is the most efficient long-term approach. If your standard methods outline is structured around the ten items rather than retrofitted to them at submission, the reporting will almost always be more complete. The most common missing items (blinding, sample size justification, randomization method) are rarely absent because the information does not exist. They are absent because the team assembled the methods section from lab notes and protocol documents without checking systematically what a reader would need to evaluate the design. A template based on the Essential 10 prevents that.
The Recommended Set: When to Go Further
The Recommended Set items do not have a universal mandatory threshold, but several are increasingly expected at specific journals even when the guideline classifies them as non-mandatory. Protocol preregistration is the clearest example. Registering an animal study protocol before data collection begins, on a platform such as the Open Science Framework or protocols.io, is still uncommon in preclinical research. Journals committed to reducing publication bias are beginning to request it, or at least to ask whether it was done. A study that was registered and then conducted as described is a stronger submission, and noting the registration in the methods takes one sentence.
Housing and husbandry details, part of the Recommended Set, are more consequential than many authors realize. Animal behavior, physiology, and response to experimental interventions can vary meaningfully with housing conditions, light cycles, cage type, enrichment practices, and diet composition. When these conditions are omitted, another laboratory attempting to replicate the work has to guess at variables that may actually explain divergent results. This is a documented source of inter-laboratory variability in preclinical models, not a theoretical concern. The information is typically already in the lab protocol. It simply needs to be transferred to the methods section.
The discussion items in the Recommended Set, particularly the generalizability and translation section, ask whether authors address the potential relevance of their findings to the clinical or applied setting. This is not a requirement to speculate broadly. It is a prompt to be honest about what the animal model used can and cannot tell us. Models that are clearly stated to be exploratory or hypothesis-generating are evaluated differently from claims of direct translational relevance, and being specific about which category your study occupies is an act of scientific accuracy rather than understatement.
A Pre-Submission Self-Check for Animal Research Papers
The most efficient way to close compliance gaps is to review the methods and results sections against the Essential 10 before the manuscript goes to co-authors for final approval. At that stage, adding a missing blinding description or a sample size rationale takes minutes. Discovering the omission after submission takes weeks and creates an unnecessary revision cycle before the science even reaches external peer review.
Essential 10 pre-submission check
- Study design: Does the methods section state the experimental groups, the unit of analysis (individual animal, litter, or cage), and what served as the control condition?
- Sample size: Is the exact n per group stated? Is there any explanation of how that number was determined, whether by power calculation, prior literature, resource constraint, or regulatory requirement?
- Inclusion/exclusion criteria: Are the criteria for including animals stated? If any were excluded after data collection started, is each exclusion described with the specific reason and the number of animals affected?
- Randomization: Is there a statement about whether randomization was used? If yes, does it name the specific method used to generate the allocation sequence?
- Blinding: Is there a statement about who was aware of group allocation at each of the four stages: allocation, treatment conduct, outcome assessment, data analysis? If any stage was unblinded, is a reason given?
- Outcome measures: Are all outcome measures listed in the methods? Are they described as pre-specified or exploratory?
- Statistical methods: Is the test used for each comparison stated? Is any correction for multiple comparisons described?
- Experimental animals: Does the methods section state species, strain, sex, age, weight range, source, and health or immune status?
- Experimental procedures: Are all procedures described with doses, routes, timing, frequency, and relevant welfare details including analgesia?
- Results: Are data provided for all pre-specified outcome measures? Are any animals excluded from analysis accounted for with reasons?
Going through this list with the manuscript open takes roughly 15 minutes for a methods section that is already written. If any question cannot be answered by pointing to a specific sentence in the text, that item needs a sentence before submission. Reviewers will not supply the missing information on your behalf, and at journals with active ARRIVE enforcement, a desk editor may not either.
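A crude keyword scan can serve as a first pass before the manual read-through. The sketch below is an assumption-heavy illustration: the keyword patterns are hypothetical choices, not an official NC3Rs list, and finding a keyword does not mean the item is adequately reported. It only flags items for which the text contains no obvious trace at all.

```python
import re

# Hypothetical keyword patterns per Essential 10 item; tune these for your field.
ESSENTIAL_10_KEYWORDS = {
    "Study design": [r"control group", r"unit of analysis", r"experimental unit"],
    "Sample size": [r"sample size", r"power (calculation|analysis)"],
    "Inclusion/exclusion criteria": [r"exclu(ded|sion)", r"inclusion criteria"],
    "Randomization": [r"randomi[sz]"],
    "Blinding": [r"blind", r"masked"],
    "Outcome measures": [r"outcome", r"endpoint"],
    "Statistical methods": [r"t-test", r"anova", r"mixed model", r"multiple comparisons"],
    "Experimental animals": [r"strain", r"\b(male|female)\b", r"weeks? old"],
    "Experimental procedures": [r"mg/kg", r"dose", r"analgesi"],
    "Results": [r"\bn\s*=\s*\d+"],
}

def items_without_keywords(text):
    """Return the Essential 10 items for which no keyword pattern matches."""
    lowered = text.lower()
    return [
        item
        for item, patterns in ESSENTIAL_10_KEYWORDS.items()
        if not any(re.search(pattern, lowered) for pattern in patterns)
    ]

# Illustrative methods excerpt only.
sample_text = (
    "Male C57BL/6J mice (8 weeks old) were randomized to treatment or a "
    "vehicle control group. Assessors were blinded to allocation. "
    "Drug was given at 10 mg/kg. Groups were compared by t-test."
)
missing = items_without_keywords(sample_text)
# → ["Sample size", "Inclusion/exclusion criteria", "Outcome measures", "Results"]
```

Items the scan does not flag still need the sentence-by-sentence check described above, because a matching word says nothing about whether the method, the reason, or the numbers are actually given.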
Where This Is Heading
ARRIVE compliance will continue to tighten for straightforward reasons. The preclinical reproducibility problem has attracted attention from funders, regulators, and the public in ways that have created real pressure on journals to clean up methods reporting. The NC3Rs, which developed and maintains the guideline, is a UK government-funded body with a mandate to improve scientific quality alongside animal welfare outcomes. Its institutional backing gives the guideline a durability that many voluntary initiatives lack.
The trajectory is clearly toward more enforcement rather than less. Journals that currently treat ARRIVE as a checkbox requirement are watching neighboring journals move toward genuine editorial scrutiny of checklist responses, and the competitive dynamics in academic publishing push in that direction as research integrity moves up the priority list for editors and editorial boards. Protocol preregistration for animal studies, currently a Recommended Set item, may migrate into expectations for high-profile journals within the next few years, following the trajectory of clinical trial registration a generation earlier.
None of this should be alarming to researchers who are already doing good preclinical science. The ARRIVE 2.0 Essential 10 asks for information that, in most cases, already exists somewhere in your lab notebooks, protocol files, or analysis scripts. The reporting gap is not usually a gap in the underlying methodology. It is a gap in the transfer of that methodology into the published record. Closing that gap consistently, as a matter of standard practice rather than a response to reviewer comments, is what positions a preclinical research group well as this landscape continues to shift.
Further Reading
CONSORT 2025: Updated Clinical Trial Reporting
The parallel reporting standard for randomized controlled trials, updated with new open science requirements for 2025.
STROBE: Reporting Observational Studies
The 22-item checklist for cohort, case-control, and cross-sectional studies, including the 2025 STROBE-Equity extension.
TRIPOD+AI: Clinical Prediction Model Reporting
Reporting requirements for prediction models using regression or machine learning, updated in the BMJ for AI applications.
How to Read Journal Author Guidelines
How to extract reporting requirements and checklist obligations from author instructions before you begin preparing a manuscript.
Written by Dr. Meng Zhao
Physician-Scientist · Founder, LabCat AI
MD · Former Neurosurgeon · Medical AI Researcher
Dr. Meng Zhao is a former neurosurgeon turned medical-AI researcher. After years in the operating room, he moved into applied AI for clinical workflows and now leads LabCat AI, a medical-AI company working on decision support and research tooling for clinicians. He built Journal Metrics as a free resource for researchers who need reliable journal metrics without paid database subscriptions.