Data from students and crowdsourced online platforms do not often measure the same thing

Comparative research is how evidence is generated to support or refute broad hypotheses (e.g., … 1999). However, the foundations of such research must be solid if one is to arrive at the correct conclusions.
Determining the external validity (the generalizability across situations/individuals/populations) of the building blocks of comparative data sets allows researchers to place appropriate caveats around the robustness of their conclusions (Steckler & McLeroy 2008).
In the current study, Alley and colleagues (2023) tackled the external validity of comparative research that relies on subjects who are either university students or participating in experiments via an online platform.
They determined whether data from these two types of subjects have measurement equivalence, i.e. whether the same trait is measured in the same way across groups.
Although they use data from studies involved in the Many Labs replication project to evaluate this question, their results are of crucial importance to other comparative researchers whose data are generated from these two sources (students and online crowdsourcing). The authors show that these two types of subjects do not often have measurement equivalence, which is a warning to others to evaluate their experimental design to improve validity. They provide useful recommendations for researchers on how to implement equivalence testing in their studies, and they facilitate the process by providing well-annotated code that is openly available for others to use.
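To give a sense of what equivalence testing involves in general, the sketch below shows a minimal two-one-sided-tests (TOST) check that two groups' means fall within a pre-specified equivalence bound. This is only an illustrative sketch of the general logic, not the authors' analysis pipeline; the group names, simulated data, and the ±0.3 bound are hypothetical, and the degrees-of-freedom calculation is a simple pooled approximation.

```python
# Illustrative TOST (two one-sided tests) sketch for equivalence of group means.
# NOTE: hypothetical data and bounds; the authors' study tests measurement
# equivalence with their own openly available code, which differs from this.
import numpy as np
from scipy import stats

def tost(x, y, low, high):
    """TOST p-value for H0: the mean difference lies outside [low, high]."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(np.var(x, ddof=1) / nx + np.var(y, ddof=1) / ny)
    df = nx + ny - 2  # simple pooled-df approximation
    p_lower = stats.t.sf((diff - low) / se, df)   # H0: diff <= low
    p_upper = stats.t.cdf((diff - high) / se, df)  # H0: diff >= high
    return max(p_lower, p_upper)  # equivalence claimed if this is < alpha

rng = np.random.default_rng(1)
students = rng.normal(5.0, 1.0, 2000)   # hypothetical student sample
crowd = rng.normal(5.05, 1.0, 2000)     # hypothetical online sample
p = tost(students, crowd, low=-0.3, high=0.3)
print(p < 0.05)  # True: means are statistically equivalent within +/- 0.3
```

Note that a non-significant difference test is not evidence of equivalence; the TOST logic reverses the null so that a significant result supports similarity within the stated bound.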
After one round of review and revision, the recommender judged that the manuscript met the Stage 2 criteria and awarded a positive recommendation.

URL to the preregistered Stage 1 protocol: https://osf.io/7gtvf

Level of bias control achieved: Level 2. At least some data/evidence that was used to answer the research question had been accessed and partially observed by the authors prior to Stage 1 IPA, but the authors certify that they had not yet observed the key variables within the data that were used to answer the research question AND they took additional steps to maximise bias control and rigour.

List of eligible PCI RR-friendly journals:
• Advances in Methods and Practices in Psychological Science

In my view, the authors have conducted a very thorough study that followed the Stage 1 approval. The conclusions are justified by the evidence. The writing is of a very high standard throughout, and the discussion of the generalisability of the results is fully appropriate. I re-ran the code for one comparison (EMA implicit vs MTurk) as a reproducibility check and was able to fully reproduce the equivalence test results for this comparison, although I did note that the mean age for MTurk was 34.98400 (35.0) rather than the 34.0 reported. The code is clear and excellently commented throughout.

Yours sincerely, Ben Farrar
Reviewed by Shinichi Nakagawa, 26 September 2023

I reviewed Stage 1 of this manuscript, very much enjoyed it, and was looking forward to reading Stage 2. I first acknowledge that I am a quantitative ecologist, so I do not know the relevant field and literature; I am, however, able to check whether the statistical analyses conducted were sound. Also, this is my first time reviewing a Stage 2, but my understanding is that I check whether the authors followed the Stage 1 plan and check for deviations. The authors conducted the study with very minor deviations. I liked that the Discussion section had limitation and recommendation sections, which are very clearly and honestly written. Overall, I think this is a great Stage 2.

I have one question, though. Reading this work, I got the impression that the authors encourage caution about mixing samples. Yet some papers in biology encourage the mixing of samples despite known non-equivalence (differences, e.g. sex and strains). I wondered what the authors make of this, and there should be some related discussion. I note this mixing process is called "heterogenization", which is encouraged by an increasing number of grant agencies. There is an example paper:

•
applicable, whether any unregistered exploratory analyses are justified, methodologically sound, and informative. As above, yes.

2E. Whether the authors' conclusions are justified given the evidence.