Data from students and crowdsourced online platforms do not often measure the same thing
Convenience Samples and Measurement Equivalence in Replication Research
Recommendation: posted 13 November 2023, validated 13 November 2023
Logan, C. (2023) Data from students and crowdsourced online platforms do not often measure the same thing. Peer Community in Registered Reports, 100551. 10.24072/pci.rr.100551
This is a stage 2 based on:
Comparative research is how evidence is generated to support or refute broad hypotheses (e.g., Pagel 1999). However, the foundations of such research must be solid if one is to arrive at the correct conclusions. Determining the external validity (the generalizability across situations/individuals/populations) of the building blocks of comparative data sets allows one to place appropriate caveats around the robustness of their conclusions (Steckler & McLeroy 2008).
In the current study, Alley and colleagues (2023) tackled the external validity of comparative research that relies on subjects who are either university students or participating in experiments via an online platform. They determined whether data from these two types of subjects have measurement equivalence - whether the same trait is measured in the same way across groups.
Although they use data from studies involved in the Many Labs replication project to evaluate this question, their results are of crucial importance to other comparative researchers whose data are generated from these two sources (students and online crowdsourcing). The authors show that these two types of subjects do not often have measurement equivalence, which is a warning to others to evaluate their experimental design to improve validity. They provide useful recommendations for researchers on how to implement equivalence testing in their studies, and they facilitate the process by providing well-annotated code that is openly available for others to use.
After one round of review and revision, the recommender judged that the manuscript met the Stage 2 criteria and awarded a positive recommendation.
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #1
DOI or URL of the report: https://doi.org/10.17605/OSF.IO/HT48Z
Version of the report: 1
Author's Reply, 06 Nov 2023
Decision by Corina Logan, posted 27 Sep 2023, validated 27 Sep 2023
Congratulations on completing your study and finishing your Stage 2 article! The unexpected bumps that came up along the way were normal, and your solutions to the problems upheld the scientific integrity of the registered report - nice work. The same two reviewers who evaluated the Stage 1 came back to evaluate Stage 2, and both found that your manuscript meets the PCI RR Stage 2 criteria. I am ready to issue IPA after you revise per my and Nakagawa’s comments. Note that my comments are so minor that you do not need to address them if you feel they are not useful, but please do make sure to address Nakagawa’s comment.
To answer your question, there are no space constraints at PCI RR so you don’t need to move anything to supplementary material.
Here are my comments on the manuscript…
1) Results: I found it extremely useful that you clarified the size of the effects in relation to what your tests were powered for (e.g., “Item 1 (“I find satisfaction in deliberating hard for long hours”) was the only item above the cut-off for a medium effect, all others were small or negligible”). I noticed that some paragraphs discussed a small effect being the cut-off, while others discussed a medium effect being the cut-off. It might be even clearer if you noted in each paragraph that the effect size cut-off related to the power/sensitivity/etc. analyses you conducted at Stage 1 for each analysis, which is why it differed.
2) Discussion: “power in ME testing is impact by the strength of inter-item correlations” - change “impact” to “impacted”
3) Discussion: “For this reason, researchers should not assume that different crowdsourced samples will be equivalent to each other, or even student samples collected in different settings”. Could you please clarify what “different settings” refers to? Different countries/languages/etc.?
4) Study design table: you could add a column to the right that shows your findings.
I'm looking forward to receiving your revision.
All my best,