Can psychology researchers predict which effects will generalise across cultures?

Recommended by C. Chambers based on reviews by Michèle Nuijten, Ian Hussey, Jim Grange and Matthias Stefan
A recommendation of:

Researcher Predictions of Effect Generalizability Across Global Samples


Submission: posted 16 February 2023
Recommendation: posted 10 September 2023, validated 11 September 2023
Cite this recommendation as:
Chambers, C. (2023) Can psychology researchers predict which effects will generalise across cultures? Peer Community in Registered Reports.


Compared to the wealth of debate surrounding replicability and transparency, relatively little attention has been paid to the issue of generalisability, that is, the extent to which research findings hold across different samples, cultures, and other parameters. Existing research suggests that researchers in psychology are prone to generalisation bias, relying on narrow samples (e.g. predominantly US or European undergraduates) to draw broad conclusions about the mind and behaviour. While recent attempts to address generalisability concerns have been made, such as journals requiring explicit statements acknowledging constraints on generality, addressing this bias at its root, and developing truly generalisable methods and results, requires a deeper understanding of how researchers perceive generalisability in the first place.
In the current study, Schmidt et al. (2023) tackle the issue of cross-cultural generalisability using four large-scale international studies that are being conducted as part of the Psychological Science Accelerator (PSA) – a globally distributed network of researchers in psychology that coordinates crowdsourced research projects across six continents. Specifically, participants (who will be PSA research members) will estimate the probability that an expected focal effect will be observed both overall and within regional subsamples of the PSA studies. They will also predict the size of these focal effects overall and by region.
Using this methodology, the authors plan to ask two main questions: first, whether researchers can accurately predict the generalisability of psychological phenomena in upcoming studies, and second, whether certain researcher characteristics (including various measures of expertise, experience, and demographics) are associated with the accuracy of generalisability predictions. Based on previous evidence that scientists can successfully predict the outcomes of research studies, the authors expect to observe a positive association between predicted and actual outcomes and effect sizes. In secondary analyses, the authors will also test whether researchers can predict when variables that capture relevant cultural differences will moderate the focal effects. If so, this would suggest that at least some researchers have a deeper understanding of why the effects generalise (or not) across cultural contexts.
The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol: (under temporary private embargo)
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
References
1. Schmidt, K., Silverstein, P. & Chartier, C. R. (2023). Registered Report: Researcher Predictions of Effect Generalizability Across Global Samples. In-principle acceptance of Version 3 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2


Version of the report: 2

Author's Reply, 01 Sep 2023

Decision by C. Chambers, posted 15 Aug 2023, validated 15 Aug 2023

Three of the four reviewers were available to evaluate your revised submission. As you can see, the good news is that all are broadly satisfied and we are now within reach of Stage 1 in-principle acceptance (IPA). There is one remaining statistical issue to address in Michèle Nuijten's review, which I agree should be resolved before we proceed further. I will assess your response and revision at desk and we should be able to then quickly issue IPA.

Reviewed by Michèle Nuijten, 14 Jul 2023

I thank the authors for their thorough revision of their paper. All my comments were addressed sufficiently. I do have one remaining question about the planned regression analysis, but I leave it to the authors/editor to evaluate whether this requires further adaptations in the manuscript.

In response to comment 7 of my review, the authors now include an additional model in which all six researcher characteristics jointly predict the accuracy of generalizability predictions. However, they still plan to fit six separate models, one for each individual researcher characteristic. I do not really understand this choice.

If I understand correctly, the goal is to identify which characteristics are (the most) important for predicting the accuracy of generalizability predictions. In my view, it will be hard to interpret any of the estimated coefficients in these separate models as indicators of importance, because they do not control for the influence of the other five characteristics.

I may be missing something here, but I would argue for including only the “omnibus” model with all six researcher characteristics entered simultaneously, and omitting the six separate models.
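
To illustrate why separate models can mislead, here is a minimal sketch that uses two correlated, hypothetical researcher characteristics (x1, x2) instead of six, with made-up numbers rather than PSA data. For standardized variables, the joint-model coefficient of x1 is (r_y1 - r_y2 * r_12) / (1 - r_12^2), whereas a model with x1 alone simply returns r_y1. In the toy data the accuracy score depends only on x2, yet x1 looks strongly predictive on its own (r of about 0.83) while its joint coefficient is zero.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def standardized_betas(y, x1, x2):
    """Standardized coefficients of y regressed jointly on x1 and x2
    (closed form that holds only for exactly two predictors)."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

# Hypothetical toy data: x1 and x2 are correlated characteristics,
# and the accuracy score y depends on x2 alone.
x2 = [1, 2, 3, 4, 5, 6]
x1 = [2, 1, 4, 3, 6, 5]
y = list(x2)

separate_x1 = pearson(y, x1)  # standardized slope of x1 in its own model
joint_x1, joint_x2 = standardized_betas(y, x1, x2)

print(f"separate model, x1: {separate_x1:.3f}")
print(f"joint model, x1: {joint_x1:.3f}, x2: {joint_x2:.3f}")
```

With six predictors the same logic applies, but the joint coefficients would come from a full multiple regression rather than this two-predictor formula.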

Reviewed by , 10 Jul 2023

The authors have addressed the comments I raised in the previous review to my satisfaction. Good luck with the research. I'm really interested in seeing the outcome.

Reviewed by , 13 Jul 2023

While I was critical in my detailed comments, I already noted that this is an interesting research project and that there is much to like about the Registered Report, even at Stage 1. This general assessment has not changed.

Moreover, I think my comments have all been sufficiently and convincingly addressed. I already said that my minor points are often a matter of taste; you (the authors) disagree with some of them and clarify others. In the former case, your arguments are considered. More importantly, I think my major concerns are well addressed. In some cases I seem to have missed some details, but your clarifications will help guide the reader. In this regard, I think you have managed to make the Report more reader-friendly.

I have no more comments and am looking forward to reading about the findings in this study.

Evaluation round #1


Version of the report: 2 (Snapshot)

Author's Reply, 08 Jul 2023

Decision by C. Chambers, posted 20 Apr 2023, validated 21 Apr 2023

I have now received four very constructive and helpful evaluations of your Stage 1 submission. As you will see, the reviewers concur that this is a valuable RR proposal that already comes close to meeting the Stage 1 criteria at PCI RR. Nevertheless, I think you will find the reviews helpful in strengthening the work even further prior to in-principle acceptance.

Without summarising all of the insights, across the set of reviews the following issues struck me as the main headlines: (1) including a deeper reflection on the concept of generalisability and, in particular, the generalisability of this study itself, (2) addressing concerns about measurement validity, (3) elaborating a variety of additional key methodological details (including analysis plans and potential exclusion criteria), (4) tightening up potential sources of bias resulting from some remaining researcher degrees of freedom, (5) justifying specific design decisions (including the participant pool), and (6) considering structural edits to improve the clarity of presentation.

All of these issues are readily addressable as part of the regular Stage 1 review process, therefore I am happy to invite a comprehensive revision and response.

Reviewed by , 13 Apr 2023

Reviewed by , 19 Apr 2023

I think that the study addresses an interesting and timely topic, and an answer to the question posed by the research would definitely be of value. Although whether empirical effects generalise across different samples is of more importance than whether researchers can predict such generalisation, the current proposed research certainly fills an interesting gap in the literature and I am looking forward to seeing the outcome of this study. 

I only have a few relatively minor points that I hope the authors find of some use. I provide them in the order in which they appear in the manuscript.

Page 3 - "Psychology's WEIRDness problem": Perhaps spell out what this initialism stands for on first use for readers unfamiliar with the term.

Page 4, second paragraph - When discussing generalisability of an effect across methods & measures, it might be of use to cite the following paper which proposes a method to deal with this issue: Baribault, B., et al. (2018). Metastudies for robust tests of theory. Proceedings of the National Academy of Sciences, 115(11), 2607–2612.

Page 13 - The supplementary material was very clear and provided an excellent overview of what the participants would experience. Thank you for including this.

Page 14 - As you are providing reminders of effect size interpretations, is there a concern that predictions will "cluster" around the effect size boundaries of small, medium, and large? Is there a statistical consequence to such clustering if it does occur (e.g., reduction in variance)? If there is a statistical concern, is there any way to encourage full use of the scale so that this is minimised?
Related to this, as you are converting the effect sizes to Cohen's d before analyses, might it be worth using this metric when asking participants to estimate the effect size of the focal effect, rather than the original effect size metric (e.g., odds ratios)? As you mention in the paper, researchers (perhaps!) have a better feel for Cohen's d, so this might lead to more accurate estimates.
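
If odds ratios were presented to participants on the Cohen's d scale, one widely used approximation, which assumes an underlying logistic distribution, is d = ln(OR) * sqrt(3) / pi. The helpers below are a hypothetical sketch of that conversion and its inverse; the text here does not state which conversion the authors actually use.

```python
import math

def odds_ratio_to_d(odds_ratio: float) -> float:
    """Approximate Cohen's d from an odds ratio (logistic approximation)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def d_to_odds_ratio(d: float) -> float:
    """Inverse of odds_ratio_to_d: map a Cohen's d back to an odds ratio."""
    return math.exp(d * math.pi / math.sqrt(3))

# Show the conventional small/medium/large d values on the odds-ratio scale,
# e.g. to check how a participant's d estimate relates to the original metric.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: OR ~ {d_to_odds_ratio(d):.2f}")
```

For example, d = 0.5 corresponds to an odds ratio of roughly 2.48 under this approximation.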

Page 18 - Thank you for including the R markdown html for the power analyses. This was very clear and comprehensive and sets the standard for how power analyses should be reported in studies.

Page 20 - "No participants will be excluded from the analytic dataset". Will participants with missing data (e.g., dropping out halfway through) still be included? If so, how will missing data be handled (e.g., imputation?)


Signed: Jim Grange.

Reviewed by Michèle Nuijten, 20 Apr 2023

The aim of this Registered Report is to 1) investigate whether researchers can accurately predict the generalizability of effects across global regions and 2) examine whether certain researcher characteristics are related to prediction accuracy.


I think this is an interesting and relevant research question, and overall the proposed research plan is solid. It is an innovative and relevant way to make use of the extremely rich data of the PSA.


I do still have some questions and remarks I would like the researchers to address, which I list below. I think the majority of these points could be addressed relatively easily.


I am very curious to see the results of this project.



Michèle Nuijten



1.       I miss any remarks on the generalizability of the results of *this* study. It seems to me that although PSA researchers are likely from all over the world, they might not be representative of “psychology researchers” in general. I think it is important that the authors “practice what they preach”, in a way, and make some sort of statement about this.

2.       In the introduction, the authors clearly show a gap in the literature: there is research on prediction of effects, there is some research about predicting generalizability over time, but no prediction of effects across global samples. I think it would strengthen the argument if the authors also indicate what the main benefit of answering this question would be. Is the main point to assess potential generalization bias (as mentioned on p. 7)? Or could we use the results from this study in some way to come up with guidelines or advice for researchers along the lines of the Constraints on Generalizability statement?

3.       The authors mention a host of different measures of researcher characteristics. I wonder about the validity of these measures. I would like to see some validity information about the scales that are planned to be used (open-minded thinking, need for cognition, and any others I’m forgetting here).

4.       I think the paper could use an additional sentence or two on what it means to be a “member of the PSA”, since this describes the participant pool. 

5.       Three of the four PSA projects still need to be selected. What is the sampling plan for this? Are there any predetermined criteria that a project needs to meet before it is selected? Is there a set timeline (e.g. the first three projects that have a finished protocol)? 

6.       Again, I may have missed it (if so my apologies and please ignore this comment), but I don’t remember seeing a clear explanation of the different “sources” of data from each of the regions. From the context I’m deducing that each region/country will collect a university sample and a non-university sample? Please make sure that this is clarified in the text.

7.       The authors plan to measure many additional variables at the researcher level. I may have missed it, but for some of them, I can’t seem to find any hypotheses or analysis plans (task difficulty, demographics). For the ones that are planned to be included in the accuracy analyses (p. 21-22), I do not completely understand the rationale behind the analysis plan. If I understand the authors correctly, they first intend to correlate all measured researcher characteristics to prediction accuracy, and when a correlation is found (is there some sort of cut-off here?), the characteristic will be added to a regression model? I’m not familiar with this analysis strategy, and I wondered if it would not be more straightforward to simply run a regression model including all the measured researcher characteristics and look at the resulting regression coefficients to judge which characteristics are important. Finally, there are *a lot* of characteristics added; did the authors take into account the risk of Type I error inflation due to the large number of tests?

8.       On p. 20 the authors state they will calculate “a correlation” between mean probability estimates of finding an effect in the subsamples and the binary outcome variable. Please specify the type of correlation that will be used (considering that the probability estimates are likely non-normally distributed, and one of the variables is binary).

9.       This is a complex project with data and predictions at many different levels. This can sometimes result in unclear sentences. E.g.: “We will compare the responses on the overall prediction items to the overall results within each study.” (p. 22). It is not clear to me which variables and items this refers to, exactly. Do the authors refer to all prediction items? At which levels? And what is meant by “overall results within each study”? Which results? Effect sizes? I had similar difficulties with wrapping my head around the distinction between the “aggregate-level analyses” and “prediction-level analyses”. It may help improve clarity if the authors also explain what the substantive difference/advantage/interpretation is of having these two levels in the analysis.
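
On the Type I error question in item 7: with six independent tests each at α = .05, the chance of at least one false positive is 1 - 0.95^6, about 0.26. One standard remedy is the Holm-Bonferroni step-down adjustment, sketched below in plain Python. The p-values are made up for illustration, and nothing in this review record says the authors adopted Holm specifically.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k counted from 0) is multiplied by (m - k),
        # capped at 1, and never allowed to fall below an earlier adjusted value.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Probability of at least one false positive across six independent tests.
familywise_error = 1 - 0.95 ** 6

# Hypothetical p-values for six researcher characteristics.
p = [0.004, 0.030, 0.012, 0.200, 0.045, 0.650]
adj = holm_adjust(p)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

Here three unadjusted p-values fall below .05, but only the smallest survives the adjustment.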
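
On the choice of correlation in item 8: when one variable is coded 0/1, Pearson's r computed on the raw values is the point-biserial correlation, and Spearman's rho (Pearson's r on average ranks) is a common rank-based alternative if the probability estimates are skewed. The sketch below uses hypothetical toy data; which coefficient the authors chose is not stated here.

```python
import math

def pearson(a, b):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(a, b):
    """Spearman's rho as Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical: mean predicted probability per subsample, and whether the
# focal effect was observed in that subsample (1) or not (0).
predicted = [0.9, 0.8, 0.3, 0.2]
observed = [1, 1, 0, 0]
r_pb = pearson(predicted, observed)
rho = spearman(predicted, observed)
print(f"point-biserial r = {r_pb:.3f}, Spearman rho = {rho:.3f}")
```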

Reviewed by , 12 Apr 2023

The study examines, in the field of psychology, how well researchers can predict the generalizability of psychological effects and whether and which researcher characteristics influence prediction accuracy. 

My report directly addresses the authors. My general assessment is that the research question is well defined, valid and timely; the hypotheses are well stated, coherent and precise; the procedure is feasible and the proposed methodology is sound. There is much to like about the Registered Report. Therefore, I have only a few comments:

The following points are major to me. I suggest addressing them in a revised version of the Registered Report if you agree with my concerns and if I did not miss anything:

- One major point concerns the measure of generalizability: you ask researchers to estimate the “probability that a statistically significant focal effect (p < .05) in the hypothesized direction will be observed”. In the paper you define this as an “estimate [of] the probability that the expected effect will be observed”. If I understand correctly, this definition does not take the studies’ effect sizes into account. For example, if an effect is significant and in the same direction, but substantially smaller, you would still classify this outcome as generalizable. While I think this is a fair approach, it still merits some open discussion. If I am wrong, perhaps you can clarify your measure.

- On the point of effect sizes, it would be interesting to have some discussion of the effect sizes in your own study of researchers’ prediction accuracy. I wonder what we can learn from (very) small effects in your study. This is an important point, since your power analysis indicates more than 90% power to detect (very) small effects. If you find one, you should be able to determine whether the effect is relevant, so as not to be “overpowered” in your study. I wonder why you chose such high power; I was missing a discussion of this point.

- One rather general comment concerns the choice of the subject pool: your study is restricted to psychological researchers, i.e., experts. While this might be the most interesting pool, it restricts generalizability to other researchers and potentially focuses on a biased group. For example, such researchers might be overoptimistic regarding their own field or, alternatively, overly critical. Just to be clear, I am not suggesting that you conduct a wider study; I just think it would be nice to see a discussion of the pool choice and its implications. One such implication is generalizability, since your wording (e.g., “researchers”) could suggest a generalization that might be unjustified. The discussion of researcher groups and their replicability forecasts in Gordon et al. (2020) could be helpful.

- I was a bit confused by the country/region choice: what exactly does “region” refer to? How do you arrive at 15 countries, and why do you choose 10 of those 15? And why are prediction items presented in one of eight possible orders instead of being fully randomized? Similarly, why are you only including subsamples with 100 or more valid participants (page 14)? There must be a reason for this choice, but I did not find it. In the end, I am not sure I fully understood the details of data collection.

- The following statements from your registered report are too general to be understood by the reader:
“A single focal effect will be chosen from each study based on input from the proposing authors. The effect will be the result of a single inferential statistical test that answers a central research question from the project. Priority will be given to effects that are grounded in theory and supported by previous research.”
“Single page project descriptions will be generated for each study and approved by the proposing authors of the project as a quality check.”
I think it would be helpful to give more details on your procedure.

Here are some more minor points and I think they can be addressed in the main paper (i.e., after data collection), if you agree with them or consider them helpful:

- In general, it was not always easy for me to follow the manuscript. Some of my comments might be helpful for writing-up the paper:

o   I found the discussion of concepts interesting. However, I found it difficult to follow this discussion until the second half of page 4, where you define which concept you are focusing on (generalizability across cultural contexts). Before this point, I was missing a clear definition of the relevant concepts in the context of your study. For instance, on page 4 you state that generalizability refers not only to settings and samples, but also to methods and measures; in your study, however, you focus only on the former. Moreover, a common understanding is that replicability relates to statistical factors (such as sampling error), while generalizability relates to samples (e.g. participants, time period, cultural factors, etc.). This understanding might be too narrow, and the definitions are not clear. It would therefore be helpful to define the two concepts clearly in the context of your study from the very beginning, and perhaps to focus less on replicability. Another example can again be found on page 4: you discuss that generalizability can be related to methods or interpretation, without clearly separating the two concepts. It would help the reader to understand early on what you are focusing on in your study and which concepts are (ir)relevant.

o   You are asking participants to predict moderators. This part of the study came as a surprise, as moderators are first mentioned on page 14, and it was not entirely clear how it relates to your research questions.

o   In the main paper, you could introduce the Psychological Science Accelerator, since readers from other fields (such as myself) might not be familiar with it.

- Another minor point: Is the potential bias of predictions by desired results really motivated reasoning or not rather confirmation bias? Of course, the two concepts are closely related and might not even be clearly distinguishable.



Gordon, M., Bishop, M., Chen, Y., Dreber, A., Goldfedder, B., Holzmeister, F., Johannesson, M., Liu, Y., Tran, L., Twardy, C., Wang, J., & Pfeiffer, T. (2022). Forecasting the Publication and Citation Outcomes of Covid-19 Preprints. Royal Society Open Science, 9: 220440.

