
Chuan-Peng Hu
- School of Psychology, Nanjing Normal University, Nanjing, China
- Life Sciences, Social Sciences
- recommender
Recommendations: 0
Reviews: 3
Website: huchuanpeng.com
Areas of expertise
- Metascience and reproducibility
- Self-bias in information processing
- Evidence accumulation models
- Bayesian statistics
- Cognitive neuroscience
Language models accurately infer correlations between psychological items and scales from text alone
Using large language models to predict relationships among survey scales and items
Recommended by Matti Vuorre based on reviews by Hu Chuan-Peng, Johannes Breuer and Zak Hussain

How are the thousands of existing, and yet to be created, psychological measurement instruments related, and how reliable are they? Here, Hommel and Arslan (2025) trained a language model, SurveyBot3000, to provide answers to these questions efficiently and without human intervention.
In their Stage 1 submission, the authors described the training and pilot validation of a statistical model that takes psychological measurement items or scales as input and outputs the interrelationships among them, along with their reliabilities. The pilot results were promising: SurveyBot3000's predicted inter-scale correlations were strongly associated with empirical correlations from existing human data.
In this Stage 2 report, the authors further examined the model's performance and validity. In accordance with their Stage 1 plans, they collected new data from 450 demographically diverse participants and tested the model's performance fully out of sample. The model's predicted item-to-item correlations correlated at r = .59 with the corresponding item-to-item correlations from human participants; the scale-to-scale correlations were even more accurate at r = .83, indicating reasonable performance. Nevertheless, the authors remain justifiably cautious in their recommendation that the "synthetic estimates can guide an investigation but need to be followed up by human researchers with human data."
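The headline accuracy figures are plain Pearson correlations between two vectors of pairwise correlation estimates: one predicted by the model, one observed in human data. A minimal sketch of that computation, with purely illustrative numbers and a hypothetical helper name (this is not the authors' code):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative values only: model-predicted vs. empirically observed
# scale-to-scale correlations for a handful of scale pairs.
predicted = [0.42, -0.10, 0.75, 0.31, -0.55]
observed = [0.38, -0.05, 0.70, 0.40, -0.60]

# The kind of summary statistic behind the reported r = .83.
accuracy = pearson_r(predicted, observed)
```

In the actual validation, each vector would contain one entry per unique item pair (or scale pair), so the correlation summarizes agreement across the whole predicted correlation matrix.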
The authors documented all deviations between Stage 2 execution and Stage 1 plans in extensive online supplements. These supplements also addressed other potential issues, such as the potential for data leakage (finding the results in the training data) and robustness of results across different exclusion criteria.
The authors' proposed psychometric approach and tool, which is freely available as an online app, could prove valuable for researchers looking to use or adapt existing scales or items, or to develop new ones. More generally, these results add to the growing literature on human-AI research collaboration and highlight a practical application of these tools that remains novel to many researchers in the field. As such, this Stage 2 report and SurveyBot3000 promise to contribute positively to the field.
The Stage 2 report was evaluated by two reviewers who also reviewed the Stage 1 report, and a new expert in the field. On aggregate, the reviewers' comments were helpful but relatively minor; the authors improved their work in a resubmission, and the recommender judged accordingly that the manuscript met the Stage 2 criteria for recommendation.
URL to the preregistered Stage 1 protocol: https://osf.io/2c8hf
Level of bias control achieved: Level 6. No part of the data or evidence that was used to answer the research question was generated until after IPA.
List of eligible PCI RR-friendly journals:
- Advances in Methods and Practices in Psychological Science
- Collabra: Psychology
- International Review of Social Psychology
- Peer Community Journal
- PeerJ
- Personality Science
- Royal Society Open Science
- Social Psychological Bulletin
- Studia Psychologica
- Swiss Psychology Open
References
Hommel, B. E. & Arslan, R. C. (2025). Language models accurately infer correlations between psychological items and scales from text alone [Stage 2]. Acceptance of Version 4 by Peer Community in Registered Reports. https://doi.org/10.31234/osf.io/kjuce_v4
28 Apr 2025
STAGE 1

Language models accurately infer correlations between psychological items and scales from text alone
Using large language models to predict relationships among survey scales and items from text
Recommended by Matti Vuorre based on reviews by Hu Chuan-Peng, Johannes Breuer and 1 anonymous reviewer

How are the thousands of existing, and yet to be created, psychological measurement instruments related, and how reliable are they? Hommel and Arslan (2024) have trained a language model, SurveyBot3000, to provide answers to these questions efficiently and without human intervention.
In their Stage 1 submission, the authors describe the training and pilot validation of a statistical model that takes psychological measurement items or scales as input and outputs the interrelationships among them, along with their reliabilities. The pilot results are promising: SurveyBot3000's predicted inter-scale correlations were strongly associated with empirical correlations from existing human data.
The authors now plan a further examination of their model's performance and validity. They will collect new test data from a large number of participants and again test the model's performance fully out of sample. Reviewers deemed these plans and the associated analyses suitable. The anticipated results, along with the already existing pilot results, promise a very useful methodological innovation to aid researchers both in selecting and evaluating existing measures and in developing and testing new ones.
The Stage 1 submission went through two rounds of review by three reviewers, each with expertise in the area. All reviewers identified the initial submission as timely and important, and suggested mostly editorial improvements to the Stage 1 report. After two rounds of review, the relatively minor remaining suggestions can be taken into account during preparation of the Stage 2 report.
URL to the preregistered Stage 1 protocol: https://osf.io/2c8hf
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
- Advances in Methods and Practices in Psychological Science
- Collabra: Psychology
- International Review of Social Psychology
- Peer Community Journal
- PeerJ
- Personality Science
- Royal Society Open Science
- Social Psychological Bulletin
- Studia Psychologica
- Swiss Psychology Open
References
Hommel, B. E., & Arslan, R. C. (2024). Language models accurately infer correlations between psychological items and scales from text alone. In principle acceptance of Version 4 by Peer Community in Registered Reports. https://osf.io/2c8hf
26 Feb 2024
STAGE 1

Lure of choice revisited: Replication and extensions Registered Report of Bown et al. (2003)
Replicating the "lure of choice" phenomenon
Recommended by Patrick Savage based on reviews by Hu Chuan-Peng and Gakuto Chiba

The "lure of choice" refers to the idea that we prefer to preserve the option to choose even when the choice is not helpful. In a classic study cited hundreds of times, Bown et al. (2003) reported evidence for the lure of choice from a series of studies involving choices between competing options of night clubs, bank savings accounts, casino spinners, and the Monty Hall door choice paradigm. In all cases, participants tended to prefer an option when it was paired with a "lure", even when that lure was objectively inferior (e.g., same probability of winning but lower payoff).
The lure of choice phenomenon applies to a variety of real-life situations many of us often face in our daily lives, and Bown et al.’s findings have influenced the way organizations present choices to prospective users. Despite their theoretical and practical impact, Bown et al.'s findings have not previously been directly replicated, even as the importance of replication studies has become increasingly acknowledged (Nosek et al., 2022).
Here, Chan & Feldman (2024) outline a close replication of Bown et al. (2003) that will replicate and extend their original design. By unifying Bown et al.'s multiple studies into a single paradigm with which they will collect data from approximately 1,000 online participants via Prolific, they will have substantially greater statistical power than the original study to detect the predicted effects. They will follow LeBel et al.'s (2019) criteria for evaluating replicability, classifying the outcome according to how many of the four scenarios show a signal in the same direction as Bown et al.'s original results (at least 3 of 4 scenarios = successful replication; no scenarios = failed replication; 1 or 2 scenarios = mixed results replication). They have also added additional controls, including a neutral baseline choice without a lure, further ensuring the validity and interpretability of their eventual findings.
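The outcome criteria above amount to a simple threshold rule mapping the count of consistent scenarios to a verdict. A minimal sketch, with a hypothetical function name (this is not the authors' analysis code):

```python
def classify_replication(n_consistent_scenarios):
    """Map the number of scenarios (out of 4) showing a signal in the
    same direction as the original results to a replication verdict,
    following the LeBel et al. (2019)-style rule described above."""
    if n_consistent_scenarios >= 3:
        return "successful replication"
    if n_consistent_scenarios == 0:
        return "failed replication"
    return "mixed results replication"  # 1 or 2 consistent scenarios
```

For example, a signal in the original direction in three of the four scenarios would count as a successful replication under this rule.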
One of the goals in creating Peer Community In Registered Reports (PCI RR) was to increase the availability of publishing venues for replication studies, and so PCI RR is well-suited to the proposed replication. Feldman’s lab has also pioneered the use of PCI RR for direct replications of previous studies (e.g., Zhu & Feldman, 2023), and the current submission uses an open-access template he developed (Feldman, 2023). This experience combined with PCI RR’s efficient scheduled review model meant that the current full Stage 1 protocol was able to go from initial submission, receive detailed peer review by two experts, and receive in-principle acceptance (IPA) for the revised submission, all in less than one month.
URL to the preregistered Stage 1 protocol: https://osf.io/8ug9m
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
References
Bown, N. J., Read, D. & Summers, B. (2003). The lure of choice. Journal of Behavioral Decision Making, 16(4), 297–308. https://doi.org/10.1002/bdm.447
Chan, A. N. Y. & Feldman, G. (2024). The lure of choice revisited: Replication and extensions Registered Report of Bown et al. (2003) [Stage 1]. In principle acceptance of Version 2 by Peer Community In Registered Reports. https://osf.io/8ug9m
Feldman, G. (2023). Registered Report Stage 1 manuscript template. https://doi.org/10.17605/OSF.IO/YQXTP
LeBel, E. P., Vanpaemel, W., Cheung, I. & Campbell, L. (2019). A brief guide to evaluate replications. Meta-Psychology, 3. https://doi.org/10.15626/MP.2018.843
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73(1), 719-748. https://doi.org/10.1146/annurev-psych-020821-114157
Zhu, M. & Feldman, G. (2023). Revisiting the links between numeracy and decision making: Replication Registered Report of Peters et al. (2006) with an extension examining confidence. Collabra: Psychology, 9(1). https://doi.org/10.1525/collabra.77608