BREUER Johannes's profile

  • Computational Social Science, GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
  • Social sciences

Recommendations:  0

Reviews:  2

Areas of expertise
Use and effects of digital media; digital trace data; computational social science

Reviews:  2

Yesterday
STAGE 2

Language models accurately infer correlations between psychological items and scales from text alone

Using large language models to predict relationships among survey scales and items

Recommended based on reviews by Hu Chuan-Peng, Johannes Breuer and Zak Hussain
How are the thousands of existing, and yet to be created, psychological measurement instruments related, and how reliable are they? Here, Hommel and Arslan (2025) trained a language model--SurveyBot3000--to provide answers to these questions efficiently and without human intervention.
 
In their Stage 1 submission, the authors described the training and pilot validation of a statistical model whose inputs are psychological measurement items or scales, and whose outputs are the interrelationships between the items and scales, and their reliabilities. The pilot results were promising: SurveyBot3000's predicted inter-scale correlations were strongly associated with empirical correlations from existing human data.
 
In this Stage 2 report, the authors further examined the model's performance and validity. In accordance with their Stage 1 plans, they collected new data from 450 demographically diverse participants, and tested the model's performance fully out of sample. The model's item-to-item correlations correlated at r=.59 with corresponding item-to-item correlations from human participants. The scale-to-scale correlations were even more accurate at r=.83, indicating reasonable performance. Nevertheless, the authors remain justifiably cautious in their recommendation that the "synthetic estimates can guide an investigation but need to be followed up by human researchers with human data."
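
The accuracy figures above compare two sets of correlations: those predicted by the model and those observed in human data. A minimal sketch of that kind of comparison follows. It is an illustration only, not the authors' analysis code: the function names and toy matrices are hypothetical, and the manuscript's actual procedure may differ (for example in how sampling error is handled).

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def synthetic_accuracy(synthetic, empirical):
    """Correlate predicted correlations with empirical ones.

    Each unique item pair (or scale pair) contributes one data point;
    the diagonal is skipped because it is 1 by definition.
    """
    return pearson(upper_triangle(synthetic), upper_triangle(empirical))


# Hypothetical toy data: three items, made-up correlation matrices.
synthetic = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
empirical = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]

accuracy = synthetic_accuracy(synthetic, empirical)
print(f"accuracy r = {accuracy:.2f}")
```

Only the unique off-diagonal pairs enter the comparison, so each item pair is counted once and the trivial diagonal does not inflate the estimate.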
 
The authors documented all deviations between Stage 2 execution and Stage 1 plans in extensive online supplements. These supplements also addressed other potential issues, such as data leakage (the results being present in the training data) and the robustness of results across different exclusion criteria.
 
The authors' proposed psychometric approach and tool, which is freely available as an online app, could prove valuable for researchers looking to use or adapt existing scales or items, or to develop new ones. More generally, these results add to the growing literature on human-AI research collaboration and highlight a practical application of tools that remain novel to many researchers in the field. As such, this Stage 2 report and SurveyBot3000 promise to contribute positively to the field.
 
The Stage 2 report was evaluated by two reviewers who also reviewed the Stage 1 report, and a new expert in the field. On aggregate, the reviewers' comments were helpful but relatively minor; the authors improved their work in a resubmission, and the recommender judged accordingly that the manuscript met the Stage 2 criteria for recommendation.
 
URL to the preregistered Stage 1 protocol: https://osf.io/2c8hf
 
Level of bias control achieved: Level 6. No part of the data or evidence that was used to answer the research question was generated until after IPA.
 
References
 
Hommel, B. E., & Arslan, R. C. (2025). Language models accurately infer correlations between psychological items and scales from text alone [Stage 2]. Acceptance of Version 4 by Peer Community in Registered Reports. https://doi.org/10.31234/osf.io/kjuce_v4
 
Yesterday
STAGE 1

Language models accurately infer correlations between psychological items and scales from text alone

Using large language models to predict relationships among survey scales and items from text

Recommended based on reviews by Hu Chuan-Peng, Johannes Breuer and 1 anonymous reviewer
How are the thousands of existing, and yet to be created, psychological measurement instruments related, and how reliable are they? Hommel and Arslan (2024) have trained a language model--SurveyBot3000--to provide answers to these questions efficiently and without human intervention.
 
In their Stage 1 submission, the authors describe the training and pilot validation of a statistical model whose inputs are psychological measurement items or scales, and whose outputs are the interrelationships between the items and scales, and their reliabilities. The pilot results are promising: SurveyBot3000's predicted inter-scale correlations were extremely strongly associated with empirical correlations from existing human data.
 
The authors now plan a further examination of their model's performance and validity. They will collect novel test data across a large number of subjects, and again test the model's performance fully out of sample. Reviewers deemed these plans, and their associated planned analyses, suitable. The anticipated results--along with already existing pilot results--promise a very useful methodological innovation to aid researchers in both selecting and evaluating existing measures, and developing and testing new measures.
 
The Stage 1 submission was reviewed twice by three reviewers, each with expertise in the area. All reviewers identified the initial submission as timely and important, and suggested mostly editorial improvements to the Stage 1 report. After two rounds of review, the relatively minor remaining suggestions can be taken into account during preparation of the Stage 2 report.
 
URL to the preregistered Stage 1 protocol: https://osf.io/2c8hf
 
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
 
References
 
Hommel, B. E., & Arslan, R. C. (2024). Language models accurately infer correlations between psychological items and scales from text alone. In principle acceptance of Version 4 by Peer Community in Registered Reports. https://osf.io/2c8hf