Improving the measurement of social connection

Based on reviews by Jacek Buczny, Richard James and Alexander Wilson
A recommendation of:

A systematic review of social connection inventories

Submission: posted 09 July 2023
Recommendation: posted 18 January 2024, validated 19 January 2024
Cite this recommendation as:
Bishop, D. (2024) Improving the measurement of social connection. Peer Community in Registered Reports.


This is an ambitious systematic review that uses a combination of quantitative and qualitative methods to make the measurement of the construct of social connection more rigorous. Social connection is a heterogeneous construct that includes aspects of structure, function and quality. Here, Paris et al. (2024) will use predefined methods to create a database of social connection measures, and will assess heterogeneity of items using human coders and ChatGPT. This database will form the basis of a second systematic review which will look at evidence for validity and measurement properties. This study will also look at the population groups and country of origin for which different measures were designed, making it possible to see how far culturally specific issues affect the content of measures in this domain.
The questions asked by this study are exploratory and descriptive and so the importance of pre-registration is in achieving clear criteria for how each question is addressed, rather than evidential criteria for hypothesis-testing.
The authors responded comprehensively to three reviewer reports. This study will provide a wealth of useful information for those studying social connection, and should serve to make the literature in this field more psychometrically robust and less fragmented.
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 3. At least some data/evidence that will be used to answer the research question has been previously accessed by the authors (e.g. downloaded or otherwise received), but the authors certify that they have not yet observed ANY part of the data/evidence.
List of eligible PCI RR-friendly journals:

1. Paris, B., Brickau, D., Stoianova, T., Luhmann, M., Mikton, C., Holt-Lunstad, J., Maes, A., & IJzerman, H. (2024). A systematic review of social connection inventories. In principle acceptance of Version 3 by Peer Community in Registered Reports.

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviewed by , 16 Jan 2024

The authors have done an extremely thorough job of responding to these comments. The revised manuscript is extremely comprehensive, and has been thoughtfully revised in a manner that substantially improves the Stage 1 Report. Overall, I am happy for this to go forward to Stage 2, with potential for minor changes but nothing that requires fundamental revisions.

The only point I wanted to explain further was with regard to the forward searching. My concern here was that the approach taken assumes that (1) studies using these inventories will be captured within the search terms and (2) focused searches with the measure name and methodological/measurement terms (which I thought were comprehensive) will pick out the relevant studies reporting validity. The search terms included for study 2 are thorough, and from initial searches in study 1 the principal search terms produced several hundred thousand results. As such I think the first assumption is fair. However, I am less certain of the second, which assumes these details are reliably captured in the bibliographic details of existing studies. Having considered it further, I also think this issue is likely to vary by domain, as for some properties (e.g. cross-cultural validity, measurement invariance, structural validity) this is often the focal point of a published article and so will be flagged prominently, whereas for others (e.g. criterion validity, internal consistency) it will not. The suggestion for the forward searches was mainly with the view of confirming the sensitivity of the search terms to pick up relevant articles assessing the validity of these domains. While I agree with the authors' hypothesis that these measures will perform poorly on the COSMIN taxonomy domains, it might be the case that relevant data are systematically missing because of deficiencies in the reporting of these statistics. However, I am also conscious that, depending on the number of inventories identified, this might entail substantial work. As such, I think this can be left to the authors' discretion, perhaps either testing a sample of measures to check whether this is redundant or not, or discussing the issue as a potential limitation in the Stage 2 RR.

One thought on the redundancy of the search terms: for the reporting of the stage 2 RR, it would be possible to quantify the number of unique entries by extracting the DOIs and titles (for articles without DOIs).
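The deduplication count suggested above could be sketched roughly as follows. This is an illustrative Python snippet only, assuming each search export is a list of records with `doi` and `title` fields; the field names and data are hypothetical, not taken from the authors' protocol.

```python
import re

def dedupe_records(records):
    """Keep the first occurrence of each unique record.

    A record is identified by its DOI when present; otherwise by a
    lowercase, punctuation-insensitive version of its title.
    """
    seen = set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            key = ("doi", doi)
        else:
            title = re.sub(r"[^a-z0-9]+", " ", (rec.get("title") or "").lower()).strip()
            key = ("title", title)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical search hits: two DOI duplicates, two title duplicates.
hits = [
    {"doi": "10.1000/xyz", "title": "Scale A validation"},
    {"doi": "10.1000/XYZ", "title": "Scale A validation (reprint)"},
    {"doi": "", "title": "Scale B: A New Measure"},
    {"doi": None, "title": "Scale B - a new measure"},
]
print(len(dedupe_records(hits)))  # 2
```

Title-based matching is of course fuzzier than DOI matching, so in practice the title fallback would only give an approximate count of unique entries.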

Minor comment: H2 on the design table - I'd revise from "will show insufficient evidence of great measurement properties" to "show evidence of insufficient measurement validity". 

Otherwise, I look forward to seeing how this comes out in the Stage 2 Report!

Reviewed by , 12 Dec 2023

I enjoyed reading the authors' response to the reviewers' comments. I was impressed by the level of detail and felt that the authors addressed the reviewers' queries very well. I was particularly pleased that the authors gave careful consideration to refining the methodology of study 1 (item coding) as this was where my main concerns lay. I hope the changes make the coding more robust.

Good luck with carrying out the review. I look forward to reading the results.

Alex Wilson

Evaluation round #1

DOI or URL of the report:

Version of the report: 1

Author's Reply, 29 Nov 2023

Decision by , posted 26 Sep 2023, validated 26 Sep 2023

Dear Dr IJzerman and colleagues

Thank you for your patience in waiting for a decision on your manuscript. I now have three thorough reviews.  My impression is that, though the reviews are detailed, they do not raise any issues that you won't be able to deal with.  They are mostly concerned with clarification rather than suggestions for major methodological changes.

As someone who is not familiar with this domain of assessment (other than having recently completed a questionnaire on social connection as a Biobank participant!) I have one question, which relates to the way in which the 3 different aspects of social connection are interpreted. Specifically, I wondered whether the structural measures are regarded as a kind of context against which function and quality would be evaluated. For instance, opportunities for social connection, and a mismatch between those opportunities and reality, might be rather different for someone who is employed vs retired, or for someone who is single by choice rather than widowed or divorced. In other words, do researchers in this area use the structural measures to stratify samples when considering the impact of function and quality? This is an idle thought prompted by my curiosity from being on the receiving end of a questionnaire, and perhaps a distraction from your main aims, so feel free to ignore if this isn't something that can be usefully addressed here.

The paper is different from the kinds of registered report I usually see, which tend to be empirical, hypothesis-testing studies, but I can see the value of pre-registering the methods for a complex piece of work like this. It will give the study more authority by demonstrating that decisions were made according to rigorous predetermined criteria, rather than ad hoc. It was very useful to have your scripts and mock-up data available to clarify any questions about the analysis, and to give confidence that the analyses can be conducted in a timely fashion.

I would encourage you to submit a revised manuscript that addresses the comments of reviewers, and look forward to seeing it in due course.

Reviewed by , 05 Sep 2023

This Stage 1 Registered Report proposes an ambitious programme of research, utilising systematic review in Study 1a to draw together different indices of social connection, and categorise them into three previously proposed domains: structural, functional, and qualitative. Then in Study 1b, it is proposed that an item content analysis will be undertaken on these measurements to categorise them into different sub-domains, and subsequently assess the extent to which there is overlap in content of measures that have been used to measure social connection. Then, in Study 2a the systematic review process will be extended to extract data from studies that have utilised the measurements reviewed in Study 1. In Study 2b, the extracted data will be evaluated using the COSMIN taxonomy to assess measurement properties, and whether measurement invariance has been established between countries and populations studied in subsequent research.

I thought the Registered Report was really well written and thought out, and it sounds like a really exciting piece of research. This is an extremely ambitious piece of work that I think has the potential to make a major contribution to improving measurement in this area. Although I am not a specialist on social connectedness, my own experience with population-wide data where measures of social connection have been collected highlights this as a glaring problem that can easily prevent the development of our understanding in a number of directions (e.g. cross-national comparisons, use of poorly validated or invalid measures to draw fragile or biased conclusions).

That being said, I did have some specific comments that it would be good to get the authors' consideration on and make changes as necessary. I had some methodological comments where there is scope for making minor amendments to strengthen the approach. Also, while I thought the manuscript was very well written, some of the research questions (RQ 3 and 4) didn't seem to be strongly represented in the RR itself, and this is an area where I felt this could be revised. These are mostly pretty minor to be honest though, and are included below in the order of presentation in the manuscript:

- RQ3 and RQ4: I didn't think these really came out in the Stage 1 Report as being key aims of the research; the area where they were most clearly referred to was the abstract. Reading through the report without reference to the RQ table, my impression would be that the results will report the country/population of study (i.e. to represent coverage in the literature), rather than whether the application of these measures to other contexts is meaningful (i.e. through use of measurement invariance). The paragraph from lines 190-203 makes the case strongly for the importance of testing these questions, but I thought the end of the paragraph, from lines 204-212, ought to make it clearer that the aim of this exercise is to assess whether these generalizations are defendable. Similarly, I would recommend re-working Section 2 a bit, ideally with a specific sub-heading in the methods for 2b highlighting that this is a specific set of analyses, and how these will be reported in the results, with reference to the proposed findings on whether the measures have been validated in the countries/populations where they have been applied.

- My main concern for Study 1 relates to the justification for the structural indicators searches. I completely understand that parsing through 400,000+ results is not feasible or an effective use of time. However, the use of a random subsample has potential drawbacks. Specifically, I have reservations about whether the variety of different types of structural indicator would be captured by a random sample of a similar number of results as the number of functional and qualitative indicators. Given the information the authors have presented, my impression is that there will be much greater heterogeneity among structural indicators relative to the functional and qualitative ones. Second, given the issues reported in the Stage 1 submission so far, it seems fair to expect the results to be far noisier. I wondered whether it might be preferable to stratify the sample to capture a subset of the most relevant results and a random sample (sorted by time), but am also conscious this has its own drawbacks. I would appreciate the authors' thoughts on this, and some additional justification of the sampling approach in revising the methods section.
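For illustration only, the stratified alternative floated above could look roughly like this: keep a top-ranked stratum and add a seeded random sample of the remainder, sorted by time. The stratum sizes, field names, and data are assumptions for the sketch, not the authors' plan.

```python
import random

def stratified_subsample(results, n_top, n_random, seed=2023):
    """results: list of hits already sorted by search-engine relevance."""
    top = results[:n_top]                 # most relevant stratum, kept in full
    rest = results[n_top:]
    rng = random.Random(seed)             # fixed seed for reproducibility
    sampled = rng.sample(rest, min(n_random, len(rest)))
    sampled.sort(key=lambda r: r["year"])  # order the random stratum by time
    return top + sampled

# Hypothetical relevance-ranked hits with publication years.
hits = [{"id": i, "year": 2000 + i % 24} for i in range(1000)]
subset = stratified_subsample(hits, n_top=100, n_random=200)
print(len(subset))  # 300
```

The trade-off the reviewer notes still applies: the "most relevant" stratum inherits whatever biases the search engine's ranking has, so the random stratum is what preserves coverage of rarer indicator types.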

- The justification for 60% agreement on the item content analysis raises questions. Again, I understand how, given the potential range and heterogeneity of measures, this would be difficult. I think some additional justification of this criterion would be useful, with reference to specific studies where this has been a problem.

- Study 2a: I agree with the overall approach for the systematic review, and the searches are specifically defined to identify appropriate studies. The only concern I had was that the search strategy relies on the studies clearly flagging this, which, in my own experience of gathering data to examine a scale or scope the literature, isn't guaranteed.

I would like the authors to consider whether there would be value in including select forward citation searches of key papers relating to the scales identified (i.e. initial validation papers), to ensure any relevant studies aren't missed. Otherwise, I agree with not conducting further reviews in the structural indicators domain, given the use of single-item scales. If validated measures do come up from that search, though, a forward citation search may be a reasonable adjustment to ensure these studies are properly captured.

- Study 2a: Reading through this, I wondered whether it would be worth specifically recording whether a non-standard or modified use of a measurement was applied as a variable in the template for extracting sample characteristics. I really liked the use of the COSMIN taxonomy to systematize the quality of the measurements, and think it is a particular strength of evaluating the measurement properties of the scales to be examined. However, I'm also conscious COSMIN doesn't capture some questionable measurement practices that are important in qualifying the use of many measurements, especially where the inconsistent use of measures is a key problem. From my own experience of scoping across a large literature, I find that I quickly begin encountering studies where existing scales have been modified (i.e. different response scales, subsets of questions), and that some scales are more susceptible to it than others (e.g. length of questionnaire, use of many or very few response options). When thinking about the Stage 2 discussion, this might also reinforce some of the evaluation of these measures.

In terms of PCI:RR's review criteria:

- 1A: Scientific validity of the research question: The scientific validity of the research questions is clear and obvious. This is an area where there is a clear need to understand and improve measurement practices, and the authors take a rigorous approach to understanding and evaluating the problems at hand.

- 1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable: Not directly relevant as the RR does not propose hypotheses.

- 1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable): Generally this was very well thought out. The OSF has detailed instructions regarding the literature search, coding of the studies and how that will segue into the formal analysis. I have included some minor comments on the methodological approach.

- 1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses: Yes. The authors are extremely clear with the reporting of their literature searches. 

- 1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s): Not directly relevant, as the findings are not being compared between conditions. 

Reviewed by , 25 Sep 2023

Dear Authors,

This RR is a very interesting project. There are multiple psychological constructs that are operationalized in diverse ways, and almost everyone would admit that this makes it very hard to reproduce and replicate psychological studies. I want to provide a few suggestions in my review, hoping you will find them useful.

First, the research questions/hypotheses seem well-rooted in theory.

Secondly, the analytical techniques correspond well with the research questions and can provide an adequate hypothesis test. The use of COSMIN is a very good idea; however, to me, it is not clear how the tool will be used. Of course, conducting an evaluation based on COSMIN is not a difficult task, but for reproducibility, more details on how you want to use it would be welcome.

Thirdly, you mention PRISMA for the first time in the results section of Study 2b. Why at this stage? PRISMA is a general framework used for conducting systematic reviews, so I would expect it to be mentioned in the overview of Study 2a. Instead, you want to use the COSMIN guidelines only. What was the particular reason not to use PRISMA to design the study? In addition, on p. 25, you write that the analysis code can be found at – unfortunately, I could not find it. A similar comment applies to this:

Fourthly, I wonder why you have not considered applications of the Social Relations Model (SRM) as an important source of instruments. The SRM posits that variance in social perception/evaluation/traits can be partitioned into various components: target variance, perceiver variance, relationship variance, and error variance, and such information can be collected by implementing a round-robin design. I can imagine that specific aspects of social connection (e.g., social support, responsiveness, quality) can be attributed to each type of variance and depend on each other. This gap is a bit puzzling, as a thorough understanding of social relationships should account for such specific components.
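For readers unfamiliar with it, the round-robin decomposition underlying the SRM is commonly written in textbook form as:

```latex
X_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij}
```

where $X_{ij}$ is person $i$'s rating of person $j$, $\mu$ is the group mean, $\alpha_i$ is the perceiver (actor) effect, $\beta_j$ the target (partner) effect, $\gamma_{ij}$ the relationship effect specific to the dyad, and $\varepsilon_{ij}$ error; the variance components the reviewer lists correspond to the variances of these terms.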

Fifthly, maybe I overlooked it while reading, but it is unclear whether you will evaluate the quality of the theory used to create a specific measurement. I can imagine a scenario in which a good measure is created (reliable, valid, invariant across genders and cultures), but the background theory is rather weak.

Despite the critical comments, I find the protocol clear and well-prepared for implementation. Before recommending the current version of the document, I would like to know what the other reviewers indicated and what your reaction to my comments is.

All the best,
Jacek Buczny

Reviewed by , 10 Sep 2023
