This Stage 1 Registered Report proposes an ambitious programme of research, utilising systematic review in Study 1a to draw together different indices of social connection, and categorise them into three previously proposed domains: structural, functional, and qualitative. Then in Study 1b, it is proposed that an item content analysis will be undertaken on these measurements to categorise them into different sub-domains, and subsequently assess the extent to which there is overlap in content of measures that have been used to measure social connection. Then, in Study 2a the systematic review process will be extended to extract data from studies that have utilised the measurements reviewed in Study 1. In Study 2b, the extracted data will be evaluated using the COSMIN taxonomy to assess measurement properties, and whether measurement invariance has been established between countries and populations studied in subsequent research.
I thought the Registered Report was really well written and thought out, and it sounds like a really exciting piece of research. This is an extremely ambitious piece of work that I think has the potential to make a major contribution to improving measurement in this area. Although I am not a specialist on social connectedness, my own experience with population-wide data where measures of social connection have been collected highlights this as a glaring problem that can easily hold back our understanding in a number of directions (e.g. cross-national comparisons, use of poorly validated or invalid measures to draw fragile or biased conclusions).
That being said, I did have some specific comments that I would like the authors to consider, making changes as necessary. I had some methodological comments where there is scope for minor amendments to strengthen the approach. Also, while I thought the manuscript was very well written, some of the research questions (RQ3 and RQ4) didn't seem to be strongly represented in the RR itself, and I felt this could be revised. These are mostly pretty minor to be honest though, and are included below in the order of presentation in the manuscript:
- RQ3 and RQ4: I didn't think these really came out in the Stage 1 Report as being key aims of the research. I thought the area where this was most clearly referred to was in the abstract. Reading through the report without reference to the RQ table, my impression would be that the results will be reporting the country/population of study (i.e. to represent coverage in the literature), rather than assessing whether the application of these measures to other contexts is meaningful (i.e. through the use of measurement invariance). The paragraph from lines 190-203 makes the case strongly for the importance of testing these questions, but I thought the end of the paragraph from lines 204-212 ought to make it clearer that the aim of this exercise is to assess whether these generalizations are defensible. Similarly, I would recommend re-working Section 2 a bit, ideally with a specific sub-heading in the methods for 2b highlighting that this is a specific set of analyses, and how these will be reported in the results, with reference to the proposed findings on whether the measures have been validated in the countries/populations where they have been applied.
- My main concern for Study 1 relates to the justification for the structural indicators searches. I completely understand that parsing through 400,000+ results is neither feasible nor an effective use of time. However, the use of a random subsample has potential drawbacks. First, I have reservations that the variety of different types of structural indicator would be captured by a random sample of a similar number of results as the number of functional and qualitative indicators. Given the information the authors have presented, my impression is that there will be much greater heterogeneity among structural indicators relative to the functional and qualitative ones. Second, given the issues reported in the Stage 1 submission so far, it seems fair to expect the results to be far noisier. I wondered whether it might be preferable to stratify the sample to capture a subset of the most relevant results and a random sample (sorted by time), but am also conscious this has its own drawbacks. I would appreciate the authors' thoughts on this, and some additional justification of the sampling approach in revising the methods section.
- The justification for the 60% agreement threshold in the item content analysis raises questions. Again, I understand that the potential range and heterogeneity of measures makes this difficult. I think some additional justification of this criterion would be useful, with reference to specific studies where this has been a problem.
- Study 2a: I agree with the overall approach for the systematic review, and the searches are specifically defined to identify appropriate studies. The only concern I had was that the search strategy relies on the studies clearly flagging this, which in my own experience of gathering data to examine a scale or scope the literature isn't guaranteed.
I would like the authors to consider whether there would be value in including select forward citation searches of key papers relating to the scales identified (i.e. initial validation papers), to ensure any relevant studies aren't missed. Otherwise, I agree with not conducting further reviews in the structural indicators domain given the use of single item scales. If validated measures do come up from that search, though, the use of forward citation searches may be a reasonable adjustment to ensure these studies are properly captured.
- Study 2a: Reading through this, I wondered whether it would be worth specifically recording whether a non-standard or modified use of a measurement was applied, as a variable in the template for extracting sample characteristics. I really liked the use of the COSMIN taxonomy to systematize the quality of the measurements, and think it is a particular strength of evaluating the measurement properties of the scales to be examined. However, I'm also conscious COSMIN doesn't capture some questionable measurement practices that are important in qualifying the use of many measurements, especially where the inconsistent use of measures is a key problem. From my own experience of scoping across a large literature, I find that I quickly begin encountering studies where existing scales have been modified (e.g. different response scales, subsets of questions), and that some scales are more susceptible to this than others (e.g. depending on the length of the questionnaire, or the use of many or very few response options). When thinking about the Stage 2 discussion, recording this might also reinforce some of the evaluation of these measures.
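To make the sampling suggestion for the structural indicators searches a little more concrete, here is a minimal sketch of one way it could be read: keep the top-ranked results by relevance, then draw a random sample from the remainder, spread across publication years. This is only an illustration and not the authors' procedure. The field names `relevance_rank` and `year`, the function name, and the proportional allocation rule are all assumptions of mine.

```python
import random
from collections import defaultdict


def stratified_sample(records, top_k, n_random, seed=0):
    """Split search results into a top-relevance stratum and a random,
    year-stratified stratum.

    `records` is a list of dicts with hypothetical keys
    'relevance_rank' (1 = most relevant) and 'year'.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Stratum 1: the top_k most relevant results.
    ranked = sorted(records, key=lambda r: r["relevance_rank"])
    top = ranked[:top_k]
    rest = ranked[top_k:]

    # Stratum 2: group the remaining results by publication year.
    by_year = defaultdict(list)
    for r in rest:
        by_year[r["year"]].append(r)

    sampled = []
    for year in sorted(by_year):
        group = by_year[year]
        # Proportional allocation with at least one result per year.
        # Because of rounding, the total may differ slightly from n_random.
        quota = max(1, round(n_random * len(group) / len(rest)))
        sampled.extend(rng.sample(group, min(quota, len(group))))

    return top, sampled
```

Fixing the seed and reporting it alongside `top_k` and `n_random` would let a reader reproduce exactly which results were screened. The obvious drawback, as noted above, is that the relevance ranking is itself defined by the search platform.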
In terms of PCI:RR's review criteria:
- 1A. Scientific validity of the research question: The scientific validity of the research questions is clear and obvious. This is an area where there is a clear need to understand and improve measurement practices, and the authors take a rigorous approach to understanding and evaluating the problems at hand.
- 1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable: Not directly relevant as the RR does not propose hypotheses.
- 1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable): Generally this was very well thought out. The OSF has detailed instructions regarding the literature search, coding of the studies and how that will segue into the formal analysis. I have included some minor comments on the methodological approach.
- 1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses: Yes. The authors are extremely clear with the reporting of their literature searches.
- 1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s): Not directly relevant, as the findings are not being compared between conditions.