DOI or URL of the report: https://osf.io/3zhna/
Version of the report: 3
Thank you for the revised version of this Stage 1 RR, and your careful responses to reviewer comments. I am sufficiently satisfied with these that I do not think it necessary to involve the reviewers again at this point. However, there are still some minor issues to be sorted out fully before IPA is issued for this study, and I list these below.
1) There is still ambiguity about how the outcomes of the different hypothesis tests will be combined to inform conclusions with respect to the overarching theoretical question (LPP vs HPP). What follows may seem like nitpicking, but it is crucially important to be clear up front about how your conclusions will follow from your results.
At line 145 you state the overall rationale for the study as follows:
“If prior expectations are weaker in VR (LPP account), the magnitude of both the SWI and the MWI will be smaller in VR compared to the real world (see hypotheses H1A and H1B in table of questions). Additionally, the difference in peak grip force and load force rates between small and large objects (SWI), or more and less dense-looking objects (MWI), will be smaller in VR than in the real world (see hypotheses H2A and H2B in table of questions).”
What you seem to state here is that H1a AND H1b AND H2a AND H2b all test the same hypothesis (and the final column of your design table implies the same). If this is the case, then the implication would be that they must all have significant outcomes (in the same direction) for the hypothesis (LPP or HPP) to be supported. If so, then the hypothesis would be confirmed by the congruent conjunction of all four outcomes, and not by any other set of outcomes. This may or may not be what you mean to state, but it is what I would infer from what is written.
If this is not what you mean to state, then you will need to specify how you would interpret a set of results that provided partial support for one hypothesis (i.e. that had some outcomes significant but not others), or where different outcomes supported different hypotheses (LPP vs HPP). At present, it is not clear how you will draw conclusions across the pattern of results. This arises because all hypothesis tests are related back to the same global hypotheses.
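To illustrate the ambiguity: under a strict conjunctive reading, the outcome pattern {H1a significant, H1b significant, H2a significant, H2b non-significant} would license no conclusion about LPP vs HPP, whereas a more permissive reading might count it as partial support for LPP. Whichever interpretation you intend, it should be stated before data collection.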
2) This uncertain status is compounded by the design table, in which you state:
“NOTE: these hypotheses for the SWI and MWI tasks are being treated as individual hypotheses that are related to the same question, rather than employing a disjunctive or conjunctive logic (Rubin, 2021).”
There is some linguistic ambiguity here. Either they are individual (i.e. separable) hypotheses, in which case they lead to conclusions on separate questions, or they relate to the same question (in which case they are two tests of the same hypothesis). The word ‘relate’ may be intended to mean only that there is a thematic relationship between the separable questions (e.g. LPP vs HPP for SWI; LPP vs HPP for MWI), but you need to be clear about whether the tests can lead to separate conclusions or not.
3) In the design table, in several places, you make statements about the conclusions that will follow from non-significant outcomes, e.g. “No statistically significant difference between conditions would indicate no difference in strength of prior expectations.”; “No statistically significant difference in pGFRdiff scores would indicate no difference in strength of prior expectations.”; and so on. In NHST, a failure to find a significant difference does not allow one to accept the null hypothesis (only to fail to reject it), and so you should remove these statements.
4) You are very clear that the main hypothesis tests will not be meaningful if manipulation checks are not passed. I would advise you to state which checks apply to which tests (e.g. failing a manipulation check for the MWI would presumably not stop you testing hypotheses relating to the SWI). Also, if a manipulation check is failed, so that a hypothesis test is deemed uninformative, does this mean that you will not run that hypothesis test? If so, state this. If you will still run it, then state why (given that it will be uninformative).
5) In the Abstract, you have the statement: “hypothesis posits increased reliance on predictions relative to current sensory information due to sensory uncertainty.” I think you should probably delete the “due to sensory uncertainty”, which wrongly implies that the cause of a relative change in weighting could be known in this experiment.
6) “windsorised” >> “winsorised”
DOI or URL of the report: https://osf.io/3zhna/
Version of the report: 2
Thank you for submitting this revised Stage 1 plan, which has now received external review from two relevant experts. I think that their reviews are very helpful, both at a theoretical and practical level, and that consideration of these comments should help you clarify the Stage 1 plan.
To my reading (although I could be wrong), the reviewers make very similar points regarding the framing of your hypotheses, which mean that - at least with the present design - your experiment is only able to inform about the relative influence of priors vs sensory information. You should clarify what conclusions your hypothesis tests can and cannot support, and perhaps re-name the hypotheses if this will help to avoid confusion.
Also, Reviewer#1 was somewhat confused by the status of the exploratory questions, which seems to have resulted from your removing these from the manuscript (as suggested at the previous round), but keeping the detailed design table as supplementary material. I understand the desire to keep these design components online, but it will be much clearer to pre-register only the primary hypothesis tests, and to save exploratory investigations until Stage 2 (keeping the design table for exploratory parts as supplemental material occupies an uneasy half-way house).
I hope that the reviewers' thoughtful comments will be helpful for you in fine-tuning your plans. If you choose to address these comments in a revision, then you should include a responses document that specifies how you have addressed each comment.
Best wishes,
Rob
The plan is for an interesting study comparing reaching behaviour in real vs virtual environments. A number of analyses are carefully planned and described. I am just unsure of the rationale for designating most of these as “exploratory” despite the high level of detail provided – see below.
14-43. VR vs real environments. Another notable issue is that VR settings vary in the nature and quality of online visual feedback. People may either not be able to see the hand they are reaching with, or it may be tracked and represented inaccurately - this affects studies in which online control of movements is involved, and is a potential issue here.
44-75. Opposite predictions for priors. The two possibilities of (1) reduced use of priors and (2) reduced sensory info are not mutually exclusive – they could both occur. Really, you are asking about the final balance: do the potential changes in these two factors, taken together, shift people towards using the prior more, or less? It makes sense that these two outcomes could come about because of various combinations of changes in the prior and/or the sensory info (likelihood). However, it’s over-simplistic to suggest that either only one changes (reduces), or only the other does.
I would instead suggest hypothesising either a shift towards using the prior less (expected if the prior is weakened more than the sensory info is – they could both be weakened somewhat) or a shift towards using the prior more (if the sensory info is weakened more than the prior is).
129-142. Exploratory analyses:
Mindful of the suggestions at https://rr.peercommunityin.org/help/guide_for_reviewers#h_7586915642301613635089357
“Have the authors minimised all discussion of post hoc exploratory analyses, apart from those that must be explained to justify specific design features? Maintaining this clear distinction at Stage 1 can prevent exploratory analyses at Stage 2 being inadvertently presented as pre-planned.”
As far as I see, the details and justification for the exploratory analyses at lines 129-142 might be needed in order to justify the design (i.e. collecting fingertip force data). Is there a way to explain more clearly why this info is being provided now (if not to ‘pre-register’ these analyses)? (e.g. in order to explain why the measures are being collected?)
151 Primary and exploratory questions and Table (Table of questions_revision.pdf).
The mapping between the document and the table is very unclear. On the one hand, the first item in the document refers us to hypotheses 4 and 5, skipping over 1-3. On the other hand, 1-3 from the table are not mentioned in the text.
I would suggest that the table follow the order of the document, and that everything in the table also be referred to (at least in summary) in the document.
More fundamentally - I will preface this by saying that I don’t have experience with this format. However: the decision to call many of these analyses exploratory, but at the same time to carefully list them in the table in terms of hypotheses, measures, conclusions to be drawn, does not make sense to me. The advice above notes that it is good to “prevent exploratory analyses at Stage 2 being inadvertently presented as pre-planned”. There is so much pre-planning here that I struggle to see what is exploratory about these analyses? The only distinction that calling them exploratory seems to me to provide is that, depending on the outcomes, the authors might not feel that they need to report them all (e.g. if some are unclear or uninteresting). I would suggest that these are all carefully pre-planned analyses, and you could plan to report them all, regardless of the outcomes. Or, to maintain these as exploratory, there would be much less need to pre-register these plans for them.
180 Methods. These seem to be sound and well grounded in previous studies, including those by the authors.
235 I am concerned that grasping via control of a white sphere vs normal hand could lead to large differences in visuomotor control. This is a valid part of how VR interactions can be different to real ones, but should be discussed more among the reasons for real-VR differences (see comment in intro). I also wonder if the authors have considered an open-loop situation instead to better match the tasks on this dimension (but perhaps the applicability of this to these tasks/illusions is unclear/unknown).
It took me a while to find a clear statement of the numbers of trials to be collected (there is a passing mention on line 267 of “10 lifts” as part of a calculation). Later, at line 340, there is the plan for 30 test trials (10 per object). Could you provide some justification that this number is likely to give enough power to test the hypotheses of interest (especially the main ones)?
350 Perhaps I should know this, but why choose 3.29 standard deviations from the mean as the outlier criterion? (|z| = 3.29 corresponds to the two-tailed p = .001 cut-off, i.e. about 99.9% of a normal distribution.) A criterion based on quartiles / IQR can deal better with non-normal data. On the other hand, if this has been commonly used in similar studies, it sounds OK.
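As a quick check on that figure (an illustrative Python snippet of my own, not anything from the manuscript; scipy assumed available):

    # Coverage implied by a |z| < 3.29 outlier criterion under normality
    from scipy.stats import norm

    coverage = norm.cdf(3.29) - norm.cdf(-3.29)
    print(round(coverage, 4))  # ~0.999, i.e. the two-tailed p = .001 cut-off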
The submission proposes an experiment examining differences in the use of prior expectations about object weight and material properties across real-world and VR object lifting, using the size-weight and material-weight illusions to probe the ‘weight’ given to prior expectations. The proposed study is technically accomplished, and the analysis pipeline is clearly specified and well designed. And I find the central question—how much do we ‘trust’ prior expectations about the world in VR?—to be deeply interesting. I think there are some fundamental problems with the theoretical conceptualization of the study and resultant hypotheses, however, that mean it cannot do the job the authors intend, and so I focus on those.
The study is formulated in the framework of Bayesian inference, where the ‘weight’ given to the prior (here, prior expectations about object weight etc.) should reflect the relative reliabilities of prior expectations vs. sensory input. The two hypotheses are then described as alternative propositions about the prior (low-precision vs. high-precision). But really they aren’t. Instead, they speak to the different terms in Bayes’ rule. LPP hypothesizes that the *prior* will be less reliable (or treated as less reliable) when the subject knows they are in a virtual world, whereas HPP supposes that the *sensory input* (the likelihood, in Bayesian inference) is less reliable in VR. These are both reasonable propositions, but they are orthogonal. Both can simultaneously be true (or false, or any combination thereof). So the question appears to be ill-posed.
In more formal terms, there is (conceptually) one dependent measure (weight given to prior expectations) but two unknowns (i. reliability of sensory input; ii. reliability of prior expectations). If we allow that both sensory and prior reliabilities can change across VR and real-world—and I think we have to—it’s not possible to infer what caused any measured change in reliance on prior expectations (which the proposed theoretical interpretation depends on). Consider the case where sensory reliability and prior reliability are both reduced in VR (which seems plausible). Depending on the exact, quantitative nature of those reductions, this could result in prior expectations receiving more, less, or the same weight (though, note, here ‘no-reweighting’ would not be because nothing changed). Running the process in reverse, finding that people relied more heavily or less heavily on prior expectations does not allow us to infer how the underlying reliabilities of likelihood and prior have changed (except perhaps at the extreme ends of the possible outcomes).
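To put the identifiability problem in symbols (a standard Gaussian cue-combination sketch, in my notation rather than the authors'): with a Gaussian prior and likelihood, the posterior mean is

    \mu_{\text{post}} = w\,\mu_{\text{prior}} + (1 - w)\,\mu_{\text{sensory}},
    \qquad
    w = \frac{\sigma^2_{\text{sensory}}}{\sigma^2_{\text{sensory}} + \sigma^2_{\text{prior}}}

Because w depends only on the ratio of the two variances, any observed change in w is consistent with infinitely many joint changes in prior and sensory reliability, which is exactly why the two cannot be separated from the behavioural measure alone.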
A key step here would be to measure the reliability of sensory input empirically across the different situations, so it is no longer an uncontrolled variable. If the reliability of visual size information, for instance, was matched across real-world and VR, changes in the weight given to prior expectations in different contexts—which I’d argue is the deeper question here—could be ascribed unambiguously. I must say I’m not sure how to do this for the reliability with which material properties (granite etc.) are specified, or even if that’s necessary, but I think it’s worth thinking about.
Relatedly, and taking a step back, I’m a bit troubled by the VR vs. real-world manipulation as conceived here. It seems to presume VR to be ‘monolithic’, but visual information could be less reliable, equally reliable, or (in principle at least) more reliable in VR compared to the real world, depending on the exact properties of the system used, the scene parameters and content, etc. So in my view it isn’t meaningful to think in terms of general conclusions about VR vs. the real world. They need to be qualified by an understanding of the constituent signals and their reliabilities in a given situation, as above.
DOI or URL of the report: https://osf.io/suakb
Version of the report: 1
See attached PDF.
Thank you for submitting your Stage 1 manuscript to PCI-RR. As the recommender assigned to this manuscript, it is my role to perform an initial triage assessment to determine whether the submission is ready to be sent for external review. This assessment is primarily with respect to the RR aspects of the proposal, rather than its specific topical content. On the basis of this initial assessment, I would say that the manuscript is generally very well prepared, but that there are a number of specific points that you should consider before it is sent for external review. These all relate to the analysis plan/design table, which is what I focused my assessment on.
1. It is great that you have included manipulation checks, but you must be explicit, in each case, about how your conclusions would be affected if the check were failed. Normally, a manipulation check would be to confirm an effect that would be expected if the task is working as intended, or is otherwise a necessary precondition for your experiment to be deemed capable of testing the experimental hypotheses of interest. If your manipulation check has this status, then it should be made explicit (your first manipulation check seems like it would probably have this status, but I am less sure for 2-4). If it does not have this status, then it is not an important manipulation check, and should probably be omitted.
2. Manipulation check 1 has two parts that support two different conclusions (about the SWI and MWI respectively). It should therefore probably be split into two (perhaps labelled 1a and 1b). In general, a separate hypothesis should be defined for each test that supports a separate conclusion.
3. As noted, you should consider whether manipulation checks 2-4 are necessary. Is it essential that there are no differences here (and would any differences invalidate or qualify the conclusions you can draw from your main experimental hypothesis tests)? If so, then why is it considered adequate to have 90% power only to detect a rather large effect size (dz = 0.66)? If it is indeed necessary to confirm no differences then you should frame an equivalence test, rather than just failing to reject the null (or use a Bayesian approach). Also, since the verbal statement concerns ‘general differences’ between real and VR, it seems like you should have a simple independent t-test (which tests the overall difference between conditions), rather than an ANOVA (which would be sensitive to any pattern of differences between means).
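For instance, the equivalence test could be framed along these lines (a minimal TOST sketch in Python with simulated data; the ±0.5 equivalence margin is my placeholder and would itself need justifying):

    # Two one-sided tests (TOST) for equivalence; data and bounds are
    # placeholders, not values from the manuscript.
    import numpy as np
    from statsmodels.stats.weightstats import ttost_ind

    rng = np.random.default_rng(1)
    real = rng.normal(0.0, 1.0, 40)  # real-world condition (simulated)
    vr = rng.normal(0.1, 1.0, 40)    # VR condition (simulated)

    p, lower, upper = ttost_ind(real, vr, low=-0.5, upp=0.5)
    print(p)  # p < .05 -> conditions equivalent within the declared bounds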
4. Although manipulation checks are not your experimental hypotheses, it would still be conventional to label them as hypotheses (e.g. H1). You can still contextualise as a manipulation check, explaining how your experiment is affected if the hypothesis is not confirmed.
5. The targeted effect sizes for your experimental hypotheses reflect mean estimates from prior research. Unless these prior studies were registered reports, it is likely that these are over-estimates of the true effect (indeed, the very fact that you are choosing to follow up these findings may also mean they are likely to be over-estimates). It might be preferable to target a lower-bound or otherwise conservative effect size. In any case, you need to provide a rationale for why the targeted effect size is appropriate (i.e. that failing to detect an effect of this size would be an important message for the field).
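For example, the implications of a more conservative target are easy to check (illustrative numbers only; statsmodels assumed available):

    # A priori sample size for a paired t-test at a conservative effect size
    from statsmodels.stats.power import TTestPower

    n = TTestPower().solve_power(effect_size=0.4,  # below the published dz
                                 alpha=0.05, power=0.9,
                                 alternative='two-sided')
    print(n)  # participants needed if the true effect is this small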
6. H2a, H2b: A conventionally ‘medium’ correlation is selected as the smallest effect size of interest, but no rationale is provided for why this is a theoretically or practically relevant SESOI.
7. For some hypotheses, the associated theoretical conclusion will be drawn if the associated test is significant for either SWI or MWI. This ‘disjunction’ logic (X if either Y or Z) probably requires alpha adjustment (e.g. Rubin, 2021. https://doi.org/10.1007/s11229-021-03276-4).
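For example, if the same theoretical conclusion would be drawn when either the SWI test or the MWI test is significant, a simple Bonferroni-style adjustment would evaluate each at alpha = .05/2 = .025, keeping the disjunction-wise error rate at .05.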
8. Exploratory research questions should not be included in the Stage 1 plan, but can be added at Stage 2. This would also help to simplify the current Stage 1 manuscript.
9. In the Data treatment section of the main text, it is stated that “Data will be checked for extreme deviations from normality based on skewness and kurtosis scores. Assuming data adhere to these assumptions the tests outlined in the table of questions will be run. Non-parametric alternatives will be used if data deviate substantially from normality.” You need to provide precise statements of what will constitute sufficient deviations for you to switch analysis strategy, and you also need to state what the non-parametric alternative will be. Note that the above assumptions are not necessary for your regression analyses (which instead assume normality of residuals).
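One way to make this precise would be a pre-specified decision rule of roughly this form (a hypothetical sketch; the ±2 cut-offs and the Wilcoxon fallback are placeholders for whatever you choose to pre-register):

    # Parametric vs non-parametric test selection for paired data;
    # cut-offs and the fallback test are illustrative placeholders.
    from scipy.stats import skew, kurtosis, ttest_rel, wilcoxon

    def choose_test(x, y, cutoff=2.0):
        d = x - y  # paired differences (numpy arrays)
        if abs(skew(d)) > cutoff or abs(kurtosis(d)) > cutoff:
            return wilcoxon(x, y)  # the named non-parametric alternative
        return ttest_rel(x, y)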
10. In the Data treatment section of the main text, it is also stated that “Bayes factors using a symmetric Cauchy prior will also be used to quantifying the strength of evidence for the alternative and null hypotheses.” These do not feature in the design table, so I assume they play no role in your inferential logic. It is OK to mention that these BFs will be calculated, if that is your plan, but you should make it clear that your main conclusions will be driven only by the outcomes of the tests specified in the design table. If the purpose is to try to quantify evidence for null results, and you regard this as important for your conclusion, then perhaps a Bayesian approach would be more appropriate as your main strategy (or you could include some frequentist tests of equivalence).
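If you do retain the BFs, the computation might look something like this (a sketch assuming the pingouin package; r = 0.707 is its default Cauchy scale, and all numbers are placeholders):

    # Default Cauchy-prior Bayes factor from a paired t statistic
    import pingouin as pg

    bf10 = pg.bayesfactor_ttest(t=2.1, nx=30, paired=True, r=0.707)
    print(bf10)  # BF10: evidence for H1 relative to H0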