DOI or URL of the report: https://osf.io/s72bt
Version of the report: 2
Thank you for your revisions. I am not sure you fully updated the version of the manuscript I see at the OSF link. You still need to address the point of specifying your IV more clearly: state explicitly in the results section and the design table that the IV will be absolute prediction error, and state explicitly in both places that you will use model 2. And for power, you still need to justify roughly the smallest effect of interest. I am not familiar with the pre-existing power programs for LMEs, and it may be easiest to simulate yourselves: generate data from an H1 model with a fixed raw slope (e.g. 1000 times), fit the LME each time, and determine the proportion of significant effects. This way you can estimate the error variance from your pilot, fix the raw slope at a just-interesting value, and vary N. Maybe hiring a graduate statistics student for a few hours would get this done.
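For concreteness, here is a minimal sketch of that simulation in Python with statsmodels. All variable names, the predictor range, and the numeric values are placeholders: the variances should be estimated from the pilot, the slope fixed at the just-interesting value, and the design (subjects x trials) matched to the planned study.

```python
import warnings
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate_power(n_subjects, n_trials, slope, sd_intercept, sd_resid,
                   n_sims=1000, alpha=0.05, seed=1):
    """Proportion of simulated datasets in which the fixed slope is significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        subj = np.repeat(np.arange(n_subjects), n_trials)
        x = rng.uniform(0, 10, size=subj.size)        # predictor range: a placeholder
        u = rng.normal(0, sd_intercept, n_subjects)   # random intercept per subject
        y = slope * x + u[subj] + rng.normal(0, sd_resid, subj.size)
        df = pd.DataFrame({"y": y, "x": x, "subj": subj})
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")           # silence convergence chatter
            fit = smf.mixedlm("y ~ x", df, groups="subj").fit(reml=False)
        hits += fit.pvalues["x"] < alpha
    return hits / n_sims

# Fix the raw slope at the just-interesting value, plug in pilot-based variances,
# and vary n_subjects until power is adequate; set alpha to .025 if a Bonferroni
# correction will be applied.
print(simulate_power(n_subjects=40, n_trials=20, slope=0.1,
                     sd_intercept=0.5, sd_resid=1.0))
```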
DOI or URL of the report: https://osf.io/s72bt
Version of the report: 2
Thank you for your revisions, which have addressed many of the reviewers' points. One reviewer asks for some further clarifications. I would like to go back to the points I made.
1) Only report those analyses in a Stage 1 that you will base inferences on. That is, if you plan to base inferences only on model 2, do not even report the possibility of model 3 at this stage. If you want to leave open that you will base conclusions on model 3, then indicate precisely how you will decide which model the conclusions will be based on. Incidentally, as you plan to keep all variables in the equation, you don't need different steps. Just enter all variables simultaneously. So if you are just going to use model 2, specify only the final equation with all variables in, and then base your conclusions on that. Note that pre-registering a specific analysis now does not stop you exploring other analyses later in Stage 2 when the data are in. But these other analyses will go in a separate non-pre-registered results section; the abstract will draw conclusions only from the pre-registered analyses, and the discussion will keep the pre-registered analyses centre-stage.
2) You could treat your two IVs, absolute and relative prediction error, likewise. Choose one now, and keep the other for exploration later. Just bear in mind that your main conclusion in the abstract and discussion will be based on the pre-registered analysis. If you keep both in the pre-registration, you need to decide how you will base conclusions on the different possible patterns of results. The multiple testing problem I mentioned before remains. So if you keep both, use a multiple testing correction, e.g. Bonferroni. With Bonferroni, the decision rule can be: if either or both slopes are significant, then there is an effect of prediction error on pleasure. You also need to calculate power with respect to the alpha determined by the multiple testing correction, e.g. .025 for Bonferroni.

Now more on power. Power is to control the risk of missing out on an effect of interest. Thus, it must be calculated with respect to roughly the smallest effect that would just be of interest. This is of course hard to determine, but you mention the rough heuristic of using the smallest available estimated effect. With only one pilot, this heuristic no longer captures the spirit of what is needed. I find thinking in terms of raw effect sizes more helpful than standardized ones, and you usefully provide raw regression slopes from the pilot. No one can judge whether an R2 of 0.40 is interesting whereas one of 0.30 is getting boring. But whether it is interesting that a one-unit change on your raw scale of absolute prediction error predicts 0.15 units of pleasure on your scale at least feels vaguely judgeable. (Presumably the slope mentioned next, 0.57, is for relative prediction error, so I think there is a typo here.) I presume you found the 0.15 slope interesting, even though it is small, because it motivated you to plan this study. But it does feel on the edge. How about setting the just-interesting effect a bit lower than this, say 0.1 pleasure units per abs PE unit, and calculating the power for this? I think this would work best by referring to the raw slopes in the power calculation section. As I mentioned earlier, an 80% CI on the slope, using the bottom limit, would be a touch more objective (even if 80% is itself relatively arbitrary), because it is a procedure that could be used repeatedly across many cases.
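As a sketch of that bottom-limit calculation from reported summary statistics (the slope, SE, and degrees of freedom below are purely illustrative; substitute the pilot's actual values):

```python
from scipy import stats

def ci_lower(slope, se, df, level=0.80):
    """Bottom limit of a two-sided CI on a raw regression slope."""
    t_crit = stats.t.ppf(0.5 + level / 2.0, df)
    return slope - t_crit * se

# hypothetical numbers: pilot slope 0.15 with SE 0.04 and 30 denominator df
print(ci_lower(0.15, 0.04, 30))   # ~0.10 pleasure units per abs PE unit
```

Note how, with these illustrative numbers, the bottom limit lands near the 0.1 figure suggested above.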
best
Zoltan Dienes
The authors have addressed my previous comments. I have no further comment.
DOI or URL of the report: https://osf.io/at62v
Version of the report: 1
Following your email, I have gone with the two reviews in hand, to expedite the process. The reviewers are largely positive, but have a number of points to clarify. I have a couple of points as well:
1) In a Registered Report, one ties down all analytic and inferential flexibility. While in the pilot models 2 and 3 fare best, in the main study this may or may not be true. Be absolutely explicit about the criteria you will use to end up with the one model you will draw inferences from. Make sure anyone reading your planned analyses and having your raw data for the main study would end up making the same decisions. Further, you have two IVs - there is room for inferential flexibility here (what if one IV shows one thing and the other another?), as well as a familywise error rate problem. Given your pilot, I would suggest picking one of them - probably absolute prediction error. Otherwise you need to correct for familywise error and specify your decision rule for each of the different patterns of possible results.
2) The function of calculating power is to control the error rate of missing out on interesting effects. But that means one needs to calculate power with respect to roughly the smallest effect that you do not want to miss out on. The PCI RR guidelines for authors put it: "power analysis should be based on the lowest available or meaningful estimate of the effect size". As one reviewer points out, you actually use an effect size estimate for power larger than that obtained in the pilot. Yet presumably an effect considerably smaller than found in the pilot would still be interesting and one you would not want to miss out on. The issue is discussed in some detail here: https://doi.org/10.1525/collabra.28202 I would suggest finding the 80% CI for each effect in the pilot that is in your Design Table, and using the bottom limit of the interval in each case as the effect for calculating power. But you might have other ideas, after reading the linked paper, for how to address the point.
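If the pilot data are available, the interval can be read straight off the fitted model. A minimal sketch in Python with statsmodels, where the filename, formula, and column names are placeholders for the authors' actual pilot data and model:

```python
import pandas as pd
import statsmodels.formula.api as smf

pilot = pd.read_csv("pilot.csv")   # placeholder filename and column names

fit = smf.mixedlm("pleasure ~ abs_pe", pilot, groups="subject").fit()

# conf_int takes the total tail probability, so alpha=0.20 gives an 80% CI
ci = fit.conf_int(alpha=0.20)
lower = ci.loc["abs_pe", 0]        # bottom limit: candidate effect for power
print(lower)
```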
Finally, you phrase the issue as predicting pleasure from prediction error, but prediction error is not completely assessed until pleasure is. A causal diagram of what is going on could of course take many forms. No need to do anything about this now, but I presume the discussion will touch on this point - all in good time.
best
Zoltan
The authors present a Stage 1 manuscript to investigate how prediction errors of perceived exertion are related to feelings of pleasure. I commend the authors on their willingness to do a registered report, their detailed methods, and their sharing of data and code. I view this as an opportunity to try and help the authors improve their future study (planned to start in October 2023).
I have a few major comments and some minor comments.
Major comments:
Minor comments: