Using gamification to improve food response inhibition training
The effects of isolated game elements on adherence rates in food-based response inhibition training
Recommendation: posted 16 November 2023, validated 17 November 2023
Leganes-Fonteneau, M. (2023) Using gamification to improve food response inhibition training. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=499
Related stage 2 preprints:
Alexander MacLellan, Charlotte R. Pennington, Natalia Lawrence, Samuel J. Westwood, Andrew Jones, Anna Slegrova, Beatrice Sung, Louise Parker, Luke Relph, Jessica O. Miranda, Maryam Shakeel, Elizabeth Mouka, Charlotte Lovejoy, Chaebin Chung, Sabela Lash, Yusra Suhail, Mehr Nag, Katherine S. Button
https://doi.org/10.31234/osf.io/2e73b
Recommendation
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
- Addiction Research & Theory
- Advances in Cognitive Psychology
- Collabra: Psychology
- Journal of Cognition
- Peer Community Journal
- PeerJ
- Royal Society Open Science
- Studia Psychologica
- Swiss Psychology Open
References
1. MacLellan, A., Pennington, C., Lawrence, N., Westwood, S., Jones, A., & Button, K. (2023). The effects of isolated game elements on adherence rates in food-based response inhibition training. In principle acceptance of Version 1.6 by Peer Community in Registered Reports. https://osf.io/jspf3
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #1
DOI or URL of the report: https://osf.io/waqdh?view_only=c87013dfff8e43aea702c8cc83b7d2e1
Version of the report: 1.5
Author's Reply, 09 Nov 2023
Decision by Mateo Leganes-Fonteneau, posted 23 Oct 2023, validated 23 Oct 2023
Thank you for submitting your Stage 1 manuscript, “The effects of isolated game elements on adherence rates in food-based response inhibition training” to PCI RR.
The reviewers and I all agreed that this pre-print and the planned project adhere to the principles of open science and are suitable for a level 6 registered report format, but that the Stage 1 manuscript would benefit from some revisions. Accordingly, I am asking that you revise and resubmit your Stage 1 proposal for further evaluation.
I agree with the reviewers that the proposed project is well-motivated and that the methodology is described in detail. The reviewers have some comments that I believe will be helpful. I concur with the comments made by Reviewer 1 regarding the power analysis; I had also noticed some of the inconsistencies pointed out in the review.
Reviewer 2 makes some interesting methodological suggestions that the authors may want to incorporate. The inclusion of an implicit test might be worthwhile considering the amount of work that will go into this project. If the authors plan to use additional measures as part of the confirmatory hypotheses, then these would need to be incorporated precisely into their design (including any implications for the analysis plans and sampling plan, if applicable). But if they wish, they could include the test purely for exploratory analyses, with a general description of the exploratory questions it would address.
When submitting a revision, please provide a cover letter detailing how you have addressed the reviewers’ points.
Thank you for submitting your work to PCI RR, and I look forward to receiving your revised manuscript.
Mateo Leganes-Fonteneau
PCI RR Recommender
Reviewed by Miguel Vadillo, 28 Aug 2023
Before I start my review, I must disclose that although I am not completely unfamiliar with this literature, I do not consider myself an expert in cognitive training programs for unhealthy eating and I can’t judge the more conceptual and theoretical aspects of the proposal. As far as I can see, the protocol addresses an interesting topic and in general the proposal seems well designed.
Perhaps my most important concerns are related to sample size and power analysis. In fact, I wasn’t able to understand what sample size the authors are planning to recruit. On pages 7-8 the authors state that “Detecting effect sizes of f=0.23 … would require 80 participants per group… which was deemed achievable with our resources… this was selected as our target sample size”. But just a few lines below… “Our total sample size is therefore set at 150 to detect effects in our primary and secondary hypothesis”. So, is it 50 per group or 80? To make things more complicated, the final table in the supplementary material states “we propose to recruit 51 participants per group” and in the next row “we propose to recruit 30 participants per group”. Maybe I missed something, but I wasn’t able to follow any of this.
Also, in general, the structure of the power analysis section looks a bit awkward to me. The authors begin by discussing reasonable effect sizes that could be expected based on previous studies, but in truth their sample size is not based (a priori) on any of those estimates. Instead, it looks like sample size will be mainly determined by the availability of resources. Therefore, perhaps this section would be much clearer if the order of ideas were changed, along the lines of “With our resources we can afford to test X participants. With this sample size we can detect an effect size of X with 90% power. This effect size is reasonable based on previous evidence.” In other words, if sample size is based on resource availability, it doesn’t make sense (I think) to present the power analysis section as an a priori power analysis (i.e., based on effect size X we need Y participants); a sensitivity analysis is possibly more appropriate and clear (i.e., with the Y participants we can afford, we have reasonable power to detect X).
Another detail of this section (and other bits of the text) is the constant change from f-units to d-units. If the final sample size is going to be, say, 50 per group, then it would be nice to have a simple sentence explaining what’s the smallest f that can be detected across 3 groups with 50 x 3 participants and what’s the smallest d that can be detected in a pairwise comparison with 50 x 2 participants.
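To illustrate the sensitivity framing, the following is a minimal sketch of the smallest d detectable in a two-group pairwise comparison. It is not the authors' analysis: it uses a normal approximation to the two-sample t-test (slightly optimistic at small n), the function name `min_detectable_d` is hypothetical, and alpha = .05 (two-sided) and 90% power are assumptions taken from the figures mentioned in the reviews. The conversion f = d / 2 holds only for two equal-sized groups; the three-group ANOVA case needs a noncentral F calculation and is not covered here.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.90):
    """Smallest Cohen's d detectable in a two-sided, two-group comparison.

    Normal approximation to the two-sample t-test with equal group sizes.
    alpha and power defaults are assumptions, not values from the protocol.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Group sizes quoted in the review (50 vs. 80 per group).
for n in (50, 80):
    d = min_detectable_d(n)
    # f = d / 2 is valid for two equal-sized groups only.
    print(f"n = {n} per group: d ~ {d:.2f}, f ~ {d / 2:.2f}")
```

Under these assumptions the detectable d shrinks as the per-group n grows, which is the single sentence the sensitivity framing asks for.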
Also, wouldn’t it be more appropriate to explain the power for the TOSTER equivalence test in this same section instead of presenting it on p. 14? I must confess that a minimal effect size of d = .6 doesn’t sound terribly convincing for TOSTER. Technically, this choice means that the authors consider d = .6 too small to matter, but the average observed effect size in psychological research is around d = .5 (see, e.g., Bakker et al. 2012 PPS). So, in principle this logic implies that the authors consider that most effect sizes reported in psychology are irrelevant!
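The dependence of equivalence-test power on the chosen bound can be sketched as follows. This is an illustrative normal approximation for a two-group TOST with symmetric bounds of ±bound (in d-units) and a true effect of zero, not the authors' TOSTER calculation; the function name `tost_power`, the one-sided alpha of .05, and the group size of 50 are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def tost_power(bound_d, n_per_group, alpha=0.05):
    """Approximate power of a two-group TOST when the true effect is zero.

    Normal approximation with symmetric equivalence bounds of +/- bound_d
    (Cohen's d units) and equal group sizes. alpha is per one-sided test.
    """
    z = NormalDist()
    se = sqrt(2 / n_per_group)  # standard error of d under the approximation
    power = 2 * z.cdf(bound_d / se - z.inv_cdf(1 - alpha)) - 1
    return max(0.0, power)      # the approximation can go negative; clamp at 0


# Compare the proposed bound (d = .6) with narrower, stricter bounds.
for bound in (0.6, 0.5, 0.4):
    print(f"bound d = {bound}: power ~ {tost_power(bound, 50):.2f}")
```

Under these assumptions, power falls quickly as the bound narrows, so a stricter smallest effect size of interest would call for a larger sample.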
I understand that this is a somewhat biased comment, but the authors might want to mention that one of the shortcomings of this area of research is, precisely, that statistical power is often too low (Navas et al., 2021, Obesity Reviews).
In RQ1 and RQ2, the authors plan to test their hypothesis with one-way ANOVAs, but I imagine that if they find a significant result they will want to follow up on this with pairwise analyses. Shouldn’t these be mentioned in the analysis plan?
In RQ3, the authors plan to test if motivation/adherence mediate the effects of the intervention on food evaluations, but wouldn’t it be even more interesting to test the mediating effect of these variables on actual snacking?
In the same vein, in RQ4 the authors only plan to test whether both interventions are comparable in terms of motivation/adherence, but wouldn’t it be more interesting to test whether they are similar in terms of effectiveness? (i.e., in terms of food evaluations and snacking?)
Minor comments
The authors will run frequentist and Bayesian analysis, which I think is great. But what will they conclude if different analyses lead to different conclusions? In the same vein, the authors state that they will run all the analyses both including and excluding participants who fail the attention check. But what will they conclude if the results are not identical? In general, it is not a good idea to have multiple confirmatory tests for the same hypotheses in Registered Reports, as this leaves too much analytical flexibility and provides more opportunities for biases in the interpretation of results. If the authors think that excluding participants is best (or that frequentist statistics are more appropriate) they should probably stick to those analyses in the pre-registered protocol. This doesn’t prevent them from presenting additional analyses in the exploratory section. But ideally the authors should state a priori which analyses, in their opinion, provide the strongest test of their hypotheses.
The second paragraph on page 8 mentions for the first time “secondary” analysis. Although these hypotheses are not of primary interest, maybe something about them should be explained at the end of the introduction, so that the reader knows that further tests will be run before they reach this paragraph. This will also help the reader understand the “exploratory outcome variables” section on p. 11.
P. 10. Participants will be asked to report the confidence in their food evaluations. Is it possible that participants prefer one food to another but with little confidence?
p. 12. Isn’t it weird to remove participants who perform the task too well? (2 SDs above the mean?)
Appendix B. What’s the effect size unit in the power curve?
Final table, first row. In the sampling plan the authors present a g = .72 as reference but the corresponding analysis is a one-way ANOVA with 3 groups, for which Cohen’s g is undefined (to the best of my knowledge). Note also my previous concerns about sample size and power analysis, as they apply to this table as well.
Reviewed by anonymous reviewer 1, 25 Sep 2023
Does the research question make sense in light of the theory or applications? Is it clearly defined? Where the proposal includes hypotheses, are the hypotheses capable of answering the research question?
YES
Is the protocol sufficiently detailed to enable replication by an expert in the field, and to close off sources of undisclosed procedural or analytic flexibility?
YES
Is there an exact mapping between the theory, hypotheses, sampling plan (e.g. power analysis, where applicable), preregistered statistical tests, and possible interpretations given different outcomes?
YES
For proposals that test hypotheses, have the authors explained precisely which outcomes will confirm or disconfirm their predictions?
YES
Is the sample size sufficient to provide informative results?
YES
Where the proposal involves statistical hypothesis testing, does the sampling plan for each hypothesis propose a realistic and well justified estimate of the effect size?
YES
Have the authors avoided the common pitfall of relying on conventional null hypothesis significance testing to conclude evidence of absence from null results? Where the authors intend to interpret a negative result as evidence that an effect is absent, have authors proposed an inferential method that is capable of drawing such a conclusion, such as Bayesian hypothesis testing or frequentist equivalence testing?
YES
Have the authors minimised all discussion of post hoc exploratory analyses, apart from those that must be explained to justify specific design features? Maintaining this clear distinction at Stage 1 can prevent exploratory analyses at Stage 2 being inadvertently presented as pre-planned.
YES
Have the authors clearly distinguished work that has already been done (e.g. preliminary studies and data analyses) from work yet to be done?
YES
Have the authors prespecified positive controls, manipulation checks or other data quality checks? If not, have they justified why such tests are either infeasible or unnecessary? Is the design sufficiently well controlled in all other respects?
YES
When proposing positive controls or other data quality checks that rely on inferential testing, have the authors included a statistical sampling plan that is sufficient in terms of statistical power or evidential strength?
YES
Does the proposed research fall within established ethical norms for its field? Regardless of whether the study has received ethical approval, have the authors adequately considered any ethical risks of the research?
YES
Reviewed by anonymous reviewer 2, 23 Oct 2023
This registered report tests several response inhibition techniques with varying forms of gamification. I find the overall research topic to be valuable and interesting, and the manuscript thus far to be well written and informative. My comments on the manuscript are as follows:
Main Comments
· The authors might consider explicitly adhering to a reporting guideline for trials, such as SPIRIT or similar. It appears the content of such checklists is largely covered in the manuscript, but an explicit report of a checklist may add value to the already strong open science basis of this trial.
· For power analysis, please include the specified alpha value (I assume .05?). Otherwise, I accept the authors’ explanation. I also suggest, for the less informed reader, that the authors note that .23 constitutes a medium effect size in Cohen’s effect size taxonomy, given that using this taxonomy is another common method of arriving at power estimates.
· I agree with the authors’ choice of measures. Another potentially valuable addition here might be some measure of automatic or implicit attitude towards target foods such as the implicit association test or affect misattribution procedure. I believe templates for these measures are available on Gorilla already if the authors choose to make use of this suggestion.
· Will training and data collection be restricted to any particular type of device? i.e., will training be required to be conducted on a computer or is a touch screen version for tablets or phones available? If there is a restriction I suggest noting this.
· I have not used the particular Bayes package in question, but details on the default priors are somewhat essential here, especially as many default priors are uninformed. If this is the case you would expect almost identical results using Bayes, so I am unsure of what additional value this analysis adds. Given that the authors cite several previous tests and analyses, could these be used as priors?
Minor Issues
· Some minor typos throughout, e.g., in H2c “the” is missing
· Given there are so many forms of food frequency questionnaire out there, I would suggest referring to the measure here as “an unhealthy-snacking-based food frequency questionnaire” or something similar, rather than “the food frequency questionnaire”
· RQ3 ANOVA says 3x2, but lists 4 groups. I assume intervention is the duplicate as it’s covered later by the actual group names.