Recommendation

Can gamified response training with sugary drinks help people to resist consumption?

based on reviews by Loukia Tzavella, Matthias Aulbach and Pieter Van Dessel
A recommendation of:

Sugary drinks devaluation with response training helps to resist their consumption

Submission: posted 22 June 2023
Recommendation: posted 02 October 2023, validated 02 October 2023
Cite this recommendation as:
Chen, Z. (2023) Can gamified response training with sugary drinks help people to resist consumption? Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=493

Recommendation

The excessive consumption of energy-dense but nutrient-poor foods and drinks can lead to many health problems. There is thus an increasing need for tools that may help people reduce their consumption of such foods and drinks. Training people to consistently respond or not respond to food items has been shown to reliably change their subjective evaluations of, and choices for, these items, mostly within laboratory settings. However, evidence on whether such training can also modify real consumption behavior remains mixed.

Najberg et al. developed a mobile-based response training game that combines two training tasks, one in which people consistently do not respond to sugary drinks (i.e., the go/no-go training), and one in which they consistently respond rapidly to water items (i.e., the cue-approach training). Recent work showed that after the training, participants in the experimental group reported a greater reduction in liking for sugary drinks and a greater increase in liking for water items than the control group. However, both groups showed an equivalent reduction in self-reported consumption of sugary drinks (Najberg et al. 2023a).

In the current study, Najberg and colleagues will further examine the efficacy of the gamified response training, by testing whether it can help people resist the consumption of sugary drinks (Najberg et al. 2023b). Participants will be divided into experimental and control groups, and will receive the respective training for a minimum of seven days (and up to 20 days). After completing the training, they will be asked to avoid the trained sugary drinks, and the number of days for which they successfully adhere to this restrictive diet will serve as the dependent variable. Reporting the time at which one consumed a certain drink is presumably easier than reporting the exact volume consumed (cf. Najberg et al. 2023a). Furthermore, certain diets may require people to avoid specific foods and drinks entirely, rather than merely reduce the amount consumed. Examining whether the training is effective in this setting will therefore be informative. The authors will additionally examine whether the amount of training one completes, and the change in subjective valuation of drinks after training, correlate with the successful avoidance of sugary drinks. These results will offer insights into the underlying mechanisms of the training and provide guidance on how it may best be implemented in applied settings.

This Stage 1 manuscript was evaluated over three rounds of in-depth reviews by three expert reviewers and the recommender. The recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
 
URL to the preregistered Stage 1 protocol: https://osf.io/97aez

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
 
1. Najberg, H., Mouthon, M., Coppin, G., & Spierer, L. (2023a). Reduction in sugar drink valuation and consumption with gamified executive control training. Scientific Reports, 13, 10659. https://doi.org/10.1038/s41598-023-36859-x

2. Najberg, H., Tapparel, M., & Spierer, L. (2023b). Sugary drinks devaluation with response training helps to resist their consumption. In principle acceptance of Version 4 by Peer Community in Registered Reports. https://osf.io/97aez
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #3

DOI or URL of the report: https://osf.io/9wyuq?view_only=4934c0215f2943cfb42e019792a30b53

Version of the report: 3

Author's Reply, 28 Sep 2023

Decision by Zhang Chen, posted 27 Sep 2023, validated 27 Sep 2023

Dear Dr. Najberg,


Thank you again for submitting the revised version of your Stage 1 RR to PCI RR, and for being responsive to the previous comments. Most comments have been addressed satisfactorily. The R script makes the data analysis plan much more concrete, and I would recommend including it as part of your pre-registration to reduce researchers' degrees of freedom in data analysis. The sample size justification is now based on resource constraints, which seems to be a more accurate reflection of the actual situation. That being said, I think important issues with the sample size justification remain, and these need to be further addressed.


First, since no new participants will be recruited to replace excluded ones, I assume the final sample size will be below 140. Based on your prior experience, how many participants do you expect to exclude, and accordingly, how many do you expect to retain in the final sample? The sensitivity analysis will need to take this into account, and thus be based on the final sample size (i.e., after potential exclusions).


Second, and more importantly, after conducting a sensitivity analysis, you will still need to interpret the smallest effect size that can reasonably be detected, and justify why you think an investigation with such a sample size is worthwhile. After all, if a certain sample size only offers a reasonable chance of detecting fairly large effect sizes, one may argue that the result will not be informative, and that it is not worthwhile to embark on the project. In the manuscript, you mentioned that 140 participants (but see above) provide sufficient power to detect a Cohen’s d of 0.5, and an r of 0.4, which are deemed ‘relevant’ and ‘non-negligible’. Most people would probably agree that these two effect sizes are relevant in the current context. However, I would personally consider a smaller effect size for H1, e.g. a Cohen’s d of 0.3, also to be relevant, yet 140 participants provide only around 55% power to detect this effect. The question that needs to be better addressed is thus why you think this investigation is still worthwhile, given that it will likely miss smaller but potentially also relevant effects. (In other words, why is it okay to miss smaller effects in the current investigation?)
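
For concreteness, this 55% figure can be reproduced with base R's power.t.test; a minimal sketch, assuming the planned one-sided test at alpha = .05 and 70 participants per group:

    # Power of a one-sided two-sample t-test to detect d = 0.3
    # with 70 participants per group (140 in total), alpha = .05.
    # With sd = 1, 'delta' is the effect size in Cohen's d units.
    power.t.test(n = 70, delta = 0.3, sd = 1, sig.level = 0.05,
                 type = "two.sample", alternative = "one.sided")
    # power comes out at roughly 0.55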


To be clear, I am not asking for a ‘general-purpose justification’. The interpretation of effect sizes is supposed to be highly context-dependent, and the reasons that you mentioned in the response letter seem relevant. In the manuscript, however, you stated the sample size to be sufficient without really saying why. This ‘black box’ needs to be opened, so that readers can better judge the soundness and validity of the justifications. This is a rather important issue: a sample size justification based on resource constraints needs to be subject to the same level of scrutiny as other methods (e.g., an a priori power analysis), to ensure that the results will be informative. I sincerely hope that this issue can be sufficiently addressed in this round of revision.


Kind regards,

Zhang Chen

Evaluation round #2

DOI or URL of the report: https://osf.io/e68ja?view_only=4934c0215f2943cfb42e019792a30b53

Version of the report: 2

Author's Reply, 25 Sep 2023

Decision by Zhang Chen, posted 20 Sep 2023, validated 20 Sep 2023

Dear Dr. Najberg,


Thank you for submitting the revised version of your Stage 1 RR, now titled “Sugary drinks devaluation with response training helps to resist their consumption”, to PCI RR. The two expert reviewers who assessed the initial version have now re-reviewed the revised manuscript. Most of the previous comments have been addressed satisfactorily. However, some remaining issues are not fully resolved yet. I would therefore like to invite you to further revise the manuscript to address these issues.


1. One main issue is that the sample size justification is not entirely clear. I appreciate the increased sample size, which will certainly make the results more informative. However, this does not really answer the question of how the smallest effect sizes were determined. For H1, a difference of 5 days (with an estimated standard deviation of 10) is said to be relevant in an applied setting. But what are the exact reasons and/or justifications behind this choice? In the response letter, you mentioned that this decision was made based on discussions with board-certified dieticians and your own previous studies on item valuation. I think the exact content of these discussions, and the previous effect sizes, ought to be mentioned in the manuscript. For instance, was this effect size determined by comparing it to other existing interventions, or based on certain guidelines in the field? Such information will allow readers to be better informed about what this effect size entails in the current context. Similar issues exist for H2 and H3, where r = 0.4 is the smallest effect size of interest. At the moment, this is said to be “based on clinical subjectivity”, but it is unclear what this means.


In the response letter, you additionally mentioned that you had to take into account the resources available for this project, which is of course a constraint that we often face. Basing the sample size on the resources available is completely legitimate (resource constraints, see https://doi.org/10.1525/collabra.33267). In this case, an alternative approach may be to start with the maximum sample size allowed by the resources, and then conduct a power sensitivity analysis to see what the smallest effect size is that can reasonably be detected. Again, it is important to put these smallest detectable effect sizes into context, e.g. explain what they entail in clinical and applied settings, and/or how they relate to previously observed effect sizes.
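
To sketch what such a sensitivity analysis could look like (the cap of 70 participants per group is purely illustrative, and the pwr package is one option for the correlations):

    # Smallest Cohen's d detectable with 90% power (one-sided, alpha = .05),
    # given an assumed resource cap of 70 participants per group:
    power.t.test(n = 70, sd = 1, sig.level = 0.05, power = 0.9,
                 type = "two.sample", alternative = "one.sided")$delta  # ~0.50

    # Smallest correlation detectable in the experimental group alone (n = 70):
    library(pwr)
    pwr.r.test(n = 70, sig.level = 0.05, power = 0.9,
               alternative = "greater")$r  # ~0.34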


Relatedly, I think the “Rationale for deciding the sensitivity of the test” column in the design table is relevant, and should contain the justifications that are currently missing. I would therefore suggest putting this column back.


2. Some aspects of data analysis are a bit vague to me. To make it more concrete, can you please generate some 'fake' data and use it to write down the R code that you would use to analyze the real data? Some more specific questions include:


2.1 This may be due to my own lack of understanding. For H1, you wrote that you would apply the Greenhouse-Geisser correction. However, I have only seen the GG correction used for repeated measures, whereas H1 involves two independent groups. It is also unclear to me how the GG correction would be combined with an independent t-test.


2.2 If I understood it correctly, participants excluded based on the 2.5 MAD range (i.e., 'distribution outliers') will not be replaced. Please mention this explicitly in the manuscript.
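
For concreteness, a minimal sketch of the 2.5 MAD rule as I understand it (the variable name days_restraint is illustrative):

    # Flag 'distribution outliers': values more than 2.5 MADs from the median.
    # mad() in base R is scaled by 1.4826 by default, making it comparable to an SD.
    med  <- median(days_restraint)
    keep <- abs(days_restraint - med) <= 2.5 * mad(days_restraint)
    days_restraint_clean <- days_restraint[keep]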


2.3 Positive controls. The exact steps for removing participants are a bit vague, so writing down the R code would really help. One reason for not replacing distribution outliers (see 2.2) is that the thresholds may change each time participants are replaced. I wonder whether the positive controls might not create a similar 'circularity' problem: after all, Cohen's d between groups is computed on all data points, which is essentially the same issue one faces when computing e.g. the MAD. Also, data collection for this project is very time-consuming (up to about 3 months), so if participants need to be replaced repeatedly (e.g., if after replacing some participants, the new participants themselves need to be replaced), the data collection phase might become very long.


2.4 For the Bayes factors, please add the priors that you are going to use to the manuscript itself. I also agree with earlier reviewer comments that you should report Bayes factors for all results, not just for null results. You can still specify that statistical inference will be based on p values. Adding Bayes factors to all results does not complicate the results much (it is just BF = x for each effect), but it does provide a more complete picture of the results.
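
For instance, with the BayesFactor package this could look as follows (a sketch assuming its default priors; the variable names are illustrative):

    library(BayesFactor)
    # BF for the group difference (H1); rscale = "medium" is the default
    # Cauchy prior width of sqrt(2)/2 on the standardized effect size.
    ttestBF(x = days_exp, y = days_ctrl, rscale = "medium")
    # BF for a correlation (H2); here rscale = "medium" corresponds to 1/3.
    correlationBF(y = liking_delta, x = days_exp, rscale = "medium")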


3. One reviewer (Dr. Van Dessel) questioned the use of a strict cutoff value for determining the 'relevance' of a finding. I have a related question about how this inference will be made. Will you formally test this, for instance by declaring an effect relevant only when the lower bound of the 95% confidence interval of the estimate exceeds the cutoff value? Or will you simply check whether the point estimate itself exceeds the cutoff? I think the former approach is the better one, but it will likely require a much larger sample size (the CI will need to be rather narrow). The second approach is not principled, because there is uncertainty in the point estimate. As such, reporting the estimates and their associated uncertainty, and putting these estimates into context (i.e., what they mean in the current setting, see Point 1 above), seems like a more nuanced approach.
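
A sketch of the former approach with the effectsize package (the cutoff of 0.4 and the variable names are illustrative):

    library(effectsize)
    # Declare the effect 'relevant' only if the lower bound of the 95% CI
    # around Cohen's d exceeds the preregistered cutoff.
    d_est <- cohens_d(days_exp, days_ctrl, ci = 0.95)
    relevant <- d_est$CI_low > 0.4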


Some more minor issues:


1. Title: This is a matter of personal taste. At the moment, the title assumes that the training will be effective, which may not be the case once the data are in. You may want to phrase the title as a question, and also add 'Registered Report' to highlight that this is an RR.


2. Page 2: "There is, however, little evidence supporting real-life effects of cognitive bias modification". Cognitive bias modification sounds like a very general term. Since this paragraph is about the cue-approach training, you may want to modify the statement to be more specifically about the CAT.


3. Page 3: 'it is easier to report and less biased by memory …' than food frequency questionnaires and food journals?


4. Page 3: "letting the participant stop their training whenever they want in a two-weeks window enables to investigate the link of the intervention’s length on its real-world effect size, thereby allowing to formulate recommendations for its use in applied settings." I believe the training window is now 20 days? Furthermore, the reviewers correctly pointed out that H3 cannot be interpreted causally, yet the implication here (i.e., formulating recommendations on how long the training should be) still implies a causal interpretation.


5. Page 6: There is still a leftover mention of ECT.


6. Page 7: The section heading for the CAT says 'Attentional bias modification'.


7. Page 8: Tables 2 and 3 should be Tables 3 and 4.


8. Page 12: The content of Table 1 has not been updated. The 'Interpretation given different outcomes' column only depicts one possible outcome, but I think you really ought to list all possible outcomes and explain how you would interpret each of them. The interpretation for H3 implies a causal claim, but I do not think the current data can support that.

Reviewed by Pieter Van Dessel, 08 Sep 2023

I extend my appreciation to the authors for their receptive approach to the suggestions provided. Overall, the authors have made substantial improvements to the paper, resulting in a notably enhanced manuscript.

Here are several observations I made while reading the revision of this manuscript:

1. Introduction Clarity:

The introduction section contains several areas that could benefit from further clarity. It would be advisable to take out any reference to cognitive processes, as the primary focus appears to be on evaluating effectiveness rather than delving into cognitive explanations.

Specific points of concern include:

Page 2, paragraph 1: The statement, "Interestingly, recent evidence indicates that the practice of tasks involving the execution or inhibition of motor responses to food cues modulates their self-reported value, and their consumption," could be nuanced to indicate that the evidence suggests these practices can modulate these variables. In general, it is best to avoid strong claims, or to clearly outline the evidence supporting them where such evidence exists.

Page 2, paragraph 2: “The repeated inhibition of motor response to unhealthy cues is thought to reduce their reward value to solve the conflict between the task demand for response withholding and their tendency to respond to palatable cues”. This sentence is very complex and not well explained (what is this "reward value" or this "tendency"?). It would also be crucial to specify the source of these theoretical explanations, as there are many theoretical accounts of the observed effects. It might be better if the authors omitted this sentence and, instead of discussing these cognitive explanations, simply explained the procedure and the findings in more detail. The same holds for the next paragraph on the cue-approach task.

The fourth paragraph is relevant, as it goes into evidence of effectiveness. However, the next paragraph again goes into cognitive processes (making some strong claims without presenting evidence and without explaining the cognitive constructs well). Consider omitting that paragraph, as this would streamline the introduction and allow it to flow more smoothly into the following paragraph, which is well-written and clear.

Page 3: There is again a strong claim, this time that adherence to a restrictive diet is valuable because it is easier to report and less biased. This claim requires substantiation with references or additional context, or should be omitted.

2. Sampling Plan and Effect Size:

The section on the sampling plan raises questions about the choice of stringent cutoff values for effect sizes. For instance, the authors note: "any smaller effect than r =.4 will not be interpreted as relevant even if significant.” This seems very arbitrary: why would there be such an important difference between an effect of r = 0.40 and one of r = 0.39? It is worth considering that researchers are increasingly avoiding hard cutoff points and instead reporting all values clearly (every piece of evidence is relevant), allowing readers to assess the findings in a nuanced manner.

The authors note that “For H2 and H3, which only consider the experimental group, the smallest effect size of interest was estimated to be small (r = 0.4) based on clinical subjectivity”. I am not sure what they mean by clinical subjectivity. Additionally, it is worth noting that r = 0.4 is generally considered a moderate to large effect size, rather than a small one, which should be accurately reflected.

3. Demand Compliance Measure:

I also still wonder why the authors did not include a demand compliance measure. The authors note in their response to this suggestion: “Concerning the demand compliance question, the experimental group should not have a larger response bias than the control group. Contrasting experimental vs. control should thus isolate any effect of this potential bias.” I am not sure the authors have data to support this claim. There is contrasting evidence that participants in the experimental group are often much more likely to figure out the purpose of the training, become demand aware, and give demand-compliant answers (and sometimes reactant answers as well).

Reviewed by Matthias Aulbach, 18 Sep 2023

The authors have replied to all my comments in a satisfactory manner. There remains, however, one point to be clarified, as I keep thinking about the personalized item set: what happens if a participant reports drinking fewer than eight of the drinks at a value above 0? Will a random selection of zero-value items be included in the training? I might be worrying about this too much, as I do not have experience studying sugary drink consumption, so maybe the authors could clarify this point. Relatedly, will the study be advertised as relating to the reduction of sugary drink consumption? That, of course, would lead to a selective sample in which consumption is probably rather common.

Apart from this, I have no further points and wish the authors best of luck with their study.

Evaluation round #1

DOI or URL of the report: https://osf.io/qe6j7?view_only=4934c0215f2943cfb42e019792a30b53

Version of the report: 1

Author's Reply, 07 Sep 2023

Decision by Zhang Chen, posted 21 Aug 2023, validated 21 Aug 2023

Dear Dr. Najberg,


Thank you for submitting your Stage 1 Registered Report “Sugary drinks devaluation with executive control training helps to resist their consumption” to PCI Registered Reports.


I have now received comments from three expert reviewers in this field. As you will see, while we all agree that this RR addresses worthwhile and relevant questions, several aspects of the manuscript can be further improved. All three reviewers have provided valuable feedback on how you may do that. Based on the reviews and my own reading, I would therefore like to invite you to submit a revised version of the manuscript.


1. For the introduction, I agree with Reviewer 1 (Dr. Van Dessel) that it is important not to make too strong and too general (and thus sometimes inaccurate) claims. Similarly, I think 'executive control training' is not the best term here, as neither the go/no-go training nor the cue-approach training trains 'executive functions' or 'cognitive control' per se. It is also unclear to me whether the 'executive control training' in the second paragraph refers specifically to the go/no-go training, or to a combination of both training tasks. If the former, the claim that "instruction to withhold responses to cues may reduce their hedonic value by developing attentional biases away from them" does not seem entirely accurate (and one of the two references for this claim is actually about the cue-approach training). If it refers to both training tasks, it is also a bit odd, as different underlying mechanisms have been proposed for the two tasks. For these reasons, I think it would be clearer to start directly with the two specific tasks and explain what they entail (for readers who are not familiar with them), the general findings, and the proposed underlying mechanisms. Note that should you adopt this advice, the title of this RR will need to be adjusted accordingly.


2. One of the proposed interventions does not look like an attentional bias modification task to me, but more like the cue-approach training (see comment by Reviewer 2, Dr. Aulbach). Although the effects of the cue-approach training have indeed been explained via attentional mechanisms, it would be more in line with previous work and less confusing if you would refer to this task as the cue-approach training.


3. One critique of previous work is that they employed self-reports that might be susceptible to "memory and social confounds". This is very true. However, I do not think that the main dependent variable here is completely free from these confounds. After all, it is also a self-report, and will likely also suffer from memory biases and social desirability issues.


4. Sampling plan: All reviewers had concerns about the sample size. First of all, what are the "principled grounds" for determining the potential effect size of interest? Obtaining an effect size as large as a Cohen's d of 0.7 (7 divided by 10) for H1 seems highly unlikely. A recent meta-analysis of the effects of food-specific go/no-go training on explicit food liking found an effect size of Hedges' g of 0.285, much smaller than 0.7 (Yang, Y., Qi, L., Morys, F., Wu, Q., & Chen, H. (2022). Food-specific inhibition training for food devaluation: a meta-analysis. Nutrients, 14(7), 1363). Second, for H2 and H3, please specify whether they will be tested in the experimental group only (which makes more sense to me), or with both groups combined. Similarly, please provide more justification for why r = 0.4 is a reasonable effect size of interest. Note that the "Rationale for deciding the sensitivity of the test" column in Table 1 does not provide sufficient justification; it merely re-states that the effects need to be this big otherwise they are not relevant (but why?). Third, given the many exclusions planned, please make clear whether excluded participants will be replaced until the planned sample size is reached. If not, the planned sample size will need to be larger still, to leave room for potential exclusions.


5. As Reviewer 3 (Dr. Tzavella) suggested, please share all experimental materials (e.g., the custom-made health questionnaire for assessing eligibility, the exact questions in the weekly questionnaires and the final debriefing questionnaire; these can be shared in e.g. Supplemental Materials). This will allow the reviewers to better assess the experimental procedure, and also facilitate the (re-)use of these materials in future work. Another major issue is that the two intervention tasks need to be described in much more detail, so that readers will understand the interventions without having to go to a previous publication.


6. There is some ambiguity in exactly how "the end of the training phase" is defined, and accordingly in the main dependent variable, "days of successful restraint". I imagine that you plan two weeks as the training phase, and will administer weekly questionnaires after those two weeks. If a participant finishes 7 days of training and then stops, is the end of training for this particular participant the day they finished their last training, or the end of the two-week training phase, the same as for all other participants?


7. The analysis plan section is a bit difficult to follow. I think it will be clearer if for each hypothesis, you would start with the raw data, explain the data exclusion and aggregation methods step by step, and then specify the eventual confirmatory analysis plan. At the moment, the information is scattered around in different sections and not always in the order of how you would process the data. Some more specific comments about data analysis:

7.1 I think the independent t test function in base R uses Welch's t test by default, which handles unequal variances between groups better than Student's t test (Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1), 92-101).
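
For illustration, a call like the following runs Welch's test unless var.equal = TRUE is set explicitly (the data frame and variable names are hypothetical):

    # Welch's t-test is the default in base R (var.equal = FALSE).
    # With the formula interface, the direction of 'alternative'
    # follows the order of the factor levels in 'group'.
    t.test(days_restraint ~ group, data = dat, alternative = "greater")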

7.2 For H1, it should be "one-sided independent t-test" rather than "one-sided dependent t-test"?

7.3 "The mean explicit liking of each participant will be trimmed of their 20% highest and 20% lowest rated items at pre-intervention". If I understood it correctly, only the 8 most drunk items will be selected for each participant. In that case, what does trimming the 20% highest and 20% lowest rated items mean (20% * 8 = 1.6 items)? Also, how would this influence H2? The consumption behaviour is still about the 8 selected items I assume, but the explicit liking is only about a subset of these items.

7.4 Some of the positive control criteria seem better suited for exploratory analysis. For instance, it would be informative to examine dieting compensatory strategy, but why condition the analysis of H1 on this variable? There seems to be much flexibility in exactly which participants will be excluded, so it is not really a 'confirmatory' analysis anymore.

7.5 Relatedly, to control for "Baseline reported consumption", wouldn't including it as a covariate be a better approach than removing participants from the data analysis?
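
Such a covariate adjustment could be as simple as (illustrative names):

    # Test the group effect on days of restraint while adjusting for
    # baseline consumption, instead of excluding participants.
    m <- lm(days_restraint ~ group + baseline_consumption, data = dat)
    summary(m)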

7.6 For "Pre-post explicit liking reduction", I agree with Reviewer 2 that you can run the correlation analysis even if the overall effect is not statistically significant. Again, I feel removing participants to reach a certain criterion does not sound like a good approach here. Removing participants will further limit the range of this variable, which may make it even more difficult to detect the correlation, should it exist.

7.7 For both H2 and H3, I think it's a good idea to make e.g. scatter plots and visually inspect the underlying distributions. However, the analysis plan for H3 seems rather flexible to me because it involves visual inspection of a distribution (and thus much room for subjective judgement). If there is previous data showing that the distribution is likely to be uniform, I think that's a good working assumption for the confirmatory analysis. If that's not the case, you may adopt alternative statistical methods but those can then be clearly labelled as exploratory.


The reviewers made other excellent points that I will not reiterate here, but it is important to carefully consider and respond to each of their comments. I wish you good luck with revising the manuscript, and I look forward to seeing the revision.


Kind regards,

Zhang Chen

Reviewed by Loukia Tzavella, 20 Aug 2023

Review for Stage 1 RR: 

“Sugary drinks devaluation with executive control training helps to resist to their consumption”


Sampling plan

If 50 participants are needed in the experimental group for H2 and H3, why is the target sample size based on H1, which requires a smaller number of participants in each group (N = 36)? Also, I would be more inclined to use a d of 0.4 for the power analysis, and not the chosen mean/SD difference that results in a Cohen's d of 0.7, as in the rest of the manuscript you mention a d of 0.4 as your benchmark (smallest effect size of interest). In any case, please make this clearer; reading this the first time, I thought that the 7-day difference corresponded to a d of 0.4. If you apply this change, your total sample size would be 216. A greater sample size may also allow for more power for the H2 and H3 analyses (under 50 participants in the experimental group, which will be further reduced if you apply data exclusions as per my comment on the Pre-post explicit liking reduction section).
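
For reference, these figures are consistent with a one-sided test at alpha = .05 and 90% power; a quick check in base R (a sketch under those assumptions):

    # d = 0.7 (i.e., 7/10) requires ~36 participants per group;
    # d = 0.4 requires ~108 per group, i.e., 216 in total.
    power.t.test(delta = 0.7, sd = 1, sig.level = 0.05, power = 0.9,
                 type = "two.sample", alternative = "one.sided")$n  # ~36
    power.t.test(delta = 0.4, sd = 1, sig.level = 0.05, power = 0.9,
                 type = "two.sample", alternative = "one.sided")$n  # ~108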


Recruitment and screening

As an inclusion criterion, willingness to follow a restrictive diet is important, but it is also worth recording participants' baseline consumption behaviour. Are you including everyone, from people who rarely drink sugary drinks to people who drink more than a few sodas a day? In this section it is important to also add any methodological details, or at least point to where the reader can access them (what are the exact questions for your screening, e.g. how do you define 'healthy' individuals in this context?).


Training tasks

I understand that the tasks have been used in previous studies, but for this Registered Report I think you should not omit the methodological details and specific parameters of the tasks being administered (contingencies, time limits, number of trials, feedback elements, etc.). They are central to the study and should be presented as part of the Stage 1 proposal for further evaluation, even if the app cannot be changed at this time. The video demonstration of the app was really helpful; you could also add a figure with screenshots from the game in the main text for convenience.


Questionnaires

Please add a reference to supplementary material or an online repository where the full questionnaires can be found. In the text you can add more details about the 10 items included in the health questionnaire: what are you measuring with regard to participants' health?


Analysis plan

In this section you need to specify, in detail, all the analyses that will be run and treated as ‘confirmatory’, so you can move the paragraphs from the Statistical contrasts section here. For the Bayesian analyses, what priors will you be using for the t-test and correlations? Also, while BFs can be very informative in the case of inconclusive results, it would be preferable in my opinion to report them for all results irrespective of significance. Finally, although it may seem obvious, please state which statistics you will base your conclusions on (e.g., frequentist, with BFs reported in a supplementary manner?).


Data exclusions

Could you add details here about potential missing data and related exclusions with regards to your questionnaires? For example, what if participants don’t complete the weekly questionnaires, or if information is missing (e.g. exact dates of first consumption etc.)?


Given that reaction times from the analogue scales are recorded, I presume that you also have access to training performance data. While technically a day of training can be counted as successful if completed, I believe you should mention whether potential exclusions can apply to adherence. For example, does training proceed if you miss the reaction time window (if there is one; this is not currently known based on the details presented), or if participants simply skip trials and do not interact with the game?


Since the proportion of successful inhibitions in such training tasks may be a moderator of training effects (see the meta-analysis by Jones et al. 2016), it is worth considering a performance benchmark for data exclusions, e.g. if a day of training is completed but participants fail to stop on more than half of the trials. As this may be a conservative criterion for data exclusions given the proposed sample size, it would be interesting to add training performance as a secondary outcome, or to consider certain exploratory analyses at Stage 2 to look at learning effects and inhibition success.


Statistical contrasts

Please state why you have chosen this criterion for your effect interpretation, i.e. why a minimum d of 0.4 is required to consider the result ‘relevant’; I found this confusing, as mentioned in the Sampling plan. You mention that the result will only be relevant if the difference is at least 7 days of successful dieting, but in your power analysis this corresponds to a Cohen's d of 0.7, and yet in the text a Cohen’s d ≥ 0.4 is treated as relevant.


Baseline reported consumption

If I understood this correctly, you will inspect the data, run the analyses, and, if for H1 you get a Cohen’s d greater than 0.4, exclude participants and report the results with the reduced sample size? For other data exclusions that do not require statistical analyses, I assume that recruitment will continue until the sample size target is met; for this exclusion criterion, however, please add more details regarding the sampling plan, that is, how your sample size may be affected given that the target is based on an a priori power analysis.


Pre-post explicit liking reduction

Please add a justification for this in narrative format, e.g. for H2 you only want to run the correlations if a devaluation effect is observed (defined by your chosen threshold); this should be clear in your hypotheses as well. It may also be good to present the results without all the effect-related exclusions in the supplementary material, for comparison.


Reviewed by Matthias Aulbach, 15 Aug 2023

Review for “Sugary drinks devaluation with executive control training helps to resist to their consumption”

In the Stage 1 report “Sugary drinks devaluation with executive control training helps to resist to their consumption”, the authors lay out a randomized controlled trial that tests an app-based cognitive control training against an active, “sham-training” control group in its effectiveness to reduce evaluations and intake of sugar-sweetened beverages. Further, the authors aim to investigate the relation between devaluation and consumption effects as well as a dose-response effect between the amount of conducted training and consumption. The planned study is generally well-designed and tackles worthwhile questions in the field. I have some thoughts and suggestions that I hope would improve the study and its possible interpretations. Please note that this is the first time I am reviewing a Stage 1 report (which I’m quite excited about!), so please bear with me for any possible “breach of protocol”.

I was somewhat unclear about one of the proposed intervention tasks. The description of the ABM task in the introduction did not, in my view, fit an attentional bias modification. Looking at the video of the task in the OSF folder, the ABM task looks more like a mix between an Approach-Avoidance Task, GNG, and Cue Approach Task, which might actually work quite well; see van Alebeek, Veling, and Blechert (2023) (https://www.sciencedirect.com/science/article/pii/S0950329323000150). In past papers, the same authors have also termed this "Cue approach training". Why do the authors frame this as an attention modification paradigm here? I fail to see how it modifies attentional biases, as it requires attention allocation to both approach and withhold items.

The authors include an active “sham-training” control condition (which I think is great) to control for confounding factors of expectations and cue exposure. This is surely a good idea but in addition, the authors could also include a measure of expectations (for example as we have done in an ongoing study of our own group: https://bmjopen.bmj.com/content/13/4/e070443.abstract). If expectations play a crucial role (and there is evidence that suggests that they might), we should then see an effect of expectations across both groups in the study. I see that this is a tangent to the current study’s goals but might still be interesting to consider.

I must admit that I never fully grasped what exactly the dependent variable for H1 is. Is it the number of days until the first consumption of trained drinks? Or the total number of trained drinks consumed during the follow-up period? I would think the best outcome measure is the amount of drink consumed: arguably, this is easier to report for drinks than for foods because of fixed common serving sizes. Such a consumption measure would be much more sensitive to change than cruder measures.

Regarding H2: will this relation be calculated across groups? One might argue that we would only expect this effect in the intervention group because changes in the control group should be rather random. If you only use data from the intervention group, the sample size calculation does not add up because you would need 50 participants in the intervention group alone.

Regarding H3: to really speak of a dose-response effect, one would need to assign dosage experimentally, as done in Moore, White, Finlayson, and King (2022) (https://www.sciencedirect.com/science/article/pii/S0195666322002720?casa_token=N-C479gEscQAAAAA:wAz8DeDD_lzReFYBdfACTmcNnfIXdlc_7q_x5PunfGXmcB-j7slBgnh0QfF40afP0RTKdUuCgQ). With the current design, we would probably mainly see an effect of motivation on both training use and consumption. We discussed this in some detail in our paper after having been told off by a reviewer for using the term “dose-response relationship” (see Aulbach et al., 2021; https://www.sciencedirect.com/science/article/pii/S0195666321002221). Maybe this could be taken into account by adding a measure of motivation and controlling for it in the analysis, but that still does not solve the problem of self-selection. Relatedly: will the analysis for H3 be conducted across groups? If yes, adding group as a moderator in a regression could show whether this “descriptive dose-response effect” differs between groups. That (especially when combined with expectancies) would be very interesting: if more training produces larger effects, but only in the intervention group, while expectations are similar between groups, that would be quite strong evidence for the intervention.
Another aspect that limits this analysis is that the distribution is truncated at 7 and 14: there will be no participants with fewer than 7 or more than 14 sessions, which severely limits the interpretability of any found (or non-significant) effect.

Specific methods points:

- What is the rationale for the age range of included participants?

- 20 minutes of intervention per day seems like a lot. What kind of dropout rates do you expect, especially given that you want to exclude anyone with fewer than seven sessions?

- Will excluded participants be replaced, or will the analysis be conducted with what you have after exclusions? If the latter: how will that affect statistical power at different exclusion rates?

- Participants receive 8 sugary drinks for reduction. That seems like a lot of different drinks! I would consider lowering this number, as otherwise drinks will need to be included that are not drunk very often in the first place.

- Relatedly, the consumption frequency question has a very subjective scale. Why not ask e.g. “On how many days during the last month have you drunk this?” Otherwise, participants will scale their responses to their own standards: for person A, "often" might mean once a week; for person B it might mean several portions a day.

- I am not sure the liking measure is ideal for capturing what is of key interest. With the current wording, it seems very context-dependent: I might like cola on a hot afternoon, but imagining drinking it in the morning, I would not like it. From the introduction, I read that the main interest is in relatively stable preferences. If you measure a very variable momentary preference, you will get a lot of noise in your measure and might not measure what you actually care about.

In the analysis plan section, there is mention of “deltas” – I assume those are pre-post changes? Please clarify.

Regarding the compensatory strategies: I would find it interesting to see if the intervention group engaged more in those because it would suggest that the training works for the specific trained foods but does not generalize to similar items.

The authors write that a pre-post reduction of explicit liking is necessary for investigating H2. I do not think this is necessary: you would just correlate one distribution with another (liking – consumption) and it should be irrelevant where on the scale this correlation occurs/where these distributions are on the scale.  

In summary, I think that this study will produce very interesting evidence on the effects of mobile cognitive bias modification. I hope that the authors find my comments and suggestions helpful for further improving the study. I am aware that I referred to some of my own and my colleagues' papers, which is somewhat frowned upon in peer review. By no means do I insist they be cited in any publications; I only included them because I know them best and found them helpful pointers for the issues at hand.

Thank you for giving me the opportunity for this stage 1 review and possibly contributing to making this study even better.

Reviewed by Pieter Van Dessel, 14 Jul 2023

In this registered report, the authors aim to investigate the impact of a combination of Go/No Go training and approach bias training on participants' consumption of their favorite sugary drinks. The study addresses an important and relevant topic, as the exploration of online training methods to improve unhealthy consumption patterns holds practical and theoretical significance. The authors are commendable for their commitment to good scientific practices by conducting this study as a registered report.

However, while reading the paper, I identified several areas where the authors could enhance the quality of their work. Primarily, the introduction falls short of the expected standards. It contains inaccuracies and lacks clarity regarding the constructs under investigation. It is crucial for the authors to be more precise when describing the behavior of interest, clearly distinguishing between observed behavior and the explanation of behavior in terms of mental processes. For instance, in the first sentence of the abstract there are already several inaccuracies: “Food executive control training has been shown to reduce the perceived value of palatable food items”. The authors mention a reduction in the perceived value of palatable food items, but since the participants' perception is not directly probed, it would be more accurate to refer to it as self-reported value. Furthermore, instead of using the term "executive control training," which implies training of the mental construct 'cognitive control,' a construct that cannot be directly observed, it would be better to consistently refer to the specific tasks employed, such as the go/no go training task and the attention bias modification task (and please make a note there as well that this task does not directly modify attentional bias, it merely targets this bias). I recommend that the authors critically review all the constructs discussed in the paper, ensuring accurate definitions and clear differentiation between behavioral effects and mental constructs.

Similarly, it is essential to avoid making inaccurate claims. For example, in the second paragraph of the introduction, the authors state that conventional reflective approaches to reducing overconsumption behaviors usually fail because they target conscious processes, while (palatable) food consumption is largely driven by environmental cues. However, this represents only one explanation of the findings, and it should not be presented as if it were a truism. Additionally, in the next sentence, the authors claim that recent evidence indicates that automatic motivational processes driving unhealthy overconsumption can be modulated by executive control training (ECT), with ECT robustly reducing the perceived value of targeted cues in the eating domain. These claims are not accurate, and the sentences would be better discussed in reverse order: first, highlight evidence that specific types of training can reduce self-reported value; then indicate that this reduction has been explained as potentially targeting automatic motivational processes driving unhealthy consumption (while noting other explanations as well). It is crucial to avoid oversimplifying the literature. In this sense, it is also worth noting that training effects are often limited, and there is little evidence supporting real-life effects of cognitive bias modification, except perhaps in the context of alcohol approach bias modification for alcoholic patients. It is essential to address this omission and discuss relevant work in the field.

Moving on to more minor issues within the paper:

- The explanation of the sample size rationale lacks clarity, as the authors consistently fail to state the effect of interest for the t-test as a Cohen's d. Additionally, it is unclear why only an effect of 7 days would be of interest, considering that finding such an effect appears highly unlikely. Moreover, with 36 participants per group, the sample size seems relatively small, which raises concerns about the informativeness of the results. Further explanation is needed to address these concerns adequately.

- It would be beneficial to report Bayes factors for significant results in addition to other statistical measures.

- The paper states that adherence to a restrictive diet constitutes a robust and valuable dependent variable to assess the real-world effect of food ECT, as it is not biased by memory or the relationship with the experimenter. It would be helpful to explain the basis for this claim. Why would there not be a bias by “memory”? Isn't there always a bias from a broad cognitive construct like memory? Additionally, it is important to note that experimenter demand still presents a possible explanation of any effect that may be observed. Including a demand compliance question at the end could provide more information about the possibility that participants in the experimental group conform to the hypothesis and falsely indicate that they did not consume the sugary drinks.

- More background information is required to understand the rationale behind combining Go/No Go (GNG) and Attentional Bias Modification (ABM) training. The authors should provide a clearer explanation and reference their previous studies that employed this combination. It is crucial to highlight the added value of this research compared to their prior work.

- The statement, "This contrast will allow us to control for the confounding factors of cue exposure and of expectations developed by the participants on the effects of the intervention," raises questions about expectations as a confound. The authors should clarify why expectations would not drive the effects; in fact, I think expectations may always drive any behavioral effect. Additionally, it is evident that the control group does not control for expectations: 100% and 50% contingencies are typically noticed very well by participants (and in fact this awareness seems crucial for the effects) and lead to different inferences (including causal inferences or predictions).

- H3 should be rephrased. The current phrasing suggests that the more participants train, the larger the effect of the intervention on their dieting behavior will be. However, if an effect is found, it does not necessarily imply that the intervention caused it. An alternative explanation could be that participants who train more are more motivated to stop drinking sugary drinks and thus exhibit reduced consumption.

- Including the compensatory strategy in an ANOVA to determine its role in explaining the observed effects would be beneficial and should be considered.

By addressing these concerns and incorporating the suggested improvements, the authors can significantly enhance the quality and clarity of their paper, making it a valuable contribution to the research community.