Does relaying ‘house edge’ information influence gamblers’ perceived chances of winning and their factual understanding of the statistical outcomes?

Based on reviews by Zhang Chen, Graeme Knibb and Luke Clarke
A recommendation of:

How does the phrasing of house edge information affect gamblers’ perceptions and level of understanding? A Registered Report


Submission: posted 18 July 2022
Recommendation: posted 28 November 2022, validated 28 November 2022
Cite this recommendation as:
Pennington, C. (2022) Does relaying ‘house edge’ information influence gamblers’ perceived chances of winning and their factual understanding of the statistical outcomes? Peer Community in Registered Reports.


Many products that can impact upon health and wellbeing (e.g. alcohol, food) relay information to consumers about the potential risks. However, such information is commonly provided in a suboptimal format for gambling-related products. To encourage safer gambling, research has therefore recommended that information about the average loss from a gambling product (“house edge”) or percentage payout (“return-to-player”) should be communicated, with the former translating to better perceived understanding by gamblers. In this study, Newall et al. (2022) aim to experimentally compare two phrasings of the house edge against a control return-to-player phrasing, to arrive at the most effective phrasing to aid gamblers’ perceived chances of winning and their factual understanding of the statistical outcomes of their bets. Using a hypothetical gambling scenario, a sample of 3,000 UK-based online gamblers will be randomly assigned to receive one of two alternative phrasings of the house edge or the equivalent return-to-player information. Two outcome measures will be used to judge the effectiveness of the house edge information: gamblers’ perceived chances of winning and rates of accurate responding on a multiple-choice question measuring factual understanding of this information. This study will therefore identify the most effective communication of gambling risk, which can inform public health policies to reduce gambling-related harm.
Following a positive initial appraisal, and after two rounds of in-depth review, the recommender judged that the manuscript met the Stage 1 criteria and awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA. 
List of eligible PCI RR-friendly journals:
1. Newall, P. W. S., James, R. J. E. & Maynard, O. M. (2022). How does the phrasing of house edge information affect gamblers’ perceptions and level of understanding? A Registered Report, in principle acceptance of Version 3 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2

DOI or URL of the report:

Version of the report: r1 house edge comparison.pdf

Author's Reply, 24 Nov 2022

Decision, posted 15 Nov 2022, validated 15 Nov 2022

Dear Philip Newall and co-authors,

I have now received the review of your revised PCI-RR submission from one reviewer who also reviewed your initial submission. Based on their and my own evaluations (below), I am recommending a revision of this paper. Satisfactory responses to the reviewer (and editor) comments are likely to result in a Recommendation.

Reviewer 1 makes two particularly good points which I also endorse. Regarding the reproducibility of the power analyses for H1 and H2, I suggest that you upload supporting information to the OSF showing how you arrived at these numbers -- having run into stumbling blocks myself with this, I now upload these to remind myself exactly what I did. Regarding the equivalence tests, Reviewer 1 is correct that these are informative (and should be conducted) for both significant and non-significant effects (to distinguish a statistically significant result from a practically meaningful one).

Here is my own review of the manuscript; please address these comments along with those of Reviewer 1.

Editor Comments/Review

1.       The Abstract is rather confusing because it first reads as though you will test two conditions of the house edge format with 2000 participants, but then later (after explaining the DVs), states that you will also compare this to an existing suboptimal format with 1000 gamblers. It would be clearer to state the full design within the Abstract: specifically, the three conditions, the total sample size recruited (3000?) and then the DVs.

2.       Introduction, Page 4, Line 5: I suggest removing the term “Similarly” to improve this sentence (“Gambling is another public health issue…”)

3.       Page 4, Line 9-11: You state that “However, by comparison, gambling information can be criticized on the grounds of a lack of prominence, and suboptimalities with which it is communicated (Newall, Walasek, et al., 2022). This Registered Report contributes to this second issue, by experimentally testing two different phrasings of relevant information about gambling products”, but it’s not clear that you are doing this in order to get over this suboptimality. Could you end this sentence by clarifying why you are doing this/why it’s important?

4.       The Introduction provides some examples of the return-to-player and the house edge communication, but these odds seem very high in favour of the gambler (“This game has an average percentage payout of 90%”). Are these factual examples? Are the odds sometimes this high in gambling?

5.       Page 5, Line 6 states “However,  seeing as how replication is an important aspect of gambling psychology research (Heirene, 2021), a secondary aim of the present research is to attempt to replicate findings on rates of understanding and perceived chances of winning from the original studies on this topic (Newall, Walasek, & Ludvig, 2020a, 2020b)”, but this is quite confusing because you haven’t yet outlined the main aim specifically.

6.       As per my comment about the Abstract, Page 6 outlines the present research but does not make it clear that there are actually three conditions, with the two phrasings of ‘house edge’ also being compared to a ‘return-to-player’ condition. Can you make this clearer? The reader needs to be able to fully understand the design at this point.

7.       Please also give the manuscript a thorough proofread – there are some instances of missing words (e.g., Page 7, ‘the’ missing in the sentence “as this is the closest round number which exceeds the required sample size in each of the below power analyses”).

8.       Please note that at Stage 2, the approved Stage 1 text cannot be revised (unless there is a significant error or you are changing tense). For this reason, ensure that there is no information that you’d wish to remove or change in some way at Stage 2: e.g., the sentence that states “For the reviewers, a link to the experiment is here:”

9.       What is the rationale for the change in H1’s outcome from 4.1 to 3.8/4.4 (representing an effect size of 0.188) and the change in accuracy of 6% (representing an effect size of d = 0.133)? Put simply, why an effect of 0.188/0.133 and not any other (small) effect? You should provide a rationale for your smallest effect size of interest (SESOI).

With best wishes,


Reviewed by , 14 Nov 2022

Thank you for the opportunity to review this revised version of the Stage 1 RR on house edge information in gambling. Most of my previous comments have been addressed satisfactorily. I have some remaining minor comments/questions, which are listed below.

More details on the power analyses would be useful, especially on the planned analyses for H1 and H2. For H1, when I used an independent t test as the planned analysis in G*Power (d = 0.1875, alpha err prob = 0.05 and power = 0.95), I was able to reproduce the reported result of 741 participants per condition. However, for H2, I was not able to get the same number, probably because I was not doing this correctly (I used the logistic regression in the family of z tests, and there seemed to be many extra parameters that could be set).
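The reviewer's G*Power check for H1 can also be reproduced programmatically; a minimal sketch using Python's statsmodels, assuming (as in the review) a two-sided independent-samples t test with equal group sizes:

```python
# Reproduce the H1 sample-size calculation: independent-samples t test,
# d = 0.1875, alpha = .05, power = .95, two-sided, equal allocation.
from math import ceil
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.1875, alpha=0.05, power=0.95,
    ratio=1.0, alternative="two-sided",
)
print(ceil(n_per_group))  # close to the 741 per condition reported in G*Power
```

Uploading a snippet like this to the OSF alongside the G*Power output would make the calculation reproducible exactly as the editor suggests.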

The authors proposed to use equivalence tests only when H1 or H2 is not statistically significant. I wonder if it is possible for both the null hypothesis significance test and the equivalence test to be significant - which may suggest that the effect is statistically significant (i.e. differs from zero), but not large enough to be practically meaningful based on the SESOI. Equivalence tests may therefore be informative regardless of whether the initial NHST results are significant or not. Relatedly, in their tutorial on equivalence tests, Lakens and colleagues also mention minimal-effects tests, which are complementary to equivalence tests and entail comparing an effect to the SESOI (d = 0.133 in this case) rather than to zero. This may also be an interesting analysis to run. To be clear, I am not suggesting that the authors should include these extra analyses in the RR at this stage - just some exploratory analyses that they may consider after data collection.
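For readers unfamiliar with the TOST (two one-sided tests) procedure the reviewer refers to, here is a minimal sketch in Python using statsmodels with simulated, hypothetical ratings; the ±0.3 equivalence bounds are an assumption, corresponding to d = 0.1875 on a scale with SD = 1.6:

```python
# Sketch of a TOST equivalence analysis on simulated 7-point ratings.
# Bounds of +/-0.3 raw units are assumed (d = 0.1875 on a scale with SD 1.6).
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(42)
group_a = rng.normal(4.1, 1.6, 1000)  # hypothetical: one house-edge phrasing
group_b = rng.normal(4.1, 1.6, 1000)  # hypothetical: the other phrasing

# p_tost is the larger of the two one-sided p-values; small values support
# the claim that any group difference lies within the equivalence bounds.
p_tost, lower, upper = ttost_ind(group_a, group_b, -0.3, 0.3)
print(round(p_tost, 4))
```

A significant TOST result alongside a significant NHST result would indicate exactly the scenario described above: an effect that differs from zero but is smaller than the SESOI.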

I wish the authors good luck with the project and I look forward to seeing the results.

Evaluation round #1

DOI or URL of the report:

Author's Reply, 01 Nov 2022

Decision, posted 18 Aug 2022

Dear Philip Newall and Olivia Maynard,

Thank you for submitting your Stage 1 Registered Report “How does the phrasing of house edge information affect gamblers’ perceptions and level of understanding? A registered report” for consideration by PCI Registered Reports.

I have now received comments from three expert reviewers in this field. As you will see, these reviews are overall positive and based on these reviews and my own assessment, I would like you to revise your manuscript accordingly. You will see that the majority of the reviewer comments are minor, but I would like you to pay attention to the following, which will be a particular focus on re-review. These two criteria are essential for Recommendation at PCI Registered Reports. 

Review criteria 1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).

1.    Please ensure that the power analysis is consistent with the analysis being used: as one Reviewer points out, it seems as though the power analysis is based on M/SD and an effect size of Cohen’s d, but the main planned analysis is a regression which is inconsistent with this. [but also, see Point 3 below which needs to be considered in addressing this]. Importantly, your plan is to compare two groups based on the phrasing of the message, but the justification of your power analysis is not currently based on this and is instead based on an average point on a Likert scale. Perhaps I am misunderstanding something here, which could simply be clarified, but again the power analysis should be based on the specific analysis you plan to conduct. 

2.    You state that the sample size would be able to “detect even relatively small effects”, which is ambiguous – what specific small effect could be detected? Have you considered a Smallest Effect Size of Interest (SESOI) and powered your study according to this?

3.    As per the Reviewer comment, the study is not sufficiently powered for the ‘understanding’ variable. An explicit reason should be given for this, which can also include resource limitations (i.e., funding/time), but must be apparent.

4.    You need to ensure that your data will still be informative if you arrive at a null result (e.g., p > .05) rather than stating that it is not “statistically significant/different”. This requires either Bayesian analyses or equivalence tests. Please see our Stage 1 criteria for guidance on this, which includes supporting references that will be of use. Note that this will have implications for your power analysis too: a power analysis should be based on the analysis tests you will conduct.

Review criteria 1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).

1. You will see from the Reviewers that there are different perceptions of including an attention/manipulation check in this study. On one hand, the study is short, which allows the Authors to speculate that participants will be attentive. On the other hand, we know that data quality from online crowdsourcing platforms can be impacted by careless responders (see Jones et al., 2022). This should be carefully considered. One reviewer suggests adding an attention check at the end of the experiment which asks participants to select the statement they have just seen, and analysing the data with those that fail this included and excluded. I agree with this, but also highlight that you will need to consider such exclusions in your sample size planning: in Jones et al. (2022), we estimated that ~12% of participants were careless responders through failed attention checks or implausible response times, which can give you a base estimate for your own study.
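One way to act on this is to inflate the recruitment target so that the analysable sample survives the expected exclusions; a quick sketch of the arithmetic, assuming the ~12% estimate from Jones et al. (2022) and a required analysable sample of 3,000:

```python
# Inflate the recruitment target to allow for an assumed ~12% exclusion rate
# (careless responders), so that ~3,000 analysable responses remain.
from math import ceil

target_n = 3000        # analysable sample required by the power analysis
exclusion_rate = 0.12  # base estimate from Jones et al. (2022)

recruit_n = ceil(target_n / (1 - exclusion_rate))
print(recruit_n)  # 3410
```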


Jones et al. (2022). Careless responding in crowdsourced alcohol research: A systematic review and meta-analysis of practices and prevalence. Experimental & Clinical Psychopharmacology, 30, 381-399.

2. The third Reviewer also makes an important point regarding the replication of the original framing difference. Whilst acknowledging that this will increase the sample size considerably, a justification of the exclusion of this control condition is required. The Authors may want to list this as a limitation in their Discussion section at Stage 1, showing that they considered this now and not later in the review process.

3. I agree with the Reviewer that you should run some simulated data to check your analysis pipeline and to document the analysis syntax. This avoids problems later down the line at Stage 2 if a planned analysis doesn’t seem appropriate and/or reduces any errors at the planning stage.


Other minor points:

- I am unsure whether you’ve considered the journal you’d like to submit this to following a positive Stage 2 acceptance, but wanted to flag that your manuscript is not consistent with APA 7th edition style. For example, anything over 2 authors can now be cited as ‘et al.’ if you are planning to use this formatting style.

- Under design, the instructions to participants state: “Imagine that you are a member of an online casino. You have played many of this online casino’s games over the last year.” Shouldn’t this be ‘these’ (online casino games)?

- Please refer to Table 1, the design table, within the manuscript itself and also provide a title for it. Previously accepted Stage 1 Registered Reports can help guide you with this.

- This links back to what I state under Review criteria 1C, point 1, but in the design table I do not understand the following sentence: “In order to detect a reduction on this outcome from 4.1 (see main Methods) to 3.8 (SD = 1.6), with 95% power and an alpha of 0.05 requires 741 participants in each condition”. This reads as though this is a within participant design where the phrasing of the sentence reduces a Likert point average from 4.1 to 3.8, but this is not the case given your design is between-participants with no baseline measures. Can you clarify this throughout the manuscript, please? It may be that you change your power analysis given the points above, which would mitigate this.

- A minor note is that the term ‘Registered Report’ is usually written in capitals and there are numerous times that you use this term (in lowercase) in your manuscript.

Please note, you will need to download some of the reviewers’ comments from the PCI-RR portal; others will be shown without the need to download a file.

I look forward to receiving your revision in due course. 

Yours sincerely,

Dr Charlotte Pennington

Reviewed by Graeme Knibb, 29 Jul 2022

This study aims to compare the effects of two different ‘house-edge’ messages on gamblers’ understanding of loss.

Overall, I am enthusiastic about this report. Throughout, the writing and methodology are clear and precise, and the design has been carefully thought out. However, there are some aspects that I think require further clarity. This includes further information regarding the chosen phrasing and the power analysis (please see below).

1A. The scientific validity of the research question(s). 

I have no concerns regarding the scientific validity of the research question. The study aims to assess the phrasing of ‘house-edge’ information. This is a valid question and I have no doubt that the proposed method will address this question effectively. 

1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable. 

The hypothesis seems reasonable and logical. No specific direction is predicted, which is fine given that this is the first study assessing wording variants. However, some further information regarding this phrase could be included. For example, why was this particular phrasing chosen? If this was based on psychological theory, then this could be outlined. Or perhaps this phrasing was based on previous research in other domains which have assessed the effect of such phrases? Or the work of the Victorian Responsible Gambling Foundation? 

1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable). 

The methodology and analysis plan are generally clear and appropriate. There are some aspects I would like clarification on (even if just to satisfy my own curiosity). The use of an ordered logistic regression is interesting: what is the reason for using this approach over a simple t-test (or non-parametric equivalent)?

I think there could be more clarification regarding the power calculation. Firstly, the power calculation is based around a reduction of a response on a 7-point Likert scale from 4.1 (SD = 1.6) to 3.8 (SD = 1.6); why is this? Was this reduction based on any previous research, or deemed to be meaningful in some way? This power analysis is said to require 741 participants, so why are significantly more participants being recruited?

Further information from other sections could also be included within the power analysis discussion. For example, the fact that the study is not powered for the ‘understanding’ variable (which I think is understandable) could be included here rather than within the measures section. A clear statement regarding why the study was not powered for this dependent variable could be included (the same information presented in the table at the end of the manuscript perhaps). 

Finally, the authors propose some exploratory analyses. These are fine and will be interesting to assess. Can the authors clearly state within the registered report that these analyses will be labelled as exploratory in any future publication? 

 1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses. 

There is sufficient detail to replicate the study, although, as highlighted above, it would be beneficial to state that exploratory analyses will be clearly labelled as such in any future publication.

1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s). 

The authors have considered these aspects well. There is discussion of potential ceiling effects regarding the ‘understanding’ outcome and this is being addressed. They have considered the use of attention checks, and I agree with them that this is not necessary. They provided an example experiment for review. The study was short and to the point which should mitigate issues regarding online recruitment. 

Finally, I want to commend the authors on what is a well-considered and produced registered report. This work is strong and, in my opinion, only requires minor clarifications. Thank you for the opportunity to review this piece of work. 


Best wishes,


Graeme Knibb


Reviewed by , 01 Aug 2022

Reviewed by , 15 Aug 2022

I appreciate the opportunity to review this Stage 1 RR and I co-reviewed this paper with one of my graduate students, who found it a useful training exercise.

The authors present a study to compare the effects of two different ‘house edge’ labels on online gamblers’ responses to a hypothetical gambling scenario. The study builds on recent work from Newall that found a single, specific ‘house edge’ label (“this game keeps 10% of all money bet on average”) was associated with superior performance relative to a ‘return to player’ label (“returns 90% of your money”). A natural question, addressed in this study, is whether the effect generalizes across different house edge phrasings. Indeed the alternative format presented here (“costs you 10%”) could lead to even better outcomes. This is an important research question that will inform gambling policy. The design, sampling, and hypotheses are clearly specified.

One difficult decision that the authors must have faced is whether to include the original ‘return to player’ label as a third condition. They elect not to do that, and recruit only two groups (original house edge, alternative house edge). I can see that the third cell would add another ~1000 participants to the study, with cost implications, and that the Prolific platform may not have the capacity to recruit so many experienced online gamblers. At the same time, the third ‘return to player’ label is currently the industry standard that the authors are looking to challenge. In the eventuality that they see no difference between the two house edge labels, I feel there would be much value gained from the basic replication of the original framing difference (i.e. 2=3>1). (Conversely, if the original label performs better, is the alternative label still sufficient to generate the framing difference?). I would be interested to hear the authors’ justification for the exclusion of the third group.

Other points

-          The authors explicitly state that they will not apply any data cleaning (e.g. attention checks) to the Prolific data. This is a bold decision. I agree with their logic that the insertion of additional ‘attention checks’ may do more harm than good (although my perspective is that the academic wind is blowing in the opposite direction). Applying data cleaning for fast completion times is, in my view, a rather different point: could there be bots in the data, or some participants who do not read the materials at all and submit the entire survey in ~30 seconds? (Data quality on Prolific is higher than MTurk, I agree, but I’m not convinced it is “high” – pg 7). My own recommendation would be to run a sensitivity analysis using a pre-registered threshold for completion time (I recognize any such threshold is arbitrary).

-          The authors propose to use the Prolific data balancing function to balance gender. The analysis plan does not mention any gender-based analyses. While such analyses would be under-powered, it would be useful to test whether any observed differences are separately robust in both men and women.

-          Statistical analysis. Given the possibility (high possibility, in my view!) that the two house edge labels will not differ, I note the analysis plan does not mention any Bayesian testing of support for the null.

-          The analysis plan on the 7 point rating proposes an ordered logistic regression. I have not encountered this technique before, and we have been discussing different approaches to analysing ordinal ratings in my lab meetings recently, so I will look into this further (note of thanks!). I raise it because the Newall 2020 Addiction paper applied a simple ANOVA to the ratings data, thus treating the rating as a fully continuous variable. If the authors have not used the ordered logistic technique elsewhere, a methods reference would be helpful.   
