DOI or URL of the report: https://osf.io/dkxg7
Version of the report: 1
Please find (i) my response to the recommender comments and (ii) the manuscript file with all changes tracked separately attached below. Thank you!
All files (including a clean version of the manuscript with all changes confirmed) are available as one document via https://osf.io/7xft9 (version 1).
Dear Dr. Xiao,
Thank you for your submission to PCI-RR. I have now had the opportunity to read your revised manuscript and the response to the reviewers. Let me first apologise for the considerable delay in my response. I understand that this may have caused changes to the timeline of your project, with the original date of the first sampling of games being January 18. Unfortunate private and professional circumstances on my side, which I failed to manage, prevented me from providing a timely response. Again, my sincerest apologies.
I have decided not to send your revised manuscript out again for re-review. The points raised by the reviewers in round 1 were quite clear, and it felt like an undue use of their time to ask them again to check your response to them.
Overall, I am quite happy with your points, counterpoints, and changes to the manuscript. Referring to the major points in my previous letter:
1. I no longer see any issues with using Ukie and non-Ukie games, given their claims that you cite, but also their normative influence on the gaming industry in the UK. I also appreciate the changes you made to the title. Further, I also find it acceptable that you do not scale your sample against the number of players per game, and essentially give each game the same “weight” in your assessment. However, I expect that this important point will receive extensive attention in the discussion of your findings, whatever they will be.
2. You offer a useful definition of what you consider a lootbox “event” in your study. This is now much clearer, and also how you will proceed playing and encountering lootboxes, if they exist.
3. To be frank, I still have some concerns with regard to your use of hypothesis tests and cutoffs. In your response letter, you argue that the cutoffs are arbitrary, but that you will use them regardless to prevent yourself from changing your interpretation of the final %, as different people might interpret the same rate quite differently (industry person vs advocacy group). Further, you argue that the observed rate should not be generalised beyond the top 100 grossing games, and you argue that it is not a sample, but a population.
I can follow your reasoning for the first point, though I would have expected an elaboration of this point in the manuscript itself rather than merely the response letter. Certainly, one could argue that rules are rules, and that therefore any deviation from 100% is a failure of self-regulation. But then: I am sure that the same industry, or other industries, also do not perfectly self-regulate with regards to other issues, although one might generally say that, on average, the system works. It would probably be best to offer your line of reasoning to the reader.
Regarding the second point, I will admit I remain unconvinced. I can only imagine that it will be excruciatingly difficult for you to write the results section of your manuscript in a way that does not generalise beyond the two populations examined (the January and the July top 100 grossing games). Simply considering the title of the current manuscript, “Assessing compliance with UK loot box industry self-regulation on the Apple App Store”, readers might not get the impression this is merely a study on two specific lists of games at two arbitrary points in time. Similarly, the abstract says “[c]onclusions will be drawn as to whether the measures have been complied with by companies to an adequate degree”, which certainly suggests you seem to think your findings can be used to infer something about companies in this industry generally (which I’d agree with). Finally, although this may be more a linguistic habit, you do refer to these lists of games as “samples” throughout the manuscript (the word is mentioned 19 times, whereas “population” is not mentioned once).
I do not think this point is super important for the manuscript or the empirical work. Looking at the top 100 grossing games makes sense, as these are certainly the games where compliance with the regulation would be more important. However, I do not see what is special about the top 100 grossing games that would justify excluding games not on this list from your interpretation. At the very least, you have not yet provided a compelling argument for this.
4. Finally, I appreciate that you have shifted from a programmatic RR to a standard one.
With kind regards
Malte Elson
DOI or URL of the report: https://osf.io/3en2x
Version of the report: 1
Please find (i) my response to the recommender and reviewer comments and (ii) the manuscript file with all changes tracked separately attached below. Thank you!
All files (including a clean version of the manuscript with all changes confirmed) are available as one document via https://osf.io/dkxg7 (version 1).
Dear Dr. Xiao,
Thank you for your submission to PCI-RR. I have now had the opportunity to read the paper in depth, along with the excellent reviews provided to evaluate the merit of your research proposal.
All three reviewers – Dr. Przybylski, Dr. Chambers, and Dr. Gunschera – mention they found your proposed study timely, and the research question worthy of an in-depth investigation as suggested. I, too, share this view: Lootbox regulation, and industry compliance with it, are a topic of increasing attention within the gaming community and the public sphere. As such, a study such as the one proposed could easily become a material piece of evidence in the evaluation of policy effectiveness, and perhaps even affect compliance with regulation itself.
However, all three reviewers raised important concerns with the proposed design of the study. I fully concur with them, and will add a few of my own observations below. Some of these points you might disagree with, and I invite you to provide counterarguments in a response letter. Others might be addressed by providing more details and improving the clarity of the manuscript. And yet others, I believe, will require changes to your study protocol. The reviewers have offered guidance on how the study design and the writing in the manuscript might be improved – please consider these points as you prepare a revision of your research protocol.
STUDY SAMPLE AND GENERALISABILITY
Dr. Przybylski has remarked on the choice to include games not represented by Ukie, and that this weakens the severity of your test. I agree with this point: If this study is designed to test compliance with self-regulation principles by an industry trade body, then it does not seem ideal to include games that do not fall under this self-regulation, and whose developers are not represented by Ukie. Whether there is a difference in regulation compliance between Ukie and non-Ukie games may itself be an interesting empirical question. I will leave it up to you to decide whether to pursue this or not, but if you do, then you need to account for this in your sample size and sampling strategy somehow. For example, if only 10% of the top 100 games are actually represented by Ukie (or vice versa), a serious empirical estimate of the difference would probably not be within reach. If resources are an issue, as you state, then it may be advisable to only include those games that are represented by Ukie, at the price of narrowing generalisability of your findings.
On this point, I also agree with the reviewer that the focus on the UK market should be represented in the title and conclusions of the paper. Going further, I believe it would also be appropriate to highlight the focus on mobile games, as the sample is restricted to games in the Apple store.
WHAT IS A LOOTBOX?
Dr. Chambers and Dr. Gunschera both raised aspects that regard the definition of lootboxes in your study. Whereas Dr. Chambers asks whether one hour is enough to “encounter” a lootbox, Dr. Gunschera raises concerns regarding the focus on lootboxes that can be bought with real currency rather than in-game currency. Both of these points are important, and I believe they concern a mutual point: What is a lootbox, empirically, in your study? Surely it is not the virtual representation as a box, nor can it be any in-game purchase, nor any chance-based event. As such, I invite you to provide further details how you define and identify lootboxes in games, and by which means: You mention each game will be played for an hour. Does that mean “typical” game actions will be performed (as if you were a regular player), or will you just have the app open for this time? I am asking because it is conceivable that certain in-game actions are linked to lootbox drops. Overall, the manuscript lacks procedural and methodological details that the readers of the paper would surely appreciate.
GAMES VS GAMERS
There is another important point by Dr. Przybylski regarding the sampling framework as it affects the conclusions from your observations: Are you studying games or gamers? That is, if only games with a small following are noncompliant, then surely we would have to conclude that the problem is smaller than if the top games (by number of “encounters” with lootboxes) were noncompliant. I think this is a conceptual problem that deserves further attention, and that may not be easily “fixed” given that even reliable numbers on the games’ market share might be difficult to obtain.
CUTOFFS
Dr. Przybylski and Dr. Gunschera have both remarked on the somewhat arbitrary choice of cutoffs to determine the compliance level. I, too, was confused about where they came from, and to be honest I was wondering about the utility of defining cutoffs for the purpose of making a dichotomous decision in a hypothesis framework when just knowing about the empirical rate itself is of great interest (though I am happy to be convinced otherwise, maybe this just needs some justification). To make matters worse, the point estimates you propose using will suffer from substantial uncertainty. For example, an incidence rate of 95 in a sample of 100 games has a 95% confidence interval of 76.861 to 116.133, the lower bound being below your cutoff for “inadequate compliance”. Of course, I understand this is not a random sample of an unknown population of games: the top 100 are the top 100. Then again, I am sure you would prefer generalising your findings to games not included in the sample.
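For readers who want to check the interval quoted above, the following is a minimal sketch under one assumption: that the figures come from an exact (Garwood-type) Poisson interval for a count of 95. The letter does not say which method was used, so this is a guess that happens to fit the numbers. The function names (poisson_cdf, bisect_root, exact_poisson_ci) are illustrative and not taken from the manuscript.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), with each term computed in log space."""
    if k < 0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        total += math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
    return min(total, 1.0)

def bisect_root(f, lo, hi, iters=100):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(count, level=0.95):
    """Exact two-sided interval for a Poisson mean, given an observed count > 0."""
    alpha = 1 - level
    # Lower bound: the mean at which P(X >= count) equals alpha / 2.
    lower = bisect_root(
        lambda lam: 1 - poisson_cdf(count - 1, lam) - alpha / 2, 1e-9, count
    )
    # Upper bound: the mean at which P(X <= count) equals alpha / 2.
    wide = count + 10 * math.sqrt(count) + 10  # assumed to be safely above the root
    upper = bisect_root(
        lambda lam: poisson_cdf(count, lam) - alpha / 2, count, wide
    )
    return lower, upper

if __name__ == "__main__":
    low, high = exact_poisson_ci(95)
    print(round(low, 3), round(high, 3))
```

Under that assumption the result lands close to the quoted 76.861 and 116.133. The upper bound above 100 reflects that a Poisson interval treats 95 as an unbounded count and not as 95 successes out of 100 games.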
PROGRAMMATIC RR
Dr. Chambers raises a concern regarding your proposal to register this study as a programmatic RR. To be honest, I overlooked this point until I read his review, but I tentatively agree that I currently do not see the value or necessity to have two separate publications rather than one comprehensive paper that encompasses all research questions and data. Of course, I cannot stop you from writing two papers rather than one, but if you do insist on submitting this as a programmatic RR rather than a single RR, please consider the guidance offered by the reviewer, and highlight the different contributions of each paper, and why it is important or sensible to treat these differently.
With kind regards
Malte Elson
Question 1A. The scientific validity of the research question(s)
Reply 1A. The question of whether companies comply with statutory or suggested regulatory initiatives is an interesting one to me. I approach reading this believing that there is very low compliance; the report suggests I should expect 1 in 3 games might comply if the UK is like the US. I am not quite sure that the research questions are research questions in the classic academic sense. It is some form of policy or programme evaluation to my reading. I will defer to the editor on this point but note that the UK focus should be consistent from title to interpretation.
Question 1B. The logic, rationale, and plausibility of the proposed hypotheses (where a submission proposes hypotheses)
Reply 1B. Given the UK-specific focus that justifies the research questions (and by extension the hypotheses) I am concerned by the framing of the research questions and how they’re translated into testable hypotheses. If this is indeed a study of industry practices in the UK and premised on principles articulated by UKIE, shouldn’t these hypotheses be focused on paid loot boxes in games that are represented by UKIE?
I do not believe it is a fair test of the principles if they don’t only apply to companies who are represented by UKIE. Like social media, and the online safety conversation more generally, this is a thorny problem. How might we regulate global tech industries (e.g. porn, social media, games) when these firms and the decisions they take are determined in Beijing or Palo Alto? I think the VGRF and these principles are very good ideas but I don’t think it’s a fair test of their local effectiveness to examine top grossing games in the UK if their creators are based in the USA (ESA), or EU (VGE), or other jurisdictions. I believe the UK has many smaller developers, but I am not sure if they’re represented in the top 100 or more likely to be on mobile or console/pc platforms. Is this the case?
Similarly, I’m not sure that 100 top grossing makes sense given that I doubt these are equally profitable or popular games in the UK. For example, it might be the case that the top 4 or 5 games account for 80% of the play volume and spending, and the remaining 95% of the top 100 are just 20% of the market. If these 5 games were 100% compliant with the principles would you count this as 80% compliance or 5%? I think this materially affects all of the research questions including the incidence/prevalence of probability disclosures.
Finally, without knowing the base rate of “ask to buy” I find it difficult to assess how well-justified disregarding this feature is. I know I use this feature with our under-18s and I would not allow our children to use the app store at all without it. I think that this would introduce an unknown source of error or uncertainty in any of the point estimates which would be reported in the work.
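The 80% versus 5% contrast in the example above can be made concrete with a small sketch. The market shares below are the hypothetical ones from that example (five games holding 80% of spending, the other 95 sharing the remaining 20%) and are not real data. The function name compliance_rates is illustrative.

```python
def compliance_rates(games):
    """Return (per-game rate, share-weighted rate).

    games is a list of (market_share, is_compliant) pairs.
    """
    per_game = sum(1 for _, ok in games if ok) / len(games)
    total_share = sum(share for share, _ in games)
    weighted = sum(share for share, ok in games if ok) / total_share
    return per_game, weighted

# Hypothetical market: only the five largest games comply.
top_five = [(0.80 / 5, True)] * 5
remaining = [(0.20 / 95, False)] * 95

if __name__ == "__main__":
    per_game, weighted = compliance_rates(top_five + remaining)
    print(f"per game: {per_game:.2f}, weighted by share: {weighted:.2f}")
```

Counting each game equally gives 5% compliance, while weighting by market share gives 80%, which is the gap the question points to.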
Question 1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable)
I think this is a feasible way to approach it if the above base rate and geography issues are tackled. I am not really sure where the 95, 80, and below 80 levels come from though. Reading earlier in the report, I might expect the rate to be 35%. I could envision this being the baseline, with movement from this level (to, I would hope, something much higher) being the standard. My sense is that the author is interested in improvement, so how much improvement would be needed to know if progress is being made in the UK?
The author might also consider whether a prior belief that there is a 50/50 chance that a UK game creator is getting things right is the correct starting point, then seeing if this is true at the start of the data collection and if this has improved at the 6-month mark.
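One possible reading of the 50/50 suggestion is a Beta-Binomial update. The sketch below assumes a uniform Beta(1, 1) prior on the compliance rate, whose mean is 0.5, and the counts passed in are purely hypothetical. Neither the prior nor the counts come from the manuscript, and the function name posterior_summary is illustrative.

```python
from math import comb

def posterior_summary(compliant, total, prior_a=1, prior_b=1, threshold=0.5):
    """Posterior mean and P(rate > threshold) under a Beta prior.

    Requires integer prior parameters so that the Beta tail probability
    can be written as a Binomial sum.
    """
    a = prior_a + compliant
    b = prior_b + total - compliant
    mean = a / (a + b)
    n = a + b - 1
    # For integer a and b: P(rate > t) = P(Binomial(n, t) <= a - 1).
    prob_above = sum(
        comb(n, j) * threshold**j * (1 - threshold) ** (n - j) for j in range(a)
    )
    return mean, prob_above

if __name__ == "__main__":
    # Hypothetical counts: 35 of 100 compliant at the start, 50 of 100 later.
    for label, k in (("start", 35), ("6 months", 50)):
        mean, prob = posterior_summary(k, 100)
        print(label, round(mean, 3), round(prob, 4))
```

Comparing the two posteriors would show whether the data support a rate above one half at each time point. A different prior would change the numbers.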
I do not understand how (or who) at DCMS or UKIE would preregister their hypotheses (lines 473 and 474) or what the value of this would be. I don’t think most video game researchers would be able to do this.
Question 1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
I do not believe so. I think a detailed protocol with its own figure would be helpful.
Question 1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
I do not think so, but I am not sure that is a problem. As the study is framed currently, I do not see a situation where the hypotheses won’t be confirmed.
The manuscript at hand addresses an important issue, the compliance of the mobile game industry with UK self-regulation loot box measures. This work is timely and will make a great contribution to literature and policy concerning gaming consumer protection. That being said, I have found that the manuscript may be improved in the following areas.
1) The scope of the present manuscript concerns loot boxes purchased with real currencies as opposed to in-game obtained currencies. Although this distinction is common in the literature, I believe it warrants elaboration and think the proposed work would benefit from recording data on all possible avenues of purchasing loot boxes (i.e., whether players have the option to purchase the loot box with in-game currencies in addition to real currencies). I believe this is informative due to the fact that the gambling-like characteristics of loot boxes persist irrespective of the currency used to obtain them. The value of any currency, whether real or virtual, is learned. Therefore, beyond the concerns for parents’ wallets, the psychological effects of loot box purchasing may span across the currencies used to purchase them.
Furthermore, the psychological effects of loot boxes may even be strengthened for purchases with in-game obtained currencies, as opposed to money. Players who have invested many hours into obtaining the said in-game currency may perceive this to be a much larger investment than money, especially when the money comes from their parent’s wallet. While I understand that the distinction between real-world and in-game currencies is common, I believe it would be worthwhile collecting information on the currencies that can be used to obtain loot boxes (money, in-game, both) for each of the 100 mobile games (ll. 87-93, 364-368).
2) Despite resource constraints and stakeholders’ heightened interest in the highest-grossing mobile games, the sample size rationale is insufficient. A power analysis/simulation would help determine which effects the study would be sensitive to, especially in consideration of the fact that precise decision cut-offs are given for all hypotheses (ll. 245-254).
3) For Hypothesis 4 the decision criterion is different to the preceding hypotheses. Please add a brief explanation for this change (ll. 227-230).
4) Overall, the manuscript would benefit from some copy-editing. This includes breaking up long and convoluted sentences; using accessible language as opposed to unnecessarily complex words; and using precise and objective wording. Some examples below:
ll. 152-155 Convoluted sentence structure
ll. 171-174 Complicated wording
ll. 196-198 Subjective/moral wording