Expanding the Intervention Potential of Tabletop Role-Playing Games

Veli-Matti Karhulahti, based on reviews by Charlotte Pennington, Matúš Adamkovič and Matti Vuorre
A recommendation of:

Can playing Dungeons and Dragons be good for you? A registered exploratory pilot program using offline Tabletop Role-Playing Games (TTRPGs) to mitigate social anxiety and reduce problematic involvement in multiplayer online videogames

Submission: posted 06 February 2023
Recommendation: posted 30 March 2023, validated 14 April 2023
Cite this recommendation as:
Karhulahti, V. (2023) Expanding the Intervention Potential of Tabletop Role-Playing Games. Peer Community in Registered Reports, .


The human capacity and need for play has long been recognized as a central psychotherapeutic component (e.g., Winnicott 1971). More recently, experts have started developing specialized digital games for use as therapeutic tools, and have even used existing videogames for similar purposes (see Ceranoglu 2010). On the other hand, concerns about some players becoming overinvolved in videogames have also led the World Health Organization to include “gaming disorder” in the 11th edition of the International Classification of Diseases, which illustrates the nuance required to address human-technology relationships in general.
In the present registered report, Billieux et al. (2023) use analog structured role-play in a new intervention aiming to mitigate social anxiety and problematic gaming patterns in online gamers. The authors will carry out an exploratory pilot to test a 10-week protocol comprising three modules inspired by the well-known Dungeons & Dragons franchise. Using a multiple single-case design, they will explore the feasibility of the intervention and its effects on social skills, self-esteem, loneliness, assertiveness, and gaming disorder symptoms.
The Stage 1 manuscript was evaluated over two rounds by three experts with experimental specializations in psychopathology and gaming. Based on the comprehensive responses to the reviewers' feedback, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
1. Billieux, J., Bloch, J., Rochat, L., Fournier, L., Georgieva, I., Eben, C., Andersen, M. M., King, D. L., Simon, O., Khazaal, Y. & Lieberoth, A. (2023). Can playing Dungeons and Dragons be good for you? A registered exploratory pilot program using offline Tabletop Role-Playing Games (TTRPGs) to mitigate social anxiety and reduce problematic involvement in multiplayer online videogames. In principle acceptance of Version 2 by Peer Community in Registered Reports.
2. Ceranoglu, T. (2010). Video Games in Psychotherapy. Review of General Psychology, 14(2).
3. Winnicott, D. (1971/2009). Playing and Reality. Routledge.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Review, 19 Mar 2023

Review, 20 Mar 2023

The authors have addressed all of the points raised in my initial review and have responded thoroughly to both myself and the other reviewers. I am now satisfied that this exploratory pilot report meets the Stage 1 IPA criteria and have no other comments I wish to make at this stage. I wish the authors the best of luck with their research.

Evaluation round #1

DOI or URL of the report:

Version of the report:

Author's Reply, 17 Mar 2023

Decision by Veli-Matti Karhulahti, posted 10 Feb 2023, validated 12 Feb 2023

Dear Authors,

Considering your hard deadline, three reviewers have generously provided detailed and rapid feedback. They are all positive, but some critical issues need to be carefully considered. The MS sits between an exploratory pilot and a confirmatory intervention study: a key goal is to explore feasibility, but there are also hypotheses to be tested. As the reviewers point out, hypothesis testing would require solid corroboration/falsification rules and clarity about when the outcome would be left undecided. A complete data analytic plan specifying how efficacy will be measured would be needed to assess the hypothesis tests. It also remains possible to register this as an exploratory pilot, in which case evaluation is more flexible (but you cannot make confirmatory claims at Stage 2). Although I personally see the exploratory option as the most feasible, especially considering your time limit, below is a list to help you revise if you wish to pursue hypothesis testing (skip it if you choose the exploratory path).

1. There are discrepancies between the hypotheses (p. 8 onward) and the expected outcomes (p. 22 onward). E.g., PO1 concerns gaming frequency, but this is not among the previously named hypotheses. It’s important to justify each hypothesis consistently; you may also set expectations without testing them (= no confirmatory claims at Stage 2), but these need to be clearly distinguished from tested hypotheses.
2. Justify the smallest effect of interest. Currently only the term “reduction” is used, but this needs to be more specific. E.g., a reduction in gaming of 1 min/day would hardly be meaningful. Each effect/hypothesis used to confirm effectiveness needs its own justification. See e.g., Anvari et al. (2022).
3. All outcomes are currently expected both at the end of the TTRPG-based program (P1A) and at the 3-month follow-up (P1B). We need to agree beforehand which of these, or what combination thereof, will corroborate/falsify the hypotheses. E.g., if we see no reduction at P1A but a reduction at P1B, would this corroborate the hypotheses?
4. Considering that some effects will not be meaningful, please specify when a result will be considered null, i.e., which results will lead to the conclusion that the intervention had no effect or only a non-meaningful effect.
5. Carefully consider how dropouts are handled. E.g., if you have 50% (10/20) dropouts and find meaningful effects in the remaining participants, would this count as corroborating the hypotheses?
6. What about missing data? E.g., if a participant fails to provide P1B data, will this count as a dropout? What is the overall rule structure, covering all scenarios, for corroborating and falsifying the hypotheses?
7. A complete data analytic plan would be required for each hypothesis to be tested.

Because constructing a robust hypothesis testing design within the present time limitations may be challenging, you may also choose a simplified confirmatory design where only feasibility is tested. Following the main goal of the study (“to test the feasibility e.g., number of dropouts -- ability of the participants to complete regularly the online assessment”), you could formalize this into feasibility hypotheses:

1. Define what counts as a dropout and justify success/failure by the number of dropouts, e.g., in relation to typical dropout rates in similar interventions. Consider allowing some flexibility, e.g., with confidence intervals.
2. Define and quantify the online assessments to be completed by participants, and justify a completion rate that will distinguish a successful from an unsuccessful intervention.

The above would allow you to make confirmatory claims about the practical feasibility of the intervention at Stage 2 with relatively little revision. Note that you can (and should!) also report the current primary/secondary outcomes, but only as non-confirmatory, tentative results that will inform future efficacy testing of the design.
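To make the suggested dropout criterion concrete, here is a minimal Python sketch, assuming dropout is coded as a simple yes/no per enrolled participant and that a maximum acceptable dropout rate has been fixed in advance. The 50% benchmark used below is purely hypothetical (it echoes the 10/20 example above, not a justified value), and the Wilson score interval is one common choice of confidence interval for a proportion, not one specified in the manuscript. The three-way rule also shows how an "undecided" outcome can be declared up front: feasibility is corroborated only if the whole interval lies below the benchmark, falsified only if it lies wholly above it, and left inconclusive otherwise.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as dropouts / enrolled.

    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= events <= n:
        raise ValueError("events must lie between 0 and n")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def feasibility_verdict(dropouts: int, enrolled: int, max_acceptable_rate: float) -> str:
    """Three-way decision rule against a pre-specified (hypothetical) dropout benchmark."""
    low, high = wilson_interval(dropouts, enrolled)
    if high < max_acceptable_rate:
        return "feasible"  # whole interval lies below the benchmark
    if low > max_acceptable_rate:
        return "not feasible"  # whole interval lies above the benchmark
    return "inconclusive"  # interval straddles the benchmark


# Hypothetical example: 20 enrolled participants, benchmark of 50% dropout.
for dropouts in (2, 10, 18):
    print(dropouts, feasibility_verdict(dropouts, 20, max_acceptable_rate=0.5))
```

With n = 20 the interval is wide: even 2/20 dropouts gives an upper bound of roughly 30%, which illustrates why the benchmark and the sample size need to be justified together.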

If you choose either of the two confirmatory designs, please add each hypothesis separately to the design table with justifications. Note that some of the current explanations are not fully sufficient. E.g., regarding sample justification, you have stated it to be not relevant, but there should be a justification for having n=20 rather than, e.g., n=1 or n=200. I see this is already touched on p. 11. See e.g., Lakens (2022). Also, the rationale for confirming and disconfirming hypotheses still appears highly relevant for this design (if tested as confirmatory).

Note that if you choose not to test any hypotheses, a fully exploratory approach is totally ok and does not need the design table (or any of the other confirmation concerns either). In this case, make sure to remove the hypotheses and/or clearly state that they will not be tested.

Minor points

Title: Because “registered reports” include preregistration, it might be more informative to use the former term in the title.

Figure 2: We’re in mid-February, which is when consent forms are scheduled to be filled in. Please update us on how far recruitment has progressed when you return the revision. It’s totally ok if some data have already been collected (e.g., participant demographics are known), but we then need to take this into account in bias control (author guidelines section 2.6).

P. 10: Will one of the team members serve as a game master or is this an external expert? Please clarify. 

P. 11: Because participants with as few as 1/9 IGD symptoms are included, it remains somewhat unclear how this will affect the analytic strategy and the interpretation of results. E.g., there is some evidence that 2/9 symptoms are associated with lower wellbeing (Ballou & Zendle 2022), but it’s not clear how a reduction from 1/9 to 0/9 symptoms should be interpreted. Would it imply that the participant’s health/wellbeing improved?

P. 13: The participants will be randomly allocated to 4 groups, but is that optimal? Considering that the study addresses social anxiety, taking, e.g., gender into account in group allocation seems relevant. Imagine you have 5 women and 15 men; mixed groups would likely lead to different outcomes than gender-based groups. Which would be better in light of current knowledge?

P. 20: Qualitative feedback is collected. Please also explain how it is collected, what kind of feedback it is, and how it will be analyzed in this study (if it is).

P. 22: PO1 mentions both frequency and hours. In my understanding, frequency refers to the number of times of engagement (“three times per day”), not the total time of engagement (“three hours per day”). Please clarify.

P. 23: It is noted that deviations will be justified at Stage 2, but I must point out that the PCI RR guidelines (section 2.10) advise authors to consult the recommender about deviations immediately and, whenever possible, before data collection is completed. If you choose to make this a fully exploratory RR, deviations are more flexible. Especially if any confirmatory elements remain, it is important to report deviations as soon as possible.

Scales: because at least some of the scales (like the DSM-based IGDT-10) include both core and peripheral construct criteria, reporting McDonald’s omega seems more appropriate than Cronbach’s alpha.
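For reference, the two coefficients can be written as follows for a scale of $k$ items, where $\sigma_i^2$ are the item variances and $\sigma_X^2$ is the variance of the total score, and where, for omega, $\lambda_i$ and $\theta_i$ are the item loadings and error variances from a single-factor model. This is a sketch assuming a unidimensional model with uncorrelated errors; under those assumptions $\alpha$ equals $\omega$ only when all loadings are equal (tau-equivalence) and is otherwise lower, which is why unequal contributions from core and peripheral criteria favor reporting omega.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}{\left(\sum_{i=1}^{k}\lambda_i\right)^{2} + \sum_{i=1}^{k}\theta_i}
```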

Please also consider the reviewers’ separate comments. I hope you find the reviewers’ feedback and my additions helpful. You may contact me directly for any clarifications. This is a highly interesting and promising study, and I’m happy to do my best to support it.

Best wishes,
Veli-Matti Karhulahti

Reviewed by Matti Vuorre, 10 Feb 2023

The authors aim to test the feasibility and initial efficacy of a tabletop role-playing game (TTRPG) intervention for social anxiety and dysregulated gaming. The TTRPG intervention appears to be designed with great care and informed by expertise & experience in role-playing games. Well done! The manuscript addresses a real need in tackling an important problem, and also tries to understand the potential psychological effects of ludic activities. I therefore think that the intervention has promise. My review focuses on the evaluation of the intervention.

The authors propose to conduct both a feasibility study and a test of initial efficacy. These two aims seem at odds, because the former would only track practical issues in the procedure, such as dropout and whether participants understand what they are doing, whereas the latter would require a detailed statistical investigation of a sufficiently large dataset. In my view the project looks like a success regarding the former aim, but falls somewhat short regarding the latter.

My recommendation is that the authors either consider reframing this manuscript to focus on the first--also valuable--aim, or greatly increase the sample size to allow studying the latter. 

The design involves running 20 individuals through the experimental procedure after baselines of varying duration. Effectiveness is then evaluated by comparing participants' outcome scores during and after the treatment (at the last measurement and at the 3-month follow-up) to their baseline scores. The authors could clarify what the exact comparison will be: is it the average baseline vs. the last measurement/follow-up? I understand the data analytic plan will be preregistered later, but I didn't find sufficient information here to determine whether the study will plausibly have enough precision to evaluate effectiveness. In light of the sample size of 20, I don't think the precision will be sufficient.
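One way to pin down the within-participant comparison asked about here is a nonoverlap index. Below is a minimal Python sketch of the nonoverlap-of-all-pairs (NAP) index, a common single-case effect size that compares every baseline observation with every observation in a later phase. It is offered only as an illustration: the manuscript's planned analysis is not described here, the scores in the usage lines are hypothetical, and a per-participant index like this does not by itself resolve the precision concern raised about n = 20.

```python
from itertools import product


def nap(baseline: list[float], treatment: list[float], improvement: str = "decrease") -> float:
    """Nonoverlap of All Pairs for one participant.

    Every (baseline, treatment) pair scores 1 if the treatment value is better,
    0.5 if tied, and 0 otherwise. "Better" means lower when improvement="decrease"
    (e.g. gaming disorder symptoms) and higher when improvement="increase"
    (e.g. self-esteem). The result lies between 0 and 1; 0.5 means no separation.
    """
    if improvement not in ("decrease", "increase"):
        raise ValueError("improvement must be 'decrease' or 'increase'")
    if not baseline or not treatment:
        raise ValueError("both phases need at least one observation")
    score = 0.0
    for a, b in product(baseline, treatment):
        if a == b:
            score += 0.5
        elif (b < a) == (improvement == "decrease"):
            score += 1.0
    return score / (len(baseline) * len(treatment))


# Hypothetical weekly symptom counts for one participant.
baseline_phase = [6, 5, 6, 7]
treatment_phase = [5, 4, 4, 3, 3]
print(nap(baseline_phase, treatment_phase, improvement="decrease"))
```

The same function can be applied separately to the end-of-program and follow-up phases, which makes explicit which comparison is being reported.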

Thank you and good luck with the project
respectfully signed
Matti Vuorre

Reviewed by Matúš Adamkovič, 09 Feb 2023

Dear Authors,

Thank you for this interesting submission. The main topic of this RR, piloting the usage of TTRPGs as an intervention to alleviate problems with gaming, self-concept, social anxiety, and loneliness, is highly relevant and innovative. I’m very sympathetic to the fact that the authors decided to submit this pilot as an RR. I’d also like to highlight the rigor of the proposed design. Below, I’ll try to provide several suggestions and will also point out some issues that, in my opinion, require further clarification.


The theoretical framework is well written. I have only two minor suggestions. Please consider adding an estimate of the number of video game players worldwide, and please consider adding subheadings.

Goals of the study

The authors acknowledge that the present RR is a pilot to test the efficacy of their intervention program. They hypothesize that the intervention will reduce GD symptoms, social anxiety, and loneliness, and that it will also improve self-concept and assertiveness. The expected outcomes are further summarized in the Data analytic strategy and the Study Design Table; however, no evidence thresholds (i.e., the evidence needed to dis/confirm a hypothesis) are mentioned. Given that this is a pilot study, I missed a crucial aspect: a (qualitative) examination of the participants’ experiences with the intervention program and an analysis of their feedback. Although the authors briefly mention this in the Study Design Table, I think this point requires much more attention throughout the paper. A minor note: the introduction contains many distinct (although related) constructs. I noticed that, for example, assertiveness, which is one of the focal variables in the study, is first mentioned when describing the potential effects of the intervention program. Please consider introducing the construct earlier in the text.

Procedure and participants

These two sections are, again, well written and provide details that will allow independent researchers to carry out a replication study. Figure 2 aids understanding of the procedure. I was, however, a bit puzzled by the frequency of the psychological assessment. Could the authors clarify it in the text or create a table (or perhaps a graphical extension to Figure 2 would suffice) summarizing which measure will be administered at which time point? The inclusion/exclusion criteria are clearly summarized and reasonable given the nature of the study. However, why do the authors think that prior experience with TTRPGs should be an exclusion criterion? Furthermore, the authors justify the sample size based on the expected dropout rate and inter-subject replication of the experimental effect. Could the authors elaborate on that? For example, what dropout rate do the authors expect? Will they try to contact the participants who drop out of the study to learn their reasons? I’m asking because participants who drop out may be those who felt that the intervention had no (or even an adverse) effect on them. Consequently, dropout could lead to an overestimate of the intervention’s success. Is there a possibility to control for that?

Psychological assessment

Just a minor suggestion – since all the measures have been well-established in the psychological literature, the descriptions of the measures could be shortened/moved to supplementary material. The description of the TTRPG program is detailed and all the supplementary files help understand the procedure.

Data analytic strategy

Although I’m not familiar with single-case data analysis, the proposed analytical workflow appears well thought out. I especially appreciate the authors’ decision to use multiple analytic approaches and test the robustness of their findings. As I mentioned above, please consider specifying the evidence thresholds (not necessarily based on p-values, given the sample size). Please consider providing the analytic code (with a simulated dataset) at Stage 1. I also wonder whether the authors will control for potential confounders (besides the inclusion/exclusion criteria). A minor comment: the link for the SCDA package / Rcmdr plugin doesn’t work.

The authors (will) share all the data and materials in the study’s OSF project. I appreciate this level of transparency and the authors’ adherence to open science practices.


I hope the authors will find the suggestions useful. Looking forward to reading the revised version of the RR.


Matúš Adamkovič

Reviewed by Charlotte Pennington, 06 Feb 2023
