What psychological factors predict long-term success in esports?
Psychological predictors of long-term esports success: A Registered Report
Abstract
Recommendation: posted 04 April 2023, validated 07 April 2023
Chen, Z. and Pennington, C. (2023) What psychological factors predict long-term success in esports? Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=269
Related stage 2 preprints:
Marcel Martončik, Veli-Matti Karhulahti, Yaewon Jin, Matúš Adamkovič
https://osf.io/b6vdf
Recommendation
In the current study, Martončik and colleagues (2023) propose to examine potential predictors of long-term esports success in three of the currently most impactful PC esports games, namely League of Legends, Counter-Strike: Global Offensive, and Fortnite. Based on an extensive review of the literature and four pilot studies, the authors will examine to what extent naive practice and deliberate practice, as well as other psychological factors such as attention, speed of decision-making, reaction time, teamwork, intelligence and persistence, can predict a player's highest rank in the past 12 months, as an indicator of long-term success. Deliberate practice has been proposed to play an essential role in the development of expertise in other domains, and the current study offers a test of the role of both naive and deliberate practice in long-term esports success. The novel measure of naive and deliberate practice, developed as part of the current investigation, will also be a valuable contribution to future research on esports. Lastly, from an applied perspective, the results of the current study will be of great interest to individuals who are considering pursuing a professional career in esports, as well as to professional and semi-professional esports teams and coaches.
This Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on the comprehensive responses to the reviewers' feedback, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
- F1000Research
- Journal of Cognition
- Peer Community Journal
- PeerJ
- Royal Society Open Science
- Swiss Psychology Open
Martončik, M., Karhulahti, V.-M., Jin, Y. & Adamkovič, M. (2023). Psychological predictors of long-term esports success: A Registered Report, in principle acceptance of Version 1.4 by Peer Community in Registered Reports. https://osf.io/84zbv
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #2
DOI or URL of the report: https://osf.io/saz7x
Version of the report: v1.3
Author's Reply, 30 Mar 2023
Dear Dr. Chen,
Dear reviewers,
Thank you very much for the constructive feedback. We hope that the revised version of the paper and our responses have made the paper even clearer and easier to understand.
Best regards,
Marcel Martončik
Decision by Zhang Chen, posted 07 Mar 2023, validated 07 Mar 2023
Dear Marcel Martončik and colleagues,
I have now received the review of your revised PCI-RR submission from two reviewers who also reviewed your initial submission. As you will see, both reviewers (and I) think that most previous comments have been addressed very satisfactorily in the revised manuscript. The results of the new pilot studies are also very informative, and certainly addressed some of the potential concerns with the measurements. Both reviewers also provided some new or remaining comments. All comments are relatively minor, and aim at further improving the readability of the manuscript. Addressing these comments will very likely result in an IPA.
Although you are now not planning a direct statistical comparison between different titles, the issue that the ranks are not comparable still remains when you interpret the results across the different games. This is an important point, and Reviewer 1 (Dr. Bonny) suggested a potential metric of long-term skilled performance that may be more comparable across the games. Please carefully consider whether such a metric is available with the three games that you have selected. If yes, I would recommend including such a measurement, even only for exploratory purposes. If such a metric does not exist, or cannot be easily obtained, this point will definitely need to be mentioned when interpreting the results after data collection.
Reviewer 1 suggested a potential distinction between competitive versus non-competitive play. While this distinction may indeed be informative, I feel this is a difficult decision to make at this point. The results of your pilot study show that the reliability of the practice measure is okay, supporting the use of this measurement without further changes. The competitive vs. non-competitive distinction may be addressed in follow-up research. In any case, I agree that psychometric analyses on the questionnaire once the data are in are recommended, as it may further increase the impact of this work.
Reviewer 2 (Dr. Behnke) gave some suggestions on using sub-sections in the Introduction, and I agree that this can further improve the structure and readability of the introduction.
The effect size for the first entry in Table 1 is missing. Most entries in the 'Notes' column succinctly summarised (the direction of) the main finding of each study, which I found very informative. However, some entries did not contain this information. For instance, for Mora-Cantallops & Sicilia (2018) - competence and presence (immersion), the 'Notes' column only mentioned the instrument used. Could you also briefly mention how these two factors relate to a player's rank (e.g., in which direction)? Other entries that may also benefit from adding such info include the rows from Li et al. (2020) till Trotter et al. (2021).
The information in Table 2 (descriptive data from Pilot 2) does not seem to be crucial for the introduction. Perhaps it can be moved into the appendix, to make the introduction more compact?
I agree with reviewer 2 that Table 3 is difficult to read, possibly because it combines multiple sources of information (i.e., statistically significant results in bold, predictions of the current study highlighted in purple or green, and the smallest effect size of interest and its interpretation for both titles). Personally, I think Table 3 will be easier to understand if it only showed the statistical results from Pilot 2. You may consider making a similar table separately for the hypotheses, with the cells in different colours to distinguish the different predictions (null vs. alternative). This table may serve to replace much text in which you now spell out the hypotheses. (A similarly-structured table may also be used to show the results once the data are in, so that the pilot results, the predictions and the confirmatory results may be easily compared.) For the interpretation of the SESOI, I think presenting them in Appendix 5 seems sufficient (see my next comment).
The rationale behind the smallest effect of interest in Appendix 5 was very nuanced and thoughtful. While I like this information a lot (it certainly made it much more concrete for me what a certain effect size means!), I feel including all this information in the introduction would interrupt the overall flow. Thus, referring the readers to Appendix 5 (as you currently do) seems like a good solution to me.
Very minor point: It will be useful to have a final check of the whole document once you are ready with the edits - sometimes there are two spaces instead of one between two words.
Kind regards,
Zhang Chen
Reviewed by Justin Bonny, 03 Mar 2023
# Overall Reviewer Response
I commend the authors for the improvements throughout the manuscript. Most of my prior comments have been addressed save for a few clarifications on the practice measures and planned analyses.
## Practice Questionnaire
The practice questionnaire in and of itself could be a valuable contribution to esports research. Having a measure that can assess different facets of esports practice would be useful for subsequent studies. That being said, the ‘practice’ question, “Routinely playing the game (ranked mode, non-ranked mode, with or without friends, etc.)”, seems to span multiple types of play. A prior study distinguished competitive (ranked matches) and non-competitive (non-ranked matches) video game play, with some evidence of differences in how the two relate to psychological traits (e.g., Bonny et al. 2020 Intelligence). It may be worth considering taking a similar approach here, splitting the question into two different ones that distinguish between competitive and non-competitive play. Although your reliability statistics in Pilot 4 suggest this may not be necessary, doing so could make a bigger impact on the future use of the instrument in esports research.
I do recommend that psychometric analyses be provided in the study for the practice questionnaire items. Specifically, including a factor analysis, in addition to reliability statistics, would provide further evidence of the performance of the measure.
## Planned Analyses
The number of levels in the dependent variable of rank is still different across each esports game, with 27 in LoL, 18 in CSGO, and 10 in Fortnite. It is still not clear whether the differences between each rank are commensurate across the esports games. It does appear that direct comparisons across esports games are not included in the analysis plan. But it is still worth considering how these differences in skill rank could impact conclusions about cross-title differences. For example, if intelligence is a significant predictor for LoL but not Fortnite, is that due to intelligence being more important for LoL or due to differences in the ranks between LoL and Fortnite? If there are any other metrics of long-term skilled performance that are common across the games, like MMR for Dota 2 which is modeled on the Elo rating in chess, collecting these could be helpful. Although there are still limitations (e.g., games may use Elo-like ratings, but these are calculated differently), this could provide corroborating support for the use of rank as a performance metric and additional support for cross-game comparisons.
I encourage the authors to provide the R-scripts via OSF to provide further information for replicating analyses in subsequent research.
Reviewed by Maciej Behnke, 03 Feb 2023
Evaluation round #1
DOI or URL of the report: https://osf.io/aj8rp
Author's Reply, 24 Jan 2023
Dear Dr Chen,
Thank you very much for considering our submission and for the constructive feedback from you and the reviewers. We hope that the revised version of the paper and our responses to the reviewers’ comments have made the paper clearer and easier to understand. Of course, we’ll be happy to revise the paper further if needed.
All the best,
Marcel
Decision by Zhang Chen, posted 28 Sep 2022
Dear Marcel Martončik,
Thank you for submitting your Stage 1 Registered Report “Psychological predictors of esports success: A Registered Report” for consideration by PCI Registered Reports.
I have now received comments from two expert reviewers in this field. As you will see, both reviewers agree that the proposed study examines scientifically valid research questions, and addresses an important gap in the literature. The reviewers have also provided constructive comments on how to further improve various aspects of the manuscript. Based on the reviews and my own reading, I would therefore like to invite you to revise the manuscript accordingly.
1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.
1. One reviewer points out that the stated goals of the research are not entirely clear. As a result, there seems to be a certain disconnection between the research goals and the specific hypotheses. Both reviewers provide useful references that can be used to further strengthen the introduction and better situate the current study in the broader literature.
2. I would like to add that it is not entirely clear to me how certain predictions are made. For instance, hypotheses 2 and 3 are said to be based on “pilot findings (Table 2), theory, and the research literature (Table 1)”, but it is unclear to me how these different sources of information are combined to arrive at a certain prediction. For instance, attention is predicted to be related to esports performance (H2a), but the meta-analysis by Sala et al. (2018) showed a very weak correlation between gaming expertise and visual attention. Gender is predicted to be not related to esports performance (H3a), but gender is actually negatively related to performance in both titles in Pilot 2, and no study reviewed in Table 1 seems to have examined the role of gender. It would be helpful if you could more explicitly explain the rationale behind each prediction.
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).
1. One reviewer has some concerns over how the practice questions are being measured, and whether the dependent variables (i.e., ranking in a game) will be comparable between the two titles examined. It will be important to carefully consider these points, as they will have implications for the interpretation of the results.
2. The planned analyses may not be optimal, due to the potential issue of multicollinearity, and the fact that the dependent variable is ordinal. If you do agree that the analyses will need to be adjusted in light of these issues, the power analysis will need to be adjusted accordingly (e.g., using an ordinal regression as the planned analysis; see some more comments on power analysis below).
3. One reviewer points out that the decision to use r = .1 as the smallest effect size of interest needs to be better justified. I also wonder how we should interpret an effect size of r = .1 for the different predictors here. For instance, what does it concretely mean if e.g. the predictor career length has an effect size of r = .1? Does r = .1 mean an effect is very small, regardless of which predictor is involved? I think interpreting the effect sizes in the current context (e.g. something like r = .1 means X extra years of playing the game will lead to an increase of one rank) will help readers better grasp the magnitude of the effects, and the rationale for using a certain smallest effect size of interest.
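To make this concrete, a small Python sketch of the kind of translation suggested above (the standard deviations used here are purely hypothetical, chosen only for illustration): in a bivariate regression, a correlation r implies a slope of r × SD(rank)/SD(predictor), which can be inverted to express how many extra years of play correspond to one rank.

```python
def slope_from_r(r, sd_x, sd_y):
    """Bivariate regression slope implied by a correlation r:
    expected change in y (rank) per 1-unit change in x (predictor)."""
    return r * sd_y / sd_x

# Hypothetical values: SD of career length = 3 years, SD of rank = 5 ranks.
b = slope_from_r(0.1, sd_x=3.0, sd_y=5.0)  # ranks gained per extra year
years_per_rank = 1 / b                     # years of play per one-rank gain
```

Under these made-up SDs, r = .1 would correspond to roughly one rank per six years of play, which illustrates how the same r can feel very different depending on the predictor's scale.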
4. I feel the statistical inferences do not match the power analysis. For H1a-c and H4 to be corroborated, the point estimate of the effect needs to exceed r = .1 (with p < .05). My understanding of equivalence testing is that this means the 95% CI does not include r = .1 (otherwise one cannot claim that there is a meaningful effect size). However, the power analysis seems to be based on the classic null hypothesis testing, i.e., it tests against the null hypothesis of r = 0. The statistical power for corroborating H1a-c and H4 (i.e., comparing the point estimate to r = .1) may therefore not be 80%.
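A minimal sketch of the logic above, using the Fisher z-transform (pure Python, illustrative only; the example r and n are hypothetical): the first function checks whether the 95% CI of an observed correlation excludes the SESOI of r = .1, and the second approximates the sample size needed for 80% power to show an effect exceeds r = .1 rather than r = 0.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """95% CI for a correlation via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def n_for_r_vs_r0(r, r0, z_alpha=1.644854, z_beta=0.841621):
    """Approximate n needed to show r exceeds a non-zero benchmark r0
    (one-sided alpha = .05, power = .80), Fisher-z approximation."""
    delta = math.atanh(r) - math.atanh(r0)
    return math.ceil(((z_alpha + z_beta) / delta) ** 2 + 3)

lo, hi = fisher_ci(0.25, 400)       # hypothetical observed r and n
n_needed = n_for_r_vs_r0(0.25, 0.10)
```

The key point is visible in `n_for_r_vs_r0`: powering against r0 = .1 requires substantially more participants than powering against r0 = 0 for the same expected effect, which is why the power analysis and the inference criterion need to match.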
5. The analyses involve multiple predictors, for each of the two titles. I think multiple comparisons may be a concern, and there may be a need to correct for it by e.g. adopting an alpha level more stringent than .05.
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
1. Please provide more detailed information on the questionnaires (e.g., the items of a scale, how participants respond etc.) and cognitive tasks (e.g., the trial procedure of a task, the number of trials etc.) that you plan to use. To reduce the length of the manuscript, such information may be provided in an Appendix or the Supplemental Materials.
2. To reduce researcher’s degree of freedom in data analysis, please share the R code that you plan to use for the current Registered Report.
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
1. There seem to be different interpretations of H2e. At the moment, this hypothesis is said to be supported if at least two constructs exceed r = .1 in one game and null is supported in the other game. However, a certain construct may predict performance in both games, yet the effect sizes can still differ. Would this be considered as support for H2e? This would require a direct comparison between the two titles, but the issue of whether the two titles are comparable (as mentioned by one reviewer) will need to be considered.
Other minor comments:
1. Table 1. For Thompson et al. (2013), no effect size is mentioned. It would also be useful to briefly describe what a certain effect means, as you have already done for some but not all of the included studies.
2. Table 1. Li et al. (2020) has three effect sizes. It is not clear what these three effect sizes refer to.
3. Above Table 1, when mentioning the results of Sala et al., 2018. “cognitive control (𝑟̅ = -16)”. Should be r = -.16?
I look forward to receiving your revision in due course.
Kind regards,
Dr Zhang Chen
Reviewed by Justin Bonny, 17 Sep 2022
# Manuscript Summary
The authors propose a study investigating the relative impact of deliberate practice and psychological traits on esports player performance. They point out that prior research has not differentiated between deliberate practice and other types of accumulated esports experience when predicting performance. They further differentiate esports performance between long- and short- term metrics, suggesting the contributions of experience and psychological traits to each metric may vary. They select two esports games to recruit players from, LoL and Fortnite, to test the hypotheses.
# Overall Reviewer Response
The authors rightly point out that the relative contributions of different types of experience to esports player performance remain to be examined. The inclusion of pilot study results is helpful for gauging the feasibility of the project. However, I recommend that the authors strengthen the approach of their study by utilizing prior methods for assessing cumulative experience, especially deliberate practice, and by narrowing down and refining the scope of their analysis and hypotheses to more tightly connect them to the stated goals of the study. I fully support the authors' intent of investigating the development of esports player performance similar to what has previously been done for traditional sports and games such as chess.
## Improving measures of accumulated experience
The validity and reliability of the proposed measures of experience, especially deliberate practice, is questionable. Prior research has commonly used different metrics when it comes to deliberate practice.
I encourage the authors to review some of the prior research by Ericsson and others who used retrospective estimates of deliberate practice. In these studies, participants are asked when they started practicing an activity and to then estimate, for each year since they started, how many hours a week they engaged in deliberate practice. These responses can then be used to estimate the cumulative amount of deliberate practice a person has with a specific activity. Do note that retrospective estimates have limitations due to memory errors (see the literature for discussion). This approach can be adapted for the proposed research with each esports experience measure: play time, deliberate practice, coached practice. This would be more in line with the expertise and deliberate practice literature and put the authors on firmer ground when explaining the measure.
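The retrospective-estimate approach described above can be sketched in a few lines (Python, illustrative only; the example hours are hypothetical): participants report an average hours-per-week figure for each year since they started, and these are summed into a cumulative total.

```python
def cumulative_practice_hours(weekly_hours_by_year):
    """Ericsson-style retrospective estimate: sum per-year weekly averages
    into a lifetime total. One list entry per year since the player started,
    each entry being the estimated average hours/week for that year."""
    return sum(52 * h for h in weekly_hours_by_year)

# Hypothetical 4-year player: 5 h/week in year 1, then 10, 15, 20 h/week.
total = cumulative_practice_hours([5, 10, 15, 20])
```

The same structure can be reused for each experience measure (play time, deliberate practice, coached practice), which keeps the measures parallel and makes overlap between them easier to inspect.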
Here are a couple of examples where retrospective estimates have been used to assess cumulative deliberate practice:
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological review, 100(3), 363. doi: https://psycnet.apa.org/doi/10.1037/0033-295X.100.3.363
Meinz, E. J., & Hambrick, D. Z. (2010). Deliberate practice is necessary but not sufficient to explain individual differences in piano sight-reading skill: The role of working memory capacity. Psychological Science, 21(7), 914-919. doi: https://doi.org/10.1177/0956797610373933
The authors need to consider how much overlap there is between the different experience measures and how they are interpreted by participants. For example, compare the current version prompts for deliberate practice and daily practice time: “Think about the past 12 months. Of all the time you spend on esports, how many hours per day is deliberate practice, i.e., gaming and non-gaming activities that need focused attention and are intended for improving specific esports skills?” versus “During those years of playing GAME NAME and/or similar games, how many hours per day have you played, on average?” Are these supposed to be mutually exclusive? Does daily hours played include deliberate practice? When revising and presenting instructions to participants, the authors need to be especially concerned with multicollinearity in their analyses. If there is too much overlap in experience measures (e.g., deliberate practice is included in daily game play) then this will be a major issue when entered into the same regression equation, making it much more difficult to detect effects and tease apart the relative contributions of highly-correlated experience predictors of performance.
## Connecting to deliberate practice and expertise literature
The authors should make greater reference to the deliberate practice and expertise literature when justifying why psychological traits contribute to skilled performance in addition to experience. There have been recent studies suggesting that factors in addition to deliberate practice are important to the development of expert-like performance (e.g., Macnamara et al., 2014, https://doi.org/10.1177/0956797614535810; Hambrick et al. 2020, https://doi.org/10.3389/fpsyg.2020.01134). These perspectives appear to align with the stated goals of the proposed research. The authors do indeed cite some of the effects of deliberate practice from these lines of research. However, the authors focus on the effect of deliberate practice only; they should place greater emphasis on how this literature has provided evidence that other factors, such as psychological traits, in addition to deliberate practice are important for the development of skilled performance. By doing so the authors can frame the study around two competing theories in accounting for skilled performance: deliberate practice theory and the more recent theory that deliberate practice is not sufficient alone in developing skilled performance.
The specific studies that these articles reference may also be useful for identifying prior methods and analyses that could be used as frame of reference for the present research.
## Connecting hypotheses back to study goals and stated gaps in research
It seems that too much emphasis has been placed on the pilot study results at the expense of the stated research goals. These goals, as listed in the current version of the manuscript, are as follows:
“In the present confirmatory study, our goal is test whether deliberate practice theory, which has successfully been applied to other sports earlier, can also predict high esports performance.” (abstract)
“Along these events, a relevant research question has emerged: what skills and attributes are needed to become a successful esports player? This is our research question.” (first paragraph of introduction)
“In the present study, our goal is to test if the deliberate practice theory of performance development applies to esports, and how other psychological and environmental components might be relevant for esports performance, too.” (second paragraph of introduction)
To me, in synthesizing these statements, the goal of the study is to investigate the relative impact of deliberate practice and psychological traits on the development of esports player performance. However, some of the stated hypotheses do not seem relevant to the stated study goals and are instead the result of the pilot study (e.g., effect of gender, age, teamwork and physical training). If my synthesized goal is indeed what the authors intend to examine with their study, I encourage them to revise and condense their hypotheses to be more specifically aligned with the study goal. They should certainly be informed by the pilot study, but more connected to the goal. For example, the authors may hypothesize that cognitive trait measures (attention, reaction time) will be a stronger predictor of skilled performance than deliberate practice.
However, if I am incorrect in my interpretation of the study goal, the authors should revise the introduction to focus on the pilot study-generated hypotheses and identify the theories that support them more clearly.
## Clarifying Skilled Performance
In the introduction the authors compare short- versus long-term skilled performance. In doing so I was expecting this to be a component of the study design: including measures of short-term and long-term performance. However, only a measure of long-term skilled performance is included in the methods section. The authors should either include a measure of short- and long-term performance or remove this distinction in the introduction. Furthermore, the authors need to justify how their measure of skilled performance (highest rank in [game]) is a valid and reliable indicator of long-term skilled performance.
## Justifying Inclusion of Two Esports Titles
It is unclear why two esports games are included in the study. Are the authors using this as a way to generalize across players of different esports games or to compare and contrast the relative contributions across different games? Please include additional justification in the introduction. Additionally, it would be helpful to include a comparison of the titles based on included game mechanics rather than describing them as being from different genres.
A concern about including two different esports titles is the equivalency of skilled performance metrics. The stated measure of long-term skilled performance is “In the past 12 months, what is your highest rank in [game]?” with the indicated rank presumably being converted into a number (this needs to be clarified) with higher numbers indicating higher skilled performance. However, in doing so, the authors are assuming that, for example, a rank of 5 in LoL is equivalent to a rank of 5 in Fortnite. But is that actually the case? The authors need to justify the use of this operational definition of skilled-performance across both esports titles. If these ranks are not equivalent across titles this could lead to serious issues in statistical analyses. For example, the authors could observe a between-title difference of deliberate practice when predicting skilled performance: would this be due to an actual difference in impact of deliberate practice (i.e. it is more important for LoL than Fortnite) or due to dependent variable of rank for one title not being equivalent to the other?
## Refining Planned Analyses
Is the dependent variable, skilled performance, an ordinal measure? If so, the authors should consider whether a generalized linear model is required for analysis instead of a linear regression and provide justification.
The entry of all variables into a single regression model makes it very likely that multicollinearity will be a problem, especially for measures of experience (but see earlier comment about possible ways to reduce this). The authors need to specify how this will be addressed.
Are the authors predicting that psychological traits will moderate / mediate the impact of deliberate practice and experience on performance? This would be in line with previous studies that have observed such interactions. This needs to be stated more clearly in the analysis plan.
The authors suggest interaction effects in H2, but the sample size is based on main effects observed in the pilot study. The sample sizes need to be re-estimated to detect the smallest predicted interaction effects. It appears that the pilot study can be used to estimate the effect size of such interactions for use in a revised power analysis.