Social media positivity bias
Unveiling the Positivity Bias on Social Media: A Registered Experimental Study On Facebook, Instagram, And X
Abstract
Recommendation: posted 31 May 2024, validated 31 May 2024
Karhulahti, V.-M. (2024) Social media positivity bias. Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=666
Recommendation
In the present study, Masciantonio and colleagues (2024) will test positivity bias in the context of three social media platforms: Facebook, Instagram, and X. The experiment involves recruiting participants into platform-specific user groups and crafting posts to be shared with friends as well as with the respective social media audiences. If positivity bias manifests in this context, the social media posts should show more positive valence than offline sharing, and if the platforms differ in how strongly they encourage positivity bias, they should produce significant between-platform differences in valence.
The Stage 1 plan was reviewed by four independent experts representing relevant areas of methodological and topic expertise. Three reviewers continued through three rounds of review, after which the study was considered to have met all Stage 1 criteria and the recommender granted in-principle acceptance.
URL to the preregistered Stage 1 protocol: https://osf.io/9z6hm
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
- Collabra: Psychology
- International Review of Social Psychology
- Peer Community Journal
- PeerJ
- Royal Society Open Science
- Social Psychological Bulletin
- Studia Psychologica
- Swiss Psychology Open
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #3
DOI or URL of the report: https://osf.io/29qyv
Version of the report: 2
Author's Reply, 27 May 2024
Decision by Veli-Matti Karhulahti, posted 24 May 2024, validated 25 May 2024
Dear Alexandra Masciantonio and co-authors,
Thank you again for all the careful revisions. The work is now almost ready for IPA. I didn't send it out for more external reviews, as the revisions respond to all requests comprehensively. I noticed a few very minor issues that still need to be fixed, but I believe you can tackle them quickly. Meanwhile, I will prepare the IPA so that you should be able to receive it within 24 hours of the next version, if everything goes as planned.
1. As a response to R2’s comment 9, the interpretation is now “We will also take effect sizes into account, in line with the chosen SESOI.” This is good, but since this is a formal registration of hypotheses, we need to be even more precise about what “in line with” means. I suggest the following, but you’re naturally free to choose something else: “We reject H0 if the effect of time on valence is significant (p < 0.05) and exceeds the size that the study was designed to detect (r > .21).” [adaptable to both H1/H2]
2. Currently alpha is marked as .5, please double check.
3. I apologise for not being clear in my previous request to add pilot effects. I was not referring to non-significant effects but to group comparisons (d); currently there are only ANCOVAs, but it would be valuable to report platform-specific effects too (they were significant). You’re free to move any of the pilot information to a supplement if you’re worried that the MS is getting too long.
4. Could you please edit p. 16 so that it doesn’t include “RQ1” (can be confusing for readers since it’s the same as RQ3 earlier) and add the word “exploratory” to make it explicit. Something like: “We therefore additionally explore our previously stated research question: Does positivity bias have an influence on emoji use?”
5. On p. 16, could you please change the word "large" to "meaningful", so that it reads "though not necessarily indicative of a meaningful effect" (because "large" doesn't mean anything unless connected to life or theory).
Best wishes
Veli-Matti Karhulahti
Evaluation round #2
DOI or URL of the report: https://osf.io/29qyv
Version of the report: 2
Author's Reply, 23 May 2024
Decision by Veli-Matti Karhulahti, posted 02 May 2024, validated 03 May 2024
Dear Alexandra Masciantonio and co-authors,
Thank you for all careful revisions. Three reviewers returned to assess the work and they all agree that the plan is significantly improved and closer to ready. They have some further notes, however. I summarize the main points and add some of my own.
Theory
The additions to the introduction are a major step forward, yet all three reviewers still believe the theoretical part could be improved. Below, I try to follow the study’s rationale as clearly as I can. Perhaps this also helps with clarifications in the final edits.
Your construct is “positivity bias in social media self-presentation”, which first appears in Reinecke & Trepte (2014): “We thus suggest that positive forms of authenticity are shown more frequently on SNS and are more likely to receive reinforcement in the SNS context than negative forms of authenticity. SNS users are thus more likely to engage in positive authenticity than in negative forms of authentic self-presentation. We propose the term ‘positivity bias in SNS communication’ to refer to this phenomenon.” As mentioned earlier, this seems to potentially happen in two ways:
PB1: selectively posting more positive daily experiences (and/or not posting negative daily experiences).
PB2: exaggerating the positivity of any posted experience (both positive and negative).
- You want to test PB2 (the 2nd construct pathway), and this is your H1.
- You justify H1 with Goffman’s theory: people present themselves differently (tactically) in social contexts.
- The auxiliary hypothesis is that all social media are contexts where presenting oneself positively brings more rewards (on average), hence tactical positivity.
I think all this makes sense. Perhaps more explicitly organizing the text to follow a chain of reasoning in subsections (as suggested by one reviewer) could be useful. What could be expanded is the last point, i.e. the assumption that positive self-presentation brings additional rewards in social media (vs F2F). This could be done easily by referring more to the vast literature on likes, retweets, etc. which gamify and quantify social media interaction (not present in F2F).
In turn, H2 is based on the idea that social media are different.
- Social media differ by architecture, affordances and social-cultural context.
- Such differences can further reward and punish positive self-presentation.
- Instagram excessively supports positive self-presentation (vs X/FB), hence tactical positivity.
This also makes sense. As the reviewers imply, what could be further explained is again the last point: what are the specific features of Instagram that cause its users to self-present more positively? Is it the very fact that all posts are images? As someone who checks Instagram once or twice per month, the vast majority of posts I see go “here’s my cat”, “here’s my coffee”, etc. I find it difficult to imagine how one would even be able to post a “negative” image (e.g., “I had a terrible argument today, here's a picture of it” or “my family member is ill, here’s a picture of it”). In other words, because the UX is designed for visual content, it seems extremely clumsy for negative sharing. This likely pushes the social-cultural context to be even more positive. This makes your H2 logical for me, but it could be spelled out for other readers too, theoretically and in more detail.
Btw, you might want to check a paper from a few weeks ago by Avalle et al. that quite comprehensively finds no differences in negative behaviors between social media platforms. But they only studied text so it's still consistent with H2 (if the auxiliary hypothesis is based on the visually driven design of Instagram): Avalle, M., Di Marco, N., Etta, G., Sangiorgio, E., Alipour, S., Bonetti, A., ... & Quattrociocchi, W. (2024). Persistent interaction patterns across social media platforms and over time. Nature, 628(8008), 582-589.
Methods 1
Two reviewers are concerned about your selected effect size r = .21. I see their worries. The term “SESOI” often causes confusion, so I will try to unpack it (please allow this explanation, hopefully it doesn’t sound too instructive -- it helps me see the context better). SESOI tends to have two different meanings/implications:
a) effect for planning statistical power, and
b) smallest pragmatically or theoretically Meaningful Effect (ME).
Optimally, (a) and (b) are the same, but that’s rare because ME is often difficult to calculate and justify. Most studies don’t have ME but use a heuristic like meta-analysis to plan for statistical power. That can be ok, but doesn’t allow making general inference about whether the effects are meaningful or not.
In the present case: it’s ok to power this study for .21 if you believe the effects in the field are usually this size. But as reviewers note, this doesn’t mean that .15 or .2 would necessarily be meaningless effects to find (or >.21 meaningful to find!). Here .21 is just a number to help estimate needed statistical power. One can choose .21 for power and say: “In this study, we have power for .21. If we find >.21, this is informative but not necessarily a meaningfully large effect. If we find <.21, we are powered to rule out larger effects—but also smaller effects might still be meaningful for theory and practice.”
That said, as noted earlier (and by a reviewer), in this study you could also easily calculate ME based on the smallest agreeable difference in valence. Assuming you have high agreement, a step from “positive” to “very positive” is practically meaningful because your raters were able to observe a consistent difference (if the scale were wider, e.g. -10 to +10, consensus would become unlikely). Although we don’t know beforehand what the standardized effect of a raw step is, it’s likely going to be more than .21 (which would be logical, considering we’re talking about sensing differences in degrees of positivity).
In sum: you can use .21 as ad hoc SESOI, as long as you’re clear about the above limitations. Or you can further improve the design and define ME, e.g., based on the agreement of raters. The latter would be similar to what Anvari and Lakens call the smallest subjectively experienced difference: Anvari, F., & Lakens, D. (2021). Using anchor-based methods to determine the smallest effect size of interest. Journal of Experimental Social Psychology, 96, 104159.
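For concreteness, here is a minimal R sketch of the two readings of SESOI discussed above (the pwr package, the sample size, and the SD value are illustrative assumptions on my part, not part of your plan):

library(pwr)

# (a) effect used for planning statistical power: sample size needed to detect r = .21
pwr.r.test(r = 0.21, sig.level = 0.05, power = 0.80)

# Sensitivity framing: smallest r detectable with a hypothetical N = 300
pwr.r.test(n = 300, sig.level = 0.05, power = 0.80)

# (b) meaningful effect anchored in a raw 1-point step on the -3..+3 valence scale,
# assuming (hypothetically) SD = 1.4 for coded post valence
d_raw <- 1 / 1.4                     # Cohen's d for a 1-point difference
r_raw <- d_raw / sqrt(d_raw^2 + 4)   # approximate between-groups d -> r conversion
c(d = d_raw, r = r_raw)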
Methods 2
One reviewer makes an important note about your coding plan. Indeed, you need to predefine a threshold for agreement. We have recently developed guidelines for transparently coding open-ended data, which seem very relevant here: https://osf.io/preprints/psyarxiv/86y9f (to be clear, I'm not asking you to cite the paper, but to review it for more detailed answers). To avoid kappa-hacking and other such common concerns, it's important to spell out things like:
- What's a sufficient threshold of agreement to proceed with the test at Stage 2?
- What will be done if agreement isn't sufficient after the first coding round? As re-coding the same data with the same raters would cause substantial bias, will the guidelines be updated and new coders trained?
- Will the confirmatory design be changed to an exploratory one if it turns out that high agreement is impossible to obtain?
- Considering that ratings are categorical (“positive”, “very positive” etc.), would Fleiss’ kappa be more informative than ICC?
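For illustration, a minimal R sketch of the two agreement statistics mentioned in the last question (the irr package and the toy 'ratings' data frame are assumptions; your actual coding data will of course look different):

library(irr)

# Hypothetical ratings: one row per post, one column per rater,
# values on the 7-point valence scale (-3 to +3)
ratings <- data.frame(
  rater1 = c( 3, 2, -1, 0, 3),
  rater2 = c( 3, 2, -2, 0, 2),
  rater3 = c( 2, 2, -1, 0, 3)
)

kappam.fleiss(ratings)   # treats the scale points as unordered categories
icc(ratings, model = "twoway",
    type = "agreement", unit = "single")   # treats the scale as interval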
Brief notes
- Please report the effect sizes of your pilot (especially the one related to H1, which seems to be around d ≈ .2?)
- In the design table’s “Interpretation” column, H1/H2 cannot be corroborated based on significance alone; a sufficiently large effect is also needed. E.g., if .21 is decided as interesting here, one could reject H0 when the effect is >.21 (a minimal sketch of such a combined rule follows after this list). Note that null tests haven’t been planned (e.g., equivalence testing), so the design doesn’t allow obtaining evidence for no effect (H0), just evidence for an effect (H1) or no evidence for an effect (no H1).
- I’m not aware of data on this but, for the record, I believe that the chronology of most people's Instagram posts is the reverse of the present study design: not selecting a memory but an available image that was created as an affective response ("oh, my food/pet looks nice"). Perhaps worth discussing at Stage 2.
- Related to the above: it feels like most Instagram posts have recently become short videos (à la TikTok). You might consider allowing participants to describe either a picture or a video. I agree with the reviewers that this option should be for all participants, all platforms.
- I agree with the reviewers about oversampling because open-ended data often include many low-quality responses. Please ensure that all discarded responses will also be shared as part of the available data.
- Please include the R code for all confirmatory Stage 2 analyses for final check to obtain Stage 1 IPA.
- As reviewers note, the screening question “Which of the following social media sites do you use on a regular basis (at least once a month)?” allows participants who only read and don’t post. It could be useful to ask about their posting habits at some point to get an idea of how many lack posting experience. I agree that if a user lacks posting experience on a platform, they seem unable to express platform-specific bias.
- Like one reviewer, I don’t follow this: “the questionnaire will only be able to be answered on a smartphone to get as close as possible to a real-life situation.” Why is only smartphone real-life? I personally use social media only via laptop.
- For transparency, please add a footnote on page 7 to clarify that some of the RQs were revised during Stage 1 review. If you prefer, you don't need to refer to the main tests as preregistered (all RRs are).
- Regarding H2 (“The posts’ valence is dependent on the social media”), do you really mean absolute valence or increase in valence from T1 to T2?
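To make the interpretation note above operational, here is a minimal R sketch of a combined significance-plus-effect-size decision rule (the paired t-test and variable names are illustrative assumptions; your actual confirmatory models may differ):

# Hypothetical vectors: coded valence of each participant's offline (T1)
# and social-media (T2) version of the post
tt <- t.test(valence_t2, valence_t1, paired = TRUE)

# Convert the t statistic to r for comparison against the planned threshold
r_effect <- sqrt(tt$statistic^2 / (tt$statistic^2 + tt$parameter))

# H1 is corroborated only if the effect is both significant and large enough
corroborate_H1 <- (tt$p.value < 0.05) & (r_effect > 0.21)
corroborate_H1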
This is a lot to take in, but this is an experiment where small details matter a lot. I hope you find the reviewers' feedback useful and my additional notes of some value. If you disagree with something or believe that some suggestions are unclear or wrong, you may rebut them as usual or contact me directly for additional clarifications.
Good luck with the final stretch!
Veli-Matti Karhulahti
Reviewed by Julius Klingelhoefer, 20 Apr 2024
I would like to thank the authors for the opportunity to review the resubmitted manuscript. I share the editors’ positive outlook on the manuscript and believe that the revised manuscript and the proposed study have been improved in central ways. I have a couple of smaller suggestions that the authors may want to consider, specifically regarding the theoretical outlook and how the control group is implemented, but overall, I believe that the proposed study looks very promising and would recommend that after consideration of the feedback from this round of reviews, the study should go to data collection.
Please find my more detailed comments below.
Theory
I think the additions and changes to the theory section improve the manuscript, yet, I believe the theory section still could be a bit clearer and slightly more well-structured in some areas:
- As it is structured and discussed now, the different theoretical approaches seem somewhat disconnected from each other. I think relating the approaches to each other and organizing/systematizing them would make it more clear what the theoretical contribution of the paper is. E.g., I would suggest explicating more how architecture, affordances, and social-cultural context relate to self-presentation. In some respects this link becomes clear, e.g., through the discussion of norms, but with shareability, for example, it is not really clear in which way the authors would expect higher shareability within a platform to affect post valence.
- Further, I believe that the revised paper would benefit from using advance organizer paragraphs and/or including one additional heading level.
- Particularly, “afford” is used before being properly introduced as an approach. I think explicating the theoretical approaches more clearly would also address this issue with clarity.
The authors propose in the revision that the baseline (control) condition should be relating an event to a group of friends. This seems reasonable (see also my later comment). However, the theoretical section mostly focuses on social media in general and later on the platforms, while the comparison to a friend group, or other possible points of comparison, is not discussed or only briefly touched upon. I think introducing these comparisons earlier and discussing existing empirical evidence on such comparisons, as well as the theoretical mechanisms that explain the differences, would make it more obvious and provide better arguments as to why this was chosen as the point of comparison for the social media platforms.
More arguments within the theoretical background would be beneficial, for example, why are shareability and visibility specifically more relevant than other affordances?
The statement “Indeed, this bias is rooted in the face theory […]” makes it seem like face theory is the only (or main) explanation for positivity bias. If this is what the authors want to say, this should be supported with arguments. If other mechanisms are presumed, this statement should be qualified.
I think the authors should include a short definition of what they consider to be emojis, as the discussion and the footnote in the previous manuscript version make it clear that the definition is not necessarily straightforward.
Method
I think it is a good choice to identify a smallest effect size of interest. However, I do not believe that the choice of r = .21 as the SESOI is appropriate here, or it at least needs more justification. In my view, this would mean that if you find an effect that is smaller than average media effects on self-disclosure, it would not be viewed as practically or theoretically relevant. This would mean that around 50% of effects would be considered not practically or theoretically relevant. An effect of r = .20 would mean that the hypothesis is not accepted (vs. r ≥ .21). Maybe I am interpreting this incorrectly and you can point out the implications that you were considering, but based on Lakens’ paper, I would suggest that a power analysis based on an expected effect size of .21 may be more appropriate, based on your response.
The use of a control group is a good change to the method. However, I am not quite sure whether this new operationalization may be confounded with theoretically relevant characteristics. As you plausibly argue in the literature review, existing connections on the social media platforms may influence how participants post on social media. However, when the control group is instructed to think about a group of friends, posting on a “follower”-style social medium (i.e., Twitter, Instagram) may differ in at least two aspects: being a social media platform, and the type of shared pre-existing connection. This could reduce internal validity. However, it might be a sacrifice that the authors deem appropriate for theoretical reasons or ecological validity.
It needs to be explained what “understand the instructions” means.
“To reflect the fact that Instagram is an image-oriented social media, they will also be asked an optional question: ‘If you plan to use an image or photo to accompany this post, please describe it briefly here’.” I assume this will be there for all platforms? Additionally, will participants only be asked about a post on the timeline (“traditional” posts) vs. a post to the story? To me this was not clear from these instructions and stories may differ substantially from “traditional” posts in the use of visual elements, e.g., GIFs, Stickers, etc.
Would it make sense to randomly vary whether the control vs. experimental condition will be introduced first to account for potential order effects in this within-participant factor?
Style
Latin letters should be italicized in results (e.g. r = x)
The authors say that sample size would be rounded to 300. I think this is somewhat misleading and I would suggest talking about oversampling, e.g., to account for participants who do not meet attention checks or other inclusion criteria.
Reviewed by Marcel Martončik, 23 Apr 2024
Dear Authors,
I would like to express my great appreciation and thanks for the thorough and thoughtful responses to the comments from myself and the other reviewers, and for the related edits to the manuscript. Your effort in addressing each point is commendable and has significantly improved the manuscript. I appreciate the time and effort you’ve put into making these revisions.
Having read the revised version of the manuscript, I’d like to highlight the following points:
1) How will the dependent variable be calculated? In the measurement section, I found only this: ‘Three researchers will qualitatively analyze all the texts to estimate their valence on a 7-point scale (-3 = Very negative; 3 = Very positive).’ What happens if their assessments do not match? Will an aggregate score be calculated (what kind of?), or will they need to reach a unified conclusion through a reconciliation process, or is there another method?
2) I am considering the appropriateness of applying the effect from a meta-analysis by Ruppel et al. (2017) to the context of this study. Hypotheses 1 and 2 compare the positivity of posts between different forms of media. However, the justification of SESOI is based on differences in self-disclosure (Ruppel et al., 2017). How closely related are these constructs? Is it possible to expect a similar effect in positivity as in self-disclosure? In addition, I agree with the suggestion from the Recommender, “because you’re using human raters to observe differences in posts: we thus already know that one step up in the scale is noticeable and meaningful,” that having a 1-point difference (as an unstandardized unit) would be much more meaningful.
My subsequent question pertains to the difference between the hypotheses. Should the SESOI be the same for H1 and H2, even though they relate to different phenomena?
Minor comment/suggestion: During the pilot, a large number of participants were excluded for various reasons (e.g., misunderstanding the instructions). Perhaps it would have been advisable to plan for more oversampling…?
3) Thank you for sharing the power analysis calculation script. I think it is even more important to share in advance an analysis script for the future analyses. There are many different ways in which an ANCOVA can be calculated, even in the same software, and it would be useful to know the authors' method of analysis.
4) This is rather my reflection:
I wonder why users are not allowed to complete the survey on the device they prefer for using social media, but instead are forced to use a smartphone. Additionally, they are instructed to answer the question: ‘We will ask participants on which devices they most often use social media (computer, tablet, or smartphone).’
In terms of theory, how relevant is it for a person to be familiar with a given social media platform? Should the positivity bias effect manifest itself on any social media platform regardless of whether the person is familiar with and uses it, or does it primarily manifest itself on the platform that the person prefers?
I wonder, what is the source of the differences in positivity bias between platforms? The theoretical background suggests (if I understand it correctly) that the different effects should be due to differences in features between platforms. Thus, in order for different positivity biases among platforms to manifest, the user must be aware of the specific features of a given platform (I will demonstrate a strong positivity bias on Facebook only when I often use Facebook and therefore know its features, and similarly a weaker positivity bias on Instagram only because I am familiar with its features, which are distinct from Facebook's and therefore cause a different effect). From this perspective, then, wouldn’t it be more appropriate for each participant to choose their preferred platform (as opposed to random assignment) and have to write a post for that platform? Perhaps even more appropriate would be to control this variable completely and incorporate it into the research design (some participants would write a post for their preferred platform, some for one they use minimally or not at all) - but I understand that this would increase the sampling requirements.
5) I am a bit confused about the hypotheses and research questions (RQ). The first RQ is not explicitly formulated as an RQ; it is listed at the top of page 14 as ‘The main research will therefore aim to address the following fundamental question:’. The explicitly formulated RQ1 at the beginning of page 15 then has no associated hypothesis. If it is not even supposed to have one, and is only part of the exploratory analyses, then I would suggest to explicitly label it as such.
6) I apologize, but I do not understand why participants are advised not to report a very positive event if the goal is to prevent participants from reporting traumatic experiences.
7) I also note that each social media platform has different word limits (e.g., there is a significant difference between Instagram and Facebook). I presume that frequent users of these platforms are aware of these limitations and may naturally adjust the messages they write for a particular platform – writing condensed messages (on a platform with a low word limit) where it is necessary to highlight the most salient elements of the event, etc. Should this be incorporated into the study? For instance, should the same word limit be used for all social media platforms, or should participants be asked after writing a post whether they took into account the word limit of the given social media platform, and should this be considered as a covariate?
8) Do the authors plan to use any threshold for inter-rater reliability? What is the planned procedure if the reliability is very low?
9) In Table 1, Interpretation given different outcomes should be based not only on the p-value but also on the size of the effects.
Reviewed by anonymous reviewer 1, 30 Apr 2024
Review Stage 1 – Round 2
Thank you for providing the opportunity to review the revised version of the manuscript "Unveiling the Positivity Bias on Social Media: A Registered Experimental Study On Facebook, Instagram, And X". I appreciate the authors' efforts to address my previous comments and make significant improvements to the content of the manuscript. I have a few minor suggestions to further improve the proposed research.
General comments:
- While the authors have added more information on emoji usage, I believe the argument and theoretical background on why it is important to consider the frequency of emoji usage could be strengthened. It would be beneficial to clarify the implications and insights gained from understanding the differential usage of emojis across different social media platforms.
- As previously noted by the authors, asking participants to write about the same event twice (in this case once for a friend and once for social media) may introduce bias. To mitigate this, I recommend randomly assigning participants to write first about either sharing with a friend or posting on social media and then about the other event.
Minor comments:
- There is some inconsistency in language usage: The authors alternate between using "X" and "Twitter." It would be preferable to maintain consistency throughout the manuscript.
- In section 2 “the present research” this sentence reads repetitive: “In support of open science, the research will be pre-registered on OSF.”
I wish the authors all the best with their planned research!
Evaluation round #1
DOI or URL of the report: https://osf.io/va7rj?view_only=4c0e3ffbdf6b4397adc0c797f5f3c6f9
Version of the report: 1
Author's Reply, 05 Apr 2024
Decision by Veli-Matti Karhulahti, posted 16 Feb 2024, validated 16 Feb 2024
Dear Alexandra Masciantonio and co-authors,
Thank you for submitting your interesting Stage 1 to PCI RR. As the first two reviews were mixed, I wanted to ensure your manuscript receives comprehensive assessment and invited two more reviewers. After carefully considering all four reviews, I agree that major changes need to be implemented for the study to be informative, but I am also optimistic that such changes are possible if the reviewers’ feedback is carefully taken into consideration. I summarize the key points.
Construct/theory
The reviewers collectively voice that there is some confusion around ‘positivity bias’ as a concept, as well as around its related theoretical implications. For instance, there’s also a well-known effect of ‘negativity bias’. It could be explained in more detail how and why the former is associated with social media (i.e., what’s the mechanism/theory).
My impression is that two different positive biases are relevant here. The first one is about the positive/negative ratio of communication, i.e. which type is more prevalent. The second one is about exaggeration, i.e. how much positive events become more positive. It feels that your study addresses, and is most suitable for, the latter construct. Here it’s important to clarify: if positivity bias is true, this should mean that positive events become more positive and negative events become less negative. If positive events become more positive and negative events become more negative, then the effect is not a positivity bias but an exaggeration bias (i.e., all content is exaggerated, perhaps to maximize attention).
Other theoretically relevant elements, as the reviewers note, are the platforms. Because the goal is to study differences between platforms, it would be valuable to explain how the design and mechanics of the platforms differ (I believe this was also a central idea of the cited Meier & Reinecke 2021). For example, closed Facebook groups provide protection and safe spaces, whereas open Twitter debate is riskier (meanwhile, in both, users can customize their modes of participation). I hope the above examples help you to further clarify the construct of the study and how it may be more explicitly connected to the hypotheses and theory.
Methods/materials
I believe the main problem in the current plan is that there’s a discrepancy between the dominant modality of social media (images/videos on Instagram, TikTok, etc.) and the lexical nature of the study. Another main problem is that of controlling measurement. I believe these issues can be solved by dropping visually driven media (Instagram) and adding non-social media as controls. As the reviews imply, you could consider, e.g., comparing ‘personal diary’ and ‘talking to a friend’ (f2f) to text-driven social media. Because both Facebook and Twitter can be used in large/open and small/closed groups, I share an idea of framing these options—instead of Facebook and Twitter—based on how they are used: “Imagine posting this event for a closed group of friends in social media such as Facebook or Twitter”, “Imagine posting this event publicly for millions of people to read in social media like Facebook and Twitter”. In this way, you would get to study the mechanisms. It would also allow discussing (albeit not testing) platform differences, as we know that certain mechanisms are more characteristic of certain platforms.
Another important methods issue is the effect size. As reviewers note, to test a hypothesis, it’s necessary to justify and state the smallest effect size of interest and also explain what will corroborate the null, e.g. by means of equivalence testing (see Section 2.3. Evidence Thresholds in the Guidelines). At PCI RR, Cohen’s benchmarks are not used, so I suggest carefully thinking about what would be a meaningful raw effect. Intuitively, I think you’re in a good position because you’re using human raters to observe differences in posts: we thus already know that one step up on the scale is noticeable and meaningful. Maybe this can help you define a SESOI. As one relevant source on this, I refer to the paper that is part of the PCI RR guidelines: https://doi.org/10.1525/collabra.28202
With the above considered, I encourage you to redesign and specify the hypotheses and to explicitly formulate the current RQ1/RQ2 section as exploratory analysis, because it does not involve confirmatory tests. To make the structure coherent, the first study could be just “Pilot”.
***
Most reviewers wished to see more materials to be able to better assess the design. I very much enjoyed the clear pilot materials with translations, but it would indeed make reviewing easier if the upcoming materials were accessible too. This would also allow making direct improvement suggestions that can efficiently support the development of materials. E.g., one thing that the reviewers didn’t seem to notice is the final sentence in endnote iii: “Please choose an event that is neither very painful nor very positive.” Alas, you still plan the following: “Regarding the event that participants are thinking about, they will be asked to what extent this event is positive or negative (-3 = “Very negative”; 3 = “Very positive”).” If they are not allowed to think about very positive events, it seems conflicting to have that as an option (unless it's a control question).
I know you have tons of feedback already, so I stop here. I hope you will find the reviewers’ comments constructive and helpful. If you have any questions about specific points, or wish to discuss any details of the feedback, feel free to email me during the revisions. I will do my best to support you in the process and help make this study as informative as possible.
Veli-Matti Karhulahti
Reviewed by Linda Kaye, 23 Jan 2024
Reviewed by anonymous reviewer 1, 13 Feb 2024
General comments
Thank you for the opportunity to be a reviewer for the registered report " Unveiling the Positivity Bias on Social Media: A Registered Experimental Study On Facebook, Instagram, And X". I believe the research explores a very interesting and important topic, which is how the positivity bias differs across different social media platforms. Although I am not an expert in the field of social media research, I believe that the goal and need for the study are clearly explained and that the proposed methods seem appropriate. Below are some comments and suggestions to further improve the proposed research.
Specific comments
Major comments
· The proposed research is both interesting and relevant, and the study is well explained. However, it would enhance clarity to explicitly state that H1 replicates an established phenomenon (the positivity bias), while H2 introduces a novel perspective by examining this bias across various social media platforms. Although this distinction is mentioned in the general introduction, it could be discussed more explicitly in section 4.1.
· While the different hypotheses are well explained, it would be beneficial to also formulate a specific research question for the confirmatory aspect of the study, not solely for the exploratory part.
Adding background information on the relevance and significance of emoticon usage and post length in social media would strengthen the exploratory aspect of the study.
Clarification is needed regarding the statement "whose text will not be coded by the researcher”. Why are some texts not coded by the researcher?
The authors do a sample size calculation based on expected effect sizes; however, they do not mention calculating effect sizes for the planned analyses. Incorporating effect size calculations alongside the planned statistical analyses would be beneficial.
Similarly, while the authors adeptly outline their interpretation of significant results, it would be beneficial to explicitly address how this interpretation may be contingent upon the effect size.
Clarification for the criteria for data exclusion is needed. Will neutral data be excluded again? Are participants removed if valence cannot be accurately coded? Additionally, are there any other criteria for outlier exclusions, such as excessively short or long posts?
Minor comments
The introduction and/or abstract would benefit from an earlier definition of "positivity bias" (this sentence could be moved earlier: “leading users to predominantly share and engage with positive content rather than negative or neutral ones”).
· Clarify whether the question regarding participants' usage of Facebook, Instagram, and Twitter at least once a month relates solely to interacting with the platform or includes posting content as well.
· Instagram also requires adding a picture alongside the text (which the other platforms do not). This should be discussed somewhere in the paper.
· Including the full questionnaire in the supplement would aid in study replication. Clarify the types of socio-demographic questions asked, such as defining "current situation" mentioned in the exploratory research.
Specify what "enough participants" means for the planned sensitivity analyses.
The authors mentioned having three conditions when doing the power analysis (section 4.2) but did not clarify what these conditions are. While it was understood as the type of social media in the exploratory study, it would be helpful to explain what they are in the registered study.
Since the questionnaires will only be answered on smartphones, it would be interesting to include a question about where people typically use each social media platform (could be added as a control variable). I could imagine that especially older adults use Facebook and Twitter also on their computer.
Another suggestion is to include a question where participants rate the valence of their social media posts themselves and compare these ratings with those from the authors. It would be interesting to know whether those differ and whether conclusions would change. A control question regarding whether people usually share events on several platforms could also be added.
Reviewed by Marcel Martončik, 14 Feb 2024
Dear recommender and dear authors,
I appreciate the opportunity to review this manuscript and contribute my comments to this research. I fully agree with the authors that social media has a profound effect on people’s experiences and behaviour. This underlines the importance of exploring this topic. Please find below my comments in the order in which the commented parts of the manuscript appear.
Positivity bias is a central construct of the research, yet its definition seems to be missing in the text. I found myself intrigued by how the authors might have defined it. Given its significance, it might be beneficial to introduce its definition early in the Introduction, clarifying what the authors mean by this term. It could be insightful to devote more discussion to this construct, such as elaborating on its various influences or the potential mechanisms of its effect on social media, or even drawing from related fields for its effect mechanism, if applicable. On page 4, the authors state: ‘These findings underline the pivotal role of the positivity bias in understanding the effects of social media on mental health.’ However, the text doesn’t seem to elaborate on what this role entails. Maybe the meta-analysis by Chu et al. 2023 or other studies might provide valuable context in this regard.
After reading the first paragraph of the Introduction, and especially the sentence "This is precisely the case for the positivity bias, which can provide insights into how social media platforms shape our perceptions, emotions, and overall mental health", I felt that the aim of the paper would be to explore the effects of social media content on users' thinking and experiencing. But the following section of the text, beginning with the heading Self-expression on Social Media, did not correspond with this. I see two distinct effects: 1) the effect of social media content on thinking and experiencing, and 2) self-expression in the space of social media, which may not be related to already existing content on social media at all (the authors also plan to recruit into the sample people who use social media infrequently, i.e. at least once a month). That is, I had a feeling that this study is more of an investigation of the effect of enduring characteristics of individuals on the way they self-express in a social media environment. Which of these do the authors intend to investigate? Apologies for my misunderstanding.
In this context, I found the phrasing of RQ1 (How does the positivity bias affect self-expression on social media?) somewhat unclear, as I didn’t find this effect implied or explained in the Introduction. Could you please provide some clarification?
One of the key premises of the research is that social media platforms differ from each other in various ways, such as the purpose they fulfill, the needs they satisfy, their features, etc. These differences could potentially cause variations in self-expression on these platforms. Therefore, it might be beneficial to describe how these three media differ. The authors state only very generally: 'They also differ in terms of accepted media (e.g., images, text, hyperlinks, etc.) and privacy settings" or "Second, social media platforms offer different features (Bossetta, 2018). " without specifying the differences. It might be helpful to include a table or figure that describes all the differences and features. Without a detailed understanding of the differences among the three social media platforms, it becomes challenging to discuss their potential differential effect and the origins of this effect. I was particularly intrigued by the statement, “Users’ relationships are dyadic on Facebook, but unidirectional on Instagram and X.” Isn’t mutual following and conversation between users on Instagram a form of a dyadic relationship? The same question applies to X.
This sentence doesn't make sense to me: "For example, positive emotions are perceived as more appropriate on Instagram and Facebook, while negative emotions are perceived as more appropriate on Twitter and Facebook (Waterloo et al., 2018)." Is Facebook there twice by mistake?
Could the authors kindly clarify the recommendations or insights they intend to convey with the following statement on page 5?: "From a practical point of view, knowing if certain social media favor a negative information presentation has the potential to inform public health recommendations."
Exploratory study
It would be greatly appreciated if the authors could elucidate the purpose of conducting the exploratory research (such as testing of instruments, procedures, estimation of effect sizes, etc.).
The justification for a sample size of N = 50 appears to be absent. Could the authors provide a rationale for this choice? For instance, accuracy, a-priori power analysis, heuristics, etc..
It has been noted that 136 participants were excluded from the sample because they “did not give their informed consent or did not fully complete the study”. I am curious as to whether it is ethically appropriate to exclude participants who gave consent and have provided at least partial responses (e.g. due to the browser or OS crash, etc.). Could the authors explain why they did not opt for missing data imputation?
Furthermore, a justification seems to be needed for the removal of 22 participants for whom the valence of their text was not coded by the three coders. Could the authors provide some insight into this decision?
For a more comprehensive understanding of the research methodology, it would be beneficial to have access to the precise instructions for the scales used. In line with this, it would be helpful if the authors could share the survey, complete with questions and instructions. However, if there are copyright concerns, perhaps the completed questions could be removed. Alternatively, the authors could consider sharing at least the specific instructions they developed for the study.
I noticed that one of the constructs measured was the number of words. However, it was not immediately clear from the manuscript why this was measured and how it relates to the research question. Could the authors provide some clarification on this?
The valence of texts is identified as one of the main constructs. However, the methodology for evaluating the texts is not clearly outlined. The manuscript mentions that “three researchers qualitatively analyzed all the texts to estimate their valence on a 7-point scale (-3 = ‘Very negative’; 3 = ‘Very positive’)”. Could the authors elaborate on the instructions given to these researchers? On what grounds were they supposed to evaluate the valence of the text? What exactly was this valence intended to express? A detailed explanation of the instructions given to the three researchers would be very helpful in this context.
I also wondered if the authors had considered using sentiment analysis. It appears to me that it could be well-suited to the task at hand but I admit I have no idea how difficult it is to use.
Without access to the survey, it is challenging to understand how and by what means some constructs were measured. I was particularly interested in examining the wording of the items used to measure descriptive norms, especially considering the ω value for Instagram was .62. However, I was unable to find information on either the number of items or their wording. Could the authors shed some light on this?
Exploratory results. Effect of Social Media on Texts’ Valence
I noticed that the authors provided respondents with the option to choose the social media platform on which they would like to share or write a given text. This differs from a procedure where respondents would write a text for each social media platform and their valence would be compared. Consequently, it’s conceivable that the observed differences in valence between social media may not be attributable to the platform itself, but rather to certain characteristics of the respondents that influence their preference for a particular social media platform.
I’d like to kindly ask for clarification on the justifications for the covariates, as I was unable to locate this information.
In the results section, along with stating the main outcome for the interaction, it might be beneficial to provide a detailed explanation of what the interaction implies, including differences between groups and effect sizes, among other things.
Regarding the “Effect of Event’s Valence on the Choice of Social Media”, it would be helpful if the process of creating the dichotomous variable “valence” could be explained, especially considering that three researchers qualitatively analyzed all the texts to estimate their valence on a 7-point scale, with -3 representing “Very negative” and 3 representing “Very positive”. Also a justification for dichotomization is missing.
It would be appreciated if the authors could provide a clear explanation of how the “Effect of Social Media on Emoticons/number of words” section of the results is connected to the research questions of this study.
I’m curious as to why a result with a p-value of .051 is interpreted as statistically significant.
In the discussion of the exploratory section, the authors suggest, “It would therefore be intriguing to propose a design where participants are not required to write the event beforehand.” If the authors were to implement this design, I’m interested in understanding how they would ascertain whether the text produced for social media is positively biased compared to text that was not intended for social media. Could you please elaborate on this?
Confirmatory - RR part
It would be beneficial if the hypotheses could include precise estimates of effect sizes or their intervals. Without these, the hypotheses might be unfalsifiable in their current formulation. The chosen effect sizes or their intervals should be justified based on more than just convention, such as Cohen’s guidelines.
I’m also curious about whether the positivity bias is specifically generated by social media, or if it would also be present when writing text for another type of medium or purpose, such as a print newspaper, a blog, or a diary. I wonder if the positivity bias results from the need to abbreviate text, among other factors. For this reason, I would recommend considering the use of a control group in this research.
One aspect that I find missing in the manuscript is the rationale for measuring emoticons and the number of words in the context of positivity bias. Could you please provide some insight into this?
Method
It would be beneficial if the authors shared the analysis script for future analyses, as well as the script used in the power analysis calculation. The authors mention, “For all analysis, we used small effect sizes (r = .3).” I’m curious to know the rationale behind considering r = .3 as a small effect size.
In relation to the exploratory part, the authors intend to employ a similar procedure in the confirmatory part: “As with the exploratory study, participants who will not give consent to take part in the study, who will not respond to the entire study, or whose texts will not be coded by the three coders will be removed from the study.” This raises again a question about the ethical implications of using forced choice items and the lack of missing data imputations - that is, excluding participants who, for instance, only complete up to the last question in the survey. Additionally, it would be helpful to understand why all texts should not be coded by three coders.
A similar method to that used in the exploratory part is outlined for the variable "valence": "the participants will have to think about an event, but this time they will not be asked to write a text to describe it." I'm curious about how the authors plan to compare the change in valence between the text intended for the media and the original text. I noticed in the Measures section that the participants themselves will be tasked with evaluating the valence of this event. Wouldn't this approach introduce a degree of subjectivity and bias, given that each text will be assessed from a unique perspective by a different participant? Wouldn't it be less biased if the same researchers evaluated all the texts? Alternatively, could software sentiment analysis be used to ensure consistency in the evaluation process?
According to the Methods participants should only be able to complete the questionnaire on a smartphone. The authors attribute this to the absence of emoticons on PC. Since I don't use these networks, nor have I ever used them except on X, I consulted GPT4 :D and got this response: „Yes, you can use the same amount of emoticons on a PC as on a smartphone. Using a PC does not limit your ability to use emoticons on these platforms. You can express yourself just as freely and creatively as you would on a smartphone! 😊“ So how is it?
In the manuscript, I was unable to locate a clear justification for the use of the POMSS tool, or an explanation of how it relates to the research question.
I’m also curious as to why the authors didn’t consider a more precise measurement of social media usage frequencies. For instance, they could have used the time logs provided by social media platforms, if such a feature is available. This could potentially offer a more accurate assessment.
Reviewed by Julius Klingelhoefer, 16 Feb 2024
I would like to thank the authors for the opportunity to review this manuscript. I view the already existing exploratory work very positively and believe that the proposed study builds nicely upon it. My more detailed suggestions are listed below. Overall, I think the background section highlights the basic question, but the theory could be expanded and clarified in key areas, as indicated in my comments below. I like the vignette experiment that is suggested here, but I also identified some issues with the methodological approach that I believe should be addressed.
While I see some challenges, I believe that it is quite possible to revise these and would encourage the authors to address the suggested changes and further follow this line of investigation.
I hope that my comments can improve this registered report and wish the authors the best of luck.
Background
p. 4: When discussing self-expression across different platforms, I believe the background would benefit from employing an affordances perspective to ground it in a theoretical approach (for example, Steinert & Dennis, 2022). Some aspects seem to be touched upon but explicating and systematizing theoretical assumptions would be helpful in my opinion.
Sections 1.3 and 1.4 should include a more systematic overview of the mechanisms and effects at play within the different social media platforms. As it is now, the theoretical insights remain somewhat shallow. As the authors emphasize the importance of assessing specific platforms, a central contribution lies in the assessment of the platforms. I would suggest reviewing literature more in depth and explicating which mechanisms are at play in which context.
In the discussion of the exploratory study, the authors highlight that motivations are central and may even be more relevant than the factors assessed in the exploratory study. However, the planned research does not account for this possibility. It seems likely to me that motivations for sharing self-expression posts play an important role here, yet this is not addressed or measured in the proposed study.
Method
In the planned vignette experiment, there is no text as a control/baseline condition to assess a potential positivity bias. As in the exploratory study, I believe there should be a condition to compare the social media conditions to, i.e., a non-social media condition, for example a diary entry, a description like in the initial study, etc. Without this, it is only possible to assess positivity relative to other social networks but not in general. It thus seems impossible to assess H1, as it compares the valence of social media posts vs. event valence. In my opinion, it is not possible to assess positivity bias without a text, as Likert-type questions cannot plausibly be compared to the coded valence of a text.
In the footnote the authors define the difference between emoticons and emojis and argue that both are mostly used as complementary or surrogate to text. From this background, would it not make sense to include both emojis and emoticons in these analyses, as both can be used to express positive or negative feelings?
Regarding the method of recollecting an event, I am somewhat worried about the external validity of the instructions. The experimental situation of asking to post about an event is already different from natural online behavior. Yet, within the confines of the experiment, this seems like an appropriate choice. However, I am wondering how exactly the instructions will specify this. As most social media posts are about recent events, recalling an “early childhood” (p. 7) event may provide an unrealistic scenario. Similarly, writing a text may not be equally appropriate for every platform. For example, would participants post the text as a picture for Instagram, as it is an image-/video-based platform?
Another interesting research question could be whether positive and negative events are affected differently due to a positivity bias. This could be done by introducing another factor by specifying whether participants should recall a negative or positive event. However, I believe this is only something that might be considered for future research, as this would complicate the design and reduce power. It could also be an interesting exploratory question to look at differences between self-selected positive and negative events in the data.
I appreciate that a power analysis was conducted. However, I see issues with how this was reported or conducted. First, as a pre-study exists for the effects, I would expect the specific effect sizes discovered there to be used, not a rule of thumb (i.e. a small effect size). An effect size of r = .3 would be considered a medium effect based on the classic Cohen (1977) convention. Please indicate where this claim comes from or whether it is a mistake. To alleviate these concerns, I would suggest addressing these points and providing the power analysis scripts as open materials via the OSF folder.
P. 13: Valence: I appreciate the use of a qualitative analysis. However, with a registered report, I think it is vital that the instructions for the raters are described. I was further wondering whether the authors have considered using computational text analysis methods, e.g. dictionary approaches or language models, to assess the texts' valence or to validate the raters' decisions?
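For illustration, a minimal R sketch of the dictionary approach mentioned above (the syuzhet package and the example texts are assumptions; off-the-shelf lexicons are English-centred, so the study's texts might need translation or a matching-language lexicon):

library(syuzhet)

# Hypothetical example posts
texts <- c("Had a wonderful day at the beach!",
           "My train was cancelled again, such a waste of time.")

# Dictionary-based valence score per text (positive values = positive valence)
get_sentiment(texts, method = "afinn")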
P. 15: “For all analyses, results will be considered significant if less than .05”. Please specify that this means the p-value.
Results: I would suggest avoiding phrases like “tendential effects” (p. 11) to describe p-values between .05 and .1. If the pre-determined threshold is not met, the test is not significant (e.g., Gibbs & Gibbs, 2015).
Clarity
The authors should clarify which of the claims they make can – with limitations – be interpreted causally and which cannot. Some interpretations of associations imply causality when the direction of the effect is not known. For example, the authors write: “[…] age and social media frequency of use seem to impact self-expression” (p. 12). However, it could be the case that different types of self-expression cause differences in frequency of media use (e.g. from a Uses-and-Gratifications Perspective) and that age is associated with a specific history and cultural impressions that could be the cause of different social media habits. Similarly: “[…] results show that as age increases, the frequency of Instagram use […] and Twitter […] decrease, while the frequency of Facebook use increases […]” (p. 11). I think this sentence implies a false causality. Aging probably does not cause individuals to use Facebook more, it could be generational differences that explain higher Facebook use for older individuals.
I would suggest being specific about the constructs that are used and describing them accurately. One example of a lack of clarity about concepts/constructs can be seen in the heading “Effect of Social Media on the Number of Words”. Here, this would imply that whether social media is/was used or not, or to which degree, would be the independent variable. However, the analysis is about the social media platform. Something like “type of social media” etc. could be helpful.
There are several spelling errors in the script, e.g. “will be carry on”, “analysis” instead of analyses. Please conduct a thorough spell check of the manuscript.
Figures: Generally, I don’t think it is necessary to include “chart” in the caption, as this is redundant to the figure label.
Figure 1: I would suggest changing the caption from “Chart of text’s valence at time 1 and 2 according to the social media” to “Text valence at time 1 and 2 by social media” or similar. I would follow a similar naming scheme for the rest of the figures.
Literature
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed). Academic Press.
Gibbs, N. M., & Gibbs, S. V. (2015). Misuse of ‘trend’ to describe ‘almost significant’ differences in anaesthesia research. BJA: British Journal of Anaesthesia, 115(3), 337–339. https://doi.org/10.1093/bja/aev149
Steinert, S., & Dennis, M. J. (2022). Emotions and Digital Well-Being: On Social Media’s Emotional Affordances. Philosophy & Technology, 35(2). https://doi.org/10.1007/s13347-022-00530-6