Testing cross-cultural difference in the emotionality and visual associations of music
Cross-cultural relationships between music, emotion, and visual imagery: A comparative study of Iran, Canada, and Japan [Stage 1 Registered Report]
Recommendation: posted 16 April 2024, validated 21 April 2024
Schwarzkopf, D. (2024) Testing cross-cultural difference in the emotionality and visual associations of music. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=416
Recommendation
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
- Advances in Cognitive Psychology
- Collabra: Psychology
- Journal of Cognition
- Peer Community Journal
- PeerJ
- Psychology of Consciousness: Theory, Research and Practice
- Royal Society Open Science
- Studia Psychologica
- Swiss Psychology Open
- WiderScreen
1. Hadavi, S., Kuroda, J., Shimozono, T., Leongómez, J. D. & Savage, P. E. (2024). Cross-cultural relationships between music, emotion, and visual imagery: A comparative study of Iran, Canada, and Japan. In principle acceptance of Version 6 by Peer Community in Registered Reports. https://osf.io/zdnkm
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #4
DOI or URL of the report: https://psyarxiv.com/26yg5/
Version of the report: 5
Author's Reply, 15 Apr 2024
Decision by D. Samuel Schwarzkopf, posted 08 Apr 2024, validated 09 Apr 2024
Dear authors
As you can see, the final reviewer has now recommended in-principle acceptance of your Stage 1 RR manuscript. However, I notice that there are still some issues to be fixed before I can grant IPA. Specifically, under Analysis Plan 1.2 you list the same hypothesis twice. There may be other issues that I might have missed, so please carefully check the manuscript.
Moreover, looking through the Appendix with the survey materials it occurred to me that the survey is considerably longer than one would expect from reading your Methods, probing aspects such as colour associations, emotion type, etc, and the Goldsmiths Musical Sophistication Index. Given this is therefore a considerably richer data set than strictly needed to address your preregistered hypotheses, some caution is advised. Obviously, it is perfectly fine to collect additional measures for further exploratory analyses, which can possibly be included in Stage 2 as long as they are explicitly labelled. It is also sensible to limit the preregistration to clear-cut predictions you can make now.
However, given the large amount of additional information it is important to avoid the scenario where the Stage 2 article consists mostly of exploratory analyses. This would in my view undermine the purpose of conducting a Registered Report in the first place. One could also argue that the methods description should contain the necessary details of all the measures being collected. I apologise for not spotting this earlier (but I am somewhat reassured by the fact that the reviewers didn't either).
Given the late stage and the fact that the preregistered study itself is now quite straightforward, I don't think this requires an extensive revision. For the Goldsmiths questionnaire, a short sentence with the appropriate citation explaining its inclusion will suffice. For the other measures in your actual survey, you could possibly even keep that very succinct, simply stating that the survey contains other measures, briefly list what they are, and that they are collected for future hypothesis generation etc but will not be presented in the Stage 2 article. If you plan on exploratory analyses of any of these data, more detail about them should however be provided in the Methods. Even then, I don't think this calls for extensive rewrites - these are simple survey questions that can be described easily.
Keep in mind that all exploratory analyses in Stage 2 will be subject to peer review and could possibly constitute grounds for rejection.
As always, please contact me directly if you have any questions or comments on this.
Best wishes
Sam Schwarzkopf
Reviewed by Elena Karakashevska, 08 Apr 2024
This is a revision of a Stage 1 manuscript. Below is my original summary of the manuscript, which still stands:
This is a potentially interesting paper. The authors have noticed the lack of research on cross-cultural links between visual imagery, music and emotion. They aim to add to the literature with a cross-cultural study that tests for differences across cultures in emotional arousal and perceived density of visual imagery by manipulating tempo in solo and group performance pieces. The authors aim to conduct this study online and compare within-subject effects only.
Review
This is a compelling revision. The authors have done a good job of addressing concerns raised in the initial round of reviews. I have no major remaining concerns.
Evaluation round #3
DOI or URL of the report: https://psyarxiv.com/26yg5/
Version of the report: Version 3
Author's Reply, 28 Mar 2024
Decision by D. Samuel Schwarzkopf, posted 30 Sep 2023, validated 02 Oct 2023
Dear authors
Your revised RR Stage 1 manuscript has now been reviewed by the same two previous reviewers. Their comments are generally positive but as you see one reviewer still has several substantial concerns that I would ask you to address in another round of review. Some aspects are probably judgement calls, in particular the question of the effect size of interest and the resulting necessary sample size. As suggested by the reviewer, I may make a final decision about granting in-principle acceptance based on that in the next round.
Both reviewers also raised concerns about the statistical approach and it took me a while to figure out why. In section 1.2 "Analysis Plan" you still refer to paired t-tests instead of the new analysis approach you are now using. (I stared at the relevant page for some time without noticing it, so I thank the reviewers and Chris Chambers for pointing it out!) I assume this is a mistake and can be fixed easily. But please go through the whole manuscript carefully to ensure all parts are up-to-date as this is the version that will define your preregistered methodology.
Best wishes
Sam
Reviewed by Elena Karakashevska, 19 Sep 2023
Reviewed by Nadine Dijkstra, 28 Sep 2023
I thank the authors for addressing most of my comments. My only remaining issue is with the proposed analyses: three paired t-tests to determine whether the same effect is present in each country. To establish whether there are differences between the countries, they need to be compared within the same statistical test. One option is already pointed out by reviewer 3: an ANOVA with country as a between-subject factor.
Evaluation round #2
DOI or URL of the report: https://psyarxiv.com/26yg5/
Version of the report: Version 2
Author's Reply, 30 Aug 2023
Decision by D. Samuel Schwarzkopf, posted 12 May 2023, validated 12 May 2023
Dear authors
Your Stage 1 RR manuscript has now been reviewed by three experts in the field. You can read their detailed comments on the PCI:RR website. Based on their comments, the manuscript is not yet ready for in-principle acceptance by PCI:RR.
Please submit a revised version of the manuscript, including one with tracked changes, and a point-by-point response to each reviewer's comments. I summarise the main points here, and also flag up a small but important typographic error.
Rationale:
I concur with the reviewers' concerns that the rationale and the whole premise of the study lack clarity. For example, it should be explained early in the manuscript what is meant by "visual density", how this relates to visual imagery, and why you expect that to depend on music.
Statistical approach:
Moreover, all reviewers raise issues with the statistical approach that parallel my own comments from pre-screening. The series of t-tests does not seem to accurately reflect the hypotheses. An effect across all cultures could be reflected in a significant main effect in an omnibus test. But as you see, reviewer Juan David Leongómez has an alternative suggestion and even provided some example code to be used for a potentially better-suited approach. I suggest you develop these proposals further - the reviewer offered to assist with this but of course you may wish to also draw upon the expertise of others.
Sample size:
One reviewer calculates a sample size for your expected effect size of n=41 rather than 14, which makes me wonder if the 14 you've used is a typo that permeated through the manuscript? All reviewers raised a concern with this small sample size. You seem to expect a relatively large effect size that is perhaps justified for the arousal and visual density ratings, but for any difference between cultures I would expect this to be a lot smaller. As mentioned by several reviewers, given the fact you are planning online data collection, it should be entirely feasible to collect a much larger sample than n=14?
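To illustrate how strongly the required sample size depends on the assumed effect size, here is a minimal sketch using the standard normal approximation for a two-sided paired test. The function name `required_n` and the example inputs are assumptions for illustration. This is not the calculation performed by the authors or the reviewers, and an exact t-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist


def required_n(d, alpha=0.05, power=0.95):
    """Approximate number of pairs needed for a two-sided paired test.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardised mean of the paired differences.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)


# Hypothetical inputs: the d = 0.4 discussed in this report, then half that effect.
for d in (0.4, 0.2):
    print(d, required_n(d))
```

Under this approximation, halving the effect size roughly quadruples the required number of pairs, which is why a smaller expected between-culture difference would call for a much larger sample.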
Typographic error:
Finally, there is still an error I already flagged up during prescreening: Under hypothesis 1.2.2 you talk about 'arousal ratings' instead of visual density judgements.
Note on one review:
Judging by their comments, one reviewer may have commented on the originally submitted version of the manuscript rather than the revision after my pre-screening. So some of their points may no longer be relevant. But it is also possible that I missed issues in my (hasty) pre-screening. Please check the comments carefully and respond to each point by this reviewer to clarify.
Best regards
Sam Schwarzkopf
Reviewed by Juan David Leongómez, 11 May 2023
This is a Stage 1 Registered Report that aims to investigate the relationship between emotional arousal and visual density induced by six musical excerpts differing in tempo and texture (solo vs group) in participants from Iran, Canada, and Japan. The study design is relatively simple and straightforward, with clear independent and dependent variables, and I commend that the authors commit to best practices in cross-cultural studies, as well as including both participant samples and music excerpts from non-WEIRD countries.
However, the authors acknowledge that the study violates some of the assumptions of the statistical analysis, such as using 5-point Likert scales instead of normally distributed continuous data, and the 6 paired responses from each participant not being independent of one another. For this, I would like to offer some recommendations and took the liberty of doing a simulation-based power analysis in R for different models, that hopefully will assist the authors in this regard. Once this limitation is addressed, I think the authors should move forward and start data collection.
Statistical power and test
I appreciate the power analysis conducted by the authors, and their direct statements regarding limitations. Also, the decision to test each hypothesis three times (once per country) and to confirm predictions only if all three tests are significant is sensible. However, there are several important issues here:
First, it is not good practice to conduct separate analyses and infer differences between populations from them (see a summary in the section ‘Interpreting comparisons between two effects without directly comparing them’ in Makin & Orban De Xivry, 2019). An omnibus test to test the significance of the main effect of tempo (irrespective of group) and any interaction (in case the effect differs by culture) is an alternative, but maybe not the best.
Second, treating a 5-point Likert scale as if it were normal is problematic. Not only is it a discrete variable, but it also involves a finite set of possible values. This should be modelled as an ordinal scale. For this, there are several options, including ordinal logistic regression.
However, there is also the problem that (as the authors mention), there are 6 paired responses from each participant. This could be addressed by using a generalised mixed model, with random effects for each participant. For an ordinal dependent variable, this could be a Cumulative Link Mixed Model. In R, this can be achieved, for example, using the clmm function from the ordinal package.
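As a rough illustration of what a cumulative link model with a per-participant random intercept assumes, the sketch below computes the probability of each rating on a 5-point scale under a cumulative logit parameterisation, P(Y <= k) = logistic(theta_k - eta). All numerical values (thresholds, tempo effect, participant intercept) are hypothetical, and the code only evaluates the model's probabilities; it does not fit a model and is not a substitute for clmm.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(eta, thresholds):
    """P(Y = k), k = 1..K, under a cumulative logit model.

    P(Y <= k) = logistic(theta_k - eta) for ordered thresholds
    theta_1 < ... < theta_{K-1}; eta is the linear predictor.
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical values, for illustration only.
thresholds = [-2.0, -0.5, 0.5, 2.0]  # four cut-points for a 5-point scale
tempo_effect = 0.8                   # assumed shift for the faster tempo
participant_intercept = 0.3          # assumed random intercept of one participant

slow = category_probs(participant_intercept, thresholds)
fast = category_probs(participant_intercept + tempo_effect, thresholds)
print([round(p, 3) for p in slow])
print([round(p, 3) for p in fast])
```

A positive tempo effect shifts probability mass towards higher ratings without treating the distances between rating categories as equal, which is the main attraction of this family of models for Likert data.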
Finally, in their response to the Triage at pre-screen, the authors mention that they could not find an appropriate test that would also allow them to perform the appropriate matching power analysis. This is absolutely true, but there is always the possibility of doing a simulation-based power analysis, which I attempted and will try to summarise:
In this case, I simulated a population of 60,000 values, in two conditions (A and B, which could be low and high tempo), from binomial distributions in 5 attempts (meaning each “person” could get a score from 0 to 4), but I modified the likelihoods, so that conditions A and B have different distributions, and added 1 to each value so that the possible results range from 1 to 5. Then, I randomly assigned a participant ID to 6 values for both the A and B conditions. Finally, to match the authors' assumptions, I made sure that the (paired) difference between conditions was d = 0.4 (Cohen’s d is NOT an appropriate effect size in this case, but it serves the point).
Then, I simulated 1,000 samples from that population, and tested the power reached with different sample sizes (basically, the number of simulations in which the p value was below α), using 4 different statistical tests/models:
- t-test (not considering the multiple responses from each participant): with a sample size of 82, a power of 0.946 was obtained
- Wilcoxon signed-rank tests (not considering the multiple responses from each participant): with a sample size of 83, a power of 0.951 was obtained.
- Linear Mixed Model (considering the multiple responses from each participant): with a sample size of 14 participants (6 paired responses per participant), the power was 0.961. These models were fitted with the call: lmer(Value ~ Condition + (1 | Participant))
- Cumulative Link Mixed Model (considering the multiple responses from each participant): while I was able to fit the model on single samples, I was not able to run 1,000 simulations as before (or even more than 3, for reasons I have not yet understood) and quite literally ran out of time to submit my review. So, sadly, I haven’t managed to run a proper simulation-based power analysis for Cumulative Link Mixed Models, but with more time (or, even better, the help of a statistician) this would definitely be possible. While I am not an expert, and I need time to make progress in this, I would be happy to assist the authors in this if needed.
This simulation, including all the code, is attached as an RMD file.
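The general idea of a simulation-based power analysis can be sketched as follows. This is a simplified Python illustration and not the attached R code: it draws 1 to 5 ratings as Binomial(4, p) + 1, treats each pair as independent (like the first test in the list above), and uses a normal approximation to the paired t-test to stay within the standard library. The probabilities 0.5 and 0.64 are assumptions chosen so that the standardised paired difference is close to 0.4; the function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def likert(rng, p):
    """Draw one rating from 1 to 5 as Binomial(4, p) + 1."""
    return 1 + sum(rng.random() < p for _ in range(4))


def paired_z_pvalue(diffs):
    """Two-sided p-value for mean(diffs) = 0 (normal approximation)."""
    sd = stdev(diffs)
    if sd == 0:
        return 1.0 if mean(diffs) == 0 else 0.0
    z = mean(diffs) / (sd / len(diffs) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(n, p_a, p_b, alpha=0.05, n_sims=1000, seed=1):
    """Share of simulated samples of n pairs with p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        diffs = [likert(rng, p_b) - likert(rng, p_a) for _ in range(n)]
        if paired_z_pvalue(diffs) < alpha:
            significant += 1
    return significant / n_sims


# Assumed conditions: A with p = 0.5, B with p = 0.64.
for n in (20, 40, 82):
    print(n, estimate_power(n, 0.5, 0.64))
```

The same loop works for any model that returns a p-value, so the test inside it could in principle be replaced by a mixed model fit once a suitable implementation is available.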
Minor suggestions
P.1, second paragraph of the introduction: I think “and” is missing in “Audiovisual associations are shown to be mediated by “psychological and socio-cultural” elements (Taruffi and Kussner, 2022), musical training (Kussner and Leech-Wilkinson, 2014), and language (Dolschied et al., 2022).”
P.1, second paragraph of the introduction: Probably the authors meant to say tempo instead of time
P.2, second paragraph: The authors provide information about dimensional models, but not about categorical models. I suggest adding a short description/example with relevant citations.
P.7, second paragraph: “Texture one consists of one horizontal line in a circle. (Fig 2) The subsequent textures…”. I think the period should be after the citation of Fig. 2
References
Makin, T. R., & Orban De Xivry, J.-J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. ELife, 8, e48175. https://doi.org/10.7554/eLife.48175
Reviewed by Nadine Dijkstra, 09 May 2023
Reviewed by Elena Karakashevska, 12 May 2023
Evaluation round #1
DOI or URL of the report: https://psyarxiv.com/26yg5/
Version of the report: Version 1
Author's Reply, 28 Apr 2023
Dear Dr. Schwarzkopf,
Thank you very much for your prompt screening and providing your valuable feedback. We appreciate your detailed review and advice and have made revisions accordingly. Kindly find our responses for each comment below.
1. Hypotheses: Please ensure that all hypotheses listed in the text match those in your Design Table. Also consider if the statistics used to test the hypotheses are appropriate. Your text describes the hypotheses as testing for correlations, but your statistical approach does not explicitly test for correlations. I appreciate that you could informally phrase the research questions this way but the hypotheses should match and explicitly describe the analysis used. In the same vein, please also spell out your inference plan in the main text to match what is explained in the Design Table.
We revised the hypothesis and inference plan in the text to match the table.
2. Statistical approach: You propose a series of three paired t-tests for each cultural group (plus possible equivalent tests). Consider using an omnibus test to test the significance of the main effect of tempo (irrespective of group) and any interaction (in case the effect differs by culture). I believe this would be more sensitive. Alternatively, please provide a clear justification for the approach adopted here. You also write that the use of Likert scales means that the parametric statistical tests are inadequate. You may consider using non-parametric alternatives like Kruskal-Wallis and Wilcoxon signed-rank tests. However, I also appreciate that parametric statistics are commonly used for such ordinal data and this use may not necessarily pose a major problem. Nevertheless, if possible I would seek out the advice of a statistical expert.
Thanks for this excellent advice. However, a quick search of options did not allow us to decide on the most appropriate way of doing this that would also allow us to perform the appropriate matching power analysis, and Japan is beginning a week-long “Golden Week” holiday that means we will be unable to recruit and consult an appropriate expert within a timeline that would be compatible with the scheduled review process. Therefore we have added the following footnote indicating this to the manuscript and hope that reviewers might be able to help us weigh in on this decision:
NB: We are also open to revising the planned analysis/power analysis to include potential alternative statistical approaches should they be considered more appropriate. Possible alternatives identified include:
-an omnibus test to test the significance of the main effect of tempo (irrespective of group) and any interaction (in case the effect differs by culture);
-non-parametric tests like a Kruskal-Wallis and Wilcoxon signed-rank tests;
-ordinal regression; and/or
-two-way within-sample ANOVA
3. Exploratory analyses: In a RR Stage 1 submission, all planned tests should be formulated as preregistered hypotheses with a detailed statistical analysis plan, supported by power analysis. The RR format does not preclude carrying out exploratory analyses at Stage 2, provided they will be labeled explicitly. However, this does not belong in the Stage 1 manuscript. Please remove the Exploratory Analyses section or add these tests as fully preregistered hypotheses to the manuscript and Design Table, with appropriate power analysis. This same point applies to any comparison between solo and group excerpts. It is fine to describe these differences in your pilot data, but if you plan to conduct this comparison you will need to add specific hypotheses for this supported by power analysis, or only add these as exploratory analyses in Stage 2 but not mention them in Stage 1.
We have excluded this section from our stage 1 submission and will add it to the stage 2 accordingly.
4. Power analysis: On this note, you state in the Power Analysis section on page 6 that the minimally interesting effect size is d=0.04 but later on and in your Design Table you power the experiment for d=0.4. I assume this first instance is a typo? The power analysis is based on a conventional alpha=0.05 (Bonferroni corrected for number of hypotheses) but 95% power. This is fine and the decision is yours, but please note that many journals accepting RRs for publication have specific power requirements, most commonly 90% power with alpha=0.02.
Thanks for catching the typo - both should be d=0.4. Thank you for noting the journal requirements - we are happy with our initial plan for now.
5. Missing methods details:
a) Please add details about the visual stimuli used in the experiment, such as the dimensions of the patterns, how they are generated, etc. The methods should make independent replication possible. Since your data collection is online, I appreciate that some parameters (like visual angle or luminance) are outside your control - but at least relative parameters can be reported in detail. My suggestion is also to add a figure showing the texture patterns. I may be missing something but I also do not follow how X2+1 results in 31 lines (assuming X is the density scale variable X=5 in this case?)
We have included more information about the visual textures including moving the Figure showing these from the Appendix to the main text, and adding the formula for density increase to facilitate replication.
b) Under Exclusion Criteria please explain what you mean by participants who "most associate with the culture of the country" and how this is determined.
Thanks for pointing out this ambiguous phrasing: we have omitted this phrase in the revised version.
c) Under Materials it is unclear to me why there are 24 excerpts. With 6 excerpts (one solo one group, from each of the 3 countries) and two tempos, aren't there only 12 excerpts?
The additional 12 excerpts were for the exploratory analysis of pitch, but since these have now been removed from the manuscript we have also removed them in the revised version.
d) Also under Materials, please rephrase the first sentence to clarify that this is effectively an online survey participants "will fill out".
Thank you for pointing this out. We revised the opening sentence to read “...will be conducted as an online survey and participants…”
6. Error in Analysis Plan 1.2: Please remove the "emotion categories" from the first sentence in this section as this refers to potential exploratory analyses but is not preregistered. Also, under 1.2.2 Temporal-density correlation you state that arousal ratings are being collected instead of visual density.
Thank you for catching this typo. We have corrected the phrase to read “emotional arousal” rather than “emotion categories”.
7. Phrasing of hypotheses: In the text you refer to the hypotheses as testing the effect of tempo "changes". Technically the experiment compares stimuli with different tempos not specifically the change in tempo. This is a minor point and you may want to keep it as is - just food for thought.
We have revised the wording to reflect your advice. It now reads “Increasing tempo consistently increases emotional arousal across cultures…” Thank you for your suggestion.
We look forward to hearing from you and hope the explanations above clarify the manuscript. As we are going to be on leave until next week due to the Golden Week holiday in Japan, we thought it would be more efficient to make as many changes as possible now and send the draft in time for our scheduled review process, instead of waiting a week before making further changes. We hope that this revised version will meet the screening requirements to proceed to the next step.
Thank you very much for your time and consideration.
Sincerely,
Shafagh Hadavi and Dr. Patrick Savage (On behalf of all authors)
Decision by D. Samuel Schwarzkopf, posted 27 Apr 2023, validated 28 Apr 2023
Dear authors
Thank you for your submission of your manuscript to this scheduled RR review. We regularly prescreen RR submissions to ensure they are ready for review. In this case, there are several issues that need to be addressed before I can invite the scheduled reviewers.
First up, let me apologise that I am writing this in some haste. This is the first scheduled review I am handling and the tight schedule means that I want to get these comments to you as soon as possible so you can redraft the submission in time for the reviewers to conduct their review. In your revision, please include a quick response to each of my points but I won't ask you to write long-winded prose here - bullet point replies will suffice. However, please include a version of the revised manuscript with tracked changes on the system to facilitate the screening.
1. Hypotheses: Please ensure that all hypotheses listed in the text match those in your Design Table. Also consider if the statistics used to test the hypotheses are appropriate. Your text describes the hypotheses as testing for correlations, but your statistical approach does not explicitly test for correlations. I appreciate that you could informally phrase the research questions this way but the hypotheses should match and explicitly describe the analysis used. In the same vein, please also spell out your inference plan in the main text to match what is explained in the Design Table.
2. Statistical approach: You propose a series of three paired t-tests for each cultural group (plus possible equivalent tests). Consider using an omnibus test to test the significance of the main effect of tempo (irrespective of group) and any interaction (in case the effect differs by culture). I believe this would be more sensitive. Alternatively, please provide a clear justification for the approach adopted here. You also write that the use of Likert scales means that the parametric statistical tests are inadequate. You may consider using non-parametric alternatives like Kruskal-Wallis and Wilcoxon signed-rank tests. However, I also appreciate that parametric statistics are commonly used for such ordinal data and this use may not necessarily pose a major problem. Nevertheless, if possible I would seek out the advice of a statistical expert.
3. Exploratory analyses: In a RR Stage 1 submission, all planned tests should be formulated as preregistered hypotheses with a detailed statistical analysis plan, supported by power analysis. The RR format does not preclude carrying out exploratory analyses at Stage 2, provided they will be labelled explicitly. However, this does not belong in the Stage 1 manuscript. Please remove the Exploratory Analyses section or add these tests as fully preregistered hypotheses to the manuscript and Design Table, with appropriate power analysis. This same point applies to any comparison between solo and group excerpts. It is fine to describe these differences in your pilot data, but if you plan to conduct this comparison you will need to add specific hypotheses for this supported by power analysis, or only add these as exploratory analyses in Stage 2 but not mention them in Stage 1.
4. Power analysis: On this note, you state in the Power Analysis section on page 6 that the minimally interesting effect size is d=0.04 but later on and in your Design Table you power the experiment for d=0.4. I assume this first instance is a typo? The power analysis is based on a conventional alpha=0.05 (Bonferroni corrected for number of hypotheses) but 95% power. This is fine and the decision is yours, but please note that many journals accepting RRs for publication have specific power requirements, most commonly 90% power with alpha=0.02.
5. Missing methods details:
a) Please add details about the visual stimuli used in the experiment, such as the dimensions of the patterns, how they are generated, etc. The methods should make independent replication possible. Since your data collection is online, I appreciate that some parameters (like visual angle or luminance) are outside your control - but at least relative parameters can be reported in detail. My suggestion is also to add a figure showing the texture patterns. I may be missing something but I also do not follow how X2+1 results in 31 lines (assuming X is the density scale variable X=5 in this case?)
b) Under Exclusion Criteria please explain what you mean by participants who "most associate with the culture of the country" and how this is determined.
c) Under Materials it is unclear to me why there are 24 excerpts. With 6 excerpts (one solo one group, from each of the 3 countries) and two tempos, aren't there only 12 excerpts?
d) Also under Materials, please rephrase the first sentence to clarify that this is effectively an online survey participants "will fill out".
6. Error in Analysis Plan 1.2: Please remove the "emotion categories" from the first sentence in this section as this refers to potential exploratory analyses but is not preregistered. Also, under 1.2.2 Temporal-density correlation you state that arousal ratings are being collected instead of visual density.
7. Phrasing of hypotheses: In the text you refer to the hypotheses as testing the effect of tempo "changes". Technically the experiment compares stimuli with different tempos not specifically the change in tempo. This is a minor point and you may want to keep it as is - just food for thought.
I look forward to receiving your revised manuscript in due course.
Best wishes
Sam Schwarzkopf