Complexity of Shakespeare’s Social Networks
Using Shakespeare to Answer Psychological Questions: Complexity and Mental Representability of Character Networks
Abstract
Recommendation: posted 07 February 2024, validated 10 February 2024
Karhulahti, V.-M. (2024) Complexity of Shakespeare’s Social Networks. Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=489
Recommendation
Level of bias control achieved: Level 3. At least some data/evidence that will be used to answer the research question has been previously accessed by the authors (e.g. downloaded or otherwise received), but the authors certify that they have not yet observed ANY part of the data/evidence.
List of eligible PCI RR-friendly journals:
2. Thurn, C., Sebben, S. & Kovacevic, Z. (2024) Using Shakespeare to Answer Psychological Questions: Complexity and Mental Representability of Character Networks. In principle acceptance of Version 3 by Peer Community in Registered Reports. https://osf.io/6uw27
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #2
DOI or URL of the report: https://osf.io/s97y3
Version of the report: 20231130_PCI-RR_Stage1_Revision1.pdf
Author's Reply, 06 Feb 2024
Please see attachment for our replies to the Editor & Reviewer comments and for tracked changes. In the manuscript we also changed the spelling to American English (e.g., "theater" instead of "theatre"), which is not contained in the tracked changes file.
Decision by Veli-Matti Karhulahti, posted 06 Jan 2024, validated 07 Jan 2024
Dear Christian Thurn and colleagues,
Thank you for all careful revisions and detailed responses to previous feedback. Two reviewers were able to return to carry out another feedback round and they both were very satisfied with this improved version. They only had a few minor suggestions. I’ll let you consider that feedback in a final revision, but I won’t invite the reviewers anymore for a third round. I agree this version is good and near-ready for IPA.
I have only one comment of my own. This concerns H1.
- Because you're testing a confirmatory hypothesis, it would be good to explicitly justify why you expect a certain outcome in the Study 3 section before H1. Currently you write, "we are interested in how the number of characters in a play relates to the complexity... our goal is to understand the relation of the number of characters to the complexity of networks in theatre plays," which is an exploratory description. But your H1 is a confirmatory test ("We test the hypothesis that the number of characters positively predicts complexity (H1)") so it would be important to briefly recap in the Study 3 section why you predict a positive result.
- It feels to me that your assumptions ("we assume that plays are more likely to be well-received and popular if they make it possible for recipients to follow the narrative") predict the null instead of a positive correlation. I might be mistaken, but it would be worth clarifying what you expect and why.
- Because this is a hypothesis test, and PCI RR is very strict about justifying effect sizes in hypothesis tests, it would be good to have some kind of justification for r < .30s as small effects. I know it’s difficult to think about justification in this context. I also find it challenging to help with this—especially as I lack a comprehensive understanding of Kolmogorov complexity in the present context—but here are some ideas.
· One option would be to seek existing character network data in fiction and see what r=.3 looks like. E.g., in fiction (of any media), what is r=.3 in terms of complexity? Can you effectively separate actual works of fiction by complexity? This could help both you and readers grasp the raw effect size and justify it.
· Another option that comes to mind would be to simulate data with r=.3 and see how the effect size appears in these simulated instances. Being able to pinpoint reasonable raw differences even in simulated form would be better than nothing.
· A third option could be to select one actual play by Shakespeare, provide a description of its character network, and demonstrate a hypothetical raw change of .3 in practice.
· A fourth (meta)option would be to take a Bayesian approach and rely on (non-informative) priors. Some justification would still be nice to have but the basis of the rationale would be less problematic for inference.
· (The same concerns RQ4 but since it’s exploratory it doesn’t matter. Btw, also noticed you don’t mention alpha anywhere in the paper -- is it 5% throughout?)
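As a rough illustration of the simulation option mentioned above, the following sketch draws pairs of values with a population correlation of r = .30 and reports the sample correlation and the share of variance it implies. It is a minimal sketch under stated assumptions: both variables are standardized normal draws standing in for "number of characters" and "complexity", the function names `simulate_correlated` and `pearson` are hypothetical, and nothing here reflects the authors' actual data or analysis code.

```python
import math
import random


def simulate_correlated(n, r, seed=0):
    """Draw n pairs (x, y) of standard-normal values with population correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        xs.append(x)
        # Mixing x with independent noise yields corr(x, y) = r in the population.
        ys.append(r * x + math.sqrt(1 - r * r) * noise)
    return xs, ys


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical stand-ins: x = standardized number of characters, y = standardized complexity.
xs, ys = simulate_correlated(10_000, 0.30)
r_hat = pearson(xs, ys)
print(f"sample r = {r_hat:.3f}, shared variance = {r_hat ** 2:.1%}")
```

Rescaling the simulated values to the raw units of the real measures (for example, characters per play) would be one way to read off what a correlation of this size means in practice.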
Some of the above ideas may be unfeasible, so please read them primarily as food for thought; I hope they guide you to the best solution from your own topic-expert position. As a standard note, I refer you to PCI RR evidence thresholds and these two papers by Zoltan Dienes on specifying theoretically relevant effect sizes for statistical hypothesis testing:
https://doi.org/10.1525/collabra.28202
https://doi.org/10.1037/cns0000258
I sadly don’t have any good examples from theatre or literature, but if you wish to have practical examples of effect size justification from other fields, contact me and I will seek some from PCI RR archive. If you are unsure about anything else or wish to discuss, you can email (as usual) before submitting.
Best wishes,
Veli-Matti Karhulahti
Reviewed by James Stiller, 05 Jan 2024
This is a far more coherent proposed research plan with a clearer rationale as to why do the study.
The authors' responses to the feedback and the new plan have, from my perspective, addressed the key issues with the initial draft and the methodological concerns. As before, the analytical approach appears robust. I have made a few comments below, not all in relation to this stage of the process but as things to consider when interpreting findings and justifying the approach in their final write-up.
- There has been a clarification of the “slices” and what these are, that makes the use of the analytical approach more comprehensible and justifiable.
- The response to my questioning of the justification of the scene as opposed to entrances and exits was very clear, and I think the response mentioning that using scenes is more feasible/reproducible is actually a strong argument and could be used to justify this study further, as using entrances and exits will not necessarily produce the same results as scenes. Within that argument the strength of the scene is that it is not subject to individual interpretation of entrances and exits. However, I think their phrasing of “Please note that slicing a play based on exits and/or entrances automatically slices a play by scenes too” will ideally need unpacking in their final research, as there would potentially be network implications.
- In relation to the use of speaking characters, there will at some point need to be a justification and consideration in the interpretation of the results in terms of what constitutes a speaking character. Some characters, such as Lavinia in Titus Andronicus, will spend part of the play interacting with other characters (communicating) but not speaking, or in some plays there could be scenes with significant onstage action and minimal dialogue.
- They could perhaps make more of the applicability to literary analysis and the application of the validation approach, however, this can be done in the final stages/ write up.
Reviewed by Matúš Adamkovič, 05 Jan 2024
Dear Authors,
I was very pleased to read the revised Stage 1 protocol and your responses to the initial feedback. I'd like to acknowledge the substantial improvements made to the paper. The introduction, including the study’s rationale, is now well described. The methodology for the current four studies is sound and highly rigorous. Before suggesting in-principle acceptance, I’d like to request a few more edits and clarifications.
In several instances, the authors assume that the number of characters in theatre plays is mainly determined by recipients’ cognitive capacity. However, this upper limit for theatre plays might be influenced by other factors (e.g., historical and technical/pragmatic considerations). It might be worthwhile to briefly mention these alternative explanations.
A more accessible description of some of the concepts and methodology would benefit a broader readership. For instance, examples of how to apply Kolmogorov complexity (as a focal construct of the present study) in the context of theatre plays would be helpful. Specific examples applied to the context of theatre plays, along with brief insights into their potential implications, would also be very beneficial for Study 3. (Btw, I really liked your second operationalization of complexity described in Study 3.)
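To make the idea of algorithmic complexity more tangible, here is a minimal sketch that uses compressed length as a crude, computable stand-in for Kolmogorov complexity, which itself cannot be computed exactly. The choice of zlib compression, the 40 x 40 matrix size, and the two flattened co-presence patterns are all assumptions made for illustration; this is not the complexity estimator described in the manuscript.

```python
import random
import zlib


def compressed_length(bits):
    """Length in bytes of the zlib-compressed bit string (a rough upper-bound proxy)."""
    return len(zlib.compress(bits.encode("ascii"), 9))


n = 40  # hypothetical number of characters

# A fully regular pattern: every cell of the flattened matrix is 1.
regular = "1" * (n * n)

# An irregular pattern: each cell is 0 or 1 with equal probability.
rng = random.Random(1)
irregular = "".join(rng.choice("01") for _ in range(n * n))

print("regular pattern:  ", compressed_length(regular), "bytes")
print("irregular pattern:", compressed_length(irregular), "bytes")
```

The regular pattern can be described very briefly ("repeat 1"), so it compresses to far fewer bytes than the irregular one. That gap is the intuition behind treating a network with no describable regularity as more complex.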
In Table 2, the authors state “If many plays lie above 3*IQR, they represent outliers of high complexity. If no or very few plays lie outside 3*IQR then Shakespeare’s plays are not particular with regard to complexity.” Please specify the exact number of plays that you consider as “very few.”
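The outlier criterion quoted above could be sketched as follows. This assumes that "above 3*IQR" means above the upper fence Q3 + 3 × IQR, which is one common reading and may not match the authors' exact definition. The complexity values and the helper name `high_outliers` are hypothetical.

```python
import statistics


def high_outliers(values, k=3.0):
    """Return the values lying above the upper fence Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return [v for v in values if v > upper_fence]


# Hypothetical complexity scores for ten plays; the last one is deliberately extreme.
complexity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = high_outliers(complexity)
print(f"{len(outliers)} of {len(complexity)} plays lie above the upper fence")
```

Stating a concrete cut-off, such as a maximum count or proportion of plays returned by a check like this, is the kind of specification the comment asks for in place of "very few".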
After reading your responses, I would appreciate a summary of the perceived/expected limitations and challenges mentioned in the paper. I’m not insisting on this, as it's a somewhat unusual request at Stage 1, but reflecting on this at this stage could greatly improve transparency and increase confidence in the methodological choices and results.
All the best,
Matúš Adamkovič
Evaluation round #1
DOI or URL of the report: https://osf.io/nqf7e
Version of the report: PCIRR-Snapshot_ReplicationBattery.pdf
Author's Reply, 30 Nov 2023
See PDF please
Decision by Veli-Matti Karhulahti, posted 30 Aug 2023, validated 31 Aug 2023
Dear Christian Thurn and colleagues,
Thank you for submitting to PCI RR and your patience with a small delay. I am delighted to have received four reviews from diverse experts, including those of social networks, statistics, literature, and Shakespeare. The feedback is generally positive and I am personally excited to serve as a recommender for this genuinely interdisciplinary work. There are comments that need careful attention, however. I summarize some key points below and add a few of my own.
1. A primary worry coming from all four reviewers is that (to synthesize it in my own words) the “scientific goal” of the study is unclear. To be precise, this does not refer to how the research plan is presented — this is exceptionally clear — but rather what is the scientific question that the study wants to figure out. Do you wish to contribute to the theory behind Dunbar’s number? Do you wish to learn more about Shakespeare, drama, or character networks in fictional narratives? As the reviewers point out in different ways, the extended replication will surely yield new useful information, but it is not clear what that means. If the original study replicates or not, what can we deduce from that, theory- or otherwise? Especially because this is a carefully designed RR which allows robust tests of hypotheses and theories, it feels like a lot of potential value can be “wasted” without committing to theoretically risky interpretations. See the next comments for follow-up.
2. Although the MS explicitly says that it is not designed to test hypotheses (Bias control), there are several criteria set for different outcome interpretations and in some cases they even lead to falsifying certain theoretical positions (as the four RQs show in the end). On the other hand, this seems like very traditional hypotheses/theory testing, sometimes with clear H1/H0/undecided interpretations. It is a bit unclear how this is different and why it has been separated from hypothesis testing and/or confirmatory work. I will list more detailed examples next.
3. RQ1: “The theory is that Shakespeare’s plays and the ethnographic observation of human group size come from the same distribution.” Indeed, it is clear here that we are curious about similarity, statistically. Now, taking a few steps back, why is this similarity interesting? One could say, e.g., if similar, Shakespeare’s fiction accurately simulates real human social life (Dunbar's number serving as an auxiliary hypothesis for social life), but this would be unlikely to be true for reasons pointed out by the reviews, which show how such simulation appears to be very inaccurate if we look at details. One could alternatively say, as you hint on page 5, that “drama is especially effective if it mirrors reality” i.e., if similar, one of the reasons for Shakespeare’s success is that people are able to cognitively reflect on social networks, which are (on average) similar in size to theirs. Again, this seems unlikely for various reasons (which we don’t need to discuss here). In sum, there are interesting data and analyses, but we are not fully sure what the results will tell us (beyond statistical outcomes). The same applies to RQ2: “The theory that the average conversational clique size which is between three and four people can be found in Shakespeare plays”, and RQ4 “The theory that Shakespeare plays as dramas are in an Aristotelian view reflecting reality and show similar small world-properties in their networks.” I want to be very clear that it is fully ok to register exploratory analyses, and there is no need for confirmatory tests in RRs, but currently the MS is sitting between the two sides without having fully outlined the rationale (how do these exploratory analyses contribute to the literature, or what does it mean if a certain position/theory is falsified).
4. The fourth (anonymous) reviewer is an expert in Shakespeare as well as literature in general. Because the review was not submitted via the system, I am attaching it manually at the end of this recommendation. This (the most critical) reviewer is explicitly concerned that Dunbar’s number is not suitable for drama in general due to huge genre variation. If you agree and believe that this may be true, it is one of the possible hypotheses to test and, if corroborated, it could make a major contribution to the literature on fictional social networks and their analyses.
5. If you follow the reviewers’ suggestions to set smallest effect sizes of interest, please carefully justify the SESOI by some raw effect if possible; this is a recurring matter discussed in depth at PCI RR.
The reviewers also provide plenty of detailed comments on the design and methodology. Please consider them all carefully. I hope you find them useful and valuable in your revisions. Last, I want to stress that the value of this study, to me, is generally sufficient to be carried out even without the theoretical, pragmatic, or other contributions which most of my comments above address. I can see it can be a useful methodological exercise and resource for future scholars to learn from. However, I do hope you consider the above notes because with a medium effort, much more value could be generated.
If something is unclear or you wish to ask anything during the revision process, you can contact me directly for clarifications or checks.
Best wishes,
Veli-Matti Karhulahti
***
Reviewer 4 (anonymous)
Let me start by saying that I was asked to report on this research as a literary expert. I will thus not discuss the statistical side of the authors’ work, but only its potential interest and validity for literary analysis.
1A. The scientific validity of the research question(s)
The study they intend to replicate had virtually no value for literary study. The selection of the 10 plays made little literary sense, because a) it focused only on those plays that most conveniently agreed with the “Dunbar number”, and b) ignored what is crucial from the viewpoint of drama (and of dramatic networks), namely the difference in genre. On a) this replication may indeed prove useful, in testing, and almost certainly falsifying, the original study. On b) no, because the study shows a total disregard for dramatic genre. (Genre is meaningful because comedies always have a much higher density than tragedies, which have a much higher density than histories; ignoring this initial fact creates only confusion.)
In addition, in the (infrequent) moments in which the study mentions literature, its categories – and references – can only be described as primitive; even when they refer to quantitative and/or network analysis of drama they mention very peripheral studies, and ignore crucial ones – such as Yarkho’s on speech groups.
1B. The logic, rationale, and plausibility of the proposed hypotheses (where a submission proposes hypotheses)
I do not believe a hypothesis is being proposed.
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable)
I am not qualified to evaluate that.
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses
It might; I am not qualified to judge. But the question assumes that the original study deserves to be replicated – an assumption I personally consider groundless.
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
I am not qualified to evaluate that.
Reviewed by James Stiller, 30 Aug 2023
Apologies if this is a repeat of the review I submitted on the 21st Aug.
In brief, this was an excellent and thorough proposed plan of research. The proposed research is interesting, and it would be good to see how the findings of Stiller et al (2003) fare across all plays. I particularly welcome revisiting the network analysis measures and distribution models. It would be interesting to hear how the researchers envisage this research being extended beyond Shakespeare. The following comments are in no way a negative criticism of the research proposal but instead highlight areas where there is potential for more clarity and perhaps an opportunity for a stronger connection with the source material.
1) Overall, the proposal sounds like a thorough critical re-test of Stiller et al (2003), however, beyond critiquing Stiller et al (2003) it is not clear what the extension and application of this research is. Is it about providing a 'tidier' research approach, finding out more about Shakespeare (as a genre) or furthering our knowledge about studying cultural phenomena?
2) Slices: This could do with more justification as a rationale; there is a need for a more easily replicable approach, but this does have significant issues. In Table 1 it is unclear what options a and b are in the associated text, and it is therefore slightly confusing what they are trying to communicate. I may have misinterpreted, but they have equated scenes with exits and entrances (which they are not).
A more automated approach would remove selection error; however, the researchers from my interpretation have perhaps provided an over-simplification of the 'exits and entrances' as slices, suggesting a scene is the same. Across a scene you can get an idea of clique size, i.e. who a character can 'potentially' interact with and this would be easier in an automated analysis, however, you lose the dynamic information of the scene. A scene does not reflect the dynamics of the actual scene.
Historically, the nature of Shakespeare's plays was that they would have a limited cast playing multiple characters, and within one scene you could quite frequently get small-part characters that only interact with one or two characters and leave before others join the scene and never meet more than one character (e.g. messengers) – this is a common plot device to make sure the audience can track the intentions of different characters and understand that some characters, even in the same scene, might not have the same knowledge. This was a big issue in the Stiller et al analysis for A Midsummer Night's Dream, where characters in theory are on stage a lot of the time but actually asleep and not interacting. The judgement was made to treat this as an “exit”, as from a cognitive perspective for tracking drama they were not actively part of the scene. Therefore, there needs to be a clearer rationale for their term “slice” and for why a scene would be sufficient to capture such detail.
The proposed method will not pick up on these finer points and they are important as many characters are not that well connected and there tends to be only a few key characters per play (see Stiller and Hudson paper). The use of the scene as the slice misses the detail of the on-stage groups and for some plays that can result in exceptionally high connectivity that does not exist. I would therefore disagree that the scene is equivalent to the entrances and exits as this is not looking at the cognitive complexity of the plays and raises the question of what the goal of the research is. However, I am intrigued to see what they find out as the entrances and exits can vary from different editions of the plays and is undeniably an issue of interpretation (Stiller et al, did this based on having more than one folio/ edition of each play analysed).
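The concern about scene-level slices inflating connectivity can be illustrated with a small sketch. It builds an undirected co-presence network from a list of slices (sets of characters on stage together) and compares network density under two slicing schemes for the same hypothetical scene. The character names, the slices, and the helper functions are invented for illustration and do not come from any actual play or from the authors' pipeline.

```python
from itertools import combinations


def copresence_edges(slices):
    """Return the set of undirected edges linking characters who share a slice."""
    edges = set()
    for on_stage in slices:
        for a, b in combinations(sorted(on_stage), 2):
            edges.add((a, b))
    return edges


def density(slices):
    """Share of all possible character pairs that are linked by co-presence."""
    characters = set().union(*slices)
    n = len(characters)
    possible = n * (n - 1) // 2
    return len(copresence_edges(slices)) / possible if possible else 0.0


# One hypothetical scene treated as a single slice: everyone is linked to everyone.
scene_slices = [{"A", "B", "C", "Messenger"}]

# The same scene split by entrances and exits: the messenger meets only A, then leaves.
entrance_exit_slices = [{"A", "Messenger"}, {"A", "B", "C"}]

print("scene-based density:        ", density(scene_slices))
print("entrance/exit-based density:", round(density(entrance_exit_slices), 3))
```

In this toy case the scene-based network is fully connected, while the entrance/exit-based network links only four of the six possible pairs. This is the kind of difference the comment above anticipates for characters such as messengers.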
3) Analysis: The analysis sounds interesting, and it would be good to see the network analysis done in a more robust, automated way; the original paper was largely calculated by hand, as access to network analysis software or free statistical/modelling software was not available. The addition of the Jaccard index and additional small world/network analyses is also welcome. The Latent Class Analysis sounds like a very good way of evaluating the distributions; however, if they are aiming for a robust look at the similarities to naturally occurring networks/groups, it would be useful to have a clear overview of the data sources.
4) I would recommend avoiding the claim that Stiller et al (2003) stated facts (“As a criterion, Stiller et al. (2003) used the fact that “all the naturally occurring observations fall with the range of the ten plays, and within two standard deviations of their mean” (p. 400).”), as this misrepresents the original researchers' intentions. The paper was not fact but interpretation.
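For readers unfamiliar with the Jaccard index mentioned above, a minimal sketch follows: it is the size of the intersection of two sets divided by the size of their union. The example compares two hypothetical edge sets, for instance the same play sliced in two different ways; what the index is actually applied to in the planned analysis is not specified here, so that pairing is an assumption.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical edge sets from two ways of slicing the same play.
edges_by_scene = {("A", "B"), ("A", "C"), ("B", "C")}
edges_by_entrance = {("A", "C"), ("B", "C"), ("C", "D")}

print(jaccard(edges_by_scene, edges_by_entrance))
```

A value of 1 means the two sets are identical and 0 means they share nothing; here two of the four distinct edges are shared.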
5) Clarify how the proposed research and replication will be of use to understanding the plays:
As the researchers touch on Dunbar’s number etc, if this is part of what is being evaluated then there needs to be a bit more contextualisation. One of the points the researchers make about the cognitive load of the plays could be contextualised more. Stiller and Dunbar were interested initially in tracking intentional stances (who knows what about someone); this can be complex, and research such as Stiller and Dunbar (2006) has shown that people can struggle to do this in complex scenarios where multiple intentional stances need to be tracked. The structure of the play is not the only reason for the success of the plays, but it could play a part in making the unfamiliar appear more familiar. The small world network of the Shakespeare plays could provide a way to navigate this, as the key characters, those that are most connected and act as "weak links" between scenes, can be followed more confidently than less connected characters (obviously some poorly connected characters are essential to plot, e.g. the messenger in Romeo and Juliet). Subsequent research by Stiller on the plays of Agatha Christie (where often characters have 100% connectivity) shows that moving away from a small world structure can allow for complex storytelling, e.g. in detective work and tracking complex intentions, as this makes following viewpoints more cognitively demanding.
Reviewed by Matúš Adamkovič, 16 Aug 2023
Reviewed by Tomáš Lintner, 25 Aug 2023
The authors present research examining the overlap between humans' capacity to follow social relations and its presentation in cultural artefacts. The authors explain their intentions in a clear, concise, and transparent manner. The report provides sufficient justifications for the objectives and for the methods planned to be used.
There are, however, a few comments/questions I would like to share:
- the authors largely introduce their research on the paradigm of Dunbar's number. The authors provide a short introduction to the paradigm and support it with seminal works. However, at this stage, the authors neglect a large array of research standing in contrast to the paradigm of Dunbar's number. In future, it would be useful if the authors could briefly problematize the paradigm in the introduction/theory, and not just present it as given.
- on p. 6, the authors start describing Stiller et al.'s (2003) reports on the cliquishness and the small world properties which Stiller et al. (2003) relate to "naturally observable human social network properties". I find it hard to follow why it makes sense to analyze and interpret the connectivity within the play networks, in which the ties between the characters represent not their social relations but their occurrence on the stage at the same time, in the same manner as regular social connections between people would be analyzed and interpreted. As the authors themselves write, relying on Latora & Marchiori (2001), in real-world social networks, connectedness usually denotes the ability to spread information. However, the presence of characters on the stage at the same time creates a very different type of tie. It would be useful if the authors could clearly define what the ties within the plays represent and how the structural network properties created by these ties will be interpreted.
- on line 244 of the code and further, the authors plan to drop the "All" speaking characters. If I understand this correctly, this is because at these moments, all characters are speaking at the same time? If yes, I probably understand why the authors would want to drop all those occurrences, but I think it would be useful to describe that explicitly, because if I understand it correctly, it can influence the results of the analysis.
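To show why dropping "All" can matter, here is a minimal Python sketch (not the authors' R code) that removes speech records attributed to a collective "All" speaker and compares the per-scene speaker sets before and after. The record layout, scene labels, and character names are hypothetical.

```python
from collections import defaultdict

# Hypothetical speech records: (scene, speaker).
speeches = [
    ("1.1", "Bernardo"),
    ("1.1", "Francisco"),
    ("1.2", "Claudius"),
    ("1.2", "All"),
    ("1.2", "Hamlet"),
]


def speakers_by_scene(records):
    """Group the distinct speakers by scene."""
    grouped = defaultdict(set)
    for scene, speaker in records:
        grouped[scene].add(speaker)
    return dict(grouped)


# Drop collective speech attributed to "All" rather than to a named character.
kept = [(scene, speaker) for scene, speaker in speeches if speaker != "All"]

print("before:", speakers_by_scene(speeches))
print("after: ", speakers_by_scene(kept))
```

If "All" were kept as a node, it would appear as an extra character linked to everyone in the scene. Dropping it therefore changes both the character count and the number of ties, which is why an explicit description of this step is useful.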
Overall, the report is comprehensive and the R code makes sense. My main suggestions for improvements revolve around the future interpretation of data, since the authors will be dealing with a very different type of ties and the frequently used structural indices may have a different meaning compared to, for example, interpersonal relationships.