Capturing Perspectives on Responsible Research Practice: A Delphi Study

Recommended by Charlotte R. Pennington and Maanasa Raghavan, based on reviews by Moin Syed, Veli-Matti Karhulahti, Thomas Evans, Priya Silverstein and Sean Grant
A recommendation of:

Mapping Cross-Disciplinary Perspectives on Responsible Conduct of Research: A Delphi Study


Submission: posted 19 May 2023
Recommendation: posted 31 May 2024, validated 02 June 2024
Cite this recommendation as:
Pennington, C. and Raghavan, M. (2024) Capturing Perspectives on Responsible Research Practice: A Delphi Study. Peer Community in Registered Reports.


The responsible conduct of research (RCR) is crucial for the health of the research ecosystem: high-quality research should lead to more credible findings and increase public trust. However, the dimensions and responsibilities that make up RCR differ across disciplines, which can learn from one another to ensure rigorous, transparent, and reliable research and to foster a healthier research culture.
Bridging this gap, in their Stage 1 Registered Report, Field and colleagues (2024) outline their plans for a large-scale Delphi study to evaluate academics' perceived levels of importance of the most crucial elements of RCR and how these align and differ across disciplines. First, they plan to assemble a Delphi panel of RCR experts across multiple disciplines who will evaluate a list of RCR dimensions and suggest any additions. Then, these same panellists will judge each RCR dimension on its importance within their discipline of expertise, with iterative rounds of ratings until stability is reached. In this latter phase, the goal is to probe which items are more broadly appreciated by the sample (i.e., those that are perceived as a universally valuable RCR practice), versus which might be more discipline-specific. The findings will present the median importance ratings and categories of response agreement across the entire panel and between different disciplines. Finally, to contextualise these findings, the team will analyse open-ended text responses with a simple form of thematic analysis. From this, the team will develop a framework, using the identified RCR dimensions, that reflects the needs of the academic community.
By mapping a broader multidisciplinary perspective on RCR, this research will fill the gap between the two extremes that existing conceptualisations of RCR tend to fall under: high-level frameworks designed to be universally applicable across all disciplines (e.g., the Singapore Statement on Research Integrity) and prescriptive guides tailored to the practical instruction of researchers within a specific discipline or field (e.g., RCR training designed for members of a university department). The hope is that this will stimulate a more nuanced understanding and discussion of cross-disciplinary conceptions of RCR.
Five reviewers with relevant field expertise assessed the Stage 1 manuscript over two rounds of in-depth review. Based on detailed and informed responses to the reviewers’ comments, the recommenders judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.  
List of eligible PCI RR-friendly journals:
Field, S. M., Thompson, J., van Drimmelen, T., Ferrar, J., Penders, B., de Rijcke, S., & Munafò, M. R. (2024). Mapping Cross-Disciplinary Perspectives on Responsible Conduct of Research: A Delphi Study. In principle acceptance of Version 3 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2

DOI or URL of the report:

Version of the report: 2

Author's Reply, 24 May 2024

Decision by Charlotte R. Pennington and Maanasa Raghavan, posted 19 Apr 2024, validated 20 Apr 2024

Dear Sarahanne M. Field and co-authors,

I have now received the reviews from four experts who originally reviewed your Stage 1 Registered Report and have undertaken a review of your revised manuscript myself. All reviewers agree that the manuscript is much improved and that the main concerns present in the first submission have been carefully resolved. Two reviewers, however, still have some points for consideration that will strengthen the study by clearly outlining the research questions and methodology and by mitigating remaining concerns about participant anonymity. I summarise these main points below, but note that the reviewers also recommend some more minor revisions that you will want to consider.

  • Clearly state the Research Questions in the Introduction (to satisfy review criterion 1A)
  • Consider a minimum number of panellists to represent each of the four UoA panels
  • Consider clarifying the first paragraph to improve readability
  • Include a protocol for ‘pseudonymising’ data to ensure participants’ data remain anonymous
  • Link the initial reference document in the main manuscript (and not just within the response to reviewers)
  • Include the rules/processes applied to the inclusion of new dimensions 
  • A personal reflexivity statement (as recommended by Thomas Evans) [although I note that this is present in your supplementary materials on your OSF Project Page, so you may wish to simply refer the reviewer to this or make this point more salient in your main manuscript].
  • Consider and mitigate risks associated with breaking down results by discipline


Personal reflections: 

  • The Abstract states “Responsible conduct of research (RCR) is generally agreed to be a laudable goal”, but I would argue it is a necessary and imperative goal given that research is a public good. I recommend reconsidering the use of the word ‘laudable’ within this opening sentence. 
  • The following part of this sentence within the Abstract is also a little cumbersome: “then will proceed with several rounds of rating the importance of each dimension to particular discipline”. 
  • You state within the eligibility criteria that participants must have published at least one article including the following key words: “RCR”, “RRI”, “responsible research and innovation” “research integrity” or “responsible research”. I understand this is based upon your scoping review, but do you not think that the key words of ‘open research’, ‘open science’, and/or ‘open scholarship’ could also be applicable here?
  • There seems to be an inconsistency in the section on panel size. In the first paragraph you state “Ideally, we would recruit two to three members of each of the 34 UoA’s, in order to maximise the chance that the outcomes of the study are in fact disciplinary differences instead of personal differences”, which would mean approximately 68 participants invited (2 from each of the 34 UoAs). However, in a later paragraph you state: “As such, we have set the minimum panel size at the start of the process to be 30 panellists.” If, as you suggest, you successfully recruit 30 panellists, can you be confident that the results will represent disciplinary differences rather than personal differences? Can this section be clarified?
  • Recall that at Stage 2, the main aspects of the Stage 1 manuscript cannot be revised/changed (other than, for example, minor tense changes from future to past). For this reason, you may wish to reconsider how the section titled “expected results” is written.

If you can provide a thorough consideration and detailed Response to Reviewers document alongside your revised manuscript, I see no reason to send this manuscript back out for a third round of peer review; I hope this is reassuring given your stated timelines in your most recent cover letter.

Yours sincerely,

Dr Charlotte R. Pennington

Reviewed by ORCID_LOGO, 03 Apr 2024

The authors have responded to my previous comments adequately, and I was pleased to be able to review the additional materials that were omitted from the last round of review. I wish them good luck with the study!

Reviewed by ORCID_LOGO, 11 Apr 2024

This manuscript is a revision of a Stage 1 proposal for which I previously served as a reviewer. The authors have done an outstanding job with the revision, clearly addressing the broad concerns I had with the previous version. The revision, to me, is clear and compelling. I had just two small suggestions:

1. Registered Reports (well, all types of reports) should have clearly stated research questions. In the current version, I sort of had to piece together and infer the RQs, and then it was not until reading the expected results section that I got a firm sense for what the RQs are, namely to a) map the existing dimensions with respect to importance, and b) describe any variations in this mapping by field. Stating these RQs in the Introduction would help clarify the purpose and contribution of the project. Alternatively, if I have the RQs wrong, then the analysis plan needs to be modified. This is why clear RQs are needed, as without them the project is difficult to evaluate properly.

2. The sampling process is now much clearer. However, it is still possible that with a 15-discipline minimum only two of the four panels could be represented. Having no representation from some of the panels would severely compromise the impact of the project. I suggest adding a layer, in which there should be some minimum number (3?) from each of the four panels.

Congrats on a fine project, and I look forward to seeing the results!

Reviewed by Veli-Matti Karhulahti, 01 Apr 2024

I have read the revised version and the authors' responses. The authors have carefully responded to feedback and in my view the study is now ready to be carried out. 

If other reviewers or recommenders end up asking for more revisions, I would like to leave a short cosmetic note: while I can appreciate the poetic tone of the very first paragraph, it is also a bit difficult to follow (at least for this non-native English reader) and might be good to streamline -- or if the word count matters -- even remove. This minor comment on style can naturally be ignored and it's certainly not worth considering unless another review round is commenced for other reasons.

All best wishes for the work and looking forward to reading the Stage 2,

Veli-Matti Karhulahti

Reviewed by Thomas Rhys Evans, 18 Apr 2024

Thanks for the opportunity to re-review this exciting Registered Report. It was great to see the amount of support and constructive feedback invested in your work on the previous submission! The revisions have contributed to a much more refined proposition and I am grateful for the positive approach to the feedback the authors have taken. You will be pleased to see that my response to the manuscript is significantly reduced in volume since last time, and this is very much a reflection of the work conducted to address the comprehensive range of ideas and concerns raised by the previous round of reviews. I feel like many of the larger concerns have been directly addressed and that the work as a whole represents a valuable contribution to our understanding.

A few further ideas to consider:

A protocol for ‘pseudonymising’ data that might identify participants is absent. I appreciate it’s unlikely based upon the style of questioning but being a little clearer about what you will and won’t edit to ensure it’s easily sharable might cover off concerns later.

I can't see the initial reference document signposted in the main manuscript (apologies if I've missed this!). I can see it in the response to reviewers document, so it should be in the main document. This looks to be a helpful resource in and of itself and I can see lots of different applications of this, so I’m hopeful that the final work will produce an even more insightful mapping of ideas. Personally, I still have concerns about some of the blurred lines of scope of the constructs/dimensions included, but I’m not recommending any action based on this as I think it's probably an inherent part of the landscape.

At stage 2 – inclusion of new dimensions will be at the authors’ discretion – will there be any rules/processes applied here?

I really like the idea of participants at each stage being able to look at the ‘analysis’ or interpretation whilst also being able to cross-reference this with the raw data – this is a really transparent and effective way of engaging with this and I really appreciate the attempts to increase openness in this type of design.

I appreciate the inclusion of a reflexivity section but would perhaps encourage the authors to remove the list of citations and instead reflect more upon the factors that have influenced the design, and will influence the implementation, of the work. For example, the fact that this is part of a larger project and with a bigger broader goal – what influence has that had upon the study? You already work in this space – what role will personal networks have in recruitment of participants and might your reputations and position in the field influence the results you get?

In the second-to-last paragraph you suggest breaking down results by discipline. We already get a sense of disciplinary differences by the reporting and visualisation of median ‘importance’ scores so I’m interested to hear more about what is considered the purpose of being explicit about where each discipline stands on each specific dimension in this specific manuscript and whether any negative consequences could be anticipated and mitigated. I can see the value for it being in a supplementary or such but no case has been provided for why this should be a priority in the main manuscript given the primary research questions have been addressed by the previous analyses. There’s the scope for this to be used to misrepresent fields etc. given small numbers and different practices/approaches and it’d be interesting to get your response to this potential for negative impact (I imagine this won’t lead to any change to the manuscript but in the case that this might be a helpful thought I’ve left this here to be considered).

Some very minor things:

Awkward wording on p4: "This broader remit of RCR includes dimensions that overlap with those of RRI, such as the responsibility research has for honest and transparent dealings with citizens and society."

Some of the new content on p6/7 is a little awkwardly presented e.g., "lastly" which doesn't seem to follow a "firstly". It might be worth considering whether some of the context of the wider project might be better situated in the methodology (you later introduce it in p14) or reframed such that it helps flow from the established needs of the field as you discuss earlier.

You refer to dissensus in p15 when it's no longer discussed elsewhere so this may be confusing or seen as an inconsistency for readers.


In sum, I see the value of the proposed work, and the focus and processes involved have been significantly refined from the previous version. I hope that these comments are helpful for the project. I have thoroughly enjoyed reading, reflecting and getting stuck in with the different components of this research and, as before, I look forward to reading the next version of this work, whether that be as reviewer or (hopefully!) reader!

Dr Thomas Rhys Evans

Evaluation round #1

DOI or URL of the report:

Version of the report: 10.31219/

Author's Reply, 26 Mar 2024

Decision by Charlotte R. Pennington and Maanasa Raghavan, posted 11 Jul 2023, validated 11 Jul 2023

Dear Sarahanne M. Field and co-authors,

I have now received five peer reviews for your article titled “Capturing Perspectives on Responsible Research Practice: A Delphi Study” and am inviting you to submit a major revision of your Stage 1 report. You will see that each reviewer, whose expertise lies within open/responsible research and/or Delphi methodology, has provided an extensive and thorough review, which needs to be considered carefully in both your response and revision. I will not reiterate these reviews in full but would like you to pay close attention to comments regarding the rationale and proposal of your new RCR framework, which relate to review criterion 1A; the clarity of the methodology, which relates to review criterion 1D; and the sampling strategy, which relates to review criteria 1C and 1E. Three out of five reviewers also note that the current analysis plan is flexible, which can introduce bias, so you need to constrain this as much as possible. Finally, the ‘Initial Reference Document’ referred to within the article should be shared so it can be reviewed alongside the revised manuscript (its absence was commented upon by all reviewers, who would have found it helpful in their initial review).

To note for full transparency, because I have co-authored with one of your colleagues, this evaluation has also been ratified by a second recommender, Dr Maanasa Raghavan.

Yours sincerely,

Dr Charlotte R. Pennington

Reviewed by ORCID_LOGO, 26 Jun 2023

Thank you for the opportunity to review this Stage 1 Registered Report for PCI-RR. I enjoyed reading this manuscript proposing a Delphi study to map responsible conduct of research. I think it is important and interesting work, but I have some concerns regarding the mapping of research aims and methods and some other aspects of the methodology. Please see below for my comments and suggestions.

To contextualise my review: I am a psychologist and metascientist who has worked on several aspects of open science (replicability, reproducibility, generalisability, big team science, uptake of open science practices in undergraduates and researchers, and diversity in [open] science). However, I do not have any experience in mixed methods research, Delphi studies, or existing frameworks underpinning individual and institutional codes of research conduct. I have carefully read the Stage 1 manuscript, but have not read all the referenced work. I will therefore provide my review from this positionality, and hope that in combination with other reviews the authors will have feedback on all aspects of their proposed study.

1. I was surprised to read that the idea is to make one framework, and for the “bullseye” and more central circles to include facets that span multiple disciplines and the outer rings to contain aspects that are more “niche”. I’m not sure that this is in keeping with the aim of allowing for disciplinary differences. The fact that the majority of disciplines agree with one particular aspect of responsible conduct of research doesn’t mean that it’s necessarily the most important part of RCR, and similarly, something which might be very specific to only some disciplines may be one of the most important aspects of RCR for those disciplines. In addition, if there is only one person participating from each discipline, having a rule that “items where only one person considers them important will fail validation and be excluded from the framework” seems odd if this is a really key aspect of RCR for this discipline.

2. I am not sure about the feasibility of the snowball sampling method of requesting “that each existing panelist provides us with other possible participants”. This feels like it may not work as well for getting one participant per discipline (i.e. panellists may be more likely to know people within their own discipline). For this reason, I might suggest instead using an approach that is more similar to the process sometimes used when seeking reviewers for a manuscript or speakers for a symposium, whereby you ask for recommendations from people who decline the invitation for people who could take their place. This could also help with issues of diversity (see next point), if you had some text to include something like “if you decide to decline this opportunity, we would be very grateful if you could recommend someone that fulfils X criteria, prioritising researchers who are Y”.

3. I would like to see more detail regarding the diversity of participants, particularly how you will prioritise diversity of participants and what information will be collected about them. It is clear that the proposed methods (if successful) will result in disciplinary diversity. How do you plan to “work to ensure that as much of that diversity as possible filters into the final sample” and ensure the sample is “as diverse as possible”? Which aspects of diversity will you be prioritising (only gender and geographical region are mentioned), and how? In addition, will readers have access to information about the diversity of the participants (for example: methodological background, career stage, ethnic background, et cetera)? I know you plan to collect some of this information already, but will it be shared?

4. You say that you “will continue to contact possible candidates until we receive consent to participate from 40 people” but does this mean making replacements when no one from a particular discipline says yes? Otherwise you might end up with a sample biassed towards the disciplines interested more in research integrity.

5. As this is a Stage 1 RR, I would love to see some open materials included in the revision so that these can also be reviewed before data collection. This could include for example a visual example of what the RCR map could end up looking like, templates of emails you will send to recruit participants, participant study information, the initial reference document that will be provided to participants, demographic questionnaires, pilot data, et cetera.

Reviewed by ORCID_LOGO, 07 Jul 2023

This proposed registered report is part of a larger project aimed at developing a new, inclusive framework for responsible research. I had a bit of a chuckle at footnote 3 about the distinction between people involved in scientific reform vs. those involved in RCR, as I am firmly part of the former group. Accordingly, my comments here are coming from outside of the RCR ecosystem, and thus at times may reflect some level of ignorance. Alternatively, I think my outsider perspective may be helpful for improving some aspects of the proposal.

My feedback pertains to two broad issues that I think need to be addressed before the proposal can be assessed further: properly motivating the study and providing a sufficient level of detail for a registered report. I will take each, in turn.

The Introduction section did not provide me with a strong understanding of the authors’ framework, how it improves upon the existing frameworks, and thus why this project is necessary. As written, it assumes a lot of common ground knowledge that a naïve reader will not have. The many different terms and acronyms (ELSA, RRI, RR(I), RCR) make it all the more difficult to follow. The authors need to take some more time explaining these different terms and frameworks and their interrelations. Doing so would then provide a foundation for the reader to understand the need for a new framework. The authors make reference to their newly developed framework, including reference to “dimensions,” but do not explain the framework in any detailed way. For example, the authors state that there is a need for “a new RCR framework that balances breadth and specificity with feasibility and practicality,” but don’t explain how existing ones fail at this or how theirs achieves it. I understand that the current project is just one piece of a much larger project, but nevertheless this paper also needs to stand alone.

A somewhat related concern is that I was not clear on what the context for this work was, or to put it another way, who the audience would be. The authors rely on the REF units of assessment and reference the European Commission’s Framework Programme, which suggest that the context is the UK and/or Europe, but other aspects of the text suggest that this is a global framework. Either is fine, but it should be explicitly stated. As part of this, it would be helpful to know how this framework would be put into action. Who will enforce this, or who will pay it any mind? How will this framework be successful? Who will take it up?

My second broad concern is the lack of specific detail in places, which is expected for a registered report. I like the idea of using the registered report for developing a framework vs. testing hypotheses, and I understand that this project is largely exploratory, but you should still nail down as many details as possible and avoid vague decision criteria. A few examples of unclear procedures are as follows:

The sampling strategy is unquestionably complicated for this kind of project. However, the authors should have clear criteria for what would constitute a sufficient sample for the project to proceed as agreed. This is critical given that the IPA comes with a guarantee* to publish the final paper. The authors state that they “hope” to retain a sample of 20 at the final round, but this is not a commitment. Moreover, nothing in the procedure precludes this sample of 20 from coming from a small slice of disciplines and/or countries. The authors should be much more specific about what the minimum acceptable sample will be, both in terms of numbers and characteristics. Statements that the authors will monitor the diversity of the sample until they are content should be avoided in favor of more formal criteria.

The dissensus approach is a strength of the project, but it was unclear whether a dimension mentioned by a single participant would be included in a subsequent round, or whether there would need to be a higher frequency of mentions.

I appreciated that the authors included the IQR and median values for the different levels of the “bullseye,” although the low ratings will lead to the dimension either being placed on the outer ring or being dropped entirely—these are very different outcomes that should have clear criteria. Moreover, how will the authors determine that there is “no change” in these ratings across rounds? What amount of change constitutes a level of meaningful change?

The authors reference an “Initial Reference Document,” which appears to be central to the study, but was not included for review or discussed in any detail (related to my first broad point). This, along with other study materials, should be included for the next round of review.

As a final, somewhat distinct point, the authors indicate in parentheses that the Delphi method is sometimes considered to be a sequential mixed methods design. How exactly this is the case should be described in text.

(*not a guarantee)

Reviewed by Veli-Matti Karhulahti, 27 Jun 2023

This is a highly interesting study that can help the academic world better grasp responsible research. In general, the plan has been very carefully crafted and the design is promising. In terms of my reviewer point of view, although I’m familiar with Delphi studies and have participated in them many times, I haven’t published any myself so my pragmatic know-how can be lacking in this regard. Below, I try to provide feedback that is helpful for further improving the plan. I list the comments one by one to make it easier to read. My writing style is sometimes a bit blunt so please do not see it as an adverse signal, I really like the study plan.
1. Perhaps my largest comment concerns the reference document, which is the starting point of the work and is narratively described in the MS. Because the data and information for the document are already available, I was surprised not to find this document as a supplement. I believe Stage 1 RR would have been an excellent opportunity to gather external feedback for this document, the structure of which represents the core of the study. One can comment on the narrative descriptions too, but I personally find it challenging without seeing how the document is fully structured. I would really encourage attaching this document for review round 2 (although it might already be too late because reviewers / recommender might find it not practical to suggest major revisions anymore at that point).
2. Related to the above, it remains a bit unclear how the reference document was or will be constructed. I understand the review serves as a basis, but there are also mentions of interviews, thematic analysis, etc. This seems like a gap methodologically, as there is no further information. E.g., how many interviews were carried out, what were they like (questions list as a supplement?), are these data open or are there reasons for not sharing, etc. I was also unable to find out what kind of thematic analysis was applied (there are dozens of different TAs!), who was involved in that analysis, or what that analysis process was like in general (are any of the coding materials shared?). I don’t want to unnecessarily complicate this study which is already going to be a laborious enterprise, but it would also be unfair to leave this central element uncommented. I will be happy with many explanations or solutions, but adding more information about this step would be necessary IMO.
3. The introduction is clear and explains the background well to someone who hasn’t been directly involved in related program development. That said, the topic appears to be tackled largely from a Western perspective, and I don’t see many cross- or multicultural aspects addressed (before footnote 2). Different countries and cultures have different ethical and legal approaches, and it would be good to discuss this diversity explicitly in the introduction. I know there are limits to content, so I leave it for the authors to negotiate to what degree they wish to integrate this aspect in the study. 
4. Regarding the participating experts, I would prefer to have clear inclusion/exclusion criteria. There’s currently a general description (p. 9-10), but it would be good to know explicitly what criteria the 95 listed experts meet and who has been excluded (for what reason). This is mainly a cosmetic note, as the information is distributed there to a large extent already. Having details like this noted at Stage 1 will add transparency to the process, even though changes may have to be made later in data collection or analysis.
5. There is an important note about representation (p. 12) among experts. Especially related to my comment #3, I think it would be critical to somehow ensure significant representation of non-Western experts if the project aims at universal findings. Alternatively, it would be totally ok to clearly focus on specific, selected regional expertise. I just see a risk here that the results suggest a global concept when only a small proportion of experts come from countries that may have different cultural perspectives and produce half of the world’s research (China, Japan, India, etc.). Again, I leave it for the authors to negotiate how to tackle this; the most important thing is that the authors are aware of it and will be able to consider the aspect critically in their analysis and reporting at Stage 2. 
6. I really like how the multidisciplinary dimension has been considered and how comprehensively it has been integrated in the design. As an interdisciplinary researcher myself (having worked across all panel areas A-D), I was thinking whether a separate panel for interdisciplinary experts could even further improve the design. Such people might have distinct viewpoints. But you can also fully ignore this comment; I understand it could unnecessarily add to the already-hefty load of work. 
7. A technical comment: will there be any measures for careless responding or other data quality checks? It’s less common to have data issues in a Delphi, but it would be good to somehow plan to control data beforehand since it’s an RR. Also, I didn’t see data or document version sharing discussed anywhere in the MS. How will data sharing and document development be managed? I am assuming that everything will be shared as per TOP guidelines.
8. Considering drop-outs, I am thinking whether it would make sense to recruit new participants at later rounds if the N drops too low. This is hardly optimal, but to me it seems like a better Plan B versus hypothetically going forward with a very small participant group.
9. Since there is a lot of flexibility in the design and decisions between rounds can be made by unforeseen motivations (p. 18), I think it would increase transparency to add brief notes on positionality, i.e. the authors’ own core disciplines and perhaps some perceptions of what they personally consider important in RCR. When we then see the decision tree at Stage 2, it will be possible for readers and reviewers to reflect on those decisions and results against the stated positions. If this suggestion feels unfitting, it can naturally be rebutted. 
I hope some of the above feedback is helpful in further improving this interesting Stage 1 proposal. If some of my comments feel unclear or unfair, I can be directly contacted. I always sign my reviews,
Veli-Matti Karhulahti

Reviewed by Thomas Evans, 04 Jul 2023

Dear Authors,

Thanks for the opportunity to provide feedback on your Delphi project on responsible research. I was delighted to be invited to comment on this proposal, and I write this review in the context of my experience in the open scholarship movement, and with supporting scientific reform through communities like FORRT and the UKRN.

I wish to highlight early a minor conflict of interest in that I am fully engaged with the UKRN, both as institutional lead for the University of Greenwich, and as contributor to the Research England-funded project where I have been developing communication plans and a maturity framework for OR4. I therefore have experience working under Prof Munafò’s leadership, a named author on this manuscript. Having discussed this with the Recommender, and on the understanding that our experience of working together directly has been limited, I have been given the green light and, as such, have provided my comments below.

I sincerely hope they are of value in helping the development of this work, and I wish you all the very best.

·       Title: Consider a wording change in the title (p1) to clarify that you are focusing upon academic practice, or research practices, or practicality, rather than applied research (which “research in practice” could be misinterpreted to be).

·       The use of RCR rather than RR as the central term could perhaps be justified or discussed further – I think in this space there are lots of overlapping terms and it might be fruitful to provide a clearer definition of what the concept you refer to includes and excludes. Given this field’s propensity to allow concept creep, particularly when acknowledging the fields of ethics and integrity, it’d be fruitful to make a more decisive statement.

·       “For example, reproducibility is a concept that applies to quantitative disciplines, but less so qualitative disciplines and the social sciences, and even less so in the humanities” (p2) – I would revise the sentence structure for clarity.

·       “to reorient scientific research [practices] to make it [them] more effective and – crucially – more ethical and self-aware” (p3).

·       “Von Schomberg points out that there is no agreed-upon definition of what RCR is; rather, it holds an invitation to discuss what RCR as a top-down signifier might in fact denote, in relation to the disciplines and research processes it engages with.” (p4)– wording is a little awkward here.

·       I read your preprint “Exploring the Dimensions of Responsible Research Systems and Cultures: A Scoping Review” with great interest and noted how little of the core conclusions (relevant to this particular manuscript) were discussed. I appreciate the need to minimise repetition across manuscripts, however I feel like the preprint provides a more comprehensive grounding for the justification of the proposed work and so could perhaps be discussed in a little greater detail. This links to the next point.

·       As a whole, this introduction feels a little light when attempting to convince the reader for the need of a new framework/approach and thus a slightly richer discussion of the existing frameworks and content of such might help situate this work a little more clearly. A more comprehensive mapping of existing frameworks and their scope of relevance (i.e., which fields they can successfully be applied to) might provide a more robust justification for a centralised/singular framework.

·       The development of a single framework itself might be considered to be too big of a demand and there is a fair risk that this project can’t deliver despite the team’s best efforts – to provide something which is sufficiently detailed to be practical and helpful, whilst also diverse enough to be relevant to all fields. Is a singular framework where some components are irrelevant to certain fields preferable to various discipline-specific frameworks? I remain receptive but skeptical about the potential for the proposed work to achieve this goal and encourage the authors to reflect upon the justification presented for this goal.

·       More details on the ‘previously devised reference document’ (p6) might be of benefit for the reader, and could be included as part of the appendix and open materials of the study.

·       The use of Delphi methods is well suited to the aims and outcomes of the project, and whilst it will be useful in equalising voice, it has been well-designed with a dissensus approach to acknowledge that consensuses/majorities are unlikely given the broad ambitions of the project.

·       The criteria of “importance” (p7) for which participants rate RCR dimensions could be elaborated upon.

·       It would be useful to understand how you will negotiate the jingle-jangle, the nomenclature, for a range of ideas like integrity where there is already vast proliferation of broad definitions, models and terms. There is a risk that this study can lead to a list with lots of broad and inter-related ideas with little method to differentiate between semantics and content.

·       The selective recruitment of individuals with experience of RR frameworks feels very sensible and is a core strength of the proposal given the detail provided in determining the lists and inclusion criteria. You may want to consider whether that approach may lead to more homogeneous and less diverse sets of ideas based predominantly upon pre-existing models and thus may limit the contributions proffered. It may be that there could be methods used alongside the Delphi to complement the process, to defend against merely minor editing and encourage more substantive or transformative ideas. This is a suggestion that I don’t necessarily expect to see actioned, but could be considered in context of the contributions of the proposed work.

·       The practicalities of the Delphi could be noted, e.g., will it be facilitated through an emailed document and a Google Forms link to provide data? What does “relatively important” refer to? It might be useful to get a sense of how you will make decisions as to including/excluding suggestions at each round – how the dissensuses are maintained could be clearer as it currently sounds like any ideas not widely endorsed would be dropped.

·       On p16 by “no change” do you mean ‘minimal change’? You could provide some scope of what this might be to be more precise.

·       This protocol is quite brief and could include a more detailed note of what dimensions of the research process will be made openly available. For example, will iterative versions of the reference documents, participants’ data and ratings, decision-making log, etc. be made fully available?

In sum, the project and manuscript as a whole are well-constructed and provide a clear account of a Delphi study that has the potential to form an RCR framework of benefit to our scientific community. There is no doubt that there is much work necessary in this space and that this proposed study has potential value to contribute to a number of developments. However, I hope my feedback encourages the author team to reconsider how to manage the broad terms they discuss (and might negotiate with participants), the justification for a new framework, the need/contributions of a single framework, the potential for the project to do little more than merge pre-existing models (this may itself be a valuable contribution but seems different to the intentions outlined here), and to encourage a little more detail on the procedure and practicalities of the project for further transparency. I do hope my thoughts are of value to the research team, I wish them all the very best in conducting work in this important space, and I look forward to reading the next version of this work whether that be as reviewer or reader!

Take care and stay safe,


Dr Thomas Rhys Evans

Reviewed by , 27 Jun 2023

I have been asked to conduct a Stage 1 review of “Capturing Perspectives on Responsible Research Practice: A Delphi Study” for PCI Registered Reports. This study is part of an important program of research that aims to develop an understanding of how responsible conduct of research is conceived and practiced across disciplines and geographic regions.

I provide my review below using the criteria recommended by PCI-RR, as well as a few additional comments at the end.


1A. The scientific validity of the research question(s). 

The research question is scientifically justifiable based on the existing evidence found by the authors’ scoping review, as well as their intended application in future project stages (i.e., to develop and evaluate RCR communities of practice). The research question also falls within established ethical norms (ethics approval has been granted by the University of Bristol’s School of Psychological Science Research Ethics Committee, and the University of Leiden).


1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable

Not applicable


1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable). 

See comments in 1D (I had trouble understanding and separating 1C from 1D).


1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.

The methodological details sufficiently demonstrate the links between the research question, sampling plan, analysis plan, and interpretation given different outcomes. However, there are important gaps in information reported and materials shared that do not make it possible to affirm that the protocol contains sufficient detail to be reproducible. In addition, the authors propose flexible analytic procedures that actually risk the soundness of the analysis pipeline by inducing researcher bias rather than protecting against it (i.e., reserving the right to change their definition of consensus based on the nature of their findings). I offer the below in chronological order as they appear in the manuscript:

- Can the authors please provide a citation to and/or summary of the methods of their “interviews with a range of RCR scholars and practitioners”?

- I find it difficult to appraise and approve the Stage 1 protocol without seeing the initial reference document underlying the document (or, at the very least, a list of the dimensions/items to be rated). The reader does not know what this document looks like, how many items it has that will go into the Delphi, whether these items are potentially double-barreled, etc. Stage 1 review seems premature without seeing the document/items to be rated in the Delphi process.

- I applaud the thought that has already gone into the eligibility criteria for experts. However, without further detail, I am not sure that I could replicate all of the eligibility criteria for expert panelists aside from publications (at least one manuscript, and a sufficiently senior/leading role on the manuscript). To be an expert, do the frameworks/codes need to be of a sufficient quality or influence, and the role of the person in developing the framework/code to be of a sufficient significance? What kind of role in training/community activities does one need to have, and how many trainings/activities completed? What does “support” of researchers entail operationally to be eligible? A table of operationalized inclusion versus exclusion criteria would help make these recruitment decisions more interpretable and replicable.

- It is not clear how the authors operationalize “a very diverse and somewhat large participant sample, in terms of disciplinary, geographical, and institutional contexts”. For example, they later write “Once we have reached the target sample and are content that the sample is as diverse as possible (we will be monitoring this aspect as we approach candidate panelists), we will proceed with the Delphi panel.” What criteria will be used to be content that the sample is sufficiently diverse in terms of discipline, geography, and institution: at least one person from each of the disciplines and geographies listed? What about institutions?

- It is not clear why the authors chose 40 panelists as their targeted sample size for the first round, given the literature that they cite recommends either 20-30 participants (Melander) or 10-50 participants (Turoff).

- It is also not clear why the authors chose 20 panelists as the targeted sample size for the final round.

- The authors write that they have a list of “approximately 95 individuals”. What does “approximately” mean here: can an exact number be provided instead?

- What “scholarly literature” did the authors use to identify potential panelists: the studies included in their scoping review?

- Given the number of universities in the world and their often-limited search functionality, it would be helpful for the authors to explain how they found the “online researcher profiles” from which they identified eligible participants.

- The above two recruitment strategies (scholarly literature, online researcher profiles) can help identify experts based on the eligibility criteria related to publications and frameworks/codes. How will the authors find those who are not researchers, but rather only meet the training, community activity, or support inclusion criteria?

- Are experts from North America explicitly excluded? Or is their omission a consequence of the methods used to create the sampling frame? Whichever the reason, please clarify and provide a rationale for why this is not an issue.

- The authors write “We will work to ensure that as much of that diversity as possible filters into the final sample.” How? Incentives? Follow-up emails? Obtaining explicit commitment upfront? Offering co-authorship?

- The authors write “We will initially approach 68 (i.e., two persons for each of the 34 UoA) and will continue to contact possible candidates until we receive consent to participate from 40 people.” Do the authors already know which 68 people from the list of ~95 they plan to approach first? If so, what is the breakdown of how they were identified (i.e., how many are researchers with publications, framework/code authors, trainers/activists, or support staff). Given that the preliminary work has focused heavily on researchers and the published literature, I am wondering whether this panel is already skewing toward researchers with publications at the outset.

- What concretely do the authors mean by “actively recruit people through our own networks”?

- Related to the request for more operationalized eligibility criteria: how will the authors vet the eligibility of the people offered through snowball sampling?

- Do the authors have a copy of the recruitment email that they can share?

- The authors write “If one month has elapsed and we have not successfully recruited more than 30 individuals, we will go ahead with the Delphi.” Why 30 instead of 40 as stated earlier in the protocol? Clarity and coherence in these cutoffs are important to allow Stage 2 reviewers to assess to what degree the panel achieved what it set out to achieve.

- The authors write “Dissensus is operationalized as people adding new elements to the framework that they consider important yet are ‘missing’ from their ideal conceptualization of RCR.” This definition of dissensus differs from how the term is typically used in the Delphi literature (i.e., variation/dispersion in ratings/rankings). Consensus-oriented Delphi processes commonly allow participants to identify missing items (e.g., this is standard practice in reporting guideline development). In addition, the authors later write “Subsequent rounds will focus on validating items in an increasingly refined reference list (see the following subsection Developing the Framework for details on this), as participants (hopefully) converge on the most important items, modifying their previous responses based on others’ ratings and feedback.” This desire for participants to “hopefully converge” is the goal of a consensus-oriented Delphi and antithetical to a dissensus Delphi. As such, I think the use of “dissensus” is inappropriate to describe this proposed Delphi study.

- Do the authors have a copy of the demographics survey/questionnaire that they can share?

- Do the authors have a copy of the Phase 1 survey/questionnaire instrument that they can share? I find it difficult to appraise the quality of proposed Phase 1 methods without explicitly seeing the instrument questions (and the Initial Reference Document).

- The authors write “We will ask them to answer in relation to their primary field of expertise (that is, the one we recruited them for).” Will the panelists be told the field to which the authors assigned them? This may differ from the field with which panelists self-identify.

- The authors write “The authors will pool the information derived from the first round, construct a feedback report for the participants, and revise the reference document in preparation for Phase 2. The feedback report will include, for instance, the calculated median and IQRs per item, a depiction of the distribution of the responses per item, and a report of what items will be excluded based on low importance ratings. The revised reference document will reflect the participant's suggested dimensions.” However, the authors only reported qualitative questions via open-text boxes in Phase 1 (i.e., missing dimensions, additional insights). What closed-item questions are there in Phase 1 that can yield a median/IQR? What is the rating scale? And how many items/dimensions will be rated in Phase 1?

- The authors write “In Phase 2, Round 1 the participants are … asked to rate how important each dimension is for their sense of RCR on a 9-point scale (where 9 corresponds to Highly Important and 1 to Unimportant).” This scale is not symmetrical around the middle-point 5. Best practice is to use the same stem for “1” and “9” (e.g., Highly Unimportant to Highly Important). Based on the scale provided, I assume participants will interpret 5 as “neither unimportant nor important”, so “important” would fall somewhere between 5 and 9 (probably 7). This is problematic, as the symmetrical opposite of “important” is “unimportant”, but “unimportant” is 1 while “important” is 7.

- The authors write “In the feedback report and the revised reference document, we will also include information about what items were added and which were dropped.” What are the operational criteria for dropping an item?

- The authors write “Subsequent rounds will focus on validating items in an increasingly refined reference list (see the following subsection Developing the Framework for details on this), as participants (hopefully) converge on the most important items, modifying their previous responses based on others’ ratings and feedback.” What do the authors mean by “validating” items? That the panelists will reach consensus or stability in responses?

- The authors write “We will conclude the process after a maximum of 4 Delphi rounds. Melander’s review suggests between 2 and 3 rounds is the average for a consensus Delphi, therefore, since we are including an initial dissensus round, we will conduct a maximum of 4 rounds in Phase 2 (for a possible maximum of 5 rounds including the one round in Phase 1, where participants suggest dimensions).” I had to re-read this section a few times, as it reads first as if there will be 4 rounds overall, then only 4 rounds in Phase 2, and therefore 5 rounds overall. Perhaps this could be solved by changing the first sentence to say a maximum of 5 overall rounds, including the initial round in Phase 1?

- The authors write “We will conclude the process earlier if no change is observed in the IQR and Median calculations for all items between two given subsequent rounds, or if the author team agrees that little enough change (i.e., so little change as to render the difference conceptually meaningless) has occurred between two rounds.” This concept is referred to as “stability” in the Delphi literature; please provide an operationalized analytical measure for stability.

- I find the authors’ operational definitions for consensus problematic for two inter-related reasons. Firstly, they don’t capture all possibilities (e.g., what happens to items with a Median > 7 but IQR > 2, a Median = 5 but IQR < 2, or a Median = 2 but IQR < 5?). Secondly, by varying the IQR thresholds across the three categorical levels of importance, the definitions conflate agreement/consensus with the panel decision if consensus is reached. One example of resolving this issue: the IQR of 2 could be used as the cut-off for agreement/disagreement. In this case, any item with an IQR of 2 or less would reach “consensus” in the panel, and then the tertile in which the median falls would determine the decision (7-9 is high importance, 4-6 is moderate importance, 1-3 is low importance). Any item with an IQR > 2 would mean that the panel did not reach consensus.

- Related but distinct from the above point: the thresholds for low/moderate/high make sense to me (tertile in which the median falls), though I think a justification/citation for the chosen IQR threshold would be helpful.

- The authors write “We re-emphasize at this juncture the exploratory nature of this Delphi study. We will need to see the distributions of each item’s data before determining for certain whether these quantitative categories (i.e., the median and IQR threshold ranges defined earlier) are valid and applicable. If they are not, we will redefine our categories and transparently report the change and its motivation.” I understand that the authors are not testing a hypothesis, though I do not believe that this Delphi process is “exploratory” in that the authors will not merely report descriptive statistics of what they find, but rather will use the results instrumentally to create their framework using thresholds of significance. Consequently, I am alarmed by their statement here as it reads as data mining for consensus. The authors essentially appear to be saying that they will choose the thresholds for consensus based on the pattern of results obtained, or “consensus hacking” (akin to p-hacking but for consensus rather than statistical significance). If the authors want to make claims about consensus decisions, the thresholds for these consensus decisions need to be pre-specified and the authors open to the decisions yielded by the panel (e.g., the panel finds that all items are important, or perhaps no items are important). If the authors are truly conducting an exploratory study, then they should simply report the results from the panel using simple descriptive statistics and avoid defining consensus post hoc based on the nature of the panel results.
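
To illustrate the kind of pre-specified rule suggested above (an IQR of 2 as the agreement cut-off, with the tertile of the median determining the decision), here is a minimal Python sketch. It is not the authors' analysis pipeline: the function name `classify_item`, the "inclusive" quartile method, and the treatment of half-point medians (e.g., 6.5 counted as moderate) are all assumptions made for illustration, and each would itself need to be fixed at Stage 1.

```python
from statistics import median, quantiles


def classify_item(ratings, iqr_cutoff=2.0):
    """Classify one Delphi item from its 1-9 importance ratings.

    Hypothetical helper: agreement is judged by the IQR alone, and the
    median only decides the importance level once agreement is reached.
    """
    # Quartiles via the "inclusive" method (an assumption; other
    # quartile definitions can give slightly different IQRs).
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(ratings)

    if iqr > iqr_cutoff:
        return med, iqr, "no consensus"

    # Tertile of the median: 7-9 high, 4-6 moderate, 1-3 low.
    # Half-point medians fall into the lower band (assumption).
    if med >= 7:
        level = "high importance"
    elif med >= 4:
        level = "moderate importance"
    else:
        level = "low importance"
    return med, iqr, f"consensus: {level}"


if __name__ == "__main__":
    # Made-up ratings for three hypothetical items.
    for item in ([7, 8, 8, 9, 9, 8, 7],
                 [1, 2, 5, 8, 9, 9, 1, 5],
                 [2, 2, 3, 2, 1, 2, 3]):
        print(classify_item(item))
```

Because the agreement test and the importance decision are separated, every combination of median and IQR maps to exactly one outcome, which addresses the gaps noted in the consensus definitions.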


1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s). 

Not applicable


Additional comments (abstract/intro)

- The title in the submitted document (“Mapping the Scope of Responsible Research in Practice: A MAD (Modified ReActive Dissensus) Delphi Study”) does not match the title in the submission system (“Capturing Perspectives on Responsible Research Practice: A Delphi Study”).

- There is an (unintended?) implication that the social sciences are not quantitative: “For example, reproducibility is a concept that applies to quantitative disciplines, but less so qualitative disciplines and the social sciences, and even less so in the humanities.” I am a social scientist who primarily conducts quantitative research involving meta-analyses of randomized trials conducted by other social scientists; reproducibility is quite applicable to this area.

- The abstract should provide a summary of the methods of the proposed Delphi process.

- While I agree with the underlying sentiment, I personally found the introductory paragraphs a bit grandiose/hyperbolic. For example, I understand how scientific research is irrelevant when it “becomes misaligned with the needs and expectations of society”, but I’m not clear how it is “putting the lives of people and the environment at risk.” Fraudulent research on vaccines, sure: but irrelevant research on an unimportant topic?
