DOI or URL of the report: https://osf.io/7rhkf?view_only=db4c1a620de841c28d7fc9a52e326cfd
Version of the report: 1
Dear specialist recommender,
Please find attached our response letter and the revised manuscript (with revised sections highlighted). The response letter also addresses the additional revision points raised by you, the recommender.
Sincerely,
Anneleen Dewulf (on behalf of all authors)
Your preregistration entitled “Do Ecological Valid Stop Signals Aid Detour Performance? A Comparison of Four Bird Species.” has now been seen by two reviewers, and the reviewers’ comments are appended below. I share the reviewers' view that your study plan has great potential to provide helpful insights into the forces shaping differences in cognition through the careful expansion on earlier studies. However, both reviewers raise some questions about details of your approach. I think expanding on the analyses and the rationale behind some of the choices you made will help to make the best use of the data you are planning to collect. One particular point raised by both reviewers concerns the food deprivation period. I recognise that most bird species tolerate overnight food deprivation, but your protocol appears to extend beyond that. Both reviewers point out that the potential effect of food deprivation might depend on body size differences among species, and I also noticed that the planned photoperiod is not the same for the different species. It would help if you considered both the methodological and ethical aspects of the planned food deprivation protocol. I have two points in addition to those raised by the reviewers. First, I think it would be good if you could expand on the explanations of the relationships you expect between the covariates you chose to include (barrier neophobia and barrier order) and the outcome variables. I am worried that these two measures, rather than being covariates, might act as confounds. Second, in the study design template, I think you could split some of the information by the three questions you set up.
I recognise that it is difficult to clearly separate between these points, but I think the (i) 'Analysis Plan' and (ii) 'Theory that could be shown' columns could have separate entries for each question that focus on (i) the specific effects of the statistical analysis and (ii) the specific rationale underlying each question. For the revision, it will also help if the pdf of your study plan has line numbers. I hope you find the comments helpful, and I look forward to a revised study plan.
It was a pleasure to review this well-written and very detailed pre-registered report. The data analysis plan is relatively sound, and the methods are well-designed and appropriate to answer the question at hand.
I have a few questions and potential recommendations regarding the proposed behavioural parameters (1 and 2), the rationale for the statistical analysis (3), and ethical issues arising due to food deprivation (4).
1. Parameters for behavioural analysis: I am a bit critical of using “touches barrier with beak” as an indicator of persistence, as physical inspection/pecking might differ across species. This would run the risk of zero-inflated data for some species (i.e., those with a low motivation to establish physical contact with the barrier), hampering comparisons between species and between barrier designs. Unless the background literature indicates otherwise, I would argue that proximity to the barrier (or certain areas of the barrier) would probably be a better indicator of persistence.
2. Inclusion of maximum trial time and consideration of additional behavioural parameters: The authors set the maximum trial limit to 135s and opt to include trials in which the barrier is not detoured as data points with “135”. This can be problematic for at least two reasons: (a) If subjects do not detour the barrier in a considerable number of trials, this will skew the data and might hamper further statistical analysis. (b) Failing to detour the barrier can be due to an inability to find a detour route but can also be caused by other factors such as lack of motivation, increased stress, or distraction (i.e., subjects that might otherwise easily be able to detour will be assigned the highest value). To get around (a), it might be advantageous to also score accuracy (yes/no) and analyze it in a corresponding GLMM (similar to what has been done in previous studies using this paradigm).
3. Statistical analysis: I was wondering why the authors opt for an ANCOVA rather than a (G)LMM, with the latter being more flexible in assigning variance and estimating effects. Mixed models are generally more powerful than conventional repeated measures AN(C)OVAs and they also have fewer assumptions (e.g., sphericity). In addition, the authors need to state how they will proceed if the model assumptions for ANCOVA (or LMMs) cannot be met (e.g., the need for data transformation).
See also: Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412.
4. Ethics: food deprivation (from 4:00PM to 8:30AM the following day) appears quite taxing for small birds (e.g., canaries). Is there literature showing that such deprivation times are commonly used and do not pose strong additional stress or harm to the animals? Otherwise, I would argue for reducing deprivation times considerably.
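To illustrate point 2, the following minimal Python sketch shows how assigning the 135 s cap to failed trials inflates the mean latency, and how a separate accuracy (yes/no) score can be derived alongside it. The latency values and the names `score_trial` and `MAX_TRIAL_S` are hypothetical and for illustration only; only the 135 s limit comes from the study plan.

```python
from statistics import mean

MAX_TRIAL_S = 135  # maximum trial time stated in the study plan

# Hypothetical latencies in seconds; None marks a trial without a detour.
latencies = [12.0, 48.5, None, 30.0, None, 95.0]


def score_trial(latency, cap=MAX_TRIAL_S):
    """Return (accuracy, capped_latency, censored) for one trial.

    accuracy is 1 if the bird detoured within the cap, else 0.
    censored flags trials whose latency is only known to be >= cap.
    """
    if latency is None or latency >= cap:
        return 0, float(cap), True
    return 1, float(latency), False


scored = [score_trial(x) for x in latencies]
accuracy = [a for a, _, _ in scored]             # binary outcome for a GLMM
capped = [t for _, t, _ in scored]               # failures entered as 135
success_only = [t for a, t, _ in scored if a == 1]

print("proportion detoured:", sum(accuracy) / len(accuracy))
print("mean latency with capped failures:", mean(capped))
print("mean latency of successful trials:", mean(success_only))
```

With these invented values, the two failed trials pull the mean latency well above that of the successful trials, which is the skew described under (a). Keeping the censoring flag makes the capped trials identifiable in any later latency analysis.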
I also have some rather minor comments:
Introduction
1st paragraph, first sentence: Needs reference.
1st paragraph, last sentence: “if they fail to stop” – what does this mean? Stop what?
2nd paragraph: it might be worth outlining some details about the limitations of detour tasks for assessing response inhibition, e.g., van Horik et al. 2018.
van Horik, J. O., Langley, E. J., Whiteside, M. A., Laker, P. R., Beardsworth, C. E., & Madden, J. R. (2018). Do detour tasks provide accurate assays of inhibitory control? Proceedings of the Royal Society B: Biological Sciences, 285(1875), 20180150.
Last sentence of 2nd paragraph: factors other than unpredictability might affect the expression of diverse foraging strategies, so this suggestion might be a bit too general. In addition, “learn” would imply that there will be a gradual increase over lifetime/experience. Is this what the authors mean?
3rd paragraph, second sentence: please outline why “inhibitory control” would be even less suitable to serve as a general umbrella term (“or even worse…”).
3rd paragraph, last third: “Furthermore, these core process …” – please explain what is meant by timescales here.
5th paragraph, last third: “The authors found that RI performance (…) was worse …” what does “worse” refer to here? Longer or shorter latencies to perform the detour?
Table 1: Without reading the full introduction, it was not clear which species were included in Zucca (2005). It might be worth adding some lines as separators.
Why was this specific baseline measure for neophobia chosen? Please briefly elaborate and consider adding references.
Figure 1: please reconsider using red/green colour differences – maybe change to other colours or shades of grey.
Methods
Comparing birds of different ages: I fully see the need to account for the time birds can experience their enclosure (canaries vs all other species) but I was wondering from a developmental perspective whether the additional age might give canaries a general head start in the task (although a mere species comparison is not the aim of the study).
Assessing barrier neophobia: running only one trial makes it quite prone to outliers/distractions. Shouldn’t it also be corrected with a different baseline, e.g., the time needed without the barrier (as some species might be simply slow in approaching the food in general)?
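One way to express such a baseline correction is a simple difference score. The sketch below is hypothetical: the bird identifiers, the latencies, and the function `neophobia_score` are invented for illustration, and a ratio or a model-based correction may suit the data better.

```python
def neophobia_score(latency_with_barrier, latency_without_barrier):
    """Extra seconds needed to approach the food when the barrier is present."""
    return latency_with_barrier - latency_without_barrier


# Hypothetical (with barrier, without barrier) approach latencies in seconds.
birds = {
    "canary_01": (40.0, 10.0),
    "gull_01": (70.0, 60.0),
}

scores = {bird: neophobia_score(w, wo) for bird, (w, wo) in birds.items()}

for bird, score in scores.items():
    raw = birds[bird][0]
    print(f"{bird}: raw latency {raw:.1f} s, baseline-corrected {score:.1f} s")
```

In this invented example the gull has the longer raw latency, but most of it reflects a generally slow approach; after correction the canary shows the larger barrier-specific delay.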
Unpublished literature. Is there any chance to make Troisi et al. and Garcia-Co et al. publicly available (e.g., via preprinting) as they are used to justify the task measurements for two of the tested species?
This registered report describes an experimental setup to evaluate the influence of stop-signal detection on the performance of birds in a detour-barrier task, assessing whether the results can be predicted from the ecological niche of each species. In the proposed research plan, an experimental procedure will compare the response inhibition performance of 4 different bird species, namely white leghorn chickens, Japanese quails, herring gulls and domestic canaries, all hatched and raised in captivity. The proposed research plan is also a partial replication of 2 previous studies on the same issue (Regolin et al. 1994, Zucca et al. 2005), while improving on some critical aspects of those studies.
In my opinion the research questions are scientifically valid, and personally I think they are of interest to research in cognition, specifically for better understanding the roles and evolution of response inhibition in birds.
Overall, the research plan, the experimental setup, and the proposed statistical procedure sound reasonable, plausible and logical for testing the hypotheses presented, and able to deliver robust results. The research plan and methodology also seem highly feasible and are described in enough detail to be understood and replicated. Thus, I have no doubt that the research plan can be carried out and will give valuable results. Finally, I think the authors have reasonably anticipated the control conditions needed to validate the results of the test procedures.
However, I think the research plan would benefit from some further clarifications in the methodology, especially by explicitly justifying some methodological decisions. I provide comments on this below. I hope the authors find them useful.
Regarding the authors' critique of the previous works of Regolin et al. and Zucca et al., I agree that the consistent sample size in this proposed plan, which is large enough to test their predictions (as the authors show), is an improvement on the previous 2 studies. Furthermore, I also agree with the proposed adaptation of the test apparatus to the body size of each species and with giving each species a simple detour barrier task, in order to have more robust comparisons of the results between species. Finally, I also agree that using a more standardized reward such as food makes results between species more comparable.
However, I do not understand why only 3 trials will be performed per species. If the authors want to take into account the influence of learning, are 3 trials enough to achieve this goal? In a previous comparative work by MacLean et al. (2014, 10.1073/pnas.1323533111) and in several other empirical works with different species, the number of trials given to each individual is higher. In several of those studies with a fixed number of trials, researchers have chosen to perform around 10 trials, or to run trials until a learning threshold is reached (i.e., a specific number of consecutive trials in which individuals successfully detoured the barrier to retrieve food). I agree that in the case of this proposed work the same number of trials should be given to each species, but I fail to understand why only 3 trials will be run, especially considering that the authors clearly intend to assess the influence of learning or performance improvement across trials.
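The learning-threshold alternative mentioned above can be sketched as follows. This is an illustration only: the function name `trials_to_criterion`, the criterion of three consecutive successes, and the example outcomes are assumptions, not values taken from the study plan or from the cited studies.

```python
def trials_to_criterion(outcomes, needed=3):
    """Return the 1-based trial number at which `needed` consecutive
    successful detours are first completed, or None if never reached."""
    run = 0
    for trial, success in enumerate(outcomes, start=1):
        run = run + 1 if success else 0  # a failure resets the streak
        if run >= needed:
            return trial
    return None


# Hypothetical outcomes for one bird (1 = detoured, 0 = did not detour).
ten_trials = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
three_trials = ten_trials[:3]

print(trials_to_criterion(ten_trials))
print(trials_to_criterion(three_trials))
```

With only three trials, a criterion of three consecutive successes can be met only by a bird that succeeds on every trial, so a measure of this kind leaves little room to capture improvement across trials.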
I think the methodological procedure should be clarified. It states that “3 trials per day” will be run, but it is not clear whether this is overall or for each bird. Does each bird enter habituation or testing one at a time, or does this happen for several individuals at once, or perhaps for the whole group? It is not clear to me how this part of the procedure will take place.
“Prior to each day, birds will be food deprived at 4PM”. Would it not make sense for food deprivation to differ among species, in proportion to their body mass? For example, for a canary, being food deprived since 4pm of the previous day should result in greater hunger, and perhaps greater motivation to feed and engage with the apparatus, than for a herring gull, which has a larger body mass, because the costs of food deprivation over the same amount of time will be larger for a canary. Furthermore, please clarify whether birds will be allowed to feed ad libitum after the end of the habituation and testing periods until 4pm each day. Finally, if different individuals are tested each day, will the birds continue to be food deprived until the end of the habituation/testing periods? If not, wouldn’t this influence their willingness to feed as well, as they will feed more before the start of the next food deprivation period?
"Sessions" are mentioned throughout the proposal, but only the statistical analysis section clearly indicates what the different sessions are: session 1 uses one type of bars in the barrier, and session 2 the other type. Earlier, in the predictions section and Table 1, there is a slight indication that a session could comprise trials using the same barrier type, but this is not stated explicitly. In the video recording analysis, however, sessions are used to distinguish habituation from the 2 test sessions.
I am not sure if I missed this in the statistical analysis description, but will the social group be controlled for? As I understand it, there will be 6 groups of 10 individuals for each species. I would imagine that group variation could be accounted for as a random effect. If not, could the authors explain their decision?
The last sentence of the predictions section sounds a little too vague. Could the authors add a little on what they predict here? At the least, they could explain why a significant three-way interaction would or would not make sense in the context of this work.
One last thought: do the authors expect any influence of the fact that none of the birds of any species experienced a wild or more natural environment prior to the experiments? Furthermore, do the authors expect an influence of differences in developmental social context prior to inclusion in the groups of 10 individuals?
Minor comments:
- I struggled a bit to understand what specifically is being stated in the first sentences of the 3rd paragraph of the introduction.
- In the “sample size” section, why are 60 individuals “the largest number that is practically feasible”? Is this only due to the constraints of the authors’ aviary conditions?
- I think some explanation should be given on how the maximum testing times were defined, and why the times are different between habituation and testing periods.