Recommendation

How much practice is needed before daily actions are performed in a way that feels habitual?

Recommended by Zoltan Dienes based on reviews by Benjamin Gardner, Wendy Wood and Adam Takacs
A recommendation of:

How long does it take to form a habit?: A Multi-Centre Replication

Submission: posted 26 May 2022
Recommendation: posted 17 January 2023, validated 17 January 2023
Cite this recommendation as:
Dienes, Z. (2023) How much practice is needed before daily actions are performed in a way that feels habitual? Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=210

Recommendation

Even small changes in daily life, such as going to the gym at regular times or eating a healthy breakfast, can have a significant impact on one's health. But how long must we do something before it becomes a habit? Lally et al. (2010) tracked the subjective automaticity of a novel, daily (eating or exercise-related) routine. Based on 39 participants, they found a median time to peak automaticity of 66 days. This estimate has never been replicated with their exact procedure, so the question remains of how well it holds up. Yet the estimate is useful for knowing how long we must effortfully make ourselves perform an action before we will do it automatically.
 
In the current study, de Wit et al. (2023) propose a four-centre near-exact replication of Lally et al. (2010), aiming to test 800 participants so as to provide a precise estimate of the time it takes to form a habit.
 
The Stage 1 manuscript was evaluated over four rounds of review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
 
URL to the preregistered Stage 1 protocol: https://osf.io/bj9r2
 
Level of bias control achieved: Level 4. At least some of the data/evidence that will be used to answer the research question already exists AND is accessible in principle to the authors (e.g. residing in a public database or with a colleague), BUT the authors certify that they have not yet accessed any part of that data/evidence.
 
 
References
 
1. Lally, P., van Jaarsveld, C. H. M., Potts, H. W. W., & Wardle, J. (2010). How are habits formed: Modelling habit formation in the real world. European Journal of Social Psychology, 40, 998–1009. https://doi.org/10.1002/ejsp.674
 
2. de Wit, S., Bieleke, M., Fletcher, P. C., Horstmann, A., Schüler, J., Brinkhof, L. P., Gunschera, L. J., & Murre, J. M. J. (2023). How long does it take to form a habit?: A Multi-Centre Replication, in principle acceptance of Version 4 by Peer Community in Registered Reports. https://osf.io/bj9r2
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #4

DOI or URL of the report: https://osf.io/e8ws2/?view_only=c8ec62553146496e8b5e4d100a0f08b5

Version of the report: v3

Author's Reply, 12 Jan 2023

Decision by Zoltan Dienes, posted 03 Dec 2022, validated 03 Dec 2022

We are almost there, but there is an inferential problem in your Design Table. For H2 (and similarly H3) you say "Therefore, not obtaining a significant effect will not be treated as evidence against the theory", but this does not follow from your claim to have determined a minimally theoretically interesting effect. The point arises because you have not determined a minimally interesting effect, but rather an effect you happened to be powered to detect (which does seem very small, so it seems quite relevant to inference). The way to put it, if you are basing inferences on power, is that you will report the minimal effect you are powered to detect without claiming evidence against the theory as a whole, precisely because you have not determined a theoretically relevant minimal effect.

The next column switches to a different inference procedure, what I call "inference by intervals". Namely, you say "Conversely, if the lower bounds of confidence intervals are below the minimal interesting effect, then this indicates that performing the behaviour consistently is irrelevant for automatization." If you did have a minimally interesting effect, it would be when the upper bound of the CI falls below that minimal effect that you could conclude the effect is no larger than the minimally interesting one; if only the lower bound is below it, your CI may span interesting and uninteresting effects, so one would withhold judgment. But as you do not have a theoretically justified minimally interesting effect, this does not quite work anyway. If you stick to power, you can simply say that a non-significant result does not count against the theory as a whole, but that you were powered to pick up effects larger than X units difference.

(Incidentally, I prefer inference by intervals, as you have in the interpretation column, rather than simply relying on power, because it makes use of the data you actually obtained: the CI tells you what effects you are allowed to rule out in the light of the data, no matter what you were originally powered to detect, power being fixed in advance without reference to the data. But then inference hinges on the minimally interesting effect size, so for conclusions to relate to theory, that minimal effect had better depend on theory rather than on theory-irrelevant issues such as a researcher's resources.)
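To make the "inference by intervals" rule concrete, here is a minimal sketch (my own illustration, not part of your submission) of the decision logic, assuming a hypothetical minimally interesting difference delta and a 95% CI for the effect; all numbers are made up.

```python
# Minimal sketch of "inference by intervals": classify a 95% CI against a
# hypothetical minimally interesting difference `delta` (illustrative only).

def interval_inference(lo: float, hi: float, delta: float) -> str:
    if hi < delta:
        # Even the largest plausible effect is smaller than anything interesting.
        return "evidence against an interesting effect"
    if lo > delta:
        # Even the smallest plausible effect exceeds the interesting threshold.
        return "evidence for an interesting effect"
    # The interval spans interesting and uninteresting values.
    return "withhold judgment"

# Example: a CI of (0.2, 1.8) units, with a minimally interesting effect of 0.5 units.
print(interval_inference(0.2, 1.8, 0.5))  # -> "withhold judgment"
```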

Evaluation round #3

DOI or URL of the report: https://osf.io/e8ws2/?view_only=c8ec62553146496e8b5e4d100a0f08b5

Version of the report: v3

Author's Reply, 01 Dec 2022


Dear Zoltan,

Please find the rebuttal letter and the revised manuscript attached. We look forward to hearing from you!

All the best,

Lukas 

Decision by Zoltan Dienes, posted 11 Nov 2022

Thank you for your excellent revision. Some of my points have not yet been fully taken on board; this is not surprising, as RRs often require thinking in ways people are not used to.

1) Minor point: p. 4, "even more powerful". Given what has been said so far, the issue is not power, which is about testing hypotheses, but the precision of estimation (i.e. narrower confidence intervals). So it would be better to say your study will have more precision.


2) The main issues concern control of multiple testing and, especially, how one draws inferences without justifying what minimal effect size is of scientific relevance, for rows 2 and 3 of the Design Table.

But first, row 1. The final column of the first row is no longer accurate; you will not be testing Hull's theory.


In terms of "interpretation" in the first row, you can leave it as it is. But whether the best estimate is really exactly 66 is not really the point. I would think that if the overall confidence interval included any values in, say, the range 55-75, the original estimate was really pretty good. (Imagine you ran a million subjects and no CIs included 66, but they were all tightly formed around 61 days. One would still conclude that the original study had done a fine job of estimating the time taken.) So you might want to rephrase along these lines.


Second row. Three tests are suggested here. Will you conclude that consistent performance is important if any one of them is significant? If so, you should use a Bonferroni or other familywise error correction. If not, specify how you will draw conclusions - will you infer that consistency is important only if all three tests are significant, for example? You do not justify a minimally interesting effect size, i.e. give reasons why a particular effect size is the smallest that is relevant to this scientific problem. Thus, a non-significant result does not count against any theory, and so the inference in the final column is incorrect.
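For concreteness, a Bonferroni familywise correction across the three tests would simply compare each p-value with alpha divided by the number of tests; a minimal sketch with made-up p-values (not your data):

```python
# Minimal sketch of a Bonferroni familywise error correction across three tests.
p_values = [0.012, 0.030, 0.20]   # hypothetical p-values, for illustration only
alpha = 0.05
m = len(p_values)

reject = [p < alpha / m for p in p_values]
print(f"Corrected alpha = {alpha / m:.4f}")   # 0.0167
print("Reject per test:", reject)             # [True, False, False]
```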


Third row. Be clear about which family you're correcting for: are you correcting for the fact that you are looking at 5 DVs? Or for the pairwise comparisons within each of those ANOVAs? State how you will correct for each. The same issue mentioned for row 2 also arises here: you find it hard to give reasons for why some effect is of minimal interest. That is not surprising, but it does mean that a non-significant result is not support for any conclusion. What if an effect were interesting that was smaller than even the small ones you are powered to detect? Then you have not got evidence against any theory.


So one has two choices: either give scientific reasons for why an effect size is relevant (in the case of the significance testing you are doing, that means a minimally interesting effect; see https://psyarxiv.com/yc7s5/); or else forgo any claims that one can find evidence against an effect existing. In the latter case one could simply estimate the relevant mean differences and draw no existential conclusions. So for your second and third rows you could just estimate the relevant mean differences and make claims like: given there is an effect, it is between these bounds (i.e. a 95% CI). The CIs would still be corrected for multiple testing. If you go this way, do not slip in any claims of having shown that there is an effect or no effect; just stick to saying how big it is.
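If you take the estimation route, the multiplicity correction can be built into the intervals themselves by widening the confidence level; a minimal sketch with made-up summary statistics (not your data):

```python
# Minimal sketch: a Bonferroni-adjusted confidence interval for one of m mean
# differences (all numbers are illustrative assumptions).
from scipy import stats

m = 5               # number of intervals in the family (hypothetical)
alpha = 0.05
mean_diff = 0.8     # observed difference in raw rating units (made up)
se = 0.3            # standard error of that difference (made up)
df = 150            # degrees of freedom (made up)

# Each interval is computed at level 1 - alpha/m, so the family of m intervals
# has simultaneous coverage of at least 1 - alpha.
t_crit = stats.t.ppf(1 - (alpha / m) / 2, df)
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
print(f"Adjusted {100 * (1 - alpha / m):.0f}% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```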

A  third option is that one suspends judgment for all non-significant results. This was a position Fisher himself sometimes took. I personally don't like taking this option because it means there is no way of getting evidence against a theory that predicts an effect. Many RR journals will not accept it either.
 

Evaluation round #2

DOI or URL of the report: https://osf.io/e8ws2/?view_only=c8ec62553146496e8b5e4d100a0f08b5

Version of the report: v3

Author's Reply, 04 Nov 2022


Dear Zoltan,

 

Thanks for your patience. We have addressed the suggestions and comments in the manuscript and revision letter below, and look forward to hearing from you.

 

All the best,

Lukas

Decision by Zoltan Dienes, posted 27 Jul 2022

Dear Lukas

I now have three positive reviews back for your submission, which are as a whole enthusiastic about the planned research.  In addition to other points raised by the reviewers, also address the following in your revision:

1) Wood raises the issue of whether data have already been collected. You say in your cover letter that they have not, but the verb tense used in other places implies it might be otherwise, e.g. "The detailed study protocol, materials, anonymized raw data, code used in the analyses and output are permanently stored on Open Science Framework (https://osf.io/n6srx/)" (to which I did not have access), and the frequent use of the past tense in the Method. To be clear, assuming data have not been collected, for Stage 1 use the future tense throughout for things that have not happened, including the Method (e.g. "the data will be permanently stored..."). The future tense can then be changed to past tense in all cases when the Stage 2 is submitted. (Of course, if data have been collected, then clarify this as well, and also what precautions are in place for bias control.)

2) In the Results section indicate *exactly* what analyses you will do. The specification should be so clear that there is no analytic flexibility left. Make sure, for example, that anyone could fit the curves exactly the same way you will, so that they could precisely reproduce your results with your data. Be clear what the other "plausible curve" is that you will fit in addition (I know you refer to the original authors, but this information should be in your manuscript if it figures in your analytic pipeline). Be clear exactly what comparisons are done with what error control (more on this below). Another example: "We will also perform multiple regression analyses to determine whether impulsivity, personal need for structure and conscientiousness were related to curve parameters and performance variables." State how many regressions you will perform and specify the variables for each regression. If you mention analyses in the Results section, also put them in the Design Table and justify power for each analysis and what conclusions hang on each. Alternatively, leave out mentioning any analyses you do not want to tie down exactly (nor make sure are properly powered) and put them in a non-pre-registered section in the Stage 2. (There are other cases I have not mentioned where the analysis needs to be tied down.)

3) Relatedly, in comparing linear and s-shaped functions you say you will do a sign test on R²s. But you also provide cogent reasons for why this is a flawed strategy, and hence say you will also use BIC and AIC to aid the reader. This leaves a lot of inferential wriggle room. A possible outcome is a higher R² for the s-shaped than the linear function because of the difference in the number of parameters, but BIC coming down the other way. Has your manipulation check on the form of the function then failed or not? Justify one measure as most suitable - and I presume it will be BIC or AIC or something related, e.g. an informed Bayes factor - and test the form of the function with that measure, stating your criteria for good enough evidence for choosing s-shaped over linear functions (or one s-shape over another).
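To illustrate the kind of single-measure comparison I have in mind, here is a minimal sketch (my own illustration, not your pipeline) that fits a linear function and one plausible plateauing curve - an asymptotic exponential of the sort attributed to Lally et al. - to simulated automaticity scores and compares them by AIC; the functional form, the simulated values and the Gaussian-error AIC approximation are all assumptions made purely for illustration:

```python
# Minimal sketch: compare a linear fit with an asymptotic-exponential fit by AIC.
# The functional forms and all numbers are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
days = np.arange(1, 85)
# Simulated automaticity ratings rising towards a plateau (made-up values).
y = 2 + 4 * (1 - np.exp(-days / 20)) + rng.normal(0, 0.4, size=days.size)

def linear(t, a, b):
    return a + b * t

def asymptotic(t, a, b, c):
    # a + b * (1 - exp(-t / c)): rises from a towards a plateau at a + b
    return a + b * (1 - np.exp(-t / c))

def aic(y_obs, y_hat, k):
    # Gaussian-error approximation: n * ln(RSS / n) + 2k
    rss = np.sum((y_obs - y_hat) ** 2)
    n = y_obs.size
    return n * np.log(rss / n) + 2 * k

p_lin, _ = curve_fit(linear, days, y)
p_asym, _ = curve_fit(asymptotic, days, y, p0=[1.0, 3.0, 10.0])

print("AIC linear:    ", aic(y, linear(days, *p_lin), k=2))
print("AIC asymptotic:", aic(y, asymptotic(days, *p_asym), k=3))
# The lower AIC indicates the better-supported functional form.
```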

4) As per my previous point, it seems to me that providing evidence for your exponential function is more an outcome neutral test than a test of theory. You point out that the original authors were motivated by Hull's theory of habit; and thus you frame the choice of function as testing Hull's theory, by testing the predicted shape of the function defining the increase of habit strength over time. However, strength will be measured on Likert scales, therefore with fixed minimum and maximum values. A linear function is therefore a priori ruled out if testing continues long enough. Something at least approximating an s-shape is guaranteed by the nature of the measurement. Therefore no theory can be at stake depending on the outcome of this test. Rather, obtaining something like an s-shape is a necessary precondition for your study in order to estimate when a habit has formed. 

5) I take it you plan to perform all pairwise comparisons between the 5 data sets (your 4 plus the original) with Kolmogorov-Smirnov tests, which is 10 tests. How will you control the familywise error rate? Given that corrected alpha, determine N to control power to detect a difference you are interested in. How far apart should the median number of days be before it is interesting? As Gardner points out, 66 days is just a rough figure; 59 is in the same ballpark. Is there any way (other than reaching deep into one's soul) of specifying how far from 66 would start to be interesting? (e.g. https://psyarxiv.com/yc7s5/) Once you have justified an interesting difference, use simulations to determine the power of the KS test to detect effects that are just interesting. (And you should note in the paper that the KS test is sensitive to more than location differences, as a proviso on your analysis.)
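As an illustration of the kind of simulation I mean, the following minimal sketch estimates the power of a two-sample Kolmogorov-Smirnov test at a Bonferroni-corrected alpha; the sample sizes, the lognormal data-generating distributions and the "just interesting" shift in median days are all made-up assumptions, to be replaced by whatever values you end up justifying:

```python
# Minimal sketch: simulated power of a two-sample Kolmogorov-Smirnov test at a
# Bonferroni-corrected alpha. All distributional choices and numbers are
# illustrative assumptions only.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
alpha_corrected = 0.05 / 10    # 10 pairwise comparisons
n_per_group = 200              # hypothetical per-centre sample size
shift_days = 10                # hypothetical "just interesting" shift in median days
n_sims = 2000

hits = 0
for _ in range(n_sims):
    a = rng.lognormal(mean=np.log(66), sigma=0.5, size=n_per_group)
    b = rng.lognormal(mean=np.log(66 + shift_days), sigma=0.5, size=n_per_group)
    if ks_2samp(a, b).pvalue < alpha_corrected:
        hits += 1

print(f"Estimated power: {hits / n_sims:.2f}")
```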

6) For RQ3, specify an interesting effect size in raw units: what difference in rated automaticity would be just interesting? (When several Likert ratings are combined, I find using the average rather than the sum over the ratings useful, to put the final number on the same scale as the rating itself, so one has a more intuitive grasp of what one unit is.) Otherwise we just have a "medium effect size" plucked out of the air; and being standardized, it depends on measurement noise and reliability. But presumably what is interesting is the actual difference in automaticity.

7) In terms of what defines an interesting effect, Takacs asks "for RQ4, would a non-significant result prove that 'complexity is irrelevant for automatization'?" You can just qualify the conclusion so that it applies to this particular difference in complexity. (If in addition you could quantify or measure the complexity difference (and I am not requiring that you do), it would help place the conclusion in perspective too.) How will you control the familywise error rate for the number of DVs? Determine interesting differences in raw units, then determine power for those differences, taking into account the corrected alpha. Specify the IV and its levels - are there more than 2, given that you are performing a KW test? Will there be post hocs? What conclusions follow from different patterns? There is also inferential flexibility in specifying both an ANOVA and a KW test. Justify one, or provide a decision procedure for choosing between them (one that does not allow wriggle room).

 

best

Zoltan

Reviewed by Benjamin Gardner, 26 Jun 2022

This is a note-perfect replication of a seminal study on habit formation (Lally et al., 2010). The report meets all criteria for a Stage 1 replication study: the research questions are scientifically valid; the hypotheses are logical and plausible; the methodology is sound and feasible, and as described, permits replication. I have only two comments.

1. One important methodological difference between the original study and the present study, as the authors openly acknowledge (on p6), is that, whereas the participants in the original study met with the researcher in person in a lab, replication study participants will meet the researcher online via video conferencing. This is important because motivation is needed to initiate and maintain a habit formation attempt before the habit solidifies. This difference could feasibly affect results in two ways. First, providing support, advice and/or guidance in person might be inherently more motivating than doing so online. Second, participants who are willing to travel to a lab in central London to participate (as in Lally et al's study) may be inherently more motivated than those who are only required to meet via video conferencing. Do the authors view the difference in meeting format as a problem, and if so, how might it affect their results, and to what extent might they be able to mitigate this?

2. Hypothesis 2 focuses on testing whether habit really does peak after 66 days, as Lally et al found. This seems overly restrictive; even if Lally et al's findings are 'true', I very much doubt that a replication of this result would find habit to peak at exactly 66 days. (For example, Keller et al [2021] found a once-daily behaviour to peak in habit strength after 59 days. While not exactly 66 days, this finding intuitively appears in keeping with Lally et al's findings.) Will the authors conclude that Lally et al's findings have not been replicated if the peak habit duration is NOT 66 days? Or is there an acceptable range within which a peak other than 66 days might sit *and* Lally et al's findings be supported?

Benjamin Gardner

University of Surrey, UK

Reviewed by Wendy Wood, 10 Jul 2022

This is an important research project that proposes to replicate an earlier investigation by Lally et al. (2010). The authors are correct that this earlier investigation had very few participants and consequently unstable results, even though it has been cited over 2000 times on Google Scholar. The opportunity to replicate this research and, in addition, to assess individual differences makes this a highly useful piece of research for science and for popular understanding. For these reasons, I believe it should be accepted for publication.

The guidelines for evaluating registered reports we were given are:

1A. The scientific validity of the research question(s): I take this as a question of how important the research is, and I answered this above.

1B. The logic, rationale, and plausibility of the proposed hypotheses: This project will be informative whatever the results. 

1C. The soundness and feasibility of the methodology and analysis pipeline: The research has apparently already been conducted, and the data analysis is already in progress (?). For this reason, I am not going to comment in any detail on the methods and procedures except to note that it would be helpful to add a kind of intention-to-treat analysis that assesses the effects of participant attrition on the conclusions drawn.

1D. Whether the clarity and degree of methodological detail is sufficient: The authors rely heavily on the original project and the protocol given, and so clarity is presumably assured.

1E. Whether the authors have considered sufficient outcome-neutral conditions: In this case, the one major threat to validity is the daily report procedure. The authors do not address this or provide any insight into how they will handle it. But it leaves readers wondering whether the results would be the same if participants were not reminded each day about their behavior by the habit questionnaire.

Although I am very favorable toward acceptance, I wonder why the authors are submitting a registered report Stage 1 for a project with already-collected data (Level 5 or 6?). How far have the authors proceeded with data collection and analysis? I think the project is so worthwhile that it should be attractive at a number of journal outlets. So I'm not clear why this publication format.

Reviewed by Adam Takacs, 05 Jul 2022

Evaluation round #1

DOI or URL of the report: https://osf.io/e8ws2/?view_only=c8ec62553146496e8b5e4d100a0f08b5

Author's Reply, 24 Jun 2022


Dear Zoltan,

Thanks for your comments, and my apologies for the delay in getting back to you. We have addressed your remarks by adding power calculations for the third and fourth research questions in the revised study design table. Moreover, we rewrote sections of the analysis and interpretation column to enhance readability.

We look forward to your comments.

All the best,

Lukas

Decision by Zoltan Dienes, posted 09 Jun 2022, validated 11 Nov 2022

Dear Dr de Wit

 

Thank you for your clearly written submission "the shape of habits". Before I send it to review, one point needs to be addressed. Namely, you should demonstrate sensitivity to answer each question in your Design Table; or, put another way, you should be able to, or at least know whether you can, obtain evidence against any existential claims made. At the moment, N is determined with respect to one issue, namely the precision with which the median number of days is estimated. But you also ask a number of other questions, and it needs to be shown whether you have the power or sensitivity to answer each of them, considered on their own terms. Some journals require a certain power or Bayes factor threshold for any hypothesis testing; and for those that have no set requirement, it should be known whether or not you, for example, have the power to justify asserting H0 (or the resources to achieve good Bayesian evidence). Some advice on approaching the problem is given here: https://psyarxiv.com/yc7s5/. For each row of your Design Table, show that you will (or will not) have the means to obtain evidence that would show the claim wrong if it were wrong.

best

Zoltan