Recommendation

Evaluation of an immersive virtual reality wayfinding task

Robert McIntosh based on reviews by Conor Thornberry, Gavin Buckingham and 1 anonymous reviewer

A recommendation of:

STAGE 1

Evaluation of spatial learning and wayfinding in a complex maze using immersive virtual reality. A registered report

Eudave L., Martínez M., Valencia M., Roth D. https://osf.io/c2zvr version 5

Abstract

Evaluation of spatial learning and wayfinding in a complex maze using immersive virtual reality. A registered report

Objectives: Mazes have traditionally been used as tools for evaluating spatial learning and navigational abilities in humans. They have also been used in sleep and dream research, as wayfinding is a common dream theme and participants undergoing experiments in the laboratory often dream about it. One such maze is the virtual maze task (VMT) created by Wamsley et al. (2010) to study the impact of sleep and dreaming on learning. Although several of those studies found positive results (dreaming of the VMT improves task performance), others failed to replicate these findings, possibly due to intrinsic methodological difficulties such as low task incorporation in dreams and the presence of cybersickness symptoms during task execution. It is possible that an adequately designed immersive virtual reality experience, which allows for a more naturalistic, stimulating and engaging simulation, can overcome these handicaps. This Registered Report therefore aims to reproduce the original VMT version and compare it with an immersive virtual reality (iVR) adapted version using several wayfinding performance dependent measures. Methods: In this within-subjects study, a sample of 62 participants carried out both versions (Desktop vs. iVR) of the VMT (pseudo-randomly allocated, counterbalanced), where we measured performance and path variables. They then completed self-report measures of cybersickness symptoms and sense of presence during the task, and a test for the assessment of perspective taking. Results: [TBD]. Conclusions: [TBD].

virtual reality, navigation, spatial learning, maze, cybersickness

Submission: posted 31 March 2023
Recommendation: posted 04 September 2023, validated 08 September 2023

Cite this recommendation as:
McIntosh, R. (2023) Evaluation of an immersive virtual reality wayfinding task. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=439

Recommendation

The Virtual Maze Task (VMT) is a digital desktop 2D spatial learning task that has been used for research into the effect of sleep and dreaming on memory consolidation (e.g. Wamsley et al., 2010). One limitation of this task has been low rates of reported dream incorporation. Eudave and colleagues (2023) have created an immersive virtual reality (iVR) version of the VMT, which they believe might be more likely to be incorporated into dreams. As an initial step in validating this task for research, they propose a within-subjects study to compare three measures of spatial learning between the 2D desktop and iVR versions. Based on a review of relevant literature, the prediction is that performance will be similar between the two task versions. The planned sample size (n = 62) is sufficient for a test of equivalence with .9 power within effect size bounds of d = -.47 to .47. Additional independent variables (gender, perspective-taking ability) and dependent measures (self-reported cybersickness and sense of presence) will be recorded for exploratory analyses.
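
As a rough check on these figures, the sketch below applies the standard normal-approximation formula for the power of a paired two one-sided tests (TOST) procedure when the true difference is assumed to be zero. The function names are hypothetical. Treating the bounds as standardised paired differences and using alpha = .02 (the value quoted in the round 4 decision letter) are assumptions made for illustration, not a description of the authors' own calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used as an approximation to the t distribution


def tost_paired_power(n_pairs, bound_d, alpha):
    """Approximate power to declare equivalence when the true effect is zero.

    Assumes symmetric bounds of +/- bound_d in standardised (paired) units.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    power = 2 * STD_NORMAL.cdf(bound_d * sqrt(n_pairs) - z_alpha) - 1
    return max(power, 0.0)  # the approximation can go negative for tiny samples


def tost_paired_required_pairs(bound_d, alpha, power):
    """Smallest number of pairs reaching the requested power (same assumptions)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - (1 - power) / 2)
    return ceil((z_alpha + z_beta) ** 2 / bound_d ** 2)


if __name__ == "__main__":
    print(tost_paired_required_pairs(bound_d=0.47, alpha=0.02, power=0.90))
    print(round(tost_paired_power(n_pairs=62, bound_d=0.47, alpha=0.02), 3))
```

Under these assumptions the formula returns 62 pairs, in line with the planned sample size. Note that this is the power to detect equivalence only; the power to detect a difference would need a separate calculation.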

The study plan was refined across four rounds of review, with input from two external reviewers and the recommender, after which it was judged to satisfy the Stage 1 criteria for in-principle acceptance (IPA).†

URL to the preregistered Stage 1 protocol: https://osf.io/wba2v

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.

References

Eudave, L., Martínez, M., Valencia, M., & Roth D. (2023). Evaluation of spatial learning and wayfinding in a complex maze using immersive virtual reality. A registered report. In principle acceptance of Version 5 by Peer Community in Registered Reports.

Wamsley, E. J., Tucker, M., Payne, J. D., Benavides, J. A., & Stickgold, R. (2010). Dreaming of a learning task is associated with enhanced sleep-dependent memory consolidation. Current Biology, 20, 850–855. https://doi.org/10.1016/j.cub.2010.03.027

† There is one minor change that the authors should make to the Methods section, which is sufficiently small that it can be incorporated at Stage 2: "if both tests reject the null hypothesis (observed data is less/greater than the lower/upper equivalence bounds), conditions are considered statistically equivalent" >> suggest changing "less/greater" to "greater/lesser" for correct correspondence with "lower/upper".
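
The corrected wording can be illustrated with a minimal sketch of the TOST decision rule for paired differences. The function name and the example data are hypothetical, the bounds are given in raw units of the dependent measure, and a normal critical value is used in place of the t distribution to keep the example dependency-free; a real analysis would use the t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired_equivalent(differences, lower_bound, upper_bound, alpha=0.02):
    """Return True if both one-sided tests reject, i.e. the conditions are
    declared statistically equivalent.

    differences: per-participant condition differences (e.g. Desktop minus iVR).
    lower_bound, upper_bound: equivalence bounds in the same raw units.
    """
    n = len(differences)
    std_error = stdev(differences) / sqrt(n)
    mean_diff = mean(differences)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # normal approximation (assumption)

    # Test 1: H0 is "true difference <= lower bound"; reject if clearly GREATER.
    stat_lower = (mean_diff - lower_bound) / std_error
    # Test 2: H0 is "true difference >= upper bound"; reject if clearly LESSER.
    stat_upper = (mean_diff - upper_bound) / std_error

    return stat_lower > z_crit and stat_upper < -z_crit


if __name__ == "__main__":
    small_diffs = [0.1, -0.2, 0.05, -0.1, 0.15, 0.0, -0.05, 0.1] * 4
    print(tost_paired_equivalent(small_diffs, -0.5, 0.5))
```

Equivalence is declared only when the observed difference is significantly greater than the lower bound and significantly less than the upper bound, which is the correspondence the suggested rewording makes explicit.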

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #4

DOI or URL of the report: https://osf.io/z27w4

Version of the report: 4

Author's Reply, 02 Sep 2023

Dear Editor,

Thanks for your comments and suggestions (not annoying at all). We understand our message and intention must be clear to everyone, so we've tried to make modifications based on your comments, and hopefully the rationale is easier to understand now. If further changes need to be made, please let us know (thanks for the patience, as well).

Luis (and team)

https://doi.org/10.24072/pci.rr.100439.ar4

Decision by Robert McIntosh, posted 30 Aug 2023, validated 31 Aug 2023

Thanks for your further revision of this paper. We are getting closer to an acceptable version, but each time that I try to formulate my recommendation (which involves a close re-reading of the paper), I find that I remain confused or dissatisfied with parts of the manuscript/plan.

In the current version:

1) You have now relegated the statistical tests relating to cybersickness, sense of presence and perspective-taking to exploratory status, and removed these analyses from the design table. However, you still have a paragraph in Methods (p19) that describes the planned analyses, including the alpha level, approach to multiple comparisons etc. (You also refer here to "within-group comparisons", which I am not sure I understand.) This is an uneasy half-way house, because you are designating these analyses as exploratory, and yet you are effectively pre-registering them, but only in a relatively imprecise way (and without a priori consideration of their sensitivity). It would be preferable to preserve a clean distinction between registered and exploratory components by not describing these analyses a priori.

I understand that in order for the reader to appreciate why the extra measures are included, you may wish to refer briefly to exploratory analyses of perspective taking ability, self-reported cyber-sickness and sense of presence, but you could leave the precise form of these analyses open until Stage 2.

2) It seems obvious why you would wish to compare cybersickness and sense of presence between task versions, but it is much less clear why you would compare perspective-taking, which seems more like a (presumably) stable measure of spatial ability. I can't find any clear rationale for the inclusion of this as a dependent variable (although you do also mention its possible inclusion as a predictor for a regression analysis, which makes some more sense). Nor do I follow why you introduce the PTT as a test of 'spatial learning skill' - it does not seem to include any measure of learning.

3) Your power analysis for the equivalence test seems (as far as I can tell) to be appropriate, but the description is not coherent or detailed enough. You state that you determined the SESOI as "the mean critical effect size (maximum effect size that would not be statistically significant) product of of previously mentioned comparisons, resulting in a SESOI of d = 0.47." Aside from the typo ("of of"), the "mean critical product of previously mentioned comparisons" is too hard to follow, and needs better explanation.

Similarly, when you describe the critical power analysis, you state "... we conducted a series of sensitivity power analyses based on the two one-sided tests procedure for equivalence testing (TOST, Lakens, 2017) for dependent samples. With a statistical power of 90% and an alpha set at 0.02, the estimated sample size is 62 pairs/participants." This description does not seem precise. For instance: What is "a series of sensitivity power analyses"? What do you mean by "sensitivity power" (is it a typo)? What do your analyses actually consist of? What is it that you have power to detect? Is it the power to detect equivalence within your equivalence bounds? If so, then what is your power to detect differences (since your tests will also include these)? How does your conclusion about equivalence depend upon the set of outcomes across your different dependent measures? Do they all need to be equivalent in order to conclude equivalence overall? etc...

These points are not well explained in the appendix, which merely refers the reader to a downloadable spreadsheet from Lakens that will allow them to recalculate power for themselves (once they can work out for themselves how to use the spreadsheet). In any case, it would not be acceptable to offload the explanation of the TOST procedure (and power for it) to an appendix, because this is a critical part of the study design.

4) When I try to formulate my recommendation text for your study, I find that I am somewhat at a loss to pinpoint exactly what the point of your study is. I understand that you are testing for equivalent spatial learning (within certain bounds) between desktop and iVR versions of the VMT, on the basis that if they are equivalent then the iVR version could be considered as a substitute for the desktop version in some experimental contexts. But what if the iVR version shows greater evidence of spatial learning than the desktop version? Or lesser evidence of spatial learning? Would these outcomes mean that it could not be used as a valid version of the VMT?

Overall, it is clear that your main purpose is to compare spatial learning between task versions, but it is less evident why, in that it is not clear what you will conclude about the appropriateness of the iVR version given each of the different possible outcomes (including non-equivalence).

I hope that these concerns are clear and I hope that they do not seem too annoying. I am afraid that I cannot write a recommendation for this study plan until I am sure that I understand it, and confident that another reader would likewise be able to. To this end, it seems that there is still further work required.

Best wishes,

Rob McIntosh

https://doi.org/10.24072/pci.rr.100439.d4

Evaluation round #3

DOI or URL of the report: https://osf.io/up6rb

Version of the report: 3

Author's Reply, 30 Aug 2023

Dear Editor and Reviewers,

Attached you will find the reply and tracked manuscript files for Round 3. We'd like to thank you once more for your work and thoughtful suggestions, which have definitely improved this project.

Luis (and team)

https://doi.org/10.24072/pci.rr.100439.ar3

Decision by Robert McIntosh, posted 24 Aug 2023, validated 24 Aug 2023

Thank you for revising your Stage 1 RR to address previously identified issues of power. The manuscript has been seen again by both reviewers, who are happy with the changes made (GB reiterates the point that your chosen level of 0.8 power with alpha .05 will limit the set of possible publication venues, but I assume you have already considered this point).

However, we are still unable to provide IPA for the manuscript, for the reasons sketched below:

1) The power analysis relates to performance of the VMT, but this covers only one of the hypotheses. The second (set of) hypotheses relating to cyber-sickness and sense of presence ask quite different questions, but no consideration is given to the effect sizes of interest for these comparisons, or the adequacy of the sample size to these questions.

2) When considering this issue, you need to bear in mind the following further complications: (i) The tests are predicated on being able to detect a significant difference (in either direction) OR equivalence (by TOST), and your sensitivity for each of these possible outcomes may differ. (ii) Your conclusions will depend upon the combination of outcomes across multiple tests, specifically you state that the tests will be conducted on the 3 subscales of SSQ and the 4 subscales of ITC-SOPI. You need to make it explicit how your conclusions will be informed by the combination of outcomes across the subscales, and to correct the required alpha for multiple comparisons if appropriate. (iii) It is not clear where the PTT fits in to your research questions/hypotheses tests.

(If these complications cause too many problems, then you could consider relegating these further questions to a secondary exploratory status, and removing from the Stage 1 plan.)

3) The within-subjects design improves the power of your study, in principle, but it does create other issues. Specifically, is there a possibility of transfer effects between days; that is, might learning the task on one day (in one format) be expected to influence baseline performance (and thus opportunity for learning) on the second day? Unless I missed it, you do not even specify whether the same or different maze will be presented on each day, but these details seem potentially very important.

4) As a very minor issue, you state that participants will be randomly allocated to one of two task orders. I assume that allocation is not truly random if you intend to ensure that there are equal numbers for each order. Therefore, the allocation schedule may need to be stated more precisely.
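
One way to state such an allocation schedule precisely is sketched below: a balanced list of the two task orders is built first and then shuffled, so the order is unpredictable for any given participant while the two orders end up with equal numbers. The function name, the order labels and the seed are hypothetical and are not taken from the manuscript.

```python
import random

# The two possible task orders across the two sessions (labels are illustrative).
TASK_ORDERS = (("Desktop", "iVR"), ("iVR", "Desktop"))


def counterbalanced_schedule(n_participants, seed=None):
    """Return one task order per participant, with both orders equally frequent."""
    if n_participants % len(TASK_ORDERS) != 0:
        raise ValueError("n_participants must be a multiple of the number of orders")
    # Build a perfectly balanced list, then shuffle it with a reproducible generator.
    schedule = list(TASK_ORDERS) * (n_participants // len(TASK_ORDERS))
    random.Random(seed).shuffle(schedule)
    return schedule


if __name__ == "__main__":
    schedule = counterbalanced_schedule(62, seed=2023)
    for participant_id, order in enumerate(schedule[:5], start=1):
        print(participant_id, " then ".join(order))
```

Fixing the seed makes the schedule reproducible, so it could be generated and archived before recruitment begins.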

https://doi.org/10.24072/pci.rr.100439.d3

Reviewed by Gavin Buckingham, 08 Aug 2023

Thank you for the changes you have made to the protocol - I think the within subjects design seems like a more sensible approach in this case. My one further suggestion would be merely an advisory one (perhaps the editor can weigh in) about the power calculation (and thus sample size) - the eventual outlet (https://rr.peercommunityin.org/PCIRegisteredReports/about/pci_rr_friendly_journals) may well have more stringent requirements than the 'bare minimum' alpha = .05 and power = 0.80 calculated in the revision (e.g., Cortex uses alpha = .02 and power > 0.90). If this is a possible concern, then I'd recommend re-running the power calculation with 0.9 power, which seems appropriate given the hypotheses.

https://doi.org/10.24072/pci.rr.100439.rev31

Reviewed by Conor Thornberry, 23 Aug 2023

The authors have accurately addressed my concerns from before.

The comments addressed by the authors from Round 2 have also in a way answered some of my other concerns.

I believe this will make for a useful and important manuscript for the VR/Spatial Cognition community. I have some final notes:

I now understand why the authors have used three trials. Considering this is based on previous research, it makes sense for this manuscript. However, I would strongly encourage any conclusions drawn about spatial learning to address this limitation.
The introduction addressed my concerns clearly and also has a nice flow.
Thank you for providing OSF and GitHub links within the manuscript.
Gender differences are an important one, and I appreciate you including it. However, I also appreciate that this was not part of the proposed hypotheses. It would be interesting to see if they are having an impact as some Desktop software can eliminate the classic water maze gender effect.
I would like to praise the authors for the inclusion and construction of a Spanish version of the PTT. This is great, and I hope it will provide you with some interesting perspectives on variable spatial ability within different virtual environments (Does better PTT ability facilitate better desktop or iVR performance?). I understand this meant changing how the experiment would be run, so thank you for this.

I look forward to reading the completed manuscript.

https://doi.org/10.24072/pci.rr.100439.rev32

Evaluation round #2

DOI or URL of the report: https://osf.io/xs8pt

Version of the report: 2

Author's Reply, 20 Jul 2023

Dear Editor and Reviewers,

This version (v3) includes a solution to the issue regarding the estimated SESOI (and equivalence bounds) and the sample size calculation. We hope that these changes, along with the modifications from the previous round of reviews, will make this study a more suitable candidate for a Registered Report.

https://doi.org/10.24072/pci.rr.100439.ar2

Decision by Robert McIntosh, posted 05 Jul 2023, validated 05 Jul 2023

Thank you for sending this revised Stage 1 manuscript, with replies to reviewers. Having looked at your replies, I see one potentially major issue, and I think it most sensible to return the manuscript directly to you for further consideration before asking for reviewers to devote more time to this.

At the first round, Reviewer#2 raised the critical issue that it seems relatively unimportant to test for equivalence, where this is defined by an effect size for the difference smaller than d = .77, given that this actually describes a very large effect size. That is, your test would be prepared to declare equivalence between tasks even if a statistically large difference between them existed - it is hard to see this as a useful form of equivalence.

In your response, your main line of reasoning is that d=0.77 is appropriate for your purposes because VMT performance scores are "wildly variable within and between participants... so only when the difference is large enough we should expect to find significant effect". This comment seems to reflect a misapprehension of what your measure of effect size (d) represents. Cohen's d is a standardised measure of effect size that is expressed in units of SD (it is the mean difference between groups divided by the pooled standard deviation). Therefore, d of .77 remains a very large effect size, regardless of how variable the performance is between participants. (If the performance is more variable between-subjects, this just means that the mean difference that d of .77 represents is proportionally larger.)
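
The point about standardisation can be illustrated with a minimal sketch. It computes Cohen's d for two independent groups using the pooled standard deviation, then rescales the scores to show that d does not change when the raw variability changes by a constant factor. The function name and the data are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference between two groups divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


if __name__ == "__main__":
    # Hypothetical scores for two task versions.
    version_a = [10, 12, 14, 16, 18]
    version_b = [8, 10, 12, 14, 16]
    print(cohens_d(version_a, version_b))

    # Multiplying every score by 10 makes the raw data ten times more variable,
    # and the raw mean difference ten times larger, but d is unchanged.
    print(cohens_d([x * 10 for x in version_a], [x * 10 for x in version_b]))
```

Because d is expressed in standard deviation units, a more variable measure does not make a given d any less large; it only means that the same d corresponds to a larger raw difference.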

Given this fact, as far as I can see, your response does not address the problem at all, and you remain in a position of having an equivalence test that could rule out only very large (seriously non-trivial) differences between tasks. You suggest that you may be at the edges of practicality of sample sizes required for testing smaller effect sizes than this, but that might simply indicate that you are not in a position to run a meaningfully useful study of the sort that you would like. (You might also want to think about whether the VMT task itself is worth trying to adapt to iVR if performance is, as you say, so wildly variable within and between participants.)

Alternatively, you may be able to improve statistical sensitivity to smaller effects by designing a within-subjects study? And/or perhaps you could consider running your study with lower power (e.g. .8), although this would reduce the strength of conclusions you could draw (and also affect the range of possible destination journals).
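To give a rough sense of the trade-off between equivalence bounds, power and sample size, the following sketch uses a standard normal-approximation formula for a two one-sided tests (TOST) procedure. It is not the authors' calculation. It assumes two independent groups of equal size, symmetric bounds of ±d, and a true difference of exactly zero; `tost_n_per_group` is a hypothetical helper name, and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def tost_n_per_group(bound_d, alpha=0.05, power=0.90):
    """Approximate n per group for TOST with symmetric bounds +/- bound_d.

    Normal approximation, assuming the true standardised difference is zero:
        n = 2 * (z_{1-alpha} + z_{1-beta/2})^2 / bound_d^2
    """
    z = NormalDist().inv_cdf
    beta = 1.0 - power
    return ceil(2.0 * (z(1.0 - alpha) + z(1.0 - beta / 2.0)) ** 2 / bound_d ** 2)


# Compare the bound discussed above (0.77) with some smaller bounds
for bound in (0.77, 0.5, 0.3):
    for power in (0.90, 0.80):
        n = tost_n_per_group(bound, power=power)
        print(f"bound d = {bound:.2f}, power = {power:.2f}: about {n} per group")
```

Because the required n scales with 1/d², halving the equivalence bound roughly quadruples the sample size, while dropping power from .9 to .8 gives a more modest saving. A within-subjects design would call for a paired version of the calculation with a differently standardised effect size, which this sketch does not cover.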

In any case I am sending this back to you for reconsideration of this key point. Perhaps the following article from Zoltan Dienes could be useful in helping you think about how to define realistic effect sizes that might be worth ruling out (the article takes a Bayesian approach, but the guidance for defining meaningful effect sizes of interest applies equally for a frequentist approach): https://doi.org/10.1177/2515245919876960

https://doi.org/10.24072/pci.rr.100439.d2

Evaluation round #1

DOI or URL of the report: https://osf.io/mctzn

Version of the report: 1

Author's Reply, 30 Jun 2023


Dear Editor and Reviewers,

We thank you for your input, questions and suggestions which, in our view, have increased the quality (and clarity) of our study. Please find our responses in the attached files.

Changes related to your questions and suggestions have been highlighted in the tracked document. Minor changes, such as corrections to typos, grammar or vocabulary, were not individually marked. A clean version of the manuscript can be found at https://osf.io/xs8pt

Luis

https://doi.org/10.24072/pci.rr.100439.ar1

Decision by Robert McIntosh, posted 06 Jun 2023, validated 06 Jun 2023

Thank you for submitting your Stage 1 RR to PCI-RR. Your preprint has now been evaluated by two reviewers with relevant expertise, and I have read it myself.

Both reviewers clearly feel that the plan has promise as a potentially useful contribution to the literature, and that the hypotheses of your study are well stated and clear, although there are some queries over the details of your methods. Reviewer#1 has a number of constructive suggestions to make, including the suggestion that gender be not only balanced but analysed as an additional variable of interest (or controlled as a covariate). I think that your response to this point should depend critically upon precisely what hypothesis you want to test and whether gender is a relevant consideration for that hypothesis.

Reviewer#2 has a number of stylistic comments to make, and I agree very much with this reviewer's impression that the multiple-framing of the Introduction (in terms of dream literature, and in terms of validation for new method) was quite unclear and potentially confusing. One might think that, if your purpose is to create a task more likely to be incorporated into dreaming then a critical part of its validation would include an assessment of the rates at which it is incorporated into dreaming; but the methods make it clear that this is not part of your purpose. Perhaps try to be more clear about your aims, and more linear in your introductory narrative to establish these aims. This reviewer's point 5 is also critically important from an RR point of view. Do we really believe that a meaningful 'equivalence' could be established by ruling out effects smaller than the very large target level? Would we really consider any differences between tasks that are below this level of effect size to be irrelevant? This seems somewhat unlikely. Perhaps rather than motivating your smallest effect size of interest from expectations based on prior literature, it would be more relevant to consider from first principles what size of difference you think would be of no practical consequence to know about if it exists. (Related to this is Reviewer#2's point 3, which asks why the tests are configured as tests of equivalence, when there would seem to be a priori reasons to expect that the iVR version might be superior.)

In any case, you should consider all of the reviewer comments carefully, and address all of them in any revised submission.

Yours sincerely,

Rob McIntosh (PCI-recommender)

In passing, I noted a few linguistic oddities in the Abstract, which you may wish to amend (there may be more similar oddities in the main paper):

"commonground" >> "commonplace"

"understudied" >> "under-studied" or "little studied"

"One of such mazes" >> "One such maze"

"stimulant and engaging simulation" >> "stimulating and engaging experience" ?

https://doi.org/10.24072/pci.rr.100439.d1

Reviewed by anonymous reviewer 1, 15 May 2023

The authors present a pre-registration report to examine the learning ability and the usability of the Virtual Maze Task used originally by Wamsley et al. (2010). They will compare the Desktop version of the task to a more immersive VR version (using a HMD). The authors present an important research question that is often overlooked when using virtual tasks: do people actually learn better with greater immersion?

The authors outline their research questions and hypothesis well, demonstrating two clear and concise hypotheses that can be easily tested following the reading of the methodology. The protocol is well described, though I have some minor comments about this (see below for section breakdown). The sample size calculation is efficient but again, there are some minor comments not about its calculation, but more so its demographic. I think the clear presentation of method and hypotheses would provide a reader with confidence that no additional analysis will be explored, as only what the authors propose (assessment of learning in VMT and questionnaire use) will actually answer their research question. Both are commonplace across the literature as a methodology for assessment.

Statistical analysis is well outlined and in my opinion, valid for the experiment proposed. It is straightforward and easily replicable from the description. Though perhaps a different approach (repeated-measures examining individual trials and how they vary) may be important here, and may actually reveal more about the task - as spatial "learning" may not actually occur until familiarity with the environment etc. increases. This is particularly important when there is no recall or retest (probe) trial being used. This is common in the human and animal literature. A single score of "improvement" may not be enough data to warrant a true behavioural measure of participants actually "learning" the space and goal location, particularly not after three trials. Though perhaps I have misunderstood the procedure.

Specific comments:

Introduction

Authors mention the lack of ecological validity of desktop tasks, but do not evaluate the greater ecological validity that may stem from iVR tasks.
There are a lot of iVR studies out there, particularly some that also use iVR and real locomotion (e.g. Delaux et al., 2021). I think some mention of these and the above point would help your research question.

Methods

Authors mention they will achieve an equal number of male and female participants. However, I think gender should be recorded and analysed as an additional variable or control variable/co-variate. This is a well known effect in the human literature, even in virtual mazes (Mueller et al., 2008; Woolley et al., 2010). It has also been shown that it is specifically spatial learning that gender can have an impact on and not memory (Piber et al., 2018). I think this is an important aspect to examine or control for in both hypotheses.
Is three trials really sufficient to demonstrate "learning" without a memory trial? Many more trials are always used in the literature. I can also see the view that learning is clearly occurring - but I think this needs to be better justified (again, unless I misunderstood).
The authors do a really good job at controlling for and assessing cybersickness.
For replication purposes, are both of the tasks used available Open Access? If not, the authors should recommend alternatives.
Groups are controlled for cybersickness and well-assessed for this DV. Why are groups not controlled or assessed for spatial learning ability (which is a cognitive skill we know differs amongst individuals: Coutrot et al., 2018)? Perhaps a spatial task (Perspective Taking Task for Adults (Frick et al., 2014)) or even just a cognitive assessment (Trail Making Test etc.) could be introduced. Just something that I think is important.
I would be wary about the upper age limit of your participants: "over 18" may include older adults who may not be as familiar with the task. Either including them and controlling across groups, or setting a limit would be advised, as age has an impact on usability (Commins et al., 2020) and also spatial learning performance (see Figure 2E in Coutrot et al., 2018 for global data).

Nevertheless, this is a good and clean design to assess a very important question using a repeatedly used but rarely validated virtual spatial task from the literature. It is also an important, unanswered question which could eventually facilitate a framework for other researchers testing the reliability of their virtual tasks. It may also, perhaps, save research teams time by letting them implement whichever version of the task fits their setup and budget, if the versions have no real impact on learning. I would recommend this go forward with perhaps some additional thought in the areas mentioned above.

https://doi.org/10.24072/pci.rr.100439.rev11

Reviewed by Gavin Buckingham, 30 May 2023

This interesting and timely article seeks to resolve the uncertainty in the literature about whether a VR version of a virtual maze task (VMT) compared favourably with a classic desktop version of the task. It is well-written throughout, and I appreciate the open materials narrative which is far too often overlooked in registered reports. As a disclaimer, I know little about the VMT literature, but have some expertise in VR and cognitive psychology in general. Some specific comments below.

1. The manuscript as it stands is written in a slightly awkward mixture of past and future tense – I presume this is to make the edits to the final version after data collection a bit easier, but it didn’t feel like there was much consistency in how these tenses were used throughout. I don’t have strong feelings about whether this requires changing, but the editor may.

2. I found the framing of the introduction confusing – the narrative around the VMT is compelling, but I found the link to dreams as the main motivator for using VR rather tenuous. Indeed, the framing of the manuscript sits awkwardly between a paper seeking to resolve a dispute in the literature and a validation of a new method. I’m not sure it succeeds with either narrative, but do not know the extant literature well enough to say which is a more appropriate goal of the paper.

3. RQ1 and H1 are obvious enough (although surprising to me that the authors would be predicting equivalent performance – I’d have assumed that iVR would outperform desktop due to immersion)

4. I was unable to open the build linked in the manuscript – this probably reflects my difficulties with Unity Hub rather than the application, but it would be worth uploading ‘fixed’ .exe builds for desktop and Quest that will be used in the manuscript to streamline this process for more casual users.

5. My biggest issue with this article is with the power calculation and associated sample size. The authors base the sample size calculation on the Wamsley papers, but the d=1.1 estimate is drawn from a correlational analysis that bears no resemblance to the methods of the current study. I understand the challenges of predicting an effect size, but would recommend the authors use an effect size derived from other studies comparing VR to desktop environments OR the smallest relevant example of how VMT performance can vary from one condition to another in a between-group design. As it stands, the current sample size seems far too low to provide anything like a resolution of the issues in the literature or a validation of VR in this context. The TOST would miss any difference that was just below a ‘classic’ large effect size of 0.8, which feels like very shaky grounds to declare equivalence. This is a pretty simple paradigm and I see no reason not to be more conservative in this regard.

https://doi.org/10.24072/pci.rr.100439.rev12
