DOI or URL of the report: https://osf.io/pgv39?view_only=2627595019be4f92abf0bf338c83ee68
Version of the report: 2
Dear Dr. Foucher,
both reviewers are now mostly satisfied with your revision, and only minor updates have been requested. Please address these remaining issues in a revised version of your manuscript.
Warmest,
Rima-Maria Rahal
I would like to express my appreciation to the authors, as I feel that all the points I have raised have been adequately addressed (especially sample size, description of analysis plans, and description of open science practices). The only point the authors should perhaps look at again is in the Study Design Table: here, the interpretation given different outcomes is described as "Determine if neon fixation and saccade detection can be comparable to Eyelink 1000." Since frequentist methods are used, it is not possible to test for similarity, only for difference. I would recommend clarifying this in the text. Otherwise, I find the manuscript to be clear, well structured, and now sufficiently transparent. I am looking forward to the results of this study and recommend awarding an IPA.
Best,
Lisa Spitzer
Thank you for the updated manuscript. I only have a few very minor things below, and I trust the authors & editor to address them without me needing to see the manuscript again.
- L1 Abstract: "testtestest" - test was successful ;)
- Figure 1: I still find it unintuitive that the icons are different from the small-grid ones. Also, in our study we did not use a fixation cross, but the (supposedly) microsaccade-reducing fixation symbol.
- For the PL manual gaze correction, I think you should specify in the paper how it works (the Pupil Labs docs will be vastly different in 5 years' time). Is there no other way than using your finger on a mobile phone to adjust the circle, e.g. a QR marker or something that Pupil Labs can easily detect? (I assume the answer is no - but that is a bit ridiculous of Pupil Labs ;))
- I would recommend putting the filter settings for REMoDNaV in the paper - or at least mentioning how you will determine good filter settings (see the first sketch after this list).
- Does PL return regularly sampled data? If not, will you resample the data onto a regular grid (see the second sketch after this list)?
- L483 - there is a / missing between the nested subject / block terms in the LMM formula: "| subject block)" should read "| subject/block)", so that block is nested within subject (in lme4 syntax, subject/block expands to subject + subject:block).
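On the REMoDNaV point: purely for illustration, here is a minimal sketch of how explicit filter settings could be reported. It assumes the REMoDNaV Python API; the parameter names follow the REMoDNaV documentation at the time of writing and should be checked against the installed version, and all numeric values (px2deg, sampling rate, window lengths) are placeholders, not recommendations.

    # Hypothetical REMoDNaV call with the filter settings spelled out explicitly.
    # All values are placeholders; the gaze data here is a synthetic stand-in.
    import numpy as np
    from remodnav.clf import EyegazeClassifier

    # Record array with 'x' and 'y' fields, as in the REMoDNaV examples (assumed format).
    n = 2000
    gaze = np.rec.fromarrays([np.full(n, 960.0), np.full(n, 540.0)], names=["x", "y"])

    clf = EyegazeClassifier(
        px2deg=0.0185,                  # placeholder: degrees per pixel of the setup
        sampling_rate=200.0,            # placeholder: Hz of the gaze stream
        noise_factor=5.0,               # scales the adaptive velocity threshold
        velthresh_startvelocity=300.0,  # deg/s, start of the threshold iteration
        min_fixation_duration=0.04,     # s
        min_saccade_duration=0.01,      # s
    )
    # Preprocessing holds the actual filtering: Savitzky-Golay smoothing,
    # median filtering, and padding around missing samples.
    pp = clf.preproc(
        gaze,
        savgol_length=0.019,            # s, Savitzky-Golay smoothing window
        median_filter_length=0.05,      # s, spike/noise filter
        dilate_nan=0.01,                # s, padding around NaN stretches
        max_vel=1000.0,                 # deg/s, physiological velocity cap
    )
    events = clf(pp)                    # classified fixations, saccades, PSOs, pursuit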
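On the resampling question: a minimal sketch of resampling irregularly timestamped gaze onto a regular time grid by linear interpolation. The function name and the 200 Hz target rate are assumptions for illustration only, not the authors' plan.

    # Resample irregularly timestamped gaze samples onto a regular time grid.
    import numpy as np

    def resample_gaze(t, x, y, target_hz=200.0):
        """Linearly interpolate x/y gaze traces onto a regular grid.

        t : sample timestamps in seconds (possibly irregular)
        x, y : gaze coordinates at those timestamps
        """
        t_reg = np.arange(t[0], t[-1], 1.0 / target_hz)
        return t_reg, np.interp(t_reg, t, x), np.interp(t_reg, t, y)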
DOI or URL of the report: https://osf.io/aqvhb?view_only=2627595019be4f92abf0bf338c83ee68
Version of the report: 1
Dear Dr. Foucher,
thank you for your submission "Independent Comparative Evaluation of the Pupil Neon - A New Mobile Eye-tracker" to PCI RR, for which I have now received two independent reviews by experts in the field. Based on these reviews and my own reading of your manuscript, I would like to invite you to revise the proposal. There is much to like about the manuscript, but I will highlight the most salient opportunities for further improvement below:
These issues fall within the normal scope of a Stage 1 evaluation and, in addition to responding to the reviewers' thoughtful further comments, can be addressed in a round of revisions.
Warmest,
Rima-Maria Rahal
I want to congratulate the authors for this interesting Stage 1 RR, which I enjoyed reading and reviewing very much.
Summary: The authors aim to provide detailed benchmark information for the new mobile eye-tracker Pupil Neon, using the EyeLink 1000 Plus as a reference. To this end, they plan to use the extensive test battery provided by Ehinger et al. (2019), taking into account not only accuracy and precision, but a broad range of different eye-tracking parameters. Participants will complete multiple blocks of this test battery while their eye movements are measured simultaneously with both eye-trackers.
I have summarized my comments on the study below.
Major points
Minor points
Overall, I think the study addresses a very important topic, i.e., the comparability and reproducibility of eye-tracking research. It is well thought out and shows qualitative rigor, with the authors relying heavily on the study by Ehinger et al. (2019). In my opinion, the main points that should be addressed in a revision concern the sample size and the description of the analyses, which should be more detailed. I hope that my comments will help the authors improve their manuscript.
All the best,
Lisa Spitzer
In this registered report, Foucher et al. will investigate the performance of the Pupil Neon eye tracking glasses against the current "gold standard", Eyelink 1000. For this, they closely follow our previously published EyeTracking benchmark.
The paper, and the choice of eye-tracker, is well motivated and a very valuable thing for the community to investigate. The paper is furthermore well written and well reasoned, and I have only some smaller comments.
I'm now very excited to see the outcome of this comparison.
Best, Benedikt Ehinger
We are currently re-using our benchmark and are in the analysis phase of comparing the ViewPixx Trackpix3 against an EyeLink. Because of this, we updated and upgraded our analysis pipeline to new Python versions and packages (we also made the stimulation code compatible with Octave; if this is of interest to the authors, they can contact me for the code - I don't think it is in the public repo). We identified one major breaking change (besides the typical renaming and adding of documentation to the analysis functions):
The Engbert-Mergenthaler implementation, which we took from the Donner lab, has a bug in the code that slightly miscalculates the velocity threshold. For this reason, and some other more conceptual ones, we switched in our new pipeline to REMoDNaV, which is a successor in spirit to the Engbert-Mergenthaler algorithm. An argument could be made that yet another class of event classifier should be used (Drew & Dierkes 2024), but I think it would be ill-placed given the lab and head-fixed setup.
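For reference, a minimal sketch of the standard median-estimator-based velocity threshold used in the Engbert & Kliegl / Engbert & Mergenthaler family of algorithms. The review does not specify the exact nature of the bug, so this is only the textbook computation, not the Donner-lab code.

    # Robust velocity threshold for microsaccade/saccade detection
    # (Engbert & Kliegl, 2003): sigma is a median-based estimate of the
    # velocity standard deviation; the threshold is lambda * sigma per axis.
    import numpy as np

    def velocity_threshold(vx, vy, lam=6.0):
        """Return (threshold_x, threshold_y); lam is typically 5-6."""
        sigma_x = np.sqrt(np.median(vx**2) - np.median(vx)**2)
        sigma_y = np.sqrt(np.median(vy**2) - np.median(vy)**2)
        return lam * sigma_x, lam * sigma_y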
The improved analysis code can be found here: https://github.com/behinger/etcomp/tree/etcomp2/ . Note that we are still analyzing the data, and not all tasks have been fully analysed and ported to the new Python versions. There are some drawbacks: we removed quite a bit of Pupil Labs-specific code, due to some internal miscommunication with the lead author of that study. The authors should contact us in case they want to switch to the new pipeline, so we can provide the requirements/yml files we are using - we are just not there yet to have a proper project :)
Further, we introduced a reading task, given a collaboration with psycholinguists on this project - you can decide whether this is relevant/interesting for your study or not. We included it mostly because it allows some ecological-validity tests for reading studies - but whether it tells you more than the large-grid task, I cannot say.
Minor comments
L176: You write that there are calibration accuracy limits for the Pupil Labs Neon, but you nowhere describe a method to identify them (see question below).
Figure 1: Nice improvements! I only found the large-grid illustration confusing. Why does it not look the same as the small-grid ones? It seems one point is dropping off the screen.
L267: Calibration/validation of the Pupil Labs Neon. Are there no settings whatsoever that you decide per subject that could influence accuracy? And is there no recommended validation procedure?
L366: As stated above, I would probably move to REMoDNaV due to the bug in the Engbert-Mergenthaler algorithm (or fix the bug).
L414: You argue for converting the Pupil Labs pupil measurement to area, similar to Eyelink. But I would argue that the Pupil Labs 'mm' output is actually the more interesting and relevant output. So maybe calibrating the Eyelink pupil signal should be the goal, rather than "deconverting" the Pupil Labs output back to ellipses/areas?
L420: There is a mistake in the formula in our paper (2*atan2 should be atan2) - we have had a correction request pending since January. Sorry for the inconvenience. The code is correct, though.
Open question for the analysis plan: Do you take the winsorized average of the accuracy values per block, then the winsorized mean over blocks, then the (bootstrapped) winsorized mean over subjects? Afaik this is how we did it. You could also disregard blocks and immediately aggregate over subjects. Maybe I missed it in the manuscript.
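For illustration, a minimal sketch of the first aggregation scheme described above (per-block winsorized mean, then winsorized mean over blocks, then a bootstrapped winsorized mean over subjects). The 20% winsorization limits, the function names, and the data layout are assumptions for the example, not the authors' registered choices.

    # Hierarchical winsorized aggregation of accuracy values.
    # `accuracy` is assumed to map subject -> list of blocks, each block being
    # a 1-D array of per-target accuracy values; 20% limits are placeholders.
    import numpy as np
    from scipy.stats.mstats import winsorize

    def wmean(x, limits=(0.2, 0.2)):
        """Winsorized mean of a 1-D array."""
        return float(np.mean(winsorize(np.asarray(x, dtype=float), limits=limits)))

    def subject_means(accuracy):
        # winsorized mean per block, then winsorized mean over blocks, per subject
        return np.array([wmean([wmean(block) for block in blocks])
                         for blocks in accuracy.values()])

    def bootstrap_group_mean(accuracy, n_boot=10000, seed=1):
        # bootstrapped winsorized mean over subjects, with a 95% percentile CI
        rng = np.random.default_rng(seed)
        subj = subject_means(accuracy)
        boots = [wmean(rng.choice(subj, size=len(subj), replace=True))
                 for _ in range(n_boot)]
        return wmean(subj), np.percentile(boots, [2.5, 97.5])

The alternative mentioned above (disregarding blocks) would simply pool all of a subject's values into one winsorized mean before the subject-level bootstrap.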