Can unconscious experience drive perceptual learning?
Can one-shot learning be elicited from unconscious information?
Recommendation: posted 15 October 2023, validated 15 October 2023
Sreekumar, V. (2023) Can unconscious experience drive perceptual learning? Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=350
Recommendation
Level of bias control achieved: Level 3. At least some of the data/evidence that will be used to answer the research question already exists AND is accessible in principle to the authors BUT the authors certify that they have not yet accessed any part of that data/evidence.
List of eligible PCI-RR-friendly journals:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #2
DOI or URL of the report: https://osf.io/df3bj?view_only=b05c622bfae04562af871bb8ec9a9e52
Version of the report: 1
Author's Reply, 20 Sep 2023
Decision by Vishnu Sreekumar, posted 19 Sep 2023, validated 19 Sep 2023
Dear authors,
Thank you for your patience. We were unable to obtain further input from the first reviewer, whose questions you have, in my opinion, responded to satisfactorily. We do have a brief comment from the other reviewer, which I have reproduced below under the dashed line. I think the reviewer's concern that PAS ratings may reflect some other cue in the images is valid and deserves attention. I also understand the reviewer's concern that the masking paradigm may not be strong enough. However, since some of the data have already been collected, I understand that it is not possible to modify the contrast of the mask. That said, I am persuaded by your response that, in your pilot data, most trials not rated PAS 1 were rated PAS 2. I would nevertheless ask you to thoroughly discuss the possibility raised by the reviewer in the eventual discussion section.
I have gone through the manuscript and read your detailed responses to both reviewers' questions in the previous round of reviews. This is a well-designed set of experiments that has the potential to resolve some important questions about the influence of unconscious stimuli on perception. Please respond to the final set of comments below before we make a final decision.
Thanks,
Vishnu
-------------
In the comment marked 2.7, the suggestion was to compare PAS ratings from pilot data to the RMS contrast of images. The authors instead compared RMS contrast and other properties across experimental conditions. The suggestion was primarily to test whether these qualities were being used as cues for PAS ratings, in which case the authors could correct the issue (ensure all images have the same RMS contrast) prior to data collection. It is a shame that the data have already been collected and so the authors cannot correct the methods; methodological improvement is normally a major benefit of the registered report format. I would suggest the authors do compare RMS contrast and PAS ratings in the experimental data, to answer the question "are participants making PAS ratings according to instructions (visibility of contents) or based on some cue such as RMS contrast?".
It is a shame the authors cannot implement the suggested changes to the methods. The pilot data suggests the masking paradigm is not strong enough to guarantee the participants had no conscious recognition of the images. I suggest the authors discuss this possibility in their eventual discussion, in relation to the frequency of actually giving a PAS of 1 in the condition that is supposed to be ‘unconscious’.
Reviewed by anonymous reviewer 1, 03 Aug 2023
The authors have satisfactorily responded to most of my previous comments.
In the comment marked 2.7, the suggestion was to compare PAS ratings from pilot data to the RMS contrast of images. The authors instead compared RMS contrast and other properties across experimental conditions. The suggestion was primarily to test whether these qualities were being used as cues for PAS ratings, in which case the authors could correct the issue (ensure all images have the same RMS contrast) prior to data collection. It is a shame that the data have already been collected and so the authors cannot correct the methods; methodological improvement is normally a major benefit of the registered report format. I would suggest the authors do compare RMS contrast and PAS ratings in the experimental data, to answer the question "are participants making PAS ratings according to instructions (visibility of contents) or based on some cue such as RMS contrast?".
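A minimal sketch of this check, in Python, might look like the following. The file path and column names ("image_file", "pas_rating") are hypothetical placeholders rather than the authors' actual data structure, and a Spearman correlation is used because PAS is an ordinal rating.

```python
# Minimal sketch of the suggested check: does RMS contrast predict PAS ratings?
# The trials.csv path and the column names are hypothetical placeholders.
import numpy as np
import pandas as pd
from PIL import Image
from scipy.stats import spearmanr

def rms_contrast(path):
    """RMS contrast: standard deviation of normalised greyscale pixel intensities."""
    img = np.asarray(Image.open(path).convert("L"), dtype=float) / 255.0
    return img.std()

trials = pd.read_csv("trials.csv")                      # one row per trial
trials["rms"] = trials["image_file"].map(rms_contrast)  # contrast of the masked greyscale image

# Spearman correlation, since PAS is an ordinal (1-4) rating
rho, p = spearmanr(trials["rms"], trials["pas_rating"])
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")

# A per-image summary can also show whether higher-contrast images attract higher ratings
print(trials.groupby("image_file")[["rms", "pas_rating"]].mean().sort_values("rms"))
```

If PAS ratings track RMS contrast, that would suggest participants are at least partly rating a low-level cue rather than the visibility of the image content.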
It is a shame the authors cannot implement the suggested changes to the methods. The pilot data suggests the masking paradigm is not strong enough to guarantee the participants had no conscious recognition of the images. I suggest the authors discuss this possibility in their eventual discussion, in relation to the frequency of actually giving a PAS of 1 in the condition that is supposed to be ‘unconscious’.
Evaluation round #1
DOI or URL of the report: https://osf.io/89ubd?view_only=b05c622bfae04562af871bb8ec9a9e52
Version of the report: 1
Author's Reply, 14 Jul 2023
Decision by Vishnu Sreekumar, posted 08 Feb 2023, validated 08 Feb 2023
As you may know, it has been hard to find reviewers recently, but we now have two high-quality reviews for this submission. Both reviewers agree that the proposed study is well designed, timely, and scientifically valid. However, they have identified some concerns about the sampling plan, some experimental parameters, and other issues that need to be addressed. I request that the authors address these concerns and provide a point-by-point response with their resubmission, which will be sent back to the same reviewers.
Reviewed by Jeffrey Saunders, 05 Dec 2022
This is a well-thought-out study that will contribute to the literature. It is closely based on a previous study, so the general idea is not novel. However, it will provide a better-controlled test of the previously reported effect.
The authors provide a good analysis of previous studies and present a compelling case for revisiting the findings of Chang et al. (2016). They discuss theoretical and empirical reasons to doubt that masked information could allow future disambiguation of two-tone images, and identify limitations of the methods of Chang et al. (2016). The background and motivation for the study are clearly presented, with good arguments.
The authors have also given careful thought to the methodology. The primary challenge is ensuring that "unconscious" stimuli are truly unconscious. I think the authors do a good job meeting this challenge. Trials will be classified based on multiple measures in a graded manner. The criterion for "fully unconscious" is more conservative than in the previous study, so we can be more confident that any post-exposure effects will not be due to some conscious awareness. I think this is the main feature of the new study. They have also given good consideration to details like attention checks, exclusion criteria, and statistical power. The fact that it is a pre-registered RR is a positive feature in itself.
I question whether Experiment 2 is needed. Experiment 1 already implements blind ratings, and specifies a coding plan. Any errors or biases in rating responses would just add noise or shift the baselines. As the authors point out, adopting the forced choice method also has drawbacks, which might end up increasing the variability. More data is always nice, so if the authors want to repeat with this variation in method, that is fine. It seems like a lot of extra data collection to address an issue that is unlikely to affect the results, and might introduce some new problems.
If Experiment 2 is going to be included, the authors should say more about what they would conclude if the results from the two experiments are not entirely consistent. What if Experiment 1 finds strong evidence for unconscious priming but Experiment 2 finds only a weak trend? Would they conclude that there was experimenter bias in Experiment 1, and that the effect may not be reliable? Or conclude that the data in Experiment 2 was noisier due to methodological issues, so it should be discounted?
I think that the third experiment makes more sense as a follow-up because it addresses a potential alternate explanation that is more likely and problematic, and which might be ruled out by the Experiment 1 results. If the main trials and catch trials don't show a difference, then the follow-up experiment will be important, but if there is a clear difference between main trials and catch trials, then it isn't needed. This is more important than the issue of subjective ratings. In fact, I suggest reversing the order. If the evidence suggests that effects in Experiment 1 are due to spontaneous disambiguation, then it would be better to know this before conducting the proposed Experiment 2, which would otherwise have the same confound.
For the third experiment (the results contingent follow-up), the authors should say something about the conclusions that would be drawn from different possible outcomes. What if Experiment 1 appears to show disambiguation from unconscious stimuli, but the follow-up study does not?
To evaluate the planned analysis and presentation of the results, I would like to see some sort of draft of the results section. The authors could use simulated data or placeholders for statistical results. The authors describe the planned analyses in the study design table, but there are a lot of hypotheses and analyses, and it is a bit hard to follow. Presenting the planned analyses in the format of a results section will make it easier to check that the analyses make sense and nothing is missing, and also provides an opportunity for reviewers to give feedback about the presentation.
I am not a fan of the "study design table" required by PCI-RR. Answering all the questions in a single row for each hypothesis requires a table that spans multiple pages, with narrow text blocks. The sampling plan is generally the same for all hypotheses, so that column has redundant information. The space limitation encourages enumeration of hypotheses, so a reader has to keep track of many non-descriptive labels (H1a, H1b, …). Given the limitations of the format, I think the authors did a reasonable job conveying the information. I hope that PCI-RR changes this requirement, or allows some flexibility in how the information is organized. In the meantime, it would be helpful to see the analysis plan presented as a results section.
Using the Bayesian sequential sampling procedure is a good idea, and the proposed stopping criteria should provide good power for a range of possible effects. I have some suggestions.
For the computation of Bayes factors, the authors propose using a Cauchy prior with scale parameter r = 1/sqrt(2). Schönbrodt & Wagenmakers (2018), following Rouder et al. (2009), recommend a scale parameter of r = 1. They note that smaller scale parameters take longer to reach the H0 criterion in the null case. Their simulations with a stopping criterion of BF > 6 also found that the Type I error rate is slightly inflated with r = 1/sqrt(2), but not with r = 1. I suggest that the authors follow Schönbrodt & Wagenmakers (2018) and use r = 1.
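To make the dependence on the scale parameter concrete, here is a small numerical sketch using the one-sample JZS Bayes factor of Rouder et al. (2009), implemented directly via numerical integration. The t values and sample size are arbitrary illustrations, not estimates from the proposed study.

```python
# Illustrative only: how the Cauchy prior scale r affects the JZS Bayes factor
# (one-sample/paired t-test formulation of Rouder et al., 2009). The t and N
# values below are arbitrary placeholders, not taken from the study.
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n, r=1.0):
    """One-sample JZS Bayes factor BF10 with a Cauchy(0, r) prior on effect size."""
    v = n - 1  # degrees of freedom
    def integrand(g):
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
                * r / np.sqrt(2 * np.pi) * g ** -1.5 * np.exp(-r**2 / (2 * g)))
    numerator, _ = quad(integrand, 0, np.inf)
    denominator = (1 + t**2 / v) ** (-(v + 1) / 2)
    return numerator / denominator

for r in (1 / np.sqrt(2), 1.0):
    print(f"r = {r:.3f}: BF10 at t = 2.5 is {jzs_bf10(2.5, 60, r):.2f}; "
          f"BF10 at t = 0.2 is {jzs_bf10(0.2, 60, r):.3f}")
```

For t values near zero, the wider prior (r = 1) yields a smaller BF10, i.e. stronger evidence for the null, which is why it reaches the H0 boundary sooner than r = 1/sqrt(2).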
I also think that the authors should provide a justification for the choice of boundary criteria based on expected effect size, and describe the power for one or more possible effect sizes. The methods section includes a statement about the boundary criteria: "A BF of 6 (or 1/6), taken to indicate moderate evidence (Lee & Wagenmakers, 2014, as cited in Quintana & Williams, 2018), was chosen as an estimated equivalent for a medium effect size." That helps connect the BF criterion to effect size, but does not say anything about why a medium effect size is targeted. Later, the authors report estimated effect sizes from the previous study, but these are not connected to the choice of stopping criteria.
The boundary criteria and prior determine the range of effect sizes that could be reliably detected, so a given BF criterion implies a target effect size. For example, the simulations of Schönbrodt & Wagenmakers (2018) found that a criterion of BF > 6 with r = 1 would have 86% power for d = 0.4 in a between-subjects design, so this criterion would correspond to targeting an effect size of d ≥ 0.4. In the present study, using BF > 6 will allow detection of smaller effects because it is a within-subjects design. Reporting the minimum effect size that could be reliably detected will make it easy for the reader to see that the study is well powered (even if they are not familiar with BFs).
The lower bound on sample size, N = 60, seems higher than necessary. Sequential procedures are more efficient because they can stop early if the data show clear evidence one way or the other. This efficiency is lost if the lower bound is higher than needed. An effect size of d = 0.5 only needs N = 44 for 90% power. In the case of no effect, N = 30 would be enough for reasonably sized confidence intervals around zero in the not-recognized condition (SE = 5.3%/sqrt(30) ≈ 0.97%). I suggest that the authors use a smaller lower bound, N = 30-40, so they can take advantage of the efficiency of sequential testing. The sample size will still go past N = 60 if the data are ambiguous, but not if the true effect turns out to be large or zero. If the authors want to ensure power for smaller effects, the BF criterion for stopping could be slightly increased, which would be more efficient than using a large minimum sample size.
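The efficiency argument can be checked with a rough simulation of a sequential design. Everything below (the batch size, maximum N, effect sizes, and number of replications) is an illustrative placeholder rather than the study's actual sampling plan, and the Bayes factor is the same JZS formulation as above.

```python
# Rough simulation of the sequential-sampling point: with a large true effect or a
# true null, a minimum N of 30 stops much earlier on average than a minimum N of 60.
# Effect sizes, thresholds, batch size, and the maximum N are illustrative placeholders.
import numpy as np
from scipy.integrate import quad
from scipy.stats import ttest_1samp

def jzs_bf10(t, n, r=1.0):
    """One-sample JZS Bayes factor BF10 (Rouder et al., 2009) with a Cauchy(0, r) prior."""
    v = n - 1
    def f(g):
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
                * r / np.sqrt(2 * np.pi) * g ** -1.5 * np.exp(-r**2 / (2 * g)))
    return quad(f, 0, np.inf)[0] / (1 + t**2 / v) ** (-(v + 1) / 2)

def run_once(d, n_min, n_max=120, step=10, bound=6.0, rng=None):
    """Add participants in batches until BF10 > bound, BF10 < 1/bound, or n_max is reached."""
    if rng is None:
        rng = np.random.default_rng()
    data = rng.normal(d, 1.0, size=n_min)          # simulated within-subject difference scores
    while True:
        bf = jzs_bf10(ttest_1samp(data, 0.0).statistic, len(data))
        if bf > bound or bf < 1 / bound or len(data) >= n_max:
            return len(data)
        data = np.concatenate([data, rng.normal(d, 1.0, size=step)])

rng = np.random.default_rng(1)
for d in (0.0, 0.8):                               # true null vs. a large true effect
    for n_min in (30, 60):
        final_ns = [run_once(d, n_min, rng=rng) for _ in range(200)]
        print(f"d = {d:.1f}, minimum N = {n_min}: mean final N = {np.mean(final_ns):.1f}")
```

With a clear true effect or a true null, the lower minimum N stops earlier on average; ambiguous data would still push sampling past N = 60, as noted above.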
Minor points
I am not sure that the abbreviated labels "C1", "C2", etc. are needed. Descriptive labels could be used ("Fully Unconscious", "Mostly Unconscious", etc.) without adding too much clutter to the text. Alternatively, "U" and "C" could be used in the abbreviations to make it easy to remember which conditions are unconscious vs conscious, e.g. "U", "MU", "MC", "C", or "U1", "U2", "C2", "C1".
This topic sentence in the introduction is awkward: "Another relevant literature is the one referring to longer-term learning effects, and pertains to the increase in accuracy following repeated exposure to some stimuli over time." I suggest re-wording in a simpler manner, and maybe breaking off the second part to a new sentence.
Another line that could be simplified: "In a conceptually similar context to that adopted by Chang and colleagues (2016), we aim to study whether the visual system can organise two-tone images into meaningful percepts after masked greyscale image exposure." Maybe something like this: "Using a similar method as Chang and colleagues (2016), we tested whether the visual system can organise two-tone images into meaningful percepts after masked greyscale image exposure."
The use of catch trials is listed as a difference in method, but Chang et al. (2016) also had catch trials. Are the catch trials in the proposed study different?
Reviewed by anonymous reviewer 1, 01 Feb 2023
1A. The scientific validity of the research question(s).
The authors present a sound argument for the validity of their research question. A previous publication (Chang et al., 2016) argued that grey-scale stimuli rendered unconscious by backward masking at an SOA of 67 ms could improve disambiguation of their Mooney (thresholded) counterparts. The grey-scale stimuli were categorised as ‘unconscious’ if the participant reported that they could not recognise the image. The present authors question whether the grey-scale stimuli were unconscious, as the previous authors did not control for response bias (where a participant may be unwilling to report that they could recognise an image though they consciously perceived it).
I would offer that, in addition to the arguments already made, the authors may wish to note that 67 ms is quite a long time for backward masking of visual stimuli. Bacon-Macé and colleagues (2005) show 85% correct performance in discriminating whether a natural scene contains an animal with an SOA of 44 ms and a much stronger mask (and accuracy was above chance at 12 ms SOAs). The weaker masks used by Chang and colleagues, and the proposed mask in this study, can be compared to RSVP studies, where performance above 75% correct can be achieved at a stimulus presentation duration of 13 ms (even when the categorisation decision is only indicated after stimulus presentation; Potter et al., 2014). Based on this, it is very unlikely that the manipulation in Chang and colleagues' (2016) experiment resulted in 'unconscious' stimuli, given this previous research showing that participants can make quite accurate decisions about the contents of images presented at much shorter durations with stronger masking.
1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.
The authors’ hypotheses are multifaceted. My interpretation is that the main aim is to test whether unconscious grey-scale stimuli can improve identification of their Mooney counterparts when using a combination of objective and subjective measures. The secondary hypotheses include comparing different criteria for consciousness, and the effect this has on whether Mooney identification can be considered significantly improved following exposure. Overall, these hypotheses are logical, rational, and plausible, and the results will be interesting whether the null is accepted or rejected.
It was often not clear what the authors meant by ‘unconscious’. Most often, they do not qualify, and readers might presume they mean ‘undetectable’. Frequently, they use the term ‘conscious recognition’, yet presumably they aim to present participants with images they have never seen before, and so even if the participant were fully conscious of the image, they would not recognise it on the first presentation. Sometimes they describe the phenomenon in terms of ‘contents’ (as they use in the description of the PAS ratings to observers). They should be careful here too, about the interpretation of what counts as ‘content’: would it be sufficient to have information about approximate figure/ground segmentation, or a general theme such as ‘animal’, or perhaps if the observer could tell whether the image was presented upright or inverted? It is also unclear if the authors presume an image could be detected (the observers are conscious of the presence of the image) while the ‘content’ is ‘unconscious’ – this might be important for some readers to understand whether the authors’ criterion for consciousness matches their own.
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).
Given the previous literature and the pilot data, it is not clear that the proposed backward masking paradigm is strong enough: participants report seeing a brief glimpse of the content of the image on about 50% of trials at the short SOA. The authors could increase the contrast of the mask to get stronger backward masking (Bachmann & Francis, 2014).
On that note, the example mask in Figure 4 looks as if it has broad vertical columns; it does not look like a phase-scrambled version of the target stimulus as described.
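For reference, a phase-scrambled mask with an explicitly chosen RMS contrast can be generated along the following lines. The file name and the target contrast value are placeholders, and taking the real part of the inverse FFT is a common shortcut rather than the only way to scramble phase.

```python
# Sketch of generating a phase-scrambled mask from a target image and setting its
# RMS contrast explicitly (for stronger backward masking). The file name and the
# target contrast value are placeholders, not the study's actual parameters.
import numpy as np
from PIL import Image

def phase_scrambled_mask(path, target_rms=0.35, seed=0):
    img = np.asarray(Image.open(path).convert("L"), dtype=float) / 255.0
    # Randomise the phase spectrum while keeping the amplitude spectrum
    rng = np.random.default_rng(seed)
    amplitude = np.abs(np.fft.fft2(img))
    random_phase = np.exp(1j * rng.uniform(-np.pi, np.pi, img.shape))
    scrambled = np.real(np.fft.ifft2(amplitude * random_phase))
    # Rescale to the requested RMS contrast around a mid-grey mean, then clip to [0, 1]
    scrambled = (scrambled - scrambled.mean()) / scrambled.std() * target_rms + 0.5
    return np.clip(scrambled, 0.0, 1.0)

mask = phase_scrambled_mask("greyscale_target.png", target_rms=0.35)
Image.fromarray((mask * 255).astype(np.uint8)).save("mask.png")
```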
There are not enough trials to perform the analysis. The authors seek to test for an increase in disambiguation at C1 (PAS 1 + incorrect + short SOA) against catch trials. With 24 stimuli, half catch, half long SOA, and half rated PAS 2 at the short SOA, that leaves 3 trials at C1 per participant. Based on Chang et al. (2016), they expect disambiguation to rise from ~2.5% to ~5%. Chang and colleagues had ~15 trials per participant; with 5% disambiguation, this means that ~20/25 participants were able to disambiguate 1 trial each (20 trials out of the total of 375). Even with 120 participants (the maximum), the authors will have a total of 360 trials, of which ~18 would be correct, meaning 18/120 participants get 1 trial correct each. The number of trials should be at least doubled. The authors could fit double the number of trials in a similar amount of time by reducing the mask duration (500 ms should be more than sufficient) and the fixation duration (or some of the beginning of the trial: Figure 4 suggests there is 0.5 s blank, 1 s fixation, 0.1 s blank, 0.2 s fixation; 0.2 s blank followed by 0.5 s fixation should suffice). Increasing the masking effect (by increasing the contrast of the mask) should also help to get more trials at PAS 1 at the short SOA.
The pilot data suggest that participants are remarkably good at the task in the pre-exposure phase, with ~15% correct free-naming identification (Figure 6). This is much better than reported in Chang et al. (2016). It could also be problematic that there is quite a substantial increase in catch-trial performance, considering the small number of trials. All this will make it even more difficult to get good measures of performance with only 3 trials per participant. The authors could get a better estimate of the likely statistics by dividing the pilot participants' data as if they were different participants (the pilot has 19 participants with 12 trials each = 228 trials in total, which can be divided into 76 pseudo-participants with 3 trials each, to estimate whether the effects could be reliably detected with so few trials).
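A rough sketch of that resampling check, assuming a long-format pilot data file; the file name and the column names ("participant", "trial", "correct") are hypothetical.

```python
# Rough sketch of the suggested check: split the pilot data (19 participants x 12
# trials) into pseudo-participants with 3 trials each and inspect how stable the
# per-participant accuracy estimates are. File and column names are hypothetical.
import numpy as np
import pandas as pd

pilot = pd.read_csv("pilot_trials.csv")          # one row per trial, long format
pilot = pilot.sort_values(["participant", "trial"]).reset_index(drop=True)

trials_per_pseudo = 3
pilot["pseudo_id"] = np.arange(len(pilot)) // trials_per_pseudo   # 228 trials -> 76 pseudo-participants

# Per-pseudo-participant accuracy ("correct" coded 0/1), e.g. post-exposure naming
acc = pilot.groupby("pseudo_id")["correct"].mean()
print(f"{len(acc)} pseudo-participants, "
      f"mean accuracy = {acc.mean():.3f}, SD = {acc.std():.3f}, "
      f"proportion scoring 0 = {(acc == 0).mean():.3f}")
```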
It is worrying that so many of the long-SOA trials were rated as PAS 2 in the pilot data. This could indicate that participants do have some bias, or that they are relying on some other cues to make their ratings. The example grey-scale images look as though they have different RMS contrast and spatial frequency properties. The catch-trial peacock appears much more difficult to see (in the pdf, at the long duration) than the 'main'-trial peacock. I wonder if some participants might be using these cues to separate out their PAS ratings across clearly visible stimuli. The authors could check this in their pilot data; they should also report on the variability of low-level stimulus properties, or even control these properties.
The effect size calculation compares accuracy to 0, and so does not match the main hypothesis, which compares 'unconscious' exposure to catch trials (~5% vs ~2.5% correct). However, this estimate is not used for anything, so it could be removed.
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
The methodology is highly detailed; however, there are some typographical errors and ambiguities. The Experiment 1 stimuli description lists 23 images + 9 attention checks; the table in Appendix 2 lists 24 images; the detail about attention checks lists 9 visible + 6 absent; and point 2 of the participant rejection criteria suggests 6 visible.
The authors may wish to specify that their hypothesis (and analysis) is one-sided (they expect increased performance after exposure, but presumably a decrease in performance would be evidence for the null).
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
Yes, the catch trials should be a sufficient control.
Minor points:
There are some typographical errors, e.g. page 5, first line: "Chang et colleagues"; last line: "undistinguishable"; and in the abstract, "for which likelihood of" should read "which the likelihood"…
Figure 6 does not have panel labels.
Perhaps there is a better name for ‘normal trials’/’main trials’ – for example ‘relevant exposure trials’ or ‘test trials’.