Factors influencing word recognition for self- and other-produced speech in noise
Sensorimotor experience modulates perception: expert basketball players have been found to predict the outcome of shots more accurately than expert watchers (Aglioti, Cesari, Romani, & Urgesi, 2008). The common coding hypothesis (Prinz, 1990) takes such findings to indicate parity between the representations accessed for production and perception, and predicts that actions matching our own sensorimotor experience will be recognized more accurately. This prediction is supported by research demonstrating that people lip-read their own speech more accurately than that of others (Tye-Murray et al., 2013).
In this ongoing study, we investigate how phonetic, lexical, and indexical factors may modulate how talkers perceive their own speech and that of others. Groups of seven gender-matched participants produce Dutch sentences containing two semantically unrelated words (e.g., “de kwal is boven de deur”, “the jellyfish is above the door”). The sentences are prompted by serial presentation of the orthographic labels of two objects, followed by visual display of the referenced objects in a simple spatial configuration. Items consist of 112 words, sorted into four categories by crossing phonological neighborhood density (PND; high vs. low) with word frequency (high vs. low). Each item appears once in first position and once in second position in each of three sentence lists, totaling 336 sentences.
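The item and sentence counts above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are hypothetical, and equal-sized categories are an assumption, since the abstract does not state how many words fall in each category.

```python
# Design arithmetic implied by the description above (illustrative only).
N_WORDS = 112            # distinct items
N_LISTS = 3              # sentence lists
WORDS_PER_SENTENCE = 2   # each sentence names two objects
N_CATEGORIES = 2 * 2     # PND (high/low) crossed with frequency (high/low)

# Each word fills one first-position and one second-position slot per list,
# so a list has 2 * 112 slots, i.e. 112 two-word sentences.
slots_per_list = N_WORDS * WORDS_PER_SENTENCE
sentences_per_list = slots_per_list // WORDS_PER_SENTENCE
total_sentences = sentences_per_list * N_LISTS

# ASSUMPTION: the four categories are equal in size (not stated in the text).
words_per_category = N_WORDS // N_CATEGORIES

print(total_sentences, words_per_category)
```

Under the equal-size assumption, each PND-by-frequency cell would hold 28 words; the total of 336 sentences follows from the stated design alone.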
In sessions spaced approximately one to two weeks apart, the same participants attempt to recognize these sentences, evenly divided amongst the seven talkers, under three conditions: 1) 6-band noise-vocoded speech (NVS), 2) sentences embedded in speech-shaped noise at a signal-to-noise ratio of -7 dB (SPIN), and 3) sentences filtered to approximate the signal generated by a combination of air and bone conduction and then embedded in noise at -7 dB (FilSPIN). The order of the SPIN and FilSPIN sessions is counterbalanced across participants. For half of the participants, each sentence is preceded by a talker label, consisting of either a common Dutch name or “jij” (“you”). The remaining (No-Label) participants are not informed about the number of talkers or that some of the stimuli are based on their own recordings.
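As an illustration of what embedding a sentence in noise at -7 dB involves, the sketch below scales a noise signal so that the speech-to-noise RMS ratio equals a target value and then adds the two. It is not the procedure used in the study: the RMS-based definition of the ratio is an assumption, the function names are hypothetical, and white Gaussian noise and a sine tone stand in for speech-shaped noise and a recorded sentence.

```python
import math
import random

def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 20*log10(rms(speech)/rms(scaled noise)) == snr_db,
    then add it to the speech sample by sample.
    ASSUMPTION: the ratio is defined over whole-signal RMS levels."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain

# Toy stand-ins (hypothetical): 1 s of a 220 Hz tone and white noise at 16 kHz.
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

mixture, gain = mix_at_snr(speech, noise, -7.0)
achieved_snr_db = 20 * math.log10(rms(speech) / rms([gain * n for n in noise]))
```

At a negative ratio the noise is more intense than the speech; at -7 dB the scaled noise has roughly 2.24 times the RMS amplitude of the sentence.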
Preliminary results suggest that overall accuracy is greater for self-produced stimuli. This self-advantage is greater in the SPIN and FilSPIN sessions than in the NVS session. However, the proportion of accurate responses appears to be modulated by the presence or absence of a label, by whether the sentences were filtered, and by word frequency and PND. Taken together, these findings suggest that sensorimotor experience and lexical properties may interact to shape representations and that indexical cues guide access to different representations.