Effect of quality loss on the perception of emotion

An article, posted more than 11 years ago.

The central question in this document is whether sound quality degradation affects the perception of emotion. Based on this short review of the literature, I believe it is safe to assume that as long as the content of speech is intelligible, emotions can be heard.

Note: This document is one of the 'darlings' that I had to kill to keep my thesis focussed, so I have invested little effort in polishing it for reading. (It was written as an early summary of the literature review that was part of my graduation project. As a result of this investigation I decided not to pursue this matter any further.)

"Sound quality" has been analyzed systematically in emotion perception research in the form of inference studies (Scherer, 2003), which investigate which properties of a sound are most important for the communication of emotions. One methodology for analyzing the respective contribution of each parameter is cue masking (Scherer, 2003), which is in fact a controlled change in sound quality. By comparing, in a controlled experiment, how well emotion is recognized in a degraded version of an emotional utterance with how well it is recognized in the original sound, information can be obtained on the relative contribution of the masked property.

Frick (1985) provides a literature review of studies that systematically degraded sound quality. Some of the experiments cited filtered out everything but the pitch, others everything but the loudness (or everything but loudness and pitch), others played the sound back in reverse, etc. (note that in some studies the properties were, and had to be, (re)synthesised instead of filtered). Based on his review, Frick concluded that "no matter how an utterance is degraded, the loss of prosodic features seems to impair recognition of emotion, and [ed. but] the remaining prosodic features still allow better than chance recognition." (p. 415). There seems to be much redundancy in the communication of emotion; a wide array of sound properties is used. Although narrowing this array impairs emotion recognition, only the removal of almost all sound properties, such as the 130 Hz low-pass filtering used by Scherer, Ladd and Silverman (1984), makes above-chance emotion recognition impossible. In such cases, however, the intelligibility of speech is severely impaired as well.
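To make cue masking a little more concrete, here is a minimal sketch of two of the degradations mentioned above: low-pass filtering at 130 Hz and playing the utterance in reverse. It assumes a mono recording stored as a NumPy float array. The function names, the windowed-sinc FIR design, the Hamming window and the 2047-tap length are my own illustrative choices, not the filters or synthesis methods used in the cited studies.

```python
import numpy as np


def lowpass_fir(signal: np.ndarray, sample_rate: float, cutoff_hz: float = 130.0,
                num_taps: int = 2047) -> np.ndarray:
    """Mask cues above cutoff_hz with a windowed-sinc FIR low-pass filter (illustrative)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a centre tap")
    # Normalised cutoff in cycles per sample (0.5 would be the Nyquist frequency).
    fc = cutoff_hz / sample_rate
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # Ideal low-pass impulse response (np.sinc(x) = sin(pi*x) / (pi*x)),
    # tapered with a Hamming window to limit ripple in the stopband.
    taps = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at 0 Hz
    # An odd-length symmetric filter with mode="same" keeps the output
    # time-aligned with the input (for signals longer than the filter).
    return np.convolve(signal, taps, mode="same")


def reverse(signal: np.ndarray) -> np.ndarray:
    """Play the utterance backwards: the long-term spectrum is kept, the words are not."""
    return signal[::-1].copy()
```

For example, `lowpass_fir(x, 16000)` applied to a 16 kHz recording `x` keeps roughly the band below 130 Hz, and `reverse(x)` gives the backwards version. Keeping the filtered stimulus time-aligned with the original makes it easier to compare degraded and original versions of the same utterance.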

It has been proposed that the mechanisms at work in the encoding and decoding of emotion in music are the same as those for speech (Juslin & Laukka, 2003). Since single musical instruments are able to communicate emotion (Juslin, Friberg & Bresin, 2002), even with very few parameters to 'play' with (intensity, frequency, duration/pauses), one might conclude that emotion can already be communicated with very simple means, that is, with properties that are transmitted quite well even over very poor (low-bandwidth) lines.

What has been neglected so far is the study of transmission: people communicating with each other may adjust their way of speaking to compensate for poor sound quality. Scherer (2003) observes that most researchers have ignored the transmission of sound as a "'quantité négligeable'" (p. 240), and suggests that people may adjust their voice to bad communication channels. When the signal is weak, people may start to communicate differently than normal (e.g. talk louder). This change in the signal may also affect, or at least distort, the emotional percept. For the design of communication systems it does not really matter whether this is a learned effect or not. Junqua (1996), for example, reports that people adjust their voice when the receiver is in a noisy situation, as a learned behaviour. It is for this reason that telephones intentionally play our own voice back to us through the earpiece, the so-called sidetone (DiFilippo & Greenebaum, 2004). "If we didn't have a clear sense of hearing ourselves speak, we would tend to talk louder and louder because we [would] assume that the listener on the other end...[would not be able to] hear us either." (DiFilippo & Greenebaum, 2004, pp. 74-75)

Although one's gut feeling might be that emotion signals operate at a higher level than, or in addition to, the communication of information, this view does not seem to correspond with evolutionary theories on how speech developed. Emotional expression is not unique to humans, whereas speech is. Human affect bursts have much in common with animal affect vocalisations (Scherer, 1995). Many researchers in the past have distinguished between primary interjections ('nature sounds') and secondary interjections (interjections that have become assimilated into language). Scherer (1995) suggests that our current vocal communication code may have evolved from these primary bursts of emotion (p. 236).

To conclude, degradation of sound quality can affect the perception of emotion, but as emotion is conveyed through many sound properties, the degrading effect of the communication channel might indeed be a "'quantité négligeable'", as Scherer (2003) puts it. The most important features, such as F0, amplitude, the duration of voiced periods and the energy distribution above 1000 Hz, are already transmitted reasonably well. Considering the ideas behind the origin of speech, it is unlikely that emotion is degraded more easily than speech (i.e. if speech is intelligible, there is reason enough to assume that emotions are recognizable above chance as well). Since there may be little reason to reduce sound quality below what customers are accustomed to nowadays, I do not think it is worth investigating the effects of extreme sound degradation (such as filtering information out of the signal) for practical applications. Of potential interest, however, is how people react to different transmission qualities, such as talking louder to compensate for a bad communication channel, as noted by Scherer. Furthermore, it may be interesting to consider the effects of random delays caused by packet loss, and of other artifacts and distortions introduced by (digital) transmission (information added to the signal that may interfere with the original emotional percept).
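If one did want to study such transmission artifacts, stimuli could be generated along the following lines. This is a hypothetical sketch, not a model of any real codec or network: it assumes 20 ms frames (a common packetization interval in voice-over-IP), drops each frame independently with a fixed probability, and replaces lost frames with silence. Real systems apply packet-loss concealment, and the random delays mentioned above (jitter) are not modelled here. The function name `simulate_packet_loss` and its parameters are my own.

```python
import numpy as np


def simulate_packet_loss(signal: np.ndarray, sample_rate: int, loss_rate: float,
                         frame_ms: float = 20.0, seed: int = 0) -> np.ndarray:
    """Hypothetical sketch: silence whole frames at random to mimic lost packets."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000)))
    rng = np.random.default_rng(seed)  # seeded, so a stimulus can be reproduced
    out = np.asarray(signal, dtype=float).copy()  # never modify the caller's array
    for start in range(0, len(out), frame_len):
        # Each frame is lost independently with probability loss_rate.
        if rng.random() < loss_rate:
            out[start:start + frame_len] = 0.0  # silence substitution, no concealment
    return out
```

Independent, uniform losses are the simplest assumption; real networks tend to lose packets in bursts, which would call for a different loss model.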


Frick, R.W. (1985). Communicating emotion: The role of prosodic features. Psychological Bulletin, 97(3), 412-429.
Junqua, J. (1996). The influence of acoustics on speech production: A noise-induced stress phenomenon known as the Lombard reflex. Speech Communication, 20(1-2), 13.
Juslin, P.N. & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770-814.
Juslin, P.N., Friberg, A. & Bresin, R. (2002). Toward a computational model of expression in music performance: The GERM model. Musicae Scientiae, Special Issue 2001-2002, 63-122.
Scherer, K.R. (1995). Expression of emotion in voice and music. Journal of Voice, 9(3), 235-248.
Scherer, K.R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40, 227-256.
Scherer, K.R., Ladd, R. & Silverman, K.E.A. (1984). Vocal cues to speaker affect: Testing two models. The Journal of the Acoustical Society of America, 76(5), 1346-1356.
