6 Vowel formants’ normalization

This section is based on (Adank 2003).

There are three possible sources of variation in vowel formant measurements (P. Ladefoged and Broadbent 1957; Pols, Tromp, and Plomp 1973: 1095; Adank 2003):

acoustic variation;
speaker variation:
- sociolinguistic variation;
- anatomical/physiological variation;
and measurement error (“residual variance” in Pols, Tromp, and Plomp 1973).

Many researchers have aimed to reduce speaker-related variation using acoustic vowel normalization (e.g. Gerstman 1968; Lobanov 1971; Syrdal and Gopal 1986). However, some researchers are concerned that normalization procedures can also remove linguistically interesting information, such as the sociolinguistic or dialectal signal in the data (Hindle 1978; Disner 1980; Thomas 2002, 174–75).

Human listeners cope seemingly effortlessly with all three sources of variation, yet the dataset from Peterson and Barney (1952) shows extraordinary variation.

6.1 Acoustic vowel normalization procedures

There are several classes of vowel normalization procedures:

formant-based procedures (Gerstman 1968; Lobanov 1971; Fant 1975; Syrdal and Gopal 1986; Miller 1989);
whole-spectrum procedures (Klein, Plomp, and Pols 1970; Pols, Tromp, and Plomp 1973; Bladon and Lindblom 1981; Bladon 1982; Klatt 1982);
neural networks (D. J. M. Weenink 1993; D. Weenink 1997).

Formant-based procedures are the most compact (just two- or three-dimensional representations) and are comparable cross-linguistically.

Adank (2003) compared 11 vowel normalization methods against a baseline of unnormalized formants in Hz:

	abbreviation	method
1	HZ	the baseline condition, formants in Hz
2	LOG	a log-transformation of the frequency scale
3	BARK	a bark-transformation of the frequency scale
4	MEL	a mel-transformation of the frequency scale
5	ERB	an ERB-transformation of the frequency scale
6	GERSTMAN	Gerstman’s (1968) range normalization
7	LOBANOV	Lobanov’s (1971) z-score transformation
8	NORDSTRÖM & LINDBLOM	Nordström & Lindblom’s (1975) vocal-tract scaling
9	CLIH i4	Nearey’s (1978) individual log-mean procedure
10	CLIH s4	Nearey’s (1978) shared log-mean procedure
11	SYRDAL & GOPAL	Syrdal & Gopal’s (1986) bark-distance model
12	MILLER	Miller’s (1989) formant-ratio model
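
The four scale transformations in the table (LOG, BARK, MEL, ERB) can be sketched in base R as follows. The conversion formulas here are assumptions: they are commonly used approximations (Traunmüller's Bark formula, the 2595·log10 mel formula, and the Glasberg and Moore ERB-rate formula) and are not necessarily the exact variants used in (Adank 2003). The function names and the example formant values are purely illustrative.

```r
# Hz-to-auditory-scale conversions.
# ASSUMPTION: these are common textbook approximations, not necessarily
# the exact formulas used in Adank (2003).
hz_to_log  <- function(f) log(f)                          # natural log of Hz
hz_to_bark <- function(f) 26.81 * f / (1960 + f) - 0.53   # Traunmueller-style Bark
hz_to_mel  <- function(f) 2595 * log10(1 + f / 700)       # common mel formula
hz_to_erb  <- function(f) 21.4 * log10(0.00437 * f + 1)   # Glasberg & Moore-style ERB rate

# Illustrative formant values in Hz (made up, not measured data)
formants_hz <- c(300, 800, 2300)

scales <- data.frame(
  hz   = formants_hz,
  log  = hz_to_log(formants_hz),
  bark = hz_to_bark(formants_hz),
  mel  = hz_to_mel(formants_hz),
  erb  = hz_to_erb(formants_hz)
)
print(round(scales, 2))
```

All four transformations are monotonic: they compress the upper part of the frequency range but do not use any information about the speaker, which is what distinguishes them from procedures such as GERSTMAN or LOBANOV.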

6.2 (Lobanov 1971) z-score transformation

The idea behind Lobanov’s method is simple z-normalization, applied to each formant separately within each speaker. Imagine some random distribution.

If we apply the following normalization, the shape of the distribution stays the same, but the scale is standardized to mean = 0 and standard deviation = 1:

\[x_{normalized} = \frac{x-\mu}{\sigma}\]
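
As a minimal sketch of this formula, the base R snippet below z-scores a simulated sample. The data are generated with `rnorm()` purely for illustration and are not real formant measurements.

```r
set.seed(42)

# Simulated values (illustrative only, not real formant data)
x <- rnorm(1000, mean = 500, sd = 80)

# z-normalization: subtract the sample mean, divide by the sample sd
z <- (x - mean(x)) / sd(x)

# The normalized sample has mean 0 and standard deviation 1
# (up to floating-point error)
print(c(mean = mean(z), sd = sd(z)))

# Base R's scale() performs the same computation; it returns a
# one-column matrix, so it is converted to a plain vector here
print(all.equal(z, as.vector(scale(x))))

# The transformation is linear, so the ordering of the values is unchanged
print(identical(order(x), order(z)))
```

Because the transformation is linear, only location and scale change; the relative distances between observations are preserved.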

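library(tidyverse)   # provides %>%, group_by(), mutate() and ggplot()
library(phonTools)   # provides the pb52 dataset (Peterson and Barney 1952)
data(pb52)
pb52 %>% 
  group_by(speaker) %>% 
  # convert X-SAMPA labels to IPA and z-score F1 and F2 within each speaker;
  # as.vector() drops the matrix attributes that scale() returns
  mutate(vowel = ipa::convert_phonetics(vowel, from = "xsampa", to = "ipa"),
         scaled_f1 = as.vector(scale(f1)),
         scaled_f2 = as.vector(scale(f2))) %>%
  ggplot(aes(scaled_f2, scaled_f1, label = vowel, color = vowel))+
  stat_ellipse()+
  geom_text()+
  scale_x_reverse()+
  scale_y_reverse()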

Try to normalize and visualize the data from Hillenbrand et al. (1995), stored in the h95 variable of the phonTools package.

Sometimes it makes sense to convert the normalized values back to the Hz scale, using the overall mean and standard deviation of each formant:

pb52 %>% 
  # overall (all speakers pooled) mean and sd, computed before grouping
  mutate(overall_mean_f1 = mean(f1),
         overall_sd_f1 = sd(f1),
         overall_mean_f2 = mean(f2),
         overall_sd_f2 = sd(f2)) %>% 
  group_by(speaker) %>% 
  # z-score within each speaker, then rescale to the pooled Hz scale
  mutate(vowel = ipa::convert_phonetics(vowel, from = "xsampa", to = "ipa"),
         scaled_f1 = as.vector(scale(f1)),
         scaled_f2 = as.vector(scale(f2)),
         restored_f1 = scaled_f1*overall_sd_f1+overall_mean_f1,
         restored_f2 = scaled_f2*overall_sd_f2+overall_mean_f2) %>%
  ggplot(aes(restored_f2, restored_f1, label = vowel, color = vowel))+
  stat_ellipse()+
  geom_text()+
  scale_x_reverse()+
  scale_y_reverse()
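
The effect of this rescaling can be checked on a toy example in base R. The sketch below uses made-up F1 values for two hypothetical speakers (`s1`, `s2`); the helper `z()` and the column names are illustrative assumptions, not part of any package.

```r
# Toy F1 values in Hz for two hypothetical speakers (made-up numbers)
toy <- data.frame(
  speaker = rep(c("s1", "s2"), each = 4),
  f1      = c(300, 450, 600, 750, 380, 560, 740, 920)
)

# z-score helper (illustrative name)
z <- function(x) (x - mean(x)) / sd(x)

# Per-speaker z-scores: ave() applies z() within each speaker
toy$scaled_f1 <- ave(toy$f1, toy$speaker, FUN = z)

# Map the z-scores back to a Hz-like scale using the pooled mean and sd
toy$restored_f1 <- toy$scaled_f1 * sd(toy$f1) + mean(toy$f1)

# After rescaling, both speakers share the pooled mean and sd
print(tapply(toy$restored_f1, toy$speaker, mean))
print(tapply(toy$restored_f1, toy$speaker, sd))
```

Note that the pooled mean and standard deviation are used for the back-transformation. Using each speaker's own mean and standard deviation instead would simply return the original, unnormalized values.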

6.3 `vowels` package

Implementations of other normalization methods are available in the R package vowels:

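library(vowels)
data(ohiovowels)   # example dataset shipped with the vowels package

# Lobanov z-score normalization
vowelplot(norm.lobanov(ohiovowels), color="vowels", label="vowels")

# Labov's log-mean method
vowelplot(norm.labov(ohiovowels), color="vowels", label="vowels")

# Nearey's log-mean method
vowelplot(norm.nearey(ohiovowels), color="vowels", label="vowels")

# Watt and Fabricius centroid-based method
vowelplot(norm.wattfabricius(ohiovowels), color="vowels", label="vowels")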

References

Adank, P. M. 2003. “Vowel Normalization. A Perceptual Acoustic Study of Dutch Vowels.” PhD thesis, Ponsen & Looijen bv, Wageningen.

Bladon, RAW. 1982. “Arguments Against Formants in the Auditory Representation of Speech.” The Representation of Speech in the Peripheral Auditory System.

Bladon, RAW, and Björn Lindblom. 1981. “Modeling the Judgment of Vowel Quality Differences.” The Journal of the Acoustical Society of America 69 (5): 1414–22.

Disner, S. F. 1980. “Evaluation of Vowel Normalization Procedures.” The Journal of the Acoustical Society of America 67 (1): 253–61.

Fant, G. 1975. “Non-Uniform Vowel Normalization.” STL-QPSR 16 (2-3): 1–19.

Gerstman, Louis. 1968. “Classification of Self-Normalized Vowels.” IEEE Transactions on Audio and Electroacoustics 16 (1): 78–80.

Hindle, D. 1978. “Approaches to Vowel Normalization in the Study of Natural Speech.” Linguistic Variation: Models and Methods, 161–71.

Klatt, D. 1982. “Prediction of Perceived Phonetic Distance from Critical-Band Spectra: A First Step.” In ICASSP’82. IEEE International Conference on Acoustics, Speech, and Signal Processing, 7:1278–81. IEEE.

Klein, W., R. Plomp, and L. C. W. Pols. 1970. “Vowel Spectra, Vowel Spaces, and Vowel Identification.” The Journal of the Acoustical Society of America 48 (4B): 999–1009.

Ladefoged, P., and D. E. Broadbent. 1957. “Information Conveyed by Vowels.” The Journal of the Acoustical Society of America 29 (1): 98–104.

Lobanov, B. M. 1971. “Classification of Russian Vowels Spoken by Different Speakers.” The Journal of the Acoustical Society of America 49 (2B): 606–8.

Miller, J. D. 1989. “Auditory-Perceptual Interpretation of the Vowel.” The Journal of the Acoustical Society of America 85 (5): 2114–34.

Peterson, G. E., and H. L. Barney. 1952. “Control Methods Used in a Study of the Vowels.” The Journal of the Acoustical Society of America 24 (2): 175–84.

Pols, L. C. W., H. R. C. Tromp, and R. Plomp. 1973. “Frequency Analysis of Dutch Vowels from 50 Male Speakers.” The Journal of the Acoustical Society of America 53 (4): 1093–1101.

Syrdal, A. K., and H. S. Gopal. 1986. “A Perceptual Model of Vowel Recognition Based on the Auditory Representation of American English Vowels.” The Journal of the Acoustical Society of America 79 (4): 1086–1100.

Thomas, E. R. 2002. “Instrumental Phonetics.” In The Handbook of Language Variation and Change, edited by J. K. Chambers, P. Trudgill, and N. Schilling-Estes, 168–200. Oxford: Blackwell.

Weenink, D. J. M. 1993. “Modelling Speaker Normalization by Adapting the Bias in a Neural Net.” In Proceedings Eurospeech93, 2259–62. Berlin.

Weenink, David. 1997. “Category ART: A Variation on Adaptive Resonance Theory Neural Networks.” In Proc. Institute of Phonetic Sciences University of Amsterdam, 21:117–29. Citeseer.
