In [2]:
%matplotlib inline
In [1]:
from jupyterthemes import get_themes
import jupyterthemes as jt
from jupyterthemes.stylefx import set_nb_theme
set_nb_theme('oceans16')
Out[1]:
In this notebook we'll explore some curious cases of Turkish articulatory phonetics. We'll look at the sounds that foreigners usually ask about, from our unique Ğ (g with breve) to less famous phonetic events.

Since I have infinite resources :) I decided to prepare a small speech corpus. We'll look at:
- Short/long vowels
- Vowel epenthesis in loan words
- Lack of diphthongs
- The famous "soft G", Ğ
- Word-level stress and its dependence on morphology

Let's hit it:
In [2]:
import matplotlib.pyplot as plt
from IPython.display import Audio
In [4]:
import librosa
import librosa.display
import numpy as np

Short/Long Vowels

Turkish has many Arabic and Persian loan words. Before the alphabet reform, the written language was a mix of Persian, Arabic and Turkish words. In modern Turkish most of the vocabulary is native, but the influence of Arabic and Persian is still strong.
The problem with these loan words is that many of them have a native Turkish counterpart that is written the same way but pronounced with a different vowel. Compare:

kâr (profit) vs kar (snow)
hâlâ (still) vs hala (aunt)
Kâsım (proper noun, a male name) vs Kasım (November)

The "roof mark" (the circumflex, ^) was dropped from the written language in the 90s. As a result, these word pairs became indistinguishable in orthography, which creates trouble for SMT, ASR and many other statistical systems.

Let's hear the words and compare vowel lengths:
In [5]:
from scipy.io import wavfile
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/kar.wav")
Here goes "kar" with a short "a", the native Turkish word for "snow":
In [6]:
Audio('/home/altinok/Desktop/cool_stuff/kar.wav')
Out[6]:
"Kâr" (profit) sounds as follows:
In [3]:
Audio('/home/altinok/Desktop/cool_stuff/kar_uzun.wav')
Out[3]:
In [10]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
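# wavfile.read typically returns integer PCM; librosa expects floats and a magnitude spectrogram
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')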
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[10]:
Text(0.5,1,u'Linear-frequency power spectrogram')
Compare it to "kâr", with a long "a".

1. See the difference in vowel length in the raw signal.
2. See how similar the formants of the two "a"s are in the spectrogram.
3. Also hear and see the difference in the "k". The two "k"s really are different: "k" has two realizations, a palatalized /c/ and a non-palatalized /k/. Long vowels also have the effect of palatalizing the consonants in the same syllable, so the "k" in "kâr" is /c/, whereas the "k" of "kar" is /k/. See the different signatures of /k/ and /c/ in the spectrograms. Turkish SAMPA lists /c/ and /k/ as separate symbols, and their spectrogram signatures are clearly distinct: http://www.phon.ucl.ac.uk/home/sampa/turkish.htm
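The length comparison in point 1 can also be put into numbers. Below is a minimal sketch, not the method used for the plots in this notebook: it treats the longest run of high-energy frames as a rough stand-in for the vowel. The function name `loud_segment_duration`, the frame sizes and the 25% threshold are all assumptions made for illustration, and the signal is assumed to be mono.

```python
import numpy as np

def loud_segment_duration(x, rate, frame=400, hop=160, rel_threshold=0.25):
    """Rough duration (in seconds) of the longest high-energy stretch in x.

    Hypothetical helper: frames whose RMS energy reaches rel_threshold times
    the loudest frame count as "loud"; the longest run of consecutive loud
    frames is used as a crude proxy for the vowel.
    """
    x = np.asarray(x, dtype=np.float64)
    starts = range(0, len(x) - frame + 1, hop)
    rms = np.array([np.sqrt(np.mean(x[s:s + frame] ** 2)) for s in starts])
    if rms.size == 0 or rms.max() == 0:
        return 0.0
    loud = rms >= rel_threshold * rms.max()
    best = run = 0
    for flag in loud:
        run = run + 1 if flag else 0   # extend or reset the current loud run
        best = max(best, run)
    return best * hop / rate

def _synthetic_word(rate, vowel_seconds):
    """Silence, then a 200 Hz tone standing in for a vowel, then silence."""
    silence = np.zeros(int(0.2 * rate))
    t = np.arange(int(vowel_seconds * rate)) / rate
    return np.concatenate([silence, np.sin(2 * np.pi * 200 * t), silence])

rate_demo = 16000
short_word = _synthetic_word(rate_demo, 0.3)
long_word = _synthetic_word(rate_demo, 0.6)
print(loud_segment_duration(short_word, rate_demo))
print(loud_segment_duration(long_word, rate_demo))
```

On the real recordings you would pass the `rate` and `x` returned by `wavfile.read` for "kar" and "kâr" and compare the two values. The consonants carry energy too, so the numbers are only indicative.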
In [12]:
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/kar_uzun.wav")
In [13]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[13]:
<matplotlib.text.Text at 0x7fcdaceeca90>

Epenthesis

Speaking of loan words, vowel epenthesis happens often in Western loan words. Turkish has no onset clusters: every syllable has exactly one vowel and every vowel forms exactly one syllable, so there is a one-to-one correspondence between vowels and syllables. No wonder, then, that we insert a vowel into the first syllable to split the underlying onset cluster. See the extra vowels in these pronunciations:

Kral     k 1 r a 5
Brüksel     b y r y c s e l
Twitter     t i v i t 1 r
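
The pattern in the pronunciations above can be sketched as a toy rule. This is a simplification under stated assumptions, not a full account of Turkish epenthesis: the function `break_onset_cluster` and its lookup tables are hypothetical, the input is assumed to be lowercase Turkish spelling, and the inserted vowel is assumed to be a high vowel that copies the backness and rounding of the first vowel of the word.

```python
VOWELS = set("aeıioöuü")
BACK = set("aıou")
ROUNDED = set("oöuü")

# (is_back, is_rounded) -> the matching high vowel
HIGH_VOWEL = {
    (True, False): "ı",
    (False, False): "i",
    (True, True): "u",
    (False, True): "ü",
}

def break_onset_cluster(word):
    """Insert a harmonizing high vowel after the first consonant of an onset cluster.

    Hypothetical toy rule: only the first two consonants are split, and the
    word is returned unchanged when it does not start with a cluster.
    """
    first_vowel = next((i for i, ch in enumerate(word) if ch in VOWELS), None)
    if first_vowel is None or first_vowel < 2:
        return word  # no vowel at all, or at most one consonant before it
    v = word[first_vowel]
    epenthetic = HIGH_VOWEL[(v in BACK, v in ROUNDED)]
    return word[0] + epenthetic + word[1:]

for w in ["kral", "brüksel", "kar"]:
    print(w, "->", break_onset_cluster(w))
```

The rule reproduces the extra vowels heard in "Kral" and "Brüksel". It does not model other changes such as the w-to-v substitution in "Twitter".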
Assaf, a colleague and friend of mine and a senior linguist, noticed this phonetic event with his educated ears. He said that I insert an extra vowel, and I told him that it's very common among native Turkish speakers.

Let's hear "Brüksel" from my mouth. Notice the two distinct vowels in the spectrogram:
In [36]:
Audio('/home/altinok/Desktop/cool_stuff/Bruksel2.wav')
Out[36]:
In [37]:
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/Bruksel2.wav")
In [35]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[35]:
<matplotlib.text.Text at 0x7fcda44ff890>
Ben is a colleague, a talented coder and a native English speaker from the USA. Hence, his vocal tract definitely knows how to produce the fancy clusters that I can't. Listen to "Brussels" from him and see the onset cluster in the spectrogram:
In [23]:
Audio('/home/altinok/Desktop/cool_stuff/Brussels.wav')
Out[23]:
In [26]:
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/Brussels.wav")
In [27]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[27]:
<matplotlib.text.Text at 0x7fcdac3c0f90>

Why I can't produce diphthongs

I can't produce diphthongs, that's right. Today I tried to say "low" and failed, while Assaf and Ben succeeded :)
Let's see it in action:
In [39]:
Audio('/home/altinok/Desktop/cool_stuff/low_duygu.wav')
Out[39]:
In [41]:
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/low_duygu.wav")
In [42]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[42]:
<matplotlib.text.Text at 0x7fcda43011d0>
Unlike in English, backness and roundedness of vowels are uncorrelated in Turkish: we have both front rounded (ö, ü) and back rounded (o, u) vowels. Looking at the formants in the spectrogram, we notice a low F2, which points to backness. The relatively high F1 indicates a fairly low (open) vowel, and F3, together with the gap between F1 and F2, hints at roundedness. This vowel is /o/. So, unfortunately, I pronounce a monophthong. Also notice that there is only one rise in the speech signal and that the duration is rather long. Ben comes to the rescue again:
In [43]:
Audio('/home/altinok/Desktop/cool_stuff/low_ben.wav')
Out[43]:
In [45]:
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/low_ben.wav")
In [46]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[46]:
<matplotlib.text.Text at 0x7fcda41080d0>
Notice the transition from one vowel to another in the spectrogram. However, there is ONLY ONE rise in the speech signal: technically there is one sound, but two vowels in the spectrogram. Being a US American, Ben's vocal tract accomplishes the glide perfectly, while all I was able to produce was "Lovvvvvvvvv".
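One rough way to quantify "one vowel quality versus a glide" is to follow how the spectrum moves over time. The sketch below is only a crude proxy and assumes a lot: it tracks the spectral centroid per frame instead of real formants, the helper name `spectral_centroid_track` and the frame sizes are made up for illustration, and the demo signals are synthetic tones rather than the recordings above.

```python
import numpy as np

def spectral_centroid_track(x, rate, frame=1024, hop=256):
    """Spectral centroid (Hz) of each Hann-windowed frame of a mono signal.

    Hypothetical helper: a flat track suggests a steady vowel quality,
    a drifting track suggests a glide. It is not a formant tracker.
    """
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / rate)
    track = []
    for start in range(0, len(x) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + frame] * window))
        total = mag.sum()
        track.append((freqs * mag).sum() / total if total > 0 else 0.0)
    return np.array(track)

rate_demo = 16000
t = np.arange(rate_demo) / rate_demo                   # one second of samples
steady = np.sin(2 * np.pi * 500 * t)                   # constant 500 Hz tone
glide = np.sin(2 * np.pi * (500 * t + 500 * t ** 2))   # sweeps from 500 Hz to 1500 Hz

steady_track = spectral_centroid_track(steady, rate_demo)
glide_track = spectral_centroid_track(glide, rate_demo)
print(np.ptp(steady_track), np.ptp(glide_track))
```

The steady tone gives an almost flat track while the sweep gives one that drifts by several hundred hertz. On real speech the consonants and the pitch contour also move the centroid, so the spectrogram remains the more reliable picture.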

Soft G!

Soft G is a controversial sound. In Turkish it is classified as a consonant, although it is not really a sound of its own. In some cases it lengthens the preceding vowel; in others it vanishes completely. Here are some observations:

- When it is in word-final or syllable-final position, it lengthens a preceding back vowel, e.g. dağdan /d a: d a n/ and dağ /d a:/.

- Between identical back vowels it is inaudible, e.g. uğur /u: r/, ağarmak /a: r m a k/ and sığır /s 1: r/.

- Between identical front vowels it is either inaudible, e.g. bildiğim /b i l d i: m/, or sounds like a palatal glide, e.g. düğün /d y j y n/.

- Between an e and an i it is either inaudible or pronounced as a palatal glide /j/: the word "değil" is heard both as /d e j i l/ and as /d i: l/.

- Between an i and an e it is mostly heard as a palatal glide /j/: "diğer" is often pronounced /d i j e r/ and is sometimes misspelled "diyer".

- Between rounded vowels it is mostly inaudible, e.g. soğuk /s o u k/.

- Between a rounded vowel and an unrounded vowel it is mostly inaudible, e.g. doğan /d o a n/.

- a+ğ+ı sequences may sound either like /a/ followed by /1/ or like a sequence of two /a/ vowels: ağır as /a 1 r/ or /a: r/.

- ı+ğ+a sequences are pronounced as /1/ followed by /a/: sığan /s 1 a n/.

Let's see Soft G in action. You'll hear the word "dağ". Here Soft G is in syllable-final position, so it is expected to lengthen the syllable's vowel "a". See the duration of the vowel in the speech signal and compare it to another one-syllable word, "kar", the very first word of this notebook:
In [61]:
Audio('/home/altinok/Desktop/cool_stuff/dag.wav')
Out[61]:
In [62]:
rate, x = wavfile.read("/home/altinok/Desktop/cool_stuff/dag.wav")
In [63]:
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.title("Raw Signal")
plt.plot(x);

plt.subplot(1, 2, 2)
D = librosa.amplitude_to_db(np.abs(librosa.stft(x.astype(np.float32))), ref=np.max)
librosa.display.specshow(D, sr=rate, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
Out[63]:
<matplotlib.text.Text at 0x7fcd96c7f990>
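Returning to the observations listed at the start of this section, a few of them can be sketched as a toy rewrite function from spelling to SAMPA-like symbols. It is a deliberately incomplete illustration: the function `soft_g_to_sampa` and its small letter table are hypothetical, the input is assumed to be lowercase, only one variant is produced where the list allows two, and identical front vowels (as in "düğün") are not handled.

```python
VOWELS = set("aeıioöuü")
BACK = set("aıou")
E_OR_I = {"e", "i"}

# Partial spelling-to-SAMPA table; letters not listed are copied as they are.
SAMPA = {"ı": "1", "ü": "y", "ö": "2", "ş": "S", "ç": "tS", "c": "dZ", "y": "j"}

def soft_g_to_sampa(word):
    """Rewrite a lowercase Turkish word, applying a subset of the soft-G rules."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch != "ğ":
            out.append(SAMPA.get(ch, ch))
            i += 1
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt not in VOWELS:
            # word-final or syllable-final: lengthen a preceding back vowel
            if prev in BACK:
                out[-1] += ":"
        elif prev == nxt and prev in BACK:
            # between identical back vowels: merge them into one long vowel
            out[-1] += ":"
            i += 1  # skip the second copy of the vowel
        elif prev in E_OR_I and nxt in E_OR_I and prev != nxt:
            # between e and i (either order): palatal glide variant
            out.append("j")
        # every other context: treated as inaudible, nothing is emitted
        i += 1
    return " ".join(out)

for w in ["dağ", "dağdan", "uğur", "sığır", "diğer", "soğuk", "doğan"]:
    print(w, "->", soft_g_to_sampa(w))
```

For "değil" the sketch always picks the glide variant /d e j i l/, although the shortened /d i: l/ is just as common.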

Stress, Stress, Stress

The Turkish stress pattern is highly morphology-dependent. The majority of native words carry stress on their last syllable, and suffixes usually shift the stress towards the end of the word. Word-final stress is more or less standard for most words; exceptions occur with old Arabic/Persian loan words.

Let's see one example. "Koyun" can be analyzed in two ways:

koyun     koy+un <2per>
koyun     koyun

The imperative suffix pushes the stress onto the preceding syllable, hence

koyun     k o+ j u n <2per>
koyun     k o j u+ n
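
The two readings above can be sketched in a few lines. This is a toy under stated assumptions: `syllabify` and `mark_stress` are hypothetical helpers, the syllabifier only relies on the "one vowel per syllable" property of Turkish, and the stress position is passed in by hand instead of being derived from a morphological analysis. An apostrophe in front of the stressed syllable stands in for the `+` notation used above.

```python
VOWELS = set("aeıioöuü")

def syllabify(word):
    """Split a lowercase Turkish word into syllables, one vowel per syllable.

    Works from the end of the word: each vowel takes at most one consonant
    as its onset, and leftover initial consonants (loan-word clusters) are
    attached to the first syllable.
    """
    syllables = []
    end = len(word)
    i = len(word) - 1
    while i >= 0:
        if word[i] in VOWELS:
            start = i - 1 if i > 0 and word[i - 1] not in VOWELS else i
            syllables.append(word[start:end])
            end = start
            i = start - 1
        else:
            i -= 1
    if end > 0:  # consonants left over at the start of the word
        if syllables:
            syllables[-1] = word[:end] + syllables[-1]
        else:
            syllables.append(word[:end])
    return syllables[::-1]

def mark_stress(word, from_end=1):
    """Put an apostrophe before the stressed syllable.

    from_end=1 is the default word-final stress; from_end=2 moves the
    stress one syllable back, as with the imperative reading.
    """
    syllables = syllabify(word)
    if not syllables:
        return word
    index = max(len(syllables) - from_end, 0)
    syllables[index] = "'" + syllables[index]
    return ".".join(syllables)

print(mark_stress("koyun"))               # plain noun reading
print(mark_stress("koyun", from_end=2))   # koy+un imperative reading
```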

From my experience with Kaldi, I noticed that the neural network learns the stress pattern anyway, so there is no need to feed it the stress position. Indeed, it's better not to mark the stress position at all and let the neural network learn it.

Next

We had a good time observing vowel durations and spectrogram signatures (hopefully)! My thanks go to my smart and talented colleagues Assaf and Ben for their contributions.

Next time, we'll switch from phonetics to text processing. We'll play with time- and space-efficient data structures for storing German, English and Turkish lexicons. Stay tuned for more speech & language processing.

Greetings from Berlin,
Duygu.