Issue archive:
29

Abstract

The present study was carried out with the aim of improving speech synthesis in Persian and examining the fall of the fundamental frequency (F0) contour between two pitch peaks within the framework of the Autosegmental Metrical theory of intonation. The two main hypotheses are that the F0 fall arises through phonetic interpolation or through tone spreading, and the first hypothesis is predicted to be confirmed. The data, comprising 45 enclitic constructions, were designed so that the consonants in all syllables between the two pitch peaks were voiced, and the distance between the two peaks was increased from zero to two syllables by adding unstressed syllables. A total of 1350 tokens were recorded and collected from 30 speakers of Standard Persian. The data were analysed acoustically in Praat, and the results of that analysis were examined in SPSS. The results showed that the distance from the first peak to the valley of the contour increased as the number of syllables increased, which indicates that the valley is aligned with the beginning of the stressed syllable. The significant pairwise differences in this distance among the three groups of data likewise show that it increases in proportion to the number of unstressed syllables. In addition, the mean F0 contour was plotted for the data, which showed that the fall between the two peaks is gradual and occurs through interpolation.

Phonetics and phonology of the F0 valley in Persian intonation

This study aims to improve Persian speech synthesis by investigating the fall of the pitch contour between two H* peaks within the framework of the Autosegmental Metrical (AM) theory of intonational phonology. We tested two hypotheses: that F0 falls through phonetic interpolation, or that the fall involves tone spreading. We argue that the data support the first hypothesis. The data comprise 45 enclitic phrases designed so that every consonant between the two peaks is voiced, because voiceless consonants would affect the results of the analysis in Praat. The distance between the two H* peaks was increased by adding zero to two unstressed syllables. In total, we recorded 1350 utterances from 30 native Persian speakers. The utterances were analysed acoustically in Praat and statistically in SPSS. The distance between the first peak and the following F0 valley increased with the addition of unstressed syllables, which indicates that the valley is aligned with the beginning of the stressed syllable. Statistical analysis showed that this increase in the distance between the H and the following L target was significant. Furthermore, the normalized pitch contour computed over all the data indicates that the fall between the two H peaks is realized through phonetic interpolation.
Keywords: Speech Synthesis, Phonetic Interpolation, Tone Spreading, Autosegmental Metrical theory (AM), Pitch Accent

Introduction

Text-to-speech technology has many uses, such as allowing blind or visually impaired people to read texts. It also helps people with speech impairments to communicate verbally and to receive information from a text by listening. Intonation and prosodic structure play a pivotal role in synthetic speech production. The aim of this research is to investigate the pattern of F0 declination between the H* and L+H* tonal targets in Persian. If this declination is systematic, F0 can be predicted in the same tonal environments, and these predictions can be used in synthesizing Persian phrases.

The research is carried out within the Autosegmental Metrical (AM) theory of intonational phonology, in which H and L tones are regarded as phonological elements. An L* pitch accent on a word means that its stressed syllable is produced with a low tone, and an H* pitch accent means that the stressed syllable is produced with a high tone. In bitonal pitch accents, the starred tone is aligned with the stressed syllable and the unstarred tone appears immediately before or after it (Sadeghi, 2018). The difference between the monotonal H* accent and the bitonal L*+H accent can be shown in terms of the placement of the F0 peak and valley. In the H* accent, the F0 peak falls on the stressed syllable. In the L*+H accent, the F0 valley falls on the stressed syllable and the F0 peak occurs slightly after it. The pitch accent in Persian is defined as a bitonal L+H* pitch accent (Mahjani, 2003; Sadat-Tehrani, 2009), that is, a sequence of a low and a high tone. Note that both tones are aligned with the stressed syllable.
This research seeks to determine the type of F0 declination, which can be gradual or sharp. A gradual fall of F0 is due to phonetic interpolation, whereas a sudden, sharp fall is due to tone spreading. Accordingly, the two main hypotheses of this research are as follows. First, F0 is interpolated from the first H peak to the L target. Second, F0 falls from the first H only until the beginning of the second word in the phrase, because the L of the second pitch accent spreads to the beginning of the second word.

Materials and Methods

The data for this research come from phrases with two pitch accents (H* L+H*). Following this pattern, 45 enclitic phrases were constructed with voiced consonants between the two peaks, and the distance between the two H* peaks was increased by adding zero to two unstressed syllables. Thirty native Persian speakers (15 male and 15 female) took part in the recordings. Below are examples of each group of data:

(a) [ʔɑ.be rud] ɟelɑlud bud
(b) tʃand mɑh piʃ dar [maziɢe.je mɑ.li] budand
(c) ʔaz [mahal.le.je ɢa.di.mi] rafte budand

In each example, the square brackets enclose the target phrase, whose two words are labelled "1st word" and "2nd word".

We used Praat and ProsodyPro to measure the distances between the tonal targets. The resulting measurements were then analysed in SPSS using ANOVA, post-hoc tests, and correlations between the groups of data. According to the data analysis, the L tone is commonly placed before the beginning of the stressed syllable. That is, the greater the distance between the H tone in the first word and the stressed syllable, the greater the distance between that H tone and the L tone. This finding supports the first hypothesis, which posits phonetic interpolation between the two tonal targets.
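The group comparison described above can be sketched as a one-way ANOVA F statistic. This is a minimal illustration in plain Python, not the SPSS procedure used in the study. The function name `one_way_anova_f` and the sample H-to-L distances are hypothetical placeholders, not measurements from the paper. Post-hoc tests and p-values are left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one list per condition.
    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical H-to-L distances in milliseconds (illustrative only),
# grouped by the number of added unstressed syllables (0, 1, 2).
example_groups = [
    [110, 120, 115, 125],
    [170, 180, 175, 185],
    [230, 240, 235, 245],
]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value says only that the group means differ. Deciding which pairs of groups differ still requires the post-hoc comparisons mentioned in the text.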
To inspect the overall changes in the pitch contour, the normalized pitch contour was also computed for all the research data.

Discussion of Results and Conclusions

The results confirm the phonetic interpolation hypothesis, according to which the F0 downtrend from the first H tone ends at the L tone, which is placed on the stressed syllable of the word. In other words, F0 falls steadily from the H tone in the first word to the L tone in the second word. The F0 slope is therefore not specified by the two tones themselves; it depends on the distance between the first H and the L tone. All the tests conducted in this research indicate that the L tone is aligned with the beginning of the stressed syllable, confirming that the F0 fall is realized through phonetic interpolation. Because F0 falls from the first peak to the valley, the slope of the pitch contour depends on the position of the stressed syllable or, equivalently, of the L tone. These findings can be used for synthetic speech production in Persian, in the step of a text-to-speech system that determines the prosodic pattern. Predicting the precise changes in F0 within the generated prosodic structure will make synthetic speech sound more natural, which will in turn improve the usefulness of text-to-speech systems.
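The two competing accounts of the F0 fall can be contrasted in a small sketch. It assumes straight-line (linear) interpolation between targets, which is a simplification: the text says only that the fall is gradual. All function names, parameter names, times, and F0 values are hypothetical and chosen for illustration.

```python
def f0_interpolation(t, t_h, f_h, t_l, f_l):
    """F0 at time t when H is interpolated directly to the L target.

    t_h, f_h: time (s) and F0 (Hz) of the first H peak.
    t_l, f_l: time (s) and F0 (Hz) of the L target at the onset
              of the stressed syllable. Assumes t_h < t_l.
    """
    if t <= t_h:
        return f_h
    if t >= t_l:
        return f_l
    fraction = (t - t_h) / (t_l - t_h)
    return f_h + fraction * (f_l - f_h)


def f0_spreading(t, t_h, f_h, t_word2, f_l):
    """F0 at time t when L spreads leftward to the start of the 2nd word.

    F0 falls from H to the low level by t_word2 and then stays low,
    so the valley is reached early regardless of where the stressed
    syllable is. Assumes t_h < t_word2.
    """
    if t <= t_h:
        return f_h
    if t >= t_word2:
        return f_l
    fraction = (t - t_h) / (t_word2 - t_h)
    return f_h + fraction * (f_l - f_h)


# Hypothetical targets: H at 0.0 s (200 Hz), second word starts at 0.1 s,
# stressed syllable (L target, 160 Hz) starts at 0.4 s.
for t in (0.0, 0.1, 0.2, 0.3, 0.4):
    interp = f0_interpolation(t, 0.0, 200.0, 0.4, 160.0)
    spread = f0_spreading(t, 0.0, 200.0, 0.1, 160.0)
    print(f"t={t:.1f}s  interpolation={interp:.1f} Hz  spreading={spread:.1f} Hz")
```

Under interpolation, moving the stressed syllable later (a larger `t_l`) makes the slope shallower and moves the valley later, which matches the pattern reported in the text. Under spreading, the valley would stay at the start of the second word.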
