Generating a Pronunciation Dictionary for European Portuguese Using a Joint-Sequence Model with Embedded Stress Assignment
Veiga, A.; Candeias, S.; Perdigão, F.
Generating a Pronunciation Dictionary for European Portuguese Using a Joint-Sequence Model with Embedded Stress Assignment, Proc Brazilian Comput Soc Brazilian Symp. in Information and Human Language Technology - STIL, Cuiabá, Mato Grosso, Brazil, pp. 144–153, October 2011.
Abstract
This paper addresses grapheme-to-phoneme conversion with the aim of creating a pronunciation dictionary from a vocabulary of the most frequent words in European Portuguese. It describes a system based on a mixed approach, founded on a stochastic model with embedded rules for stressed-vowel assignment. Although the model can generate pronunciations for unrestricted words, the dictionary built with it covers the 40k most frequent words and was corrected interactively. The vocabulary was defined using the CETEMPúblico corpus. The model and dictionary are publicly available.
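As a rough illustration of what "embedded rules for stressed-vowel assignment" can mean, the sketch below marks the stressed vowel of a European Portuguese word using a few textbook orthographic stress rules, so that a grapheme-to-phoneme model could receive stress as part of its input. It is not the authors' rule set: the `"` marker, the names `mark_stress` and `nucleus_starts`, the use of vowel runs as a crude stand-in for syllable nuclei, and the simplified list of paroxytone endings are all assumptions made for this example. Hiatus cases such as "rainha" and other exceptions are not handled.

```python
import re
import unicodedata

# Hypothetical marker inserted before the stressed nucleus (illustrative only).
STRESS_MARK = '"'

ACUTE_CIRCUMFLEX = "áéíóúâêô"
TILDE = "ãõ"
# Maximal runs of vowel letters, used as a rough proxy for syllable nuclei.
VOWEL_RUN = re.compile(r"[aeiouáéíóúâêôãõà]+")
# Simplified paroxytone endings: -a, -e, -o (optionally plural -s), -am, -em, -ens.
PAROXYTONE_END = re.compile(r"(?:[aeo]s?|[ae]m|ens)$")


def nucleus_starts(word: str) -> list[int]:
    """Return the start index of each vowel run in the word."""
    starts = []
    for m in VOWEL_RUN.finditer(word):
        start, end = m.span()
        # In "qu"/"gu" before e/i the u is usually silent, so skip it.
        if (word[start] == "u" and start > 0 and word[start - 1] in "qg"
                and end - start > 1 and word[start + 1] in "eiéíê"):
            start += 1
        starts.append(start)
    return starts


def mark_stress(word: str) -> str:
    """Insert STRESS_MARK before the heuristically stressed vowel."""
    w = unicodedata.normalize("NFC", word.lower())
    # Rule 1: a written acute or circumflex accent marks the stressed vowel.
    for i, ch in enumerate(w):
        if ch in ACUTE_CIRCUMFLEX:
            return w[:i] + STRESS_MARK + w[i:]
    # Rule 2: otherwise a tilde marks it (e.g. "limão", "irmã").
    for i, ch in enumerate(w):
        if ch in TILDE:
            return w[:i] + STRESS_MARK + w[i:]
    starts = nucleus_starts(w)
    if not starts:
        return w
    # Rule 3: paroxytone endings stress the penultimate nucleus;
    # Rule 4: any other ending stresses the last nucleus.
    if PAROXYTONE_END.search(w) and len(starts) > 1:
        idx = starts[-2]
    else:
        idx = starts[-1]
    return w[:idx] + STRESS_MARK + w[idx:]


if __name__ == "__main__":
    for word in ["casa", "amor", "café", "limão", "quero", "homens", "órfã"]:
        print(word, "->", mark_stress(word))
```

For instance, `mark_stress("casa")` gives `c"asa` and `mark_stress("amor")` gives `am"or`. A stress-marked string of this kind could then serve as the grapheme side of a joint-sequence model, which is one plausible way to combine hand-written stress rules with a stochastic converter.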