Back in 2007-2008 Mehryar Mohri ,Fernando Pereira and Google researcher Michael Riley, were developing speech recognition system based on Weighted Finite State Transducer.The transducer proposed by them provided a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices.[1]

A finite-state transducer is a finite automaton whose state transitions are labeled with both input and output symbols while a weighted transducer puts weights on transitions in addition to the input and output symbols.[1] The examples in Figure 1(a) [1, Fig.1] represents the language model used for speech recognition by giving each transition identical input and output labels. This language model predicts how probable particular sentence over other. For example, if the first is ‘using’ (having probability of occurrence 1) then the word following it can be either ‘data’ with probability of occurrence 0.66 or it can be intuition with probability 0.33. And the probability of occurrence of a sentence is the product of probability of occurrence of all the words in the sequence. This transducer, hence, provide a grammatically correct and context dependent output from a speech recognition system.

Similarly,Figure 1(b) [1, Fig. 1] represents pronunciation lexicon as a mapping from phone strings to words in the lexicon, in this example data and dew, with probabilities representing the likelihoods of alternative pronunciations. It transduces a phone string that can be read along a path from the start state to a final state to a word string with a particular weight. The word corresponding to a pronunciation is output by the transition that consumes the first phone for that pronunciation.In figure 1(b) [1, Fig. 1] the transitions that consume the remaining phones output no further symbols, indicated by the null symbol ε as the transition’s output label. In general, an input label marks a transition that consumes no input, and an output label marks a transition that produces no output. This model helps in transcribing sound into text from the phonetics of the speech.

Fig. 1. Weighted Finite State Transducer

Speech recognition is a combination of different parts, and Mehryar Mohri et al proposed an operation called Composition to combine different level of representation in transducers. For instance, the two examples mentioned above i.e. a pronunciation lexicon and word-level grammar can be combined using composition to produce a phone-to-word transducer whose word strings are restricted to the grammar. While creating a composition, say T = T1 ◦ T2 of two transducers T1 and T2 , it should satisfies three conditions: (1) its initial state is the pair of the initial state of T1 and the initial state of T2; (2) its final states are pairs of a final state of T1 and a final state of T2, and (3) there is a transition t from (q1, q2) to (r1, r2) for each pair of transitions t1 from q1 to r1 and t2 from q2 to r2 such that the output label of t1 matches the input label of t2. The transition t takes its input label from t1, its output label from t2, and its weight is the combination of the weights of t1 and t2 done with the same operation that combines weights along a path. [1]

M. Mohri et.al also proposed an approach called Determinization for transducer based speech recognition system. They used determinization algorithm to transforms a nondeterministic weighted automaton into an equivalent deterministic automaton. They did so in order to achieve an advantage of ir-redundancy which means it contains at most one path ( which is not the case for non-deterministic automata) matching any given input string. This significantly reduces the time and space needed to process the string which is quite important in ASR due to pronunciation lexicon redundancy in large vocabulary tasks.The tree lexicon in ASR , proposed by Ortmanns et al, is a example of deterministic pronunciation representation [2].Weighted Determinization achieved this ability to eliminate redundant paths by calculating the combined weight of all paths with the same labeling. When each path represents a disjoint event with probability given by its weight, the combined weight, representing the probability of the common labeling for that set of paths, would be the sum of the weights of the paths. This thus results in a single path making better speech recognition system.

M. Mohri et al. proposed minimization algorithm that can be used to minimize deterministic weighted automaton .A deterministic weighted automaton is said to be minimal if there is no other deterministic weighted automaton with a smaller number of states that represents the same mapping from strings to weights. Thus, implementing minimization algorithm on ,say, automaton A results in a weighted automaton B is equivalent to the automaton A, and has the least number of states and the least number of transitions among all deterministic weighted automata equivalent to A. This algorithm works in two steps: the first step pushes weight among transitions, and the second applies the classical minimization algorithm proposed by Aho et al [3] . to the result with each distinct label-weight pair viewed as a distinct symbol. Weight pushing is useful not only as a first step of minimization but also to redistribute weight among transitions to improve search, especially pruned search. By using above mentioned method M.Mohri et al. achieved 88% word accuracy on DARPA Eval ’95 test set.

## Reference

[1] M. Mohri, F. Pereira and M. Riley, "Speech Recognition with Weighted Finite-State Transducers", *Google AI*, 2018. [Online]. Available: https://ai.google/research/pubs/pub32892. [Accessed: 21- Jul- 2018].

[2] S. Ortmanns, H. Ney and A. Eiden, "Language-model look-ahead for large vocabulary speech recognition - IEEE Conference Publication", *Ieeexplore.ieee.org*, 1996. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/607215/. [Accessed: 23- Jul- 2018].

[3] A. Aho, R. Sethi and J. Ullman, "Compilers: principles, techniques, and tools", *Dl.acm.org*, 1986. [Online]. Available: https://dl.acm.org/citation.cfm?id=6448. [Accessed: 23- Jul- 2018].