Simon October 11, 2014

Pipeline architecture for TTS

Most text-to-speech systems split the problem into two main stages. The first stage is called the front end and contains many separate processes which gradually build up a linguistic specification from the input text. The second stage typically uses language-independent techniques (although they still require a language-specific speech corpus) to generate a waveform. Here we see those two main stages, and also take a look inside the front end to see what some typical sub-components do.
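The two-stage structure described above can be sketched in a few lines of Python. This is only an illustrative toy, not the system shown in the video: the sub-component names (`tokenise`, `normalise`, `pronounce`), the `LinguisticSpec` class, the tiny lexicon, and the silent placeholder waveform generator are all assumptions made for the example. A real front end has more steps than this, and a real waveform generator would draw on a speech corpus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy lexicon and abbreviation table, for illustration only.
LEXICON: Dict[str, List[str]] = {
    "doctor": ["d", "o", "k", "t", "er"],
    "smith": ["s", "m", "i", "th"],
    "has": ["h", "a", "z"],
    "two": ["t", "uu"],
    "cats": ["k", "a", "t", "s"],
}
EXPANSIONS: Dict[str, str] = {"dr.": "doctor", "2": "two"}


@dataclass
class LinguisticSpec:
    """The specification that the front end gradually fills in."""
    text: str
    tokens: List[str] = field(default_factory=list)
    words: List[str] = field(default_factory=list)
    phones: List[str] = field(default_factory=list)


def tokenise(spec: LinguisticSpec) -> LinguisticSpec:
    # Split the raw text on whitespace.
    spec.tokens = spec.text.split()
    return spec


def normalise(spec: LinguisticSpec) -> LinguisticSpec:
    # Expand abbreviations and digits; otherwise lowercase and strip punctuation.
    words = []
    for token in spec.tokens:
        lowered = token.lower()
        words.append(EXPANSIONS.get(lowered, lowered.strip(".,!?")))
    spec.words = words
    return spec


def pronounce(spec: LinguisticSpec) -> LinguisticSpec:
    # Look each word up in the lexicon; fall back to spelling it out,
    # a crude stand-in for a letter-to-sound model.
    phones: List[str] = []
    for word in spec.words:
        phones.extend(LEXICON.get(word, list(word)))
    spec.phones = phones
    return spec


# Stage 1: the front end is a sequence of separate processes.
FRONT_END: List[Callable[[LinguisticSpec], LinguisticSpec]] = [
    tokenise,
    normalise,
    pronounce,
]


def run_front_end(text: str) -> LinguisticSpec:
    spec = LinguisticSpec(text=text)
    for process in FRONT_END:
        spec = process(spec)
    return spec


def generate_waveform(spec: LinguisticSpec, samples_per_phone: int = 160) -> List[float]:
    # Stage 2 placeholder: emits silence of a fixed length per phone.
    # Nothing here depends on the language, only on the specification.
    return [0.0] * (len(spec.phones) * samples_per_phone)


spec = run_front_end("Dr. Smith has 2 cats.")
waveform = generate_waveform(spec)
```

The point of the design is that each front-end process reads the specification built so far and adds one more layer to it, while the second stage sees only the finished specification and never the original text.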

Filed Under: Synthesis Tagged With: front end, video, waveform generation
