I hope you're having a great day. Just the other day my boss handed us a new piece of software to learn. It uses AI to convert text into speech, and the results are deceptively good. The only problem is that some parts have a "wobble" in them, probably because the AI has to change the tone/intonation of the words.
I've tried using Audition's "Auto Heal", but instead of just removing the wobble it silences that part of the audio. It doesn't fix it at all. I'm also not sure what the correct term for this problem is.
I was wondering if anyone has a solution for this problem. This is what I see in the waveform. The word said is "world", but the "r" in the word vibrates.
The word "world" sounds fine. A much bigger problem is that the whole delivery is not convincing at all: it lacks emotion and is spoken too quickly. It's an impressive algorithm for sure, but it needs work. Perhaps slowing down the overall speed would make the "r" sound more agreeable for you?
Last edited by Tomás Mulcahy on Fri Sep 18, 2020 10:00 am, edited 1 time in total.
The 'wobbles' or vibrations, as you call them, sound like editing artefacts to me: points where the different sound elements are being stitched together.
Technical Editor, Sound On Sound...
(But generally posting my own personal views and not necessarily those of SOS, the company or the magazine!)
Re: Wobble artifact produced by a text-to-speech software.
The whole thing sounds wobbly to me: "world", "Frodo", "quiet Hobbit". I don't think that's something you can fix in the mix; either something went wrong during the creation/edit or the algorithm needs work. Maybe you could try getting the system to repeat this section a couple of times and see if the glitches are identical. (In ye olden days you sometimes had to feed in different words to get the best result, e.g. using "whirled" instead of "world".)
Last edited by BJG145 on Fri Sep 18, 2020 10:55 am, edited 5 times in total.
Thank you for taking time out of your day to check my post. My initial thought was that these wobbles can't be fixed once the audio has been exported from the text-to-speech software (because there are a lot of them).
Surprisingly, a coworker found out that typing the problematic words in ALL CAPS fixes them for some reason.
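In case it saves anyone some retyping, here is a rough Python sketch of that workaround: it upper-cases a list of known problem words in the script before the text is pasted into the text-to-speech software. The function name and the word list are made up for illustration, and I'm only assuming the ALL CAPS trick behaves the same way for other words as it did for ours.

```python
import re


def capitalize_problem_words(script, problem_words):
    """Return the script with every whole-word match of a problem word in ALL CAPS.

    Hypothetical helper: it only prepares the text; it does not talk to any
    text-to-speech software.
    """
    if not problem_words:
        return script
    # Match whole words only, ignoring case, so "world" does not touch "worldly".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in problem_words) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda match: match.group(0).upper(), script)


if __name__ == "__main__":
    text = "Frodo was a quiet Hobbit in a wide world."
    print(capitalize_problem_words(text, ["world", "Frodo"]))
```

You would still need to listen to each render afterwards, since this only changes the text that gets fed in.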