Over the last few months I've leaned heavily into the ElevenLabs AI text-to-speech voices in Articulate Storyline and I have some tips to share.
First, make sure you are using the AI Text-to-Speech option from the drop-down. The regular Text-to-Speech functionality only provides access to the old Amazon Polly voices, and they do not compare.
Second, the new voices have a drawback: you can't use SSML to modify pronunciation yet. I phonetically spell things out to achieve better output. Here are a few examples:
- Abbreviations and acronyms -> Spell these out how you want them pronounced. For example, POS means point-of-sale, so I write "pose".
- # * & -> Write out special characters as words, like "hashtag" for #, because the AI butchers them.
- Capital letters -> AI was pronouncing UP_FRONT in my script like "you pee front," ha! So I had to script it as Up Front.
- One-offs like “Read only” -> I write this as “reed only”, otherwise AI pronounces it as “red only” every time.
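If you end up with more than a handful of these fixes, it can help to keep them in a small glossary and apply them to your script before pasting it into the text-to-speech field. Here's a minimal sketch of that idea; the function name and glossary are my own, not part of Storyline or ElevenLabs:

```python
# Phonetic respellings: how a term appears in the script -> how it
# should be written so the AI voice pronounces it correctly.
# (Entries taken from the examples above.)
RESPELLINGS = {
    "POS": "pose",          # point-of-sale
    "#": "hashtag",
    "UP_FRONT": "Up Front",
    "Read only": "reed only",
}

def respell(script: str) -> str:
    """Swap each term for its phonetic spelling before TTS generation."""
    # Replace longer terms first so they win over any shorter overlap.
    # Note: plain substring replacement is naive -- "POS" would also
    # match inside "POSSIBLE" -- so keep an eye on the output.
    for term in sorted(RESPELLINGS, key=len, reverse=True):
        script = script.replace(term, RESPELLINGS[term])
    return script

print(respell("Enter the POS code # UP_FRONT."))
# -> Enter the pose code hashtag Up Front.
```

This keeps the "real" script readable for reviewers while the respelled copy goes to the voice generator.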
My basic workflow: generate the audio in Storyline, export the .wav file to Camtasia, edit/splice the audio into my video, then export an .mp4 back to Storyline. I'm now skipping the step where I recorded and edited audio in Audacity.
These new and improved text-to-speech voices save time not only in production, but also in revisions and future updates.
Have you found any funny quirks with pronunciation as well? I hope my tips help!
View original post on LinkedIn.


