TTS

From Nikipedia

TTS (text-to-speech) is the technology of converting written text into spoken audio. My first encounter with rudimentary TTS was on a Commodore 64 at my friend Dylan's house in 1987. At that time, it could only synthesize very basic phonetic spellings, and the quality was quite poor. By the mid-1990s, however, Macs running System 7 (or maybe 8) had SimpleText, which could read English text fairly accurately. By then I had begun using TTS to proofread letters that I would print and mail or fax. When email gained popularity in 1997, I was already using my computer to read incoming messages to me and to read back my drafts before I sent them.

By 1999, there were numerous options for TTS on Microsoft Windows computers, including third-party apps and a free Speech SDK from Microsoft that came with a basic TTS app. As I got into C#/.NET development in the mid-2000s, I developed my own clipboard-saving TTS app. In the early 2010s, I switched back to Mac computers and was delighted to find that both Safari and Google Chrome offered easy select-to-speak features.

Unlike the Kindle apps for iOS and Android, the Mac version lets me listen to my books

I enjoy using the Kindle app on my Apple computer because of its easy-to-use text-to-speech feature. My relationship with this technology was one of the inspirations for our app, AutoWIKI, which reads or "plays" geotagged content to you in a Siri or Google Assistant voice.

Server-side and AI Speech

By the late 2010s, it was already apparent that speech synthesis done in the Google, Amazon, or Microsoft clouds was far superior to the local TTS technology on phones and PCs. In the 2020s, with technologies like ChatGPT, ElevenLabs, Speechify, and more, TTS sounds more natural than ever. One trend this has fueled is the proliferation of popular Reddit discussions converted to speech, often accompanied by video footage from Subway Surfers, Minecraft, or other video games.

An example of this approach can be seen in the voices available on TikTok. You can see one I created here: A Few Milliseconds Of 2073.

See Also