Complete Voice As A Service Solution

Top radio and podcast hosts are in high demand these days, putting time constraints on their ability to voice everything that’s asked of them. Requests could range from localized tags for a national spot to a customized version of a show targeted at a whole new audience or a podcast voiced in a different language. These and other use cases have audio stakeholders considering synthetic voices: carefully constructed facsimiles that sound like the original and can be used to create spots and programming more efficiently, while expanding the talent’s brand into new markets.

It’s no pipe dream. Two weeks ago Veritone, the Costa Mesa-based artificial intelligence tech company, introduced an end-to-end Voice-as-a-Service (VaaS) product that creates “hyper-realistic” synthetic voices, which personalities can use to record endorsement spots and product testimonials.

“We believe that someone’s voice is another core pillar of their brand attributes, just like their name and likeness,” Veritone President Ryan Steelberg tells Inside Radio. “Individuals need to start thinking of their synthetic voice as an extension of who they are.”

The company has ambitious plans to create a centralized bureau that would serve as a trusted gatekeeper for the synthetic voices of radio and podcast talent, celebrities, influencers and others. When a podcast host wants to expand their show to French-speaking territories or do dynamic ad copy creation, they would only have to deal with one service provider. The company is amassing a vault of high-quality, isolated audio recordings of talent voices, known as “training data,” to construct realistic-sounding synthetic voices.

Once enough audio recordings have been assembled, the company can produce audio files using the host’s synthetic voice in one of several ways. One is text-to-speech, where text is imported into the synthetic voice model and the machine creates audio using a synthetic version of the host’s voice. What started as text ends up as an audio file.

Speech-to-speech uses the same training data, but the input is a voice actor imitating the “tone and inflection that makes people’s signature voices unique,” Steelberg explains. This combination produces a more realistic-sounding product.

The potential applications are still being explored. But at the simplest level, the tech can be used to produce hundreds of derivatives of content. “The tonnage of creating ad copy is a big one,” Steelberg says. “You produce one piece of national copy and you can regionalize it for 50 different markets or different dialects. The key is understanding the use case the talent is trying to achieve and then use different modalities for that execution.”

Virtual Chat With A Host

Applied to voice assistants and smart speakers, the technology could allow listeners to have a virtual chat with their favorite host. Former ESPN Audio exec Traug Keller first described such a scenario a few years back at a Radio Show, where he played a prototype audio clip of sports talk host Scott Van Pelt interacting with a New York Yankees fan on a smart speaker. Only it wasn’t really Van Pelt, but a machine synthesizing his voice using hundreds of words the ESPN host had recorded. The sports network’s answer to Siri or Alexa would allow listeners to have virtual conversations with hosts about their favorite teams.

Steelberg says the company could make this type of scenario a reality today. “In a conversational AI approach, a listener can interact through their computer or over the phone, having a conversation, asking questions and the feedback would come back in Dan Patrick’s voice or whoever’s trained voice,” he explains.

In the spirit of full disclosure, Veritone is working with the Interactive Advertising Bureau on a clear but unobtrusive way to inform listeners that a synthetic voice is being used. In media with a visual component, the disclosure would be displayed at the bottom of the screen, similar to disclosures about paid sponsorships. For a purely audio medium, it would be a 2-5 second audible tone.

The brave new world of synthetic voices is already a reality for San Jose Sharks hockey announcer Randy Hahn, along with digital influencers, talent agencies, broadcasters, and corporations using the service. “We look at it as giving more creative freedom, more flexibility to the talent because it’s still their voice and they’re still behind it,” Steelberg says. “There are going to be so many killer use cases that are going to pop up.”