Face of a Robot, Voice of an Angel?

13.09.2016

Robotic speech can still get much, much better. Enter Google’s machine learning division, DeepMind, which has just announced a new text-to-speech technology called WaveNet. Unlike many previous approaches, the system uses convolutional neural networks to create an entire raw audio waveform from the ground up. Trained using real audio recordings, when provided with text it works through a sentence stepwise, generating short utterances which are combined to create the final waveform. DeepMind claims that the results sound more natural than the best existing systems, but you can judge for yourself by listening to some examples. There’s one hitch: the technique, which needs to produce perhaps 16,000 data points for every second of audio it creates, is incredibly computationally expensive. According to the Financial Times, that means it won’t, for now, be used within any of Google’s products.

Experten- und Marktplattformen

Cloud Computing

Technologie-Basis zur Digitalisierung

mehr

Sicherheit und Datenschutz

Vertrauen zur Digitalisierung

mehr

Anwendungen

wichtige Schritte zur Digitalisierung

mehr

Digitale Transformation

Partner zur Digitalisierung

mehr

Energie

Grundlage zur Digitalisierung

mehr

Experten- und Marktplattformen

  • company

    Cloud Computing –

    Technologie-Basis zur Digitalisierung

    mehr
  • company

    Sicherheit und Datenschutz –

    Vertrauen zur Digitalisierung

    mehr
  • company

    Anwendungen –

    wichtige Schritte zur Digitalisierung

    mehr
  • company

    Digitale Transformation

    Partner zur Digitalisierung

    mehr
  • company

    Energie

    Grundlage zur Digitalisierung

    mehr