In this episode of Device Squad, the podcast for the Mobile Enterprise from Propelics, Steve gets futuristic with MERL Senior Principal Research Scientist, John Hershey. The conversation centers around the current state of neural networks and artificial intelligence as John brings us the news from the recent NIPS (Neural Information Processing Systems) conference in Barcelona.
We discuss voice recognition and replication strategies and what role they’ll play in our everyday lives—along with John’s current project, deep learning for signal separation, speech recognition, language processing, and multi-modal semantic representation learning.
In other words, John has tackled the problem of isolating a single voice in a crowd using a technique known as Deep Clustering.
Specifically, we discuss:
The Universe Project – a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications.
WaveNet – a deep generative model of raw audio waveforms – able to generate speech that mimics the human voice and sounds more natural than the best text-to-speech systems.
Google’s DeepDream – a computer vision program created by Google that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, creating a dreamlike, hallucinogenic appearance in the deliberately over-processed images.
MERL Deep Clustering – Training deep discriminative embeddings to solve the cocktail party problem. The human auditory system gives us the extraordinary ability to converse in the midst of a noisy throng of partygoers. Solving this so-called cocktail party problem has proven extremely challenging for computers, and separating and recognizing speech in such conditions has been the holy grail of speech processing for more than 50 years. Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering, producing unprecedented speaker-independent single-channel separation performance on two-speaker and three-speaker mixtures.
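To make the idea concrete, here is a minimal sketch of the clustering stage only, under stated assumptions: in the real system, a trained network (e.g., a recurrent network with a permutation-free objective) maps each time-frequency bin of the mixture spectrogram to an embedding vector so that bins dominated by the same speaker land close together. The network, embedding dimension, and data below are all synthetic stand-ins; only the "cluster the embeddings, then mask" step is illustrated.

```python
import numpy as np

# Hypothetical embeddings: in deep clustering, a trained network would
# produce one D-dimensional vector per time-frequency (TF) bin. Here we
# fake well-separated embeddings for a two-speaker mixture with two
# Gaussian clouds (speaker A near +1, speaker B near -1).
rng = np.random.default_rng(0)
D = 20            # embedding dimension (assumed)
n_bins = 200      # TF bins dominated by each speaker (assumed)

emb_a = rng.normal(+1.0, 0.3, size=(n_bins, D))
emb_b = rng.normal(-1.0, 0.3, size=(n_bins, D))
embeddings = np.vstack([emb_a, emb_b])

def kmeans(X, k, iters=20):
    """Minimal k-means: returns a cluster label for each row of X."""
    # Deterministic init from two distinct rows; a real implementation
    # would use k-means++ or multiple restarts.
    centers = X[[0, len(X) - 1]]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(embeddings, k=2)

# Each cluster becomes a binary TF mask that selects one speaker's bins;
# applying the mask to the mixture spectrogram separates that speaker.
print(np.bincount(labels))  # → [200 200]
```

The design choice worth noting is that the network is trained discriminatively on the embeddings, not on a fixed set of speakers, which is what makes the separation speaker-independent: any number of unseen voices can be separated as long as their bins cluster apart.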
John also predicts when our robot overlords will finally take over, and whether or not the revolution will take the form of an army of seemingly benevolent toys. Also, how long it will be before Alexa (and other) voice-controlled devices begin targeting content based on our emotional states.
Lastly, our two heroes engage in an exciting game of BLIP. Tune in and find out what this 1970s TOMY game has to do with artificial intelligence and analog processing!
It’s a long episode but a great one, so be sure to tune in!
Oh, and by the way, Mind Flex is a scam.
Content Strategy Lead at Anexinet