We believe people's need for instant access to information and resolution is growing rapidly, and a language interface is the best vehicle to meet that need. Very soon, sophisticated voice interfaces that help people accomplish complex tasks will be seen as expedient, easy to use, and enjoyable.
Language modeling is on the cusp of a breakthrough similar to computer vision's rapid evolution over the past five years. Speech recognition reached "human parity" in 2016; machine-synthesized, ultra-realistic human speech arrived in 2018; and beginning in late 2018, Transformer models such as BERT, GPT, and XLNet started to exceed human-level performance on a broad range of NLP benchmarks.
The constituent parts of a true "talking machine" are available and converging fast. However, today's systems do not really converse. Users can issue voice queries to look up addresses, play music, set alerts, and the like via Google Assistant, Alexa, and Siri, but when confronted with more complex dialogue, all of these systems fail. The fundamental reason for this failure is that they are built around speech recognition and are not designed to understand conversations.
We have developed technology that understands conversations and can carry out complex tasks through a simple voice interface. We want to use this technology to give users better communication experiences. Eventually, we want to make it openly available so that others can build useful applications for the benefit of society as a whole.