Microsoft reaches human parity in Conversational Speech Recognition

Last year, Microsoft reported that its transcription system had reached a 5.9% word error rate, on par with the score of human transcribers. A second group of researchers, working with a multi-transcriber process, measured a 5.1% error rate for humans.

According to a blog post by Microsoft Technical Fellow Xuedong Huang, Microsoft has created a technology that recognizes words in a conversation as well as professional human transcribers do, with a 5.1 percent error rate that equals the human rate.
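For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that standard definition, not Microsoft's scoring pipeline; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and real benchmark scoring typically also normalizes the text before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name): splits on whitespace and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a 5.1 percent error rate corresponds to roughly one word-level error for every twenty words of reference transcript.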

Microsoft used Switchboard as its benchmark. Switchboard is a collection of recorded telephone conversations that the speech research community has used for more than two decades. Tests using the corpus involve transcribing conversations between strangers discussing topics such as sports and politics.

The post also stated that Microsoft strengthened the recognizer’s language model by using the entire history of a dialog session to predict what is likely to come next, effectively allowing the model to adapt to the topic and local context of a conversation. The next step in this direction would be understanding speech, not just recognizing it.