Microsoft's Cortana can now recognise human speech with the same accuracy as a professional human transcriber
- Microsoft made improvements to its conversational speech recognition system
- This resulted in a 5.1 per cent word error rate, in line with trained professionals
- Last year the firm achieved a 5.9 per cent error rate, equal to that of the average person
- The Washington-based company is now setting its sights on getting machines to understand the meaning behind the words they recognise
A new milestone in human speech recognition has been reached by Microsoft, matching the accuracy of trained human transcribers.
The firm's software, used in its Cortana voice assistant, has achieved a 5.1 per cent word error rate, putting it on a par with professionals.
One of the big frustrations of voice recognition has been getting machines to accept commands, a process which often involves repetition and exaggerated speech.
The development means the company's products could soon accept spoken commands with the precision of a professional transcriber.
The findings appeared in a technical report released by Microsoft on Saturday.
Last year, researchers from Microsoft Artificial Intelligence and Research reached a 5.9 per cent error rate, the same as the average person.
The new paper details how the researchers used improvements in AI to refine the company's conversational speech recognition system.
This allows the system to better recognise the waveform of speech patterns, moment to moment and word to word.
It also uses the context of a conversation to predict what is likely to come next.
The technology is used in the company's Cortana voice assistant that allows users to perform a range of tasks, from checking the weather to chatting.
It also provides a voice translation service.
Writing on the Microsoft Research blog, technical fellow Xuedong Huang said: 'Reaching human parity with an accuracy on par with humans has been a research goal for the last 25 years.
'Microsoft's willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services.
'It's deeply gratifying to our research teams to see our work used by millions of people each day.'
The error rate was measured on Switchboard, a body of recorded telephone conversations that the speech research community has used for more than 20 years to test voice recognition systems.
The task involves transcribing conversations between strangers discussing topics ranging from sports to politics.
Previous research has shown that humans achieve higher levels of agreement on the precise words spoken as they expend more care and effort, as in the case of professional transcribers.
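Figures such as 5.1 per cent are word error rates: the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The short Python sketch below shows one common way to work that out. It is an illustration only, not Microsoft's scoring code, and the function name and example sentences are made up for the purpose.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not taken from Microsoft's report.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word present in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word and one dropped word out of six.
rate = word_error_rate("check the weather in london today",
                       "check the whether in london")
print(f"{rate:.1%}")
```

On this measure, a 5.1 per cent rate corresponds to roughly five word errors for every 100 words in the reference transcript.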
Microsoft says it is now turning its attention to solving some of the remaining challenges facing speech recognition, as well as teaching machines to understand what they hear.
Mr Huang added: 'While achieving a 5.1 per cent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address.
'[This includes] achieving human levels of recognition in noisy environments with distant microphones, in recognising accented speech, or speaking styles and languages for which only limited training data is available.
'Moreover, we have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent.
'Moving from recognising to understanding speech is the next major frontier for speech technology.'