Siri and Google Now's conversational styles are nowhere near Samantha's, but their development is part of a movement that is threatening to eclipse the written word.
Our handwriting has never been worse, typing on a keyboard is beginning to feel archaic and even constantly tapping out text messages and web search terms is likely to bring on finger cramps and sore hands.
With iOS devices now able to send voice messages, and with predictions of self-driving cars and voice-activated doors, lights and elevators (cue the Internet of Things), it's clear that the future will be spoken, not written.
The technology behind this shift in how we interact with our surroundings is natural language processing, a technology that enables computers to understand the meaning of our words and recognize the habits of our speech.
Where will we see natural language processing first?
As well as Siri and Google Now, you may already have used it on the Xbox One and the Samsung UE65HU8500, but so far voice recognition has revolved around a very small list of phrases and words. A proper conversation this is not.
"Magic words have caused these technologies to rely on structured menu systems in which voice command simply replaces traditional inputs," says Charles Dawes, global strategic accounts director at Rovi. "These do not provide a satisfactory experience, forcing users to learn how to talk to the device and causing speech to become stilted and unnatural."
Automatic speech recognition systems on TVs have so far relied on built-in microphones that could be some way from the viewer, though most are moving to apps.
"The prevalence of smartphones and tablets offers operators the opportunity to sidestep this issue by enabling search and recommendations for the TV via the second screen," says Dawes. "The development of these devices has boomed, and the processing power offered by most on the market provides an ample base upon which to build conversation capabilities."
But natural language engines are already appearing in many other places. Barclays Wealth uses one to verify account holders, airline JetBlue is using intelligent voice advertising, and Ford is using natural language so drivers can control in-car systems such as the phone, music, temperature, navigation and traffic updates.
How does natural language processing work?
Once the system has recognized what someone has said, it's then all about context and disambiguating similar terms.
"A viewer could say 'what time is the City game on tonight?', and voice technology would have to make a decision about the context - football - and the preference of the user based on their history. Do they support Norwich City or Manchester City?" says Dawes.
"The technology must also be able to deal with sudden changes. For example, it must recognize that if the same viewer then asks 'are there any thrillers on tonight?' they are searching outside the context of sports," he adds.
How important is natural language processing?
"The most natural form of communication is talking," says Jonathan Whitmore, UK, Ireland and Middle East regional sales manager at Nuance Communications, which makes the Dragon speech-to-text software.
"It offers a common means of interfacing with multiple devices, from phones to televisions. A voice is unique to an individual so it is a secure way of identifying a person and it is easier to talk to a phone rather than trying to type messages."
However, creating a realistic, responsive website or app that can understand and give intelligent responses is more complex. "That requires research into semantics, linguistics, the context of conversations, the way people search for information and the relationship between different data, but this is where we will see the most advances," says Whitmore.
It's all about context.
"To avoid a stilted user experience, the technology must be able to understand the way that people navigate video content, jumping between genre and content type, and at the same time provide personalized recommendations," says Dawes.
"This next-generation 'conversation' technology offers consumers a way out of having to use the clunky remote control interface and, rather than having to learn how to talk to the device, allow them to speak and interact with their TVs as they would with another human being." Can natural language processing solve crime?
Seems so. Linguistics academics in the US founded Fonetic, which uses "sentiment analysis" to analyze strings of speech between bankers, alerting compliance teams to possibly fraudulent conversations on trading floors.
This is all about context; Fonetic has spent five years building up a finance industry-specific lexicon in 79 languages. Santander loves it.
"Fraudulent behaviour is likely to be coded language or could be an act that is related to market abuse, such as insider dealing," says Simon Richards, CEO of Fonetic US. "The technology puts intelligent structure to unstructured data such as a voice call... applying indexing and categorization to things that are being talked about."
The software can detect when something makes sense and when it doesn't, and can pick out patterns and trends.
Fonetic analyzes all voice calls in real time, because transcription doesn't cut it, according to Richards. "When the audio is transcribed before analysis, whether the transcription is phonetic or speech-to-text, there is typically a loss of up to 60% of the conversations."
Put simply, the written word is inaccurate, and slows things down.
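As a crude way to picture the "indexing and categorization" Richards describes, the sketch below matches call segments against a small compliance lexicon and raises an alert when enough segments are hit. The phrases, categories and threshold are invented for illustration and bear no relation to Fonetic's actual lexicon or models.

```python
# Rough illustration only: flagging call segments against a compliance
# lexicon. The phrases, categories and threshold are invented, not
# Fonetic's real lexicon.

COMPLIANCE_LEXICON = {
    "insider dealing": ["before the announcement", "keep this between us"],
    "market abuse": ["move the price", "paint the tape"],
}

def categorize_segment(segment):
    """Return the lexicon categories whose phrases appear in a segment."""
    text = segment.lower()
    return [
        category
        for category, phrases in COMPLIANCE_LEXICON.items()
        if any(phrase in text for phrase in phrases)
    ]

def flag_call(segments, threshold=1):
    """Flag a call for the compliance team if enough segments match."""
    hits = [(seg, cats) for seg in segments if (cats := categorize_segment(seg))]
    return hits if len(hits) >= threshold else []

call = [
    "Let's get the order in before the announcement goes out.",
    "Usual size, nothing unusual.",
]
for segment, categories in flag_call(call):
    print(f"ALERT {categories}: {segment}")
```

In Fonetic's case the matching reportedly runs over the indexed audio itself rather than over transcripts, which is what Richards credits for avoiding the loss of up to 60% of conversations he mentions.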
Is natural language processing resurrecting dictation?
A voice-activated future may await us, but for now most of us are still kicking around on keyboards at work.
Nuance is trying to change that with its Dragon digital dictation and desktop speech recognition software, such as the fabulous Dragon Dictation 4 and the free Dragon Recorder app. Recordings made with the latter can now be transcribed by the former, which is handy for mobile workers and suddenly makes voice memos worthwhile.
"A dictation device allows users to capture their thoughts naturally. Given that society today has a culture of sharing - whether via email or social media platforms - dictation and speech recognition play a role in the effective capture and near-instantaneous sharing of an event, activity or even a document," says Whitmore.
He adds, "In any industry that charges by the hour or by tasks completed, the added productivity provided by dictation and speech can make an appreciable difference and contribution to a company's balance sheet."
Will everything soon use natural language processing?
Probably not. Natural language technologies are already used for Siri and Google Now, and will be in Microsoft's Cortana, but they remain novelty add-ons. Smartphones aren't yet built around them.
"Speech must be the central construct of usability design and successfully employ a range of intelligence under the hood to understand context and intent," says Dawes, who thinks that the ever more intuitive smartphones and tablets is producing a generation that wont tolerate dumb home electronics for much longer.