Shortly after reaching home from a long day’s work, Ma Bin, 30, a software engineer in Beijing, walks into his flat’s living room and, in the style of Ali Baba’s “Open Sesame” in One Thousand and One Nights, says to a remote control on the sofa, “Let’s Talk”.
Presto, his internet-connected TV “hears” the command, obediently switches itself on and, as if by magic, begins playing Ma’s favorite show.
After relaxing for a while watching the show, Ma heads out for a bite, settles into the driver’s seat of his car and slips into Ali Baba mode again: “Lower the temperature to 19 degrees.” The car’s air conditioner obeys the Master instantly.
While still driving, hands very much on the wheel, he issues another spoken command: “Recommend the nearest coffee shop.” It’s now the turn of the car’s navigation system to obey the Master.
Welcome to the your-word-is-my-command age, in which devices use audio as the input, or the medium, for delivering services powered by artificial intelligence (AI).
Ask and you shall receive, as it were. The consumer, it appears, never had a more powerful voice.
Ma accomplishes many everyday tasks by uttering instructions to voice-based digital assistants (he has quite a few of them) that control his devices and appliances.
For instance, his iPhone-based assistant Siri helps him schedule calendar appointments and stream music.
Rapid advances in speech recognition and language understanding technologies are making the human voice the next major medium for communicating with computers, which are at the heart of almost all devices and appliances these days.
Devices are getting better at processing voice commands from across different rooms and against background noise.
No need to type on keyboards; no need to tap, swipe or draw on touch-screens; no need to press buttons, pull levers and the like. Do an Ali Baba: just say, “Open Sesame”.
Early converts like Ma are embracing the era of voice computing with gusto. “If I can control the surroundings simply by uttering a few words, why should I bother to touch screens or buttons?”
In China, conversation-savvy electronics are on the rise as local tech heavyweights vie for an early lead in the next frontier of growth and innovation. The scene is not much different from that in the US, where Apple Inc, Microsoft Corp, Google, Facebook Inc and Amazon.com Inc are all battling for slices of the AI pie.
The trend will be further stoked by China’s plan to build a 1 trillion yuan ($147.9 billion) artificial intelligence industry by 2030. The plan was unveiled by the State Council on Thursday. Voice computing is an important part of that ambitious goal, which the private sector is determined to reach.
For instance, on July 5, e-commerce behemoth Alibaba Group Holding Ltd unveiled the Tmall Genie X1, a voice-driven digital speaker modeled on Amazon.com Inc’s Echo and Google’s Home.
The same day, Baidu Inc, the Chinese internet search leader, showcased its Mandarin-speaking DuerOS personal assistant.
Such voice-based speakers can stream music, newscasts and so on, and can be improved to perform other tasks.
Toward that end, Baidu announced a new deal to acquire a startup specializing in the development of voice recognition technology.
Not to be left behind, Tencent Holdings Ltd, China’s social networking and gaming titan, is developing its own voice-based speaker for launch within months.
Huawei Technologies Co Ltd, the world’s third-largest smartphone manufacturer, has also jumped onto the voice technology bandwagon, hiring more than 100 researchers to work on developing a Siri-like assistant.
According to a Bloomberg report, more than 60 companies in China are working with US-based Conexant Systems Inc, an audio technology player, to introduce voice-activated intelligent devices.
“Voice interaction, though still nascent, will be of utmost importance in future. In the internet-of-things era, most internet-connected devices won’t have screens. Voice control will be the most convenient way to interact with them,” said Liu Xingliang, president of the Data Center of China Internet, a Beijing-based market research company.
Recent facts and figures appear to back Liu’s vision. In China, the speech recognition market expanded by about 40 percent to 4.03 billion yuan ($635 million) in 2015, outpacing the $6.12 billion global market, which grew 34 percent, according to a report by the Speech Industry Alliance of China.
The China market is expected to grow almost 70 percent year-on-year to 10.07 billion yuan in sales this year. Some 2 million smart speakers will likely be shipped in China this year, a fraction of the 14 million in the US, and 22 million will be sold in China in 2022, according to Counterpoint Research estimates.
With potential applications of the technology growing by the day on the back of constant improvements, Grand View Research projects the global market will reach $128 billion in 2024.
That kind of optimism stems from the high level of accuracy of the technology. For instance, in 2015, Andrew Ng, former chief scientist at Baidu, said the technology was about 95 percent accurate. Stated differently, devices were able to hear and act on about 19 out of 20 words correctly.
That is, devices mishearing words and acting contrary to commands were not seen as posing many serious risks to consumers.
And now, the accuracy rate is said to be higher, at 97 to 98 percent. Baidu and iFlytek Co Ltd are leading the voice technology pack.
To be sure, technological hurdles exist. James Yan, research director at Counterpoint, said, “More efforts are needed so that third-party services can be swiftly activated through voice control.”
Improvements are coming at a faster rate than expected as big data is crunched, analyzed and made to yield insights, which, in turn, are opening up voice recognition platforms to third-party services, according to Analysys, a Beijing-based market research company.
With market potential increasing, Chinese companies are scrambling to unveil always-on listening devices that are eager to communicate or interact with their “masters”.
For instance, e-commerce giant Alibaba is emulating Amazon in envisioning a central role for voice-driven smart speakers that consumers can use to control almost everything at home.
Its Tmall Genie X1 speaker can simplify online shopping by executing purchases based on voice commands.
Similarly, JD.com Inc, another leading online marketplace, has unveiled several versions of smart speakers that use iFlytek’s voice recognition technology.
JD said it sold around 10,700 speakers during last year’s Nov 11 online shopping festival and the following two weeks.
“Many domestic players are inspired by (Amazon) Echo’s phenomenal success in the United States,” said Zhang Yin, an analyst at Orient Securities.
In the fourth quarter of 2016, the Echo accounted for about 88 percent of the 4.2 million intelligent home speakers shipped in the US. In that quarter, US shipments were up nearly 600 percent year-on-year, according to Strategy Analytics.