Issue 12 2022

Most Innovative AI Data Resource Provider 2022

AI can only be as effective as the data used to train it, and AI models need a lot of training – so they need a lot of data. Founded in Beijing in 2005, SpeechOcean is today one of the world’s leading providers of visual, audio, and text training data for artificial intelligence and machine learning. With the company recognised as Most Innovative AI Data Resource Provider 2022 in this issue of Corporate Vision magazine, we learn more about it.

The name may be misleading, since the company deals with much more than speech data alone, but SpeechOcean works with the world’s leading technology and commercial enterprises, as well as academic institutions, to power their AI applications. From scheme design to data collection, labelling, and evaluation, chances are you’ve interacted with an AI model somewhere in the world that’s been trained using SpeechOcean’s data.

Founder and Chairwoman He Lin is one of the world’s leading researchers in machine learning and related fields. After graduating in computer science and technology from Peking University, she worked in speech recognition, synthesis, language understanding and testing at the Chinese Academy of Sciences’ Institute of Acoustics before founding SpeechOcean with the vision of becoming the world’s leading AI data provider, an achievement today recognised many times over, including with this award.

There is still a great deal of misunderstanding surrounding AI, though. Concerns range from privacy and security to whether the dystopian futures depicted in so many science fiction movies, at the hands of a disgruntled AI, might somehow be just around the corner. True sentience in AI is still a long way off, however. It’s about as close as we are as a species to realising time travel, so we can all sleep soundly knowing we’re not about to be invaded by liquid metal robots from our own future any time soon – unless of course there’s a time paradox in that statement.

The reality is far more mundane than most realise. The fact is, the data used to train AI and machine learning systems is usually very broad, generic, and contains no personally identifying information. AI models need to understand a wide range of inputs – from different languages, accents, and dialects, to environments and other visual data, including distinguishing unwanted or unintended inputs.

It’s a complex academic and technical challenge, but one SpeechOcean has been at the forefront of for nearly two decades. The data it produces covers a wide range of situations, scenarios, participants and recording devices, so that when presented with fresh enquiries these models are able to respond ‘intelligently’. And because people and social behaviours are constantly changing, data needs are also constantly changing, to keep up with these and emerging technological trends.

Smartphones were almost non-existent when SpeechOcean started. Now most of us can’t imagine life without them, and many of the AI applications the company helps train use these devices as their primary mode of user input, so they have become integral to its process and systems.

All data SpeechOcean collects also goes through careful labelling (annotation) so that target algorithms are able to recognise and understand the patterns within, often including subtle contextual and emotional cues. The more data a model can be trained with, the ‘smarter’ it’s able to perform. Much of this annotation is now automated (though some element of human supervision is still needed), helping the company dramatically improve project delivery times, accuracy, and cost.

Looking to the future, autonomous driving is a key area of development for the company as more automakers worldwide embrace this technology.
Again, chances are that SpeechOcean’s data has been used in training the autonomous vehicles you may have ridden in – or could one day soon.

Helping small and medium enterprises leverage the potential of AI is another key area for the company, which wants to make sure AI is not only the domain of big tech. It has more than 1,000 complete, ready-to-use datasets for this purpose, and even the algorithms needed to help SMEs get started.

Like every successful enterprise, though, SpeechOcean’s business is based on trust, and it takes great pride in ensuring the quality and accuracy of the data it provides. Evidence of this comes in the number of clients who keep returning to the company as their machine learning needs grow and they look to keep their AI models up to date.

Also like every other successful business, people are the lifeblood of SpeechOcean. The company is blessed with a smart, dedicated team who live to innovate – solving problems that help drive its clients’ businesses forward, and this is a key part of what differentiates the company.

While the pandemic certainly presented challenges, it also helped strengthen awareness and uptake of AI and machine learning technologies as face-to-face interactions diminished everywhere. 2023 promises to be a big year for the company, both because of this and as it welcomes new senior leadership and expands its presence to serve more markets globally. Having recently listed on the Shanghai Stock Exchange, the company is investing in acquisitions, talent, systems, and new technologies, which leave it poised to break new ground in how it is able to benefit its clients, and to continue to be their global data partner.

Company: SpeechOcean
Contact: Charles Edwards
Email: [email protected]
Website: