M+E Daily

Dubformer: Reviewing Innovations in Media Localisation

Across sectors, artificial intelligence (AI) is transforming our daily lives. The new applications have stirred up a storm, with reactions ranging from giddy excitement to anxiety and fear about the future. For the media and entertainment industries, we can already say that AI adoption has been driving optimisation and accelerating innovation in global localisation.

The path this technology is taking clearly mirrors the advance and implementation of machine translation (MT) and natural language processing (NLP).

Today, thanks to AI advances, the media workflows around dubbing, image, and text generation are shifting again.

At Dubformer, we have more than a decade of experience in NLP development and large-scale machine learning. By reflecting on how these earlier innovations reached widespread industry adoption, we can predict how the GenAI boom will affect the media landscape, especially localisation in a globalised world.

Read on to discover our insights on the course of AI evolution, the parallels with solutions Dubformer’s experts have led, and our predictions for the future of AI in media localisation.

How MT disrupted the translation industry

The history of MT dates back to the 1950s; however, the quality wasn’t always on par with the work of translators and agencies. The first automated translation using a computer was completed in New York City in 1954, when an IBM 701 translated 60 sentences from Russian into English. For decades, output remained mostly word-for-word or phrase-based. The first neural-network-based systems launched around 2015, but the quality was still not up to snuff, which hindered mass adoption of MT in business.

With new inventions integrated into MT, a surge in quality became possible. The change came in 2017 with the Transformer architecture, which was introduced for MT tasks and soon adopted across fields.

At this point, translation ceased to be word-for-word and began to integrate context, producing markedly better results for longer texts.
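The context integration described above comes from the Transformer’s attention mechanism, in which each word’s representation is computed as a weighted mix over the whole sentence. As a minimal illustration (not Dubformer’s implementation – a generic sketch with toy 2-dimensional embeddings), scaled dot-product attention for a single query looks like this:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    The output is a context-weighted blend of the values, so a word's
    representation depends on every other word in the sentence - the
    property that moved MT beyond word-for-word translation.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Toy embeddings for three "words"; keys double as values here.
keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([1.0, 0.0], keys, values)
```

The query aligns most with the keys that point in a similar direction, so those words contribute more to the blended context vector.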

The technology didn’t change the industry overnight, but within five years it had been completely transformed. Gradually, automation seeped into the work of translators and translation agencies.

Everything progressed in line with Rogers’ diffusion of innovations theory.

Within seven years, it was hard to find businesses or translators who hadn’t turned to MT in their processes.

While the translation profession wasn’t eliminated, a new role emerged over time – that of the MT post-editor.

The requirements for translators changed: their trade is now less about raw translation and more about precision and quality (e.g., for document translation), knowledge of cultural nuance, and writing goal-oriented texts (immersing an audience, selling a product, etc.).

Notably, the demand for human translators has remained stable in the last decade, according to Statista’s 2022 U.S. data.

Despite pessimism and low expectations, the market for translation agencies has actually expanded and is expected to grow, thanks to their adaptation and transformation.

Naturally, straightforward and rote translation tasks (e.g., product descriptions for e-commerce) are now handled by algorithms and therefore don’t reach the agencies.

However, the LSPs have been able to initiate change, increase efficiency, and remain competitive to continue meeting the ever-growing demand for global content.

Data on the global LSP industry shows how supply more than doubled between 2009 and 2022 as it tried to catch up with demand: from $23.5 billion to a projected $56.4 billion.

Another area AI has overtaken is script and caption generation. Speech recognition technology has come a long way: initially used as an assistant that significantly increased efficiency and reduced the cost of creating subtitles, it now achieves higher quality than human transcription.

Since it isn’t a creative task, the automation here is almost complete. However, people haven’t disappeared from the process, and subtitlers are still very much in demand.

They guarantee the missing 1-5% in quality; it’s less costly to hire human editors than to adjust the algorithms case-by-case.
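The residual quality gap described above is conventionally quantified with word error rate (WER) – the minimum number of word substitutions, insertions, and deletions needed to turn a transcript into the reference, divided by the reference length. A minimal sketch (a generic metric, not Dubformer’s tooling):

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic programming over one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deleted "the":
# 2 edits over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

An editor reviewing automated subtitles is, in effect, driving that last few percent of WER toward zero.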

AI dubbing: Predicting the future

Everybody is familiar with GenAI by now. In development since at least 2014, it made a leap in quality in late 2022 and early 2023, when the wider public took note.

Improvements in capability are especially striking: where early voice assistants lacked emotion, GenAI generates natural-sounding synthesised speech.

The leap was made possible by a few factors: new approaches to neural network training, large-scale data collection, and GPU compute delivering far more performance at a dramatically lower price.

All these advances improved video, image, text, and sound generation quality, with examples from DALL-E to Sora constantly making the news.

Using Dubformer’s insights into technology adoption, as well as our expertise in specialised AI voice-overs, we can easily anticipate future advances for GenAI.

1. Surge in dubbing and voice-generation quality

With the introduction of GenAI, voice-overs became lively and natural rather than robotic.

Previously, creating a synthetic voice required an actor to work out of a recording studio, which took about a month and cost several thousand dollars. Even the best-case outcome was a voice with only the couple of emotions the actor had recorded – the technology couldn’t adjust tone or expressiveness on demand.

Now voices can be synthesised from scratch, and the spectrum of available emotions is ever-growing.

2. Business use cases

The earliest applications included translation with voice-over for straightforward content, where the information is crucial while creativity and emotionality are less important. These workflows run fully automated, with some human verification of the final result. For example, Dubformer has been creating voice-over projects for media companies, mainly translating factual content for platforms with lower quality requirements, such as YouTube.

Today, Dubformer also dubs more sophisticated emotional content by involving human proof-listeners and acting talent in the workflow.

This AI-human collaboration is the perfect fit for content like movies and animation that requires more nuance in tone and cultural adaptation.

3. Ethical AI: Human experts required

AI dubbing will have a great impact, but not by way of replacing humans.

There is a high demand for dubbed content, both among audiences and industry production companies that recognise the opportunity. The sheer volume of content is impossible to meet with traditional methods. This content surplus is what AI solutions will help address. And so, AI will expand the dubbing market instead of taking it over.

Human creativity will remain essential for high-quality emotional dubbing, while AI takes care of technical tasks – just as happened with MT for media.

Plus, with AI, there is a blueprint for process optimisation, even for those creative tasks.

We at Dubformer have developed a dubbing approach that blends the work of humans and AI. In this new landscape, dubbing professionals should get ready to update and market their skills.

Look out for Dubformer’s unique course in the summer of 2024, which will allow multilingual actors, sound engineers, translators, and LPMs to gain new in-demand skills for this field.

4. Advantages for early GenAI adopters

We haven’t seen radical changes in the business models of studios, translation vendors, or media companies in response to GenAI. Not yet. Those who start implementation first will benefit the most. Transformative market opportunities and shifts in technological paradigms are rare: over the last three centuries, they have occurred only every 30 to 50 years.

Hefty initial investments are required, and the first innovators are already experimenting with new technologies and testing applications.

At the moment, there are no ready-made standards or regulations, which makes this path challenging. But history shows that early technology adopters are at an advantage and progress is unstoppable.

The future of media localisation isn’t a mystery. We have experienced enough tech revolutions to be able to predict the course of innovation. The winning solution is a blend of algorithms that fulfil routine tasks, leaving creative and complex challenges to skilled humans.

The time for AI adoption in media is now.

Elena Chernysheva is the COO of Dubformer. A product development professional with more than 15 years of experience, she has led the development and launch of AI solutions for text, photo (including augmented reality files and translation), and video translation. Chernysheva has also pioneered crowdsourcing solutions, cutting content creation costs threefold and generating tens of thousands of jobs.