By Will Douglas Heaven — MIT Technology Review

This time last year we did something reckless. In an industry where nothing stands still, we had a go at predicting the future. How did we do? Our four big bets for 2023 were that the next big thing in chatbots would be multimodal (check: the most powerful large language models out there, OpenAI’s GPT-4 and Google DeepMind’s Gemini, work with text, images, and audio); that policymakers would draw up tough new regulations (check: Biden’s executive order came out in October and the European Union’s AI Act was finally agreed in December); that Big Tech would feel pressure from open-source startups (half right: the open-source boom continues, but AI companies like OpenAI and Google DeepMind still stole the limelight); and that AI would change big pharma for good (too soon to tell: the AI revolution in drug discovery is still in full swing, but the first drugs developed using AI are still some years from market).

Now we’re doing it again.

We decided to ignore the obvious. We know that large language models will continue to dominate. Regulators will grow bolder. AI’s problems—from bias to copyright to doomerism—will shape the agenda for researchers, regulators, and the public, not just in 2024 but for years to come. (Read more about our six big questions for generative AI here.) Instead, we’ve picked a few more specific trends. Here’s what to watch out for in 2024. (Come back next year and check how we did.)

Customized chatbots

You get a chatbot! And you get a chatbot! In 2024, tech companies that invested heavily in generative AI will be under pressure to prove that they can make money off their products. To do this, AI giants Google and OpenAI are betting big on going small: both are developing user-friendly platforms that allow people to customize powerful language models and make their own mini chatbots that cater to their specific needs—no coding skills required. Both have launched web-based tools that allow anyone to become a generative-AI app developer.

In 2024, generative AI might actually become useful for the regular, non-tech person, and we are going to see more people tinkering with a million little AI models. State-of-the-art AI models, such as GPT-4 and Gemini, are multimodal, meaning they can process not only text but images and even videos. This new capability could unlock a whole bunch of new apps. For example, a real estate agent can upload text from previous listings, fine-tune a powerful model to generate similar text with just a click of a button, upload videos and photos of new listings, and simply ask the customized AI to generate a description of the property.

But of course, the success of this plan hinges on whether these models work reliably. Language models often make stuff up, and generative models are riddled with biases. They are also easy to hack, especially if they are allowed to browse the web. Tech companies have not solved any of these problems. When the novelty wears off, they’ll have to offer their customers ways to deal with them.

Generative AI’s second wave will be video

It’s amazing how fast the fantastic becomes familiar. The first generative models to produce photorealistic images exploded into the mainstream in 2022—and soon became commonplace. Tools like OpenAI’s DALL-E, Stability AI’s Stable Diffusion, and Adobe’s Firefly flooded the internet with jaw-dropping images of everything from the pope in Balenciaga to prize-winning art. But it’s not all good fun: for every pug waving pompoms, there’s another piece of knock-off fantasy art or sexist sexual stereotyping.

The new frontier is text-to-video. Expect it to take everything that was good, bad, or ugly about text-to-image and supersize it. A year ago we got the first glimpse of what generative models could do when they were trained to stitch together multiple still images into clips a few seconds long. The results were distorted and jerky. But the tech has rapidly improved.

Runway, a startup that makes generative video models (and the company that co-created Stable Diffusion), is dropping new versions of its tools every few months. Its latest model, called Gen-2, still generates video just a few seconds long, but the quality is striking. The best clips aren’t far off what Pixar might put out.

Runway has set up an annual AI film festival that showcases experimental movies made with a range of AI tools. This year’s festival has a $60,000 prize pot, and the 10 best films will be screened in New York and Los Angeles.

It’s no surprise that top studios are taking notice. Movie giants, including Paramount and Disney, are now exploring the use of generative AI throughout their production pipeline. The tech is being used to lip-sync actors’ performances to multiple foreign-language overdubs. And it is reinventing what’s possible with special effects. In 2023, Indiana Jones and the Dial of Destiny starred a de-aged deepfake Harrison Ford. This is just the start.

Away from the big screen, deepfake tech for marketing or training purposes is taking off too. For example, UK-based Synthesia makes tools that can turn a one-off performance by an actor into an endless stream of deepfake avatars, reciting whatever script you give them at the push of a button. According to the company, its tech is now used by 44% of Fortune 100 companies.

The ability to do so much with so little raises serious questions for actors. Concerns about studios’ use and misuse of AI were at the heart of the SAG-AFTRA strikes last year. But the true impact of the tech is only just becoming apparent. “The craft of filmmaking is fundamentally changing,” says Souki Mehdaoui, an independent filmmaker and cofounder of Bell & Whistle, a consultancy specializing in creative technologies.

AI-generated election disinformation will be everywhere

If recent elections are anything to go by, AI-generated election disinformation and deepfakes are going to be a huge problem as a record number of people head to the polls in 2024. We’re already seeing politicians weaponizing these tools. In Argentina, two presidential candidates created AI-generated images and videos of their opponents to attack them. In Slovakia, deepfakes of a liberal pro-European party leader threatening to raise the price of beer and making jokes about child pornography spread like wildfire during the country’s elections. And in the US, Donald Trump has cheered on a group that uses AI to generate memes with racist and sexist tropes.

While it’s hard to say how much these examples have influenced the outcomes of elections, their proliferation is a worrying trend. It will become harder than ever to recognize what is real online. In an already inflamed and polarized political climate, this could have severe consequences.

Just a few years ago, creating a deepfake would have required advanced technical skills, but generative AI has made it stupidly easy and accessible, and the outputs are looking increasingly realistic. Even reputable sources might be fooled by AI-generated content. For example, user-submitted AI-generated images purporting to depict the Israel-Gaza crisis have flooded stock image marketplaces like Adobe’s.

The coming year will be pivotal for those fighting against the proliferation of such content. Techniques to track and mitigate it are still in the early days of development. Watermarks, such as Google DeepMind’s SynthID, are still mostly voluntary and not completely foolproof. And social media platforms are notoriously slow to take down misinformation. Get ready for a massive real-time experiment in busting AI-generated fake news.

Robots that multitask

Inspired by some of the core techniques behind generative AI’s current boom, roboticists are starting to build more general-purpose robots that can do a wider range of tasks.

The last few years in AI have seen a shift away from using multiple small models, each trained to do a different task—identifying images, drawing them, captioning them—toward single, monolithic models trained to do all these things and more. By training OpenAI’s GPT-3 on a small number of additional examples (a process known as fine-tuning), researchers can teach it to solve coding problems, write movie scripts, pass high school biology exams, and so on. Multimodal models, like GPT-4 and Google DeepMind’s Gemini, can tackle visual tasks as well as linguistic ones.

The same approach can work for robots, so there would be no need to train one model to flip pancakes and another to open doors: a one-size-fits-all model could give robots the ability to multitask. Several examples of work in this area emerged in 2023.

In June, DeepMind released RoboCat (an update on last year’s Gato), which generates its own data through trial and error to learn how to control many different robot arms (instead of one specific arm, which is more typical). In October, the company put out yet another general-purpose model for robots, called RT-X, and a big new general-purpose training data set, in collaboration with 33 university labs. Other top research teams, such as RAIL (Robotic Artificial Intelligence and Learning) at the University of California, Berkeley, are looking at similar tech.

The problem is a lack of data. Generative AI draws on an internet-size data set of text and images. By comparison, robots have very few good sources of data to help them learn how to do many of the industrial or domestic tasks we want them to.

Lerrel Pinto at New York University leads one team addressing that. He and his colleagues are developing techniques that let robots learn by trial and error, coming up with their own training data as they go. In an even more low-tech project, Pinto has recruited volunteers to collect video data from around their homes using an iPhone camera mounted on a trash picker. Big companies have also started to release large data sets for training robots in the last couple of years, such as Meta’s Ego4D.

This approach is already showing promise in driverless cars. Startups such as Wayve, Waabi, and Ghost are pioneering a new wave of self-driving AI that uses a single large model to control a vehicle rather than multiple smaller models to handle specific driving tasks. This has let small companies catch up with giants like Cruise and Waymo. Wayve is now testing its driverless cars on the narrow, busy streets of London. Robots everywhere are set to get a similar boost. —Will Douglas Heaven