Research on machine learning and AI, now a key technology in virtually every industry and business, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers, particularly in, but not limited to, artificial intelligence, and explain why they matter.
Over the past few weeks, Google researchers have demonstrated an AI system, PaLI, that can perform many tasks in more than 100 languages. Elsewhere, a Berlin-based group launched a project called Source+, designed to let artists, including visual artists, musicians, and writers, opt in or out of allowing their work to be used as training data for AI.
AI systems such as OpenAI’s GPT-3 can generate reasonably meaningful text or summarize existing text from the Internet, ebooks, and other sources of information. But historically they have been limited to a single language, which restricts both their usefulness and their reach.
Fortunately, research on multilingual systems has gained momentum in recent months, made possible in part by community efforts like Hugging Face’s Bloom. In an effort to leverage these advances in multilingualism, a Google team created PaLI, which is trained on both images and text to perform tasks such as image captioning, object detection, and optical character recognition.
Google claims that PaLI can understand 109 languages and the relationships between words in those languages and images, allowing it, for example, to caption an image of a postcard in French. While the work remains firmly in the research phase, its creators say it illustrates the important interplay between language and images, and could later lay the foundation for a commercial product.
Speech is another aspect of language in which AI is constantly improving. Play.ht recently showed off a new text-to-speech model that brings a remarkable amount of emotion and range to the results. The clips it posted last week sound fantastic, even if they are cherry-picked, of course.
We made our own clip using the intro from this article and the results are still solid:
It is still unclear what exactly this type of speech generation will be most useful for. We’re not quite at the stage where these models narrate entire books; or rather, they can, but they might not be anyone’s first choice yet. But as the quality increases, the applications multiply.
Mat Dryhurst and Holly Herndon, an academic and a musician respectively, have partnered with the organization Spawning to launch Source+, a standard they hope will draw attention to the problem of image-generating AI systems built with artwork from artists who were neither informed nor asked for permission. Source+, which costs nothing, aims to let artists disallow the use of their work for AI training purposes if they wish.
Image-generating systems like Stable Diffusion and DALL-E 2 are trained on billions of images scraped from the Internet to “learn” how to translate text prompts into art. Some of these images came from public art communities such as ArtStation and DeviantArt, not necessarily with the knowledge of the artists, and imbued the systems with the ability to mimic certain creators, including artists like Greg Rutkowski.
Because of these systems’ ability to imitate art styles, some creators fear they could pose a threat to livelihoods. Source+, though voluntary, could be a step toward giving artists more say in how their art is used, Dryhurst and Herndon say, assuming it’s widely adopted (a big if).
A DeepMind research team is attempting to solve another long-standing problem with AI: its tendency to spread toxic and misleading information. The team focused on text and developed a chatbot called Sparrow that can answer common questions by searching the web with Google. Other advanced systems like Google’s LaMDA can do the same, but DeepMind claims that Sparrow provides plausible, non-toxic answers to questions more often than its counterparts.
The trick was to align the system with people’s expectations of it. DeepMind recruited people to use Sparrow and then had them provide feedback to train a model of how useful the answers were, showing participants multiple answers to the same question and asking which answer they liked the most. The researchers also defined rules for Sparrow, such as “don’t make threatening statements” and “don’t make hateful or abusive comments,” which they had participants test by trying to trick the system into breaking them.
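To make the "which answer did you like most" idea concrete, here is a minimal sketch of turning pairwise votes into a score per answer. It uses a generic Bradley-Terry model fitted by gradient ascent, which is a common textbook approach and an assumption on our part, not DeepMind's actual training setup. The function name, the votes, and the hyperparameters are all hypothetical.

```python
import math

def fit_preference_scores(comparisons, n_answers, lr=0.1, epochs=500, l2=0.01):
    """Fit one score per answer from (winner, loser) index pairs.

    Generic Bradley-Terry model: P(a preferred over b) = sigmoid(score_a - score_b).
    This is an illustrative stand-in, not Sparrow's real preference model.
    """
    scores = [0.0] * n_answers
    for _ in range(epochs):
        # Small L2 penalty keeps scores bounded when an answer never loses.
        grads = [-l2 * s for s in scores]
        for winner, loser in comparisons:
            # Probability the current scores assign to the observed vote.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores

# Hypothetical votes: each tuple is (preferred answer, rejected answer).
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 1)]
scores = fit_preference_scores(comparisons, n_answers=3)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # index of the answer raters preferred most often
```

A learned score like this can then be used to rank or reward candidate answers. In a real system the scores would come from a neural network that reads the answer text, rather than from one free parameter per answer.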
DeepMind acknowledges that Sparrow has room for improvement. But in a study, the team found that the chatbot gave a “plausible” answer backed up by evidence 78% of the time when asked a factual question, and broke the aforementioned rules only 8% of the time. That’s better than DeepMind’s original dialogue system, the researchers note, which was about three times more likely to break the rules when tricked into doing so.
A separate team at DeepMind recently tackled an entirely different domain: video games that have historically been difficult for AI to master quickly. Their system, cheekily called MEME, reportedly achieved human-level performance on 57 different Atari games 200 times faster than the previous best system.
According to DeepMind’s paper describing MEME, the system can learn to play games by observing approximately 390 million frames – “frames” referring to the still images that refresh very quickly to give the impression of movement. That may sound like a lot, but the previous state-of-the-art technique required 80 billion frames over the same number of Atari games.
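The "200 times faster" claim can be sanity-checked from the two frame counts quoted above. The snippet below is just that arithmetic, with variable names of our own choosing:

```python
# Frame counts as quoted in the article.
meme_frames = 390_000_000          # approximately 390 million
previous_frames = 80_000_000_000   # 80 billion

# How many times fewer frames MEME needs than the previous technique.
reduction = previous_frames / meme_frames
print(f"about {reduction:.0f}x fewer frames")
```

The ratio comes out at roughly 205, which is consistent with the rounded "200 times" figure.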
Skillfully playing Atari may not sound like a desirable skill. And indeed, some critics argue that games are a flawed AI benchmark due to their abstractness and relative simplicity. But research labs like DeepMind believe the approaches could be applied to other, more useful areas in the future, such as robots that learn to perform tasks more efficiently by watching videos or self-improving, self-driving cars.
Nvidia had a field day on the 20th, announcing dozens of products and services, including several interesting AI efforts. Self-driving cars are one of the company’s focuses, both powering the AI and training it. For the latter, simulators are crucial, and it is equally important that the virtual roads resemble real ones. The company described a new, improved content pipeline that speeds up bringing data collected by cameras and sensors on real cars into the digital realm.
Things like real vehicles and irregularities in the road or tree cover can be accurately reproduced, so the self-driving AI doesn’t learn in a sanitized version of the street. And it makes it possible to create larger and more variable simulation settings in general, which improves robustness. (Another image of it is at the top.)
Nvidia also introduced its IGX system for autonomous platforms in industrial settings: human-machine collaboration of the kind you might encounter on a factory floor. There is of course no shortage of that, but as the complexity of tasks and operating environments increases, the old methods no longer suffice, and companies looking to improve their automation are looking to future-proof their operations.
“Proactive” and “predictive” safety is what IGX aims to assist with, that is, catching safety issues before they cause malfunctions or injuries. A bot may have its own emergency stop mechanism, but if a camera monitoring the area could tell it to swerve before a forklift gets in its way, things go a little smoother. Exactly which company or software does this (and on what hardware, and how it’s all paid for) is still a work in progress, with Nvidia and startups like Veo Robotics working their way through it.
Another interesting step forward was taken on Nvidia’s home turf of gaming. The company’s latest and greatest GPUs are built not just to push triangles and shaders, but to quickly perform AI-powered tasks like its proprietary DLSS technology for uprezzing and adding frames.
The problem Nvidia is trying to solve is that game engines are so demanding that generating over 120 frames per second (to keep up with the latest monitors) while maintaining visual fidelity is a formidable task that even powerful GPUs can barely manage. But DLSS is a kind of intelligent frame blender that can increase the resolution of the source frame without aliasing or artifacts, so that the game doesn’t have to push so many pixels.
In DLSS 3, Nvidia claims it can generate full extra frames in a 1:1 ratio, so you could render 60 frames naturally and the other 60 through AI. I can think of several reasons that could make things weird in a high-performance gaming environment, but Nvidia is probably well aware of that. In any case, you’ll have to pay about a grand for the privilege of using the new system, as it only works on RTX 40 series cards. But if graphic fidelity is your top priority, then do it.
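To picture what a 1:1 ratio of rendered to generated frames means, here is a toy sketch that slots one synthesized frame between each pair of rendered ones. The per-pixel average used below is a deliberately naive stand-in that we are assuming for illustration. It is not how DLSS 3 produces its frames, and the function name and tiny two-pixel "frames" are hypothetical.

```python
def interleave_generated_frames(rendered):
    """Insert one synthesized frame between each pair of rendered frames.

    Each frame is a flat list of pixel values. The synthesized frame is a
    plain average of its two neighbors, a placeholder for the AI step.
    """
    output = []
    for current, following in zip(rendered, rendered[1:]):
        output.append(current)
        # Naive in-between frame: midpoint of the two rendered neighbors.
        output.append([(a + b) / 2 for a, b in zip(current, following)])
    if rendered:
        output.append(rendered[-1])  # the final rendered frame has no successor
    return output

# Three tiny hypothetical "frames" of two pixels each.
rendered = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
frames = interleave_generated_frames(rendered)
print(len(rendered), "rendered ->", len(frames), "displayed")
```

With N rendered frames this yields 2N - 1 displayed frames, so over a long run the displayed frame rate approaches double the rendered one, which is the 60-plus-60 scenario described above. A simple blend like this would smear fast motion, which hints at why a production system needs something far more sophisticated.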
The last thing today is a drone-based 3D printing technique from Imperial College London that could be used for autonomous building processes sometime in the distant future. For now, it’s certainly not practical to make anything bigger than a trash can, but it’s still early days. In the end, they hope to make it more like the above, and it looks cool, but check out the video below to set your expectations straight.