Thursday, September 29, 2022

What does the European approach to AI mean for GPT and DALL-E?

Shreya Christina (https://londonbusinessblog.com)
Shreya has been with londonbusinessblog.com for 3 years, writing copy for client websites, blog posts, EDMs and other mediums to engage readers and encourage action. By collaborating with clients, our SEO manager and the wider londonbusinessblog.com team, Shreya seeks to understand an audience before creating memorable, persuasive copy.

The global AI explosion has greatly increased the need for a common-sense, people-centric approach to data privacy and ownership. The European General Data Protection Regulation (GDPR) leads the way, but there is more than just personally identifiable information (PII) at stake in the modern market.

What about the data we generate as content and art? It is certainly not legal to copy someone else’s work and then present it as your own. But there are AI systems that scrape as much human-generated content from the web as possible in order to generate comparable content.

Can the GDPR or any other EU-focused policy protect this type of content? As it turns out, like most things in the machine learning world, it depends on the data.

Privacy vs. Property

The primary purpose of the GDPR is to protect European citizens from harmful actions and consequences related to the misuse, abuse or exploitation of their private data. It is of little use to citizens (or organisations) when it comes to protecting intellectual property (IP).

Unfortunately, to the best of our knowledge, the policies and regulations put in place to protect IP are not equipped to cover data scraping and anonymization. That makes it difficult to understand exactly where the regulations apply when it comes to scraping content from the web.

These techniques, and the data they obtain, are used to create massive databases for use in training large AI models such as OpenAI’s GPT-3 and DALL-E 2 systems.

The only way to teach an AI to imitate humans is to expose it to human-generated data. And the more data you put into an AI system, the more robust its output is.

Here’s how it works: imagine you draw a picture of a flower and post it on an online artist forum. Using scraping techniques, a tech outfit sucks up your image, along with billions of others, so it can create a massive dataset of artwork. The next time someone asks the AI to generate an image of a ‘flower’, there’s a greater-than-zero chance that your work will be used in the AI’s interpretation of the prompt.

Whether such use would be ethical remains an open question.

Public data vs PII

While the regulatory oversight of the GDPR can be described as far-reaching when it comes to protecting private information and providing the right to erasure, it seemingly does very little to protect content from scraping. However, that does not mean that the GDPR and other EU regulations are completely toothless in this regard.

Individuals and organizations have to follow very specific rules when scraping PII, or else they will be in violation of the law, something that can get quite costly.

For example, it has become nearly impossible for Clearview AI, a company that builds facial recognition databases for government use by scraping social media data, to do business in Europe. EU watchdogs from at least seven countries have already issued hefty fines or recommended fines for the company’s refusal to comply with GDPR and similar regulations.

At the other end of the spectrum, companies like Google, OpenAI and Meta use similar data scraping practices, either directly or through the purchase or use of scraped datasets, for many of their AI models without any consequences. And while major tech companies have received their fair share of fines in Europe, very few of the violations involve data scraping.

Why not ban scraping?

At first glance, scraping may seem like a practice with too much potential for abuse not to ban outright. However, for many organizations that rely on scraping, the data that is obtained is not necessarily “content” or “PII”, but information that can serve the public.

We contacted the UK’s data privacy regulator, the Information Commissioner’s Office (ICO), to find out how it regulates internet-scale scraping techniques and datasets, and to understand why it is important not to over-regulate.

A spokesperson for the ICO told TNW:

Using publicly available information can bring many benefits, from research to developing new products, services and innovations, including in the field of AI. However, if this information is personal data, it is important to understand that data protection laws apply. This is whether the techniques used to collect the data include scraping or something else.

In other words, it’s more about the type of data used than how it’s collected.

Whether you’re copying images from Facebook profiles or using machine learning to scrape the web for tagged images, you’re likely violating GDPR and other European privacy rules if you build a facial recognition engine without the consent of the people whose faces are in its database.

But it’s generally acceptable to scour the Internet for massive amounts of data, as long as you either anonymize it or make sure there is no PII in the dataset.
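To make the idea of keeping PII out of a scraped dataset more concrete, here is a minimal sketch in Python. It is an illustration only, not how any company named in this article actually processes its data. The function names (`redact_pii`, `contains_pii`, `drop_pii_records`) and the two patterns are hypothetical, and simple pattern matching like this catches only obvious email addresses and phone numbers. It would not amount to anonymization in the legal sense.

```python
import re

# Illustrative patterns only (assumptions for this sketch). Real PII
# detection would need far broader coverage: names, addresses, IDs, faces.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace matched email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def contains_pii(text: str) -> bool:
    """Return True if either illustrative pattern matches the text."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


def drop_pii_records(records: list[str]) -> list[str]:
    """Keep only the records in which no pattern matched."""
    return [record for record in records if not contains_pii(record)]


if __name__ == "__main__":
    scraped = [
        "A watercolour of a flower, posted to an artist forum.",
        "Contact jane@example.com or +44 20 7946 0958.",
    ]
    print(drop_pii_records(scraped))
    print([redact_pii(record) for record in scraped])
```

The two options in the sketch mirror the two routes described above: `redact_pii` masks the matched details and keeps the rest of the text, while `drop_pii_records` removes any record where something matched.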

Further gray areas

But even within the allowed use cases, there are still some gray areas associated with private information.

For example, GPT-2 and GPT-3 are known to occasionally output PII in the form of addresses, phone numbers, and other information apparently baked into their corpora via large-scale training datasets.

Here, where it is clear that the company behind GPT-2 and GPT-3 is taking steps to mitigate this, the GDPR and similar regulations are doing their job.

Simply put, we can either choose not to train large AI models at all, or give the companies training them room to investigate edge cases and address the concerns.

What might be needed is a GDUR, a General Data Use Regulation, something that could provide clear guidance on how human-generated content can be used legally in large data sets.

At the very least, it seems worth having a conversation about whether European citizens should have as much right to have the content they create removed from datasets as they have for their selfies and profile photos.

For now it seems that in the UK and in the rest of Europe the right to erasure only extends to our PII. Everything else we put online probably ends up in some AI’s training dataset.
