Musk Gets Tough on OpenAI, and Users Pay the Price

Source: Alphabet List, Author: Bi Andi, Editor: Wang Jing


We have all heard of social media trying to make users stay longer, but never of a platform deliberately capping how much they can use it. Look now: Elon Musk is imposing what amounts to a screen-time limit on every Twitter user, and he claims AI companies forced his hand.

Nowadays, the maximum number of tweets a Twitter user can read per day depends neither on scrolling speed nor on willingness to stay up late; it is a fixed number: 10,000 for verified accounts (that is, paying "Twitter Blue" subscribers), 1,000 for unverified accounts, and only 500 for newly registered unverified accounts.

These are the limits Musk has already raised twice in the face of angry users. His stated reason: "to address extreme levels of data scraping and system manipulation."

He was referring to AI companies, which need huge amounts of data to train their models. Last December, Musk cut off OpenAI's access to Twitter's data, and this April he accused Microsoft of illegally using Twitter's data.

Just as Musk takes aggressive steps against data scraping, OpenAI is facing a class-action lawsuit. The suit has 16 plaintiffs, all individuals, in other words ordinary internet users. They accuse OpenAI of secretly "scraping 300 billion words from the internet" and stealing "vast amounts of private information" from internet users without permission in order to train ChatGPT.

On one side are internet users and the platforms that have accumulated vast amounts of user-generated content over the years; on the other are the emerging AIGC companies. A war over data scraping and privacy has begun.

01

Friday finally arrived, but instead of celebrating the weekend, Twitter users were dumbfounded. An error message on the screen told them they had exceeded the "rate limit," violated Twitter's rules, and viewed too many tweets.

Nobody knew what this meant. Then Twitter's boss Musk stepped forward, confirmed that there was indeed a rate limit, and announced: to address extreme levels of data scraping and system manipulation, verified accounts, unverified accounts, and newly registered unverified accounts would be limited to reading 6,000, 600, and 300 tweets per day, respectively.

Shortly before this, Musk had announced that Twitter would stop logged-out visitors from browsing content, which users could still live with. But when the rate limit actually landed, users were stunned. Then they looked at the gap between verified and unverified limits and raised an eyebrow: is this just a trick to push "Twitter Blue" subscriptions? More than one user wrote in the replies: "So now we have to pay to win?"

The outcry was loud. Twitter's rivals Hive, Mastodon, Tumblr, and others started trending, and a meme of Twitter's tombstone circulated widely. Amid the controversy, Musk raised the limits twice, to 10,000 views for verified users and 1,000 for unverified users.

One Musk parody account joked: "I set the limit because you Twitter addicts need to go outside. I'm doing the world a favor." Musk apparently liked that spin on it: he retweeted the post and added a separate one of his own, "Go visit your friends and family."

Jokes aside, Musk gave a clear explanation for his "test": fighting data scraping. Users' dissatisfaction, likewise, was with whether rate limiting actually works, not with the goal of stopping scraping.

How serious is the "data scraping" by AI startups on Twitter? In one tweet, Musk said the surge in traffic had forced Twitter to spin up backup servers: **"It is rather galling to have to bring large numbers of servers online on an emergency basis just to facilitate some AI startup's outrageous valuation."**

The day before the rate-limit storm, Epic Games CEO Tim Sweeney had also tweeted a complaint about Twitter putting up walls. Musk replied: "Hundreds of organizations (maybe more) were scraping Twitter data extremely aggressively, to the point where it was affecting the user experience. What should we do? I'm open to all ideas."

Sweeney, who had just been complaining, quickly offered serious suggestions: add a ban on data scraping to Twitter's terms of service, protect the platform with security engineering, and take legal action against companies that abuse Twitter at scale.

Notably, Musk said in a reply that legal action would "absolutely" be taken against those who stole the data: "(optimistically) 2 to 3 years from now, looking forward to seeing them in court."

Whether or not the "pushing paid subscriptions" theory is uncharitable, Musk's banner of user privacy likely carries some self-interest. In April, Musk was reported to have founded a new artificial intelligence company, X.AI, to take on ChatGPT. If he really intends to train a large language model, Twitter's user data is naturally something he would keep for himself.

In any case, he is willing to rate-limit his own platform: Musk is ready to fight the AI startups to the end.

02

**Just as Musk was rate-limiting his entire platform, OpenAI, the initiator of this AIGC craze and the creator of ChatGPT, was hit with a class-action lawsuit.**

The lawsuit was filed in the U.S. District Court for the Northern District of California by 16 plaintiffs, all anonymous individuals. The complaint runs a full 157 pages and opens with a line from Stephen Hawking: "The rise of powerful AI will be either the best or the worst thing ever to happen to humanity." Besides OpenAI, the defendants include Microsoft, which has invested more than ten billion dollars in the company.

The core allegation is that ChatGPT violated "the copyrights and privacy of countless people" when it used data collected from the Internet to "train its technology."

The complaint says OpenAI secretly scraped 300 billion words from the internet, harvesting "books, articles, websites and posts, including personal information obtained without consent," in violation of privacy laws. It notes that OpenAI crawls vast amounts of web data, including data from social media.

They also point out that OpenAI has a proprietary AI corpus that has amassed vast amounts of personal data, including data taken from Reddit posts and the websites they link to.

That covers the training of the models. The plaintiffs further claim that users' interactions with OpenAI's products, and the private information inside those products, were also illegally accessed and misappropriated by OpenAI at scale.

This is not OpenAI's first U.S. class action. Last November, GitHub programmers filed a class-action lawsuit against GitHub, OpenAI, and Microsoft, accusing them of violating open-source licenses by using contributed code to train the proprietary AI tool GitHub Copilot.

ChatGPT had not yet launched at the time; in hindsight, the problems with AI training data were already exposed back then. The latest class action targets ChatGPT itself, whose user base, and thus the pool of people allegedly harmed (essentially everyone), is far larger. More importantly, amid the AIGC frenzy, any legal precedent may shape the future.

In a statement, Clarkson, the public-interest law firm bringing the case, called the class action a "landmark" federal case and a warning to the artificial intelligence industry as a whole.

From this perspective, the burden on OpenAI's shoulders is indeed heavy.

**OpenAI has already run into plenty of trouble over data scraping and privacy. Platforms locking it out and users turning against it are just the tip of the iceberg.**

In Europe, OpenAI has been investigated by several countries. In April, Italy, worried that ChatGPT violated European data-protection law, even temporarily banned it.

Regulation of artificial intelligence as a whole is advancing. France launched an AI action plan in May; on the AIGC front, the French privacy regulator is paying special attention to AI models that collect data from the internet to build datasets for training large language models.

Most important is the EU Artificial Intelligence Act (EU AI Act), now in its final stage, which is likely to become a template for AI governance worldwide.

03

**Platforms, users, and regulators have formed an encirclement, determined to set rules for AIGC as soon as possible, starting from the source: large-model training.**

On the one hand, time is short: AIGC is moving too fast.

We do not know which "AI startups with absurdly high valuations" Musk meant, but the words hit plenty of targets. After all, wave after wave of hot money is pouring into AIGC financing.

Among startups, OpenAI is valued at nearly US$30 billion with US$11.3 billion raised in total, the richest in AIGC; next is Anthropic, valued at more than US$4 billion. And Inflection, which stunned Silicon Valley a few days ago with a US$1.3 billion round, is also valued at US$4 billion despite being barely a year old.

Bigger may be yet to come. Inflection builds its own large language model; with the new US$1.3 billion, it announced plans for a cluster of 22,000 Nvidia H100 GPUs, which it says will be the world's largest AI cluster. With computing power on that scale, the target parameter counts and dataset sizes are bound to be staggering.

**On the other hand, ChatGPT appeared out of nowhere, and the problems it exposes are not easy to "fix."** Across OpenAI's generations of large language models, GPT-2 was trained on 40GB of text and GPT-3 (the model behind ChatGPT at launch) on 570GB of training data; for GPT-4, released this year, the dataset size was essentially not disclosed.

The massive data was never properly documented from the start. Nithya Sambasivan, a former research scientist at Google, has said in interviews that tech companies do not keep track of how they collect or annotate AI training data, or even what is in their datasets.

The finished ChatGPT is a black box, and one built in a locked room. Making it transparent and privacy-compliant now, for instance listing what data was scraped, explaining how data is used, and deleting specific records at users' request, is genuinely difficult.

There is another reason internet users and regulators are biting down on OpenAI: during the years when social media grew up, awareness of personal data protection was still in its infancy, and by the time people were ready to fight back, they found they had already fallen too far behind.

When Zuckerberg first sat before a congressional hearing in 2018, his social media platform Facebook had been live for 14 years. It was mired in the Cambridge Analytica scandal, with the company's chief technology officer saying 87 million users were affected, another major breach rooted in data scraping.

When Altman sat before a U.S. congressional hearing this May, lawmakers repeatedly voiced regret at their inaction during the social-media era.

One large model after another is still being trained, and data scraping is the thread running through them all. Only by grasping it can AIGC's tangle be unraveled.

Reference materials:

  1. Sina Technology: "Musk and Microsoft at Odds? Twitter Accuses Microsoft of Illegally Using Its Data"
  2. Dark Horse Programmer: "These Programmers Have Sued GitHub! Seeking Compensation of 64.9 Billion"
  3. Jiemian News: "The EU AI Act Is Out: How Do OpenAI and Other Companies Score, and What Are the Core Disputes?"
  4. Tencent Technology: "Is the Thirst for Data Hurting OpenAI? Multiple Countries Accuse It of Violating Data Protection Laws"
  5. NetEase Technology: "ChatGPT Back Online in Italy, but OpenAI's Regulatory Troubles Have Just Begun"
