Andrew Ng's ChatGPT class goes viral: AI gives up on writing words backwards, but understands the whole world

Source: Qubit

Who would have guessed: even today, ChatGPT still makes surprisingly basic mistakes.

AI guru Andrew Ng pointed one out in his latest class:

ChatGPT cannot reverse a word!

For example, ask it to reverse the word lollipop and it outputs pilollol, a complete jumble.

That is, admittedly, a little startling.

So much so that when a student in the class posted about it on Reddit, it immediately drew a crowd of onlookers, and the post shot up to 6k upvotes.

Nor is it a one-off bug: netizens found that ChatGPT consistently fails at this task, and our own test came out the same.

△ Our test of ChatGPT (GPT-3.5)

Nor can a host of other products, including Bard, Bing, and Wenxin Yiyan (ERNIE Bot).

△ Our test of Bard

△ Our test of Wenxin Yiyan (ERNIE Bot)

Others chimed in to complain that ChatGPT is hopeless at simple word tasks like these.

For example, at Wordle, the word game that was all the rage a while back, it was a disaster and never got it right.

Eh? Why?

The key is the token

The root of this behavior is the token. Tokens are common character sequences in text, and large models read and write text in tokens.

A token can be a whole word or a fragment of one. Large models learn the statistical relationships between tokens and excel at generating the next one.

So when faced with the small task of reversing a word, the model may simply flip the tokens rather than the letters, as the sketch below shows.
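A minimal illustration in Python: the three-way split of lollipop used here is an assumption for clarity (the actual GPT-3 split appears a few paragraphs below).

```python
# Illustration only: assume the model internally sees "lollipop"
# as three tokens rather than eight letters (assumed split).
tokens = ["l", "oll", "ipop"]

letter_reverse = "".join(reversed("lollipop"))  # what the user wants
token_reverse = "".join(reversed(tokens))       # what a token-level flip produces

print(letter_reverse)  # popillol -- correct, letter by letter
print(token_reverse)   # ipopolll -- tokens flipped as opaque units
```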

This is even clearer in Chinese: a single character can be one token, and a multi-character word can also be one token.

For the example at the beginning, someone tried to probe ChatGPT's reasoning process.

For a more intuitive picture, OpenAI has even released a GPT-3 Tokenizer tool.

Take lollipop: GPT-3 sees it as three parts, l, oll, and ipop.
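You can reproduce this split with OpenAI's open-source tiktoken library. A minimal sketch, assuming the GPT-3-era "r50k_base" encoding (exact splits differ between encodings):

```python
import tiktoken  # pip install tiktoken

# "r50k_base" is the encoding used by GPT-3-era models;
# newer chat models use "cl100k_base" and may split differently.
enc = tiktoken.get_encoding("r50k_base")

for token_id in enc.encode("lollipop"):
    print(token_id, enc.decode_single_token_bytes(token_id))
# Expected: three fragments along the lines of b'l', b'oll', b'ipop'
```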

From accumulated experience, a few unwritten rules of thumb have emerged (easy to sanity-check yourself, as shown after the list):

  • 1 token ≈ 4 English characters ≈ 3/4 words;
  • 100 tokens ≈ 75 words;
  • 1-2 sentences ≈ 30 tokens;
  • A paragraph ≈ 100 tokens, 1500 words ≈ 2048 tokens;
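A minimal sketch of that sanity check using tiktoken, with an arbitrary sample sentence:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = ("Tokens are common character sequences in text, "
        "and large models read and write text in tokens.")

n_tokens = len(enc.encode(text))

# Compare against the rules of thumb: ~4 chars per token, ~0.75 words per token.
print("chars per token:", len(text) / n_tokens)
print("words per token:", len(text.split()) / n_tokens)
```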

How text gets split also depends on the language. By earlier estimates, Chinese takes 1.2 to 2.7 times as many tokens as the equivalent English.

The higher the token-to-char ratio, the higher the processing cost; tokenizing Chinese is therefore more expensive than tokenizing English, as the sketch below illustrates.
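A quick sketch of the comparison; the Chinese sentence is a rough rendering of the English one, chosen only to make the ratios comparable:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The token is the bridge between AI and human language.",
    "Chinese": "token是AI理解人类自然语言的桥梁。",
}

# A higher tokens-per-character ratio means more tokens (and cost) per text.
for lang, text in samples.items():
    ratio = len(enc.encode(text)) / len(text)
    print(f"{lang}: {ratio:.2f} tokens per character")
```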

You can think of tokens as the large model's way of apprehending the real human world. They are very simple, and they greatly reduce memory and time complexity.

Tokenization comes with a catch, though: a poor split makes it hard for the model to learn meaningful input representations, the most obvious symptom being that it cannot grasp the meaning of a word.

Transformer models were therefore optimized accordingly with subword tokenization: a complex, uncommon word is split into a meaningful stem token plus an independent affix token.

For instance, annoyingly is split into "annoying" and "ly": the former keeps its semantics, while the latter occurs frequently on its own.
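The same behavior is easy to observe in an open-source tokenizer. A minimal sketch using Hugging Face's transformers with BERT's WordPiece tokenizer (the exact pieces depend on the vocabulary):

```python
from transformers import AutoTokenizer  # pip install transformers

# BERT's WordPiece vocabulary is a classic subword tokenizer.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tok.tokenize("annoyingly"))
# Likely: ['annoying', '##ly'] -- '##' marks a continuation piece, so the
# semantic stem and the frequent suffix become separate tokens.
```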

This is part of why ChatGPT and other large-model products are so striking today: they understand human language remarkably well.

As for a small failing like word reversal, there are naturally workarounds.

The simplest and most direct one is to separate the letters yourself first~

Or have ChatGPT work step by step: first tokenize the word letter by letter.

Or have it write a program that reverses the letters; the program's output is then correct. (doge)
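That program really is a one-liner; a sketch of what you might ask ChatGPT to write:

```python
def reverse_word(word: str) -> str:
    """Reverse a word letter by letter -- immune to tokenization quirks."""
    return word[::-1]

print(reverse_word("lollipop"))  # popillol
```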

Or just use GPT-4: in our test it showed no such problem.

△ Our test of GPT-4

In short, tokens are the cornerstone of AI's grasp of natural language.

And as the bridge between AI and human language, their importance is only becoming more obvious.

They have become a key determinant of model performance, and also the billing unit for large models.

There is even "token literature"

As mentioned above, tokens help the model capture finer-grained semantic information: word meaning, word order, grammatical structure, and so on. Token order and position are crucial in sequence-modeling tasks such as language modeling, machine translation, and text generation.

Only when the model accurately understands each token's position and context in the sequence can it predict what comes next and produce reasonable output.

The quality and quantity of tokens therefore directly affect model performance.

Since the start of this year, more and more large-model releases have emphasized token counts. Leaked details of Google's PaLM 2, for example, mentioned that it was trained on 3.6 trillion tokens.

And plenty of industry heavyweights have said the same: tokens really are the key.

Andrej Karpathy, the AI scientist who moved from Tesla to OpenAI this year, said in a talk:

More tokens can make the model think better.

He stressed that model performance is not determined by parameter count alone.

LLaMA, for example, has far fewer parameters than GPT-3 (65B vs. 175B), but because it was trained on far more tokens (1.4T vs. 300B), it is the more capable model.

And thanks to this direct impact on performance, the token also serves as the billing unit for AI models.

Take OpenAI's pricing as an example: it bills per 1K tokens, with different prices for different models and token types, as sketched below.
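As a sketch of what per-token billing means in practice, the function below counts a prompt's tokens with tiktoken and applies a rate. The rate here is a placeholder, not OpenAI's live price; check their pricing page for current figures.

```python
import tiktoken

def estimate_prompt_cost(prompt: str, model: str = "gpt-3.5-turbo",
                         usd_per_1k_tokens: float = 0.0015) -> float:
    """Rough input-side cost estimate. The rate is a placeholder, not live pricing."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(prompt)) / 1000 * usd_per_1k_tokens

print(f"${estimate_prompt_cost('Reverse the word lollipop.'):.6f}")
```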

In short, once you step through the gates of the large-model world, you'll find the token is a concept you cannot avoid.

So much so that it has even spawned "token literature"...

That said, it's worth noting that the Chinese-language world has yet to settle on a translation for "token".

Literal translations such as 令牌 always feel a bit odd.

GPT-4 thinks it is better to call it 词元 ("word element") or simply keep "token". What do you think?

