Andrew Ng's ChatGPT class goes viral: AI gave up spelling words backwards, but came to understand the whole world
Source: QbitAI
Who would have thought: even today, ChatGPT still makes elementary mistakes?
Andrew Ng pointed one out in his latest class:
For example, ask it to reverse the word "lollipop" and it outputs "pilollol", a complete jumble.
When attendees posted about it on Reddit, the thread immediately drew a crowd of onlookers, with its score shooting up to 6k.
And ChatGPT is not alone: a whole slate of products, including Bard, Bing, and Wenxin Yiyan (ERNIE Bot), trip over it too.
Others chimed in to complain that ChatGPT is terrible at these simple word tasks.
For example, playing Wordle, the once-viral word game, was a disaster: it never got it right.
The key is the token
The root cause of this behavior lies in the token. Tokens are the most common character sequences in text, and large models read and process text in units of tokens.
A token can be a whole word or a fragment of one. The large model learns the statistical relationships between these tokens and becomes good at predicting the next token.
So when handling a small task like word reversal, it may simply flip the order of the tokens rather than the letters.
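To see what the model is actually working with, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer. The article doesn't name a specific tokenizer, so the cl100k_base encoding used by ChatGPT-era models is an assumption, and the exact split may differ:

```python
# Minimal sketch: inspect how "lollipop" is split into tokens.
# Assumes the cl100k_base encoding; install with `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("lollipop")
pieces = [enc.decode([i]) for i in ids]

print(ids)     # a handful of integer IDs, not 8 separate letters
print(pieces)  # the multi-letter chunks the model actually "sees"
# Reversing at the token level reorders these chunks, not the letters,
# which is how an answer like "pilollol" can come out.
```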
There is no hard-and-fast definition of how text should be split into tokens; the rules are unwritten conventions distilled from experience.
How words are split also depends on the language. According to earlier statistics, expressing the same content in Chinese takes 1.2 to 2.7 times as many tokens as in English.
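As a rough illustration of that language effect, the snippet below counts tokens for roughly the same sentence in English and Chinese; the exact ratio depends on the text sample and the encoding, so treat the numbers as indicative only:

```python
# Rough comparison of token counts for equivalent English and Chinese text.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

en = "Large models process text in units of tokens."
zh = "大模型以token为单位来处理文本。"  # roughly the same sentence in Chinese

print(len(enc.encode(en)), "tokens for the English sentence")
print(len(enc.encode(zh)), "tokens for the Chinese sentence")
```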
You can think of tokens as the lens through which a large model perceives the human world: they are very simple, and they greatly reduce memory and time complexity.
However, tokenization also has a downside: it can make it harder for the model to learn meaningful input representations. The most intuitive symptom is that the model may fail to grasp the meaning of a word as a whole.
Transformer-based models have long been tuned for this. For example, a complex, uncommon word is split into a semantically meaningful token plus a separate, frequent one.
Just as "annoyingly" is split into "annoying" and "ly": the former keeps the word's meaning, while the latter appears frequently across the corpus.
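The same idea shows up in off-the-shelf tokenizers. Below is a small sketch using Hugging Face's transformers library; the actual pieces depend on each model's learned vocabulary, so the printed output is illustrative rather than guaranteed:

```python
# Sketch: how different sub-word tokenizers split a rare-ish word.
# Requires `pip install transformers`; vocabularies download on first run.
from transformers import AutoTokenizer

for name in ["bert-base-uncased", "gpt2"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize("annoyingly"))
# Conceptually, the word decomposes into a meaningful stem ("annoying")
# plus a frequent suffix piece ("ly" / "##ly"), though exact splits vary.
```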
This is part of why ChatGPT and other large-model products are so impressive today at understanding human language.
As for the small task of word reversal that trips them up, there is of course a workaround.
The simplest and most direct one is to separate the letters yourself~
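A hedged sketch of why that trick works, again with tiktoken: inserting delimiters pushes the tokenizer toward finer-grained pieces that expose the letter boundaries (the exact split can still vary):

```python
# Compare tokenization with and without delimiters between the letters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

plain = "lollipop"
spaced = "l-o-l-l-i-p-o-p"

print([enc.decode([i]) for i in enc.encode(plain)])   # a few multi-letter chunks
print([enc.decode([i]) for i in enc.encode(spaced)])  # finer pieces that expose each letter
```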
In short, tokens are the cornerstone of how AI understands natural language.
As the bridge by which AI understands human language, tokens are becoming ever more important.
They have become a key determinant of AI model performance, and they are also the billing unit for large models.
There is even "token literature"
As mentioned above, tokens help the model capture **finer-grained** semantic information, such as word meaning, word order, and grammatical structure. Their order and position are crucial in sequence-modeling tasks such as language modeling, machine translation, and text generation.
Only when the model accurately understands each token's position and context in the sequence can it predict what comes next and give a reasonable output.
Therefore, both the quality and the quantity of tokens have a direct impact on model performance.
Since the start of this year, as more and more large models have been released, the number of training tokens has become a headline figure. For example, leaked details of Google's PaLM 2 mention that it was trained on 3.6 trillion tokens.
And many industry heavyweights have said that tokens really are the key!
Andrej Karpathy, the AI scientist who moved from Tesla to OpenAI this year, said in a talk:
For example, LLaMA's parameter count is much smaller than GPT-3's (65B vs. 175B), but because it was trained on more tokens (1.4T vs. 300B), LLaMA is the more capable model.
Take OpenAI's pricing as an example: it bills in units of 1K tokens, and different models and different token types carry different prices.
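The billing arithmetic itself is straightforward. A minimal sketch follows; the per-1K-token price is a made-up placeholder, since real prices differ by model and change over time:

```python
# Estimate the cost of a prompt from its token count.
# PRICE_PER_1K_TOKENS is a hypothetical placeholder, not an actual OpenAI rate.
import tiktoken

PRICE_PER_1K_TOKENS = 0.01  # illustrative USD rate only

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the following article in three sentences: ..."

n_tokens = len(enc.encode(prompt))
cost = n_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"{n_tokens} tokens -> ${cost:.6f}")
```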
And yes, it has even spawned "token literature"...
That said, translating "token" literally into Chinese always feels a little awkward.
GPT-4 thinks it is better rendered as "word element", or simply kept as "token". What do you think?