Unlocking the AI Brain: Language Concept Models, a Major Step That Makes AI Think More Like Us

Learn how these concept-based language models work, how they differ from traditional LLMs, and how they can help AI understand and think in a more human-like way.


Over the years, we've seen AI create stunning text and images that are almost indistinguishable from what humans produce. Yes, we're talking about generative AI with large language models, or LLMs, at its core.

LLMs are very good at guessing the next word of a sentence we give them. Imagine the AI as an intelligent word guesser that has read so much that it knows which words tend to follow which. On the other hand, it still works primarily on the basis of word associations; it is not really "understanding" at the conceptual level the way our brain does.

What if AI didn't just guess words, but could "think" and "understand" at the level of a sentence's concepts? That's where researchers are heading with a technology called Language Concept Models, or LCMs. This is the next important step in unlocking the potential of generative AI to be smarter, deeper, and closer to a human understanding of the world.

How do we go from "guessing words" to "understanding concepts"?

Imagine an LLM working like someone assembling a jigsaw puzzle of letters: it looks at the existing pieces (tokens) and guesses which piece is most likely to come next. Keep doing that, and you get a long text.

But the world is more complex than just letters. For example, the sentences "I love reading" and "reading is my happiness" use different words, but they have the same main "concepts": "reading" and "positive feelings/happiness".

This is where LCMs differ. While an LLM predicts the next token from the sequence of tokens it has received, an LCM tries to predict the next concept from the sequence of sentences or information it has received.

Simply put, LCMs don't just look at the bricks (tokens) one at a time; they try to understand the overall picture of the building (the concept) being built. This allows the AI to think abstractly and understand more than just the surface of the words.
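To make the difference in granularity concrete, here is a minimal Python sketch (assuming the open-source tiktoken tokenizer is installed; the exact token IDs are illustrative) of how an LLM sees a sentence as tokens, while an LCM would treat the whole sentence as a single concept:

```python
# How an LLM sees text: a sequence of token IDs, one "jigsaw piece" at a time.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

sentence = "I love reading"
token_ids = enc.encode(sentence)

print(token_ids)                              # a short list of integer IDs
print([enc.decode([t]) for t in token_ids])   # the pieces, e.g. ["I", " love", " reading"]

# An LLM predicts the next ID in this sequence.
# An LCM would instead map the whole sentence to one concept embedding
# and predict the next sentence-level concept.
```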

Behind the Scenes: How Does AI See Language as a "Mind Map"?

The key to making AI work, both LLMs and LCMs, is to transform language into something computers can understand: embeddings.

Imagine that the world of words and sentences is transformed into a "mind map" that has many dimensions (like a 3D world map or more).

  • Words or sentences that have similar meanings or are used in similar contexts will be close to each other on this map.
  • To know how similar two sentences are in meaning, you can measure the distance or angle between their two points on the map. The most popular method is cosine similarity, which measures the angle between the "vectors" pointing to those points (see the short sketch below).
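A minimal sketch of cosine similarity in Python, using made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
# Cosine similarity: the cosine of the angle between two vectors.
# 1 = same direction (similar meaning), 0 = unrelated, -1 = opposite.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_reading   = np.array([0.9, 0.1, 0.3])    # hypothetical: "I love reading"
emb_happiness = np.array([0.8, 0.2, 0.4])    # hypothetical: "reading is my happiness"
emb_weather   = np.array([-0.2, 0.9, -0.5])  # hypothetical: "it rained all day"

print(cosine_similarity(emb_reading, emb_happiness))  # ~0.98: very similar
print(cosine_similarity(emb_reading, emb_weather))    # ~-0.24: unrelated
```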

Good "mind maps," or embeddings, let the AI see the word "apple" not as just five letters, but as a kind of fruit, a technology company, or even the pupil of an eye (as in "the apple of my eye"), depending on which words "apple" sits near on the map (near "fruit," "pie," and "tree," or near "iPhone" and "MacBook").

The Evolution of "Embeddings": From Counting Words to Understanding Context

The way these mind maps, or embeddings, are created has been constantly evolving.

  • Early (frequency-based): The simplest approach is to count how often each word appears in the text. This makes mind maps easy to build, but it is limited in depth of meaning and context: the word "head" can refer to the top part of the body or to a headline, and frequency counting alone cannot tell them apart (a toy sketch follows this list).
  • Prediction-based: A more sophisticated mind map is built by training an AI model to guess missing words in a sentence, or to guess words from their surrounding context. The resulting embeddings capture contextual meaning and relationships between words much more accurately.
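As a toy sketch of the frequency-based approach (plain Python, made-up documents), note how the counts say nothing about which sense of "head" is meant:

```python
# Frequency-based "embedding": each document becomes a vector of raw word
# counts over a shared vocabulary. Easy to build, but blind to word sense:
# "head" gets the same treatment whether it means a body part or a headline.
from collections import Counter

docs = ["the head of the page", "he hit his head"]
vocab = sorted({word for doc in docs for word in doc.split()})

def count_vector(doc: str) -> list[int]:
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

print(vocab)
for doc in docs:
    print(doc, "->", count_vector(doc))
```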

Some of the most popular models in the prediction-based embeddings group include:

  • Word2Vec (2013): An important step that places words with similar meanings significantly closer together on the map (for example, the vector of "king" - "man" + "woman" ends up close to the vector of "queen"; see the sketch after this list).
  • GloVe, ELMo: Developed to capture context even better.
  • BERT, ALBERT: Models that revolutionized NLP by understanding the meaning of a word from its context in both directions (looking at the words before and after it at the same time), giving their embeddings immense power to understand complex language.
  • And the latest, like SONAR: Designed to create embeddings that represent "ideas" rather than just words, allowing LCMs to function as intended.
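The Word2Vec-style analogy above can be reproduced with pretrained vectors. A hedged sketch using the gensim library (it downloads a roughly 100 MB GloVe model on first run; any pretrained word-embedding model would work):

```python
# "king" - "man" + "woman" should land near "queen" on the mind map.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # pretrained GloVe word vectors

result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [("queen", ...)]: the nearest vector is "queen"
```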

How do embeddings and LLMs work together?

If you compare an LLM to a factory, embeddings are like the refined raw material fed into that factory.

  1. Imported Raw Materials (Embeddings): The words or sentences we give the LLM are first converted into embeddings, a set of numbers the AI can understand.
  2. Through the "Encoder" machine: These embeddings go into a part called the encoder, which has several layers.
  3. There is an "Attention Mechanism" system: While the data is in the encoder (and in the decoder, too), there is a special mechanism called Attention that helps the model know which words or parts of a sentence to focus on more. It's like reading a long text and highlighting only the main points.
  4. Forward to the "Decoder" section: The processed and understood data is sent to the decoder, which is structured like the encoder but is responsible for generating new text.
  5. Generate Results (Text): The decoder also uses the Attention Mechanism to select the most appropriate words, generating new text that makes sense and is consistent with the input.

This encoder-decoder architecture with attention is at the heart of many modern large language models, including the famous Transformer models that form the foundation of LLMs like ChatGPT.
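The "highlighting" in step 3 has a precise form: scaled dot-product attention, the formula from the Transformer paper ("Attention Is All You Need"). A minimal NumPy sketch:

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each query attends to each key
    weights = softmax(scores)        # each row sums to 1: the "highlighting"
    return weights @ V               # weighted mix of the value vectors

# Toy example: 3 tokens, embedding dimension 4.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(attention(Q, K, V).shape)  # (3, 4): one context-aware vector per token
```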

The next big step: when AI starts to "think" in concepts (Language Concept Models)

As mentioned, LLMs are good at predicting the next token from the previous one. But LCMs want to go beyond that.

"Concepts" here are not just words, but represent ideas, ideas, or essences at a higher level, and most importantly:

  • Language-agnostic: The concept of "happiness" is the same concept, whether we speak in Thai, English, or any other language.
  • Modality-agnostic: The concept of "cat" can come from a picture of a cat, the sound of a cat, or a text about cats.

LCMs work by processing embeddings that represent concepts, not just words.

How do LCMs work with SONAR Embeddings?

Models like SONAR, mentioned earlier, are specifically designed to transform sentences or texts into embeddings that effectively capture the concept behind the message.

The initial working process of an LCM based on SONAR might look like this:

  1. Convert to concept embeddings: Incoming sentences or data are converted by SONAR into embeddings that represent "concepts".
  2. Predict the next idea: The LCM model takes in the embeddings of these ideas and predicts what the next "idea" in the logical sequence should be.
  3. Convert concepts back into language: Once the embedding of the predicted idea is obtained, SONAR performs the reverse operation, converting the concept embedding into sentences or text that humans can understand.
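A minimal sketch of this three-step loop in Python. All three functions are hypothetical stubs standing in for the real SONAR encoder/decoder and the LCM predictor; they only show the shape of the pipeline:

```python
# Hypothetical LCM pipeline: encode sentences to concepts, predict the
# next concept, decode it back to text. The stubs below are NOT the real
# SONAR or LCM API; they just make the loop runnable.
import numpy as np

DIM = 8  # toy concept-embedding size; real concept embeddings are far larger

def encode_to_concept(sentence: str) -> np.ndarray:
    # Stub: a real system would run the SONAR text encoder here.
    rng = np.random.default_rng(abs(hash(sentence)) % 2**32)
    return rng.normal(size=DIM)

def predict_next_concept(concepts: list[np.ndarray]) -> np.ndarray:
    # Stub: a real LCM predicts the next concept from the whole sequence;
    # averaging is just a placeholder.
    return np.mean(concepts, axis=0)

def decode_from_concept(concept: np.ndarray) -> str:
    # Stub: a real system would run the SONAR decoder back to text.
    return f"<sentence decoded from a concept vector of norm {np.linalg.norm(concept):.2f}>"

concepts = [encode_to_concept(s) for s in ["I love reading.", "Books are my happiness."]]  # step 1
next_concept = predict_next_concept(concepts)                                              # step 2
print(decode_from_concept(next_concept))                                                   # step 3
```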

Working at this conceptual level helps AI better understand complex data sequences and think in abstract terms.

Advanced Techniques: Diffusion-Based LCMs with Concept Moderation

One interesting approach to developing LCMs is to adopt a technique called the diffusion model, which is famous for realistic AI image generation.

Imagine creating an AI image with Diffusion: it usually starts with an image with a lot of noise or blur and gradually reduces the noise until you get a sharp, realistic image.

This concept has been applied to LCMs by:

  • Instead of visual diffusion, LCMs work with conceptual embeddings.
  • The process is to gradually reduce the noise or uncertainty from the embeddings of "possible ideas".
  • The model starts with embeddings that are relatively vague and have high noise, and then refines them step by step until they represent the "most correct and clear concept" in that context.

Using a diffusion model allows LCMs to predict concepts more accurately and reliably. A popular design is a two-tower architecture: one tower is responsible for reducing noise (the denoiser), and the other acts as a "decoder" for the resulting idea.
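A toy sketch of this denoising idea in embedding space. The "denoiser" here cheats by knowing the target vector; in a real diffusion-based LCM it is a trained network that predicts the noise:

```python
# Start from a vague, noisy concept embedding and step toward the clean one.
import numpy as np

rng = np.random.default_rng(42)

target = np.array([1.0, -0.5, 0.25, 0.8])   # the "correct, clear concept"
x = target + rng.normal(scale=2.0, size=4)  # start: high-noise embedding

for step in range(10):
    predicted_noise = x - target            # stand-in for the learned denoiser
    x = x - 0.3 * predicted_noise           # remove a fraction of the noise
    print(step, round(float(np.linalg.norm(x - target)), 4))  # shrinks each step
```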

Why are Language Concept Models (LCMs) important and what can they be used for?

Moving from token-focused LLMs to concept-focused LCMs opens the door to exciting new AI capabilities:

  1. Think abstractly: AI is not just stuck with words, but understands the essence of the story, allowing it to analyze and process data more deeply, such as understanding the "intent" of the law or the "humor" hidden in dialogue.
  2. Manage complex data in a hierarchy: Just as the human brain organizes information into categories and logical relationships, LCMs can analyze complex data structures, such as summarizing the content of an entire book or analyzing the structure of arguments in discussions.
  3. Process long and varied data: By working at the conceptual level rather than looking at one token at a time, LCMs are better able to handle very long pieces of data. Being modality-agnostic also allows the AI to connect and understand ideas from different sources at the same time, whether text, audio, images, or even video.
  4. Get better at zero-shot content creation: With conceptual understanding, AI can generate the desired text or content without having seen many exact examples before, because it understands the "concept" of what we are asking for; for example, requesting a poem about "loneliness in a big city" without it ever having seen such a poem.
  5. Flexible in use: Being modality-agnostic makes LCMs a good foundation for AI systems that take multiple kinds of input and produce multiple kinds of output, meeting the needs of a wide range of applications in daily life and in business.

Conclusion from Insiderly

The emergence and development of Language Concept Models or LCMs is truly a major evolution of generative AI. We are transitioning from having AI that is adept at mimicking language patterns to AI that is starting to have the ability to "understand" and "think" on a deeper and more abstract level.

Working at this conceptual level not only makes AI better able to handle the complexity of language and data, but also paves the way for flexible AI systems that can learn and adapt to new forms of data more easily. The ability to link concepts across data modalities (modality-agnostic) will be key to creating AI that can interact with and assist humans in a real world filled with diverse data.

LCMs are a clear sign that AI is moving closer to understanding the world the way we do, leading to new applications and innovations we may not yet have imagined.


Technical terms to know

  • Token: The basic unit of language that an AI model processes; it can be a whole word, part of a word, or even a punctuation mark.
  • Embedding: The process of converting words, sentences, or concepts into vectors of numbers in a multidimensional space so that computers can analyze and compare their meanings.
  • Cosine Similarity: A measure of the similarity between two vectors based on the angle between them (values near 1 mean very similar, near -1 opposite, near 0 unrelated).
  • Encoder-Decoder Architecture: The encoder section is responsible for receiving and processing the input data, and the decoder section is responsible for generating the output data.
  • Attention Mechanism: A mechanism in an AI model that helps the model know which parts of the input data need to be "paid special attention" or focus on processing each step.
  • Multi-Headed Attention: Using multiple Attention Mechanisms works simultaneously so that the model can capture the relationships and importance of multiple dimensions and perspectives of information at once.
  • Diffusion Model: A generative model that works by gradually adding noise to the actual data until it becomes pure noise in the training stage, and learning how to "reduce noise" to convert noise back into real data in the creation stage.
  • Modality-Agnostic: The ability of AI models to work with a variety of data modalities, such as text, images, audio, or video, without being limited to one format only.
  • Zero-Shot Generation: An AI model's ability to generate content, answer questions, or follow instructions it has never (or very rarely) seen before, based on an understanding of the "concept" or "meaning" of the request.
  • SONAR: The name of a technology for creating embeddings that represent concepts. It is a key component in LCMs, converting ideas into vectors and back again.
