I love LLMs
Pradip Wasre

NLP Explorer

Day 4: Exploring the Evolution of AI and LLMs

January 3, 2025 Week 1

4.1 The Journey of AI: From Early Models to Transformers

The journey of artificial intelligence has been a fascinating one, with transformative breakthroughs shaping its progress. One of the most pivotal moments came in 2017, when Google scientists published the seminal paper “Attention Is All You Need”. This paper introduced the Transformer model architecture, which revolutionized natural language processing and paved the way for modern large language models (LLMs).

The Rise of Transformers

  • 2017: Introduction of the Transformer architecture by Google.

  • 2018: GPT-1, the first Generative Pre-trained Transformer, introduced with 117 million parameters.

  • 2019: GPT-2, significantly larger, with 1.5 billion parameters.

  • 2020: GPT-3, boasting 175 billion parameters, achieved extraordinary performance.

  • 2022: Reinforcement Learning from Human Feedback (RLHF) and ChatGPT made conversational AI more user-friendly.

  • 2023: GPT-4, reportedly with over a trillion parameters, took AI capabilities to new heights.

The World’s Reaction

  • Initial Shock: ChatGPT surprised even seasoned practitioners with its versatility and fluency.

  • Healthy Skepticism: Many dismissed it as merely “predictive text on steroids” or a “stochastic parrot.”

  • Emergent Intelligence: As models scaled, unexpected capabilities emerged, demonstrating reasoning, creativity, and even problem-solving.

Key Developments Along the Way

  • Prompt Engineers: The rise (and possible decline) of roles specializing in crafting effective prompts for LLMs.

  • Custom GPTs: GPT-powered applications tailored for specific user needs.

  • Copilots: Tools like Microsoft Copilot and GitHub Copilot revolutionized productivity.

  • Agentization: Emerging use cases, such as GitHub Copilot Workspace, where LLMs act as autonomous agents.


4.2 Understanding LLM Parameters: From GPT-1 to Trillion-Weight Models

The power of LLMs lies in their parameters – the numerical weights that determine how the model processes data. Over the years, the number of parameters has increased dramatically, leading to greater capabilities.

Parameter Scale of Key Models

  • GPT-1: 117 million parameters.

  • GPT-2: 1.5 billion parameters.

  • GPT-3: 175 billion parameters.

  • GPT-4: reportedly around 1.76 trillion parameters (unconfirmed).

  • Latest Frontier Models: rumored to exceed 10 trillion parameters (specific details undisclosed).

Other Notable Models

  • Gemma: 2 billion parameters.

  • Llama 3.1: Ranges from 8 billion to 405 billion parameters, depending on the version.

  • Mixtral: 140 billion parameters.

These massive models enable unprecedented performance but come with challenges in terms of cost and computational requirements.


4.3 GPT Tokenization Explained

Large language models process text input by breaking it down into smaller chunks called tokens. Tokens can be as small as a single character or as large as a word or phrase.

The Evolution of Tokenization

  1. Character-Level Models: Early neural networks predicted the next character in a sequence. This approach was computationally expensive and limited in scope.
  2. Word-Level Models: Predicting the next word in a sequence improved efficiency but resulted in massive vocabularies.
  3. Tokens: A middle ground, where words and subwords (stems, prefixes, or suffixes) are broken into manageable chunks. This method balances efficiency and vocabulary size.

Example

For the sentence, “An important sentence for my class of AI Engineers”:

  • Tokens = 9

  • Characters = 50

Tokens capture the structure of text while reducing complexity, allowing the neural network to process language more effectively.
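Here is a minimal sketch of counting tokens in Python with tiktoken, OpenAI's open-source tokenizer library. The encoding name is one assumption among several available, and exact token counts can differ slightly between encoders.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4 era models

sentence = "An important sentence for my class of AI Engineers"
tokens = enc.encode(sentence)

print(f"Characters: {len(sentence)}")        # 50
print(f"Tokens:     {len(tokens)}")          # roughly 9, depending on the encoding
print([enc.decode([t]) for t in tokens])     # the individual token strings
```

Running this on your own prompts is a quick way to build intuition for how much "space" a piece of text occupies in a model's context window.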


4.4 How Context Windows Impact LLMs: Token Limits Explained

The context window of an LLM is the maximum number of tokens it can consider when generating the next token. This includes:

  • The original input prompt.

  • Subsequent conversation turns.

  • The model’s output.

Why Context Windows Matter

  • Reference and Memory: Models use the context window to remember references and maintain continuity in a conversation.

  • Multi-Shot Prompting: Useful for providing examples within a prompt.

  • Long-Form Content: Critical for processing extensive texts, like analyzing Shakespeare’s complete works.

As models expand their context windows, they can handle more complex tasks and retain better conversational coherence.
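To make this concrete, here is a minimal sketch (not any particular library's API) of how a chat application might trim older conversation turns so that the prompt plus the expected reply stays within the context window. The window size and output budget are assumed values, and token counting is approximated with a word split; a real implementation would use the model's tokenizer.

```python
CONTEXT_WINDOW = 8_192       # assumed context window, in tokens
MAX_OUTPUT_TOKENS = 1_000    # budget reserved for the model's reply

def count_tokens(text: str) -> int:
    # Rough approximation; real systems count tokens with the model's tokenizer.
    return len(text.split())

def trim_history(messages: list[str]) -> list[str]:
    """Drop the oldest turns until the remaining history fits the budget."""
    budget = CONTEXT_WINDOW - MAX_OUTPUT_TOKENS
    kept: list[str] = []
    used = 0
    for message in reversed(messages):   # keep the most recent turns first
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "You are a helpful assistant.",
    "User: Summarise Hamlet in three sentences.",
    "Assistant: Hamlet is a tragedy about ...",
]
print(trim_history(history))
```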


4.5 Navigating AI Model Costs: API Pricing vs. Chat Interface Subscriptions

Chat Interfaces

  • Typically offer a Pro Plan with a monthly subscription.

  • Usage is rate-limited, but you are not charged per request.

APIs

  • No subscription fees but charge per API call.

  • Costs depend on the number of input tokens (your prompt) and output tokens (the model’s response).

Understanding pricing structures is essential for businesses leveraging AI at scale, balancing cost and performance.
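As a quick illustration, here is a rough per-token cost estimator. The prices below are placeholder assumptions, not any provider's actual rates; always check the provider's pricing page for current figures.

```python
PRICE_PER_1M_INPUT = 2.50    # USD per million input tokens (assumed)
PRICE_PER_1M_OUTPUT = 10.00  # USD per million output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call from its token counts."""
    return (input_tokens / 1_000_000) * PRICE_PER_1M_INPUT + \
           (output_tokens / 1_000_000) * PRICE_PER_1M_OUTPUT

# Example: a 1,200-token prompt and a 400-token response
print(f"${estimate_cost(1_200, 400):.4f}")   # ≈ $0.0070
```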


4.6 Wrapping Up Day 4: Key Takeaways and Practical Insights

Here’s what we’ve learned so far:

  • Transformers: Revolutionized AI with their ability to handle sequences effectively.

  • LLM Parameters: Scaling up parameters unlocks emergent capabilities but increases complexity and cost.

  • Tokenization: Enables efficient text processing and understanding.

  • Context Windows: Govern how much information an AI model can retain in a single interaction.

  • API Costs vs. Subscriptions: Each pricing model has its pros and cons depending on the use case.

What You Can Do Now

  • Write code to call OpenAI or Ollama APIs (a minimal sketch follows below).

  • Compare the six leading frontier LLMs.

  • Discuss key topics like transformers, tokens, context windows, and API costs.
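Here is a minimal sketch of calling an LLM via the OpenAI Python SDK. Ollama exposes an OpenAI-compatible endpoint, so the same client can point at a local model by swapping the base URL and model name; the model names and local URL shown are assumptions you should adjust to whatever you have available.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
# For a local Ollama model instead:
# client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # or e.g. "llama3.1" when pointing at Ollama
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what a context window is in one sentence."},
    ],
)

print(response.choices[0].message.content)
```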

AI’s journey has been extraordinary, and its future promises even more groundbreaking advancements. Stay tuned as we explore what’s next in the world of AI!
