70+ AI terms explained in plain language. From foundational concepts like LLMs and transformers to emerging topics like vibe coding and context engineering.
AI systems that can autonomously plan, reason, and take actions to accomplish goals with minimal human intervention. Agentic AI often combines LLMs with tool use, memory, and decision-making capabilities.
A software system powered by AI that can perceive its environment, make decisions, and take actions to achieve specific objectives. AI agents can use tools, browse the web, write code, and interact with external services.
The research field focused on ensuring AI systems behave in ways that are consistent with human values and intentions. Alignment aims to make AI helpful, harmless, and honest.
The interdisciplinary field dedicated to ensuring AI systems do not cause unintended harm. AI safety encompasses technical research, policy, and practices to mitigate risks from both current and future AI systems.
Application Programming Interface. A set of protocols and tools that allows software applications to communicate with each other. AI APIs let developers send prompts and receive model responses programmatically.
A neural network technique that allows models to focus on the most relevant parts of the input when producing output. Attention is the core building block of the Transformer architecture and enables models to capture long-range dependencies in text.
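The core computation is easy to sketch. This is a toy, single-query version of scaled dot-product attention in plain Python (real implementations are batched, multi-headed, and run on GPUs):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.
    keys and values are lists of vectors of equal length."""
    d = len(query)
    # Score each key by its dot product with the query, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# The query matches the first key, so the output leans toward the first value.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

Because the query aligns with the first key, most of the attention weight (and hence most of the output) comes from the first value vector.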
The number of training examples processed together in one forward and backward pass during model training. Larger batch sizes can speed up training but require more memory, while smaller batch sizes may improve generalization.
A standardized test or dataset used to evaluate and compare the performance of AI models on specific tasks. Benchmarks provide objective metrics that help researchers and users understand model capabilities.
Systematic errors or unfair tendencies in AI outputs that reflect prejudices present in training data or model design. Bias can lead to discriminatory outcomes across dimensions such as race, gender, and socioeconomic status.
A prompting technique that encourages AI models to break down complex problems into intermediate reasoning steps before arriving at a final answer. This approach significantly improves performance on math, logic, and multi-step tasks.
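In practice, chain-of-thought is just a matter of how the prompt is phrased. A minimal sketch (the question and wording are illustrative, not a specific provider's format):

```python
question = "A train travels 60 miles in 1.5 hours. What is its average speed?"

# A chain-of-thought prompt asks the model to show intermediate
# reasoning steps before committing to a final answer.
cot_prompt = (
    f"Question: {question}\n"
    "Think through the problem step by step, showing each intermediate "
    "calculation, then give the final answer on its own line as 'Answer: ...'."
)
```

The same question asked directly often fails on multi-step arithmetic; asking for the steps first is what improves accuracy.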
A software application that simulates human conversation through text or voice interactions. Modern AI chatbots are typically powered by large language models and can handle a wide range of questions and tasks.
The use of AI models to automatically write, complete, or translate programming code based on natural language descriptions or partial code. Modern LLMs can generate code in dozens of programming languages.
A training approach developed by Anthropic where AI models are guided by a set of principles (a constitution) to self-critique and revise their outputs. This method reduces harmful responses without requiring extensive human feedback on every example.
Automated systems that screen AI inputs and outputs to block harmful, inappropriate, or policy-violating content. Content filters are a key safety layer applied on top of model behavior.
The practice of carefully designing and structuring the information provided to an AI model within its context window to maximize output quality. Context engineering goes beyond prompt engineering to include managing system prompts, retrieved documents, and conversation history.
The maximum amount of text (measured in tokens) that an AI model can process in a single interaction, including both the input prompt and the generated output. Larger context windows allow models to handle longer documents and conversations.
An AI assistant designed to work alongside humans, augmenting their capabilities rather than replacing them. Copilots are common in coding, writing, and productivity tools, offering suggestions while keeping the human in control.
Techniques for artificially expanding training datasets by creating modified versions of existing data. In NLP, this can include paraphrasing, back-translation, or synonym replacement to improve model robustness.
A technique where a smaller student model is trained to replicate the behavior of a larger teacher model. Distillation produces compact models that retain much of the larger model's capability while being faster and cheaper to run.
Running AI models directly on local devices such as phones, laptops, or embedded systems rather than in the cloud. Edge AI reduces latency, improves privacy, and enables offline operation.
A numerical representation of text, images, or other data as a vector of numbers that captures semantic meaning. Similar concepts have embeddings that are close together in vector space, enabling search, clustering, and recommendation systems.
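"Close together in vector space" is usually measured with cosine similarity. A minimal sketch with toy 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: related concepts point the same way.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]

# "cat" is closer to "kitten" than to "car" in this space.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```

Semantic search works by embedding the query the same way and returning the documents whose vectors score highest against it.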
A prompting technique where a small number of example input-output pairs are included in the prompt to guide the model's behavior. Few-shot learning helps models understand the desired format and style without any fine-tuning.
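A few-shot prompt is simply the examples and the new input concatenated in a consistent format. A sketch for sentiment classification (the labels and layout are illustrative):

```python
examples = [
    ("The movie was fantastic!", "positive"),
    ("Terrible service, never again.", "negative"),
]

def few_shot_prompt(examples, new_input):
    """Assemble labelled example pairs plus the new input into one prompt."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # End mid-pattern so the model completes the missing label.
    lines.append(f"Review: {new_input}\nSentiment:")
    return "\n".join(lines)

prompt = few_shot_prompt(examples, "Pretty good overall.")
```

The trailing "Sentiment:" is the trick: the model continues the pattern the examples established, so it answers in the same format without any fine-tuning.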
The process of further training a pre-trained model on a specific dataset to specialize it for a particular task or domain. Fine-tuning adjusts the model's weights to improve performance on targeted use cases while leveraging knowledge from pre-training.
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks. Foundation models like GPT, Claude, and Gemini serve as the base for chatbots, coding assistants, and many other applications.
A capability that allows AI models to generate structured requests to invoke external functions or APIs based on user input. Function calling enables models to perform actions like searching databases, making calculations, or interacting with services.
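The application's side of function calling is parsing the model's structured request and dispatching it. A sketch with a stand-in `get_weather` tool; the JSON shape shown is illustrative, as the exact format varies by provider:

```python
import json

def get_weather(city):
    """Stand-in for a real weather lookup (hypothetical tool)."""
    return {"city": city, "temp_c": 21}

# Registry mapping tool names the model may request to real functions.
TOOLS = {"get_weather": get_weather}

# A model with function calling emits a structured request like this
# instead of free-form text.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
```

The result is then usually fed back to the model in a follow-up turn so it can answer the user in natural language.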
Generative Pre-trained Transformer. A family of large language models developed by OpenAI that generate text by predicting the next token. GPT models, including GPT-4 and GPT-4o, are among the most widely used AI models.
Graphics Processing Unit. A specialized processor originally designed for rendering graphics that is now essential for training and running AI models. GPUs excel at the parallel matrix computations that neural networks require.
Safety mechanisms and constraints built into AI systems to prevent harmful, biased, or undesirable outputs. Guardrails can include input validation, output filtering, topic restrictions, and behavioral guidelines.
When an AI model generates information that sounds plausible but is factually incorrect or entirely fabricated. Hallucinations are a major challenge for LLMs and can include invented citations, false statistics, or fictional events.
A benchmark developed by OpenAI for evaluating AI models on code generation tasks. It consists of programming problems that test a model's ability to write correct Python functions from docstrings.
The process of using a trained AI model to generate predictions or outputs from new inputs. Inference is what happens when you send a prompt to an AI model and receive a response.
An adversarial technique that attempts to bypass an AI model's safety guidelines and restrictions to produce prohibited or harmful content. Jailbreaks exploit vulnerabilities in model training or prompt processing.
The time delay between sending a request to an AI model and receiving the first token of a response. Lower latency is critical for real-time applications like chatbots and interactive coding assistants.
A hyperparameter that controls how much a model's weights are adjusted during each step of training. A learning rate that is too high can cause unstable training, while one that is too low results in slow convergence.
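Both failure modes are visible even on a one-variable toy problem. Minimising f(x) = x² with plain gradient descent:

```python
def gradient_descent(lr, steps=50, x=5.0):
    """Minimise f(x) = x**2; the learning rate lr scales each update."""
    for _ in range(steps):
        grad = 2 * x       # derivative of x**2
        x = x - lr * grad
    return x

good = gradient_descent(lr=0.1)      # shrinks x toward the minimum at 0
too_high = gradient_descent(lr=1.1)  # each update overshoots; x blows up
```

With lr=0.1 each step multiplies x by 0.8, so it converges; with lr=1.1 each step multiplies x by -1.2, so the iterates oscillate with growing magnitude, which is exactly the unstable-training behavior described above.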
Large Language Model. A type of AI model trained on vast amounts of text data that can understand and generate human language. LLMs power chatbots, coding assistants, and many other AI applications.
Massive Multitask Language Understanding. A widely used benchmark that tests AI models across 57 academic subjects including math, history, law, and science. MMLU scores are a common way to compare model knowledge and reasoning.
The infrastructure and processes involved in deploying a trained AI model so it can accept requests and return predictions in production. Model serving includes load balancing, scaling, and optimizing for latency and cost.
An architecture where a model contains multiple specialized sub-networks (experts) and a gating mechanism that routes each input to only a subset of experts. MoE allows models to have more total parameters while keeping inference costs manageable.
AI models or systems that can process and generate multiple types of data, such as text, images, audio, and video. Multimodal models like GPT-4o and Gemini can understand images and produce text descriptions, or generate images from text.
An NLP task that identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, and dates. NER is widely used in information extraction and document processing.
An AI model whose trained weights are publicly released, allowing anyone to download, use, and modify it. Open-weight models like Llama and Mistral enable local deployment and customization but may not include training data or code.
When a model learns the training data too well, including its noise and quirks, and fails to generalize to new unseen data. Overfitting results in high training accuracy but poor real-world performance.
A learnable value within a neural network that is adjusted during training to improve model performance. The number of parameters in a model, often measured in billions, is a rough indicator of its capacity and complexity.
Additional training steps applied after pre-training to refine model behavior, including instruction tuning, RLHF, and safety training. Post-training transforms a raw language model into a helpful, safe assistant.
The initial phase of training an AI model on a large, diverse dataset to learn general language patterns and knowledge. Pre-training typically involves predicting the next token across billions of text documents.
A technique where the output of one AI prompt is used as input for another, creating a sequence of steps to accomplish complex tasks. Prompt chaining breaks down difficult problems into manageable sub-tasks.
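A two-step chain can be sketched as two calls where the first output becomes part of the second prompt. `call_model` here is a hypothetical stub standing in for any real model API:

```python
def call_model(prompt):
    """Hypothetical stand-in for a real model API call; it just echoes
    a canned response so the chaining structure is runnable."""
    return f"[model response to: {prompt[:40]}...]"

article = "Long article text..."

# Step 1: summarise the article.
summary = call_model(f"Summarise this article in two sentences:\n{article}")

# Step 2: feed the summary (step 1's output) into the next prompt.
questions = call_model(f"Write three quiz questions about this summary:\n{summary}")
```

Each step gets a small, focused prompt, which tends to be more reliable than asking for the whole pipeline in one shot.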
The practice of crafting effective instructions and inputs to get the best possible outputs from AI models. Good prompt engineering involves clear instructions, relevant context, examples, and structured formatting.
A technique that reduces the precision of a model's numerical weights, for example from 32-bit to 8-bit or 4-bit numbers, to decrease memory usage and speed up inference. Quantization enables large models to run on consumer hardware with minimal quality loss.
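The basic idea of symmetric 8-bit quantization fits in a few lines: pick a scale so the largest weight maps to 127, round everything to integers, and multiply back by the scale to use the weights. A toy sketch over a plain list of floats:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight now takes 1 byte instead of 4, and the round-trip error is bounded by half the scale, which is why the quality loss is usually small.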
Retrieval-Augmented Generation. A technique that enhances AI model responses by first retrieving relevant information from external knowledge sources, then including that information in the prompt. RAG reduces hallucinations and keeps responses grounded in up-to-date facts.
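The retrieve-then-prompt loop can be sketched end to end. This toy retriever ranks documents by word overlap with the query; production RAG systems use embedding similarity over a vector database instead:

```python
docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "The Great Wall of China is over 21,000 km long.",
]

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query; keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

query = "How tall is the Eiffel Tower?"
context = "\n".join(retrieve(query, docs))

# The retrieved passage is placed in the prompt so the answer is
# grounded in it rather than in the model's parametric memory.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Because the answer must come from the supplied context, the model cites the retrieved fact instead of guessing, which is how RAG reduces hallucinations.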
A restriction on the number of API requests a user or application can make within a given time period. Rate limits protect AI services from overload and ensure fair usage across customers.
The practice of systematically testing AI systems by attempting to find vulnerabilities, biases, and failure modes through adversarial prompting. Red teaming helps identify safety issues before models are deployed to the public.
Reinforcement Learning from Human Feedback. A training technique where human evaluators rank model outputs to create a reward signal that guides the model toward more helpful and harmless behavior. RLHF is a key step in making LLMs safe and useful.
The use of AI to identify and classify the emotional tone or opinion expressed in text, such as positive, negative, or neutral. Sentiment analysis is widely used in social media monitoring, customer feedback, and market research.
AI model responses that follow a specific data format such as JSON, XML, or a defined schema rather than free-form text. Structured output makes it easier to integrate AI responses into software applications programmatically.
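A common pattern is to pin down an exact JSON shape in the prompt, then parse and validate whatever comes back. The model reply below is illustrative, not a real API response:

```python
import json

# The prompt specifies the exact JSON shape expected back.
prompt = (
    "Extract the person and city from the sentence below. "
    'Respond with only JSON: {"person": string, "city": string}\n'
    "Sentence: Ada Lovelace lived in London."
)

# A well-behaved model reply (hypothetical):
reply = '{"person": "Ada Lovelace", "city": "London"}'

data = json.loads(reply)              # fails loudly if the format broke
assert set(data) == {"person", "city"}  # validate the expected keys
```

Parsing plus validation is what makes the response safe to hand to downstream code; many APIs now also offer schema-enforced output modes that guarantee well-formed JSON.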
The task of condensing longer text into a shorter version while preserving the key information and meaning. AI-powered summarization can handle documents, articles, meeting transcripts, and conversations.
A benchmark that evaluates AI models on real-world software engineering tasks drawn from GitHub issues and pull requests. SWE-bench measures a model's ability to understand codebases and produce working fixes for actual bugs.
Artificially generated data created by AI models or algorithms rather than collected from real-world sources. Synthetic data is increasingly used to train and fine-tune AI models when real data is scarce, expensive, or privacy-sensitive.
A special instruction provided to an AI model that sets its behavior, persona, and constraints for an entire conversation. System prompts are typically hidden from the end user and define how the model should respond.
A parameter that controls the randomness of an AI model's output. Lower temperatures produce more focused and deterministic responses, while higher temperatures increase creativity and variation but may reduce accuracy.
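Mechanically, temperature divides the model's logits before the softmax that turns them into token probabilities. A small sketch showing the effect:

```python
import math

def token_distribution(logits, temperature):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = token_distribution(logits, temperature=0.2)  # sharply peaked
hot = token_distribution(logits, temperature=2.0)   # much flatter
```

At low temperature nearly all the probability mass sits on the top token (near-deterministic output); at high temperature the distribution flattens, so less likely tokens get sampled more often.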
AI models that generate images from natural language text descriptions. Popular text-to-image models include DALL-E, Midjourney, and Stable Diffusion, which can create realistic photos, illustrations, and artwork from prompts.
AI models that generate video content from natural language descriptions. Text-to-video represents a frontier in generative AI, with models like Sora producing increasingly realistic video clips from text prompts.
The number of requests or tokens an AI system can process per unit of time. High throughput is essential for serving many users simultaneously and is a key metric when evaluating AI infrastructure.
The basic unit of text that AI models process, typically representing a word, subword, or character. A single word may be split into multiple tokens; as a rough rule of thumb, one token is about 3-4 characters of English text, or roughly three-quarters of a word.
The process of breaking text into tokens that an AI model can process. Different models use different tokenization schemes, which affect how efficiently they handle various languages and special characters.
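A toy greedy longest-match tokenizer shows the mechanics. The vocabulary here is hand-picked for illustration; real schemes like BPE or WordPiece learn tens of thousands of subword pieces from data:

```python
# Hand-picked toy vocabulary (real tokenizers learn theirs from a corpus).
VOCAB = {"un", "break", "able", "unbreak", "ing", "read"}

def tokenize(word, vocab):
    """Greedy longest-match subword tokenization."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

tokens = tokenize("unbreakable", VOCAB)  # splits into subword pieces
```

The same word can split very differently under different vocabularies, which is why tokenizer choice affects how efficiently a model handles rare words and non-English text.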
The ability of AI models to interact with external tools and services such as web browsers, code interpreters, calculators, and APIs to accomplish tasks. Tool use is a key capability that enables agentic AI behavior.
Decoding strategies that control which tokens the model considers when generating text. Top-p (nucleus sampling) selects from the smallest set of tokens whose cumulative probability exceeds p, while top-k limits selection to the k most likely tokens.
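Both filters are short to implement over a token probability distribution. A sketch using plain Python lists (indices stand in for token IDs):

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalise."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of top tokens whose
    cumulative probability reaches p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]
```

With these probabilities, top-k with k=2 keeps tokens 0 and 1, while top-p with p=0.9 keeps tokens 0, 1, and 2; unlike top-k, the nucleus adapts its size to how peaked the distribution is.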
Tensor Processing Unit. A custom AI accelerator chip designed by Google specifically for machine learning workloads. TPUs are optimized for the matrix operations used in training and running neural networks.
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequences of data in parallel. Transformers are the foundation of virtually all modern large language models including GPT, Claude, Gemini, and Llama.
A style of programming where developers describe what they want in natural language and let AI generate the code, focusing on the creative direction rather than writing code manually. Vibe coding lowers the barrier to building software and is popular for prototyping.
A technique where an AI model performs a task without being given any examples in the prompt, relying entirely on its pre-trained knowledge. Zero-shot performance indicates how well a model generalizes to new tasks out of the box.