AI models are trained on data up to a certain cutoff date and have no access to your company's internal data. RAG (retrieval-augmented generation) fixes this by letting the AI look relevant information up before it answers, like a researcher who checks the files first.
💡 Think of a brilliant consultant who doesn't know your company yet. Before answering, they quickly read your policy documents and last quarter's reports, then give you a precise, relevant answer. Without RAG, they'd only speak in generalities.
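The retrieve-then-answer pattern can be sketched in a few lines. This is a toy illustration, not a real RAG system: the documents are invented, and the "retrieval" is naive keyword overlap standing in for the vector search a production setup would use.

```python
import re

# Toy RAG sketch: retrieve the most relevant documents first,
# then build a prompt that grounds the model's answer in them.
# Documents and scoring are illustrative, not a real API.

def retrieve(query, documents, top_k=2):
    """Rank documents by naive keyword overlap with the query
    (a stand-in for real vector-similarity search)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in documents:
        doc_words = set(re.findall(r"\w+", doc.lower()))
        scored.append((len(query_words & doc_words), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(query, context_docs):
    """Augment the user's question with retrieved context
    before it is sent to the model."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Remote work policy: employees may work remotely up to 3 days per week.",
    "Q3 report: revenue grew 12 percent over the previous quarter.",
    "Office hours: the building is open from 7am to 9pm on weekdays.",
]
question = "How many days can I work remotely?"
prompt = build_prompt(question, retrieve(question, docs))
```

Without the retrieval step, the model would only see the bare question; with it, the prompt carries the company policy the answer depends on.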
A prompt is the instruction you give to an AI. Prompt engineering is the skill of writing those instructions well, because how you ask something dramatically changes what you get back.
💡 Think of briefing a contractor. "Fix my house" gets you a puzzled look. "Repaint the kitchen walls in off-white, avoid the skirting boards, finish by Friday" gets you exactly what you wanted. AI works the same way.
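The contractor briefing above maps onto a common way of structuring prompts: state a role, a task, constraints, and an output format. The field names below are one widespread convention, not an official standard, and the example values are invented.

```python
# Assemble a specific, well-scoped prompt from its parts.
# The role/task/constraints/format breakdown is one common
# convention for prompt structure, not a formal API.

def build_prompt(role, task, constraints, output_format):
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Constraints: {constraints}\n"
        f"Output format: {output_format}"
    )

vague = "Summarize this report."  # the "Fix my house" of prompts

specific = build_prompt(
    role="a financial analyst",
    task="summarize the attached Q3 report",
    constraints="focus on revenue and costs; ignore HR updates",
    output_format="3 bullet points in plain English",
)
```

The second prompt tells the model who to be, what to do, what to leave out, and what shape the answer should take, which is exactly the information the vague version withholds.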
AI doesn't read your text word by word; it first breaks everything into tokens: small chunks of characters representing commonly occurring sequences in language. A token can be a full word, part of a word, a punctuation mark, or a single character. Common words like "is" or "the" = 1 token. "Blueberries" = 2 tokens: "blue" + "berries". Emojis typically cost 2–3 tokens each. These tokens feed into the context window: how much text the AI can "see" and consider at one time during a conversation. Everything has to fit inside this window: your question, the AI's previous answers, and any documents you've shared. Once you exceed it, the oldest content starts to fall out, like items pushed off the end of a conveyor belt.
💡 Think of it like the whiteboard in a meeting room. You can only fit so much on it. When it gets full, you have to erase the earlier notes to make room for new ones, and the AI, like you, can no longer refer back to what was erased. The bigger the whiteboard, the more context the AI can hold at once.
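The conveyor-belt eviction can be sketched directly. Real models count subword tokens (as in the "blue" + "berries" split above); to keep this self-contained, the sketch approximates one token per whitespace-separated word, and the conversation snippets are invented.

```python
# Toy sketch of context-window trimming. A real tokenizer splits
# text into subword tokens; counting whitespace-separated words
# here is a rough approximation for illustration only.

def count_tokens(text):
    """Very rough stand-in for a real tokenizer."""
    return len(text.split())

def trim_to_window(messages, max_tokens):
    """Drop the oldest messages until the conversation fits the
    window, like erasing the earliest notes on a full whiteboard."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # oldest content falls out first
    return kept

history = [
    "User: What is our remote work policy?",
    "AI: Employees may work remotely up to three days per week.",
    "User: And how do I request an exception?",
]
fitted = trim_to_window(history, max_tokens=20)
```

After trimming, the first question has fallen off the "whiteboard": the model can still see its own earlier answer and the new question, but can no longer refer back to what was erased.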