Prompt Engineering for RAG — From Retrieval to Generation

Episode 7 · 18 minutes

A Quick Recap

In the previous episode, we learned how to find the most relevant chunks using vector search, metadata filtering, and hybrid search. Now we have those chunks, but an important question remains: how do we feed them to the LLM to produce the best possible answer?

This is where Prompt Engineering for RAG comes into play. A good prompt makes the difference between a RAG system that gives accurate, useful answers and one that hallucinates.

System Prompt Structure for RAG

The System Prompt is the instruction that tells the LLM “who you are and how you should respond.” In RAG, this prompt must specify several important things:

system_prompt = """
You are a specialist assistant that only answers based on the provided information.

Rules:
1. Only use information from the "Context" section
2. If information is insufficient, honestly say "I don't have enough information to answer"
3. Cite each claim with a source number [1], [2], ...
4. Do not use general knowledge or guesses
5. Write the answer in a clear and understandable way
"""

Let’s break down each section.

Role

The first line specifies the LLM’s role. “Specialist assistant” is better than “AI” because it frames the LLM in a specific context. You can make the role more specific: “Specialist assistant for electronic product technical support.”

Constraints

Rules 1 and 4 are the most important parts. Without them, the LLM might fall back on its general knowledge and provide information that isn't in your data. This is called Hallucination, and it is the number one enemy of RAG.

Behavior When Information Is Missing

Rule 2 is very important. LLMs inherently want to provide answers — even when they don’t know! You must explicitly say “if you don’t know, say you don’t know.” We’ll discuss this more later.

Context Injection Patterns

Now you need to place retrieved chunks into the prompt. Several common patterns exist:

Simple Pattern — All Chunks Back-to-Back

prompt = f"""
Context:
{chunk_1}
{chunk_2}
{chunk_3}

Question: {user_question}

Answer:
"""

Simple but problematic: the LLM has no way to tell where each chunk came from or where one chunk ends and the next begins.

Numbered Pattern — With Clear Sources

prompt = f"""
Context:
[Source 1]: {chunk_1}
[Source 2]: {chunk_2}
[Source 3]: {chunk_3}

Question: {user_question}

Answer based on the sources above and cite the source number for each claim.
"""

This is much better. The LLM can write “According to [Source 1], …” in its answer and the user can verify the source.

Structured Pattern — With Metadata

prompt = f"""
Context:
---
Source: {source_1_title}
Date: {source_1_date}
Content: {chunk_1}
---
Source: {source_2_title}
Date: {source_2_date}
Content: {chunk_2}
---

Question: {user_question}
"""

This pattern gives the LLM more information. For example, if two sources conflict, the LLM can prefer the newer source based on date.
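
If your retriever returns chunks together with their metadata, you can build this block programmatically. Below is a minimal sketch; the "title", "date", and "content" keys are hypothetical field names you should adapt to whatever your own retriever returns:

def build_structured_context(chunks: list[dict]) -> str:
    # Render each chunk as a delimited block with its metadata.
    # The "title"/"date"/"content" keys are assumptions -- rename
    # them to match your retriever's actual output.
    blocks = []
    for chunk in chunks:
        blocks.append(
            "---\n"
            f"Source: {chunk['title']}\n"
            f"Date: {chunk['date']}\n"
            f"Content: {chunk['content']}"
        )
    return "\n".join(blocks) + "\n---"

context = build_structured_context(retrieved_chunks)
prompt = f"Context:\n{context}\n\nQuestion: {user_question}"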

Source Citations — Building User Trust

One of RAG's biggest advantages over a plain chatbot is that you can show the source of each answer. But this requires deliberate design.

Method 1 — Inline Citation

Like academic papers. Place source numbers within the answer text:

"To solve this issue, first restart the device [1]. 
If the problem persists, check network settings [2]."

Sources:
[1] Troubleshooting Guide, page 12
[2] Network Settings Documentation, section 3

Method 2 — Section-level Citation

Each answer paragraph has a source:

About installation: (Source: Installation Guide v2.1)
The installation steps include...

About configuration: (Source: Technical Docs v3.0)
Initial settings include...

Important Note

LLMs don’t always provide accurate citations. Sometimes they assign wrong source numbers. One solution is to add a verification step after answer generation that confirms the citations.
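
A cheap first-line check is to verify that every cited number actually corresponds to a source you sent. The sketch below only catches out-of-range citations; checking that a claim really matches its cited source needs a stronger step, such as a second LLM pass:

import re

def find_invalid_citations(answer: str, num_sources: int) -> set[int]:
    # Collect every [n] marker in the answer and flag numbers that
    # don't map to a source actually included in the prompt.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n for n in cited if n < 1 or n > num_sources}

invalid = find_invalid_citations(answer, num_sources=3)
if invalid:
    print(f"Warning: answer cites nonexistent sources: {invalid}")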

The Art of Saying “I Don’t Know”

This might be the most important part of RAG prompt engineering. The LLM must know when to answer and when to say “I don’t have enough information.”

Why Is This So Important?

Suppose you have a RAG system for medical documentation. A user asks a question whose answer isn’t in your docs. If the LLM answers from its general knowledge, it might provide incorrect or outdated information. In healthcare, this can be dangerous.

Techniques for Saying “I Don’t Know”

1. Similarity Threshold: If even the best search result has a low score, you can skip the LLM entirely and return "We don't have information about this" yourself (see the sketch after this list).

2. Explicit Prompt Instruction:

If the information provided in the "Context" section is not sufficient to answer,
write exactly: "Unfortunately, I don't have enough information to answer this question."
Never use your general knowledge.

3. Partial Answer: Sometimes part of the answer is in your data. Tell the LLM it can answer the known part and be honest about the unknown part:

If only part of the question can be answered, 
answer that part and specify which part lacked sufficient information.
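
Here is a minimal sketch of the similarity-threshold technique from item 1. It assumes your vector search returns (chunk, score) pairs sorted best-first with scores in [0, 1]; the 0.5 cutoff and the generate_answer helper are illustrative placeholders, not a real API:

SIMILARITY_THRESHOLD = 0.5  # illustrative; tune against your own data

def answer_or_refuse(results: list[tuple[str, float]], question: str) -> str:
    # Short-circuit before calling the LLM if even the best hit is weak.
    # Assumes results are sorted by score, best first.
    if not results or results[0][1] < SIMILARITY_THRESHOLD:
        return "Unfortunately, I don't have enough information to answer this question."
    chunks = [chunk for chunk, _ in results]
    return generate_answer(chunks, question)  # placeholder for your LLM call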

Few-shot Examples — Teaching by Example

One of the most powerful techniques is giving the LLM a few examples of the answers you expect:

system_prompt = """
Example 1:
Context: [Product X has been available since 2023 and comes with a 2-year warranty.]
Question: How long is the warranty for Product X?
Answer: According to available information [1], Product X comes with a 2-year warranty.

Example 2:
Context: [Product Y has a 6.5-inch display.]
Question: What's the price of Product Y?
Answer: Unfortunately, pricing information for Product Y was not found in the available sources.

Now answer in the same style:
"""

Few-shot Examples accomplish three important things:

  • They define the answer style and tone
  • They show how to add citations
  • They show how to say “I don’t know”

Prompt Template — Ready-Made Format

In practice, you typically create a template and replace variables each time:

from string import Template

RAG_TEMPLATE = Template("""
You are a specialist assistant in $domain.

Context:
$context

User question: $question

Instructions:
- Only use information from the "Context" section
- Cite each claim with its source number
- If information is insufficient, honestly say so
- Write the answer in clear, simple language

Answer:
""")

# Usage
prompt = RAG_TEMPLATE.substitute(
    domain="technical support",
    context=formatted_chunks,
    question=user_query
)

Note: You can also use libraries like LangChain or LlamaIndex, which ship with ready-made, customizable prompt templates.
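
For instance, roughly the same template expressed with LangChain might look like this (a sketch, assuming the langchain-core package is installed):

from langchain_core.prompts import PromptTemplate

rag_prompt = PromptTemplate.from_template(
    "You are a specialist assistant in {domain}.\n\n"
    "Context:\n{context}\n\n"
    "User question: {question}\n\n"
    "Answer:"
)

prompt = rag_prompt.format(
    domain="technical support",
    context=formatted_chunks,
    question=user_query,
)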

Common Mistakes in RAG Prompt Engineering

1. Too Much Context

If you dump 20 text chunks into the prompt, the LLM gets confused and might miss important information. This is called Lost in the Middle: LLMs typically understand the beginning and end of the context better than the middle.

Solution: Put the most important chunks first and limit the number of chunks (usually 3 to 7 is sufficient).
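
A minimal selection sketch, assuming your search returns (chunk, score) pairs:

MAX_CHUNKS = 5

def select_context(results: list[tuple[str, float]]) -> list[str]:
    # Keep only the strongest hits and order them best-first, so the
    # most important evidence lands at the start of the context.
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:MAX_CHUNKS]]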

2. Too Little Context

If you provide only one chunk, the information might be insufficient and the LLM is forced to guess.

3. Vague Instructions

“Try to use the provided information” is weak. “Only and exclusively use information from the Context section. Never use general knowledge” is stronger.

4. Ignoring Language

If your data is in one language but the prompt is in another, the LLM might respond in the wrong language. Explicitly specify the response language, for example: "Always answer in English, even if the sources are in another language."

A Complete, Ready-to-Use Prompt

Let me show you a complete prompt that follows all of these principles:

"""
You are an intelligent assistant that answers user questions based on documentation.

## Mandatory Rules
1. Only and exclusively answer from the "Context" section
2. Cite each claim with its relevant source number: [1], [2], ...
3. If information is insufficient: "Sufficient information to answer is not available in the documentation"
4. If only part can be answered, answer that part
5. Write the answer in clear, concise language
6. Do not use guesses, estimates, or general knowledge

## Context
[1] Title: {title_1}
Content: {content_1}

[2] Title: {title_2}
Content: {content_2}

[3] Title: {title_3}
Content: {content_3}

## Question
{user_question}

## Answer
"""

Summary

In this episode you learned:

  • What makes a good RAG System Prompt
  • Several different patterns for Context injection
  • How to teach the LLM to add citations
  • How to strengthen the “I don’t know” behavior
  • How effective Few-shot Examples are
  • What common mistakes are and how to avoid them

Now you have a RAG system that searches and generates answers. But how do you know if it works well? In the next episode, we’ll discuss RAG Evaluation — a topic many overlook but one that separates an amateur RAG from a professional one.