
AI Interview Series 12: How to Optimize Prompts?

Prompt tuning (Prompt Engineering / Optimization) is a key skill for making large language models "obedient." Especially in RAG systems, it directly determines whether the model faithfully adheres to retrieved content, avoids hallucinations, and follows output format specifications.


1. Core Principles of Prompt Tuning

  1. Clarity > Complexity: Simple, direct instructions are often more effective than fancy chain-of-thought.
  2. Give Sufficient Constraints: Clearly tell the model "what it can and cannot do."
  3. Provide Examples: Few-shot is more stable than zero-shot.
  4. Verifiability: Have the model output citations or confidence levels for downstream judgment.
  5. Iterative Optimization: Start from a baseline, change only one variable at a time, and compare results.

2. Specific Tuning Techniques (From Easy to Hard)

1. Role Setting (System Prompt)

You are a professional customer service assistant. You can only answer questions based on the [Reference Materials] provided below.
If you don't know the answer, say "No relevant information in the materials" and do not make up anything.
  • Effect: Sets boundaries and tone.
  • Tuning Points: Tone (professional/friendly), constraint strength (strict/loose).
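These tuning points can be treated as parameters rather than hand-edited strings. A minimal sketch, assuming a hypothetical `build_system_prompt` helper; the tone options and exact wording are illustrative, not a fixed API:

```python
# Sketch: parameterize the system prompt by tone and constraint strength,
# so A/B variants differ in exactly one variable at a time.
def build_system_prompt(tone: str = "professional", strict: bool = True) -> str:
    tone_line = {
        "professional": "You are a professional customer service assistant.",
        "friendly": "You are a friendly customer service assistant.",
    }[tone]
    if strict:
        constraint = (
            "You can only answer questions based on the [Reference Materials] "
            "provided below. If you don't know the answer, say "
            '"No relevant information in the materials" and do not make up anything.'
        )
    else:
        constraint = (
            "Prefer the [Reference Materials] below; if they are insufficient, "
            "say so explicitly before adding general knowledge."
        )
    return f"{tone_line} {constraint}"

print(build_system_prompt())
```

Keeping tone and strictness as separate parameters makes the "change only one variable at a time" principle from section 1 easy to enforce.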

2. Clear Instructions

❌ Bad: "Answer the user's question."
✅ Good: "Answer based only on the [Reference Materials] below. If the materials do not contain the answer, respond with 'I cannot answer this question.'"

3. Output Format Control

Please output in the following JSON format:
{
  "answer": "Your answer",
  "confidence": "High/Medium/Low",
  "sources": [1, 3]
}
  • Use: Facilitates downstream parsing, citation, and debugging.
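Downstream code should never trust that the model actually produced this schema. A minimal validation sketch (the function name and failure behavior are assumptions for illustration):

```python
import json

REQUIRED_KEYS = {"answer", "confidence", "sources"}

def parse_model_json(raw: str):
    """Parse the model's reply; return the dict if it matches the
    expected schema, or None so the caller can retry or fall back."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    if data["confidence"] not in {"High", "Medium", "Low"}:
        return None
    return data
```

Returning `None` instead of raising keeps the retry/fallback logic in one place in the calling code.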

4. Few-shot Examples (Very Effective)

Example 1:
Question: How many days of annual leave?
Reference Material: Annual leave rules: 5 days for 1 year, 10 days for 10 years.
Answer: 5 days for 1 year, 10 days for 10 years.

Example 2:
Question: How is overtime pay calculated?
Reference Material: 1.5 times for weekdays, 2 times for weekends.
Answer: 1.5 times for weekdays, 2 times for weekends.

Now answer:
Question: {User question}
Reference Material: {Retrieved content}
Answer:
  • Tip: Examples should cover different difficulty levels, and it's best to include one that shows "cannot answer."
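Assembling this template programmatically keeps the examples in data rather than hard-coded in the prompt string. A sketch, assuming a hypothetical `build_few_shot_prompt` helper and a simple list-of-dicts example store:

```python
def build_few_shot_prompt(examples, question, material):
    """Render few-shot examples plus the live question into one prompt.
    examples: list of {'q': ..., 'm': ..., 'a': ...} dicts."""
    parts = []
    for i, ex in enumerate(examples, 1):
        parts.append(
            f"Example {i}:\nQuestion: {ex['q']}\n"
            f"Reference Material: {ex['m']}\nAnswer: {ex['a']}\n"
        )
    parts.append(
        f"Now answer:\nQuestion: {question}\n"
        f"Reference Material: {material}\nAnswer:"
    )
    return "\n".join(parts)
```

With examples stored as data, adding the recommended "cannot answer" demonstration is just one more dict in the list.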

5. Mandatory Citation

At the end of the answer, mark the source number with [citation:X]. For example: "Annual leave is 5 days[citation:1]."
If combining multiple sources, label them separately.
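The `[citation:X]` markers are only useful if downstream code can extract them. A small sketch using a regex (the function name is an assumption):

```python
import re

CITATION_RE = re.compile(r"\[citation:(\d+)\]")

def extract_citations(answer: str) -> list[int]:
    """Return the source numbers cited in an answer, in order of appearance."""
    return [int(n) for n in CITATION_RE.findall(answer)]
```

The extracted numbers can then be checked against the list of chunks actually retrieved, which catches answers that cite nonexistent sources.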

6. Setting Refusal Threshold

  • Hard Constraint: "If the reference material is completely unrelated to the question, respond with 'Material not relevant.'"
  • Soft Constraint: Combine with the retrieval confidence score and trigger the refusal branch automatically when the score falls below a threshold.
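The soft constraint can be sketched as a gate in front of generation. Everything here is an assumption for illustration: the `generate` callback, the `(text, score)` chunk format, and the 0.35 threshold value, which would need tuning per retriever:

```python
REFUSAL_TEXT = "Material not relevant."

def answer_or_refuse(chunks, generate, threshold=0.35):
    """chunks: list of (text, retrieval_score) pairs.
    Skip the LLM call entirely when even the best chunk scores too low."""
    if not chunks or max(score for _, score in chunks) < threshold:
        return REFUSAL_TEXT
    context = "\n".join(text for text, _ in chunks)
    return generate(context)
```

Refusing before the LLM call is both cheaper and more reliable than hoping the prompt's hard constraint fires.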

7. Chain-of-Thought for Multi-hop Reasoning

Question: Who is Zhang San's boss?
Steps: 1. First find Zhang San's department. 2. Then find the head of that department. 3. Give the final answer.
Think step by step and then output.
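If the prompt additionally asks the model to end with a line like "Final answer: ...", the intermediate reasoning can be stripped before showing the result to the user. A sketch under that assumption; the marker text is not from the article:

```python
import re

def final_answer(cot_output: str) -> str:
    """Keep only the text after the 'Final answer:' marker; if the model
    did not emit the marker, fall back to the full output."""
    m = re.search(r"Final answer:\s*(.+)", cot_output)
    return m.group(1).strip() if m else cot_output.strip()
```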

8. Negative Prompting

Do not make up answers. Do not use vague words like "maybe" or "perhaps." Do not output any numbers outside the reference material.

3. How to Evaluate Prompt Quality?

Metric | Meaning | How to Measure
Faithfulness | Whether the answer is strictly based on the reference material | Human evaluation or RAGAS Faithfulness
Refusal Accuracy | Whether it refuses when it should | Measure on a test set of unanswerable questions
Format Compliance | Whether it outputs JSON/citations as required | Regex matching
User Satisfaction | Whether the answer is useful | Online feedback / A/B testing

Suggestion: Prepare a small regression test set (20-50 edge cases), run it after each prompt change, and record how the metrics move.
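The regression loop above can be sketched as a small harness. The `model` callable, the test-case fields, and the two automated metrics (format compliance via JSON parsing, refusal accuracy via the fixed refusal string) are illustrative assumptions:

```python
import json

def evaluate(test_set, model, refusal_text="Material not relevant."):
    """Run every case through the model and tally automated metrics.
    test_set: list of {'question', 'material', 'should_refuse'} dicts."""
    stats = {"format_ok": 0, "refusal_ok": 0, "refusal_total": 0}
    for case in test_set:
        out = model(case["question"], case["material"])
        try:
            json.loads(out)          # format compliance check
            stats["format_ok"] += 1
        except json.JSONDecodeError:
            pass
        if case["should_refuse"]:    # refusal accuracy check
            stats["refusal_total"] += 1
            if refusal_text in out:
                stats["refusal_ok"] += 1
    return stats
```

Faithfulness and user satisfaction still need human or online evaluation; this harness only automates the two metrics that regex/parsing can decide.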

4. Common Pitfalls and Tuning Directions

Issue | Possible Cause | Tuning Method
Model ignores the reference material and answers on its own | Instruction not forceful enough | Change to "Answer based only on the following material" and use few-shot to demonstrate refusal
Model always says "I don't know" | Refusal threshold too high | Lower the threshold or check retrieval quality
Output format messy, not valid JSON | Instruction unclear | Add strict format examples or use function calling
Answer too long/short | No length specification | Add "Answer in no more than 3 sentences."
Multi-hop reasoning errors | Insufficient model reasoning ability | Require step-by-step reasoning or switch to a stronger model
Hallucinated numbers/dates | Model falls back on its own knowledge | Emphasize "Do not use any numbers you remember; only use the material."
