
AI Interview Series 12: How to Optimize Prompts?

Prompt tuning (Prompt Engineering / Optimization) is a key skill for making large language models "obedient." Especially in RAG systems, it directly determines whether the model faithfully adheres to retrieved content, avoids hallucinations, and follows output format specifications.


1. Core Principles of Prompt Tuning

  1. Clarity > Complexity: Simple, direct instructions are often more effective than fancy chain-of-thought.
  2. Give Sufficient Constraints: Clearly tell the model "what it can and cannot do."
  3. Provide Examples: Few-shot is more stable than zero-shot.
  4. Verifiability: Have the model output citations or confidence levels for downstream judgment.
  5. Iterative Optimization: Start from a baseline, change only one variable at a time, and compare results.

2. Specific Tuning Techniques (From Easy to Hard)

1. Role Setting (System Prompt)

You are a professional customer service assistant. You can only answer questions based on the [Reference Materials] provided below.
If you don't know the answer, say "No relevant information in the materials" and do not make up anything.
  • Effect: Sets boundaries and tone.
  • Tuning Points: Tone (professional/friendly), constraint strength (strict/loose).
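These tuning points can be treated as parameters rather than hand-edited strings. A minimal sketch, assuming a hypothetical `build_system_prompt` helper; the tone options and exact wording are illustrative, not a fixed API:

```python
# Sketch: parameterize the system prompt by tone and constraint strength,
# so A/B variants differ in exactly one variable at a time.
def build_system_prompt(tone: str = "professional", strict: bool = True) -> str:
    tone_line = {
        "professional": "You are a professional customer service assistant.",
        "friendly": "You are a friendly customer service assistant.",
    }[tone]
    if strict:
        constraint = (
            "You can only answer questions based on the [Reference Materials] "
            "provided below. If you don't know the answer, say "
            '"No relevant information in the materials" and do not make up anything.'
        )
    else:
        constraint = (
            "Prefer the [Reference Materials] below; if they are insufficient, "
            "say so explicitly before adding general knowledge."
        )
    return f"{tone_line} {constraint}"

print(build_system_prompt())
```

Keeping tone and strictness as separate parameters makes the "change only one variable at a time" principle from section 1 easy to enforce.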

2. Clear Instructions

❌ Bad: "Answer the user's question."
✅ Good: "Answer based only on the [Reference Materials] below. If the materials do not contain the answer, respond with 'I cannot answer this question.'"

3. Output Format Control

Please output in the following JSON format:
{
  "answer": "Your answer",
  "confidence": "High/Medium/Low",
  "sources": [1, 3]
}
  • Use: Facilitates downstream parsing, citation, and debugging.
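Downstream code should never trust that the model actually produced this schema. A minimal validation sketch (the function name and failure behavior are assumptions for illustration):

```python
import json

REQUIRED_KEYS = {"answer", "confidence", "sources"}

def parse_model_json(raw: str):
    """Parse the model's reply; return the dict if it matches the
    expected schema, or None so the caller can retry or fall back."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    if data["confidence"] not in {"High", "Medium", "Low"}:
        return None
    return data
```

Returning `None` instead of raising keeps the retry/fallback logic in one place in the calling code.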

4. Few-shot Examples (Very Effective)

Example 1:
Question: How many days of annual leave?
Reference Material: Annual leave rules: 5 days for 1 year, 10 days for 10 years.
Answer: 5 days for 1 year, 10 days for 10 years.

Example 2:
Question: How is overtime pay calculated?
Reference Material: 1.5 times for weekdays, 2 times for weekends.
Answer: 1.5 times for weekdays, 2 times for weekends.

Now answer:
Question: {User question}
Reference Material: {Retrieved content}
Answer:
  • Tip: Examples should cover different difficulty levels, and it's best to include one that shows "cannot answer."
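Assembling this template programmatically keeps the examples in data rather than hard-coded in the prompt string. A sketch, assuming a hypothetical `build_few_shot_prompt` helper and a simple list-of-dicts example store:

```python
def build_few_shot_prompt(examples, question, material):
    """Render few-shot examples plus the live question into one prompt.
    examples: list of {'q': ..., 'm': ..., 'a': ...} dicts."""
    parts = []
    for i, ex in enumerate(examples, 1):
        parts.append(
            f"Example {i}:\nQuestion: {ex['q']}\n"
            f"Reference Material: {ex['m']}\nAnswer: {ex['a']}\n"
        )
    parts.append(
        f"Now answer:\nQuestion: {question}\n"
        f"Reference Material: {material}\nAnswer:"
    )
    return "\n".join(parts)
```

With examples stored as data, adding the recommended "cannot answer" demonstration is just one more dict in the list.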

5. Mandatory Citation

At the end of the answer, mark the source number with [citation:X]. For example: "Annual leave is 5 days[citation:1]."
If combining multiple sources, label them separately.
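The `[citation:X]` markers are only useful if downstream code can extract them. A small sketch using a regex (the function name is an assumption):

```python
import re

CITATION_RE = re.compile(r"\[citation:(\d+)\]")

def extract_citations(answer: str) -> list[int]:
    """Return the source numbers cited in an answer, in order of appearance."""
    return [int(n) for n in CITATION_RE.findall(answer)]
```

The extracted numbers can then be checked against the list of chunks actually retrieved, which catches answers that cite nonexistent sources.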

6. Setting Refusal Threshold

  • Hard Constraint: "If the reference material is completely unrelated to the question, respond with 'Material not relevant.'"
  • Soft Constraint: Combine with the retrieval confidence score and trigger the refusal branch automatically when the score falls below a threshold.
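The soft constraint can be sketched as a gate in front of generation. Everything here is an assumption for illustration: the `generate` callback, the `(text, score)` chunk format, and the 0.35 threshold value, which would need tuning per retriever:

```python
REFUSAL_TEXT = "Material not relevant."

def answer_or_refuse(chunks, generate, threshold=0.35):
    """chunks: list of (text, retrieval_score) pairs.
    Skip the LLM call entirely when even the best chunk scores too low."""
    if not chunks or max(score for _, score in chunks) < threshold:
        return REFUSAL_TEXT
    context = "\n".join(text for text, _ in chunks)
    return generate(context)
```

Refusing before the LLM call is both cheaper and more reliable than hoping the prompt's hard constraint fires.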

7. Chain-of-Thought for Multi-hop Reasoning

Question: Who is Zhang San's boss?
Steps: 1. First find Zhang San's department. 2. Then find the head of that department. 3. Give the final answer.
Think step by step and then output.
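If the prompt additionally asks the model to end with a line like "Final answer: ...", the intermediate reasoning can be stripped before showing the result to the user. A sketch under that assumption; the marker text is not from the article:

```python
import re

def final_answer(cot_output: str) -> str:
    """Keep only the text after the 'Final answer:' marker; if the model
    did not emit the marker, fall back to the full output."""
    m = re.search(r"Final answer:\s*(.+)", cot_output)
    return m.group(1).strip() if m else cot_output.strip()
```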

8. Negative Prompting

Do not make up answers. Do not use vague words like "maybe" or "perhaps." Do not output any numbers outside the reference material.

3. How to Evaluate Prompt Quality?

Metric | Meaning | How to Measure
Faithfulness | Whether the answer is strictly based on the reference material | Human evaluation or RAGAS Faithfulness
Refusal Accuracy | Whether it refuses when it should | Measure on a test set of unanswerable questions
Format Compliance | Whether it outputs JSON/citations as required | Regex matching
User Satisfaction | Whether the answer is useful | Online feedback / A/B testing

Suggestion: Prepare a small regression test set (20-50 edge cases), run it after each prompt change, and record how the metrics move.
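The regression loop above can be sketched as a small harness. The `model` callable, the test-case fields, and the two automated metrics (format compliance via JSON parsing, refusal accuracy via the fixed refusal string) are illustrative assumptions:

```python
import json

def evaluate(test_set, model, refusal_text="Material not relevant."):
    """Run every case through the model and tally automated metrics.
    test_set: list of {'question', 'material', 'should_refuse'} dicts."""
    stats = {"format_ok": 0, "refusal_ok": 0, "refusal_total": 0}
    for case in test_set:
        out = model(case["question"], case["material"])
        try:
            json.loads(out)          # format compliance check
            stats["format_ok"] += 1
        except json.JSONDecodeError:
            pass
        if case["should_refuse"]:    # refusal accuracy check
            stats["refusal_total"] += 1
            if refusal_text in out:
                stats["refusal_ok"] += 1
    return stats
```

Faithfulness and user satisfaction still need human or online evaluation; this harness only automates the two metrics that regex/parsing can decide.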

4. Common Pitfalls and Tuning Directions

Issue | Possible Cause | Tuning Method
Model ignores the reference material and answers on its own | Instruction not forceful enough | Change to "Answer based only on the following material" and use few-shot to demonstrate refusal
Model always says "I don't know" | Refusal threshold too high | Lower the threshold or check retrieval quality
Output format messy, not valid JSON | Instruction unclear | Add strict format examples or use function calling
Answer too long/short | No length specification | Add "Answer in no more than 3 sentences."
Multi-hop reasoning errors | Insufficient model reasoning ability | Require step-by-step reasoning or switch to a stronger model
Hallucinated numbers/dates | Model falls back on its own knowledge | Emphasize "Do not use any numbers you remember; only use the material."
