# Prompt Engineering for Production
Prompts are the primary interface between your application and an LLM. Small changes have outsized effects on output quality, cost, and reliability.
## Core Principles
### Be Explicit About Format

LLMs default to verbose prose. Always specify the output format you need.

```
# Vague
Summarize this article.

# Explicit
Summarize the following article in 3 bullet points, each under 15 words.
Focus on technical findings, not background context.
```
### Give the Model a Role

System prompts that assign a clear persona improve consistency.

```
You are a senior software engineer reviewing a pull request.
Your feedback is precise, actionable, and backed by reasoning.
You do not suggest changes that are purely stylistic.
```
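In practice the persona goes into the system message so it applies to every turn of the conversation. A minimal sketch of assembling the message list (the helper name and `REVIEWER_PERSONA` constant are illustrative, not from any SDK):

```python
REVIEWER_PERSONA = (
    "You are a senior software engineer reviewing a pull request.\n"
    "Your feedback is precise, actionable, and backed by reasoning.\n"
    "You do not suggest changes that are purely stylistic."
)

def build_messages(user_content: str) -> list[dict]:
    # The system message pins the persona; the user message carries the task.
    return [
        {"role": "system", "content": REVIEWER_PERSONA},
        {"role": "user", "content": user_content},
    ]
```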
### Chain of Thought for Reasoning Tasks

For classification or analysis, ask the model to reason before answering.

```
Analyze the customer complaint below.

First, identify the core issue in one sentence.
Then, classify it as one of: [billing, technical, account, other].
Finally, rate urgency as high/medium/low with a one-line justification.

Complaint: {complaint_text}
```
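A staged prompt like this still returns free text, so the caller needs a small parser to recover the final labels. A sketch under the assumption that the answer names exactly one category and one urgency keyword (the helper and fallback defaults are illustrative):

```python
import re

CATEGORIES = ["billing", "technical", "account", "other"]
URGENCY_LEVELS = ["high", "medium", "low"]

def parse_cot_answer(text: str) -> dict:
    # Scan the answer for the first category/urgency keyword it mentions;
    # fall back to safe defaults when the model drifts off-format.
    words = set(re.findall(r"[a-z]+", text.lower()))
    category = next((c for c in CATEGORIES if c in words), "other")
    urgency = next((u for u in URGENCY_LEVELS if u in words), "medium")
    return {"category": category, "urgency": urgency}
```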
### Few-Shot Examples

Include 2–3 input/output examples for tasks where format precision matters.

```
Convert the following dates to ISO 8601 format.

Input: March 5th, 2024
Output: 2024-03-05

Input: 12 Jan 23
Output: 2023-01-12

Input: {date}
Output:
```
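Keeping the examples in data rather than hard-coded in the prompt string makes them easy to swap or extend. A minimal sketch (the `EXAMPLES` list and helper name are illustrative):

```python
EXAMPLES = [
    ("March 5th, 2024", "2024-03-05"),
    ("12 Jan 23", "2023-01-12"),
]

def build_few_shot_prompt(date: str) -> str:
    # Instruction first, then worked examples, then the open-ended query.
    lines = ["Convert the following dates to ISO 8601 format.", ""]
    for raw, iso in EXAMPLES:
        lines += [f"Input: {raw}", f"Output: {iso}", ""]
    lines += [f"Input: {date}", "Output:"]
    return "\n".join(lines)
```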
### Structured Output

Use JSON mode or function calling instead of parsing free text.

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class EntityExtraction(BaseModel):
    company: str
    role: str
    years_experience: int

# parse() validates the model's JSON against the Pydantic schema
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        # resume_text is supplied by the caller
        {"role": "user", "content": f"Extract info from: {resume_text}"}
    ],
    response_format=EntityExtraction,
)
result = completion.choices[0].message.parsed  # an EntityExtraction instance
```
## Cost and Latency Optimization
| Technique | Latency | Cost | Quality |
|---|---|---|---|
| Reduce prompt length | ↓ | ↓ | Neutral |
| Use smaller model for simple tasks | ↓↓ | ↓↓ | ↓ slightly |
| Cache identical requests (response cache) | ↓↓ | ↓↓ | Same |
| Prompt caching (Anthropic/OpenAI) | ↓ | ↓↓ | Same |
| Batch API | None | ↓↓ | Same |
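Client-side response caching can be as simple as an in-memory map keyed on model and prompt. A sketch (`llm_call` is a stand-in for your real API wrapper, not a library function):

```python
import hashlib

_response_cache: dict[str, str] = {}

def cached_call(model: str, prompt: str, llm_call) -> str:
    # Identical (model, prompt) pairs hit the cache instead of the API.
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _response_cache:
        _response_cache[key] = llm_call(model, prompt)
    return _response_cache[key]
```

For production, the same pattern usually moves to a shared store such as Redis with a TTL, since an in-process dict grows without bound and is lost on restart.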
### Prompt Caching

For requests that reuse a large system prompt or context block:

```python
# Anthropic prompt caching
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system=[{
        "type": "text",
        "text": long_system_prompt,
        "cache_control": {"type": "ephemeral"}  # cache this block
    }],
    messages=[{"role": "user", "content": user_query}]
)
```
## Evaluation
Never deploy a prompt change without evaluating it. Build a small eval set (50–200 examples) and track:
- Accuracy on labeled examples
- Format compliance (does output match expected schema?)
- Regression rate on cases the old prompt handled correctly
```python
def evaluate_prompt(prompt_fn, eval_set):
    """Exact-match accuracy; swap in a fuzzier scorer for free-text outputs."""
    correct = 0
    for example in eval_set:
        output = prompt_fn(example["input"])
        if output == example["expected"]:
            correct += 1
    return correct / len(eval_set)
```
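Format compliance can be checked mechanically, without labeled data. A sketch that counts outputs parsing as JSON with the expected keys (the key set here is an illustrative example, not a fixed schema):

```python
import json

REQUIRED_KEYS = {"company", "role", "years_experience"}  # example schema

def format_compliance(outputs: list[str]) -> float:
    # Fraction of raw outputs that parse as JSON and carry every required key.
    ok = 0
    for raw in outputs:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
            ok += 1
    return ok / len(outputs)
```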
## Prompt Versioning
Treat prompts like code:
- Store in version control alongside your codebase
- Tag prompt versions that ship to production
- A/B test significant rewrites before full rollout
- Log which prompt version produced each response
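The logging point above can be sketched as a version tag plus a content hash, so every stored response pins the exact prompt text that produced it. All names and fields below are illustrative:

```python
import hashlib
import time

PROMPT_VERSION = "v3"  # bumped in version control with each prompt change
PROMPT_TEMPLATE = "Summarize the following article in 3 bullet points:\n{article}"

def prompt_fingerprint(template: str) -> str:
    # Short content hash: catches edits that forgot to bump the version tag.
    return hashlib.sha256(template.encode()).hexdigest()[:12]

def log_response(response_text: str) -> dict:
    # One record per completion, joinable back to the prompt that produced it.
    return {
        "ts": time.time(),
        "prompt_version": PROMPT_VERSION,
        "prompt_hash": prompt_fingerprint(PROMPT_TEMPLATE),
        "response": response_text,
    }
```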