
Integrating LLMs Into Your Application: Patterns That Actually Work
Beyond the ChatGPT Wrapper
Everyone's building AI features now, but most integrations are shallow: take user input, send it to OpenAI, display the response. That works for demos but tends to break down in production.
Here are the patterns we use when integrating LLMs into real applications.
Pattern 1: RAG (Retrieval-Augmented Generation)
Use when: You need the LLM to answer questions about your data.
The problem with raw LLMs: they don't know about your company's documentation, your product catalog, or your customer data. RAG solves this by retrieving relevant context before generating.
How It Works
User Question → Search Your Data → Add Results to Prompt → LLM → Answer
Implementation Sketch
public async Task<string> AnswerQuestion(string question)
{
    // 1. Convert question to embedding
    var questionEmbedding = await _embeddingService.GetEmbedding(question);
    // 2. Search your vector database for relevant chunks
    var relevantDocs = await _vectorDb.Search(questionEmbedding, topK: 5);
    // 3. Build prompt with context
    var prompt = $"""
        Answer the user's question based on the following context.
        If the answer isn't in the context, say "I don't know."
        Context:
        {string.Join("\n\n", relevantDocs.Select(d => d.Content))}
        Question: {question}
        """;
    // 4. Generate answer
    return await _llm.Complete(prompt);
}
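To make step 2 less of a black box, here is a minimal in-memory sketch of what a vector search does: score every stored chunk against the question embedding by cosine similarity and keep the top K. `Chunk`, `InMemoryVectorIndex`, and this `Search` signature are hypothetical names for illustration, not the API of any particular vector database. A real store would typically use an approximate nearest-neighbor index instead of a linear scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; not the API of a real vector database.
public record Chunk(string Content, float[] Embedding);

public static class InMemoryVectorIndex
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Linear scan: score every chunk against the query and return the topK most similar.
    public static List<Chunk> Search(IEnumerable<Chunk> chunks, float[] query, int topK)
    {
        return chunks
            .OrderByDescending(c => CosineSimilarity(c.Embedding, query))
            .Take(topK)
            .ToList();
    }
}
```

A linear scan like this is fine for a few thousand chunks in a prototype, which is often enough to validate the retrieval quality before choosing a database.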
Key Decisions
- Chunking strategy - How you split documents often matters more than which embedding model you use
- Hybrid search - Combine vector search with keyword search for better recall
- Reranking - Use a smaller, dedicated model to rerank retrieved chunks before sending the best ones to the LLM
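As one illustration of the chunking decision, here is a rough sketch of fixed-size chunking with overlap, so that a sentence straddling a boundary still appears whole in at least one chunk. `Chunker`, `ChunkByWords`, and the word-based sizing are our own assumptions for the example. Splitting on headings, paragraphs, or tokens may suit your documents better.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Splits text into chunks of at most chunkSize words. Each chunk after the
    // first starts with the last `overlap` words of the previous chunk.
    public static List<string> ChunkByWords(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlap));

        var words = text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        int step = chunkSize - overlap; // how far the window advances each time

        for (int start = 0; start < words.Length; start += step)
        {
            int count = Math.Min(chunkSize, words.Length - start);
            chunks.Add(string.Join(" ", words, start, count));

            // Stop once a chunk has reached the end of the text, so the tail
            // is not emitted again as a smaller duplicate chunk.
            if (start + count >= words.Length) break;
        }
        return chunks;
    }
}
```

For example, `Chunker.ChunkByWords("a b c d e f g", chunkSize: 3, overlap: 1)` yields three chunks: "a b c", "c d e", and "e f g".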
Pattern 2: Function Calling
Use when: You need the LLM to take actions, not just generate text.
Function calling (or "tool use") lets the LLM invoke your code. Instead of generating a text answer, it generates a structured function call that your application executes.
Example: Order Lookup
var tools = new[]
{
    new Tool
    {
        Name = "lookup_order",
        Description = "Look up an order by order number",
        Parameters = new
        {
            type = "object",
            properties = new
            {
                order_number = new { type = "string", description = "The order number" }
            },
            required = new[] { "order_number" }
        }
    }
};
var response = await _llm.Chat(
    messages: [new("user", "What's the status of order ABC-123?")],
    tools: tools
);
if (response.ToolCalls.Any())
{
    var call = response.ToolCalls[0];
    var orderNumber = call.Arguments["order_number"];
    var order = await _orderService.GetOrder(orderNumber);
    // Send result back to LLM for final response
    var finalResponse = await _llm.Chat(
        messages: [
            new("user", "What's the status of order ABC-123?"),
            new("assistant", null, toolCalls: response.ToolCalls),
            new("tool", $"Order {orderNumber}: {order.Status}", toolCallId: call.Id)
        ]
    );
}
Key Decisions
- Tool descriptions matter - The LLM decides which tool to call based on descriptions
- Validate inputs - Don't trust the LLM to generate valid parameters
- Handle errors gracefully - If a tool fails, let the LLM recover
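The last two points can be sketched together: validate the model's arguments before touching your services, and turn any failure into a tool result string the model can read and recover from. This assumes the arguments arrive as a raw JSON string and that order numbers look like "ABC-123". Both are assumptions for the example, as are the names `OrderToolHandler`, `TryParseOrderNumber`, and `ExecuteAsync`.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class OrderToolHandler
{
    // Assumed order-number format (e.g. "ABC-123"); adjust to your real scheme.
    private static readonly Regex OrderNumberPattern = new Regex("^[A-Z]{3}-[0-9]{1,10}$");

    // Parses and validates the JSON arguments produced by the model.
    // Returns false with a readable error instead of throwing, so the error
    // can be sent back to the model as the tool result.
    public static bool TryParseOrderNumber(string argumentsJson, out string orderNumber, out string error)
    {
        orderNumber = "";
        error = "";
        try
        {
            using var doc = JsonDocument.Parse(argumentsJson);
            if (doc.RootElement.ValueKind != JsonValueKind.Object ||
                !doc.RootElement.TryGetProperty("order_number", out var value) ||
                value.ValueKind != JsonValueKind.String)
            {
                error = "Error: 'order_number' is required and must be a string.";
                return false;
            }

            var candidate = (value.GetString() ?? "").Trim();
            if (!OrderNumberPattern.IsMatch(candidate))
            {
                error = "Error: 'order_number' does not match the expected format.";
                return false;
            }

            orderNumber = candidate;
            return true;
        }
        catch (JsonException)
        {
            error = "Error: arguments were not valid JSON.";
            return false;
        }
    }

    // Runs the lookup and converts any failure into a tool result string.
    // lookupStatus stands in for your own order service.
    public static async Task<string> ExecuteAsync(string argumentsJson, Func<string, Task<string>> lookupStatus)
    {
        if (!TryParseOrderNumber(argumentsJson, out var orderNumber, out var error))
            return error;

        try
        {
            var status = await lookupStatus(orderNumber);
            return $"Order {orderNumber}: {status}";
        }
        catch (Exception)
        {
            // Keep internal details out of the message the model sees.
            return $"Error: could not look up order {orderNumber}. Ask the user to try again later.";
        }
    }
}
```

Returning the error as the tool result, rather than throwing, gives the model a chance to ask the user for a corrected order number or to apologize.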
Pattern 3: Agentic Workflows
Use when: You need the LLM to complete multi-step tasks autonomously.
An agent is an LLM in a loop: it decides what to do, takes an action, observes the result, and repeats until the task is complete.
Simple Agent Loop
public async Task<string> RunAgent(string task)
{
    var messages = new List<Message>
    {
        new("system", "You are a helpful assistant. Use tools to complete tasks."),
        new("user", task)
    };
    while (true)
    {
        var response = await _llm.Chat(messages, tools: _tools);
        messages.Add(new("assistant", response.Content, response.ToolCalls));
        if (!response.ToolCalls.Any())
            return response.Content; // Done
        foreach (var call in response.ToolCalls)
        {
            var result = await ExecuteTool(call);
            messages.Add(new("tool", result, toolCallId: call.Id));
        }
    }
}
Key Decisions
- Set a max iterations limit - Agents can loop forever
- Add guardrails - Restrict which tools are available based on context
- Log everything - You'll need to debug why the agent did what it did
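Here is one way those three decisions might look in code: a loop capped at a maximum number of iterations, a tool allowlist checked on every call, and a log line per tool call. All types here (`ToolCall`, `LlmResponse`, `Message`, `BoundedAgent`) are hypothetical stand-ins so the sketch compiles on its own. The LLM client is passed in as a delegate rather than tied to a specific SDK.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical minimal types; a real client library will have its own.
public record ToolCall(string Id, string Name, string ArgumentsJson);
public record LlmResponse(string? Content, IReadOnlyList<ToolCall> ToolCalls);
public record Message(string Role, string? Content,
    IReadOnlyList<ToolCall>? ToolCalls = null, string? ToolCallId = null);

public class BoundedAgent
{
    private readonly Func<IReadOnlyList<Message>, Task<LlmResponse>> _chat;
    private readonly IReadOnlyDictionary<string, Func<string, Task<string>>> _allowedTools;
    private readonly int _maxIterations;
    private readonly Action<string> _log;

    public BoundedAgent(
        Func<IReadOnlyList<Message>, Task<LlmResponse>> chat,
        IReadOnlyDictionary<string, Func<string, Task<string>>> allowedTools,
        int maxIterations,
        Action<string> log)
    {
        _chat = chat;
        _allowedTools = allowedTools;
        _maxIterations = maxIterations;
        _log = log;
    }

    public async Task<string> RunAsync(string task)
    {
        var messages = new List<Message>
        {
            new("system", "You are a helpful assistant. Use tools to complete tasks."),
            new("user", task)
        };

        // Hard cap on LLM round trips so a confused agent cannot loop forever.
        for (int i = 1; i <= _maxIterations; i++)
        {
            var response = await _chat(messages);
            messages.Add(new("assistant", response.Content, response.ToolCalls));

            if (response.ToolCalls.Count == 0)
                return response.Content ?? "";

            foreach (var call in response.ToolCalls)
            {
                _log($"iteration {i}: tool={call.Name} args={call.ArgumentsJson}");

                string result;
                if (!_allowedTools.TryGetValue(call.Name, out var tool))
                {
                    // Guardrail: only tools in the allowlist may run.
                    result = $"Error: tool '{call.Name}' is not available.";
                }
                else
                {
                    try { result = await tool(call.ArgumentsJson); }
                    catch (Exception ex) { result = $"Error: {ex.Message}"; }
                }
                messages.Add(new("tool", result, ToolCallId: call.Id));
            }
        }

        _log($"stopped after {_maxIterations} iterations without a final answer");
        return "Stopped: iteration limit reached before the task was completed.";
    }
}
```

Passing a different `allowedTools` dictionary per request is a simple way to restrict tools by context, for example withholding write operations from read-only users.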
Anti-Patterns to Avoid
1. Putting Business Logic in Prompts
Bad:
"If the user is a premium customer, show them the discount.
Premium customers have spent over $1000 lifetime..."
Good: Check customer status in your code, then tell the LLM what to do.
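A small sketch of the "good" version, assuming a hypothetical `Customer` record and the $1000 threshold from the example above: the rule lives in ordinary code where it can be unit tested, and the prompt only carries the resulting instruction.

```csharp
using System;

// Hypothetical customer model for illustration.
public record Customer(string Name, decimal LifetimeSpend);

public static class DiscountPromptBuilder
{
    // The business rule from the example: premium means over $1000 lifetime spend.
    private const decimal PremiumThreshold = 1000m;

    public static bool IsPremium(Customer customer) =>
        customer.LifetimeSpend > PremiumThreshold;

    // The LLM is told the outcome of the rule, never the rule itself.
    public static string BuildSystemPrompt(Customer customer)
    {
        var instruction = IsPremium(customer)
            ? "Mention that the customer qualifies for the premium discount."
            : "Do not mention any discounts.";

        return $"You are a support assistant helping {customer.Name}. {instruction}";
    }
}
```

If the threshold changes, you update one constant and its tests, and no prompt needs to be re-evaluated.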
2. No Fallbacks
What happens when the API times out? When the model returns garbage? Always have a fallback path.
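A minimal sketch of a fallback path, assuming the LLM client is represented by a delegate that honors a `CancellationToken` (the timeout only works if the underlying call actually observes the token). `LlmFallback` and `CompleteWithFallbackAsync` are names made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LlmFallback
{
    // Calls the LLM with a timeout. Returns `fallback` if the call times out,
    // throws, or comes back empty.
    public static async Task<string> CompleteWithFallbackAsync(
        Func<CancellationToken, Task<string>> callLlm,
        TimeSpan timeout,
        string fallback)
    {
        // The token is cancelled automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            var text = await callLlm(cts.Token);
            return string.IsNullOrWhiteSpace(text) ? fallback : text;
        }
        catch (OperationCanceledException)
        {
            return fallback; // timed out
        }
        catch (Exception)
        {
            // Broad catch for brevity; in real code, catch the specific
            // exceptions your client throws and log them.
            return fallback;
        }
    }
}
```

The fallback does not have to be a canned string. It could equally be a cached answer, a keyword search result, or a handoff to a human.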
3. Ignoring Latency
LLM calls are slow: often 500 ms to 2 s for short responses, and longer as the output grows. Don't put them in hot paths. Use streaming for user-facing responses so users see text as it is generated.
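Streaming in C# usually means consuming an `IAsyncEnumerable<string>` with `await foreach`. The sketch below uses a fake token stream in place of a real client, since streaming APIs differ between SDKs. `StreamingDemo`, `FakeTokenStream`, and `RenderAsync` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

public static class StreamingDemo
{
    // Stand-in for a streaming LLM client: yields tokens with an artificial delay.
    public static async IAsyncEnumerable<string> FakeTokenStream(string[] tokens, int delayMs)
    {
        foreach (var token in tokens)
        {
            await Task.Delay(delayMs);
            yield return token;
        }
    }

    // Forwards each token to onToken as soon as it arrives (for example, to
    // write it to the HTTP response), then returns the full text for logging.
    public static async Task<string> RenderAsync(IAsyncEnumerable<string> stream, Action<string> onToken)
    {
        var full = new StringBuilder();
        await foreach (var token in stream)
        {
            onToken(token);
            full.Append(token);
        }
        return full.ToString();
    }
}
```

Total latency is unchanged, but the user sees the first words after one token's delay rather than after the whole response.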
4. No Observability
You need to see:
- What prompts you're sending
- What responses you're getting
- Token usage and costs
- Error rates and latency
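One lightweight way to capture all four is a wrapper around every LLM call. The sketch below assumes your client returns token counts alongside the text. The `LlmResult` and `LlmCallLog` records and the `sink` callback are hypothetical, and in practice the sink would write to your logging or metrics system.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical shapes for illustration; map these to what your client returns.
public record LlmResult(string Text, int PromptTokens, int CompletionTokens);
public record LlmCallLog(string Prompt, string? Response, int PromptTokens,
    int CompletionTokens, long LatencyMs, string? Error);

public static class ObservedLlm
{
    // Wraps a single LLM call and reports prompt, response, token usage,
    // latency, and any error to the sink. Exceptions are logged, then rethrown.
    public static async Task<LlmResult> CallAsync(
        string prompt,
        Func<string, Task<LlmResult>> callLlm,
        Action<LlmCallLog> sink)
    {
        var stopwatch = Stopwatch.StartNew();
        LlmResult result;
        try
        {
            result = await callLlm(prompt);
        }
        catch (Exception ex)
        {
            sink(new LlmCallLog(prompt, null, 0, 0, stopwatch.ElapsedMilliseconds, ex.Message));
            throw;
        }

        sink(new LlmCallLog(prompt, result.Text, result.PromptTokens,
            result.CompletionTokens, stopwatch.ElapsedMilliseconds, null));
        return result;
    }
}
```

If prompts can contain customer data, consider redacting or sampling them before they reach the sink.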
Start Simple
If you're adding LLM features for the first time:
- Start with a single, well-defined use case
- Use function calling over free-form generation
- Add structured output (JSON mode) to get predictable responses
- Measure everything
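Structured output still needs checking on your side: parse it, validate it against what you expect, and treat anything else as a failure. The sketch below uses `System.Text.Json` with a made-up `TicketTriage` shape and category list. Both are assumptions for the example, not part of any provider's JSON mode.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical output shape: {"category": "billing", "urgent": true}
public record TicketTriage(string Category, bool Urgent);

public static class TriageParser
{
    private static readonly string[] AllowedCategories = { "billing", "shipping", "other" };

    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true
    };

    // Returns true only if the model output is valid JSON, has the expected
    // types, and uses a category from the allowed list.
    public static bool TryParse(string modelOutput, out TicketTriage? triage)
    {
        triage = null;
        try
        {
            var parsed = JsonSerializer.Deserialize<TicketTriage>(modelOutput, Options);
            if (parsed?.Category is null ||
                Array.IndexOf(AllowedCategories, parsed.Category) < 0)
            {
                return false;
            }

            triage = parsed;
            return true;
        }
        catch (JsonException)
        {
            return false; // not JSON, or a field had the wrong type
        }
    }
}
```

When `TryParse` returns false, you can retry once with the error appended to the prompt or fall back to a default category, and count the failure in your metrics.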
Need Help?
We've integrated LLMs into applications across financial services, legal, and logistics. If you're exploring AI features and want to avoid the common pitfalls, let's talk.