Implementing AI Features: A Practical Framework
How to add LLM and AI features to your product without the chaos.
Jason Overmier
Innovative Prospects Team
Everyone’s adding AI to their products. But most AI features fail—not because the technology doesn’t work, but because they’re poorly planned, poorly tested, and poorly governed.
This is how to do AI features right.
The AI Feature Framework
Before writing code, answer these questions:
1. What Problem Are You Solving?
AI isn’t a solution looking for a problem. Validate that:
- Users actually want this feature
- AI is the best solution (not rules, templates, etc.)
- Success is measurable (engagement, time saved, etc.)
Red flag: “We need AI to stay competitive” isn’t a use case.
2. What’s the Worst That Happens?
AI makes mistakes. Plan for it:
| Risk | Mitigation |
|---|---|
| Hallucinations | Citations, confidence scores, human review |
| Bias | Diverse training data, regular audits |
| Cost | Budget limits, caching, cheaper models |
| Latency | Streaming responses, optimistic UI |
| Data privacy | PII redaction, on-premise options |
3. Can You Test It?
AI is probabilistic. Testing is harder:
// ❌ Bad - exact-match testing breaks on harmless wording changes
expect(aiResponse).toBe("The capital of France is Paris");

// ✅ Good - semantic testing (toSatisfySafetyGuidelines is a custom matcher)
expect(aiResponse).toContain("Paris");
expect(aiResponse).toContain("France");
expect(aiResponse).toSatisfySafetyGuidelines();
Architecture Patterns
Pattern 1: Direct API Integration
Simplest—call OpenAI/Anthropic directly from your backend.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function summarizeText(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: `Summarize: ${text}` },
    ],
    max_tokens: 150,
  });
  return response.choices[0].message.content;
}
Pros: Fast to implement, no infrastructure.
Cons: API costs, latency, vendor lock-in.
Pattern 2: RAG (Retrieval-Augmented Generation)
Ground AI responses in your data:
User Query
↓
Embed (vectorize) query
↓
Search vector database for similar documents
↓
Add retrieved context to prompt
↓
LLM generates response with citations
Tools: Pinecone, Weaviate, pgvector, LangChain.
import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";

// Assumes `openai` and `pineconeIndex` are initialized elsewhere
async function answerQuestion(query: string) {
  // Search for relevant documents
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    { pineconeIndex },
  );
  const results = await vectorStore.similaritySearch(query, 3);

  // Generate answer with context
  const context = results.map((r) => r.pageContent).join("\n");
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer using this context:\n${context}` },
      { role: "user", content: query },
    ],
  });

  return {
    answer: response.choices[0].message.content,
    sources: results.map((r) => r.metadata),
  };
}
Pattern 3: Fine-Tuning
For specialized domains, train custom models:
- When: Domain-specific language (legal, medical), specific format requirements
- Trade-off: More expensive, requires ML expertise
- Alternative: RAG is often sufficient
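If you do fine-tune, the bulk of the work is preparing training data. A minimal sketch of converting labeled examples into the chat-format JSONL that OpenAI's fine-tuning API expects (the `Example` type and `toFineTuneJSONL` helper are illustrative names, not part of any SDK):

```typescript
// Each JSONL line is one training example: a full chat exchange.
type Example = { input: string; output: string };

function toFineTuneJSONL(examples: Example[], systemPrompt: string): string {
  return examples
    .map((ex) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: ex.input },
          { role: "assistant", content: ex.output },
        ],
      }),
    )
    .join("\n");
}
```

Upload the resulting file, then create a fine-tuning job referencing it. Budget time for curating examples: a few hundred high-quality pairs beat thousands of noisy ones.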
Guardrails & Safety
1. Content Moderation
Filter inputs and outputs:
// The OpenAI SDK exposes moderation via openai.moderations.create
async function safeCompletion(userInput: string) {
  // Check input
  const inputModeration = await openai.moderations.create({ input: userInput });
  if (inputModeration.results[0].flagged) {
    throw new Error("Content violates policy");
  }

  // Get completion
  const response = await openai.chat.completions.create({ /* ... */ });
  const content = response.choices[0].message.content ?? "";

  // Check output
  const outputModeration = await openai.moderations.create({ input: content });
  if (outputModeration.results[0].flagged) {
    return fallbackResponse;
  }
  return content;
}
2. PII Redaction
Never send sensitive data to external APIs:
function redactPII(text: string) {
  // Remove or redact: SSNs, credit cards, emails,
  // names/addresses (optional), medical and financial data
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "***-**-****") // SSN
    .replace(/\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, "****-****-****-****") // Credit card
    .replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, "***@***.***"); // Email
}
3. Rate Limiting & Budget Controls
AI costs scale with usage. Protect against abuse:
import rateLimit from "express-rate-limit";

const aiRateLimit = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 10, // 10 requests per minute
  message: "Too many AI requests, please try again later",
});

app.post("/api/ai/generate", aiRateLimit, async (req, res) => {
  // Your AI logic here
});
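Rate limits cap request volume, but token spend varies per request, so a per-user budget is worth tracking separately. A minimal in-memory sketch (the `TokenBudget` class is an illustrative name; in production back this with Redis or a database and reset it daily):

```typescript
// Tracks token spend per user against a fixed limit.
class TokenBudget {
  private used = new Map<string, number>();

  constructor(private limit: number) {}

  // Would this request fit under the user's remaining budget?
  canSpend(userId: string, tokens: number): boolean {
    return (this.used.get(userId) ?? 0) + tokens <= this.limit;
  }

  // Record actual usage (e.g. usage.total_tokens from the API response).
  record(userId: string, tokens: number): void {
    this.used.set(userId, (this.used.get(userId) ?? 0) + tokens);
  }
}
```

Check `canSpend` with an estimate before calling the API, then `record` the actual `usage.total_tokens` from the response afterwards.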
4. Human-in-the-Loop
For high-stakes outputs, add review:
enum ReviewStatus {
  AUTO_APPROVED = "auto_approved",
  PENDING_REVIEW = "pending_review",
  REJECTED = "rejected",
}

async function generateContent(prompt: string) {
  const response = await openai.chat.completions.create({ /* ... */ });
  // calculateConfidence is your own heuristic (e.g. log-prob based)
  const confidence = calculateConfidence(response);
  const status =
    confidence > 0.9 ? ReviewStatus.AUTO_APPROVED : ReviewStatus.PENDING_REVIEW;

  return {
    content: response.choices[0].message.content,
    status,
    confidence,
  };
}
Testing AI Features
Unit Testing
Test deterministic parts:
describe("AI Service", () => {
  it("should redact PII before sending to API", () => {
    const redacted = redactPII("SSN on file: 123-45-6789");
    expect(redacted).toBe("SSN on file: ***-**-****");
  });

  it("should handle API failures gracefully", async () => {
    vi.mocked(openai.chat.completions.create).mockRejectedValueOnce(
      new Error("API error"),
    );
    const result = await generateSummary("test");
    expect(result).toEqual(fallbackSummary);
  });
});
Integration Testing
Test real AI calls (in CI, use mocked responses):
it("should generate coherent summaries", async () => {
  const summary = await summarizeText(longText);
  expect(summary.length).toBeLessThan(200);
  expect(summary).toIncludeKeyPoints(); // custom matcher
});
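To keep those integration tests runnable without network access, inject the completion call as a dependency so CI can substitute a canned response. A minimal sketch (the `Complete` type, `summarize` wrapper, and `stubComplete` are illustrative names, not part of any SDK):

```typescript
// The AI call is passed in, so tests never touch the real API.
type Complete = (prompt: string) => Promise<string>;

async function summarize(text: string, complete: Complete): Promise<string> {
  const raw = await complete(`Summarize in one sentence: ${text}`);
  return raw.trim();
}

// In production, pass a function that calls the real API; in tests, a stub:
const stubComplete: Complete = async () => "  A short summary.  ";
```

The same pattern works for mocking frameworks like vitest's `vi.mock`, but explicit injection keeps the seam visible in the code itself.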
Evals (LLM Evaluations)
Use LLMs to grade LLM outputs:
async function gradeResponse(query: string, response: string) {
  const gradingPrompt = `
    Grade this response 1-10 on:
    1. Accuracy
    2. Helpfulness
    3. Safety

    Query: ${query}
    Response: ${response}

    Return JSON: {"accuracy": number, "helpfulness": number, "safety": number}
  `;
  const grade = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: gradingPrompt }],
    response_format: { type: "json_object" }, // force parseable output
  });
  return JSON.parse(grade.choices[0].message.content ?? "{}");
}
Monitoring & Observability
Track AI-specific metrics:
| Metric | Why It Matters |
|---|---|
| Latency (P50, P95, P99) | User experience |
| Cost per request | Budget management |
| Token usage | Capacity planning |
| Error rate | Reliability |
| User satisfaction | Quality feedback |
import { Histogram } from "prom-client";

const aiLatency = new Histogram({
  name: "ai_request_duration_seconds",
  help: "Duration of AI requests in seconds", // prom-client requires `help`
  labelNames: ["model", "operation"],
});

async function trackedAIRequest(model: string, operation: string) {
  const end = aiLatency.startTimer({ model, operation });
  try {
    const result = await makeAIRequest(model, operation);
    end();
    return result;
  } catch (error) {
    end();
    throw error;
  }
}
Cost Optimization
AI gets expensive fast. Strategies:
1. Caching
import crypto from "node:crypto";
import { LRUCache } from "lru-cache";

// Embeddings are number arrays, so type the cache accordingly
const cache = new LRUCache<string, number[]>({ max: 1000 });

async function cachedEmbedding(text: string) {
  const key = crypto.createHash("sha256").update(text).digest("hex");
  const cached = cache.get(key);
  if (cached) return cached;

  const embedding = await getEmbedding(text);
  cache.set(key, embedding);
  return embedding;
}
2. Smaller Models
Not everything needs GPT-4:
| Use Case | Recommended Model |
|---|---|
| Complex reasoning | GPT-4o, Claude 3.5 Sonnet |
| Simple tasks | GPT-4o-mini, Claude 3.5 Haiku |
| Embeddings | text-embedding-3-small |
| Classification | Fine-tuned small model |
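The routing logic behind that table can live in one small function, so adding a new task type or swapping a model name is a one-line change. A sketch (the `Task` type and `pickModel` helper are illustrative; model names track the table above and will drift as providers update them):

```typescript
// Route each request to the cheapest model that can handle it.
type Task = "reasoning" | "simple" | "embedding" | "classification";

function pickModel(task: Task): string {
  switch (task) {
    case "reasoning":
      return "gpt-4o";
    case "simple":
      return "gpt-4o-mini";
    case "embedding":
      return "text-embedding-3-small";
    case "classification":
      return "gpt-4o-mini"; // or a fine-tuned small model
  }
}
```

Centralizing the choice also makes it easy to log which model served each request, which feeds directly into the cost-per-request metric above.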
3. Streaming
Reduce perceived latency:
async function streamingResponse() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [...],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
}
Common Pitfalls
| Pitfall | Why It Happens | Fix |
|---|---|---|
| No fallback when AI fails | Assumes 100% uptime | Always have a non-AI fallback option |
| Ignoring token costs | Usage scales unpredictably | Implement budget limits, per-user quotas |
| Missing guardrails | Users probe for vulnerabilities | Add input/output filtering, explicit policies |
| Poor prompt engineering | Rushing to ship | Test prompts like you test code |
| Not measuring quality | No baseline for comparison | Track engagement, satisfaction, error rates |
| Sending PII to external APIs | Convenience over privacy | Implement PII redaction before API calls |
| Hallucinations accepted as truth | Trusting AI outputs blindly | Add citations, confidence scores, human review |
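The first pitfall in the table deserves a concrete shape: every AI call should go through a wrapper that falls back to a deterministic result when the call fails. A minimal sketch (`withFallback` is an illustrative helper name):

```typescript
// Run an AI call; on any failure, return a non-AI fallback instead.
async function withFallback<T>(
  aiCall: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await aiCall();
  } catch {
    // In real code: log the error and increment an error-rate metric
    return fallback;
  }
}
```

The fallback might be a template response, a cached previous result, or simply hiding the AI-powered widget, as long as the feature degrades instead of breaking.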
Planning an AI feature? We’ve implemented production-ready AI features with RAG, guardrails, and cost optimization for SaaS platforms, marketplaces, and internal tools. Our senior architects ensure your AI integration is secure, scalable, and actually delivers value. Let’s discuss your AI use case.