Implementing AI Features: A Practical Framework
How to add LLM and AI features to your product without the chaos.
Jason Overmier
Innovative Prospects Team
Everyone’s adding AI to their products. But most AI features fail—not because the technology doesn’t work, but because they’re poorly planned, poorly tested, and poorly governed.
This is how to do AI features right.
The AI Feature Framework
Before writing code, answer these questions:
1. What Problem Are You Solving?
AI isn’t a solution looking for a problem. Validate that:
- Users actually want this feature
- AI is the best solution (not rules, templates, etc.)
- Success is measurable (engagement, time saved, etc.)
Red flag: “We need AI to stay competitive” isn’t a use case.
2. What’s the Worst That Happens?
AI makes mistakes. Plan for it:
| Risk | Mitigation |
|---|---|
| Hallucinations | Citations, confidence scores, human review |
| Bias | Diverse training data, regular audits |
| Cost | Budget limits, caching, cheaper models |
| Latency | Streaming responses, optimistic UI |
| Data privacy | PII redaction, on-premise options |
3. Can You Test It?
AI is probabilistic. Testing is harder:
// ❌ Bad - exact-match testing breaks on harmless wording changes
expect(aiResponse).toBe("The capital of France is Paris");

// ✅ Good - semantic testing (toSatisfySafetyGuidelines is a custom matcher)
expect(aiResponse).toContain("Paris");
expect(aiResponse).toContain("France");
expect(aiResponse).toSatisfySafetyGuidelines();
Architecture Patterns
Pattern 1: Direct API Integration
Simplest—call OpenAI/Anthropic directly from your backend.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function summarizeText(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: `Summarize: ${text}` },
    ],
    max_tokens: 150,
  });
  return response.choices[0].message.content;
}
Pros: Fast to implement, no infrastructure.
Cons: API costs, latency, vendor lock-in.
Pattern 2: RAG (Retrieval-Augmented Generation)
Ground AI responses in your data:
User Query
↓
Embed (vectorize) query
↓
Search vector database for similar documents
↓
Add retrieved context to prompt
↓
LLM generates response with citations
Tools: Pinecone, Weaviate, pgvector, LangChain.
import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";

// Assumes `openai` and `pineconeIndex` are initialized elsewhere
async function answerQuestion(query: string) {
  // Search for relevant documents
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    { pineconeIndex },
  );
  const results = await vectorStore.similaritySearch(query, 3);

  // Generate answer with context
  const context = results.map((r) => r.pageContent).join("\n");
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer using this context:\n${context}` },
      { role: "user", content: query },
    ],
  });

  return {
    answer: response.choices[0].message.content,
    sources: results.map((r) => r.metadata),
  };
}
Pattern 3: Fine-Tuning
For specialized domains, train custom models:
- When: Domain-specific language (legal, medical), specific format requirements
- Trade-off: More expensive, requires ML expertise
- Alternative: RAG is often sufficient
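If you do fine-tune, the bulk of the work is preparing training data. A minimal sketch of converting labeled examples into the chat-format JSONL that OpenAI's fine-tuning API expects (the `Example` type and `toFineTuneJSONL` helper are illustrative names, not part of any SDK):

```typescript
// Each JSONL line is one training example: a full chat exchange.
type Example = { input: string; output: string };

function toFineTuneJSONL(examples: Example[], systemPrompt: string): string {
  return examples
    .map((ex) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: ex.input },
          { role: "assistant", content: ex.output },
        ],
      }),
    )
    .join("\n");
}
```

Upload the resulting file, then create a fine-tuning job referencing it. Budget time for curating examples: a few hundred high-quality pairs beat thousands of noisy ones.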
Guardrails & Safety
1. Content Moderation
Filter inputs and outputs:
// The OpenAI SDK exposes moderation via openai.moderations.create
async function safeCompletion(userInput: string) {
  // Check input
  const inputModeration = await openai.moderations.create({ input: userInput });
  if (inputModeration.results[0].flagged) {
    throw new Error("Content violates policy");
  }

  // Get completion
  const response = await openai.chat.completions.create({ /* ... */ });
  const content = response.choices[0].message.content ?? "";

  // Check output
  const outputModeration = await openai.moderations.create({ input: content });
  if (outputModeration.results[0].flagged) {
    return fallbackResponse;
  }
  return content;
}
2. PII Redaction
Never send sensitive data to external APIs:
function redactPII(text: string) {
  // Remove or redact: SSNs, credit cards, emails,
  // names/addresses (optional), medical and financial data
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "***-**-****") // SSN
    .replace(/\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, "****-****-****-****") // Credit card
    .replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, "***@***.***"); // Email
}
3. Rate Limiting & Budget Controls
AI costs scale with usage. Protect against abuse:
import rateLimit from "express-rate-limit";

const aiRateLimit = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 10, // 10 requests per minute
  message: "Too many AI requests, please try again later",
});

app.post("/api/ai/generate", aiRateLimit, async (req, res) => {
  // Your AI logic here
});
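Rate limits cap request volume, but token spend varies per request, so a per-user budget is worth tracking separately. A minimal in-memory sketch (the `TokenBudget` class is an illustrative name; in production back this with Redis or a database and reset it daily):

```typescript
// Tracks token spend per user against a fixed limit.
class TokenBudget {
  private used = new Map<string, number>();

  constructor(private limit: number) {}

  // Would this request fit under the user's remaining budget?
  canSpend(userId: string, tokens: number): boolean {
    return (this.used.get(userId) ?? 0) + tokens <= this.limit;
  }

  // Record actual usage (e.g. usage.total_tokens from the API response).
  record(userId: string, tokens: number): void {
    this.used.set(userId, (this.used.get(userId) ?? 0) + tokens);
  }
}
```

Check `canSpend` with an estimate before calling the API, then `record` the actual `usage.total_tokens` from the response afterwards.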
4. Human-in-the-Loop
For high-stakes outputs, add review:
enum ReviewStatus {
  AUTO_APPROVED = "auto_approved",
  PENDING_REVIEW = "pending_review",
  REJECTED = "rejected",
}

async function generateContent(prompt: string) {
  const response = await openai.chat.completions.create({ /* ... */ });
  // calculateConfidence is your own heuristic (e.g. log-prob based)
  const confidence = calculateConfidence(response);
  const status =
    confidence > 0.9 ? ReviewStatus.AUTO_APPROVED : ReviewStatus.PENDING_REVIEW;

  return {
    content: response.choices[0].message.content,
    status,
    confidence,
  };
}
Testing AI Features
Unit Testing
Test deterministic parts:
describe("AI Service", () => {
  it("should redact PII before sending to API", () => {
    const redacted = redactPII("SSN on file: 123-45-6789");
    expect(redacted).toBe("SSN on file: ***-**-****");
  });

  it("should handle API failures gracefully", async () => {
    vi.mocked(openai.chat.completions.create).mockRejectedValueOnce(
      new Error("API error"),
    );
    const result = await generateSummary("test");
    expect(result).toEqual(fallbackSummary);
  });
});
Integration Testing
Test real AI calls (in CI, use mocked responses):
it("should generate coherent summaries", async () => {
  const summary = await summarizeText(longText);
  expect(summary.length).toBeLessThan(200);
  expect(summary).toIncludeKeyPoints(); // custom matcher
});
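To keep those integration tests runnable without network access, inject the completion call as a dependency so CI can substitute a canned response. A minimal sketch (the `Complete` type, `summarize` wrapper, and `stubComplete` are illustrative names, not part of any SDK):

```typescript
// The AI call is passed in, so tests never touch the real API.
type Complete = (prompt: string) => Promise<string>;

async function summarize(text: string, complete: Complete): Promise<string> {
  const raw = await complete(`Summarize in one sentence: ${text}`);
  return raw.trim();
}

// In production, pass a function that calls the real API; in tests, a stub:
const stubComplete: Complete = async () => "  A short summary.  ";
```

The same pattern works for mocking frameworks like vitest's `vi.mock`, but explicit injection keeps the seam visible in the code itself.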
Evals (LLM Evaluations)
Use LLMs to grade LLM outputs:
async function gradeResponse(query: string, response: string) {
  const gradingPrompt = `
    Grade this response 1-10 on:
    1. Accuracy
    2. Helpfulness
    3. Safety

    Query: ${query}
    Response: ${response}

    Return JSON: {"accuracy": number, "helpfulness": number, "safety": number}
  `;
  const grade = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: gradingPrompt }],
    response_format: { type: "json_object" }, // force parseable output
  });
  return JSON.parse(grade.choices[0].message.content ?? "{}");
}
Monitoring & Observability
Track AI-specific metrics:
| Metric | Why It Matters |
|---|---|
| Latency (P50, P95, P99) | User experience |
| Cost per request | Budget management |
| Token usage | Capacity planning |
| Error rate | Reliability |
| User satisfaction | Quality feedback |
import { Histogram } from "prom-client";

const aiLatency = new Histogram({
  name: "ai_request_duration_seconds",
  help: "Duration of AI requests in seconds", // prom-client requires `help`
  labelNames: ["model", "operation"],
});

async function trackedAIRequest(model: string, operation: string) {
  const end = aiLatency.startTimer({ model, operation });
  try {
    const result = await makeAIRequest(model, operation);
    end();
    return result;
  } catch (error) {
    end();
    throw error;
  }
}
Cost Optimization
AI gets expensive fast. Strategies:
1. Caching
import crypto from "node:crypto";
import { LRUCache } from "lru-cache";

// Embeddings are number arrays, so type the cache accordingly
const cache = new LRUCache<string, number[]>({ max: 1000 });

async function cachedEmbedding(text: string) {
  const key = crypto.createHash("sha256").update(text).digest("hex");
  const cached = cache.get(key);
  if (cached) return cached;

  const embedding = await getEmbedding(text);
  cache.set(key, embedding);
  return embedding;
}
2. Smaller Models
Not everything needs GPT-4:
| Use Case | Recommended Model |
|---|---|
| Complex reasoning | GPT-4o, Claude 3.5 Sonnet |
| Simple tasks | GPT-4o-mini, Claude 3.5 Haiku |
| Embeddings | text-embedding-3-small |
| Classification | Fine-tuned small model |
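The routing logic behind that table can live in one small function, so adding a new task type or swapping a model name is a one-line change. A sketch (the `Task` type and `pickModel` helper are illustrative; model names track the table above and will drift as providers update them):

```typescript
// Route each request to the cheapest model that can handle it.
type Task = "reasoning" | "simple" | "embedding" | "classification";

function pickModel(task: Task): string {
  switch (task) {
    case "reasoning":
      return "gpt-4o";
    case "simple":
      return "gpt-4o-mini";
    case "embedding":
      return "text-embedding-3-small";
    case "classification":
      return "gpt-4o-mini"; // or a fine-tuned small model
  }
}
```

Centralizing the choice also makes it easy to log which model served each request, which feeds directly into the cost-per-request metric above.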
3. Streaming
Reduce perceived latency:
async function streamingResponse() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [...],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
}
Common Pitfalls
| Pitfall | Why It Happens | Fix |
|---|---|---|
| No fallback when AI fails | Assumes 100% uptime | Always have a non-AI fallback option |
| Ignoring token costs | Usage scales unpredictably | Implement budget limits, per-user quotas |
| Missing guardrails | Users probe for vulnerabilities | Add input/output filtering, explicit policies |
| Poor prompt engineering | Rushing to ship | Test prompts like you test code |
| Not measuring quality | No baseline for comparison | Track engagement, satisfaction, error rates |
| Sending PII to external APIs | Convenience over privacy | Implement PII redaction before API calls |
| Hallucinations accepted as truth | Trusting AI outputs blindly | Add citations, confidence scores, human review |
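The first pitfall in the table deserves a concrete shape: every AI call should go through a wrapper that falls back to a deterministic result when the call fails. A minimal sketch (`withFallback` is an illustrative helper name):

```typescript
// Run an AI call; on any failure, return a non-AI fallback instead.
async function withFallback<T>(
  aiCall: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await aiCall();
  } catch {
    // In real code: log the error and increment an error-rate metric
    return fallback;
  }
}
```

The fallback might be a template response, a cached previous result, or simply hiding the AI-powered widget, as long as the feature degrades instead of breaking.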
Planning an AI feature? We’ve implemented production-ready AI features with RAG, guardrails, and cost optimization for SaaS platforms, marketplaces, and internal tools. Our senior architects ensure your AI integration is secure, scalable, and actually delivers value. Let’s discuss your AI use case.