Implementing AI Features: A Practical Framework
AI · January 23, 2026


How to add LLM and AI features to your product without the chaos.


Jason Overmier

Innovative Prospects Team


Everyone’s adding AI to their products. But most AI features fail—not because the technology doesn’t work, but because they’re poorly planned, poorly tested, and poorly governed.

This is how to do AI features right.

The AI Feature Framework

Before writing code, answer these questions:

1. What Problem Are You Solving?

AI isn’t a solution looking for a problem. Validate that:

  • Users actually want this feature
  • AI is the best solution (not rules, templates, etc.)
  • Success is measurable (engagement, time saved, etc.)

Red flag: “We need AI to stay competitive” isn’t a use case.

2. What’s the Worst That Happens?

AI makes mistakes. Plan for it:

| Risk | Mitigation |
| --- | --- |
| Hallucinations | Citations, confidence scores, human review |
| Bias | Diverse training data, regular audits |
| Cost | Budget limits, caching, cheaper models |
| Latency | Streaming responses, optimistic UI |
| Data privacy | PII redaction, on-premise options |

3. Can You Test It?

AI is probabilistic. Testing is harder:

// ❌ Bad - exact-match testing (brittle for probabilistic output)
expect(aiResponse).toBe("The capital of France is Paris");

// ✅ Good - semantic testing (toInclude is from jest-extended;
// toSatisfySafetyGuidelines is a custom matcher)
expect(aiResponse).toInclude("Paris");
expect(aiResponse).toInclude("France");
expect(aiResponse).toSatisfySafetyGuidelines();

Architecture Patterns

Pattern 1: Direct API Integration

Simplest—call OpenAI/Anthropic directly from your backend.

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function summarizeText(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: `Summarize: ${text}` },
    ],
    max_tokens: 150,
  });

  return response.choices[0].message.content;
}

Pros: Fast to implement, no infrastructure to manage
Cons: API costs, latency, vendor lock-in

Pattern 2: RAG (Retrieval-Augmented Generation)

Ground AI responses in your data:

1. User submits a query
2. Embed (vectorize) the query
3. Search a vector database for similar documents
4. Add the retrieved context to the prompt
5. The LLM generates a response with citations

Tools: Pinecone, Weaviate, pgvector, LangChain.

import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";

async function answerQuestion(query: string) {
  // Search for relevant documents
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    { pineconeIndex },
  );

  const results = await vectorStore.similaritySearch(query, 3);

  // Generate answer with context
  const context = results.map((r) => r.pageContent).join("\n");
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer using this context:\n${context}` },
      { role: "user", content: query },
    ],
  });

  return {
    answer: response.choices[0].message.content,
    sources: results.map((r) => r.metadata),
  };
}

Pattern 3: Fine-Tuning

For specialized domains, train custom models:

  • When: Domain-specific language (legal, medical), specific format requirements
  • Trade-off: More expensive, requires ML expertise
  • Alternative: RAG is often sufficient
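If you do fine-tune, OpenAI's API expects training data as JSONL: one chat-formatted example per line. A minimal sketch of building that file (the example content here is illustrative, not real training data):

```typescript
// Sketch: assembling a fine-tuning dataset in OpenAI's chat JSONL format.
// Each line is one training example: a JSON object with a `messages` array.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface TrainingExample {
  messages: ChatMessage[];
}

function toJsonl(examples: TrainingExample[]): string {
  return examples.map((e) => JSON.stringify(e)).join("\n");
}

const examples: TrainingExample[] = [
  {
    messages: [
      { role: "system", content: "You are a contract-review assistant." },
      { role: "user", content: "Flag risky clauses in: ..." },
      { role: "assistant", content: "Clause 4 (unlimited liability) is high risk." },
    ],
  },
];

const jsonl = toJsonl(examples);
```

You then upload the resulting file and reference it when creating the fine-tuning job.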

Guardrails & Safety

1. Content Moderation

Filter inputs and outputs:

async function safeCompletion(userInput: string) {
  // Check input with OpenAI's moderation endpoint
  const inputCheck = await openai.moderations.create({ input: userInput });
  if (inputCheck.results[0].flagged) {
    throw new Error("Content violates policy");
  }

  // Get completion
  const response = await openai.chat.completions.create({...});
  const content = response.choices[0].message.content ?? "";

  // Check output
  const outputCheck = await openai.moderations.create({ input: content });
  if (outputCheck.results[0].flagged) {
    return fallbackResponse;
  }

  return content;
}

2. PII Redaction

Never send sensitive data to external APIs:

function redactPII(text: string) {
  // Remove or redact:
  // - SSNs, credit cards, emails
  // - Names, addresses (optional)
  // - Medical records, financial data

  return text
    .replace(/\d{3}-\d{2}-\d{4}/g, "***-**-****") // SSN
    .replace(/\d{4}-\d{4}-\d{4}-\d{4}/g, "****-****-****-****") // Credit card (dashed format)
    .replace(
      /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
      "***@***.***",
    ); // Email
}

3. Rate Limiting & Budget Controls

AI costs scale with usage. Protect against abuse:

import rateLimit from "express-rate-limit";

const aiRateLimit = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 10, // 10 requests per minute
  message: "Too many AI requests, please try again later",
});

app.post("/api/ai/generate", aiRateLimit, async (req, res) => {
  // Your AI logic here
});
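Rate limiting caps request volume, but token spend still varies widely per request. A minimal in-memory sketch of a per-user token budget (a real deployment would persist counters in Redis or a database and reset them daily):

```typescript
// Sketch: per-user daily token budget, kept in memory for illustration.
class TokenBudget {
  private usage = new Map<string, number>();

  constructor(private dailyLimit: number) {}

  // Would spending `tokens` keep the user under the daily limit?
  canSpend(userId: string, tokens: number): boolean {
    return (this.usage.get(userId) ?? 0) + tokens <= this.dailyLimit;
  }

  // Record actual spend after a request completes.
  record(userId: string, tokens: number): void {
    this.usage.set(userId, (this.usage.get(userId) ?? 0) + tokens);
  }
}

const budget = new TokenBudget(10_000);
budget.record("user-1", 9_500);
const allowed = budget.canSpend("user-1", 400); // true: 9,900 <= 10,000
const blocked = budget.canSpend("user-1", 600); // false: would exceed the limit
```

Check `canSpend` before calling the AI API and return a 429 (or a non-AI fallback) when it fails.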

4. Human-in-the-Loop

For high-stakes outputs, add review:

enum ReviewStatus {
  AUTO_APPROVED = 'auto_approved',
  PENDING_REVIEW = 'pending_review',
  REJECTED = 'rejected'
}

async function generateContent(prompt: string) {
  const response = await openai.chat.completions.create({...});

  const confidence = calculateConfidence(response);
  const status = confidence > 0.9
    ? ReviewStatus.AUTO_APPROVED
    : ReviewStatus.PENDING_REVIEW;

  return {
    content: response.content,
    status,
    confidence
  };
}

Testing AI Features

Unit Testing

Test deterministic parts:

describe("AI Service", () => {
  it("should redact SSNs before sending to API", () => {
    const redacted = redactPII("My SSN is 123-45-6789");
    expect(redacted).toBe("My SSN is ***-**-****");
  });

  it("should handle API failures gracefully", async () => {
    vi.mocked(openai).mockRejectedValueOnce(new Error("API error"));
    const result = await generateSummary("test");
    expect(result).toEqual(fallbackSummary);
  });
});

Integration Testing

Test real AI calls (in CI, use mocked responses):

it("should generate coherent summaries", async () => {
  const summary = await summarizeText(longText);
  expect(summary.length).toBeLessThan(200);
  expect(summary).toIncludeKeyPoints();
});

Evals (LLM Evaluations)

Use LLMs to grade LLM outputs:

async function gradeResponse(query: string, response: string) {
  const gradingPrompt = `
    Grade this response 1-10 on:
    1. Accuracy
    2. Helpfulness
    3. Safety

    Query: ${query}
    Response: ${response}

    Return JSON: {accuracy: number, helpfulness: number, safety: number}
  `;

  const grade = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: gradingPrompt }],
    response_format: { type: "json_object" }, // ask the model for parseable JSON
  });

  return JSON.parse(grade.choices[0].message.content ?? "{}");
}

Monitoring & Observability

Track AI-specific metrics:

| Metric | Why It Matters |
| --- | --- |
| Latency (P50, P95, P99) | User experience |
| Cost per request | Budget management |
| Token usage | Capacity planning |
| Error rate | Reliability |
| User satisfaction | Quality feedback |

import { Histogram } from "prom-client";

const aiLatency = new Histogram({
  name: "ai_request_duration_seconds",
  labelNames: ["model", "operation"],
});

async function trackedAIRequest(model: string, operation: string) {
  const end = aiLatency.startTimer({ model, operation });
  try {
    return await makeAIRequest(model, operation);
  } finally {
    end(); // record duration on success and failure alike
  }
}
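Cost per request can be derived from the token counts the API returns in `response.usage`. A sketch with placeholder prices (check your provider's current pricing; these numbers are assumptions for illustration):

```typescript
// Sketch: estimating per-request cost from prompt/completion token counts.
// Prices are illustrative placeholders, not current list prices.
const PRICE_PER_1M = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 }, // USD per 1M tokens (assumed)
};

function estimateCostUSD(
  model: keyof typeof PRICE_PER_1M,
  promptTokens: number,
  completionTokens: number,
): number {
  const p = PRICE_PER_1M[model];
  return (promptTokens * p.input + completionTokens * p.output) / 1_000_000;
}

// 1,000 prompt tokens + 500 completion tokens
const cost = estimateCostUSD("gpt-4o-mini", 1_000, 500);
```

Emit this as a metric (or log field) per request so cost anomalies surface alongside latency.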

Cost Optimization

AI gets expensive fast. Strategies:

1. Caching

import { LRUCache } from "lru-cache";
import crypto from "node:crypto";

const cache = new LRUCache<string, number[]>({ max: 1000 });

async function cachedEmbedding(text: string) {
  const key = crypto.createHash("sha256").update(text).digest("hex");
  const cached = cache.get(key);
  if (cached) return cached;

  const embedding = await getEmbedding(text);
  cache.set(key, embedding);
  return embedding;
}

2. Smaller Models

Not everything needs GPT-4:

| Use Case | Recommended Model |
| --- | --- |
| Complex reasoning | GPT-4o, Claude 3.5 Sonnet |
| Simple tasks | GPT-4o-mini, Claude 3.5 Haiku |
| Embeddings | text-embedding-3-small |
| Classification | Fine-tuned small model |
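The table above can be collapsed into a simple router that defaults to the cheaper model. The task labels here are assumptions for illustration:

```typescript
// Sketch: route requests to a cheaper model unless the task needs heavy reasoning.
type Task = "summarize" | "classify" | "complex-reasoning";

function pickModel(task: Task): string {
  switch (task) {
    case "complex-reasoning":
      return "gpt-4o"; // reserve the expensive model for hard tasks
    default:
      return "gpt-4o-mini"; // cheaper default for simple tasks
  }
}
```

Centralizing model choice in one function also makes it trivial to A/B test a cheaper model later.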

3. Streaming

Reduce perceived latency:

async function streamingResponse() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [...],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

Common Pitfalls

| Pitfall | Why It Happens | Fix |
| --- | --- | --- |
| No fallback when AI fails | Assumes 100% uptime | Always have a non-AI fallback option |
| Ignoring token costs | Usage scales unpredictably | Implement budget limits, per-user quotas |
| Missing guardrails | Users probe for vulnerabilities | Add input/output filtering, explicit policies |
| Poor prompt engineering | Rushing to ship | Test prompts like you test code |
| Not measuring quality | No baseline for comparison | Track engagement, satisfaction, error rates |
| Sending PII to external APIs | Convenience over privacy | Implement PII redaction before API calls |
| Hallucinations accepted as truth | Trusting AI outputs blindly | Add citations, confidence scores, human review |
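The first pitfall (no fallback) can be addressed with a small generic wrapper; `aiCall` stands in for any promise-returning AI operation:

```typescript
// Sketch: degrade gracefully to a non-AI default instead of surfacing an error.
async function withFallback<T>(
  aiCall: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await aiCall();
  } catch {
    return fallback; // the AI call failed; serve the non-AI default
  }
}

// Usage: the summarizer is down, so the canned message is returned.
withFallback(
  async (): Promise<string> => {
    throw new Error("API down");
  },
  "Summary unavailable (original text shown instead)",
).then((result) => console.log(result));
```

In practice you would also log the failure and track its rate, since a silently degraded feature hides outages.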

Planning an AI feature? We’ve implemented production-ready AI features with RAG, guardrails, and cost optimization for SaaS platforms, marketplaces, and internal tools. Our senior architects ensure your AI integration is secure, scalable, and actually delivers value. Let’s discuss your AI use case.

Ready to Start Your Project?

Let's discuss how we can help bring your vision to life.

Book a Consultation