Practice: Deploying a Production-Grade Agent Service

Section Goal: Apply everything learned in this chapter to complete the full workflow from development to deployment of an Agent service.

Project Structure

agent-service/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI entry point
│   ├── agent.py          # Agent core logic
│   ├── config.py         # Configuration management
│   └── middleware.py     # Middleware
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example
└── tests/
    └── test_api.py

Core Code

config.py — Configuration Management

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    openai_api_key: str
    model_name: str = "gpt-4o"
    redis_url: str = "redis://localhost:6379"
    api_keys: str = ""  # Comma-separated valid API Keys
    max_concurrent: int = 50
    request_timeout: int = 30
    
    class Config:
        env_file = ".env"
        env_prefix = "AGENT_"

settings = Settings()

agent.py — Agent Core

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from app.config import settings

class ProductionAgent:
    """Production-grade Agent"""
    
    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.model_name,
            api_key=settings.openai_api_key,
            streaming=True
        )
        
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a professional AI assistant. Please answer questions accurately and concisely."),
            MessagesPlaceholder("history"),
            ("human", "{input}")
        ])
        
        self.chain = self.prompt | self.llm
    
    async def run(self, message: str, history: list[dict] = None):
        """Execute Agent (non-streaming)"""
        chat_history = self._build_history(history or [])
        response = await self.chain.ainvoke({
            "input": message,
            "history": chat_history
        })
        return response.content
    
    async def stream(self, message: str, history: list[dict] = None):
        """Execute Agent (streaming)"""
        chat_history = self._build_history(history or [])
        async for chunk in self.chain.astream({
            "input": message,
            "history": chat_history
        }):
            if chunk.content:
                yield chunk.content
    
    def _build_history(self, history: list[dict]):
        """Build conversation history"""
        messages = []
        for msg in history:
            if msg["role"] == "user":
                messages.append(HumanMessage(content=msg["content"]))
            elif msg["role"] == "assistant":
                messages.append(AIMessage(content=msg["content"]))
        return messages

# Global singleton
agent = ProductionAgent()

main.py — API Entry Point

import uuid
import json
import asyncio

from fastapi import FastAPI, HTTPException, Header, Depends
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
from typing import Optional

from app.config import settings
from app.agent import agent

app = FastAPI(title="Agent Service", version="1.0.0")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Concurrency control
semaphore = asyncio.Semaphore(settings.max_concurrent)

# ===== Models =====

class ChatRequest(BaseModel):
    message: str = Field(..., min_length=1, max_length=5000)
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    reply: str
    session_id: str

# ===== Authentication =====

async def verify_key(x_api_key: str = Header(...)):
    valid = settings.api_keys.split(",")
    if x_api_key not in valid:
        raise HTTPException(401, "Invalid API Key")

# ===== Endpoints =====

@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest, _=Depends(verify_key)):
    session_id = req.session_id or str(uuid.uuid4())
    
    async with semaphore:
        try:
            reply = await asyncio.wait_for(
                agent.run(req.message),
                timeout=settings.request_timeout
            )
        except asyncio.TimeoutError:
            raise HTTPException(504, "Request timed out")
    
    return ChatResponse(reply=reply, session_id=session_id)

@app.post("/chat/stream")
async def chat_stream(req: ChatRequest, _=Depends(verify_key)):
    async def generate():
        async with semaphore:
            async for token in agent.stream(req.message):
                yield f"data: {json.dumps({'token': token})}\n\n"
            yield f"data: {json.dumps({'done': True})}\n\n"
    
    return StreamingResponse(generate(), media_type="text/event-stream")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, workers=4)

Deployment Steps

1. Prepare Environment Variables

# ⚠️ The following is the .env.example template
# Copy to .env and fill in real values: cp .env.example .env
# 🔒 Security reminder: .env file must be added to .gitignore, never commit to version control!

AGENT_OPENAI_API_KEY=sk-your-key-here
AGENT_API_KEYS=key1,key2,key3
AGENT_MODEL_NAME=gpt-4o
AGENT_REDIS_URL=redis://redis:6379

2. Build and Start

# Build image and start
docker compose up -d --build

# Check service status
docker compose ps

# Verify health check
curl http://localhost:8000/health

3. Test the API

Test Pyramid

# Regular chat
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: key1" \
  -d '{"message": "Hello, please introduce Python"}'

# Streaming chat
curl -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-API-Key: key1" \
  -d '{"message": "Tell me a short story"}' \
  --no-buffer

Deployment Checklist

Check Item	Description	✅
Environment variables	API Keys and sensitive info not hardcoded
Health check	/health endpoint returns normally
Authentication	API Key validation is active
Rate limiting	Nginx rate limiting configured correctly
Logging	Request logs recorded normally
Monitoring	Error rate and latency are observable
Backup	Redis data persistence
SSL	HTTPS certificate configured

Summary

Concept	Description
Project structure	Clear layering: config, core, API, middleware
Configuration management	Pydantic Settings + environment variables
Concurrency control	Semaphore + timeout mechanism
Streaming response	SSE real-time push of generation process
Container deployment	Docker Compose one-command startup

🎓 Chapter Summary: From API wrapping to containerized deployment, from streaming responses to concurrency handling, we've completed the full path from "a runnable script" to "a production-grade service." Next, let's enter the comprehensive project section and build real Agent applications!

Next Chapter: Chapter 19 Project Practice: AI Coding Assistant →

Keyboard shortcuts

Learn Agent Development from Scratch