Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Practice: Deploying a Production-Grade Agent Service

Section Goal: Apply everything learned in this chapter to complete the full workflow from development to deployment of an Agent service.


Project Structure

agent-service/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI entry point
│   ├── agent.py          # Agent core logic
│   ├── config.py         # Configuration management
│   └── middleware.py     # Middleware
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example
└── tests/
    └── test_api.py

Core Code

config.py — Configuration Management

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    openai_api_key: str
    model_name: str = "gpt-4o"
    redis_url: str = "redis://localhost:6379"
    api_keys: str = ""  # Comma-separated valid API Keys
    max_concurrent: int = 50
    request_timeout: int = 30
    
    class Config:
        env_file = ".env"
        env_prefix = "AGENT_"

settings = Settings()

agent.py — Agent Core

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from app.config import settings

class ProductionAgent:
    """Production-grade Agent"""
    
    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.model_name,
            api_key=settings.openai_api_key,
            streaming=True
        )
        
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a professional AI assistant. Please answer questions accurately and concisely."),
            MessagesPlaceholder("history"),
            ("human", "{input}")
        ])
        
        self.chain = self.prompt | self.llm
    
    async def run(self, message: str, history: list[dict] = None):
        """Execute Agent (non-streaming)"""
        chat_history = self._build_history(history or [])
        response = await self.chain.ainvoke({
            "input": message,
            "history": chat_history
        })
        return response.content
    
    async def stream(self, message: str, history: list[dict] = None):
        """Execute Agent (streaming)"""
        chat_history = self._build_history(history or [])
        async for chunk in self.chain.astream({
            "input": message,
            "history": chat_history
        }):
            if chunk.content:
                yield chunk.content
    
    def _build_history(self, history: list[dict]):
        """Build conversation history"""
        messages = []
        for msg in history:
            if msg["role"] == "user":
                messages.append(HumanMessage(content=msg["content"]))
            elif msg["role"] == "assistant":
                messages.append(AIMessage(content=msg["content"]))
        return messages

# Global singleton
agent = ProductionAgent()

main.py — API Entry Point

import uuid
import json
import asyncio

from fastapi import FastAPI, HTTPException, Header, Depends
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
from typing import Optional

from app.config import settings
from app.agent import agent

app = FastAPI(title="Agent Service", version="1.0.0")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Concurrency control
semaphore = asyncio.Semaphore(settings.max_concurrent)

# ===== Models =====

class ChatRequest(BaseModel):
    message: str = Field(..., min_length=1, max_length=5000)
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    reply: str
    session_id: str

# ===== Authentication =====

async def verify_key(x_api_key: str = Header(...)):
    valid = settings.api_keys.split(",")
    if x_api_key not in valid:
        raise HTTPException(401, "Invalid API Key")

# ===== Endpoints =====

@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest, _=Depends(verify_key)):
    session_id = req.session_id or str(uuid.uuid4())
    
    async with semaphore:
        try:
            reply = await asyncio.wait_for(
                agent.run(req.message),
                timeout=settings.request_timeout
            )
        except asyncio.TimeoutError:
            raise HTTPException(504, "Request timed out")
    
    return ChatResponse(reply=reply, session_id=session_id)

@app.post("/chat/stream")
async def chat_stream(req: ChatRequest, _=Depends(verify_key)):
    async def generate():
        async with semaphore:
            async for token in agent.stream(req.message):
                yield f"data: {json.dumps({'token': token})}\n\n"
            yield f"data: {json.dumps({'done': True})}\n\n"
    
    return StreamingResponse(generate(), media_type="text/event-stream")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, workers=4)

Deployment Steps

1. Prepare Environment Variables

# ⚠️ The following is the .env.example template
# Copy to .env and fill in real values: cp .env.example .env
# 🔒 Security reminder: .env file must be added to .gitignore, never commit to version control!

AGENT_OPENAI_API_KEY=sk-your-key-here
AGENT_API_KEYS=key1,key2,key3
AGENT_MODEL_NAME=gpt-4o
AGENT_REDIS_URL=redis://redis:6379

2. Build and Start

# Build image and start
docker compose up -d --build

# Check service status
docker compose ps

# Verify health check
curl http://localhost:8000/health

3. Test the API

Test Pyramid

# Regular chat
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: key1" \
  -d '{"message": "Hello, please introduce Python"}'

# Streaming chat
curl -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-API-Key: key1" \
  -d '{"message": "Tell me a short story"}' \
  --no-buffer

Deployment Checklist

Check ItemDescription
Environment variablesAPI Keys and sensitive info not hardcoded
Health check/health endpoint returns normally
AuthenticationAPI Key validation is active
Rate limitingNginx rate limiting configured correctly
LoggingRequest logs recorded normally
MonitoringError rate and latency are observable
BackupRedis data persistence
SSLHTTPS certificate configured

Summary

ConceptDescription
Project structureClear layering: config, core, API, middleware
Configuration managementPydantic Settings + environment variables
Concurrency controlSemaphore + timeout mechanism
Streaming responseSSE real-time push of generation process
Container deploymentDocker Compose one-command startup

🎓 Chapter Summary: From API wrapping to containerized deployment, from streaming responses to concurrency handling, we've completed the full path from "a runnable script" to "a production-grade service." Next, let's enter the comprehensive project section and build real Agent applications!


Next Chapter: Chapter 19 Project Practice: AI Coding Assistant →