MCP in Production

Episode 7 · 15 minutes

Quick Recap

Over the course of this series, we've gone from the basics of MCP to its architecture, built servers, connected them to clients, and covered security best practices. Now it's time to answer a crucial question: how do you take all of this into a real production environment?

Local development and production are two different worlds. Locally, you spin up a server with npx and call it a day. But when you need a system running 24/7, handling multiple users, and dealing with real data, a whole new set of challenges emerges.

Who is this for?
This episode is mainly for those who want to use MCP in real-world projects. If you’re setting up MCP just for personal use, feel free to skip the architecture and scaling sections for now.

From Development to Deployment

Before diving in, let me give you a high-level picture. When you want to bring an MCP Server to production, here are the stages:

  1. Local development and testing: You build the server and test it locally (you already know this part)
  2. Packaging: You prepare the server for deployment (Docker, npm package, or binary)
  3. Deployment: You deploy the server to a machine or cloud service
  4. Configuration: You set up environment variables, secrets, and connections
  5. Monitoring: You make sure everything is running smoothly

Analogy
The difference between local development and production is like the difference between home cooking and running a restaurant. At home, you cook and eat. But a restaurant needs a menu, a professional kitchen, consistent ingredients, and the ability to serve 100 people simultaneously. MCP in production works the same way.

Packaging with Docker

The most popular way to deploy MCP Servers is Docker. Why? Because:

  • Isolation: The server runs in its own container with no access to the rest of the system
  • Reproducibility: The exact same image you tested locally runs in production
  • Scalability: You can run multiple instances of the same server simultaneously

A simple Dockerfile for an MCP Server might look like this:

FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

The key rule here: never put secrets inside a Docker image. Use environment variables and pass them when running the container.
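
For example, you can pass secrets as environment variables at launch instead of baking them into the image (the variable names and image tag here are placeholders):

```shell
docker run -d \
  -e API_KEY="$API_KEY" \
  -e DB_PASSWORD="$DB_PASSWORD" \
  -p 3000:3000 \
  my-mcp-server:latest
```

This way the secrets live only in the running container's environment, not in any image layer someone could later inspect.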

Warning
If you’re building a local MCP Server (stdio), you don’t need Docker. Docker is for when you’re deploying a remote server (SSE/HTTP).

Docker Compose for Multiple Servers

When you have several MCP Servers working together, Docker Compose is excellent. Imagine you have a database server, a file server, and an email server:

version: "3.8"
services:
  mcp-database:
    build: ./servers/database
    environment:
      - DB_HOST=postgres
      - DB_PASSWORD_FILE=/run/secrets/db_pass
    secrets:
      - db_pass

  mcp-files:
    build: ./servers/files
    volumes:
      - ./shared-data:/data:ro

  mcp-email:
    build: ./servers/email
    environment:
      - SMTP_HOST=smtp.example.com

secrets:
  db_pass:
    file: ./secrets/db_password.txt

Notice the file server is mounted with :ro (read-only). This is a practical example of the principle of least privilege we covered in the previous episode.

Hosting on the Cloud

Once Docker is ready, the next step is deciding where to run it. Here are some popular options:

1. Simple VPS (DigitalOcean, Hetzner, etc.)

The simplest approach: get a VPS, install Docker, and run your container. The advantage is simplicity and full control. The downside is that you handle scaling and server management yourself.

This works great for small to medium projects.

2. Container Services (AWS ECS, Google Cloud Run, Azure Container Instances)

If you don’t want to manage servers, you can run containers directly on cloud services. These services handle scaling, restarts, and resource management for you.

Google Cloud Run is particularly well-suited for HTTP-based MCP Servers because you only pay when there are active requests (pay-per-request).

3. Kubernetes

For large, enterprise-grade projects, Kubernetes is the best choice. But it comes with more complexity. If your team is small and you’re just getting started, begin with simpler options.

Recommendation
Start with the simplest solution. A basic VPS with Docker is often more than enough. When you genuinely need scaling, migrate later. Premature optimization is the enemy of progress.

Scaling

When the number of users or requests grows, a single instance of your MCP Server won’t cut it. Let’s look at the challenges and how to solve them.

Horizontal Scaling

The simplest scaling strategy is running multiple instances of your server and distributing traffic between them (load balancing). This works well for HTTP-based MCP Servers.

But here’s the catch: if your server maintains state, scaling becomes trickier. For example, if it holds a database connection session, you need to ensure subsequent requests go to the same instance (sticky sessions) or share state across instances.

Tip
Best practice: Build your MCP Servers to be as stateless as possible. Each request should be independent. If you need state, store it in an external service like Redis.
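
The tip above can be sketched as follows: per-session state lives behind a store interface rather than in instance memory, so any instance can serve any request. All names here are illustrative; in production the store would be backed by Redis rather than an in-memory Map.

```typescript
// Sketch: keep per-session state behind an interface so server instances
// stay interchangeable. The in-memory Map stands in for Redis here.
interface SessionStore {
  get(sessionId: string): Promise<string | undefined>;
  set(sessionId: string, value: string): Promise<void>;
}

class InMemoryStore implements SessionStore {
  private data = new Map<string, string>();
  async get(id: string) { return this.data.get(id); }
  async set(id: string, value: string) { this.data.set(id, value); }
}

// A stateless handler: everything it needs arrives with the request or
// comes from the external store — never from this instance's memory.
async function handleRequest(
  store: SessionStore,
  sessionId: string,
  input: string,
): Promise<string> {
  const previous = (await store.get(sessionId)) ?? "";
  const next = previous + input;
  await store.set(sessionId, next);
  return next;
}
```

Because the handler itself holds no state, a load balancer can route each request to any instance without sticky sessions.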

Connection Pooling

If your MCP Server connects to a database, always use a connection pool. Instead of creating and closing a new connection for every request, maintain a pool of ready connections and reuse them.

This both speeds things up and reduces pressure on the database.
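
As a sketch, a pool keeps finished connections idle for reuse instead of closing them. The `Pool` class below is illustrative; real servers would use their driver's built-in pool, such as `pg.Pool` for PostgreSQL.

```typescript
// Minimal connection-pool sketch: reuse idle connections, cap the idle set.
class Pool<T> {
  private idle: T[] = [];
  constructor(private factory: () => T, private maxIdle = 5) {}

  acquire(): T {
    // Reuse an idle connection if one exists, otherwise create a new one.
    return this.idle.pop() ?? this.factory();
  }

  release(conn: T): void {
    // Return the connection for reuse instead of closing it.
    if (this.idle.length < this.maxIdle) this.idle.push(conn);
  }
}
```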

Caching

Some MCP tools return repetitive results. For instance, if a server reads a folder’s file list and the files rarely change, you can cache the result and serve it faster next time.

But be careful: cache invalidation is one of the hardest problems in software engineering. Only cache where you’re confident the data doesn’t change frequently.
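
One cautious approach is a cache with a time-to-live (TTL), so entries invalidate themselves after a fixed period. A minimal sketch (function and variable names are illustrative):

```typescript
// Wrap a function so its result is cached and recomputed only after
// `ttlMs` milliseconds have passed.
function cachedWithTtl<T>(fn: () => T, ttlMs: number): () => T {
  let value: T | undefined;
  let expiresAt = 0;
  return () => {
    const now = Date.now();
    if (value === undefined || now >= expiresAt) {
      value = fn();            // recompute on a miss or after expiry
      expiresAt = now + ttlMs; // schedule automatic invalidation
    }
    return value;
  };
}
```

For the folder-listing example above, wrapping the directory read with a one-minute TTL would serve repeat requests from memory while still picking up changes within a minute.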

Error Handling

In production, errors will happen. The question is how your system handles them.

Core Principles

1. Never swallow errors. Every error should be logged, and if necessary, the user should be notified. Never ignore an error silently.

2. Useful error messages. “Something went wrong” is useless. “Cannot connect to database — PostgreSQL on port 5432 is not responding” is much better.

3. Retry with caution. Some errors are transient (like network timeouts). Retrying makes sense for these. But use exponential backoff — increase the delay between retries each time (1 second, 2 seconds, 4 seconds…). This prevents overwhelming the service.
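
A sketch of retry with exponential backoff, assuming the operation is an async function (names and defaults are illustrative):

```typescript
// Retry an async operation, doubling the delay after each failure:
// 1s, 2s, 4s, ... up to `maxAttempts` tries.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Double the delay each time to avoid overwhelming the service.
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Only wrap operations you know are safe to repeat; retrying a non-idempotent action (like sending an email) can cause duplicates.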

4. Circuit Breaker. If an external service is struggling, it’s better to temporarily stop calling it after several failures. For example, if an external API times out 5 times in a row, stop sending requests for 30 seconds and then try again.

Analogy
A Circuit Breaker is like the breaker in your house's electrical panel. When something goes wrong, it trips and cuts the circuit to prevent further damage. Once the problem is fixed, you flip it back on.
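
A minimal circuit breaker along these lines might look like this sketch (class name, threshold, and cooldown are illustrative):

```typescript
// After `threshold` consecutive failures the circuit "opens" and calls
// are rejected immediately until `cooldownMs` has passed; then one
// trial call is allowed through ("half-open").
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: skipping call");
      }
      this.failures = 0; // cooldown elapsed: allow a trial call
    }
    try {
      const result = await fn();
      this.failures = 0; // any success resets the counter
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```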

Graceful Degradation

Your system should be designed so that if one part fails, the entire system doesn’t go down. For instance, if the Slack server has issues, the other servers (database, files) should keep working.

The AI model should also handle missing tools gracefully. If the email tool is unavailable, it can say “I can’t send the email right now, but the report is ready” instead of crashing entirely.

Monitoring in Production

In the security episode, we talked about logging. But production monitoring goes beyond logs:

Key Metrics

  • Latency: How long each tool takes to respond
  • Error Rate: What percentage of requests fail
  • Resource Usage: CPU, RAM, database connections
  • Throughput: How many requests per second are processed
  • Health Status: Whether the server is alive and healthy

Health Check Endpoint

Every MCP Server should have a health check endpoint — a simple URL (like /health) that reports “I’m healthy” or “I have a problem.” Monitoring services and load balancers use this endpoint.

A good health check doesn’t just say “the server is up.” It should verify:

  • Is the database connection active?
  • Are external API connections responding?
  • Does the disk have enough space?
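
One way to structure such a check is to run each dependency probe and aggregate the results. The shapes and names below are illustrative; a `/health` route would simply serialize this result, returning HTTP 200 for "ok" and 503 otherwise.

```typescript
// Aggregate individual dependency probes into one health report.
type Check = { name: string; run: () => Promise<boolean> };

async function healthCheck(
  checks: Check[],
): Promise<{ status: string; details: Record<string, boolean> }> {
  const details: Record<string, boolean> = {};
  for (const check of checks) {
    try {
      details[check.name] = await check.run();
    } catch {
      details[check.name] = false; // a throwing probe counts as unhealthy
    }
  }
  const healthy = Object.values(details).every(Boolean);
  return { status: healthy ? "ok" : "degraded", details };
}
```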

Monitoring Tools

You can use standard tools for monitoring MCP Servers:

  • Prometheus + Grafana: For metrics and dashboards
  • ELK Stack (Elasticsearch, Logstash, Kibana): For logs
  • Sentry: For error tracking
  • Uptime Robot / Healthchecks.io: For simple uptime monitoring

Tip
Start simple. Even a bash script that checks the health endpoint every 5 minutes is better than nothing. Add more sophisticated tools later.

Real-World Architecture Patterns

Let’s explore some architecture patterns used in real projects.

Pattern 1: Gateway Pattern

You place a gateway layer between clients and MCP Servers. This gateway handles shared responsibilities like authentication, rate limiting, and logging. The servers focus solely on their core functionality.

Advantage: You write shared logic in one place. Servers stay simpler.

Pattern 2: Sidecar Pattern

Each MCP Server has a “companion” (sidecar) responsible for logging, monitoring, and secure communications. The main server only communicates with its sidecar, and the sidecar talks to the outside world.

This pattern is very common in Kubernetes and works well for large environments.

Pattern 3: Serverless

You deploy each MCP Tool as a separate function (like AWS Lambda or Google Cloud Functions). You only pay when there are requests, and scaling is automatic.

This is ideal for tools with irregular usage patterns (like a weekly reporting tool).

Which one should I choose?
  • Personal/small project: Docker on a simple VPS
  • Medium project: Docker Compose + Gateway Pattern
  • Large/enterprise project: Kubernetes + Sidecar + full monitoring
  • Scattered tools: Serverless

A Real-World Example: Support Team Architecture

Let me walk through a complete example. Imagine you want to build an MCP system for your company’s support team:

Servers:

  • CRM Server: Access to customer information (read-only)
  • Ticket Server: Read and update support tickets
  • Knowledge Base Server: Search internal documentation
  • Email Server: Send replies to customers

Architecture:

  1. An API Gateway with JWT authentication in front of all servers
  2. Each server in a separate Docker container
  3. Rate limiting: Maximum 50 requests per minute per support agent
  4. Logging: All requests logged in ELK Stack
  5. Health checks: Every 30 seconds
  6. Human approval: Sending emails requires manager approval
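
The per-agent limit in step 3 could be enforced with a simple fixed-window counter like this sketch (class name and defaults are illustrative; production gateways often use a shared store like Redis so all instances see the same counts):

```typescript
// Allow at most `limit` requests per `windowMs` for each agent.
class RateLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  constructor(private limit = 50, private windowMs = 60_000) {}

  allow(agentId: string, now = Date.now()): boolean {
    const entry = this.counts.get(agentId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request in a fresh window: start counting from 1.
      this.counts.set(agentId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false; // over the limit: reject
    entry.count++;
    return true;
  }
}
```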

A support agent types in Claude Desktop: “Customer John Smith reported a billing issue. Find his ticket, check his latest invoice, and draft a professional response email.”

Claude pulls customer info from the CRM server, finds the relevant ticket, reads the response template from the Knowledge Base, and prepares an email for approval. After the agent confirms, the email is sent.

Final Production Tips

A few last things that are critical in production:

1. Backups: Back up your MCP configurations and logs. If a server goes down, you need to recover quickly.

2. Documentation: Document which servers are active, what access they have, and how they’re configured. You won’t remember next month.

3. Testing: Before any update, test in a staging environment. Never modify production directly.

4. Rollback Plan: Always have a previous version ready. If an update goes wrong, you need to roll back fast.

5. Cost Monitoring: Especially with cloud services, monitor your costs. An MCP Server calling a paid API non-stop can generate a hefty bill.

Series Summary So Far

Congratulations on reaching the end of this part of the MCP series! Here’s a summary of what you’ve learned:

  • Earlier episodes: What MCP is, Client-Server architecture, building servers, three core capabilities
  • Episode 5: Connecting to Claude Desktop and Claude Code, the open MCP ecosystem
  • Episode 6: Security risks and five protection principles
  • Episode 7 (this one): Docker, Cloud, Scaling, Error Handling, Monitoring, and architecture patterns

MCP is still a young technology that’s evolving rapidly. The best thing you can do is start — build a simple server, connect it, and use it. The more you practice, the better you’ll understand where it fits in your workflow.

What’s Next
In the next episode, we’ll explore how to connect MCP to databases — building a server that gives AI models safe, controlled access to your data. Stay tuned!