Production Features

AA Kit comes with 10 built-in enterprise features that make your agents production-ready from day one.

Enterprise-Ready by Default

Every AA Kit agent includes production features that typically take months to implement. From rate limiting to circuit breakers, your agents are protected and performant out of the box.

Built-in Features

Rate Limiting

Protect your agents from abuse and control usage

```python
from aakit import Agent, RateLimiter, RateLimitExceeded

# Configure rate limiting
rate_limiter = RateLimiter(
    max_requests_per_minute=60,
    max_requests_per_hour=1000,
    max_tokens_per_minute=100000,
    burst_size=10
)

agent = Agent(
    name="protected_agent",
    instruction="You are a helpful assistant",
    model="gpt-4",
    rate_limiter=rate_limiter
)

# Rate limits are automatically enforced
try:
    response = await agent.chat("Hello")
except RateLimitExceeded as e:
    print(f"Rate limit exceeded: {e.retry_after} seconds")
```
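
Conceptually, a `burst_size` on top of a per-minute rate is the classic token-bucket pattern. A minimal sketch of the idea (illustrative only, not AA Kit's internal implementation):

```python
import time

class TokenBucket:
    """Allow short bursts while enforcing a steady average rate."""

    def __init__(self, rate_per_minute, burst_size):
        self.capacity = burst_size          # max tokens (burst allowance)
        self.tokens = float(burst_size)     # start full
        self.refill_rate = rate_per_minute / 60.0  # tokens per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_minute=60, burst_size=10)
results = [bucket.allow() for _ in range(12)]
# The first 10 calls fit the burst; the next two are rejected
```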

Response Caching

Cache responses to reduce latency and costs

```python
from aakit import Agent, CacheConfig

# Enable intelligent caching
cache_config = CacheConfig(
    enabled=True,
    ttl=3600,  # 1 hour
    max_size=1000,  # Maximum cache entries
    cache_key_strategy="semantic",  # Smart key generation
    backend="redis"  # or "memory", "sqlite"
)

agent = Agent(
    name="cached_agent",
    instruction="You provide information",
    model="gpt-4",
    cache_config=cache_config
)

# Identical queries are served from cache
response1 = await agent.chat("What is Python?")  # API call
response2 = await agent.chat("What is Python?")  # From cache
```
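
The exact-match flavor of this cache is easy to picture: entries keyed by the query, each with an expiry. A toy sketch of the idea (the `semantic` strategy would additionally map similar queries to one key; illustrative only, not AA Kit's cache):

```python
import time

class TTLCache:
    """Exact-match response cache with per-entry expiry."""

    def __init__(self, ttl, max_size):
        self.ttl = ttl
        self.max_size = max_size
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self.store[key]  # stale entry: evict and report a miss
            return None
        return value

    def set(self, key, value):
        if len(self.store) >= self.max_size:
            self.store.pop(next(iter(self.store)))  # evict oldest insertion
        self.store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl=3600, max_size=1000)
cache.set("What is Python?", "Python is a programming language.")
hit = cache.get("What is Python?")   # served from cache
miss = cache.get("What is Rust?")    # not cached
```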

Circuit Breakers

Prevent cascading failures with smart circuit breaking

```python
from aakit import Agent, CircuitBreaker, APIError

# Configure circuit breaker
circuit_breaker = CircuitBreaker(
    failure_threshold=5,  # Failures before opening
    recovery_timeout=60,  # Seconds before retry
    expected_exception=APIError,
    fallback_response="Service temporarily unavailable"
)

agent = Agent(
    name="resilient_agent",
    instruction="You are always available",
    model="gpt-4",
    circuit_breaker=circuit_breaker
)

# Circuit breaker protects against failures
response = await agent.chat("Hello")
# Returns fallback if circuit is open
```
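
The underlying state machine is small: the circuit is closed during normal operation, opens after `failure_threshold` consecutive failures, and half-opens after `recovery_timeout` to probe the backend again. A self-contained sketch of that logic (not AA Kit's internal code):

```python
import time

class SimpleBreaker:
    """Closed -> open after N failures; half-open retry after a timeout."""

    def __init__(self, failure_threshold, recovery_timeout, fallback):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.fallback = fallback
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                return self.fallback          # open: short-circuit the call
            self.opened_at = None             # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return self.fallback
        self.failures = 0                     # success resets the count
        return result

breaker = SimpleBreaker(failure_threshold=5, recovery_timeout=60,
                        fallback="Service temporarily unavailable")

def flaky():
    raise RuntimeError("backend down")

for _ in range(5):
    breaker.call(flaky)                  # five failures trip the breaker
answer = breaker.call(lambda: "hello")  # circuit open: fallback, fn not called
```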

Retry Logic

Automatic retries with exponential backoff

```python
from aakit import Agent, RetryConfig, APIError

# Configure retry behavior
retry_config = RetryConfig(
    max_attempts=3,
    initial_delay=1,  # seconds
    exponential_base=2,
    max_delay=30,
    retry_on=[TimeoutError, APIError],
    before_retry=lambda attempt: print(f"Retry #{attempt}")
)

agent = Agent(
    name="persistent_agent",
    instruction="You handle tasks reliably",
    model="gpt-4",
    retry_config=retry_config
)

# Automatic retries on failure
response = await agent.chat("Process this request")
```
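
The delay schedule these parameters produce is easy to compute: each retry waits `initial_delay * exponential_base**(attempt - 1)` seconds, capped at `max_delay`. A small illustration (using six attempts so the cap becomes visible):

```python
def backoff_delays(max_attempts, initial_delay, exponential_base, max_delay):
    """Delay before each retry attempt, capped at max_delay."""
    return [min(initial_delay * exponential_base ** (attempt - 1), max_delay)
            for attempt in range(1, max_attempts + 1)]

# With initial_delay=1 and base 2: 1s, 2s, 4s, ... capped at 30s
delays = backoff_delays(max_attempts=6, initial_delay=1,
                        exponential_base=2, max_delay=30)
```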

Request Validation

Validate and sanitize all inputs

```python
from aakit import Agent, ValidationRules, ValidationError

# Define validation rules
validation = ValidationRules(
    max_message_length=1000,
    allowed_languages=["en", "es", "fr"],
    block_patterns=[r"password", r"secret"],
    content_filter="strict",
    require_session_id=True
)

agent = Agent(
    name="secure_agent",
    instruction="You handle sensitive data",
    model="gpt-4",
    validation_rules=validation
)

# Invalid requests are rejected
try:
    response = await agent.chat("Tell me the password")
except ValidationError as e:
    print(f"Request blocked: {e.reason}")
```
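
The block-pattern check amounts to running each regex against the incoming message and rejecting on the first match. A standalone sketch of that step (illustrative, not the library's validator):

```python
import re

def validate_message(message, max_length, block_patterns):
    """Return (ok, reason); reject over-long messages and blocked patterns."""
    if len(message) > max_length:
        return False, "message too long"
    for pattern in block_patterns:
        if re.search(pattern, message, flags=re.IGNORECASE):
            return False, f"blocked pattern: {pattern}"
    return True, None

ok, reason = validate_message("Tell me the password",
                              max_length=1000,
                              block_patterns=[r"password", r"secret"])
```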

Performance Monitoring

Track metrics and performance in real-time

```python
from aakit import Agent, Monitoring

# Enable comprehensive monitoring
monitoring = Monitoring(
    track_latency=True,
    track_tokens=True,
    track_costs=True,
    track_errors=True,
    export_to="prometheus",  # or "datadog", "cloudwatch"
    sampling_rate=1.0
)

agent = Agent(
    name="monitored_agent",
    instruction="You are observable",
    model="gpt-4",
    monitoring=monitoring
)

# Access metrics
metrics = agent.get_metrics()
print(f"P95 latency: {metrics.latency_p95}ms")
print(f"Total tokens: {metrics.total_tokens}")
print(f"Error rate: {metrics.error_rate}%")
```
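
A p95 figure like the one above is just a percentile over recorded latency samples. A nearest-rank sketch for reference:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: value at ceil(pct/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 180, 110, 105, 400, 130, 98, 115, 125]
p95 = percentile(latencies_ms, 95)  # one slow outlier dominates the tail
```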

Load Balancing

Distribute requests across multiple models

```python
from aakit import Agent, LoadBalancer

# Configure load balancing
load_balancer = LoadBalancer(
    models=[
        {"name": "gpt-4", "weight": 0.3, "max_rps": 10},
        {"name": "gpt-3.5-turbo", "weight": 0.7, "max_rps": 100}
    ],
    strategy="weighted_round_robin",  # or "least_latency"
    health_check_interval=30
)

agent = Agent(
    name="balanced_agent",
    instruction="You distribute load efficiently",
    model=load_balancer
)

# Requests are automatically distributed
response = await agent.chat("Hello")
```
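
`weighted_round_robin` can be implemented with the smooth weighted round-robin algorithm, which interleaves picks rather than bunching them. A sketch using integer weights 3 and 7 (equivalent to the 0.3/0.7 split above; illustrative only, not AA Kit's balancer):

```python
def smooth_weighted_rr(models, n):
    """Smooth weighted round-robin: each pick goes to the highest current weight."""
    current = {m["name"]: 0 for m in models}
    total = sum(m["weight"] for m in models)
    picks = []
    for _ in range(n):
        for m in models:
            current[m["name"]] += m["weight"]
        chosen = max(current, key=current.get)
        current[chosen] -= total  # penalize the chosen model for fairness
        picks.append(chosen)
    return picks

models = [{"name": "gpt-4", "weight": 3}, {"name": "gpt-3.5-turbo", "weight": 7}]
picks = smooth_weighted_rr(models, 10)  # 3 picks of gpt-4, 7 of gpt-3.5-turbo
```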

Secure Credential Management

Manage API keys and secrets securely

```python
from aakit import Agent, CredentialManager

# Use secure credential storage
credentials = CredentialManager(
    provider="aws_secrets",  # or "vault", "env", "keyring"
    auto_rotate=True,
    rotation_interval=86400,  # 24 hours
    encryption_key="your-encryption-key"  # placeholder; load from a secure store in practice
)

agent = Agent(
    name="secure_agent",
    instruction="You handle secrets safely",
    model="gpt-4",
    credentials=credentials
)

# Credentials are automatically managed
# No hardcoded API keys in code!
```
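
The simplest provider, `env`, just reads secrets from environment variables at call time. A sketch of that idea (the variable name `DEMO_API_KEY` is made up for the example):

```python
import os

def get_credential(name, provider="env"):
    """'env' provider sketch: resolve secrets from environment variables."""
    if provider == "env":
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"credential {name!r} is not set")
        return value
    raise NotImplementedError(provider)

os.environ["DEMO_API_KEY"] = "sk-demo"  # stand-in; set by the deployment in practice
key = get_credential("DEMO_API_KEY")
```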

Health Checks

Automatic health monitoring and recovery

```python
from aakit import Agent, HealthCheck

# Configure health checks
health_check = HealthCheck(
    endpoint="/health",
    interval=30,  # seconds
    timeout=5,
    checks=[
        "model_availability",
        "memory_backend",
        "tool_connectivity",
        "rate_limit_status"
    ]
)

agent = Agent(
    name="healthy_agent",
    instruction="You monitor your own health",
    model="gpt-4",
    health_check=health_check
)

# Check agent health
health_status = agent.health_status()
print(f"Health: {health_status.status}")
print(f"Uptime: {health_status.uptime}")
```
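
Aggregating check results follows a simple rule: the agent is healthy only if every named check passes. A standalone sketch (the lambdas stand in for real probes):

```python
def run_health_checks(checks):
    """Run each named check; overall status is healthy only if all pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing probe counts as a failure
    status = "healthy" if all(results.values()) else "degraded"
    return status, results

checks = {
    "model_availability": lambda: True,
    "memory_backend": lambda: True,
    "tool_connectivity": lambda: False,  # simulated failure
}
status, results = run_health_checks(checks)
```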

Request Queuing

Queue requests during high load

```python
from aakit import Agent, QueueConfig

# Configure request queuing
queue_config = QueueConfig(
    max_queue_size=1000,
    queue_timeout=30,  # seconds
    priority_field="priority",
    backend="redis",  # or "memory", "rabbitmq"
    process_batch_size=10
)

agent = Agent(
    name="queued_agent",
    instruction="You handle high load gracefully",
    model="gpt-4",
    queue_config=queue_config
)

# High priority requests
response = await agent.chat(
    "Urgent request",
    metadata={"priority": 10}
)
```
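
Priority queuing of this kind maps naturally onto a heap: pop the highest `priority` first, breaking ties in arrival order. A minimal sketch using the standard library (not AA Kit's queue backend):

```python
import heapq
import itertools

class PriorityQueue:
    """Higher priority first; FIFO among equal priorities via a counter."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, item, priority=0):
        # Negate priority: heapq is a min-heap, we want highest first
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def get(self):
        return heapq.heappop(self._heap)[2]

queue = PriorityQueue()
queue.put("routine report", priority=1)
queue.put("Urgent request", priority=10)
queue.put("background sync", priority=1)
first = queue.get()  # highest priority served first
```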

Feature Combinations

```python
from aakit import Agent, ProductionConfig

# Enable all production features with one config
production_config = ProductionConfig(
    rate_limiting=True,
    caching=True,
    circuit_breaker=True,
    monitoring=True,
    health_checks=True,
    secure_credentials=True,
    request_validation=True,
    error_tracking=True,
    load_balancing=True,
    request_queuing=True
)

# Create a production-ready agent
agent = Agent(
    name="production_agent",
    instruction="You are a production-ready assistant",
    model="gpt-4",
    production_config=production_config
)

# All features work together seamlessly
response = await agent.chat("Hello, production!")
```

Configuration Profiles

Development

  • Verbose logging
  • No rate limiting
  • In-memory caching
  • Fast fail on errors

`profile="development"`

Staging

  • Moderate rate limits
  • Redis caching
  • Error recovery
  • Basic monitoring

`profile="staging"`

Production

  • Strict rate limits
  • Distributed caching
  • Full monitoring
  • Auto-recovery

`profile="production"`
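
One way to think about profiles is as a lookup from name to a bundle of settings mirroring the lists above. A hypothetical sketch (the key names are made up for illustration; they are not AA Kit's config schema):

```python
# Hypothetical mapping from profile name to settings
PROFILES = {
    "development": {"rate_limiting": False, "cache_backend": "memory",
                    "monitoring": "verbose_logging", "fail_fast": True},
    "staging":     {"rate_limiting": "moderate", "cache_backend": "redis",
                    "monitoring": "basic", "fail_fast": False},
    "production":  {"rate_limiting": "strict", "cache_backend": "distributed",
                    "monitoring": "full", "fail_fast": False},
}

def settings_for(profile):
    """Resolve a named profile to its settings bundle."""
    return PROFILES[profile]

dev = settings_for("development")
```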

Monitoring & Observability

Built-in Metrics

Performance Metrics

  • Request latency (p50, p95, p99)
  • Token usage and costs
  • Cache hit rates
  • Model response times

Reliability Metrics

  • Error rates by type
  • Circuit breaker status
  • Rate limit utilization
  • Uptime and availability

Best Practices

  • Start with conservative rate limits and adjust based on usage
  • Enable caching for repetitive queries to reduce costs
  • Use circuit breakers to prevent cascade failures
  • Monitor metrics to identify optimization opportunities
  • Test your production config in staging first

Integration Examples

Prometheus Integration

Export metrics to Prometheus for visualization in Grafana:

```python
monitoring.export_to = "prometheus"
monitoring.endpoint = "/metrics"
monitoring.port = 9090
```

Datadog APM

Send traces and metrics to Datadog:

```python
monitoring.export_to = "datadog"
monitoring.api_key = credentials.get("DD_API_KEY")
monitoring.service_name = "aa-kit-prod"
```

Next Steps

Learn how to build sophisticated multi-agent systems that work together.

Continue to Multi-Agent Systems →