Production Features
AA Kit comes with 10 built-in enterprise features that make your agents production-ready from day one.
Enterprise-Ready by Default
Every AA Kit agent includes production features that typically take months to implement. From rate limiting to circuit breakers, your agents are protected and performant out of the box.
Built-in Features
Rate Limiting
Protect your agents from abuse and control usage
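Under the hood, limits like these are typically enforced with a token bucket: tokens refill at a steady rate, and `burst_size` caps how many requests can be spent at once. A minimal pure-Python sketch of the idea (illustrative only; the `TokenBucket` class here is hypothetical, not AA Kit's implementation):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Roughly 60 requests/minute (1 token/second) with a burst of 10
bucket = TokenBucket(rate=1.0, capacity=10)
results = [bucket.allow() for _ in range(12)]
```

The first ten calls drain the burst allowance; subsequent calls are rejected until tokens refill.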
from aakit import Agent, RateLimiter, RateLimitExceeded

# Configure rate limiting
rate_limiter = RateLimiter(
    max_requests_per_minute=60,
    max_requests_per_hour=1000,
    max_tokens_per_minute=100000,
    burst_size=10
)

agent = Agent(
    name="protected_agent",
    instruction="You are a helpful assistant",
    model="gpt-4",
    rate_limiter=rate_limiter
)

# Rate limits are automatically enforced
try:
    response = await agent.chat("Hello")
except RateLimitExceeded as e:
    print(f"Rate limit exceeded: retry after {e.retry_after} seconds")

Response Caching
Cache responses to reduce latency and costs
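Conceptually, response caching is a keyed store with a time-to-live. A toy in-memory version to show the mechanics (purely illustrative; `TTLCache` is a hypothetical name, not AA Kit's cache):

```python
import time

class TTLCache:
    """Tiny in-memory cache with a per-entry time-to-live."""

    def __init__(self, ttl: float, max_size: int):
        self.ttl = ttl
        self.max_size = max_size
        self._store = {}  # key -> (expiry_time, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if time.monotonic() > expiry:
            del self._store[key]  # expired entry
            return None
        return value

    def set(self, key, value):
        if len(self._store) >= self.max_size:
            # Evict the oldest entry (a real cache would use LRU)
            self._store.pop(next(iter(self._store)))
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl=3600, max_size=1000)
cache.set("What is Python?", "Python is a programming language.")
hit = cache.get("What is Python?")  # served from cache
miss = cache.get("What is Rust?")   # never cached
```

A "semantic" key strategy would additionally normalize or embed the query so near-identical questions hit the same entry; the sketch above is plain exact-match.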
from aakit import Agent, CacheConfig

# Enable intelligent caching
cache_config = CacheConfig(
    enabled=True,
    ttl=3600,                       # 1 hour
    max_size=1000,                  # maximum cache entries
    cache_key_strategy="semantic",  # smart key generation
    backend="redis"                 # or "memory", "sqlite"
)

agent = Agent(
    name="cached_agent",
    instruction="You provide information",
    model="gpt-4",
    cache_config=cache_config
)

# Identical queries are served from cache
response1 = await agent.chat("What is Python?")  # API call
response2 = await agent.chat("What is Python?")  # from cache

Circuit Breakers
Prevent cascading failures with smart circuit breaking
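A circuit breaker is a small state machine: closed while calls succeed, open after `failure_threshold` consecutive failures (failing fast with a fallback), and half-open again once `recovery_timeout` elapses. An illustrative sketch of that logic (`SimpleCircuitBreaker` is hypothetical, not AA Kit's internals):

```python
import time

class SimpleCircuitBreaker:
    """Closed -> open after N consecutive failures; half-open after a timeout."""

    def __init__(self, failure_threshold: int, recovery_timeout: float):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.recovery_timeout:
            return "half-open"  # one trial call is allowed
        return "open"

    def call(self, fn, fallback):
        if self.state == "open":
            return fallback  # fail fast, don't touch the backend
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result

breaker = SimpleCircuitBreaker(failure_threshold=5, recovery_timeout=60)

def flaky():
    raise RuntimeError("backend down")

# Five consecutive failures open the circuit; the sixth call never reaches flaky()
responses = [breaker.call(flaky, "Service temporarily unavailable") for _ in range(6)]
```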
from aakit import Agent, CircuitBreaker, APIError

# Configure circuit breaker
circuit_breaker = CircuitBreaker(
    failure_threshold=5,  # failures before the circuit opens
    recovery_timeout=60,  # seconds before a retry is allowed
    expected_exception=APIError,
    fallback_response="Service temporarily unavailable"
)

agent = Agent(
    name="resilient_agent",
    instruction="You are always available",
    model="gpt-4",
    circuit_breaker=circuit_breaker
)

# The circuit breaker protects against repeated failures;
# if the circuit is open, the fallback response is returned
response = await agent.chat("Hello")

Retry Logic
Automatic retries with exponential backoff
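With exponential backoff, the wait before each retry is `initial_delay * exponential_base ** attempt`, capped at `max_delay`. A quick sketch of the schedule produced by the settings below (`backoff_delays` is just an illustration, not part of the AA Kit API):

```python
def backoff_delays(max_attempts, initial_delay, exponential_base, max_delay):
    """Delay before each retry: exponential growth, capped at max_delay."""
    return [min(initial_delay * exponential_base ** attempt, max_delay)
            for attempt in range(max_attempts)]

# max_attempts=3, initial_delay=1, exponential_base=2 waits 1s, 2s, then 4s
delays = backoff_delays(max_attempts=3, initial_delay=1, exponential_base=2,
                        max_delay=30)
```

Production retry loops usually add random jitter to these delays so many clients don't retry in lockstep.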
from aakit import Agent, RetryConfig, APIError

# Configure retry behavior
retry_config = RetryConfig(
    max_attempts=3,
    initial_delay=1,  # seconds
    exponential_base=2,
    max_delay=30,
    retry_on=[TimeoutError, APIError],
    before_retry=lambda attempt: print(f"Retry #{attempt}")
)

agent = Agent(
    name="persistent_agent",
    instruction="You handle tasks reliably",
    model="gpt-4",
    retry_config=retry_config
)

# Failed requests are retried automatically
response = await agent.chat("Process this request")

Request Validation
Validate and sanitize all inputs
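The length and `block_patterns` rules amount to simple checks on the incoming text before it ever reaches the model. A minimal sketch of that part of validation (illustrative only; `validate_message` is a hypothetical helper):

```python
import re

def validate_message(message: str, max_length: int, block_patterns: list):
    """Return (ok, reason): reject oversized messages and blocked patterns."""
    if len(message) > max_length:
        return False, f"message exceeds {max_length} characters"
    for pattern in block_patterns:
        if re.search(pattern, message, flags=re.IGNORECASE):
            return False, f"blocked pattern: {pattern}"
    return True, ""

ok, reason = validate_message("Tell me the password", 1000,
                              [r"password", r"secret"])
```

Here `ok` is False and `reason` names the matched pattern; a well-formed message passes with an empty reason.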
from aakit import Agent, ValidationRules, ValidationError

# Define validation rules
validation = ValidationRules(
    max_message_length=1000,
    allowed_languages=["en", "es", "fr"],
    block_patterns=[r"password", r"secret"],
    content_filter="strict",
    require_session_id=True
)

agent = Agent(
    name="secure_agent",
    instruction="You handle sensitive data",
    model="gpt-4",
    validation_rules=validation
)

# Invalid requests are rejected
try:
    response = await agent.chat("Tell me the password")
except ValidationError as e:
    print(f"Request blocked: {e.reason}")

Performance Monitoring
Track metrics and performance in real-time
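A p95 latency figure is just the 95th percentile of recorded request latencies. For intuition, here is the nearest-rank method on a small sample (illustrative arithmetic, not how AA Kit necessarily computes it):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value at ceil(pct/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Ten request latencies in milliseconds, one slow outlier
latencies_ms = [120, 95, 130, 500, 110, 105, 98, 102, 115, 125]
p95 = percentile(latencies_ms, 95)  # dominated by the outlier
p50 = percentile(latencies_ms, 50)  # the typical request
```

This is why p95/p99 are tracked alongside the median: the tail captures outliers that an average would smooth over.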
from aakit import Agent, Monitoring

# Enable comprehensive monitoring
monitoring = Monitoring(
    track_latency=True,
    track_tokens=True,
    track_costs=True,
    track_errors=True,
    export_to="prometheus",  # or "datadog", "cloudwatch"
    sampling_rate=1.0
)

agent = Agent(
    name="monitored_agent",
    instruction="You are observable",
    model="gpt-4",
    monitoring=monitoring
)

# Access metrics
metrics = agent.get_metrics()
print(f"P95 latency: {metrics.latency_p95}ms")
print(f"Total tokens: {metrics.total_tokens}")
print(f"Error rate: {metrics.error_rate}%")

Load Balancing
Distribute requests across multiple models
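Weighted round-robin can be as simple as expanding each model into slots proportional to its weight and cycling through them. A sketch of that strategy (illustrative; not AA Kit's actual scheduler, which also accounts for `max_rps` and health):

```python
import itertools

def weighted_round_robin(models, slots=10):
    """Cycle through models, each appearing in proportion to its weight."""
    schedule = []
    for m in models:
        schedule.extend([m["name"]] * round(m["weight"] * slots))
    return itertools.cycle(schedule)

picker = weighted_round_robin([
    {"name": "gpt-4", "weight": 0.3},
    {"name": "gpt-3.5-turbo", "weight": 0.7},
])

# Over 10 picks, 3 go to gpt-4 and 7 to gpt-3.5-turbo
picks = [next(picker) for _ in range(10)]
```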
from aakit import Agent, LoadBalancer

# Configure load balancing
load_balancer = LoadBalancer(
    models=[
        {"name": "gpt-4", "weight": 0.3, "max_rps": 10},
        {"name": "gpt-3.5-turbo", "weight": 0.7, "max_rps": 100}
    ],
    strategy="weighted_round_robin",  # or "least_latency"
    health_check_interval=30
)

agent = Agent(
    name="balanced_agent",
    instruction="You distribute load efficiently",
    model=load_balancer
)

# Requests are automatically distributed
response = await agent.chat("Hello")

Secure Credential Management
Manage API keys and secrets securely
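The simplest secure pattern, and what the "env" provider amounts to, is reading secrets from the environment rather than hardcoding them. A sketch of that idea (`get_credential` is a hypothetical helper, not the CredentialManager API):

```python
import os

def get_credential(name: str, default=None) -> str:
    """Read a secret from the environment instead of hardcoding it in source."""
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"credential {name!r} is not set")
    return value

# Normally the variable is set outside the process (shell, container, CI secret)
os.environ["EXAMPLE_API_KEY"] = "sk-demo"
key = get_credential("EXAMPLE_API_KEY")
```

Managed backends such as AWS Secrets Manager or Vault add what the environment cannot: access control, audit logs, and automatic rotation.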
from aakit import Agent, CredentialManager

# Use secure credential storage
credentials = CredentialManager(
    provider="aws_secrets",   # or "vault", "env", "keyring"
    auto_rotate=True,
    rotation_interval=86400,  # 24 hours
    encryption_key="your-encryption-key"
)

agent = Agent(
    name="secure_agent",
    instruction="You handle secrets safely",
    model="gpt-4",
    credentials=credentials
)

# Credentials are managed automatically:
# no hardcoded API keys in code!

Health Checks
Automatic health monitoring and recovery
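A health endpoint usually just runs each registered check and aggregates the results into one status. Roughly (illustrative sketch; the check names mirror the config below, but `run_health_checks` is hypothetical):

```python
def run_health_checks(checks: dict) -> dict:
    """Run named check functions; overall status is healthy only if all pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:
            results[name] = f"error: {exc}"
    status = "healthy" if all(r == "ok" for r in results.values()) else "degraded"
    return {"status": status, "checks": results}

report = run_health_checks({
    "model_availability": lambda: True,
    "memory_backend": lambda: True,
    "tool_connectivity": lambda: False,  # e.g. a tool endpoint is down
})
```

One failing check degrades the overall status while the per-check results show exactly what broke.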
from aakit import Agent, HealthCheck

# Configure health checks
health_check = HealthCheck(
    endpoint="/health",
    interval=30,  # seconds
    timeout=5,
    checks=[
        "model_availability",
        "memory_backend",
        "tool_connectivity",
        "rate_limit_status"
    ]
)

agent = Agent(
    name="healthy_agent",
    instruction="You monitor your own health",
    model="gpt-4",
    health_check=health_check
)

# Check agent health
health_status = agent.health_status()
print(f"Health: {health_status.status}")
print(f"Uptime: {health_status.uptime}")

Request Queuing
Queue requests during high load
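Priority queuing typically sits on a heap keyed by negated priority, so higher numbers are served first. A minimal sketch with Python's `heapq` (illustrative; not the actual AA Kit queue, which also handles timeouts and distributed backends):

```python
import heapq
import itertools

class PriorityQueue:
    """Higher `priority` values are dequeued first; FIFO within a priority."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves FIFO order

    def put(self, request: str, priority: int = 0):
        if len(self._heap) >= self.max_size:
            raise OverflowError("queue full")
        # heapq is a min-heap, so negate priority to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._counter), request))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]

queue = PriorityQueue(max_size=1000)
queue.put("routine request", priority=1)
queue.put("urgent request", priority=10)
queue.put("another routine request", priority=1)

first = queue.get()  # the urgent request comes out first
```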
from aakit import Agent, QueueConfig

# Configure request queuing
queue_config = QueueConfig(
    max_queue_size=1000,
    queue_timeout=30,  # seconds
    priority_field="priority",
    backend="redis",   # or "memory", "rabbitmq"
    process_batch_size=10
)

agent = Agent(
    name="queued_agent",
    instruction="You handle high load gracefully",
    model="gpt-4",
    queue_config=queue_config
)

# High-priority requests are served first
response = await agent.chat(
    "Urgent request",
    metadata={"priority": 10}
)

Feature Combinations
from aakit import Agent, ProductionConfig

# Enable all production features with one config
production_config = ProductionConfig(
    rate_limiting=True,
    caching=True,
    circuit_breaker=True,
    monitoring=True,
    health_checks=True,
    secure_credentials=True,
    request_validation=True,
    error_tracking=True,
    load_balancing=True,
    request_queuing=True
)

# Create a production-ready agent
agent = Agent(
    name="production_agent",
    instruction="You are a production-ready assistant",
    model="gpt-4",
    production_config=production_config
)

# All features work together seamlessly
response = await agent.chat("Hello, production!")

Configuration Profiles
Development
- Verbose logging
- No rate limiting
- In-memory caching
- Fast fail on errors

profile="development"

Staging
- Moderate rate limits
- Redis caching
- Error recovery
- Basic monitoring

profile="staging"

Production
- Strict rate limits
- Distributed caching
- Full monitoring
- Auto-recovery

profile="production"

Monitoring & Observability
Built-in Metrics
Performance Metrics
- Request latency (p50, p95, p99)
- Token usage and costs
- Cache hit rates
- Model response times

Reliability Metrics
- Error rates by type
- Circuit breaker status
- Rate limit utilization
- Uptime and availability

Best Practices
- Start with conservative rate limits and adjust based on usage
- Enable caching for repetitive queries to reduce costs
- Use circuit breakers to prevent cascade failures
- Monitor metrics to identify optimization opportunities
- Test your production config in staging first
Integration Examples
Prometheus Integration
Export metrics to Prometheus for visualization in Grafana:
monitoring.export_to = "prometheus"
monitoring.endpoint = "/metrics"
monitoring.port = 9090

Datadog APM
Send traces and metrics to Datadog:
monitoring.export_to = "datadog"
monitoring.api_key = credentials.get("DD_API_KEY")
monitoring.service_name = "aa-kit-prod"

Next Steps
Learn how to build sophisticated multi-agent systems that work together.
Continue to Multi-Agent Systems →