Incorporating Custom Metrics in Python Flask E-Commerce Apps with Prometheus & Grafana


Monitoring is essential in modern applications to ensure system health, detect issues, and optimize performance. I have integrated Prometheus metrics using Python and Flask to achieve robust observability in our E-Commerce Application. These metrics provide insights into system behavior, performance, and user activity, which are visualized using Grafana Dashboards.

Technology Stack

  • Prometheus: Metrics collection and storage
  • Grafana: Visualization and dashboards
  • Custom Python decorators: Metric instrumentation
  • Kubernetes: Deployment and service monitoring

Why These Metrics Matter for SREs and Business Success

For SREs (Site Reliability Engineers):
These metrics act as the application’s heartbeat, helping SREs ensure high availability, reliability, and performance.

  • API Metrics help detect slow responses, errors, and bottlenecks before users complain.
  • Database Metrics ensure queries run efficiently and prevent downtime due to overloaded connections.
  • User Metrics track logins and active sessions, helping identify potential security issues or unusual spikes.
  • Health Checks enable automated recovery: Kubernetes or monitoring tools can restart failing services instantly.

For Business Decision-Making:
Metrics aren’t just technical — they drive business success!
💰 Order Tracking helps analyze customer purchasing behavior and identify checkout issues.
🛒 Cart Insights show why users abandon carts, guiding UX improvements.
📊 Performance Data ensures a fast, smooth experience, reducing drop-off rates and boosting sales.
🚦 Live Monitoring allows proactive issue resolution, keeping customers happy and engaged.


Python Flask E-Commerce Application Deployment in EKS:

https://github.com/SubbuTechOps/python-app-docker-compose.git

Prometheus & Grafana Setup Guide for EKS:

https://github.com/SubbuTechOps/K8s/blob/3b5318aaba87f67c6f79f7f9375adfa7ef36697c/Prometheus%26Grafana_Setup.md


Metrics Implementation with Prometheus

I use middleware and decorators to capture and log these metrics automatically.

Key Metrics Implemented:

  • HTTP Request Metrics: Track API request count (http_requests_total) and latency (http_request_duration_seconds).
  • Business Metrics: Monitor orders (orders_total) and cart operations (cart_operations_total).
  • Database Metrics: Track active DB connections (db_connections_active) and query latency (db_query_duration_seconds).
  • User Metrics: Record active sessions (user_sessions_active) and login attempts (user_logins_total).
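
The database metrics named in this list are not defined in the code shown later in this article, so here is a minimal sketch of how they might be declared and updated with prometheus_client. The `track_query` helper and the `operation` label are hypothetical, and counting each in-flight query as an active connection is a simplifying assumption.

```python
import time
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Gauge, Histogram

# Stand-in for the application's global registry so this sketch runs on its own.
REGISTRY = CollectorRegistry()

DB_CONNECTIONS_ACTIVE = Gauge(
    'db_connections_active',
    'Number of active database connections',
    registry=REGISTRY,
)

DB_QUERY_LATENCY = Histogram(
    'db_query_duration_seconds',
    'Database query latency in seconds',
    ['operation'],  # hypothetical label, e.g. select, insert
    registry=REGISTRY,
)


@contextmanager
def track_query(operation):
    """Hypothetical helper: time a query and treat it as one active connection."""
    DB_CONNECTIONS_ACTIVE.inc()
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the query raised.
        DB_QUERY_LATENCY.labels(operation=operation).observe(
            time.perf_counter() - start
        )
        DB_CONNECTIONS_ACTIVE.dec()
```

Wrapping a query in `with track_query('select'):` would then update both metrics without touching the query code itself.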

Health Check Endpoints:

  • /api/health/live: Liveness probe (ensures the service is running).
  • /api/health/ready: Readiness probe (checks DB and system health).
  • /api/metrics: Prometheus scrape endpoint for collecting real-time data.
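
A minimal sketch of how these three endpoints might look in Flask. The `/api/metrics` path follows the curl commands used in this article; `database_is_reachable` is a hypothetical placeholder, and the JSON payloads are illustrative rather than the application's actual responses.

```python
from flask import Flask, Response, jsonify
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    Counter,
    generate_latest,
)

app = Flask(__name__)

# Stand-in for the application's global registry and one of its metrics.
REGISTRY = CollectorRegistry()
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status'],
    registry=REGISTRY,
)


def database_is_reachable():
    """Hypothetical placeholder; a real check might run `SELECT 1`."""
    return True


@app.route('/api/health/live')
def live():
    # Liveness: the process is up and able to answer requests.
    return jsonify(status='alive'), 200


@app.route('/api/health/ready')
def ready():
    # Readiness: report 503 until dependencies such as the DB respond.
    if database_is_reachable():
        return jsonify(status='ready'), 200
    return jsonify(status='not ready'), 503


@app.route('/api/metrics')
def metrics():
    # Serialize every metric in the registry in the Prometheus text format.
    return Response(generate_latest(REGISTRY), content_type=CONTENT_TYPE_LATEST)
```

Returning 503 from the readiness route lets Kubernetes stop routing traffic to the pod without restarting it, which is the usual split between the two probes.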

Here’s how to check these endpoints from inside and outside the cluster:

  1. Using kubectl port-forward:
# Port forward the service
kubectl port-forward svc/ecommerce-backend -n shopeasy-dev 5000:80

# In another terminal, test endpoints
curl http://localhost:5000/api/health/live
curl http://localhost:5000/api/health/ready
curl http://localhost:5000/api/metrics

2. From inside the pod:

# Get into the pod
kubectl exec -it $(kubectl get pod -l app=ecommerce,tier=backend -n shopeasy-dev -o jsonpath='{.items[0].metadata.name}') -n shopeasy-dev -- bash

# Check metrics internally
curl http://localhost:5000/api/health/live
curl http://localhost:5000/api/health/ready
curl http://localhost:5000/api/metrics

3. Using LoadBalancer URL:

# Get your LoadBalancer URL
export LB_URL=$(kubectl get svc ecommerce-backend -n shopeasy-dev -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Test endpoints
curl http://$LB_URL/api/health/live
curl http://$LB_URL/api/health/ready
curl http://$LB_URL/api/metrics

4. Check using service DNS:

# From another pod in the same namespace
curl http://ecommerce-backend.shopeasy-dev.svc.cluster.local/api/health/live
curl http://ecommerce-backend.shopeasy-dev.svc.cluster.local/api/health/ready
curl http://ecommerce-backend.shopeasy-dev.svc.cluster.local/api/metrics
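
Once one of the curl commands above returns the metrics text, a short script can confirm that the expected metric names are present. This is a minimal sketch using the text parser that ships with prometheus_client; the sample payload is made up for illustration, and `missing_metrics` is a hypothetical helper.

```python
from prometheus_client.parser import text_string_to_metric_families

# Made-up sample of what the metrics endpoint might return.
SAMPLE = '''# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/products",status="200"} 3.0
# HELP user_sessions_active Number of active user sessions
# TYPE user_sessions_active gauge
user_sessions_active 2.0
'''


def missing_metrics(metrics_text, expected):
    """Return the expected sample names that do not appear in the scraped text."""
    seen = {
        sample.name
        for family in text_string_to_metric_families(metrics_text)
        for sample in family.samples
    }
    return sorted(set(expected) - seen)


print(missing_metrics(
    SAMPLE,
    ['http_requests_total', 'user_sessions_active', 'orders_total'],
))
```

In practice the text would come from the curl output or an HTTP client rather than a hard-coded string.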

Custom Metrics

1.1 HTTP Request Metrics

Tracking HTTP requests helps in understanding traffic patterns, API latency, and error rates.

  • Total HTTP Requests (http_requests_total): Captures the number of incoming API requests categorized by method, endpoint, and status code.
  • Request Latency (http_request_duration_seconds): Records the duration of API responses for performance analysis.

from prometheus_client import Counter, Histogram, Gauge, CollectorRegistry
from functools import wraps
from flask import request
import time

# Create a global registry
REGISTRY = CollectorRegistry()

# HTTP Request Metrics
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status'],
    registry=REGISTRY
)

REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency in seconds',
    ['method', 'endpoint'],
    registry=REGISTRY
)
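
The article mentions decorators for instrumentation but does not show one, so here is a minimal sketch of how a view decorator might feed these two metrics. The `track_request` name and the `/api/products` route are hypothetical, and labelling by route template rather than raw path is my own assumption.

```python
import time
from functools import wraps

from flask import Flask, make_response, request
from prometheus_client import CollectorRegistry, Counter, Histogram

# Stand-ins for the metrics defined above so the sketch runs on its own.
REGISTRY = CollectorRegistry()
REQUEST_COUNT = Counter(
    'http_requests_total', 'Total HTTP requests',
    ['method', 'endpoint', 'status'], registry=REGISTRY,
)
REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds', 'HTTP request latency in seconds',
    ['method', 'endpoint'], registry=REGISTRY,
)


def track_request(view):
    """Hypothetical decorator: count and time each call to a Flask view."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = 500  # recorded if the view raises before returning
        try:
            response = make_response(view(*args, **kwargs))
            status = response.status_code
            return response
        finally:
            # Label by route template, not raw path, to keep cardinality low.
            endpoint = request.url_rule.rule if request.url_rule else 'unknown'
            REQUEST_COUNT.labels(
                method=request.method, endpoint=endpoint, status=str(status)
            ).inc()
            REQUEST_LATENCY.labels(
                method=request.method, endpoint=endpoint
            ).observe(time.perf_counter() - start)
    return wrapper


app = Flask(__name__)


@app.route('/api/products')  # hypothetical route for illustration
@track_request
def list_products():
    return {'products': []}
```

The same logic could live in `before_request` and `after_request` hooks instead, which would cover every route without decorating each view.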

1.2 Business-Specific Metrics

Monitoring business-specific transactions provides insights into customer interactions.

  • Total Orders (orders_total): Tracks the number of successful and failed orders.
  • Cart Operations (cart_operations_total): Records actions like adding, removing, and checking out items from the cart.

# Business Metrics
ORDER_COUNT = Counter(
    'orders_total',
    'Total orders placed',
    ['status'],  # success, failed
    registry=REGISTRY
)

CART_OPERATIONS = Counter(
    'cart_operations_total',
    'Cart operations count',
    ['operation'],  # add, remove, checkout
    registry=REGISTRY
)
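
To show where these counters would be incremented, here is a minimal sketch of an order flow. The `place_order` function and its `process_payment` argument are hypothetical; the real application's checkout code may be structured quite differently.

```python
from prometheus_client import CollectorRegistry, Counter

# Stand-ins for the metrics defined above so the sketch runs on its own.
REGISTRY = CollectorRegistry()
ORDER_COUNT = Counter(
    'orders_total', 'Total orders placed', ['status'], registry=REGISTRY
)
CART_OPERATIONS = Counter(
    'cart_operations_total', 'Cart operations count', ['operation'],
    registry=REGISTRY,
)


def place_order(process_payment):
    """Hypothetical order flow; `process_payment` is a callable that raises on failure."""
    CART_OPERATIONS.labels(operation='checkout').inc()
    try:
        process_payment()
    except Exception:
        # Count the failure, then let the caller handle the error as usual.
        ORDER_COUNT.labels(status='failed').inc()
        raise
    ORDER_COUNT.labels(status='success').inc()
```

Keeping the label values to a small fixed set (success, failed) matters here, since every distinct label value creates a separate time series in Prometheus.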

1.3 User Metrics

Understanding user behavior allows us to track login trends and active sessions.

  • Active User Sessions (user_sessions_active): Tracks the number of currently active users.
  • User Login Attempts (user_logins_total): Records successful and failed logins.

# User Metrics
USER_SESSION_COUNT = Gauge(
    'user_sessions_active',
    'Number of active user sessions',
    registry=REGISTRY
)

USER_LOGIN_COUNT = Counter(
    'user_logins_total',
    'Total number of user logins',
    ['status'],  # success, failed
    registry=REGISTRY
)
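
A minimal sketch of how login and logout handlers might update these two metrics. `record_login` and `record_logout` are hypothetical hooks, and tying the session gauge to login and logout events is an assumption about how sessions are managed.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge

# Stand-ins for the metrics defined above so the sketch runs on its own.
REGISTRY = CollectorRegistry()
USER_SESSION_COUNT = Gauge(
    'user_sessions_active', 'Number of active user sessions', registry=REGISTRY
)
USER_LOGIN_COUNT = Counter(
    'user_logins_total', 'Total number of user logins', ['status'],
    registry=REGISTRY,
)


def record_login(credentials_valid):
    """Hypothetical hook called by the login view after checking credentials."""
    if credentials_valid:
        USER_LOGIN_COUNT.labels(status='success').inc()
        USER_SESSION_COUNT.inc()  # a new session is now active
    else:
        USER_LOGIN_COUNT.labels(status='failed').inc()


def record_logout():
    """Hypothetical hook called on logout or session expiry."""
    USER_SESSION_COUNT.dec()
```

A Gauge fits active sessions because the value goes down as well as up, while login attempts only ever accumulate and so belong in a Counter.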

Grafana Dashboards for Insights:

📊 API Performance: Request trends, error rates, response latency.
📊 Business Monitoring: Order success/failure rates, cart activities.
📊 Database Performance: Query response time, active DB connections.
📊 System Health: CPU, memory usage, uptime monitoring.

This observability setup ensures scalability, reliability, and proactive issue detection in our E-Commerce platform. 🚀

Grafana Dashboard Setup

  1. Access Grafana:
kubectl get svc -n monitoring grafana-external

2. Log in to the Grafana UI (http://YOUR-GRAFANA-URL:3000)

  • Username: admin
  • Password: (from kubectl get secret)

3. Add Main Dashboard:

  • Click ‘+ Create’ > ‘Import’
  • Click ‘Create New Dashboard’
  • Add following panels:

1. API Performance Panel:

{
  "title": "API Response Times",
  "type": "graph",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])",
      "legendFormat": "{{endpoint}}"
    }
  ]
}
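
The expression in this panel divides the growth of the histogram's `_sum` series by the growth of its `_count` series, which gives the mean latency per request over the window. A small worked sketch of that arithmetic, using made-up scrape values:

```python
def average_latency(sum_start, sum_end, count_start, count_end):
    """Mean seconds per request between two scrapes of _sum and _count."""
    requests = count_end - count_start
    if requests == 0:
        # No traffic in the window; the PromQL division yields NaN here.
        return None
    return (sum_end - sum_start) / requests


# Made-up values: 30 seconds of total latency spread over 100 requests.
print(average_latency(120.0, 150.0, 1000, 1100))
```

Because this is a mean, a few very slow requests can hide behind many fast ones, so it is worth reading alongside the per-endpoint breakdown.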

First, create a new panel in your dashboard:

  • Click “Add panel” (+ icon)
  • Select “Add a new panel”

In the query tab:

  • Select “Prometheus” as data source
  • In the Metrics browser, enter this PromQL query:
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

Configure Panel Settings (right side):

  • Visualization: select “Time series”
  • Under “Panel options”:
  • Title: “API Response Times”
  • Description: “Average API response time in seconds, by endpoint”

Configure Graph styles:

  • Under “All” tab:
  • Unit: “seconds (s)”
  • Min: 0
  • Decimals: 2

Under “Graph” tab:

  • Fill opacity: 10
  • Line width: 2
  • Show points: When threshold is crossed

Add Legend:

  • Legend mode: Table
  • Legend placement: Bottom
  • Legend values:
  • Check “Min”
  • Check “Max”
  • Check “Avg”

Add Thresholds:

  • Click “Add threshold”
  • Add warning at 0.5 seconds (yellow)
  • Add critical at 1 second (red)

Save:

  • Click “Apply” in the top-right corner
  • Then “Save” the dashboard

Prometheus Query:

2. CPU usage Panel:

# Per pod CPU usage
sum by(pod) (
rate(container_cpu_usage_seconds_total{namespace="shopeasy-dev", container!=""}[5m])
) * 100

This query will:

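  • Show CPU usage as a percentage of one CPU core (cores used, multiplied by 100)
  • Break it down by pod
  • Exclude series with an empty container label (the pod-level totals), which would otherwise double-count usage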
  • Focus on your shopeasy-dev namespace
  • Use a 5-minute rate to smooth out spikes
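
The percentage can be read as follows: `rate()` over a CPU-seconds counter gives CPU seconds consumed per second of wall-clock time, which is the number of cores in use. A small worked sketch with made-up counter values:

```python
def cpu_percent_of_core(cpu_seconds_start, cpu_seconds_end, window_seconds):
    """CPU usage over a window, expressed as a percentage of one core."""
    cores_used = (cpu_seconds_end - cpu_seconds_start) / window_seconds
    return cores_used * 100


# Made-up values: 75 CPU-seconds consumed during a 300-second (5m) window.
print(cpu_percent_of_core(1000.0, 1075.0, 300))
```

A pod using more than one core will show a value above 100, so the number is best compared against the pod's CPU limit rather than read as a share of the node.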

Prometheus Query:

Pod’s CPU limits:

sum by(pod) (
kube_pod_container_resource_limits{namespace="shopeasy-dev", resource="cpu"}
) * 100

3. HTTP Request Count:

# Total requests by status code
sum(rate(http_requests_total[5m])) by (status)

# Error rate percentage
sum(rate(http_requests_total{status=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) * 100

# Requests by endpoint
sum(rate(http_requests_total[5m])) by (endpoint)

[Screenshots: Prometheus query, Prometheus graph, and Grafana dashboard for HTTP request count]

HTTP Request Duration (Latency):

[Screenshots: Prometheus graph and Grafana dashboard for HTTP request duration]

Business Metrics:

  1. Order Metrics:

Examples:

# Order rate (orders per minute; rate() returns a per-second value, so multiply by 60)
sum(rate(orders_total[5m])) * 60

# Success rate of orders
sum(rate(orders_total{status="success"}[5m])) / sum(rate(orders_total[5m])) * 100

# Failed orders rate
sum(rate(orders_total{status="failed"}[5m])) / sum(rate(orders_total[5m])) * 100

# Total orders in last 24 hours
sum(increase(orders_total[24h]))

2. Cart Operations:

# Cart operations by type
sum(rate(cart_operations_total[5m])) by (operation)

# Cart abandonment rate (checkout vs add)
1 - (sum(rate(cart_operations_total{operation="checkout"}[1h])) / sum(rate(cart_operations_total{operation="add"}[1h])))

# Cart operations trend
sum(increase(cart_operations_total[1h])) by (operation)
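
The abandonment query above compares checkouts to adds over the same window. A small worked sketch of that ratio with made-up counts; note that it treats every add as a separate cart, which is a simplification that the metric itself shares.

```python
def cart_abandonment_rate(adds, checkouts):
    """Fraction of add operations not followed by a checkout in the window."""
    if adds == 0:
        return None  # nothing was added, so the ratio is undefined
    return 1 - checkouts / adds


# Made-up values: 200 add operations and 50 checkouts in the last hour.
print(cart_abandonment_rate(200, 50))
```

Since one cart usually receives several adds before a single checkout, the result tends to overstate abandonment and is more useful as a trend than as an absolute figure.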

User Metrics:

1. Session Metrics:

# Current active sessions
user_sessions_active

# Session trend
delta(user_sessions_active[1h])

# Peak sessions in last day
max_over_time(user_sessions_active[24h])

2. Login Metrics:

# Login success rate
sum(rate(user_logins_total{status="success"}[5m])) / sum(rate(user_logins_total[5m])) * 100

# Failed login attempts
sum(rate(user_logins_total{status="failed"}[5m]))

# Total logins per day
sum(increase(user_logins_total[24h]))

Conclusion

Integrating custom metrics with Prometheus and Grafana in a Python Flask E-Commerce application provides real-time insights into user activity, API performance, and system health. By tracking user sessions, logins, orders, cart operations, and database performance, we ensure better reliability, scalability, and business growth.

With Grafana dashboards, teams can easily visualize trends, detect issues early, and improve user experience. This monitoring setup helps both SREs and business teams make data-driven decisions, ensuring a smooth, high-performance application for customers. 🚀

