Monitoring/Alerts Recommendations

This guide outlines the key system metrics to monitor continuously and the alerts to create from them.

Log Monitoring: Trigger an alert for every ERR entry in the Memphis service logs. Filtering on the ERR statement is essential. For instance:

[ERR] Error trying to connect to route (attempt 1)
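
A minimal sketch of such a filter, assuming log lines carry the level as a bracketed [ERR] tag as in the sample above. The function name find_err_lines and the non-error sample line are hypothetical and not part of Memphis:

```python
import re
from typing import List

# Matches the bracketed level tag seen in the sample log line above.
ERR_TAG = re.compile(r"\[ERR\]")


def find_err_lines(log_text: str) -> List[str]:
    """Return every log line that contains an [ERR] tag."""
    return [line for line in log_text.splitlines() if ERR_TAG.search(line)]


sample = "\n".join([
    "[INF] Server is ready",  # hypothetical non-error line, for contrast only
    "[ERR] Error trying to connect to route (attempt 1)",
])

for line in find_err_lines(sample):
    print("ALERT:", line)
```

In practice the same match would usually live in the log pipeline's own alert rule rather than in application code.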

CPU: Under regular usage, Memphis should not consume an excessive share of its allocated CPU. Memphis recommends alert thresholds at 50%, 75%, and 90% of CPU consumption. The monitored metric is <memphis_varz_cpu>.
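
The three CPU thresholds can be read as escalating alert levels. The sketch below assumes the metric value is a percentage and that a threshold fires when the value reaches it. The severity labels and the function name cpu_severity are illustrative assumptions, not Memphis terminology:

```python
from typing import Optional

# Thresholds from the text, highest first; the labels are assumptions.
CPU_THRESHOLDS = [(90.0, "critical"), (75.0, "high"), (50.0, "warning")]


def cpu_severity(cpu_percent: float) -> Optional[str]:
    """Map a CPU percentage to the highest threshold it has reached."""
    for limit, label in CPU_THRESHOLDS:
        if cpu_percent >= limit:
            return label
    return None  # below the first threshold: no alert


for value in (42.0, 63.5, 80.0, 97.2):
    print(value, cpu_severity(value))
```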

Memory: Memory consumption depends on the chosen storage type. If memory is the selected storage type, monitoring with appropriate thresholds is crucial: the memory assigned is not released until retention takes effect. The monitored metric is <memphis_varz_mem>.

Storage: To prevent pod crashes, Memphis restricts disk usage to a maximum of 95%. Configuring alert thresholds is recommended to avoid unexpected disruptions when storage reaches capacity. The Memphis metric to monitor can be calculated with the formula:

sum(memphis_varz_jetstream_stats_storage{pod=~"$server",namespace=~"$namespace"}) / sum(memphis_varz_jetstream_config_max_storage{pod=~"$server",namespace=~"$namespace"})

Alternatively, the default Kubernetes PVC metrics can be used.
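
To make the ratio formula concrete, the sketch below reproduces the same arithmetic: total storage used across pods divided by total configured maximum storage. The per-pod numbers are made-up sample values and storage_ratio is a hypothetical helper name:

```python
from typing import Sequence

DISK_LIMIT = 0.95  # Memphis caps disk usage at 95%, per the text


def storage_ratio(used_bytes: Sequence[float], max_bytes: Sequence[float]) -> float:
    """Sum of used storage over sum of configured max storage, across pods."""
    total_max = sum(max_bytes)
    if total_max <= 0:
        raise ValueError("configured max storage must be positive")
    return sum(used_bytes) / total_max


# Hypothetical three-pod cluster, values in GiB.
ratio = storage_ratio([8, 9, 7], [10, 10, 10])
print(f"storage used: {ratio:.0%} (limit {DISK_LIMIT:.0%})")
```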

Connections: While numerous connections between clients and Memphis may not indicate issues, monitoring is necessary to identify anomalies. For instance, the first threshold can be set at 100 connections. The monitored metric is <memphis_varz_connections>.

Station Data Size: For a more granular analysis of storage consumption, thresholds can be set on the station consumption itself using:

sum(memphis_stream_total_bytes{namespace=~"memphis",stream_name=~"station_a"}/3)

Adjust the divisor to match the station's replication factor. The example above assumes a station with 3 replicas.
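
The division by the replica count can be sketched as follows, assuming the reported byte total covers every replica of the station, as the query above implies. The function name station_logical_bytes and the sample figure are hypothetical:

```python
def station_logical_bytes(total_bytes: float, replicas: int) -> float:
    """Divide the byte total reported across replicas by the replica count."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return total_bytes / replicas


# Hypothetical station reporting 3000 bytes across 3 replicas.
print(station_logical_bytes(3000, 3))
```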

DLS (Dead-letter Station): Awareness of DLS messages is recommended. Counts of both "unacked" and "schemaverse" messages can be obtained with the following queries:

memphis_stream_total_messages{stream_name="$memphis_dls_unacked"}
memphis_stream_total_messages{stream_name="$memphis_dls_schemaverse"}

Last updated 1 year ago
