All rights reserved to Memphis.dev 2023
Storage and Redundancy

This section describes the different storage and redundancy options.


Introduction

Data redundancy in the field of streaming can be a bit misleading. In message brokers, data is not preserved indefinitely but for a defined period, based on conditions such as ingestion time, total size, and the number of messages within a station.

While data resides in the broker, it is stored redundantly and is removed only once it exceeds the defined retention policy.
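To make the retention rule concrete, here is a minimal Python sketch of a retention check over a station's messages. It is an illustration, not Memphis's implementation: the `Message` and `RetentionPolicy` types and the `apply_retention` helper are hypothetical, and messages are assumed to be ordered oldest first. Whatever the policy expels is what leaves the station.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    payload: bytes
    ingested_at: float  # seconds since epoch when the broker stored the message


@dataclass
class RetentionPolicy:
    # Hypothetical policy shape: time, size, and count limits (None = unused).
    max_age_seconds: Optional[float] = None
    max_total_bytes: Optional[int] = None
    max_messages: Optional[int] = None


def apply_retention(
    messages: List[Message], policy: RetentionPolicy, now: float
) -> Tuple[List[Message], List[Message]]:
    """Split messages (oldest first) into (kept, expelled) according to the policy."""
    kept = list(messages)
    expelled: List[Message] = []
    if policy.max_age_seconds is not None:
        # Oldest messages are at the front, so stop at the first one still in range.
        while kept and now - kept[0].ingested_at > policy.max_age_seconds:
            expelled.append(kept.pop(0))
    if policy.max_messages is not None:
        while len(kept) > policy.max_messages:
            expelled.append(kept.pop(0))
    if policy.max_total_bytes is not None:
        while kept and sum(len(m.payload) for m in kept) > policy.max_total_bytes:
            expelled.append(kept.pop(0))
    return kept, expelled


msgs = [Message(b"a", 0.0), Message(b"bb", 10.0), Message(b"ccc", 20.0)]
kept, expelled = apply_retention(msgs, RetentionPolicy(max_age_seconds=15), now=30.0)
print([m.payload for m in kept])  # [b'ccc']
```

Evicting from the front keeps the sketch consistent with the "oldest data leaves first" behavior that time-, size-, and count-based retention all share.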

The object behind the station - Stream

Each station implements a stream object that contains the messages stored in the station. It is up to the user to define which type of storage this stream object is saved on.

Replicas (Mirroring)

Available in cluster mode only.

During station creation, the user can choose the number of station replicas. Replicas are exact mirrors of the entire station data, and each produced message is mirrored across the configured replicas. Each replica is stored on a different broker; therefore, the maximum number of replicas is bounded by the number of brokers in the cluster. If a broker or disk is lost, the remaining replicas are used to rebuild the missing replica, maintaining the required number of replicas while ensuring data availability through a different broker.
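The placement rule above (one replica per broker, so the replica count is bounded by the broker count) and the rebuild step can be sketched as follows. The functions `place_replicas` and `rebuild_after_loss` are hypothetical, and the broker-selection strategy is an arbitrary assumption; Memphis's actual placement and data-copy logic is not shown.

```python
from typing import List


def place_replicas(station: str, replicas: int, brokers: List[str]) -> List[str]:
    """Pick `replicas` distinct brokers for a station (hypothetical strategy)."""
    if replicas < 1:
        raise ValueError("a station needs at least one replica")
    if replicas > len(brokers):
        # Each replica lives on a different broker, so brokers bound the replica count.
        raise ValueError(f"{replicas} replicas requested but only {len(brokers)} brokers")
    start = sum(station.encode()) % len(brokers)  # deterministic starting broker
    return [brokers[(start + i) % len(brokers)] for i in range(replicas)]


def rebuild_after_loss(placement: List[str], lost: str, brokers: List[str]) -> List[str]:
    """Replace a lost broker with a healthy one that does not already hold a replica."""
    if lost not in placement:
        return list(placement)  # this station was not affected
    survivors = [b for b in placement if b != lost]
    if not survivors:
        raise RuntimeError("no surviving replica to copy data from")
    candidates = [b for b in brokers if b != lost and b not in placement]
    if not candidates:
        # Not enough healthy brokers to restore the full replica count.
        return survivors
    # The new replica would be populated by copying from a surviving replica (not shown).
    return survivors + [candidates[0]]


brokers = ["broker-0", "broker-1", "broker-2", "broker-3"]
placement = place_replicas("orders", 3, brokers)
healed = rebuild_after_loss(placement, placement[0], brokers)
print(len(healed), placement[0] in healed)  # 3 False
```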

Replicas can be defined using the SDK, GUI, or CLI.

The number of replicas cannot currently be changed after station creation; this may change in a future release.
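Because the replica count is fixed once a station exists, a station's settings can be modeled as an immutable record. The `StationConfig` dataclass, the `StorageType` enum, and their fields below are a hypothetical model for illustration, not the Memphis SDK's API; the storage type mirrors the Memory/Disk choice described under Tier 1.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class StorageType(Enum):
    MEMORY = "memory"
    DISK = "disk"


@dataclass(frozen=True)
class StationConfig:
    # Hypothetical model: frozen=True means no field can be reassigned after creation.
    name: str
    storage_type: StorageType
    replicas: int
    tiered_storage_enabled: bool = False

    def __post_init__(self) -> None:
        if self.replicas < 1:
            raise ValueError("replicas must be >= 1")


cfg = StationConfig("orders", StorageType.DISK, replicas=3)
try:
    cfg.replicas = 5  # attempting to change replicas after creation
except FrozenInstanceError:
    print("replicas are fixed at creation")
```

In this model, changing the replica count means defining a new station configuration rather than mutating an existing one, which matches the current restriction.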

Storage tiering

Memphis offers a range of storage types that you can choose from based on your workload's data access, resiliency, frequency, and cost requirements. The storage type is configured per station.

Tier 1 (Local storage)

The storage where each message is initially stored.

The options are Memory or Disk, each with its own strengths and weaknesses.

  • Memory: for faster performance. Because memory is a volatile type of storage, the risk of losing data on failure is higher: the data resides in the broker's memory, so a station without configured replicas can lose data.

  • Disk: for higher availability. Disk storage may be slower than memory, but it offers greater availability and resiliency to broker failures.
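The durability difference between the two options can be illustrated with a toy store. This is not the Memphis storage engine: `MemoryStore` and `DiskStore` are hypothetical, and a broker "restart" is simulated by recreating the objects. The `os.fsync` call on every write hints at why disk storage can be slower.

```python
import json
import os
import tempfile
from typing import List


class MemoryStore:
    """Volatile store: contents live only as long as this object does."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, msg: str) -> None:
        self._messages.append(msg)

    def read_all(self) -> List[str]:
        return list(self._messages)


class DiskStore:
    """Append-only file store: contents survive re-opening (e.g. after a restart)."""

    def __init__(self, path: str) -> None:
        self._path = path

    def append(self, msg: str) -> None:
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the write to disk: safer, but slower

    def read_all(self) -> List[str]:
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


path = os.path.join(tempfile.mkdtemp(), "station.log")
mem, disk = MemoryStore(), DiskStore(path)
for m in ("m1", "m2"):
    mem.append(m)
    disk.append(m)

# Simulate a broker restart by recreating both stores.
mem, disk = MemoryStore(), DiskStore(path)
print(mem.read_all())   # []
print(disk.read_all())  # ['m1', 'm2']
```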

Tier 2 (Remote storage)

Message brokers typically delete messages once they exceed a defined retention policy, such as time, size, or number of records.

Memphis offers a second storage tier for longer, possibly indefinite, retention of stored messages. Each message expelled from the station is automatically migrated to the second storage tier.
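A minimal sketch of this migration path, assuming the behavior described in the architecture steps: expelled messages are buffered, an async task periodically packs the buffer, and each station's batch is written as one JSON file under `memphis/<station>/`. `TieringBuffer`, `run_flusher`, the in-memory dict standing in for a bucket, and the SHA-256 file naming are all hypothetical; Memphis's actual hashing and upload code are not documented here.

```python
import asyncio
import hashlib
import json
from typing import Dict, List


class TieringBuffer:
    """Hypothetical model of the tier-2 migration buffer."""

    def __init__(self, object_store: Dict[str, str]) -> None:
        self._pending: Dict[str, List[dict]] = {}  # station name -> expelled messages
        self._store = object_store  # stand-in for a bucket: object key -> file body

    def add_expelled(self, station: str, payload: bytes, headers: Dict[str, str]) -> None:
        # Messages leaving tier 1 are buffered with a hex-encoded payload.
        self._pending.setdefault(station, []).append(
            {"payload": payload.hex(), "headers": headers}
        )

    def flush(self) -> List[str]:
        """Pack each station's buffer into one JSON file; return the keys written."""
        written = []
        for station, msgs in self._pending.items():
            body = json.dumps(msgs)
            name = hashlib.sha256(body.encode()).hexdigest()  # hash-based file name
            key = f"memphis/{station}/{name}.json"
            self._store[key] = body
            written.append(key)
        self._pending.clear()
        return written


async def run_flusher(buf: TieringBuffer, interval_s: float, stop: asyncio.Event) -> None:
    # Pack and migrate the buffer every interval_s seconds (8 s is the documented default);
    # when stop is set, do one final flush and exit.
    while not stop.is_set():
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
        buf.flush()
```

Running the packer on a timer, rather than uploading every expelled message individually, batches many messages into a single object-storage write.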

Architecture

Step-by-step explanation

  1. Once a supported remote storage is connected, storage tiering becomes available to the entire cluster.

  2. Storage tiering is then enabled per station.

  3. Once storage tiering is enabled for a station, out-of-retention records (messages) are migrated to an internal buffer. Every 8 seconds (the default), an asynchronous task packs the buffer and migrates it to the second storage tier. The packing interval can be changed via the "Environment configuration."

  4. In the object storage, a directory named "memphis" is created within the selected bucket, and within the memphis directory, a nested directory is created for each enabled station.

  5. Within each station directory, a JSON file is created with the latest buffer content mentioned in step 3. The file name is a generated hash to avoid duplicates. Content example:

[{"payload":"7b2263697479223a224e657720596f726b222c22636f756e747279223a22555341222c22656d61696c223a226a6f686e406578616d706c652e636f6d222c22686f626279223a22436f6f6b696e67222c226f636375706174696f6e223a22536f66747761726520456e67696e656572222c2270686f6e65223a223535352d313234227d","headers":{"$memphis_connectionid":"b2742350-cccc-aaaa-ffff-adc421a88acc","$memphis_producedby":"ui"}}]

Each array item is a message migrated from the Memphis station. The payload is hex-encoded and should be decoded back into its original format (JSON / Protobuf / Avro).
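A minimal Python sketch for reading such a file back, assuming it has already been downloaded from the bucket as a string. `decode_tiered_file` is a hypothetical helper; it only reverses the hex encoding. Because the example payload above is JSON, it is then parsed with `json.loads`; Protobuf or Avro payloads would instead be passed to the decoder matching the station's schema.

```python
import json
from typing import Any, Dict, List


def decode_tiered_file(body: str) -> List[Dict[str, Any]]:
    """Decode every message in a tier-2 JSON file back to its raw payload bytes."""
    messages = []
    for item in json.loads(body):
        messages.append({
            "payload": bytes.fromhex(item["payload"]),  # hex string -> original bytes
            "headers": item.get("headers", {}),
        })
    return messages


# The content example shown above, verbatim.
example = '[{"payload":"7b2263697479223a224e657720596f726b222c22636f756e747279223a22555341222c22656d61696c223a226a6f686e406578616d706c652e636f6d222c22686f626279223a22436f6f6b696e67222c226f636375706174696f6e223a22536f66747761726520456e67696e656572222c2270686f6e65223a223535352d313234227d","headers":{"$memphis_connectionid":"b2742350-cccc-aaaa-ffff-adc421a88acc","$memphis_producedby":"ui"}}]'

msgs = decode_tiered_file(example)
record = json.loads(msgs[0]["payload"])  # this payload happens to be JSON
print(record["city"], record["email"])  # New York john@example.com
```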

Here is how to enable Memphis storage tiering.