All rights reserved to Memphis.dev 2023


Comparison

A comparison article between Schemaverse, Confluent Schema Registry, and AWS Glue

Last updated 1 year ago


Introduction to schemas

Before delving into the different supporting technologies, let's establish a baseline understanding of schemas and their role in message brokers or async server-server communication.

Schema = Structure.

A schema defines the structure of a "message" and follows a specific format to ensure effective communication between different applications/services/electronic entities.

Schemas can be found in both SQL and NoSQL databases, defining the structure in which the database expects to receive data — for example, first_name:string or first.name. An unfamiliar or noncompliant schema will result in the data being dropped, and the database will not save the record.

Likewise, schemas are critical in communication between two logical entities, such as two microservices. Consider a scenario where Service A writes a message to Service B, which expects a specific format like Protobuf, and its logic or code depends on specific keys and value types. Even a simple typo in a column name or an unexpected schema or format can cause issues for the consumer.

Schemas serve as a manual or automatic contract that ensures stable communication and dictates how two entities should interact.
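As a toy illustration of such a contract, here is a minimal validator using only the Python standard library. The field names and schema shape are hypothetical, purely for illustration; real systems would use JSON Schema, Protobuf, Avro, or similar:

```python
# A toy "schema": a mapping of field name -> expected type.
# Hypothetical example fields; real contracts use JSON Schema, Protobuf, etc.
USER_EVENT_SCHEMA = {"first_name": str, "age": int}

def validate(message: dict, schema: dict) -> bool:
    """Return True only if the message has exactly the expected
    fields, each with the expected type."""
    if set(message) != set(schema):
        return False
    return all(isinstance(message[k], t) for k, t in schema.items())

print(validate({"first_name": "Ada", "age": 36}, USER_EVENT_SCHEMA))  # True
print(validate({"firstname": "Ada", "age": 36}, USER_EVENT_SCHEMA))   # typo in key -> False
```

Even this tiny sketch shows why a typo in a field name, as mentioned above, breaks the consumer: the message no longer matches the contract the other side depends on.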

The following comparison of technologies will help you establish and enforce schemas between services as data flows from one service to another.

What is AWS Glue?

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

Credit: https://aws.amazon.com/glue/

Capabilities

  • Data integration engine

  • Event-driven ETL

  • No-code ETL jobs

  • Data preparation

The main components of AWS Glue are the Data Catalog, which stores metadata, and an ETL engine that can automatically generate Scala or Python code. Typical data sources would be Amazon S3, RDS, and Aurora.

What is Confluent Schema Registry?

Confluent Schema Registry provides a serving layer for your metadata, with a RESTful interface for storing and retrieving your Avro®, JSON Schema, and Protobuf schemas.

It stores a versioned history of all schemas based on a specified subject name strategy, provides multiple compatibility settings, and allows schemas to evolve according to the configured compatibility settings, with expanded support for these schema types. It also provides serializers that plug into Apache Kafka® clients and handle schema storage and retrieval for Kafka messages sent in any of the supported formats.

Schema Registry lives outside of and separately from your Kafka brokers. Your producers and consumers still talk to Kafka to publish and read data (messages) to topics; concurrently, they can also talk to Schema Registry to send and retrieve the schemas that describe the data models for those messages.

Credit: https://docs.confluent.io/platform/current/schema-registry/index.html#schemas-subjects-and-topics
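The produce/consume flow around a registry can be sketched with a stdlib-only stand-in. This is not the real client — it only mimics the idea behind Confluent's wire format, where each serialized message carries a magic byte and a 4-byte schema ID that consumers later use to fetch the right schema (the registry contents and field names here are hypothetical):

```python
import json
import struct

# Toy in-memory "registry": schema ID -> required field names.
# (A real registry stores full Avro/JSON Schema/Protobuf definitions.)
REGISTRY = {1: {"first_name", "age"}}

def serialize(message: dict, schema_id: int) -> bytes:
    """Validate the message against its registered schema, then prefix
    the JSON payload with a magic byte and the big-endian schema ID."""
    if set(message) != REGISTRY[schema_id]:
        raise ValueError("message does not match registered schema")
    return b"\x00" + struct.pack(">I", schema_id) + json.dumps(message).encode()

def deserialize(data: bytes) -> dict:
    """Read the schema ID out of the prefix (a consumer would fetch the
    schema by this ID from the registry), then decode the payload."""
    schema_id = struct.unpack(">I", data[1:5])[0]
    if schema_id not in REGISTRY:
        raise KeyError("unknown schema ID")
    return json.loads(data[5:].decode())

wire = serialize({"first_name": "Ada", "age": 36}, schema_id=1)
assert deserialize(wire) == {"first_name": "Ada", "age": 36}
```

The design point is that the payload itself stays small: only the ID travels with every message, while the full schema lives once, centrally, in the registry.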

What is Memphis.dev Schemaverse?

Memphis Schemaverse provides a robust schema store and schema-management layer on top of the Memphis broker, without a standalone compute unit or dedicated resources. With a unique, modern UI and a programmatic approach, both technical and non-technical users can create and define different schemas, attach a schema to multiple stations, and choose whether or not the schema should be enforced.

Memphis' low-code approach removes the serialization part as it is embedded within the producer library. Schemaverse supports versioning, GitOps methodologies, and schema evolution.

Schemaverse's primary purpose is to act as an automatic gatekeeper: it ensures the format and structure of messages ingested into a Memphis station and reduces consumer crashes, which often happen when a producer emits an event with an unfamiliar schema.

Common use cases

  • Schema enforcement between microservices

  • Data contracts

  • Converting event formats

  • Creating an organizational standard around the different consumers and producers

Comparison

| Parameter | AWS Glue | Schema Registry | Schemaverse |
| --- | --- | --- | --- |
| Data formats | JSON Schema, Avro, Protobuf | Avro, JSON Schema, Protobuf | JSON Schema, Protobuf, GraphQL |
| Validation and Enforcement | Yes | Yes | Yes |
| Serialization | Requires implementation | Requires implementation | Transparent |
| Deserialization | Requires implementation | Requires implementation | Transparent |
| Management interface | GUI, CLI, SDK | REST, SDK, GUI | SDK, GUI, CLI |
| Supported languages | Scala | Java, .NET, Python | Go, Node.js, Python, REST, TypeScript, NestJS, Java, .NET, Kotlin |
| Compatibility mode | Backward or forward | Backward or forward | Backward or forward |
| Schema creation | Manual / Auto | Manual / Auto | Manual |
| Pricing | $1.00 per 100,000 objects stored above 1M, per month + $1.00 per million requests above 1M in a month | Confluent Community License / Confluent Enterprise License | Open-source / Free |

Validation and Enforcement

When data streaming applications are integrated with schema management, schemas used for production are validated against schemas within a central registry, allowing you to control data quality centrally.

AWS Glue offers enforcement and validation using the Glue Schema Registry for Java-based applications using Apache Kafka, Amazon MSK, Amazon Kinesis Data Streams, Apache Flink, Amazon Kinesis Data Analytics for Apache Flink, and AWS Lambda.

Schema Registry validates and enforces message schemas on both the client and server side. Validation takes place on the client side: before producing, the client retrieves the schema from the registry and serializes the about-to-be-produced data against it. Confluent provides ready-to-use serialization functions for this. Schema updates and evolution require restarting the client so it fetches the updated schema; at the registry level, the schema first needs to be switched into a particular compatibility mode (forward/backward), the change applied, and then returned to the default.
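What backward compatibility buys can be shown with a small stdlib-only sketch (field names hypothetical): a v2 schema adds an optional field, and a consumer written against v1 keeps working because it only reads the fields it knows about:

```python
import json

# v1 consumers only know about these fields.
V1_FIELDS = {"first_name"}

def consume_v1(payload: bytes) -> dict:
    """A consumer coded against schema v1: it reads only the v1 fields
    and ignores any fields a newer producer may have added."""
    msg = json.loads(payload)
    return {k: msg[k] for k in V1_FIELDS}

# A producer already upgraded to v2 adds an optional "age" field.
v2_payload = json.dumps({"first_name": "Ada", "age": 36}).encode()
print(consume_v1(v2_payload))  # {'first_name': 'Ada'}
```

The reverse direction (forward compatibility) is the mirror image: old producers may omit a new optional field, so new consumers must supply a default for it.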

Schemaverse also validates and enforces the schema at the client level, without the need for a manual schema fetch, and supports runtime evolution: clients don't need a reboot to apply new schema changes, including changes of data format.

Schemaverse also makes the serialization/deserialization transparent to the client and embeds it within the SDK based on the required data format.

Serialization/Deserialization

When sending data over the network, it must be encoded into bytes.

AWS Glue and Schema Registry work similarly. Each created schema has an ID. Once the producing application has registered its schema, the Schema Registry serializer validates that the record produced by the application is structured with the fields and data types matching a registered schema.

Deserialization takes place through a similar process: the consumer fetches the needed schema based on the ID carried within the message.

In AWS Glue and Schema Registry, it is the client's responsibility to implement and deal with serialization, while in Schemaverse it is fully transparent: all the client needs to do is produce a message that complies with the required structure.
