Learning Databases & Messaging Systems: My Notes on MySQL, MongoDB, Redis, and Kafka
Table of Contents
- Introduction – Overview of MySQL, MongoDB, Redis, and Kafka
- MySQL vs. MongoDB – Comparing relational and document-based databases
- Redis vs. Kafka – From in-memory messaging to event streaming
- Feature Summary – Key strengths, use cases, and when to use each
- References – Source materials and further reading
Introduction
MySQL – SQL Databases (Relational)
MySQL is a widely adopted relational database known for its ease of use, speed, and extensive community support. It is well-suited for applications with clearly defined data structures and scenarios requiring rapid development.
MongoDB – NoSQL Databases
MongoDB is a document-oriented NoSQL database that stores data as JSON-like documents (internally in BSON, a binary JSON format). It provides a flexible schema model, making it well-suited to projects where data structures may change frequently.
Redis – Caching & In-Memory Storage
Redis is an in-memory key-value store known for its ultra-fast response times and support for a variety of data structures. It is commonly used as a cache and for other latency-sensitive parts of an application architecture.
Kafka – Streaming & Messaging
Apache Kafka is a distributed event streaming platform designed to facilitate communication between services via events rather than direct API calls. It provides a robust foundation for building scalable, loosely coupled systems.
MySQL vs. MongoDB
Data Model
- MySQL uses a traditional relational model, where data is stored in rows and columns across tables. It requires a fixed schema, and relationships are defined using primary and foreign keys.
- MongoDB stores data as JSON-like documents in collections. It’s often called “schema-less” because it doesn’t enforce a fixed structure the way relational databases do: documents in the same collection can have different fields. That doesn’t mean there’s no structure at all. Developers usually follow a consistent format and can enforce one with MongoDB’s built-in schema validation. So while MongoDB gives users more flexibility, it still supports structure when needed.
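To make the difference concrete, here is a minimal sketch of the same customer-and-orders data in both models. It uses Python’s built-in `sqlite3` module as a stand-in for MySQL (the table definitions would look similar in MySQL) and plain Python dicts as stand-ins for MongoDB documents. The table and field names are hypothetical.

```python
import sqlite3

# Relational model: two tables linked by a foreign key (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "mouse")],
)
# Reassembling a customer with their orders requires a join.
rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

# Document model: the same data as one self-contained, JSON-like document.
# Documents in the same collection may have different fields ("phone" below).
customer_docs = [
    {"_id": 1, "name": "Alice", "orders": [{"item": "keyboard"}, {"item": "mouse"}]},
    {"_id": 2, "name": "Bob", "phone": "555-0100", "orders": []},
]

print(rows)  # [('Alice', 'keyboard'), ('Alice', 'mouse')]
print([o["item"] for o in customer_docs[0]["orders"]])  # ['keyboard', 'mouse']
```

The relational version needs a join to rebuild the customer’s orders, while the document version reads them directly from one record.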
Scalability
- MySQL is generally better suited for vertical scaling, meaning it performs well when adding more resources (like CPU, RAM, or storage) to a single server. It does support read replicas to help distribute read workloads, but it’s not as naturally built for running across many servers. While horizontal scaling is possible, it usually requires additional tools or more complex setups.
- MongoDB, on the other hand, is designed for horizontal scaling. It supports sharding (splitting data across multiple machines) and replica sets (automatic failover and redundancy), making it easier to handle large datasets and high traffic across distributed systems.
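As a rough illustration of sharding, the sketch below routes each document to a shard by hashing its shard key. This is a simplification: real MongoDB hashed sharding maps ranges of hashed values to chunks that a balancer can move between shards, rather than using a fixed modulo. The shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names

def shard_for(shard_key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard by hashing the shard key (simplified modulo scheme)."""
    # Use a stable hash; Python's built-in hash() for str is randomized per process.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# Route 1,000 hypothetical user documents by their user_id.
placement: dict[str, list[str]] = {s: [] for s in SHARDS}
for user_id in (f"user-{i}" for i in range(1000)):
    placement[shard_for(user_id)].append(user_id)

for shard, ids in placement.items():
    print(shard, len(ids))  # roughly a third of the documents per shard
```

One reason MongoDB uses movable chunks instead: with a plain modulo like this, adding a fourth shard would change the target shard for most keys and force most documents to move.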
Query Language
- MySQL uses SQL (Structured Query Language), which is widely known and supported.
- MongoDB uses MQL (MongoDB Query Language), which is more document-oriented and may take time to learn for those coming from SQL.
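For comparison, here is the same question (total order amount per customer, counting only shipped orders) in both languages. The SQL runs against Python’s built-in `sqlite3` as a stand-in for MySQL. The MQL version is shown only as the aggregation pipeline a driver such as PyMongo would pass to `collection.aggregate()`; running it needs a MongoDB server, so it is not executed here. Table, collection, and field names are hypothetical.

```python
import sqlite3

# SQL (SQLite standing in for MySQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "shipped", 20.0), (1, "shipped", 5.0), (2, "pending", 8.0), (2, "shipped", 12.0)],
)
sql = """
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE status = 'shipped'
    GROUP BY customer_id
    ORDER BY customer_id
"""
sql_result = conn.execute(sql).fetchall()
print(sql_result)  # [(1, 25.0), (2, 12.0)]

# MQL equivalent as an aggregation pipeline (not executed here; needs a MongoDB server).
# With PyMongo this would be passed as: db.orders.aggregate(mql_pipeline)
mql_pipeline = [
    {"$match": {"status": "shipped"}},                                  # WHERE
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},  # GROUP BY + SUM
    {"$sort": {"_id": 1}},                                              # ORDER BY
]
```

Each pipeline stage lines up roughly with one SQL clause, which is a useful mental bridge when moving from SQL to MQL.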
Performance
- MySQL is efficient for structured data and complex joins, especially when data is properly indexed.
- MongoDB performs well when dealing with large volumes of insert or update operations, especially when documents are self-contained and don’t require joins.
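The indexing caveat can be seen directly in a query plan. The sketch below uses SQLite’s `EXPLAIN QUERY PLAN` as a stand-in for MySQL’s `EXPLAIN` to show a lookup switching from a full table scan to an index search once an index exists. The exact plan text varies between SQLite versions, and MySQL’s `EXPLAIN` output looks different, so treat this as an illustration of the idea only. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

query = "SELECT name FROM users WHERE email = ?"

def plan(sql: str) -> str:
    """Return SQLite's plan description (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

before = plan(query)  # e.g. "SCAN users": every row is checked
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
print(before)
print(after)
```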
Flexibility
- MySQL has a strict schema, which helps maintain data consistency but may require migrations when the structure changes.
- MongoDB allows flexible data structures, which makes it easier to work with evolving or semi-structured data, such as records whose fields vary or grow over time.
Security
Both databases support encryption, user authentication, and access control.
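- MySQL uses pluggable authentication: the default is its built-in password-based method, while LDAP and Kerberos plugins are available in MySQL Enterprise Edition. Like any SQL database, it is vulnerable to SQL injection when an application builds queries by concatenating user input; parameterized queries (prepared statements) prevent this.
- MongoDB supports external authentication methods like LDAP and Kerberos (both require MongoDB Enterprise) and X.509 certificates. It is not immune to injection either: passing unsanitized user input into query filters can let an attacker inject query operators.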
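Whatever the database, the standard defense against injection is to keep user input out of the query text. The sketch below uses Python’s built-in `sqlite3` as a stand-in for a MySQL driver (MySQL drivers such as `mysql-connector-python` use `%s` placeholders instead of `?`, but the principle is the same). The table and the malicious input are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

# Hypothetical attacker-controlled input typed into a login or search form.
malicious = "nobody' OR '1'='1"

# UNSAFE: the input is pasted into the SQL text, so its OR clause becomes part of the query.
unsafe_sql = (
    "SELECT username FROM users WHERE username = '" + malicious + "' ORDER BY username"
)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the input is bound as a parameter, so it is only ever compared as a plain string.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ? ORDER BY username", (malicious,)
).fetchall()

print(unsafe_rows)  # [('alice',), ('root',)]: every row leaks
print(safe_rows)    # []: no user is literally named "nobody' OR '1'='1"
```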
Redis vs. Kafka
(* This comparison focuses on Redis pub/sub messaging, not its general key-value storage features.)
Workflow
- Redis works like a live broadcaster. When a producer publishes a message to a named channel, such as “email,” Redis immediately pushes it to every client currently subscribed to that channel. Messages pass through memory, which makes delivery very fast, but they are not stored: if no one is subscribed when a message is sent, the message is lost.
- Kafka lets different apps send and receive data through something called “topics.” A topic is like a channel for a specific type of message, such as orders or payments. Apps that send messages are called producers, and those that read them are consumers. Each topic is split into partitions, which are spread across multiple servers (brokers) for better performance and reliability. Consumers pull messages from these partitions whenever they are ready, and messages stay in the log after being read until a configurable retention period expires.
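The two workflows can be contrasted with a toy, in-memory model. Everything here is hypothetical and simplified, and neither class uses a real Redis or Kafka client. `PubSubChannel` mimics Redis pub/sub: it pushes to whoever is subscribed right now and stores nothing, and, like Redis `PUBLISH`, its `publish` returns how many subscribers received the message. `Topic` mimics a Kafka topic: append-only partitions, keys hashed to partitions, and consumers pulling from an offset they choose.

```python
import zlib
from typing import Callable

class PubSubChannel:
    """Redis-pub/sub-like: messages are pushed to current subscribers and never stored."""

    def __init__(self) -> None:
        self.subscribers: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, message: str) -> int:
        for callback in self.subscribers:
            callback(message)          # push immediately
        return len(self.subscribers)   # 0 means the message is simply gone

class Topic:
    """Kafka-like: an append-only log split into partitions; consumers pull by offset."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        # (Kafka's default partitioner hashes keys with murmur2; crc32 is a stand-in.)
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def poll(self, partition: int, offset: int, max_records: int = 10) -> list[str]:
        # Reading removes nothing; the consumer decides where to read from.
        return self.partitions[partition][offset:offset + max_records]

# Redis-style: a message published before anyone subscribes is lost.
channel = PubSubChannel()
delivered_to = channel.publish("welcome email")  # no subscribers yet
inbox: list[str] = []
channel.subscribe(inbox.append)
channel.publish("password reset email")

# Kafka-style: a consumer that shows up later still reads everything, and can re-read it.
orders = Topic(num_partitions=3)
p = orders.produce("customer-42", "order created")
orders.produce("customer-42", "order paid")
first_read = orders.poll(p, offset=0)
replay = orders.poll(p, offset=0)  # same records again

print(delivered_to, inbox)  # 0 ['password reset email']
print(first_read, replay)   # ['order created', 'order paid'] ['order created', 'order paid']
```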
Message Size
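- Redis is optimized for small messages. It holds everything in memory, so capacity is limited by available RAM, and a subscriber that falls too far behind is disconnected once its output buffer exceeds the `client-output-buffer-limit` configured for pub/sub clients.
- Kafka’s default maximum message size is about 1 MB (the broker setting `message.max.bytes`), but it can be configured to handle much larger messages (up to ~1 GB) when compression and external storage are used.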
Message Delivery
- Redis uses a push-based approach. It sends messages directly to all connected subscribers as soon as they’re available.
- Kafka uses a pull-based approach. Consumers poll the brokers and read messages from partitions at their own pace, tracking their position with offsets.
Message Retention
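- Redis pub/sub delivers messages only to subscribers that are connected at that moment and does not keep a copy. If no one is listening, the messages are dropped and can’t be recovered. (* This applies to Redis pub/sub. Redis can persist data in other use cases.)
- Kafka keeps messages even after they’ve been read, for a configurable retention period (7 days by default). Consumers can re-read data later by moving back to an earlier offset.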
Error Handling
- Redis relies on the application to manage issues like timeouts or memory limits. It doesn’t have built-in message-level error tracking.
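- Kafka offers some recovery features out of the box, such as producer retries and acknowledgments (`acks`), and because messages stay in the log, a consumer that fails can resume from its last committed offset. Dead-letter queues are not part of the core broker: Kafka Connect sink connectors can route failed records to a dead-letter topic, and consumer applications or frameworks typically implement retries and dead-letter topics themselves.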
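In practice, consumer-side retry-then-dead-letter handling often looks roughly like the sketch below: try a message a few times, and if it still fails, park it in a separate dead-letter list (in a real deployment, a dead-letter topic) so the rest of the stream keeps flowing. The `process_with_dlq` helper, the handler, the retry count, and the message names are all hypothetical; Kafka Connect and frameworks such as Spring for Apache Kafka provide their own configurable versions of this pattern.

```python
from typing import Callable

def process_with_dlq(
    messages: list[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Process messages in order; retry failures, then route them to a dead-letter list."""
    processed: list[str] = []
    dead_letters: list[tuple[str, str]] = []  # (message, last error)
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                processed.append(message)
                break
            except Exception as exc:  # a real consumer would catch narrower errors
                if attempt == max_attempts:
                    dead_letters.append((message, str(exc)))
    return processed, dead_letters

# Hypothetical handler: always fails on a malformed payload, fails once on a flaky one.
flaky_calls = {"count": 0}

def handle(message: str) -> None:
    if message == "bad-json":
        raise ValueError("cannot parse payload")
    if message == "flaky":
        flaky_calls["count"] += 1
        if flaky_calls["count"] == 1:
            raise TimeoutError("downstream timeout")

ok, dlq = process_with_dlq(["order-1", "flaky", "bad-json", "order-2"], handle)
print(ok)   # ['order-1', 'flaky', 'order-2']
print(dlq)  # [('bad-json', 'cannot parse payload')]
```

The transient failure succeeds on retry, while the permanently broken message is set aside instead of blocking `order-2`.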
Parallelism
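- Redis pub/sub delivers every message to every subscriber of a channel, but it has no built-in way to split a channel’s messages among several workers so they can process them in parallel. (Redis Streams, a separate feature, does offer consumer groups.)
- Kafka supports parallelism in two ways: partitions let the consumers in one consumer group process different parts of a topic at the same time, and multiple consumer groups can each read the same messages independently.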
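A simplified picture of Kafka-style parallelism: within one consumer group, partitions are divided among the group’s consumers, so each message is handled by one member, while every separate group receives every message. The round-robin `assign_partitions` helper below is a hypothetical illustration; Kafka’s real assignment strategies (range, round-robin, sticky, cooperative-sticky) are configurable.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions across the consumers of ONE group (simplified)."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = list(range(6))  # a hypothetical topic with 6 partitions

# Group "billing" has 3 consumers: each gets 2 partitions and they work in parallel.
billing = assign_partitions(partitions, ["billing-1", "billing-2", "billing-3"])

# Group "analytics" has 1 consumer: it alone reads all 6 partitions (same messages as billing).
analytics = assign_partitions(partitions, ["analytics-1"])

# More consumers than partitions: the extra consumer sits idle.
idle = assign_partitions([0, 1], ["worker-1", "worker-2", "worker-3"])

print(billing)    # {'billing-1': [0, 3], 'billing-2': [1, 4], 'billing-3': [2, 5]}
print(analytics)  # {'analytics-1': [0, 1, 2, 3, 4, 5]}
print(idle)       # {'worker-1': [0], 'worker-2': [1], 'worker-3': []}
```

The last case is why the partition count caps how many consumers in a single group can do useful work at once.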
Throughput
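- Redis pub/sub does not wait for subscribers to acknowledge messages, but the cost of publishing grows with the audience: `PUBLISH` copies each message to every subscriber’s output buffer and runs in O(N+M) time (N subscribers to the channel, M subscribed patterns), so throughput drops as subscribers are added.
- Kafka can process a high volume of messages per second. Producers append to the log without waiting for consumers, and each consumer reads independently, so adding consumers does not directly slow down producers.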
Latency
- Redis has very low latency because it reads/writes in RAM.
- Kafka is also fast, but usually a bit slower due to disk storage and data replication.
Fault Tolerance
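- Redis pub/sub messages are never persisted. Redis can persist its other data to disk (RDB snapshots or an append-only file) and replicate it to other nodes, but writes made since the last snapshot or disk sync can still be lost if the server shuts down unexpectedly.
- Kafka replicates each partition across multiple brokers (the number of copies is set by the topic’s replication factor), so committed data survives the loss of a broker as long as another in-sync replica remains.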
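To see why replication prevents loss, the sketch below places each partition’s replicas on distinct brokers and checks which partitions lose every copy when one broker fails. The `place_replicas` and `surviving` helpers, broker IDs, and replication factors are hypothetical. Real Kafka assigns replicas itself (optionally rack-aware) and elects a new partition leader from the remaining in-sync replicas when a leader fails.

```python
def place_replicas(num_partitions: int, brokers: list[int], replication_factor: int) -> dict[int, list[int]]:
    """Put each partition's replicas on `replication_factor` distinct brokers (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }

def surviving(placement: dict[int, list[int]], failed: set[int]) -> dict[int, list[int]]:
    """Replicas still available for each partition after some brokers fail."""
    return {p: [b for b in replicas if b not in failed] for p, replicas in placement.items()}

brokers = [1, 2, 3, 4]
rf2 = place_replicas(4, brokers, replication_factor=2)
rf1 = place_replicas(4, brokers, replication_factor=1)

# Which partitions have no copy left after broker 2 fails?
lost_rf2 = [p for p, replicas in surviving(rf2, {2}).items() if not replicas]
lost_rf1 = [p for p, replicas in surviving(rf1, {2}).items() if not replicas]

print(lost_rf2)  # []: every partition still has a copy on another broker
print(lost_rf1)  # [1]: partition 1 lived only on broker 2
```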
Feature Summary
Feature | MySQL | MongoDB | Redis | Kafka |
---|---|---|---|---|
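Strengths | Widely known SQL; fixed schema keeps data consistent; efficient joins on indexed, structured data | Flexible document model; built-in sharding and replica sets for horizontal scaling | Very low latency from in-memory storage; support for many data structures | High throughput; durable, replicated log; messages can be re-read |
Use Cases | Applications with clearly defined, related data | Applications with evolving or semi-structured data and many inserts or updates | Caching and real-time pub/sub notifications, such as an “email” channel | Event streaming and service-to-service messaging, such as orders and payments topics |
Recommended When | The data structure is stable and consistency matters most | Data structures change often or the dataset must scale across many servers | Speed matters more than message durability | Messages must survive failures, be replayed, or feed multiple consumers |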