Big Data

CDSW Diagnostic Bundle Anatomy

cdsw logs output

If we run cdsw logs on the CDSW Master node, we’ll see output like the following:

Generating Cloudera Data Science Workbench diagnostic bundle...
Collecting basic system info...
Collecting kernel parameters.

Learning Kafka | Theory-2 | Brokers

Brokers

OK, so we’ve talked about topics, but what holds the topics? A Kafka cluster is composed of multiple brokers (servers). Each broker is identified by its ID (an integer), and each broker contains certain topic partitions. After connecting to any broker (called a bootstrap broker), you will be connected to the entire cluster. A good number to get started is 3 brokers, but some big clusters have over 100 brokers. In these examples we choose to number brokers starting at 100 (an arbitrary choice).
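To make the broker ideas above concrete, here is a minimal sketch in plain Python. It is a toy model, not the Kafka client API: the names Broker, build_cluster and fetch_metadata are hypothetical, and the round-robin placement is an assumption for illustration only (real Kafka has its own assignment logic and also replicates partitions).

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy broker: an integer ID plus the topic partitions it holds."""
    broker_id: int
    partitions: list = field(default_factory=list)  # (topic, partition) pairs


def build_cluster(broker_ids, topics):
    """Spread every topic's partitions over the brokers round-robin.

    Assumption: round-robin is used here only to keep the example simple.
    """
    brokers = {bid: Broker(bid) for bid in broker_ids}
    ids = sorted(brokers)
    slot = 0
    for topic, num_partitions in topics.items():
        for partition in range(num_partitions):
            owner = ids[slot % len(ids)]
            brokers[owner].partitions.append((topic, partition))
            slot += 1
    return brokers


def fetch_metadata(brokers, bootstrap_id):
    """Ask one broker (the bootstrap broker) for a view of the whole cluster."""
    if bootstrap_id not in brokers:
        raise KeyError(f"unknown broker {bootstrap_id}")
    # Every broker knows about all the others, so the answer does not
    # depend on which broker we contacted first.
    return {bid: list(b.partitions) for bid, b in brokers.items()}


# Three brokers numbered from 100, as in the examples above.
cluster = build_cluster([100, 101, 102], {"topic-a": 3, "topic-b": 2})
metadata = fetch_metadata(cluster, bootstrap_id=101)
for broker_id, partitions in metadata.items():
    print(broker_id, partitions)
```

Whichever broker ID is passed as bootstrap_id, the returned metadata is the same, which is the point of a bootstrap broker: one connection is enough to discover the rest of the cluster.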

Learning Kafka | Theory-1 | Topics, Partitions and Offsets

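The basic concept in Kafka is the topic, which is split into one or more partitions for storage. A topic is a logical concept, while a partition is the physical unit that actually holds the data. The unit of data we write to Kafka is the message. Each message appended to a partition is assigned an offset, a sequential ID that marks its position in that partition. Offsets keep increasing without bound, and an offset is only meaningful within its own partition: offset 3 in partition 0 and offset 3 in partition 1 refer to different messages.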
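The relationship between topics, partitions and offsets can be sketched with a small in-memory model. This is an illustration in plain Python, not Kafka code: the Topic and Partition classes and the topic name "truck-gps" are hypothetical, and the producer picks the partition explicitly to keep things simple.

```python
class Partition:
    """Toy partition: an append-only list where the index is the offset."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Store a message and return the offset it was assigned."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._log[offset]


class Topic:
    """Toy topic: a name plus a fixed number of partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def send(self, partition, message):
        """Append a message to one partition and return its offset."""
        return self.partitions[partition].append(message)


topic = Topic("truck-gps", num_partitions=2)
first = topic.send(0, "truck-1 at A")
second = topic.send(0, "truck-1 at B")
other = topic.send(1, "truck-2 at C")

# Offsets grow within a partition, and each partition counts independently.
print(first, second, other)
```

Note that the same offset value appears in both partitions but points at different messages, so a message is only identified by the combination of topic, partition and offset.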