Druid relies on a distributed filesystem or binary object store for deep storage. The most commonly used implementations are S3 (popular on AWS) and HDFS (popular if you already run a Hadoop deployment). In this post, I will show you how to configure Apache Cassandra as deep storage for a Druid cluster.
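As a sketch of what the configuration looks like, the following runtime.properties fragment is based on the Cassandra deep-storage extension shipped with Druid. The host, port, and keyspace values are placeholders, and exact property names can vary between Druid releases, so verify them against the documentation for your version:

```properties
# Load the Cassandra deep-storage community extension
druid.extensions.loadList=["druid-cassandra-storage"]

# Tell Druid to use Cassandra ("c*") as deep storage
druid.storage.type=c*

# Cassandra contact point and keyspace (placeholder values)
druid.storage.host=localhost:9160
druid.storage.keyspace=druid
```

The keyspace must already exist and contain the two storage tables before Druid can persist segments to it.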
Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables: index_storage and descriptor_storage. The index storage table is a chunked-object repository: it holds the compressed segments that are distributed to historical nodes. The descriptor storage table is a regular C* table that stores the segment metadata.
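Based on the schema that accompanies Druid's Cassandra storage extension, the two tables can be created with CQL along these lines (double-check the column names against the schema file bundled with your Druid version):

```sql
-- Chunked object repository: compressed segment data, split into chunks
-- keyed by segment identifier and chunk number
CREATE TABLE index_storage (
  key   text,
  chunk text,
  value blob,
  PRIMARY KEY (key, chunk)
) WITH COMPACT STORAGE;

-- Segment metadata: the segment descriptor, one row per segment
CREATE TABLE descriptor_storage (
  key          varchar,
  lastModified timestamp,
  descriptor   varchar,
  PRIMARY KEY (key)
) WITH COMPACT STORAGE;
```

Splitting each segment across multiple rows in index_storage keeps individual Cassandra values small, which is why the table acts as a chunked-object store rather than holding each segment as a single blob.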