In this article, I am going to demonstrate how to load data into Druid from a CSV file, using Druid’s native batch ingestion with a CSV parseSpec. I assume you already have a good understanding of the Druid architecture and have Druid installed and running. If not, see my previous post to quickly install and run Druid using… Read More
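As a rough sketch of what such an ingestion task looks like, here is a native batch (`index`) spec with a CSV parseSpec, written as a Python dict for readability (the actual task payload submitted to Druid is JSON). The datasource name, file path, and column names are all illustrative, not from the article:

```python
import json

# Hypothetical native batch ingestion spec using a CSV parseSpec.
# Datasource name, file path, and columns are illustrative only.
ingestion_spec = {
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": "pageviews",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "csv",
                    "timestampSpec": {"column": "timestamp", "format": "iso"},
                    # CSV has no header metadata, so columns are listed explicitly
                    "columns": ["timestamp", "page", "language", "views"],
                    "dimensionsSpec": {"dimensions": ["page", "language"]},
                },
            },
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "day",
                "queryGranularity": "none",
            },
        },
        "ioConfig": {
            "type": "index",
            "firehose": {"type": "local", "baseDir": "/tmp/data", "filter": "pageviews.csv"},
        },
        "tuningConfig": {"type": "index"},
    },
}

# The JSON below is what would be POSTed to the Overlord's task endpoint.
print(json.dumps(ingestion_spec, indent=2))
```

Note that unlike JSON input, a CSV parseSpec must enumerate `columns` explicitly, since the rows themselves carry no field names.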
Druid supports “multi-value” string dimensions. These are generated when an input field contains an array of values instead of a single value. topN and groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, all values from matching rows will be used to generate one group per value. It’s possible for a query to return… Read More
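The grouping semantics described above can be illustrated with a small Python sketch (toy data, not a Druid API call): a row whose multi-value field holds several values contributes to one group per value, so the per-group counts can sum to more than the number of rows.

```python
from collections import Counter

# Toy rows with a multi-value "tags" dimension (illustrative data).
rows = [
    {"tags": ["a", "b"]},
    {"tags": ["b"]},
    {"tags": ["a", "c"]},
]

# Mimic Druid's multi-value groupBy: every value in a matching row
# generates one group, so a single row can feed several groups.
counts = Counter(tag for row in rows for tag in row["tags"])
print(dict(counts))  # {'a': 2, 'b': 2, 'c': 1}
```

Three input rows produce five group contributions here, which is exactly why a query over multi-value dimensions can return more grouped results than there are rows.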
In my previous article, I demonstrated how to perform a batch file load using Druid’s native batch ingestion. I only showed the handling of root-level elements of the JSON and intentionally skipped its nested elements. That’s because nested JSON needs special handling for ingestion into Druid; it needs to… Read More
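The usual way to handle nested elements is a `flattenSpec` inside a JSON parseSpec, which pulls a nested field up to a top-level dimension. The sketch below uses a hypothetical field path (`$.user.address.city`) and dimension name; it is written as a Python dict for readability, while the actual spec is JSON:

```python
import json

# Sketch of a JSON parseSpec with a flattenSpec that promotes a
# nested field to a top-level dimension. Field names are illustrative.
parse_spec = {
    "format": "json",
    "flattenSpec": {
        # Keep auto-discovered root-level fields as well
        "useFieldDiscovery": True,
        "fields": [
            # JsonPath expression extracting a nested value
            {"type": "path", "name": "userCity", "expr": "$.user.address.city"},
        ],
    },
    "timestampSpec": {"column": "timestamp", "format": "iso"},
    "dimensionsSpec": {"dimensions": ["userCity"]},
}

print(json.dumps(parse_spec, indent=2))
```

The flattened name (`userCity` here) is then usable in `dimensionsSpec` and in queries exactly like any root-level field.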
For time-series event data stored in a relational database, one event per row, if we need to calculate the number of events per hour, we’d select all rows within an overall interval, group those rows by hour, and count the rows in each hour group. If we have to perform this query many… Read More
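The group-by-hour query described above can be sketched in a few lines of Python on toy data (timestamps are illustrative). This is the computation that Druid's rollup effectively precomputes at ingestion time instead of re-running on every query:

```python
from collections import Counter
from datetime import datetime

# Toy event timestamps, one event per row as in a relational table.
events = [
    "2024-01-01T10:05:00",
    "2024-01-01T10:40:00",
    "2024-01-01T11:15:00",
]

# Truncate each timestamp to its hour, then count rows per hour group.
per_hour = Counter(
    datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:00") for ts in events
)
print(dict(per_hour))  # {'2024-01-01T10:00': 2, '2024-01-01T11:00': 1}
```

Rerunning this aggregation for every dashboard refresh is what gets expensive at scale, which motivates pre-aggregating at ingestion time.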
In my last few posts, I discussed Druid cluster setup. ZooKeeper is required by Druid as an external dependency. In my upcoming posts, I will discuss Apache Kafka, which also requires ZooKeeper as a dependency.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
For better reliability and high availability of the ZooKeeper service, we should set it up in cluster mode. In this post, I will discuss how to set up a ZooKeeper cluster with 3 nodes.
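A three-node setup boils down to a `zoo.cfg` along these lines, identical on all three nodes (hostnames and paths below are illustrative, not from the post):

```
# zoo.cfg — replicated (cluster) mode, same file on every node
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# server.N=host:peerPort:leaderElectionPort
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

In addition, each node needs a `myid` file in its `dataDir` containing just its own server number (1, 2, or 3) so it can identify itself in the ensemble.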