Clean Programmer

Programming & DevOps Resources

How to Configure Druid to Use Minio as Deep Storage

June 20, 2018 Monzurul Haque Shimul

Druid relies on a distributed filesystem or binary object store for deep storage of its segment data. The most commonly used deep storage implementations are S3 (popular on AWS) and HDFS (popular if you already have a Hadoop deployment). In this post, I will show you how to configure a non-Amazon S3 store as deep storage for a Druid cluster, using Minio as the S3-compatible backend.

Minio

Minio is a high-performance distributed object storage server designed for large-scale private cloud infrastructure. The Amazon S3 API is the de facto standard for object storage, and Minio implements the S3 v2/v4 API. It is best suited for storing unstructured data such as photos, videos, log files, backups, and container/VM images. An object can range in size from a few KB up to a maximum of 5 TB.

First, install Minio by following the instructions described here. The Minio server comes with an embedded web-based object browser; point your browser to http://127.0.0.1:9000 to verify that the server has started successfully.
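As a quick sketch of the install step on a Linux host (the download URL, data directory, and the minioadmin credentials below are illustrative; use the binary and keys appropriate for your environment):

```shell
# Download the Minio server binary (Linux amd64; adjust for your platform)
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio

# Start the server, using /data as the storage directory.
# The access and secret keys can be supplied via environment variables;
# Druid will need these same keys later.
MINIO_ACCESS_KEY=minioadmin MINIO_SECRET_KEY=minioadmin ./minio server /data
```

The server listens on port 9000 by default, which matches the jets3t endpoint configuration used below.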

Now that Minio is installed, create a bucket named cpbucket (or any name you prefer) from the web UI, or with the Minio Client (mc). See the documentation for more details about mc.
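If you prefer mc over the web UI, the bucket can be created like this (the alias name myminio is arbitrary, and the keys must match those the server was started with; newer mc releases use `mc alias set` instead of `mc config host add`):

```shell
# Register the local Minio server under the alias "myminio"
mc config host add myminio http://127.0.0.1:9000 <access-key> <secret-key>

# Create the bucket Druid will use as deep storage
mc mb myminio/cpbucket

# List buckets to confirm it exists
mc ls myminio
```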

Druid

Now it’s time to configure Druid. In conf/druid/_common/common.runtime.properties, add “druid-s3-extensions” to druid.extensions.loadList. If, for example, the list already contains “druid-parser-route”, the final property should look like:

druid.extensions.loadList=["druid-parser-route", "druid-s3-extensions"]

The S3 extension for deep storage uses jets3t under the hood, which reads a jets3t.properties file from the classpath. Create a new file named jets3t.properties inside the conf/druid/_common directory with the following contents:

s3service.s3-endpoint=localhost
s3service.s3-endpoint-http-port=9000
s3service.disable-dns-buckets=true
s3service.https-only=false

Next, comment out the configurations for local storage under the “Deep Storage” section and add the appropriate values for Minio. Afterwards, the “Deep Storage” section should look like:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
# druid.storage.type=local
# druid.storage.storageDirectory=var/druid/segments

# For HDFS:
# druid.storage.type=hdfs
# druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=cpbucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=...
druid.s3.secretKey=...

To store the indexing service logs in Minio as well, update the “Indexing service logs” section of conf/druid/_common/common.runtime.properties with the appropriate values. Afterwards, it should look like:

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
# druid.indexer.logs.type=file
# druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
# druid.indexer.logs.type=hdfs
# druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=cpbucket
druid.indexer.logs.s3Prefix=druid/indexing-logs

And you’re done. Restart the Druid servers for the changes to take effect. To verify that everything works, load the sample Wikipedia data into Druid and confirm that the data is stored in Minio using the web UI or mc.
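With the mc alias from earlier (myminio is assumed here), the verification step can be done from the command line; after a successful ingestion, segment files should appear under the configured baseKey and task logs under the s3Prefix:

```shell
# Segments written by Druid (path follows druid.storage.baseKey)
mc ls --recursive myminio/cpbucket/druid/segments/

# Indexing task logs (path follows druid.indexer.logs.s3Prefix)
mc ls --recursive myminio/cpbucket/druid/indexing-logs/
```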

