Spark JDBC

JDBC is one of the most commonly used data sources supported by Spark. This section explains how to use the official ClickHouse JDBC connector with Spark.

Read data
Write data
Parallelism
JDBC Limitations

Read data

Reads through the JDBC data source can be written in Java, Scala, Python, or Spark SQL.
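As an illustration, a read in Python might look like the sketch below. The helper names (`clickhouse_jdbc_options`, `read_table`), the table name `example_table`, and the host, port, and credentials are hypothetical placeholders. The driver class and URL scheme are those of the ClickHouse JDBC driver, so check them against the driver version you use.

```python
def clickhouse_jdbc_options(host="localhost", port=8123, database="default",
                            user="default", password=""):
    """Build the connection options for Spark's JDBC data source.

    All defaults are placeholders; replace them with your own settings.
    """
    return {
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",
        "url": f"jdbc:clickhouse://{host}:{port}/{database}",
        "user": user,
        "password": password,
    }


def read_table(spark, table, **connection):
    """Load a ClickHouse table into a DataFrame through JDBC."""
    options = clickhouse_jdbc_options(**connection)
    # "dbtable" names the table to read; Spark also accepts a "query" option.
    options["dbtable"] = table
    return spark.read.format("jdbc").options(**options).load()


# Usage, assuming pyspark is installed and the ClickHouse JDBC driver jar
# is available to Spark (for example through spark.jars):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("clickhouse-jdbc-read").getOrCreate()
#   read_table(spark, "example_table").show()
```

Keeping the connection options in one dictionary makes it easy to reuse the same settings for several reads.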

Write data

Writes through the JDBC data source can be written in Java, Scala, Python, or Spark SQL.
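A write in Python could be sketched as follows. The helper names (`write_options`, `append_to_table`), the table name `example_table`, and the connection values are hypothetical. The sketch uses `append` mode because the target table is expected to exist already.

```python
def write_options(table, url="jdbc:clickhouse://localhost:8123/default",
                  user="default", password=""):
    """Build the options for writing to an existing ClickHouse table.

    The URL and credentials are placeholders; adjust them to your setup.
    """
    return {
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",
        "url": url,
        "user": user,
        "password": password,
        "dbtable": table,
    }


def append_to_table(df, table, **connection):
    """Append the rows of a DataFrame to an existing ClickHouse table."""
    options = write_options(table, **connection)
    # mode("append") inserts rows without trying to recreate the table.
    df.write.format("jdbc").mode("append").options(**options).save()


# Usage, assuming an active SparkSession named `spark` and an existing
# ClickHouse table example_table(id, name):
#   df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
#   append_to_table(df, "example_table")
```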

Parallelism

By default, Spark JDBC reads the data in a single partition. To read in parallel, specify partitionColumn, lowerBound, upperBound, and numPartitions together. These options describe how to split the table when reading in parallel from multiple workers. See the official Apache Spark documentation for more information on JDBC configuration options.
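The four options can be combined as in the sketch below. The helper name `partitioned_read_options`, the table `example_table`, the column `id`, and the bounds are hypothetical. In Spark, the partition column must be a numeric, date, or timestamp column, and the bounds only determine the partition stride; they do not filter rows.

```python
def partitioned_read_options(table, partition_column, lower_bound, upper_bound,
                             num_partitions,
                             url="jdbc:clickhouse://localhost:8123/default",
                             user="default", password=""):
    """Build JDBC read options that split a table into parallel partitions.

    The URL and credentials are placeholders; adjust them to your setup.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    return {
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",
        "url": url,
        "user": user,
        "password": password,
        # Partitioned reads use "dbtable"; Spark does not allow combining
        # the "query" option with "partitionColumn".
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


# Usage, assuming an active SparkSession named `spark`:
#   options = partitioned_read_options("example_table", "id", 0, 1_000_000, 8)
#   df = spark.read.format("jdbc").options(**options).load()
#   print(df.rdd.getNumPartitions())
```

Choosing bounds close to the actual minimum and maximum of the column tends to give more evenly sized partitions, since rows outside the bounds all land in the first or last partition.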

JDBC Limitations

Currently, you can insert data using JDBC only into existing tables. There is no way to create the table automatically when a DataFrame is inserted, as Spark does with other connectors.
