Environmental Sensors Data

Sensor.Community is a contributor-driven global sensor network that creates Open Environmental Data. The data is collected from sensors all over the globe. Anyone can purchase a sensor and place it wherever they like. The APIs to download the data are on GitHub, and the data is freely available under the Database Contents License (DbCL).

Info

The dataset has over 20 billion records, so be careful simply copying and pasting the commands below unless your resources can handle that volume. The commands below were executed on a Production instance of ClickHouse Cloud.

  1. The data is in S3, so we can use the s3 table function to create a table from the files. We can also query the data in place. Let's look at a few rows before attempting to insert it into ClickHouse:
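A query along the following lines previews a few rows directly from S3. This is a sketch: the bucket URL and file name are placeholders, not the actual location of the Sensor.Community exports.

```sql
-- Preview a few rows in place, without creating a table.
-- The bucket URL and file name below are placeholders; point them at the
-- actual Sensor.Community CSV exports.
SELECT *
FROM s3(
    'https://example-bucket.s3.amazonaws.com/sensors/monthly/2019-06_bmp180.csv.zst',
    'CSVWithNames'
)
LIMIT 10
SETTINGS format_csv_delimiter = ';';
```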

The data is in CSV files but uses a semicolon as the delimiter, which is why the query above sets `format_csv_delimiter` to `;`.

  1. We will use a MergeTree table to store the data in ClickHouse (see the table definition sketched after this list).
  1. ClickHouse Cloud services have a cluster named default. We will use the s3Cluster table function, which reads S3 files in parallel from the nodes in your cluster. (If you do not have a cluster, just use the s3 function and remove the cluster name.)
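A table definition along these lines will work. This is a sketch: the exact column list is an assumption based on the measurements Sensor.Community sensors typically report (particulates, pressure, temperature, humidity).

```sql
-- Sketch of a MergeTree table for the sensor readings.
-- The column list is an assumption; adjust it to match the CSV headers.
CREATE TABLE sensors
(
    sensor_id UInt16,
    sensor_type LowCardinality(String),
    location UInt32,
    lat Float32,
    lon Float32,
    timestamp DateTime,
    P1 Float32,            -- particulate matter reading
    P2 Float32,            -- particulate matter reading
    pressure Float32,
    altitude Float32,
    temperature Float32,
    humidity Float32,
    date Date MATERIALIZED toDate(timestamp)
)
ENGINE = MergeTree
ORDER BY (timestamp, sensor_id);
```

Ordering by `(timestamp, sensor_id)` keeps readings grouped by time, which suits the per-day queries later in this guide.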

The insert query below will take a while - it's about 1.67T of data uncompressed:
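Here is a sketch of the insert, assuming the table above and the same placeholder bucket path as in the preview query:

```sql
-- Read all monthly files in parallel across the cluster named 'default'.
-- The glob pattern and bucket path are placeholders, and the file columns
-- are assumed to line up with the table definition (the materialized `date`
-- column is computed automatically and is not read from the files).
INSERT INTO sensors
SELECT *
FROM s3Cluster(
    'default',
    'https://example-bucket.s3.amazonaws.com/sensors/monthly/*.csv.zst',
    'CSVWithNames'
)
SETTINGS
    format_csv_delimiter = ';',
    input_format_allow_errors_ratio = 0.01;  -- optionally tolerate a few malformed rows
```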

The response shows the number of rows and the speed of processing: the data is ingested at a rate of over 6M rows per second!

  1. Let's see how much disk space is needed for the sensors table:
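One way to check is to sum up the part sizes in `system.parts` (the table name `sensors` matches the sketch above):

```sql
-- Compressed vs. uncompressed size and total row count for the sensors table.
SELECT
    disk_name,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio,
    sum(rows) AS rows
FROM system.parts
WHERE table = 'sensors' AND active
GROUP BY disk_name;
```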

The 1.67T of uncompressed data is compressed down to 310 GiB, and the table contains 20.69 billion rows.

  1. Let's analyze the data now that it's in ClickHouse. Notice that the quantity of data increases over time as more sensors are deployed:
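For example, a count of readings per day (using the materialized `date` column from the table sketch above):

```sql
-- Number of readings ingested per day.
SELECT
    date,
    count() AS readings
FROM sensors
GROUP BY date
ORDER BY date ASC;
```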

We can create a chart in the SQL Console to visualize the results:

Number of events per day

  1. This query counts the number of overly hot and humid days:
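A sketch of such a query, counting readings per day that exceed both thresholds; the 40 °C temperature and 90 % humidity cutoffs are assumptions, so adjust them as needed:

```sql
-- Readings per day where it was both very hot and very humid.
-- The thresholds below are assumptions, not values from the original analysis.
SELECT
    date,
    count() AS hot_humid_readings
FROM sensors
WHERE temperature >= 40 AND humidity >= 90
GROUP BY date
ORDER BY date ASC;
```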

Here's a visualization of the result:

Hot and humid days