Crowdsourced air traffic data from The OpenSky Network 2020
The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data will be periodically included in the dataset until the end of the COVID-19 pandemic.
Source: https://zenodo.org/records/5092942
Martin Strohmeier, Xavier Olive, Jannis Luebbe, Matthias Schaefer, and Vincent Lenders "Crowdsourced air traffic data from the OpenSky Network 2019–2020" Earth System Science Data 13(2), 2021 https://doi.org/10.5194/essd-13-357-2021
Download the Dataset
Run the command:
The download takes about 2 minutes on a good internet connection. There are 30 files with a total size of 4.3 GB.
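One way to fetch the files is sketched below. It rests on two assumptions that should be verified against the Zenodo page: that the record page mentions every file name matching the flightlist_*.csv.gz pattern, and that each file is served at the record URL followed by /files/ and the file name. The function names are hypothetical helpers introduced only for this sketch.

```shell
#!/usr/bin/env bash
# Assumption: files live at "<record URL>/files/<file name>"; verify on the page.
RECORD_URL="https://zenodo.org/records/5092942"

# Read HTML on stdin and print one download URL per distinct flightlist file.
list_flightlist_urls() {
    grep -oE 'flightlist_[A-Za-z0-9_]+\.csv\.gz' \
        | sort -u \
        | sed "s|^|${RECORD_URL}/files/|"
}

# Fetch the record page, extract the file URLs, and download them one by one.
# "wget -c" resumes a partially downloaded file instead of starting over.
download_flightlists() {
    wget -q -O- "$RECORD_URL" \
        | list_flightlist_urls \
        | while IFS= read -r url; do wget -c "$url"; done
}
```

Calling download_flightlists in an empty directory should leave the .csv.gz files there, provided the assumptions above hold.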
Create the Table
Import Data
Upload data into ClickHouse in parallel:
- Here we pass the list of files (ls -1 flightlist_*.csv.gz) to xargs for parallel processing. xargs -P100 allows up to 100 parallel workers, but since we only have 30 files, only 30 workers will run.
- For every file, xargs runs a script with bash -c. The script contains the placeholder {}, and xargs substitutes the filename for it (we requested this with the -I{} option).
- The script decompresses the file (gzip -c -d "{}") to standard output (the -c parameter), and the output is passed to clickhouse-client.
- We also ask for DateTime fields to be parsed with the extended parser (--date_time_input_format best_effort) so that the ISO-8601 format with timezone offsets is recognized.

Finally, clickhouse-client performs the insertion. It reads the input data in CSVWithNames format.
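The steps above can be assembled into a single pipeline, sketched here as a shell function. The table name opensky is an assumption (the table definition is not shown in this section), as is a clickhouse-client that reaches the server with its default connection settings. The function name import_parallel is hypothetical.

```shell
# Assumption: the target table is called "opensky"; override TABLE if yours differs.
TABLE="${TABLE:-opensky}"
export TABLE   # exported so the child bash started by xargs can expand it

import_parallel() {
    # One worker per file, at most 100 at a time. xargs replaces {} with the
    # filename, and each worker streams the decompressed CSV into clickhouse-client.
    ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c \
        'gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO $TABLE FORMAT CSVWithNames"'
}
```

Run import_parallel from the directory that holds the downloaded files.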
Parallel upload takes 24 seconds.
If you prefer not to upload in parallel, here is a sequential variant:
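A minimal sketch of a sequential import, using the same gzip and clickhouse-client options described for the parallel upload. As before, the table name opensky is an assumption, and import_sequential is a hypothetical name.

```shell
# Assumption: the target table is called "opensky"; override TABLE if yours differs.
TABLE="${TABLE:-opensky}"

import_sequential() {
    local file
    for file in flightlist_*.csv.gz; do
        # Skip the literal pattern when no file matches the glob.
        [ -e "$file" ] || continue
        gzip -c -d "$file" \
            | clickhouse-client --date_time_input_format best_effort \
                --query "INSERT INTO ${TABLE} FORMAT CSVWithNames"
    done
}
```

Each file is fully inserted before the next one starts, so this takes longer than the parallel upload but keeps only one client connection open at a time.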
Validate the Data
Query:
Result:
The dataset takes up just 2.66 GiB in ClickHouse. You can check this yourself.
Query:
Result:
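Both checks in this section can be run from the shell, as sketched below. The table name opensky is an assumption, and row_count and table_size are hypothetical helper names. The size is read from the total_bytes column of system.tables and formatted with formatReadableSize.

```shell
# Assumption: the target table is called "opensky"; override TABLE if yours differs.
TABLE="${TABLE:-opensky}"

# Number of rows that were imported.
row_count() {
    clickhouse-client --query "SELECT count() FROM ${TABLE}"
}

# Size of the table's stored data in human-readable form.
table_size() {
    clickhouse-client --query \
        "SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = '${TABLE}'"
}
```

If the import completed, table_size should report a value close to the 2.66 GiB quoted above.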
Run Some Queries
Total distance travelled is 68 billion kilometers.
Query:
Result:
Average flight distance is around 1000 km.
Query:
Result:
Busiest origin airports and their average flight distance
Query:
Result:
Number of flights from three major Moscow airports, weekly
Query:
Result:
Online Playground
You can run other queries against this dataset in the interactive Online Playground. Note, however, that you cannot create temporary tables there.