ISTADFuels - Italian SpatioTemporal Augmented Dataset on Fuels
Description
We present a dataset for fuel sales analysis at the Italian provincial (NUTS 3) level, covering January 2015 to October 2023 (release V3, January 2024). Fuel sales data are collected at monthly frequency and organized by fuel type, usage, and point of sale (highway, municipal road, extra-network road). The fuel data are augmented with a set of socio-economic and geographical variables that help explain the impact of economic phenomena and topography on fuel sales. The data come from the Monthly Oil Bulletin of the Italian Ministry of Environment and Energy Security (MITE), ISTAT (Istituto Nazionale di Statistica), the Bank of Italy, and Eurostat. They were gathered through both automated web scraping and manual downloads, then cleaned and reshaped to be suitable for analysis. The dataset may be useful for spatiotemporal fuel sales forecasting, air quality analysis, urban mobility studies, econometric research, and machine learning applications. To help users explore the data, an R Shiny app (freely available at https://ale-ch.shinyapps.io/it-fuel-dashboard/) was developed. The app code and the data are fully available in the following GitHub repository: https://github.com/ale-ch/it-fuel-dashboard. The app consists of interactive plots that let the user visualize every variable in the dataset over different time ranges and locations.
Steps to reproduce
Data collection
Fuel sales data were downloaded as multiple Excel files from the oil bulletin website of the Italian Ministry of Environment and Energy Security using the “rvest” R package, a set of utilities for web scraping. The heating and cooling degree day (HDD and CDD) indices were accessed through the “eurostat” R package, whereas the NUTS classification data were sourced from Eurostat’s website. The rest of the data were downloaded automatically from institutional websites whenever direct links were available. Some of the databases (ISTAT’s and the Bank of Italy’s) blocked automated processes, so those files had to be downloaded manually. Links to each raw data source are available in the provided metadata. Auxiliary data were searched for and included based on their potential relation to fuel sales, their accuracy (in terms of temporal frequency and spatial granularity), and ultimately their public availability.

Data cleaning and merging
Raw data were cleaned and transformed from a wide shape into a long shape (one row per temporal observation) to facilitate merging. A dataset in long shape is suitable for panel data econometrics, machine learning methods, and visualization. For variables that are annual and/or available only at a spatial granularity coarser than NUTS 3, values are repeated across dates and locations.

An R Shiny app for data exploration
An R Shiny app was developed for data exploration, and its code and the data are fully available in the GitHub repository (https://github.com/ale-ch/it-fuel-dashboard). The app provides an interactive map (“Map” tab), as well as time series plots and basic statistical visualizations (“Analysis” tab). The dataset, together with metadata containing in-depth information on each variable, can be downloaded directly from the app (“Data Explorer” tab).
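The wide-to-long reshaping and the repetition of annual values described under data cleaning and merging can be illustrated with a minimal sketch. It is written in plain Python rather than the R tooling used for the dataset, and it is not the authors' pipeline: the column names, the helper functions `wide_to_long` and `attach_annual`, and all figures are hypothetical and chosen only for illustration.

```python
# Hypothetical wide-format extract: one row per province, one column per month.
# Province codes, column names, and figures are made up for illustration.
wide_sales = [
    {"nuts3": "ITC4C", "2015-01": 100.0, "2015-02": 90.0},
    {"nuts3": "ITI43", "2015-01": 250.0, "2015-02": 240.0},
]

# Hypothetical annual variable, keyed by (province, year).
annual_population = {("ITC4C", 2015): 3200000, ("ITI43", 2015): 4300000}


def wide_to_long(rows, id_col="nuts3", value_name="sales"):
    """Return one record per (province, month) from a wide table."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue  # the identifier is carried over, not unpivoted
            long_rows.append(
                {id_col: row[id_col], "date": column, value_name: value}
            )
    return long_rows


def attach_annual(long_rows, annual, name):
    """Repeat an annual value on every monthly row of the same year."""
    merged = []
    for row in long_rows:
        year = int(row["date"][:4])  # dates are assumed to be "YYYY-MM"
        merged.append({**row, name: annual.get((row["nuts3"], year))})
    return merged


long_sales = wide_to_long(wide_sales)
panel = attach_annual(long_sales, annual_population, "population")

for record in panel:
    print(record)
```

Each monthly row of a province carries the same annual value, which is what "values are repeated across dates" means in practice; a variable available only at a coarser spatial level would be repeated across provinces in the same way.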
Additional Metadata for University of Milano - Bicocca
Language | English |
Date the data was collected | 2024-01-03T00:00:00.000Z |
UniMiB Research Centres | Centro di Statistica Applicata |
ERC Keywords | PE1_14 Statistics, PE6_11 Machine learning, statistical data processing and applications using signal processing (e.g. speech, image, video), SH1_12 Agricultural economics; energy economics; environmental economics, SH2_8 Energy, transportation and mobility |
SSD Classification | SECS-S/03 - STATISTICA ECONOMICA, SECS-S/01 - STATISTICA, SECS-P/05 - ECONOMETRIA |
Geolocation | Italy |