Bicocca Open Archive Research Data

Datasets within this collection

31 results
  • Hodrick-Prescott filter with jumps (Maranzano & Pelagatti, 2025)
    We provide data and code to replicate the results presented in "A Hodrick-Prescott Filter with automatically selected breaks" (Maranzano & Pelagatti, 2025). The subfolders allow replication of: 1. the simulation experiments discussed in Section 3 "Simulations"; 2. the application results discussed in Section 4 "Assessing structural breaks in the Italian labour market"; 3. the simulation experiments discussed in Section 5 "A comparison with other business cycle extraction methods". Each subfolder includes a readme file describing the reproduction steps.
  • ISTADFuels - Italian SpatioTemporal Augmented Dataset on Fuels
    We present a dataset for fuel sales analysis at the Italian provincial (NUTS3) level from January 2015 to October 2023 (release V3, January 2024). Fuel sales data are collected at monthly frequency and are organized by fuel type, usage, and point of sale (highway, municipal road, extra-network road). The fuel data are augmented with a set of socio-economic and geographical variables that help explain the impact of economic phenomena and topography on fuel sales. The data were obtained from the Monthly Oil Bulletin of the Italian Ministry of Environment and Energy Security (MITE), ISTAT (Istituto Nazionale di Statistica), the Bank of Italy and Eurostat, through both automated web scraping and manual downloads, and were then cleaned and reshaped for analysis. The resulting dataset may be useful for spatiotemporal fuel sales forecasting, air quality analysis, urban mobility studies, econometric research, and machine learning applications. To help users explore the data, an R Shiny app (freely available at https://ale-ch.shinyapps.io/it-fuel-dashboard/) was developed. The app code and the data are available in the GitHub repository https://github.com/ale-ch/it-fuel-dashboard. The app provides interactive plots that let the user visualize every variable in the dataset over different time ranges and locations, allowing full flexibility in data exploration.
  • Data for "Labeled loans and human capital investments"
    Code and database created for the manuscript published under DOI 10.1016/j.jdeveco.2023.103053.
  • BayesANT: Bayesian Nonparametric Taxonomic classifier for DNA barcoding sequences
    BayesANT is a package for the taxonomic classification of DNA sequences. It trains a taxonomic classifier on a dataset of DNA barcodes and returns probabilistic predictions for query DNA sequences. BayesANT explicitly accounts for potential taxonomic novelty of the query sequences by relying on Bayesian nonparametric species sampling priors to model the taxonomic tree.
  • Data and Files for Zito, Rigon and Dunson (2022): "Inferring taxonomic affiliation from DNA barcoding aiding in discovery of new taxa"
    This folder contains the data and the R code to reproduce the figures and tables in the paper Zito, Rigon and Dunson (2022), "Inferring Taxonomic placement from DNA barcoding aiding in discovery of new taxa", accepted as an open access publication in Methods in Ecology and Evolution. The file "main_FinBOL.R" reproduces the tables in the main text and in the online Supporting Information for the analysis of the FinBOL data, while "main_Simulation_Section4_SI.R" reproduces the simulation in Section 4 of the Supporting Information. All data are saved in the folder "data". For replicability, version 2.13 of the RDP classifier, downloaded from https://sourceforge.net/projects/rdp-classifier/, is included in the repository in the folder "RDP/java". For questions, contact the author at alessandro.zito@duke.edu
  • AgrImOnIA: Open Access dataset correlating livestock and air quality in the Lombardy region, Italy
    The AgrImOnIA dataset is a comprehensive dataset relating air quality and livestock (expressed as the density of bred bovines and swine), together with weather and other variables. It represents the first step of the AgrImOnIA project. The dataset is intended to support the assessment of the impact of agriculture on air quality in Lombardy through statistical techniques capable of highlighting the relationship between the livestock sector and air pollutant concentrations. It collects estimated daily values for measurements across several dimensions: air quality (AQ), meteorology (WE), emissions (EM), livestock (LI) and land use (LA). Data cover Lombardy and the surrounding area for 2016-2021, inclusive; the surrounding area is obtained by applying a 0.3° buffer around the Lombardy borders. Several aggregation and interpolation methods are used to estimate the measurements for every day. For more details see the paper: A. Fassò, J. Rodeschini, A. Fusta Moro, Q. Shaboviq, P. Maranzano, M. Cameletti, F. Finazzi, N. Golini, R. Ignaccolo, and P. Otto (2022), Agrimonia: a dataset on livestock, meteorology and air quality in the Lombardy region, Italy. arXiv preprint arXiv:2210.10604.
    The files in the folder are:
      • Agrimonia_Dataset.csv (also .Rdata and .mat): built by joining the daily time series of the AQ, WE, EM, LI and LA variables. To simplify access, each variable name starts with the code of its dimension; for example, the names of the AQ variables start with 'AQ_'. The file is also archived in the .mat and .Rdata formats for MATLAB and R, respectively.
      • Metadata_Agrimonia.csv: further information about the sources used, the variables imported, the transformations applied, and the Agrimonia variables.
      • Metadata_AQ_imputation_uncertainty.csv: the daily uncertainty estimates of the AQ observations imputed to mitigate missing data in the hourly time series.
      • Metadata_LA_CORINE_labels.csv: the label and description associated with each CLC (CORINE Land Cover) class.
      • Metadata_monitoring_network_registry.csv: all details about the AQ monitoring stations used to build the dataset, including station type, municipality code, environment type, altitude, pollutants sampled and other information. Each row represents a single sensor.
      • Metadata_LA_SIARL_labels.csv: the label and description associated with each SIARL class.
    The dataset can be reproduced using the code available at the GitHub page: https://github.com/AgrImOnIA-project/AgrImOnIA_Data
  • Bayesian Modelling of Sequential Discoveries
    We aim to model the appearance of distinct tags in a sequence of labelled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarised via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian method for species sampling modelling that directly specifies the probability of a new discovery, thereby allowing flexible specifications. We study the asymptotic behavior and finite-sample properties of this approach extensively. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models with appealing theoretical and computational properties, including one that has the same discovery probability as the Dirichlet process. Moreover, owing to strong connections with logistic regression models, this subclass can naturally account for covariates. Finally, we test our proposal on both synthetic and real data, with special emphasis on a large fungal biodiversity study in Finland.
  • ropensci/stplanr: stplanr 1.0.0
    Removed the dependency on sp, rgeos and rgdal (#332). This involved removing catchmentArea.R and some of the functionality from linefuns.R; the code base as of stplanr 0.8.5 can be browsed at https://github.com/ropensci/stplanr/blob/v0.8.5. Removed the 'ABS' reading functionality in favour of https://github.com/mattcowgill/readabs.
  • ropensci/osmdata: CRAN version 0.1.10
    Major changes: Replaced the httr dependency with httr2 (#272). Removed two authors of code formerly included for stubbing results, which is now done via the httptest2 package. Minor changes: Moved jsonlite from Imports to Suggests (it is now used only in tests).
  • BayesANT: Bayesian Nonparametric Taxonomic classifier for DNA barcoding sequences
    BayesANT is a package for the taxonomic classification of DNA sequences. It trains a taxonomic classifier on a dataset of DNA barcodes and returns probabilistic predictions for query DNA sequences. BayesANT explicitly accounts for potential taxonomic novelty of the query sequences by relying on Bayesian nonparametric species sampling priors to model the taxonomic tree. The package works with both aligned and unaligned sequences.
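The Hodrick-Prescott filter entry (Maranzano & Pelagatti, 2025) builds on the classic HP filter. As a point of reference, here is a minimal sketch of the standard HP filter only, without the automatically selected breaks that the paper adds. It solves the penalized least-squares problem min over tau of sum (y_t - tau_t)^2 + lam * sum (second difference of tau)^2, whose first-order condition is (I + lam D'D) tau = y. The function name `hp_filter` and the default `lam=1600` (the conventional value for quarterly data) are assumptions for illustration, not the authors' code.

```python
import numpy as np


def hp_filter(y, lam=1600.0):
    """Standard Hodrick-Prescott filter (no structural breaks).

    Hypothetical helper: returns (trend, cycle) with trend + cycle == y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("the HP filter needs at least 3 observations")
    # (n-2) x n second-difference matrix: row t has [1, -2, 1] in columns t..t+2
    D = np.diff(np.eye(n), n=2, axis=0)
    # First-order condition of the penalized least-squares problem
    A = np.eye(n) + lam * (D.T @ D)
    trend = np.linalg.solve(A, y)
    cycle = y - trend
    return trend, cycle


# Usage: smooth a trending series with a cyclical component
t = np.arange(40)
series = 0.3 * t + np.sin(t / 2.0)
trend, cycle = hp_filter(series, lam=1600.0)
```

The dense solve keeps the sketch short; for long series a banded or sparse solver (for example `scipy.sparse.linalg.spsolve`) would be the more scalable choice. A linear series passes through unchanged, because its second differences, and hence the penalty, are zero.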
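The BayesANT entries mention Bayesian nonparametric species sampling priors that let a query sequence fall into a previously unseen taxon. To illustrate the idea, here is a minimal sketch of the predictive rule of one such prior, the Pitman-Yor process, applied at a single node of a taxonomic tree. This is an assumption for illustration: it does not claim to reproduce the exact prior or code used by BayesANT, and the names `pitman_yor_predictive` and `NEW_TAXON` are hypothetical. A full classifier would combine these prior weights with a likelihood for the query's DNA sequence, which is omitted here.

```python
from collections import Counter

NEW_TAXON = "<new taxon>"  # hypothetical sentinel for a not-yet-observed taxon


def pitman_yor_predictive(labels, theta, sigma):
    """Prior predictive probabilities under a Pitman-Yor(sigma, theta) process.

    With n observed labels, K distinct values and counts n_k, the next label is
    existing label k with probability (n_k - sigma) / (theta + n) and a new
    label with probability (theta + sigma * K) / (theta + n).
    """
    if not (0.0 <= sigma < 1.0) or theta <= -sigma:
        raise ValueError("require 0 <= sigma < 1 and theta > -sigma")
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    if n == 0:
        return {NEW_TAXON: 1.0}  # the first observation is always new
    denom = theta + n
    probs = {label: (c - sigma) / denom for label, c in counts.items()}
    probs[NEW_TAXON] = (theta + sigma * k) / denom
    return probs


# Usage: three reference barcodes assigned to two genera at one node
weights = pitman_yor_predictive(["A", "A", "B"], theta=1.0, sigma=0.5)
```

Setting `sigma=0` recovers the Dirichlet process (Chinese restaurant process) rule; a positive `sigma` makes the probability of novelty grow with the number of taxa already observed.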
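The AgrImOnIA entry notes that every variable name in Agrimonia_Dataset.csv starts with the code of its dimension (AQ, WE, EM, LI, LA) followed by an underscore. A minimal sketch of selecting and grouping columns by that prefix is shown below. The column names in the example are hypothetical placeholders, not the dataset's actual variable list; check Metadata_Agrimonia.csv for the real names.

```python
def columns_by_dimension(columns, dimension):
    """Return column names that start with '<dimension>_' (e.g. 'AQ_')."""
    prefix = dimension + "_"
    return [c for c in columns if c.startswith(prefix)]


def group_by_dimension(columns):
    """Group prefixed column names by their dimension code; skip unprefixed ones."""
    groups = {}
    for c in columns:
        dim, sep, _ = c.partition("_")
        if sep:  # only names containing an underscore carry a dimension prefix
            groups.setdefault(dim, []).append(c)
    return groups


# Usage with hypothetical column names
cols = ["Time", "AQ_pm10", "AQ_no2", "WE_temperature", "LI_pigs", "LA_land_use"]
aq_cols = columns_by_dimension(cols, "AQ")
groups = group_by_dimension(cols)
```

With pandas, the same selection can be written as `df.filter(regex="^AQ_")` on a loaded DataFrame.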
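The "Bayesian Modelling of Sequential Discoveries" entry summarises sequential discoveries with accumulation curves and mentions a model sharing the Dirichlet process discovery probability. The sketch below computes an accumulation curve from a sequence of tags, together with the Dirichlet process probability that object n+1 carries a new tag, alpha / (alpha + n). By linearity of expectation, the expected number of distinct tags among n objects is the sum of these discovery probabilities. This illustrates only the Dirichlet process baseline, not the authors' proposed model; the function names and the concentration parameter `alpha` are assumptions for illustration.

```python
def accumulation_curve(sequence):
    """Number of distinct tags observed after each of the first n objects."""
    seen = set()
    curve = []
    for tag in sequence:
        seen.add(tag)
        curve.append(len(seen))
    return curve


def dp_discovery_probability(n, alpha):
    """P(object n+1 has a new tag | n objects seen) under a Dirichlet process."""
    return alpha / (alpha + n)


def dp_expected_distinct(n, alpha):
    """Expected number of distinct tags among n objects: sum of discovery probabilities."""
    return sum(dp_discovery_probability(i, alpha) for i in range(n))


# Usage: observed curve versus the Dirichlet process expectation
species = ["a", "b", "a", "c", "b", "d"]
observed = accumulation_curve(species)
expected = [dp_expected_distinct(n, alpha=2.0) for n in range(1, len(species) + 1)]
```

Comparing `observed` with `expected` gives a quick visual check of how well a given discovery probability describes the data.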