Datasets within this collection
58 results
- SegFVG: A High-Resolution Large-Scale Dataset for Building Segmentation from Aerial Imagery in Northeastern Italy
  Accurate building extraction from high-resolution aerial imagery is essential for numerous applications in remote sensing, urban planning, and disaster management. While AI-based methods enable fast, scalable, and cost-effective segmentation of building footprints, their development is often limited by the scarce availability of large-scale, geographically diverse datasets with reliable pixel-level annotations. In this work, we present SegFVG, a large-scale, high-resolution, and geographically diverse dataset for building segmentation, focused on the Friuli Venezia Giulia region in northeastern Italy. The dataset includes over 15,000 true orthophoto aerial image tiles, each of size 2000 × 2000 pixels with a ground sampling distance of 0.1 meters, paired with precise pixel-level building segmentation masks. Covering approximately 616 square kilometers, SegFVG captures a broad spectrum of urban, suburban, and rural settings across varied landscapes, including mountainous, flat, and coastal areas. Alongside the dataset, we provide benchmark results using several deep learning models. These support the usability of SegFVG for the development of accurate segmentation models and serve as a baseline to accelerate future research in building segmentation.
- OWL2ASP tool
  OWL2ASP stands for "Web Ontology Language to Answer Set Programming". This Java tool translates an OWL 2 ontology into ASP format. The output of the translation is used by the WASP solver to obtain the justifications (MUSes) of a specific consequence. WASP solver: http://alviano.github.io/wasp/
- ENRICH: multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry
  A new synthetic, multi-purpose dataset, called ENRICH, for testing photogrammetric and computer vision algorithms. Compared to existing datasets, ENRICH offers higher-resolution images, rendered with different lighting conditions, camera orientations, scales, and fields of view. Specifically, ENRICH is composed of three sub-datasets: ENRICH-Aerial, ENRICH-Square, and ENRICH-Statue, each exhibiting different characteristics. The proposed dataset is useful for several photogrammetry and computer vision-related tasks, such as the evaluation of hand-crafted and deep learning-based local features, the effects of ground control point (GCP) configuration on 3D accuracy, and monocular depth estimation. Each zip file in the root corresponds to a specific dataset:
  - ENRICH-Aerial is an aerial image block of the city of Launceston, Australia. The acquisition is performed by simulating a typical oblique aerial camera with five views (nadir and four oblique views).
  - ENRICH-Square is a ground-level dataset of a square captured by four cameras, each one moving on a different path with different focal length, orientation, and lighting conditions.
  - ENRICH-Statue is a ground-level dataset portraying a statue (placed in the center of the ENRICH-Square scene), acquired using four cameras moving on different paths with different focal lengths, orientations, and lighting conditions.
  Be sure to check the README file in the dataset root for information on folder structure and file contents. Please refer to the related paper (https://doi.org/10.1016/j.isprsjprs.2023.03.002) for information about the generation method and the purpose of ENRICH.
- Dataset related to article "Chemotherapy after PD-1 inhibitors in relapsed/refractory Hodgkin lymphoma: Outcomes and clonal evolution dynamics"
  This record contains raw data related to the article "Chemotherapy after PD-1 inhibitors in relapsed/refractory Hodgkin lymphoma: Outcomes and clonal evolution dynamics". Checkpoint inhibitors (CPIs) are routinely employed in relapsed/refractory classical Hodgkin lymphoma. Nonetheless, persistent long-term responses are uncommon, and one-third of patients are refractory. Several reports have suggested that treatment with CPIs may re-sensitize patients to chemotherapy; however, there is no consensus on the optimal chemotherapy regimen and subsequent consolidation strategy. In this retrospective study we analysed the response to rechallenge with chemotherapy after CPI failure. Furthermore, we exploratively characterized the clonal evolution profile of a small sample of patients (n = 5) by employing the CALDER approach. Among the 28 patients included in the study, 17 (71%) were primary refractory and 26 (92%) were refractory to the last chemotherapy prior to CPIs. Following rechallenge with chemotherapy, response was recorded in 23 (82%) patients experiencing complete remission and 3 (11%) patients experiencing partial remission. The tumour evolution of the patients inferred by CALDER seemingly occurred prior to the first cycle of therapy and was characterized by either linear or branching evolution patterns. Twenty-five patients proceeded to allogeneic stem cell transplantation. At a median follow-up of 21 months, median PFS and OS were not reached. In conclusion, patients who fail CPIs can be effectively rescued by salvage chemotherapy and bridged to allo-SCT/auto-SCT.
- MALVIRUS: an integrated application for viral variant analysis
  Background: Being able to efficiently call variants from the increasing amount of sequencing data produced daily from multiple viral strains is of the utmost importance, as demonstrated during the COVID-19 pandemic, in order to track the spread of viral strains across the globe.
  Results: We present MALVIRUS, an easy-to-install and easy-to-use application that assists users in multiple tasks required for the analysis of a viral population, such as SARS-CoV-2. MALVIRUS allows users to: (1) construct a variant catalog consisting of a set of variations (SNPs/indels) from the population sequences, (2) efficiently genotype and annotate variants of the catalog supported by a read sample, and (3) when the considered viral species is SARS-CoV-2, assign the input sample to the most likely Pango lineages using the genotyped variations.
  Conclusions: Tests on Illumina and Nanopore samples proved the efficiency and the effectiveness of MALVIRUS in analyzing SARS-CoV-2 strain samples with respect to publicly available data provided by NCBI and the more complete dataset provided by GISAID. A comparison with state-of-the-art tools showed that MALVIRUS is always more precise and often has a better recall.
- SVDSS - Example Data
  Example data (reference and alignments) to test the SVDSS caller.
- Replication Package: Automated Detection of Software Performance Antipatterns in Java-based Applications
  This is the Replication Package of the paper titled "Automated Detection of Software Performance Antipatterns in Java-based Applications" under revision.
- Reddit photo Critique Dataset
  The Reddit Photo Critique Dataset (RPCD) contains tuples of images and photo critiques. RPCD consists of 74K images and 220K comments and is collected from a Reddit community used by hobbyists and professional photographers to improve their photography skills by leveraging constructive community feedback. The proposed dataset differs from previous aesthetics datasets mainly in three aspects: (i) the large scale of the dataset and the extent of the comments critiquing different aspects of the image, (ii) it contains mostly UltraHD images, and (iii) it can easily be extended with new data, as it is collected through an automatic pipeline. More information about the dataset can be found at the GitHub repo: https://github.com/mediatechnologycenter/aestheval
- Wirewalking over Two Medical AI Chasms: Results and Open Problems in Making "Valid AI" Also Useful in Medical Practice
  Achieving a pragmatic, or even an ecological, validation (Cabitza and Zeitoun, 2019) of medical AI systems that nevertheless exhibit very high (statistical) accuracy has been observed to be more complicated than initially expected (Coiera et al. 2018): in fact, most of the challenges that make technically sound systems perform poorly in real-world settings lie in the so-called “last mile of implementation” (Coiera, 2019). This evocative concept expresses the semantic difference between developing medical machine learning (or medical AI) and the mere application of machine learning techniques to medical data. Moreover, we will make the point that the space between machine learning development and clinical practice is not a flat and regular path, but rather presents two chasms: the chasm of human trust and the chasm of machine experience. The former requires a focus on usability and explainability, while the latter requires data governance and a focus on data work, including the practice of “data awareness” and “data hygiene”. I will discuss these notions and report on some research I personally conducted while trying to bridge the above chasms, with mixed fortunes: what we recognize as still open problems are exciting opportunities to look at a seemingly established field from a fresh perspective (the interactionist perspective) and to develop solutions that focus on the utility of the technology rather than following the mirage of accuracy.
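
The figures quoted in the SegFVG entry (tiles of 2000 × 2000 pixels, a ground sampling distance of 0.1 meters, about 616 square kilometers covered) can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes square, non-overlapping tiles, which the listing does not state.

```python
def tile_footprint_km2(tile_px: int, gsd_m: float) -> float:
    """Ground area of one square tile in km^2 (tile side = pixels * GSD)."""
    side_m = tile_px * gsd_m
    return (side_m * side_m) / 1_000_000  # m^2 -> km^2


def tiles_needed(area_km2: float, tile_px: int, gsd_m: float) -> float:
    """Tiles required to cover an area, ASSUMING no overlap between tiles."""
    return area_km2 / tile_footprint_km2(tile_px, gsd_m)


if __name__ == "__main__":
    footprint = tile_footprint_km2(2000, 0.1)
    count = tiles_needed(616, 2000, 0.1)
    print(f"footprint per tile: {footprint:.2f} km^2")
    print(f"tiles for 616 km^2: {count:.0f}")
```

Under these assumptions each tile covers a 200 m × 200 m square, and the resulting tile count is in line with the "over 15,000" tiles reported in the entry.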