Bicocca Open Archive Research Data

Datasets within this collection

58 results
  • Dataset related to article "Chemotherapy after PD-1 inhibitors in relapsed/refractory Hodgkin lymphoma: Outcomes and clonal evolution dynamics"
    This record contains raw data related to the article "Chemotherapy after PD-1 inhibitors in relapsed/refractory Hodgkin lymphoma: Outcomes and clonal evolution dynamics". Checkpoint inhibitors (CPIs) are routinely employed in relapsed/refractory classical Hodgkin lymphoma. Nonetheless, durable long-term responses are uncommon, and one-third of patients are refractory. Several reports have suggested that treatment with CPIs may re-sensitize patients to chemotherapy; however, there is no consensus on the optimal chemotherapy regimen or the subsequent consolidation strategy. In this retrospective study we analysed the response to rechallenge with chemotherapy after CPI failure. Furthermore, we exploratively characterized the clonal evolution profile of a small sample of patients (n = 5) using the CALDER approach. Among the 28 patients included in the study, 17 (71%) were primary refractory and 26 (92%) were refractory to the last chemotherapy prior to CPIs. Following rechallenge with chemotherapy, 23 patients (82%) achieved complete remission and 3 (11%) achieved partial remission. The tumour evolution inferred by CALDER appeared to have occurred before the first cycle of therapy and followed either a linear or a branching pattern. Twenty-five patients proceeded to allogeneic stem cell transplantation. At a median follow-up of 21 months, median PFS and OS were not reached. In conclusion, patients who fail CPIs can be effectively rescued with salvage chemotherapy and bridged to allo-SCT or auto-SCT.
  • MALVIRUS: an integrated application for viral variant analysis
    Abstract. Background: Being able to efficiently call variants from the growing volume of sequencing data produced daily from multiple viral strains is of the utmost importance for tracking the spread of viral strains across the globe, as demonstrated during the COVID-19 pandemic. Results: We present MALVIRUS, an easy-to-install and easy-to-use application that assists users in multiple tasks required for the analysis of a viral population, such as SARS-CoV-2. MALVIRUS allows users to: (1) construct a variant catalog consisting of a set of variations (SNPs/indels) from the population sequences, (2) efficiently genotype and annotate variants of the catalog supported by a read sample, and (3) when the viral species under study is SARS-CoV-2, assign the input sample to the most likely Pango lineages using the genotyped variations. Conclusions: Tests on Illumina and Nanopore samples demonstrated the efficiency and effectiveness of MALVIRUS in analyzing SARS-CoV-2 strain samples with respect to publicly available data provided by NCBI and the more complete dataset provided by GISAID. A comparison with state-of-the-art tools showed that MALVIRUS is always more precise and often has better recall.
  • Additional file 2: of VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites
    In silico dataset and accuracy assessment results. The Excel table reports the list of all IS (in rows) and the corresponding output returned by the different tools (color-coded in the following order: VISPA, VISPA2, MAVRIC, SEQMAP, QUICKMAP). For each read (identified by its “ID” in column “header”), we reported the source genomic coordinates (in columns chromosome “chr”, integration point “locus”, and orientation “strand”), the source of annotation as described in VISPA [22], and the nucleotide sequence. We then reported the IS output of each tool: the first set of columns reports the returned IS genomic coordinates (columns “header”, “chr”, “locus” and “strand”), whereas the remaining columns label each IS for statistical assessment as true positive (TP), false positive (FP), or false negative (FN) based on its genomic distance (“IS distance”) from the ground truth. Precision and recall are then derived from the TP, FP, and FN columns. (XLSX 233 kb)
  • VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites
    Abstract. Background: Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to assess the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The growing number of gene therapy clinical trials, combined with the increasing amount of Next Generation Sequencing data aimed at identifying integration sites, requires highly accurate and efficient computational software able to correctly process “big data” in a reasonable computational time. Results: Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis, with the following features: (1) the sequence analysis for integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after alignment to the target genome; (2) a heuristic algorithm that reduces false positive integration sites at the nucleotide level, limiting the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user-friendly web interface that lets researchers perform integration site analyses without computational skills; (5) a speedup of all steps through parallelization (Hadoop free). Conclusions: We tested VISPA2's performance using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial, and compared the results with other pre-existing tools for integration site analysis. On the computational side, VISPA2 showed a >6-fold speedup and improved precision and recall (1 and 0.97, respectively) compared to previously developed computational pipelines.
These results indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost- and time-effective fashion. Moreover, web access to VISPA2 (http://openserver.itb.cnr.it/vispa/) makes a complex analytical tool accessible and easy to use for researchers. We released the source code of VISPA2 in a public repository (https://bitbucket.org/andreacalabria/vispa2).
  • SVDSS - Example Data
    Example data (reference and alignments) to test SVDSS caller.
  • Replication Package: Automated Detection of Software Performance Antipatterns in Java-based Applications
    This is the Replication Package of the paper titled "Automated Detection of Software Performance Antipatterns in Java-based Applications" under revision.
  • Reddit photo Critique Dataset
    The Reddit Photo Critique Dataset (RPCD) contains tuples of images and photo critiques. RPCD consists of 74K images and 220K comments and is collected from a Reddit community used by hobbyists and professional photographers to improve their photography skills by leveraging constructive community feedback. The proposed dataset differs from previous aesthetics datasets mainly in three aspects: (i) the large scale of the dataset and the length of the comments, which critique different aspects of the image; (ii) it contains mostly UltraHD images; and (iii) it can easily be extended with new data, as it is collected through an automatic pipeline. More information about the dataset is available in the GitHub repository: https://github.com/mediatechnologycenter/aestheval
  • Wirewalking over Two Medical AI Chasms: Results and Open Problems in Making "Valid AI" Also Useful in Medical Practice
    Achieving a pragmatic, or even an ecological, validation (Cabitza and Zeitoun, 2019) of medical AI systems that nevertheless exhibit very high (statistical) accuracy has been observed to be more complicated than initially expected (Coiera et al. 2018): in fact, most of the challenges that make technically sound systems perform poorly in real-world settings lie in the so-called “last mile of implementation” (Coiera, 2019). This evocative concept expresses the semantic difference between developing medical machine learning (or medical AI) and merely applying machine learning techniques to medical data. Moreover, we will make the point that the space between machine learning development and clinical practice is not a flat and regular path, but rather presents two chasms: the chasm of human trust and the chasm of machine experience. The former requires a focus on usability and explainability, while the latter requires data governance and a focus on data work, including the practices of “data awareness” and “data hygiene”. I will discuss these notions and report on some research I personally conducted while trying to bridge these chasms, with mixed fortunes: what we recognize as still-open problems are exciting opportunities to look at a seemingly established field from a fresh perspective (the interactionist perspective) and to develop solutions that focus on the utility of the technology rather than chasing the mirage of accuracy.
  • Recognition of skin diseases and exanthema with deep learning techniques
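The VISPA2 additional file listed above scores each tool's output against a simulated ground truth by labelling integration sites (IS) as TP, FP, or FN according to their genomic distance from the true site, then deriving precision = TP / (TP + FP) and recall = TP / (TP + FN). The Python sketch below illustrates that kind of scoring under stated assumptions: the distance window (`max_distance`), the greedy one-to-one matching, the requirement that chromosome and strand agree, and the names `IntegrationSite`, `classify_sites` and `precision_recall` are all hypothetical illustrations, not the criteria or code used in the VISPA2 assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationSite:
    """One IS; fields loosely mirror the "chr", "locus", "strand" columns (hypothetical type)."""
    chrom: str
    locus: int
    strand: str


def classify_sites(predicted, truth, max_distance=3):
    """Label predicted IS against ground truth and return (tp, fp, fn).

    Assumed rule: a prediction is a true positive if it lies on the same
    chromosome and strand as a not-yet-matched true site and within
    `max_distance` bases of it; each true site can be matched at most once.
    """
    unmatched = list(truth)
    tp = fp = 0
    for site in predicted:
        # Index of the first still-unmatched true site inside the window, if any.
        hit = next(
            (
                i
                for i, t in enumerate(unmatched)
                if t.chrom == site.chrom
                and t.strand == site.strand
                and abs(t.locus - site.locus) <= max_distance
            ),
            None,
        )
        if hit is None:
            fp += 1  # no true site close enough: false positive
        else:
            unmatched.pop(hit)  # consume the matched true site
            tp += 1
    fn = len(unmatched)  # true sites that no prediction matched
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [
        IntegrationSite("chr1", 1000, "+"),
        IntegrationSite("chr2", 5000, "-"),
        IntegrationSite("chr3", 42, "+"),
    ]
    predicted = [
        IntegrationSite("chr1", 1002, "+"),  # 2 bp off: within the window
        IntegrationSite("chr2", 5000, "-"),  # exact match
        IntegrationSite("chr5", 7, "+"),     # no true site on chr5
    ]
    tp, fp, fn = classify_sites(predicted, truth)
    print(tp, fp, fn, precision_recall(tp, fp, fn))
```

With the toy data above, the script reports 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3. Greedy first-match assignment keeps the sketch short; an exact evaluation might instead pair each prediction with the nearest true site, which can change counts when several sites fall inside the same window.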