From the Data Rescue Project: the Data Rescue Tracker. “The Data Rescue Tracker is a collaborative tool built to catalog existing public data rescue efforts so that we can coordinate better across initiatives. At this stage, you can use the tool to help reduce duplication of rescue efforts. The Data Rescue Tracker aims to provide a consolidated overview of who is backing up which dataset from […]
Scalable, Efficient Processing and Analysis of Large Audio Datasets – Pawel Cyrta – ADCx Gather 2024
https://www.youtube.com/watch?v=lHME1l9cEPk
#coding #Datasets #programming #softwareengineering
#Reddit #AI #ContentModeration #datasets
'Researchers at Cornell Tech have released a dataset extracted from more than 300,000 public Reddit communities, and a report detailing how Reddit communities are changing their policies to address a surge in AI-generated content.'
https://news.cornell.edu/stories/2025/04/dataset-reveals-how-reddit-communities-are-adapting-ai
"Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for “review” under the Trump administration’s direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
“The problem with archiving this data is that we can’t,” Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: “This repository is under review for potential modification in compliance with Administration directives.”
Repositories with the message include archives of cancer imagery, Alzheimer’s disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data."
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
Axios: NOAA research websites slated to go dark get a reprieve. “NOAA has averted the early cancellation of an Amazon Web Services contract that would have caused a slew of agency websites to go dark beginning at midnight, the agency said Friday. Why it matters: The outages mainly would have affected NOAA’s research division, and would have made numerous websites and data sets inaccessible to […]
https://rbfirehose.com/2025/04/06/axios-noaa-research-websites-slated-to-go-dark-get-a-reprieve/
Massive, Unarchivable #Datasets of #Cancer, #Covid, #HIV and #Alzheimer's Research Could Be Lost Forever
Days before RFK announced 10,000 #HHS staffers would lose their jobs, a message appeared on #NIH research repository sites saying they were "under review." Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency.
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
https://archive.ph/Y8asq
Digital Archivists: Protecting Public Data from Erasure
https://spectrum.ieee.org/digital-archive
https://news.ycombinator.com/item?id=43558182
#ListenBrainz / #MetaBrainz I'm confused. Aren't sponsors the true customer? Why use this?
On one hand #Music: "Listen together", "Ethical forever"
On the other: #DATASETS
"Some of the world’s biggest platforms such as Google and Amazon, use our data"
"We ask commercial supporters to support us in order to help fund the creation and maintenance of these datasets."
"The following organizations make use of the data-sets published by MetaBrainz"
STAT: Gold-standard maternal mortality database in limbo as CDC staff placed on leave. “As part of the sweeping layoffs that rocked the Department of Health and Human Services on Tuesday, the entire staff that oversaw an annual survey to better understand infant and maternal health — and that was considered the gold standard in the field — was placed on administrative leave. The Pregnancy […]
#research #science #BigData #DataAnalysis #datasets
'Two hundred forty-six researchers in the fields of ecology and evolutionary biology — including two from Clemson University — worked in 174 teams to answer two different research questions based on the same unpublished data sets.
They came up with a strikingly variable range of answers, including some that were direct opposites of each other.'
Clemson News: Study: Researchers’ choices could result in different conclusions from the same data. “If you give hundreds of researchers the same data and the same hypotheses to test, they will reach the same conclusions, right? Wrong, according to a recent study published in the journal BMC Biology. Two hundred forty-six researchers in the fields of ecology and evolutionary biology — […]
#Databases & #Datasets are the bedrock of scholarly practice in (digital) Humanities
This module by Emily Genatowski and James Baillie from the University of Vienna dives into the challenges of #dataManagement and storage for #TrainingTuesday
Visit #DARIAHCampus for more on this resource:
https://campus.dariah.eu/resource/posts/data-and-databases-data-management-and-storage
New Map Of Landscape Beneath Antarctica Unveiled
--
https://phys.org/news/2025-03-landscape-beneath-antarctica-unveiled.html <-- shared technical article
--
https://doi.org/10.1038/s41597-025-04672-y <-- shared paper
--
#GIS #spatial #mapping #Bedmap3 #icebed #surface #thickness #gridded #datasets #Antarctica #raster #model #modeling #landscape #elevation #icesheet #survey #remotesensing #earthobservation #climatechange #warming #climate #melt #melting #seafloor #subglacial #geophysical #topography #geology #bathymetry #topobathy #BritishAntarcticSurvey
@BritishAntarcticSurvey
arXiv: FediverseSharing: A Novel Dataset on Cross-Platform Interaction Dynamics between Threads and Mastodon Users. “In March 2024, Threads joined this federation by introducing its Fediverse Sharing service, which enables interactions such as posts, replies, and likes between Threads and Mastodon users as if on a unified platform. Building on this development, we introduce FediverseSharing, […]
From handling massive #DataSets to streamlining delivery, UC Berkeley #Library is ensuring that #ResearchData is well-managed, accessible, and compliant with licensing agreements through #Dataverse, so resources are discoverable and usable by the entire university community. #RDM #DataManagement https://youtu.be/XVBUna3wzgk?si=c_Ixa-sWVmzs3Ezm