Request for help: Build a platform to catalogue COVID-19 related datasets

The COVID-19 Polymath initiative has already collected a long list of publicly available datasets. Currently, this information is shared in a wiki. Can anyone help the Polymath folks by setting up a platform to catalogue datasets and make them more easily browsable (by using search, tags, comments, etc.)? Does anyone know of existing solutions? Generic solutions would be preferred, as this is likely to grow into the central data-against-covid data repository.

3 Likes

Hi there,

I can help by preparing and uploading (some of) those datasets to the Kaggle platform. Kaggle hosts a fast-growing collection of many different datasets and provides functionality for search, tags, and discussion. In addition, any dataset can have associated analysis Notebooks (in R or Python) which run on the Kaggle cloud.

Seeing that some of the datasets in the wiki are already hosted on Kaggle, this might be the best way to scale things up. (Note that Kaggle also currently runs COVID-19 forecasting/machine learning challenges that are likely to benefit from any additional data.)

Let me know what’s the best way for me to help.

2 Likes

Thanks for your offer! They are currently discussing the best approaches for hosting, collecting, and organizing the datasets here. You could give them some more details on your Kaggle suggestion there.

1 Like

Thanks @manuel.haussmann. I replied over there, and I’m reposting my statements below. I think that this thread / platform here might be better equipped to continue the conversation, but I’ll leave it up to the others to decide.


A major advantage of Kaggle is that they already host a lot of datasets with e.g. demographic or geospatial information on many different countries, which could be easily joined to more COVID-19 specific data for a more comprehensive analysis. The Kaggle platform is very mature and has robust search and tagging functionality. In addition, analysis notebooks in R/Python can be hosted together with the data to illustrate the extent of a dataset or provide (collaborative) analysis capabilities.
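To illustrate the kind of join I mean, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the country names, case counts, and population figures are made-up placeholders, and `join_cases_with_population` is a hypothetical helper, not part of any Kaggle dataset or API.

```python
# Placeholder data for illustration only; none of these numbers are real.
covid_cases = [
    {"country": "Country A", "date": "2020-03-20", "confirmed": 120},
    {"country": "Country B", "date": "2020-03-20", "confirmed": 50},
    {"country": "Country C", "date": "2020-03-20", "confirmed": 7},
]

# Hypothetical demographic table keyed by country name.
population = {
    "Country A": 60_000_000,
    "Country B": 10_000_000,
}


def join_cases_with_population(cases, population):
    """Left-join case rows to population by country and add cases per million.

    Rows whose country is missing from the population table are kept,
    with None for the population and the per-million value.
    """
    joined = []
    for row in cases:
        pop = population.get(row["country"])
        per_million = row["confirmed"] * 1_000_000 / pop if pop else None
        joined.append(
            {**row, "population": pop, "confirmed_per_million": per_million}
        )
    return joined


if __name__ == "__main__":
    for row in join_cases_with_population(covid_cases, population):
        print(row)
```

In a Kaggle Notebook the same idea would more likely be a dataframe merge on a shared country or region column; the main practical issue is usually making sure both datasets spell the join key the same way.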

I had a look through the wiki that Jan linked, and I think that most of the datasets are already on Kaggle in some shape or form. There is a certain focus on US numbers, but Kaggle has a very international community and COVID-19 data from multiple other countries is also already present.

Maybe you guys can have a look at what’s already there: https://www.kaggle.com/datasets?search=covid-19

And then we can coordinate how to prioritise adding the missing datasets, and how to best maintain them / add future ones.

1 Like

Hi everyone,

I’m the founder of AsOne, a platform for crowdsourced research. I am in talks with Michael Nielsen about potentially migrating content from the Polymath wiki to our platform. Please let me know how I can help organize the datasets on Kaggle by linking to them on my platform. Perhaps we could use Slack to communicate?

3 Likes

Hi Thomas,

Happy to discuss here or on another platform. What are your ideas for accessing and organising the datasets?

I have recreated Terry’s categories from the clearinghouse wiki page at https://asone.ai/topic/C19Datasets

If someone could repost the dataset links from the wiki in the relevant categories, that would be appreciated!

1 Like

Hey, my name is Jonah Librach, and I’m the founder of http://sciugo.com.

Sciugo’s goal is to catalogue wetlab research in a standardized format for easier searching and reuse by other researchers.

In other words, with Sciugo I’m hoping to replace much of the reliance on manuscripts with detailed standardized metadata of the protocols and results.

Although we have already designed the schema and UI for describing wetlab protocols, they weren’t implemented in the first release (or the current one).

We launched in beta recently, but the next update will have a search engine that allows users to search for experiments using specific drugs and filter by sequencing data associated with SARS-CoV-2.

It’s a really exciting project that’s moving really fast and it solves some of the problems you outlined above.

I’m happy to elaborate on this platform or any other, and I’d be glad to set up a call.

Jonah