Building an open source platform for crowdsourced open data sharing

In this episode, we interviewed Bastian Greshake, a PhD student in Applied Bioinformatics at the University of Frankfurt am Main, in Germany.

If you are a researcher in a field related to genetics, you may already have used OpenSNP to explore SNPs (Single Nucleotide Polymorphisms) in an open dataset. In that case, you probably already know Bastian and his work. Bastian started working on OpenSNP, a platform that allows users to upload their genotyping data and make it available to the community. The platform also allows users to share their phenotypes in order to help scientists discover new genetic associations. In addition, it automatically retrieves the latest open access articles about genetic variations to keep users and researchers informed about SNPs.

OpenSNP features

Bastian explained how to start an open source project and the issues that come with this kind of project. We also explored how to attract more users to an open source project without any advertising budget, and how to recruit more contributors to help develop it. In the second part, Bastian helped us understand the issues related to sharing open data with the community, and how he and his team protect themselves against legal risks by making sure users clearly know what they are getting into when they use the platform.

Bastian and his co-authors wrote a very clear paper explaining most of the elements we discussed during this podcast episode. The paper was published (open access, of course) in the journal PLOS ONE, and the ColperScience team highly recommends it to anyone interested in working with open source or open data in research. The survey of openSNP users referred to during the episode can also be found there.

If you have any questions or remarks, please post a comment at the bottom of this page, or contact us directly through Twitter. Thanks for listening!

Bonus!

Bastian talks about gamification and how it could help certain projects that involve massive amounts of data.

References