Open-source efforts I'm contributing to

CZ Software Mentions Dataset

One of the largest datasets of software mentions mined from biomedical papers. I worked on disambiguation and linking algorithms for software mentions extracted from scientific research articles, and I led the overall technical effort.

Dataset Link | GitHub Repo

Open Global Data Citation Corpus

Work In Progress. I worked on the ML algorithms. In particular, I built a Named Entity Recognition (NER) algorithm to extract mentions of datasets (as accession number IDs and dataset DOIs) from biomedical research articles.

GitHub Repo

Global Biodata Resource Inventory

Work In Progress. I worked on developing ML algorithms to extract mentions of biodata resources from scientific research articles.

GitHub Repo