Talks & Media



Join the conversation: Building the Open Global Data Citation Corpus / Panelist
I built a Machine Learning model that extracts mentions of datasets from biomedical research papers. This data will be used for the Open Global Data Citation Corpus. The webinar discusses the significance of building this corpus.

Description: Wellcome Trust and the Chan Zuckerberg Initiative Partner with DataCite to Build the Open Global Data Citation Corpus
Aggregated references to data across outputs will help the community monitor impact, inform future funding, and improve the dissemination of research
DataCite is pleased to announce that The Wellcome Trust has awarded funds to build the Open Global Data Citation Corpus to dramatically transform the data citation landscape. The corpus will store asserted data citations from a diverse set of sources and can be used by any community stakeholder.
Interested community stakeholders are invited to join the virtual kick-off and participate in a conversation between DataCite, Wellcome Trust, Chan Zuckerberg Initiative, EMBL-EBI, COKI, OpenAIRE, and OpenCitations.

Webinar Link
Presentation Available on Youtube


Hunting for the best bioscience software tool? Check this database / Nature Technology Feature Quote

A data set funded by the Chan Zuckerberg Initiative shows how research software and tools are used across disciplines — and helps developers gain credit for their work.

Nature article highlighting the impact of the CZ Software Mentions Dataset, which I worked on at the Chan Zuckerberg Initiative. The dataset captures software mentions mined from the biomedical literature at scale.

Article Link Author: Matthew Hutson



New data reveals the hidden impact of open source in science / Blog Post

Understanding software used by scientists by mining the biomedical literature

Medium blog post about one of the projects I worked on at the Chan Zuckerberg Initiative: mining software mentions from the biomedical literature.

Blog Post Link Co-Authors: Boris Veytsman, Donghui Li, Dario Taraborelli, Ivana Williams



Women in Tech Share Tips on How They Started Their Career / Blog Post Feature

Meet six trailblazing CZI women working in the technology field and learn how their careers started and how they are going now.

Quote: How It Started For me, it started with math! Ever since elementary school, math was something I’ve always been very good at. I liked it because it’s logical and the only subject I didn’t have to study for at home because I picked it up very easily in class. Of course, that’s also because I had an amazing math teacher. As I grew up, I started developing a similar interest in science. Chemistry, in particular, gave me a logical framework to understand the world around me through molecule interactions. I participated in a number of science competitions, including national and international Science Olympiads. Because of these experiences, I’ve never thought of myself as pursuing anything other than a career in STEM. ...

Blog Post Link



Workshop On Open Citations And Open Scholarly Metadata 2022 / Speaker

Essential frontiers: open data & software citations, an automated ML approach

Abstract: Science is progressive, and every discovery, set of data, and publication builds on previous work. Today, it's impossible to put every new development in the context of what's gone before. Comprehensive open citations can enable both the attribution of scientific progress and the evaluation of research and its impacts. For citations to live up to their promise as a vehicle for the discovery, dissemination, and evaluation of all scholarly knowledge, the open citation frontier needs to expand beyond traditional bibliographic metadata into other essential scientific resources such as research data and software. We describe a new open corpus of dataset and software mentions in biomedical papers created by applying machine learning to full-text biomedical literature. We share the process of extraction and transformation of mentions into citations, as well as opportunities and challenges that come with disambiguating and linking the mentions in an open dataset of this size.

Presentation Link (Zenodo) Joint Presentation with Jennifer Lin



Women In Technology Global Conference 2022! / Speaker

Using transformer models for your own NLP task - building an NLP model End To End
Abstract: Transformer models have revolutionized the NLP field and are currently state-of-the-art on a variety of tasks, such as named entity recognition, language inference, or question answering. With new, more performant models being continuously developed (BERT, RoBERTa, ALBERT, ELECTRA, ERNIE, etc.), these models are ubiquitous in virtually all domains that make use of natural language processing. So how can you apply these models to your own task? In this talk, we will go over the process of using state-of-the-art transformer models for your own NLP task. We will discuss the entire pipeline, from building a training corpus to developing an NLP model and evaluating it. We will offer an example of building a model to extract mentions of Experimental Methods and Datasets from full-text biomedical papers. Even though our example will focus on an NLP task for biomedical text, the framework can be applied to any domain. ...

Presentation Link



6 Inspiring #WomenInSTEM Who Are Building the Future / Blog Post Feature

Featured alongside some amazing women from CZI!

Quote: Ana-Maria went into a career in science because of the beauty she saw in trying to understand the universe and the technology helping to do that in an automated way. In college, she was intrigued by the ideas of randomness and probability, and ended up specializing in artificial intelligence. Now at CZI, she builds machine learning solutions to support the questions brought up by program areas. At the end of the day, Ana-Maria feels that it’s all about the joy she gets out of coming up with solutions to challenging problems.

Blog Post Link



MLConf 2021 / Speaker

BERT for Named Entity Recognition (NER) on specialized corpora.
Abstract: Transformer models have revolutionized the NLP field and are currently state-of-the-art on a variety of tasks, such as named entity recognition, language inference, or question answering. With new, more performant models being continuously developed (BERT, RoBERTa, ALBERT, ELECTRA, ERNIE, etc.), these models are ubiquitous in virtually all domains that make use of natural language processing. So how can you apply these models to a specific task at hand, especially when the distribution of the data is different from the one the models have been trained on? In this talk, we will go over the process of using transformer models for the Named Entity Recognition (NER) task on specialized corpora. We will offer a specific example of building a BERT-based Named Entity Recognition model for mining Experimental Methods and Datasets from full-text biomedical papers. ...

Speaker Link
Presentation Link



PIDapalooza 2021 / Speaker

Deep Linking: Machine learning to connect up the PIDs

Abstract: Science is progressive, and every discovery, set of data, and publication builds on previous work. Today, it's impossible to put every new development in the context of what's gone before, especially if research outputs are largely invisible, living all over the web, disconnected from each other. Meta aims to remove this barrier to scientific progress with its graph of biomedical research connecting up PIDs across "people, places, things." We apply machine learning to the scientific literature as a way to retrieve more connections between these essential elements. During this session, we will share the work we've done and the lessons we're learning, and open up the remaining time as a group discussion on best practices, pitfalls, and areas of opportunity.

Presentation Link Joint Presentation with Jennifer Lin, Alex Wade