Snorkel is an open-source system that generates training data for information extraction and other predictive systems.

Snorkel is an open-source system that introduces a new approach for rapidly creating, modeling, and managing training data for predictive systems. It currently focuses on accelerating the development of applications that extract structured information from unstructured, or “dark,” data in domains where large labeled training sets are unavailable or hard to obtain, such as biomedical literature and clinical notes. Initial results show that Snorkel, using weakly labeled, noisy training data, can achieve the same performance as fully supervised approaches trained on “gold standard” labeled data.
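As a rough illustration of weak supervision, the sketch below labels candidate sentences with a few cheap heuristic "labeling functions" and combines their noisy, possibly conflicting votes by simple majority. This is not Snorkel's API. Snorkel instead learns a model of the sources' accuracies to weigh their votes. The function names, keyword rules, label values, and example sentences (with placeholder drug names) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label values; ABSTAIN means a heuristic has no opinion.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, None


# Hypothetical labeling functions for a drug-causes-effect relation.
# Each is a cheap, noisy heuristic, not a gold-standard annotation.
def lf_causes_keyword(sentence: str) -> Optional[int]:
    text = sentence.lower()
    return POSITIVE if "causes" in text or "induced" in text else ABSTAIN


def lf_treats_keyword(sentence: str) -> Optional[int]:
    text = sentence.lower()
    return NEGATIVE if "treats" in text or "treatment" in text else ABSTAIN


def lf_negation(sentence: str) -> Optional[int]:
    # Pad with spaces so whole-word checks also work at the sentence edges.
    text = f" {sentence.lower()} "
    return NEGATIVE if " no " in text or " not " in text else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], Optional[int]]] = [
    lf_causes_keyword,
    lf_treats_keyword,
    lf_negation,
]


def weak_label(sentence: str) -> Optional[int]:
    """Majority vote over non-abstaining heuristics; ties and no votes abstain."""
    votes = [v for v in (lf(sentence) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with no clear majority
    return ranked[0][0]


if __name__ == "__main__":
    sentences = [
        "Drug X causes headaches in some patients.",
        "Drug Y is a common treatment for migraine.",
        "Drug Z causes no adverse effects and treats insomnia.",
        "Drug W was given to 40 volunteers.",
    ]
    for s in sentences:
        print(weak_label(s), "|", s)
```

In a full weak-supervision pipeline, these noisy labels (or probabilistic versions of them) would then be used to train a conventional model, rather than being used directly as predictions.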

Snorkel can be applied in many domains. Biomedical domains in which Snorkel is being used include the microbiome, joint replacement, and cancer. To learn more or to download, visit

To view recordings, slides, and other materials from the July 2017 workshop, click Downloads.