DeepCurate

1 June 2021

DeepCurate is a project that uses machine learning to transfer life science research results from the specialist literature into a structured, machine-readable form. The project is funded by the German Federal Ministry of Education and Research (BMBF) for three years, starting in January 2020.

DeepCurate: OCR representation of a scanned document (left) and automatically extracted database records (right).

Experimental data on biochemical reactions and their kinetic properties are very important for research in biotechnology, medical treatment methods, and diagnostics. Most of these data are published in conventional literature, in unstructured or only weakly structured form. Current practice is manual extraction and curation by human experts. SABIO-RK, developed by the SDBV group, is a curated database of biochemical reactions and their kinetic properties.

DeepCurate combines the different sources that are used in the curation process. It introduces database-to-paper backprojection to determine the exact location of extracted data in the source paper. For each SABIO-RK data item, the corresponding text locations are found, which creates rich and highly precise training data for machine-learning-based data extraction methods. This training data will then be used together with other training data to improve deep-learning-based NLP methods. The aim is to integrate the results into the curation pipelines of SABIO-RK and other systems.
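The backprojection idea described above can be illustrated with a minimal Python sketch. It assumes that a database record is a simple mapping from field names to string values and that a value counts as located wherever it appears literally (ignoring case) in the paper's text. The function name `backproject`, the record fields, and the example sentence are hypothetical and chosen only for illustration; they do not reflect the SABIO-RK schema or the matching method used in the project, which has to cope with OCR noise and differing surface forms.

```python
import re
from typing import Dict, List, Tuple


def backproject(record: Dict[str, str], text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each record field to the (start, end) character spans
    where its value occurs literally in the text (case-insensitive)."""
    spans: Dict[str, List[Tuple[int, int]]] = {}
    for field, value in record.items():
        # Escape the value so that characters like "." are matched literally.
        pattern = re.compile(re.escape(value), re.IGNORECASE)
        spans[field] = [(m.start(), m.end()) for m in pattern.finditer(text)]
    return spans


# Hypothetical record and sentence, made up for illustration only.
record = {"enzyme": "hexokinase", "parameter": "Km", "value": "0.12"}
text = "The Km of hexokinase for glucose was 0.12 mM."

spans = backproject(record, text)
for field, locations in spans.items():
    for start, end in locations:
        print(field, start, end, text[start:end])
```

Each resulting span labels a stretch of the paper with the database field it supports, which is the kind of annotation a supervised extraction model could be trained on. Exact string matching is the simplest possible choice here; a realistic pipeline would likely need fuzzy or token-level matching.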

Contributors:

Wolfgang Müller (SDBV)
Michael Strube (NLP)

Members:

Sucheta Ghosh
Mark-Christoph Müller
Maja Rey

Publications:

Müller M, Ghosh S, Rey M, Wittig U, Müller W, Strube M (2020). Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain, In Proceedings of the First Workshop on Scholarly Document Processing, Online, November 2020, pp. 81-90. DOI: 10.18653/v1/2020.sdp-1.9

Müller M (2020). pyMMAX2: Deep Access to MMAX2 Projects from Python, In Proceedings of the 14th Linguistic Annotation Workshop, Online, December 2020, pp. 167-173.
https://aclanthology.org/2020.law-1.16/
https://publications.h-its.org/publications/1146

Müller M, Ghosh S, Wittig U, Rey M (2021). Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts, In Proceedings of the 20th SIGBioMed Workshop on Biomedical Language Processing, BioNLP 2021, Online, June 11, 2021. https://arxiv.org/abs/2104.14925

Müller M (2022). A proposal for explicit word formation annotation in discourse corpora, Book of Abstracts of the Symposium on Word Formation and Discourse Structure, Leipzig, Germany, May 2022, pp. 14-15. https://publications.h-its.org/publications/1594
