{"id":44233,"date":"2021-06-28T11:40:11","date_gmt":"2021-06-28T09:40:11","guid":{"rendered":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/"},"modified":"2025-03-14T12:27:44","modified_gmt":"2025-03-14T11:27:44","slug":"deepcurate","status":"publish","type":"hits-project","link":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/","title":{"rendered":"DeepCurate"},"content":{"rendered":"\n<p>DeepCurate is a project using machine learning to transfer life science research results from the specialist literature into a structured, machine-readable form. The project was funded by the German Federal Ministry of Education and Research (BMBF) for three years, starting in January 2020.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignleft is-resized\"><a href=\"https:\/\/www.h-its.org\/wp-content\/uploads\/2021\/06\/Deep_Curate_Image.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" src=\"https:\/\/www.h-its.org\/wp-content\/uploads\/2021\/06\/Deep_Curate_Image.png\" alt=\"\" class=\"wp-image-53107\" style=\"width:375px;height:319px\" \/><\/a><figcaption class=\"wp-element-caption\">DeepCurate: OCR representation of a scanned document (left) and automatically extracted database records (right).<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Experimental data on biochemical reactions and their kinetic properties are essential for research in biotechnology, medical treatment methods, and diagnostics. Most of these data are published in conventional literature, in unstructured or only weakly structured form. Current practice is manual extraction and curation by human experts. 
<a rel=\"noreferrer noopener\" href=\"http:\/\/sabio.h-its.org\/\" target=\"_blank\">SABIO-RK<\/a>, developed by the <a href=\"https:\/\/www.h-its.org\/research\/sdbv\/\" target=\"_blank\" rel=\"noreferrer noopener\">SDBV group<\/a>, is a curated database of biochemical reactions and their kinetic properties.<\/p>\n\n\n\n<p>DeepCurate combines the different sources that are used in the curation process. It introduces database-to-paper backprojection to determine the exact location in the paper from which each extracted data item was taken. For each SABIO-RK data item, the corresponding text locations are found, thus creating rich and highly precise training data for machine-learning-based data extraction methods. Subsequently, these will be used together with other training data to improve deep-learning-based NLP methods. The aim is to integrate these findings into the curation pipelines of SABIO-RK and other systems.<\/p>\n\n\n\n<p><strong>Contributors:<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/www.h-its.org\/people\/priv-doz-dr-wolfgang-muller\/\" target=\"_blank\" rel=\"noreferrer noopener\">Wolfgang M\u00fcller<\/a> (SDBV)<br><a href=\"https:\/\/www.h-its.org\/people\/prof-dr-michael-strube\/\" target=\"_blank\" rel=\"noreferrer noopener\">Michael Strube<\/a> (NLP)<\/p>\n\n\n\n<p><strong>People:<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/www.h-its.org\/people\/dr-sucheta-ghosh-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">Sucheta Ghosh<\/a><br><a href=\"https:\/\/www.h-its.org\/people\/dr-mark-christoph-muller\/\" target=\"_blank\" rel=\"noreferrer noopener\">Mark-Christoph M\u00fcller<\/a><br><a href=\"https:\/\/www.h-its.org\/people\/dr-maja-rey-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">Maja Rey<\/a><\/p>\n\n\n\n<p><strong>Publications:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>M\u00fcller M, Ghosh S, Rey M, Wittig U, M\u00fcller W, Strube M (2020). 
<a href=\"https:\/\/www.aclweb.org\/anthology\/2020.sdp-1.9\" target=\"_blank\" rel=\"noreferrer noopener\">Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain<\/a>, In Proceedings of the First Workshop on Scholarly Document Processing, Online, November 2020, pp. 81-90. DOI: <a href=\"http:\/\/dx.doi.org\/10.18653\/v1\/2020.sdp-1.9\" target=\"_blank\" rel=\"noreferrer noopener\">10.18653\/v1\/2020.sdp-1.9<\/a><br><br>M\u00fcller M (2020). pyMMAX2: Deep Access to MMAX2 Projects from Python, In Proceedings of the 14th Linguistic Annotation Workshop, Online, December 2020, pp. 167-173. <br><a href=\"https:\/\/aclanthology.org\/2020.law-1.16\/\">https:\/\/aclanthology.org\/2020.law-1.16\/<\/a><br><a href=\"https:\/\/publications.h-its.org\/publications\/1146\">https:\/\/publications.h-its.org\/publications\/1146<\/a><br><br>M\u00fcller M, Ghosh S, Wittig U, Rey M (2021). Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts, In Proceedings of the 20th SIGBioMed Workshop on Biomedical Language Processing, BioNLP 2021, Online, June 11, 2021. <a href=\"https:\/\/arxiv.org\/abs\/2104.14925\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/abs\/2104.14925<\/a><br><br>M\u00fcller M (2022). A proposal for explicit word formation annotation in discourse corpora, Book of Abstracts of the Symposium on Word Formation and Discourse Structure, Leipzig, Germany, May 2022, pp. 14\u201315. 
<a href=\"https:\/\/publications.h-its.org\/publications\/1594\">https:\/\/publications.h-its.org\/publications\/1594<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"author":58,"featured_media":44234,"template":"","hits-research-group":[1418],"hits-project-category":[1396],"class_list":["post-44233","hits-project","type-hits-project","status-publish","has-post-thumbnail","hentry","hits-research-group-hits-lab","hits-project-category-previous-projects-de"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>DeepCurate - HITS gGmbH<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"DeepCurate - HITS gGmbH\" \/>\n<meta property=\"og:description\" content=\"DeepCurate is a project using machine learning to transfer life science research results from the specialist literature into a structured, machine-readable ...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/\" \/>\n<meta property=\"og:site_name\" content=\"HITS gGmbH\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-14T11:27:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.h-its.org\/de\/wp-content\/uploads\/sites\/2\/2021\/06\/Deep_Curate_Image.png\" \/>\n\t<meta property=\"og:image:width\" content=\"939\" \/>\n\t<meta property=\"og:image:height\" content=\"800\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data1\" 
content=\"3\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/\",\"url\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/\",\"name\":\"DeepCurate - HITS gGmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2021\\\/06\\\/Deep_Curate_Image.png\",\"datePublished\":\"2021-06-28T09:40:11+00:00\",\"dateModified\":\"2025-03-14T11:27:44+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2021\\\/06\\\/Deep_Curate_Image.png\",\"contentUrl\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2021\\\/06\\\/Deep_Curate_Image.png\",\"width\":939,\"height\":800,\"caption\":\"DeepCurate: OCR representation of a scanned document (left) and automatically extracted database records 
(right).\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/deepcurate\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Projects\",\"item\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/projects\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"DeepCurate\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/\",\"name\":\"HITS gGmbH\",\"description\":\"Heidelberg Institute for Theoretical Studies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.h-its.org\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"DeepCurate - HITS gGmbH","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/","og_locale":"de_DE","og_type":"article","og_title":"DeepCurate - HITS gGmbH","og_description":"DeepCurate is a project using machine learning to transfer life science research results from the specialist literature into a structured, machine-readable ...","og_url":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/","og_site_name":"HITS gGmbH","article_modified_time":"2025-03-14T11:27:44+00:00","og_image":[{"width":939,"height":800,"url":"https:\/\/www.h-its.org\/de\/wp-content\/uploads\/sites\/2\/2021\/06\/Deep_Curate_Image.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"Gesch\u00e4tzte Lesezeit":"3\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/","url":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/","name":"DeepCurate - HITS 
gGmbH","isPartOf":{"@id":"https:\/\/www.h-its.org\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/#primaryimage"},"image":{"@id":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/#primaryimage"},"thumbnailUrl":"https:\/\/www.h-its.org\/de\/wp-content\/uploads\/sites\/2\/2021\/06\/Deep_Curate_Image.png","datePublished":"2021-06-28T09:40:11+00:00","dateModified":"2025-03-14T11:27:44+00:00","breadcrumb":{"@id":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.h-its.org\/de\/projects\/deepcurate\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/#primaryimage","url":"https:\/\/www.h-its.org\/de\/wp-content\/uploads\/sites\/2\/2021\/06\/Deep_Curate_Image.png","contentUrl":"https:\/\/www.h-its.org\/de\/wp-content\/uploads\/sites\/2\/2021\/06\/Deep_Curate_Image.png","width":939,"height":800,"caption":"DeepCurate: OCR representation of a scanned document (left) and automatically extracted database records (right)."},{"@type":"BreadcrumbList","@id":"https:\/\/www.h-its.org\/de\/projects\/deepcurate\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.h-its.org\/de\/"},{"@type":"ListItem","position":2,"name":"Projects","item":"https:\/\/www.h-its.org\/de\/projects\/"},{"@type":"ListItem","position":3,"name":"DeepCurate"}]},{"@type":"WebSite","@id":"https:\/\/www.h-its.org\/de\/#website","url":"https:\/\/www.h-its.org\/de\/","name":"HITS gGmbH","description":"Heidelberg Institute for Theoretical 
Studies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.h-its.org\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"}]}},"publishpress_future_action":{"enabled":false,"date":"2026-05-07 06:40:48","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"hits-research-group","extraData":[]},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"_links":{"self":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-project\/44233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-project"}],"about":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/types\/hits-project"}],"author":[{"embeddable":true,"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/users\/58"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/media\/44234"}],"wp:attachment":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/media?parent=44233"}],"wp:term":[{"taxonomy":"hits-research-group","embeddable":true,"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-research-group?post=44233"},{"taxonomy":"hits-project-category","embeddable":true,"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-project-category?post=44233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}