{"id":33410,"date":"2019-03-25T16:48:02","date_gmt":"2019-03-25T15:48:02","guid":{"rendered":"http:\/\/www.h-its.org\/downloads\/wikibiography-corpus\/"},"modified":"2019-05-23T10:58:54","modified_gmt":"2019-05-23T08:58:54","slug":"wikibiography-corpus","status":"publish","type":"hits-software","link":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/","title":{"rendered":"WikiBiography Corpus"},"content":{"rendered":"\n<p>Click\u00a0<a href=\"http:\/\/hits-data-migration.test\/wp-content\/uploads\/2014\/12\/wiki-biography.tar.gz\">here<\/a>\u00a0to download WikiBiography.<\/p>\n\n\n\n<p>WikiBiography is a corpus of about 1200 annotated biographies from the German version of\u00a0<a href=\"http:\/\/de.wikipedia.org\/\">Wikipedia<\/a>. The fully automatic preprocessing provides the following annotation layers:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>sentence boundaries<\/li><li>part-of-speech tags<\/li><li>word lemmas<\/li><li>syntactic dependencies<\/li><li>anaphora resolution*<\/li><li>discourse connectives<\/li><li>classified named entities<\/li><li>temporal expressions<\/li><\/ul>\n\n\n\n<p>* There is only one coreference chain, which links all mentions of the biographee.&nbsp;The annotation was produced with freely available software (see References). To visualize the data and to access or correct the annotation, use&nbsp;<a href=\"http:\/\/mmax2.sourceforge.net\/\">MMAX2<\/a>. With the MMAX2 API you can access any annotation layer from your Java programs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Screenshots<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio2.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<p>Orange and green fonts are used for temporal expressions (e.g. \u201c7. 
Oktober 1885\u201d, \u201csp\u00e4ter\u201d) and locations (e.g. \u201cKopenhagen\u201d, \u201cD\u00e4nemarks\u201d) respectively. People other than the biographee (e.g. \u201cChristian Bohr\u201d, \u201cHarald Bohr\u201d) are highlighted in light blue. Mentions of the biographee are highlighted in red (e.g. \u201cNiels Henrik David Bohr\u201d, \u201cer\u201d, \u201cNiels Bohr\u201d). The annotation of a selected word (e.g. \u201cProfessor\u201d) is displayed in a separate window. The head of the selected word is then highlighted in grey, and an arc from the dependent word to its head is displayed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Code Sample<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio3.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<p>Download<\/p>\n\n\n\n<p>Click\u00a0<a href=\"http:\/\/hits-data-migration.test\/wp-content\/uploads\/2014\/12\/wiki-biography.tar.gz\">here<\/a>\u00a0to download WikiBiography.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<p>A\u00a0<a href=\"http:\/\/search.cpan.org\/~holsten\/Lingua-DE-Sentence-0.07\/Sentence.pm\">CPAN Perl module<\/a>\u00a0is used for sentence boundary identification.<\/p>\n\n\n\n<p><a href=\"http:\/\/www.coli.uni-saarland.de\/~thorsten\/tnt\/\">TnT tagger<\/a>&nbsp;is used for PoS tagging:&nbsp;<br>Brants, T.: 2000, \u2018TnT \u2013 A statistical Part-of-Speech tagger\u2019. In: Proceedings of the 6th Conference on Applied Natural Language Processing, Seattle, Wash., 29 April \u2013 4 May 2000. pp. 224-231.<\/p>\n\n\n\n<p><a href=\"http:\/\/www.ims.uni-stuttgart.de\/projekte\/corplex\/TreeTagger\/\">TreeTagger<\/a>&nbsp;is used for lemmatization:&nbsp;<br>Schmid, H.: 1997, \u2018Probabilistic part-of-speech tagging using decision trees\u2019. In: D. Jones and H. Somers (eds.): New Methods in Language Processing. London, UK: UCL Press, pp. 
154-164.<\/p>\n\n\n\n<p><a href=\"http:\/\/nats-www.informatik.uni-hamburg.de\/CDG\/ParserDemo\">WCDG parser<\/a>&nbsp;is used for dependency parsing:&nbsp;<br>Foth, K. and W. Menzel: 2006, \u2018Hybrid parsing: Using probabilistic models as predictors for a symbolic parser\u2019. In: Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17-21 July 2006. pp. 321-327.<\/p>\n\n\n\n<p>A list of about 300 connectives from&nbsp;<a href=\"http:\/\/hypermedia.ids-mannheim.de\/pls\/public\/gramwb.ansicht\">IDS Mannheim<\/a>&nbsp;is used to identify these connectives in our corpus.<\/p>\n\n\n\n<p>Temporal expressions are identified with a set of templates. Named entities are classified as person, location or organization based on the information from Wikipedia.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>WikiBiography\u00a0Click\u00a0here\u00a0to download WikiBiography. WikiBiography is a corpus of about 1200 annotated biographies from the German version of\u00a0Wikipedia. Fully &#8230;<\/p>\n","protected":false},"featured_media":0,"template":"","hits-research-group":[1302],"hits-software-category":[],"class_list":["post-33410","hits-software","type-hits-software","status-publish","hentry","hits-research-group-nlp-de"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>WikiBiography Corpus - HITS gGmbH<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"WikiBiography Corpus - HITS gGmbH\" \/>\n<meta property=\"og:description\" content=\"WikiBiography\u00a0Click\u00a0here\u00a0to download WikiBiography. 
WikiBiography is a corpus of about 1200 annotated biographies from the German version of\u00a0Wikipedia. Fully ...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/\" \/>\n<meta property=\"og:site_name\" content=\"HITS gGmbH\" \/>\n<meta property=\"article:modified_time\" content=\"2019-05-23T08:58:54+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data1\" content=\"2\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/\",\"url\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/\",\"name\":\"WikiBiography Corpus - HITS 
gGmbH\",\"isPartOf\":{\"@id\":\"https:\/\/www.h-its.org\/de\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg\",\"datePublished\":\"2019-03-25T15:48:02+00:00\",\"dateModified\":\"2019-05-23T08:58:54+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#primaryimage\",\"url\":\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg\",\"contentUrl\":\"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.h-its.org\/de\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Software\",\"item\":\"https:\/\/www.h-its.org\/de\/software\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"WikiBiography Corpus\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.h-its.org\/de\/#website\",\"url\":\"https:\/\/www.h-its.org\/de\/\",\"name\":\"HITS gGmbH\",\"description\":\"Heidelberg Institute for Theoretical Studies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.h-its.org\/de\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"WikiBiography Corpus - HITS gGmbH","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/","og_locale":"de_DE","og_type":"article","og_title":"WikiBiography Corpus - HITS gGmbH","og_description":"WikiBiography\u00a0Click\u00a0here\u00a0to download WikiBiography. WikiBiography is a corpus of about 1200 annotated biographies from the German version of\u00a0Wikipedia. Fully ...","og_url":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/","og_site_name":"HITS gGmbH","article_modified_time":"2019-05-23T08:58:54+00:00","og_image":[{"url":"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg","type":"","width":"","height":""}],"twitter_card":"summary_large_image","twitter_misc":{"Gesch\u00e4tzte Lesezeit":"2\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/","url":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/","name":"WikiBiography Corpus - HITS 
gGmbH","isPartOf":{"@id":"https:\/\/www.h-its.org\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#primaryimage"},"image":{"@id":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#primaryimage"},"thumbnailUrl":"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg","datePublished":"2019-03-25T15:48:02+00:00","dateModified":"2019-05-23T08:58:54+00:00","breadcrumb":{"@id":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#primaryimage","url":"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg","contentUrl":"https:\/\/cosyne.h-its.org\/nlpdl\/wikibiography\/wikibio1.jpg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.h-its.org\/de\/software\/wikibiography-corpus\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.h-its.org\/de\/"},{"@type":"ListItem","position":2,"name":"Software","item":"https:\/\/www.h-its.org\/de\/software\/"},{"@type":"ListItem","position":3,"name":"WikiBiography Corpus"}]},{"@type":"WebSite","@id":"https:\/\/www.h-its.org\/de\/#website","url":"https:\/\/www.h-its.org\/de\/","name":"HITS gGmbH","description":"Heidelberg Institute for Theoretical Studies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.h-its.org\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"}]}},"publishpress_future_action":{"enabled":false,"date":"2026-04-14 
23:02:32","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"hits-research-group","extraData":[]},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"_links":{"self":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-software\/33410","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-software"}],"about":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/types\/hits-software"}],"wp:attachment":[{"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/media?parent=33410"}],"wp:term":[{"taxonomy":"hits-research-group","embeddable":true,"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-research-group?post=33410"},{"taxonomy":"hits-software-category","embeddable":true,"href":"https:\/\/www.h-its.org\/de\/wp-json\/wp\/v2\/hits-software-category?post=33410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}