WikiBiography
WikiBiography is a corpus of about 1200 annotated biographies from the German version of Wikipedia. Fully automatic preprocessing includes the following:
* Coreference: there is only one coreference chain, which links all mentions of the biographee.

The annotation is done with freely available software (see references). To visualize the data and to access and correct the annotation, you should use MMAX2. With the MMAX2 API you can access any layer of annotation from your Java programs.
Orange and green fonts are used for temporal expressions (e.g. “7. Oktober 1885”, “später”) and locations (e.g. “Kopenhagen”, “Dänemarks”), respectively. People other than the biographee (e.g. “Christian Bohr”, “Harald Bohr”) are highlighted in light blue. Mentions of the biographee are highlighted in red (e.g. “Niels Henrik David Bohr”, “er”, “Niels Bohr”). The annotation of a selected word (e.g. “Professor”) is displayed in a separate window. The head of the selected word is then highlighted in grey, and an arc from the dependent word to its head is displayed.
Download
Click here to download WikiBiography.
References
A CPAN Perl module is used for sentence boundary identification.
The TnT tagger is used for PoS tagging:
Brants, T.: 2000, ‘TnT – A statistical Part-of-Speech tagger’. In: Proceedings of the 6th Conference on Applied Natural Language Processing, Seattle, Wash., 29 April – 4 May 2000. pp. 224-231.
TreeTagger is used for lemmatization:
Schmid, H.: 1997, ‘Probabilistic part-of-speech tagging using decision trees’. In: D. Jones and H. Somers (eds.): New Methods in Language Processing. London, UK: UCL Press, pp. 154-164.
The WCDG parser is used for dependency parsing:
Foth, K. and W. Menzel: 2006, ‘Hybrid parsing: Using probabilistic models as predictors for a symbolic parser’. In: Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17-21 July 2006. pp. 321-327.
A list of about 300 connectives from IDS Mannheim is used to identify connectives in the corpus.
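A list-based lookup of this kind could be sketched as follows. This is only an illustration: the five connectives below are a hypothetical sample rather than the IDS Mannheim list, the function name `find_connectives` is made up for this example, and multi-word connectives are ignored for brevity.

```python
# Hypothetical sample; the list used for the corpus has about 300 entries.
CONNECTIVES = {"aber", "weil", "obwohl", "deshalb", "und"}


def find_connectives(tokens, connectives=CONNECTIVES):
    """Return (index, token) pairs for tokens that appear in the connective list."""
    return [
        (i, tok)
        for i, tok in enumerate(tokens)
        if tok.lower() in connectives  # case-insensitive single-token match
    ]


tokens = "Er blieb , weil er krank war , aber arbeitete weiter".split()
print(find_connectives(tokens))
```

A plain set lookup cannot tell a connective apart from a homograph with another function, so a real pipeline would likely combine it with the PoS tags.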
Temporal expressions are identified with a set of templates. Named entities are classified as person, location or organization based on the information from Wikipedia.
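Template-based identification of temporal expressions might look roughly like the sketch below. The three regular expressions are assumptions made for illustration (the actual template set is not listed here), as are the names `TEMPLATES` and `find_temporal_expressions`.

```python
import re

MONTHS = (
    "Januar|Februar|März|April|Mai|Juni|Juli|August|"
    "September|Oktober|November|Dezember"
)

# Hypothetical templates, ordered from most to least specific.
TEMPLATES = [
    re.compile(rf"\b\d{{1,2}}\.\s(?:{MONTHS})\s\d{{4}}\b"),  # "7. Oktober 1885"
    re.compile(rf"\b(?:{MONTHS})\s\d{{4}}\b"),               # "Oktober 1885"
    re.compile(r"\b(?:später|danach|zuvor)\b"),              # temporal adverbs
]


def find_temporal_expressions(text):
    """Return matched temporal expressions in order of appearance."""
    spans = []
    for template in TEMPLATES:
        for m in template.finditer(text):
            # Skip a match that lies inside a span found by an earlier template.
            if any(s <= m.start() and m.end() <= e for s, e, _ in spans):
                continue
            spans.append((m.start(), m.end(), m.group()))
    return [matched for _, _, matched in sorted(spans)]


sentence = "Niels Bohr wurde am 7. Oktober 1885 in Kopenhagen geboren und zog später um."
print(find_temporal_expressions(sentence))
```

Ordering the templates from specific to general keeps a full date from also being reported as a shorter month-and-year match.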