






 


Download WikiNet


WikiNet


We are developing WikiNet, a multi-lingual ontology built by exploiting several aspects of Wikipedia. If you want to build your own WikiNet, the (Perl) scripts are available. The current version is available for download. It was built from the 20120104 version of the English Wikipedia, with added lexicalizations from the Dutch (20120119), French (20120117), German (20120115), Italian (20120126), Arabic (20120123), Bulgarian (20120129), Farsi (20120124), Japanese (20120121), Korean (20120122), Russian (20120130), Turkish (20120124) and Chinese (20120128) versions. It also contains lexicalizations in many more languages; check the language statistics file for details. The download contains a direct index (index.wiki) and a reversed index (reversed_index.wiki), both multi-lingual, a file with relations (data.wiki), a file with definitions (defs.wiki), and more.

The structure is as follows:
  • direct index: ConceptName ConceptID1 ConceptID2 ...
  • reversed index: ConceptID1 NEType ConceptName1 ConceptName2 ...
  • relations file: ConceptID1 Relation1 ConceptID11 ConceptID12 ... ConceptID1n Relation2 ConceptID21 ConceptID22 ...
More details are available in a README file, the relation statistics, the language statistics (number of lexicalizations and number of entries covered for each language represented), and a paper. Additional files contain the incoming and outgoing links between concepts, corresponding to the hyperlinks in the article bodies.
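
The relations-file layout listed above can be read with a few lines of code. The following Python sketch is only an illustration and is not part of the WikiNet distribution. It assumes that fields are separated by whitespace, that concept IDs are purely numeric, and that relation names are single non-numeric tokens; the README should be consulted for the actual delimiter. The function name parse_relations_line and the relation names in the usage example are hypothetical.

```python
def parse_relations_line(line):
    """Parse one relations-file line into (source_id, {relation: [target_ids]}).

    Assumed layout (from the structure description, delimiter is a guess):
        ConceptID1 Relation1 ConceptID11 ... Relation2 ConceptID21 ...
    IDs are kept as strings; numeric tokens are treated as target IDs,
    any other token starts a new relation.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")

    source_id = tokens[0]
    relations = {}
    current = None  # name of the relation whose targets we are collecting

    for token in tokens[1:]:
        if token.isdigit():
            if current is None:
                raise ValueError("target ID found before any relation name")
            relations[current].append(token)
        else:
            current = token
            relations.setdefault(current, [])

    return source_id, relations
```

For example, a made-up line such as "12 IS_A 34 56 PART_OF 78" would yield the source concept "12" with two relations, one pointing to concepts 34 and 56 and the other to concept 78.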

There are approximately 3 million concepts, and 38+ million relations.

We have a toolkit for visualizing and extracting information from WikiNet: WikiNetTK.

A precursor of the resource in simple text format (in English) is WikiRelations.


WikiNetTK


WikiNetTK is a tool that allows you to visualize WikiNet and embed it in your NLP applications. Below are a few screenshots from the visualization component.

Starting point -- choose the concept to visualize by first entering its name and then selecting the one you want from the candidates found:


Expand the relations surrounding a concept:


Visualize and browse information for a concept in text format:


Visualize the paths between concept pairs:



Dependencies and selectional preferences


Available for download: a description of concepts in terms of their grammatical relations to open-class words,
and selectional preferences for open-class words in terms of (general) concepts.


A multi-lingual dictionary extracted from Wiktionary

Download here a multi-lingual dictionary extracted from the English dump of Wiktionary (20100403). The format is tab-separated values (TSV), as follows:

ENTRY    ID    DIS    POS    VAR1    VAR2...
  • ENTRY and VARi (i = 1, 2, ...) have the same form: "LANG":"EXPRESSION", where LANG is a language code. The difference between ENTRY and VARi is that ENTRY is built from the article title in Wiktionary, while VARi is built from the cross-language links in the article.
  • ID is the numeric ID of the article.
  • DIS is a "disambiguation" expression extracted from the article -- when an expression can have multiple meanings (each corresponding to a different translation), the article groups the translations for each meaning and labels the group with this (DIS) expression.
  • POS is the part of speech of the entry.
This dictionary contains only entries that have at least one translation. The total number of entries is 74,568, obtained by processing 1,741,886 articles. In the future we will combine this with the multi-lingual expressions extracted from Wikipedia.
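
A line in this format can be split into its fields as in the Python sketch below. It is only an illustration under stated assumptions: fields are separated by single tab characters, and the double quotes around LANG and EXPRESSION may or may not be literally present in the file (the sketch strips them if they are). The function names split_lang and parse_entry, and the sample line in the usage note, are hypothetical.

```python
def split_lang(item):
    """Split a '"LANG":"EXPRESSION"' item into (lang, expression).

    Splits on the first colon, so colons inside the expression are kept.
    Surrounding double quotes are removed if present (an assumption).
    """
    lang, _, expression = item.partition(":")
    return lang.strip('"'), expression.strip('"')


def parse_entry(line):
    """Parse one TSV line: ENTRY, ID, DIS, POS, then zero or more VARi."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least ENTRY, ID, DIS and POS")

    return {
        "entry": split_lang(fields[0]),
        "id": fields[1],
        "dis": fields[2],
        "pos": fields[3],
        "variants": [split_lang(f) for f in fields[4:] if f],
    }
```

For a made-up line with the fields "en":"dog", 42, animal, noun, "de":"Hund" and "fr":"chien", parse_entry would return the entry ("en", "dog") with the variants ("de", "Hund") and ("fr", "chien").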

reversed_index.wiki (or reversed_index.all.wiki) from WikiNet can also serve as a parallel dictionary. Both files also contain entries that have names only in English.



 
page last modified: 21.02.2014,14:15



Project Manager

Prof. Dr. Michael Strube
Email:
Phone: +49 (0)6221 - 533 - 243

Fax: +49 (0)6221 - 533 - 298
