We are developing WikiNet, a multilingual ontology built by exploiting several aspects of Wikipedia. If you want to build your own WikiNet, the (Perl) scripts are available for download.
The current version is built from the 20120104 version of the English Wikipedia, with added lexicalizations from the Dutch (20120119), French (20120117), German (20120115), Italian (20120126), Arabic (20120123), Bulgarian (20120129), Farsi (20120124), Japanese (20120121), Korean (20120122), Russian (20120130), Turkish (20120124) and Chinese (20120128) versions. It also contains lexicalizations in many more languages; see the language statistics file for details.
The download contains a direct index (index.wiki) and a reversed index (reversed_index.wiki), both multilingual, a file with relations (data.wiki), definitions (defs.wiki), and more.
More details are given in a README file, together with the relation statistics, the language statistics (the number of lexicalizations and the number of entries covered for each language represented), and a paper. Additional files contain the incoming and outgoing links between concepts, corresponding to the hyperlinks in the article bodies.
The file structure is as follows:
- direct index: ConceptName ConceptID1 ConceptID2 ...
- reversed index: ConceptID1 NEType ConceptName1 ConceptName2 ...
- relations file: ConceptID1 Relation1 ConceptID11 ConceptID12 ... ConceptID1n Relation2 ConceptID21 ConceptID22 ...
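The three line formats above can be read with a few small parsers. This is a minimal sketch, not part of the WikiNet distribution, and it rests on two assumptions not stated here: fields within a line are separated by tabs (concept names can contain spaces), and concept IDs are purely numeric, so any non-numeric token in the relations file starts a new relation group. The function names and the toy IDs, NE type and relation names in the demo are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: tab-separated fields; adjust SEP if the files use another separator.
SEP = "\t"


def parse_direct_index(lines: Iterable[str]) -> Dict[str, List[str]]:
    """ConceptName ConceptID1 ConceptID2 ... -> {name: [concept IDs]}"""
    index: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split(SEP)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        index.setdefault(fields[0], []).extend(fields[1:])
    return index


def parse_reversed_index(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """ConceptID NEType Name1 Name2 ... -> {ID: (NE type, [names])}"""
    index: Dict[str, Tuple[str, List[str]]] = {}
    for line in lines:
        fields = line.rstrip("\n").split(SEP)
        if len(fields) < 2:
            continue
        index[fields[0]] = (fields[1], fields[2:])
    return index


def parse_relations(lines: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """ConceptID Rel1 ID ID ... Rel2 ID ... -> {ID: {relation: [target IDs]}}"""
    relations: Dict[str, Dict[str, List[str]]] = {}
    for line in lines:
        fields = line.rstrip("\n").split(SEP)
        if not fields[0]:
            continue
        current = None
        groups: Dict[str, List[str]] = defaultdict(list)
        for tok in fields[1:]:
            if tok.isdigit():
                # ASSUMPTION: a numeric token is a target concept ID
                if current is not None:
                    groups[current].append(tok)
            else:
                current = tok  # a non-numeric token names the next relation
        relations[fields[0]] = dict(groups)
    return relations


if __name__ == "__main__":
    # Toy lines with made-up IDs and labels, for illustration only
    print(parse_direct_index(["Paris\t101\t205"]))
    print(parse_reversed_index(["101\tLOC\tParis\tParigi"]))
    print(parse_relations(["101\tIS_A\t300\t301\tCOUNTRY\t302"]))
```

In practice each function would be given an open file handle (for example `open("data.wiki", encoding="utf-8")`), since iterating over a file yields its lines.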
WikiNet contains approximately 3 million concepts and more than 38 million relations.
We have a toolkit for visualizing and extracting information from WikiNet: WikiNetTK.
A precursor of this resource, available in simple text format (English only), is WikiRelations.
WikiNetTK is a tool that allows you to visualize WikiNet and embed it in your NLP applications. Its visualization component supports the following views:
- Starting point: enter the name of the concept to visualize, then choose the intended concept from the candidates found.
- Expand the relations surrounding a concept.
- Visualize and browse information for a concept in text format.
- Visualize the paths between concept pairs.
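Finding a path between two concepts amounts to a graph search over the relations. The sketch below uses a plain breadth-first search, which returns a path with the fewest relation steps; it is an illustration only, and WikiNetTK's own path-finding method may differ. It assumes the relations have already been loaded into a dict mapping each concept ID to `{relation: [target IDs]}` (a hypothetical in-memory layout), follows relations only in their stated direction, and stops at a hypothetical `max_depth` to keep the search bounded on a graph of millions of concepts.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Relations = Dict[str, Dict[str, List[str]]]


def shortest_path(relations: Relations, start: str, goal: str,
                  max_depth: int = 4) -> Optional[List[Tuple[str, str, str]]]:
    """Breadth-first search over outgoing relations.

    Returns the (source, relation, target) steps from start to goal,
    [] if start == goal, or None if goal is not reachable within max_depth.
    """
    if start == goal:
        return []
    parent: Dict[str, Tuple[str, str]] = {}  # node -> (previous node, relation)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue  # do not expand beyond the depth limit
        for rel, targets in relations.get(node, {}).items():
            for nxt in targets:
                if nxt in depth:
                    continue  # already visited
                depth[nxt] = depth[node] + 1
                parent[nxt] = (node, rel)
                if nxt == goal:
                    # walk back from goal to start to rebuild the path
                    path = []
                    cur = goal
                    while cur != start:
                        prev, r = parent[cur]
                        path.append((prev, r, cur))
                        cur = prev
                    return path[::-1]
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Toy graph with made-up IDs and relation names, for illustration only
    graph = {"1": {"IS_A": ["2"]}, "2": {"PART_OF": ["3"]}}
    print(shortest_path(graph, "1", "3"))
```

To find paths regardless of relation direction, one could add a reversed copy of every edge to the dict before searching.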