Babelnet



Using the BabelNet disambiguation program in Windows

Www.babelnet.sbg.ac.at

Ciarán Ó Duibhín

BabelNet is, among other things, a system for performing word sense disambiguation (WSD) on running text in several languages. It was created at the Sapienza Università di Roma by a group headed by Roberto Navigli. The system is written in Java and implemented on Unix. As of May 2014, the current version is 2.5.

The API archive download below contains a BabelNet demonstration program, which explores the BabelNet resources, but does not process running text. The compiled program is found in bin\it\uniroma1\lcl\babelnet\BabelNetDemo.class, and the source is in src\it\uniroma1\lcl\babelnet\BabelNetDemo.java. Instructions to install and run the compiled program are given for Unix in the README file in the root of the same archive. The present file describes how to adapt these instructions for Windows, to help Windows users who want to try out BabelNet without studying Unix or Java.

There is also, in Figure 3 of the paper Multilingual WSD with Just a Few Lines of Code: the BabelNet API, by Roberto Navigli and Simone Paolo Ponzetto, a Java program which uses BabelNet to perform WSD on running text. However, the 'path indexes' which must be downloaded to run this program have not been updated since version 1.0.1 of BabelNet, and it cannot be run on any more recent version. (To run v1.0.1, download the files named near the end of https://groups.google.com/forum/#!topic/babelnet-kb/1mrYql7FwrA and use the program given in https://groups.google.com/forum/#!topic/babelnet-kb/2EIKgvDVE2c . This process will not be explained here.)

BabelNet is both a multilingual encyclopedic dictionary, with lexicographic and encyclopedic coverage of terms in 284 languages, and a semantic network which connects concepts and named entities in a very large network of semantic relations, made up of more than 15 million entries.

Due to the work involved in updating the present file each time a BabelNet update is issued, I do not intend to update again until such time as I learn that the path indexes have been made available.

Pre-requisites

Before BabelNet can be used, the programming language Java and the lexical database WordNet have to be installed, and we begin with them.

Java

Java is a programming language. What you download depends on what you want to do. If you just want to run programs already written in Java by others (like the BabelNet demonstration program), you need only download the Java Runtime Environment (JRE). If you want to write or modify programs (as you will certainly want to do with the BabelNet WSD program), download the Java Development Kit (JDK), which includes the JRE. The Java downloads are .exe files, one or other of which just needs to be run in order to install Java. The default installation directories, as of November 2013, are C:\Program Files\Java\jre7 and C:\Program Files\Java\jdk1.7.0_45.
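Once Java is installed, a tiny program can confirm which installation is actually being picked up. The following is only a minimal sketch; the class name JavaCheck is made up for this illustration and is not part of BabelNet. Compiling it requires the JDK, so a successful compile is itself a sign that the JDK is available.

```java
// Reports which Java installation is running this program.
final class JavaCheck {
    /** Builds a one-line summary from standard JVM system properties. */
    static String describe() {
        return "Java " + System.getProperty("java.version")
                + " in " + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the directory printed is not the one you expected, more than one Java installation is probably present on the machine.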

WordNet

Version 3.0 (at least) of the lexical database WordNet is required for BabelNet. (BabelNet may be configured to set the WordNet version to 2.1, but this setting will be ignored!) The WordNet website is contradictory as to whether WordNet 3.0 is usable under Windows: the WordNet 3.0 README talks of a self-extracting archive containing WordNet 3.0 for Windows, but the Download page and the Current Version page both say that WordNet 3.0 is for Unix only. WordNet have declined to answer my query on the matter. As of November 2013, the real position seems to be that the WordNet 3.0 download contains data files which work perfectly well with Windows applications, but lacks a Windows implementation of the WordNet GUI browser program, which is included in source and/or binary form in all Unix releases, and in Windows releases up to 2.1. This does not matter to us, as we will be using only the data files with BabelNet. So download WordNet 3.0 for UNIX-like systems (probably any of the three downloads will work, but I used the tar-gzipped one) and unpack into C:\Program Files\WordNet-3.0. Disregard the mention of source code and binaries: the source code, if included in your download, can be ignored, and there are no binaries.

Downloading and unpacking BabelNet

Download the BabelNet Precompiled Index, Core, v2.5, CC_BY_NC_SA_30 licence (1.20 GB)
and unpack it to C:\Program Files\BabelNet, so that the following subdirectories are created directly under C:\Program Files\BabelNet:
core_CC_BY_NC_SA_30
graph_CC-BY_NC_SA_30
dict
gloss
lexicon
Download also ONE or more of the following, according to the type of licence required:
• the BabelNet Precompiled Index, v2.5, CC_BY_30 licence (39.4 MB)
dict_CC_BY_30
gloss_CC_BY_30
lexicon_CC_BY_30
• the BabelNet Precompiled Index, v2.5, CC_BY_SA_30 licence (2.94 GB)
dict_CC_BY_SA_30
gloss_CC_BY_SA_30
lexicon_CC_BY_SA_30
• the BabelNet Precompiled Index, v2.5, CC_BY_NC_SA_30 licence (1.98 MB).
dict_CC_BY_NC_SA_30
gloss_CC_BY_NC_SA_30
lexicon_CC_BY_NC_SA_30
• the BabelNet Precompiled Index, v2.5, APACHE-20 licence (1.36 MB).
dict_APACHE_20
gloss_APACHE_20
lexicon_APACHE_20
• the BabelNet Precompiled Index, v2.5, CECILL-C licence (4.43 MB).
dict_CECILL_C
gloss_CECILL_C
lexicon_CECILL_C
Alternatively, download the BabelNet Precompiled Index Bundle, v2.5 (note that this download is 5.20 GB!) and unpack it to C:\Program Files\BabelNet, which will create all 20 of the above-named subdirectories directly under C:\Program Files\BabelNet. This is certainly the easier option, in the absence of any guidance on choosing a licence.
WinRAR v5.0 can be used for all unpacking, but beware that the out-of-date version WinRAR 3.8 may report that the downloaded archive is corrupt and may not unpack all the files.

Next, download the BabelNet Java API, v2.5 (30.5 MB), and unpack to C:\Program Files\BabelNet, so that the subdirectories bin, config, docs, lib, licenses, resources and src are directly under C:\Program Files\BabelNet.
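Because the archives are large and an unpacking tool may silently skip files, it can be worth checking the resulting layout. The sketch below is one possible way to do so, not part of the BabelNet distribution; the class name LayoutCheck and the default root folder are assumptions, and the list of names covers only the core index folder and the API folders named above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Lists expected subdirectories that are absent from the BabelNet folder.
final class LayoutCheck {
    /** Returns the names in 'expected' that are not directories directly under 'root'. */
    static List<String> missing(File root, List<String> expected) {
        List<String> result = new ArrayList<String>();
        for (String name : expected) {
            if (!new File(root, name).isDirectory()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed default location; pass another folder as the first argument.
        File root = new File(args.length > 0 ? args[0] : "C:/Program Files/BabelNet");
        List<String> expected = Arrays.asList(
                "core_CC_BY_NC_SA_30",
                "bin", "config", "docs", "lib", "licenses", "resources", "src");
        List<String> absent = missing(root, expected);
        System.out.println(absent.isEmpty()
                ? "All expected folders found under " + root
                : "Missing under " + root + ": " + absent);
    }
}
```

Add the names of whichever licence-specific folders you downloaded to the list if you want them checked as well.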

Uninstallation involves only the removal of the unpacked folders and files.

Running the BabelNetDemo program

BabelNet must be informed of the locations to which BabelNet and WordNet have been unpacked. Two files in the config subdirectory must be changed.
Assuming you have followed the unpacking suggestions given above, then in config/babelnet.var.properties, put
babelnet.dir=C:/Program Files/BabelNet
and de-comment the line if necessary;
and in config/jlt.var.properties, put
jlt.wordnetPrefix=C:/Program Files/WordNet
ie. removing the -3.0.
In the latter file, do NOT change the line
jlt.wordnetVersion=3.0
as any such change will have no effect.
I don't change the Unix line-ends in these files, nor in any other BabelNet files.
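To see what values the two edited files actually yield, they can be read back with the standard java.util.Properties class. This is a sketch under the assumption that the files follow ordinary Java properties syntax; BabelNet's own loader may behave differently, and the class name ConfigCheck is invented for this example. Run it from the BabelNet folder so that the relative config paths resolve.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Reads back the two settings edited above, using standard properties parsing.
final class ConfigCheck {
    /** Parses properties-file text held in memory. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        String[][] checks = {
            {"config/babelnet.var.properties", "babelnet.dir"},
            {"config/jlt.var.properties", "jlt.wordnetPrefix"},
        };
        for (String[] check : checks) {
            byte[] raw = Files.readAllBytes(Paths.get(check[0]));
            String text = new String(raw, StandardCharsets.ISO_8859_1);
            // A null value means the line is missing or still commented out.
            System.out.println(check[1] + " = " + parse(text).getProperty(check[1]));
        }
    }
}
```

Under standard properties parsing a backslash is an escape character, so a value written as C:\Program Files would be read as C:Program Files. Writing the paths with forward slashes, as shown above, avoids this.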

Next, we come to the file run-babelnetdemo.sh, which is meant to run the demo program. As distributed, it contains the line
java -classpath bin:lib/*:config it.uniroma1.lcl.babelnet.BabelNetDemo
• Change the file extension, to .bat like this: run-babelnetdemo.bat
• Change the two colons in the classpath value to semi-colons
• Comments in Windows batch files start with rem not with # — either make the change or just remove the comments
• Add a line containing pause at the end of the file, if you want to hold the command window when finished while you examine it
• You probably want to redirect BabelNet's output to a file, so add something like > output.txt to your java line
• If you are running out of Java heap space, add an argument like -Xmx512M on your java line
You should now have a file run-babelnetdemo.bat, containing perhaps
java -Xmx512M -classpath bin;lib/*;config it.uniroma1.lcl.babelnet.BabelNetDemo > output.txt
pause
Double-clicking this batch file with the mouse will run the demo program, without leaving the Windows GUI. If you are sending the output to a file, you will not see it yet, so allow the program enough time to finish.
You can compare the output of the demo program with the source in BabelNet/src/it/uniroma1/lcl/babelnet/BabelNetDemo.java — but remember the demo is not working from the source but from a compiled version in BabelNet/bin/it/uniroma1/lcl/babelnet/BabelNetDemo.class
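The only change to the java command itself is the classpath separator, which is a colon on Unix and a semi-colon on Windows. As a small illustration (the class and method names are invented here), the conversion for a classpath of relative entries amounts to:

```java
import java.io.File;

// Converts a Unix-style classpath of relative entries to Windows style.
final class ClasspathConvert {
    /** Replaces the Unix separator ':' with the Windows separator ';'. */
    static String toWindows(String unixClasspath) {
        // Safe only while no entry contains a colon of its own (e.g. a drive letter).
        return unixClasspath.replace(':', ';');
    }

    public static void main(String[] args) {
        System.out.println(toWindows("bin:lib/*:config")); // prints bin;lib/*;config
        System.out.println("Separator on this system: " + File.pathSeparator);
    }
}
```

Java exposes the separator for the current platform as File.pathSeparator, which is why the same compiled classes run unchanged on both systems once the command line is adjusted.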

Running WSD

This is not possible in BabelNet 2.5 using presently available downloads, but we will follow the process as far as we can.

A Java program for WSD is given in Figure 3 of Multilingual WSD with Just a Few Lines of Code: the BabelNet API, by Roberto Navigli and Simone Paolo Ponzetto.

In order to compile it, you should have downloaded and installed the Java JDK (see above). In any case, you will want to amend the program and recompile it, since it has the example sentence (of English) built in. The command to compile a Java program is javac. If using this command (at the command prompt, or in a batch file) results in a message that javac is not recognized as an internal or external command, you may need to add the JDK's bin directory, which contains javac.exe, to your system Path environment variable. Go to the System control panel, Advanced system settings, Environment Variables, System variables; scroll down to Path, and edit it by appending the string ;C:\Program Files\Java\jdk1.7.0_45\bin (adjusted to your JDK version). After restarting the command prompt, you should be able to use the javac command.
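Whether the command prompt will find javac depends on whether some directory listed in Path contains javac.exe. The following sketch performs that search explicitly; the class name FindOnPath is made up for this example, and it ignores refinements such as quoted Path entries.

```java
import java.io.File;
import java.util.regex.Pattern;

// Searches the directories of a Path-style string for a named file.
final class FindOnPath {
    /** Returns the first match for 'fileName' in the listed directories, or null. */
    static File find(String pathValue, String separator, String fileName) {
        for (String dir : pathValue.split(Pattern.quote(separator))) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as those left by ";;"
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH"); // the name is case-insensitive on Windows
        File javac = find(path == null ? "" : path, File.pathSeparator, "javac.exe");
        System.out.println(javac == null
                ? "javac.exe was not found in any Path directory"
                : "javac.exe found at " + javac);
    }
}
```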

You can paste the program source from the paper, eg. into a file called wsddemo.java in C:\Program Files\BabelNet, and then make the following alterations:
• Replace the four pairs of matched left and right single quotes on lines 23 and 24 by ASCII apostrophes
• Catch an IOException in procedure disambiguate, ie. place the following outline around lines 3–18 of the source from the paper, with those lines replacing the ... below:
try
{
...
}
catch (IOException ioe)
{
System.out.println("Trouble: " + ioe.getMessage());
}
• Place the following outline around the entire program, which replaces the ... below:
import it.uniroma1.lcl.jlt.util.Language;
import it.uniroma1.lcl.jlt.util.ScoredItem;
import it.uniroma1.lcl.jlt.util.Strings;
import it.uniroma1.lcl.jlt.ling.Word;
import it.uniroma1.lcl.knowledge.*;
import it.uniroma1.lcl.knowledge.graph.*;
import java.io.IOException;
import java.util.*;
public class wsddemo
{
...
}
With these alterations, the program will compile. On the command line, or in a batch file, in C:\Program Files\BabelNet do:
javac -classpath bin;lib/*;config wsddemo.java
and a compiled file wsddemo.class will be created in C:\Program Files\BabelNet. I suggest moving wsddemo.class to C:\Program Files\BabelNet\bin in preparation for the next step.
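If javac complains about illegal characters, the usual cause is typographic quotes left over from pasting. The first alteration listed above can also be done mechanically; the sketch below shows the idea on a made-up sample line (the class name QuoteFix and the sample text are illustrative, not taken from the paper).

```java
// Replaces typographic single quotes with ASCII apostrophes.
final class QuoteFix {
    /** Converts left (U+2018) and right (U+2019) single quotes to the ASCII apostrophe. */
    static String asciiQuotes(String source) {
        return source.replace('\u2018', '\'').replace('\u2019', '\'');
    }

    public static void main(String[] args) {
        String pasted = "char pos = \u2018n\u2019;"; // as it might arrive from a PDF
        System.out.println(asciiQuotes(pasted));     // prints char pos = 'n';
    }
}
```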

To run C:\Program Files\BabelNet\bin\wsddemo.class, I suggest creating a batch file, C:\Program Files\BabelNet\run-wsd.bat, with the following content:
java -Xmx512M -classpath bin;lib/*;config wsddemo > wsdout.txt
pause
This begins to run, but fails because it cannot find something called the 'path index' when trying to load the knowledge base. The location of the path index can be specified by putting a line in config/knowledge.var.properties:
knowledge.graph.pathIndex=C:/Program Files/BabelNet/data
It appears that the path index has not been included in the downloads of BabelNet since version 1.0.1.
I will update this information on how to perform WSD with the current version of BabelNet when I learn that it is again possible to do so.

Disclaimer

This page is offered as a facility for corpus analysis on Windows. By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences. Needless to say, he hopes that there will be no such consequences. He will be pleased to receive comments, but cannot promise to act upon them.

Ciarán Ó Duibhín
2014/05/23