As far as I can tell, cld2 does not document how one would actually go about using it. Its language detection is among the better ones, so I decided to put it to use.
I came across a blog post on how to install cld2 on Ubuntu, but it stops just short of using cld2 directly from the command line. It covers how to build a Python binding.
Luckily, I also came across another blog where a Slackware script builds a command-line tool, which is exactly what I was looking for, except that I was on CentOS, not Slackware.
So, after a little digging through the various compile scripts on cld2’s SVN trunk, I got a rough idea of how to combine the approaches from these two blogs and gave it a try. It worked! Here’s what I did:
- Get g++ on your CentOS machine; it is required to build cld2.
$ /usr/bin/sudo /usr/bin/yum install gcc-c++
...
$ which g++
/usr/bin/g++
- Get the cld2 source through SVN onto your local CentOS machine. In my case I used the /tmp folder.
$ pwd
/tmp
$ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2
- Next, make a copy of one of the existing compile scripts, specifically compile_libs.sh, so that you can make a few changes to it. This step is already described in the post on how to install cld2 on Ubuntu. I am on a 32-bit system, so I used the same step to remove the -m64 flag.
$ pwd
/tmp/cld2/internal
$ cat compile_libs.sh | sed 's/\ \-m64\ //g' 1> compile_libs_32bit.sh
- To make a standalone cld2 executable, I again followed the steps from the Slackware script example and made a few changes to my copied compile script. Here’s a diff of the changes I made from compile_libs.sh to my custom compile_libs_32bit.sh script:
https://gist.github.com/visitsb/8affec514ef5829c6bd0/revisions
- That’s it! compile_libs_32bit.sh is now ready to build a standalone cld2 executable on your machine. All that is left is to run your custom compile_libs_32bit.sh script:
$ chmod u+x compile_libs_32bit.sh
$ ./compile_libs_32bit.sh
- It takes a few minutes to build, and voilà, you have a standalone cld2 executable built and installed on your machine.
$ which cld2
/usr/local/bin/cld2
$ echo "Hello World こんにちは γει? σου" | cld2
ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese*
SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)
- For the record, here is what gets installed:
$ which cld2
/usr/local/bin/cld2
$ ls -l /usr/include/cld2/*
/usr/include/cld2/internal:
total 52
-rw-r--r--. 1 root root 28159 Jun 20 17:49 generated_language.h
-rw-r--r--. 1 root root 5839 Jun 20 17:49 generated_ulscript.h
-rw-r--r--. 1 root root 945 Jun 20 17:49 integral_types.h
-rw-r--r--. 1 root root 8326 Jun 20 17:49 lang_script.h
/usr/include/cld2/public:
total 24
-rw-r--r--. 1 root root 14850 Jun 20 17:49 compact_lang_det.h
-rw-r--r--. 1 root root 7056 Jun 20 17:49 encodings.h
$
$ ls -l /usr/lib/libcld2*
-rwxr-xr-x. 1 root root 6457627 Jun 20 17:49 /usr/lib/libcld2_full.so
-rwxr-xr-x. 1 root root 1742462 Jun 20 17:49 /usr/lib/libcld2.so
$
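A small aside on the -m64 step above: the sed expression deletes the flag together with the spaces on both sides of it. That worked for me, but if -m64 ever sits between two other tokens on the same line, those tokens end up glued together. The sketch below shows the difference on a made-up compiler line (the sample line is my assumption for illustration, not copied from compile_libs.sh) and a variant that leaves a single space behind.

```shell
#!/bin/sh
# Hypothetical sample line; the real compile_libs.sh may lay out its flags differently.
sample='g++ -shared -fPIC -O2 -m64 cldutil.cc -o libcld2.so'

# Pattern from the step above: removes the flag and BOTH surrounding spaces.
glued=$(printf '%s\n' "$sample" | sed 's/\ \-m64\ //g')

# Variant that replaces the match with one space, so neighbours stay separate.
spaced=$(printf '%s\n' "$sample" | sed 's/ -m64 / /g')

printf '%s\n' "$glued"    # g++ -shared -fPIC -O2cldutil.cc -o libcld2.so
printf '%s\n' "$spaced"   # g++ -shared -fPIC -O2 cldutil.cc -o libcld2.so
```

If your build fails right after stripping the flag, it may be worth checking the generated script for joined tokens like the one in the first output line.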
Hope this helps someone, and kudos to cld2 for being awesome!
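If you want to use the cld2 command from a script, the two-line output shown earlier is a bit verbose. Here is a minimal sketch of a helper that pulls just the language out of the SummaryLanguage line. The function name summary_language is my own invention, and it assumes the output always looks like the sample from my run above; I have not checked this against any format documentation. The sample output is embedded so the snippet runs even without cld2 installed.

```shell
#!/bin/sh
# summary_language: hypothetical helper, not part of cld2.
# Reads cld2 output on stdin and prints the language named on the
# "SummaryLanguage" line, followed by "unreliable" if the line flags it so.
summary_language() {
  awk '/^SummaryLanguage / {
    lang = $2
    flag = (lang ~ /\(un-reliable\)/) ? " unreliable" : ""
    sub(/\(.*/, "", lang)   # drop the parenthesised suffix, if any
    print lang flag
  }'
}

# Sample output copied from the run shown earlier in this post.
sample_output='ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese*
SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)'

result=$(printf '%s\n' "$sample_output" | summary_language)
printf '%s\n' "$result"   # Japanese unreliable
```

With cld2 installed, the same function should slot into a pipeline such as `echo "some text" | cld2 | summary_language`, assuming the output format matches.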