cld2 does not seem to document how one would actually go about using it (or at least that is how it looks to me). Its language detection is among the better ones available, so I decided to make use of it.
I came across a blog post on how to install cld2 on Ubuntu, but it stops short of using cld2 directly from the command line; it only covers building a Python binding.
Luckily, I also came across another blog post with a Slackware script that builds a command-line tool, which is exactly what I was looking for, except that I had CentOS, not Slackware.
So after a little digging through the various compile scripts on cld2’s SVN trunk, I got a rough idea of how to combine the ideas from these two posts, and gave it a try. I succeeded! Here’s what I did:
- Get g++, which is required to build cld2 on your CentOS machine.
    $ /usr/bin/sudo /usr/bin/yum install gcc-c++
    ...
    $ which g++
    /usr/bin/g++
- Get the cld2 source through SVN onto your local CentOS machine. In my case I used the /tmp folder.
    $ pwd
    /tmp
    $ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2
- Next, make a copy of one of the existing compile scripts, specifically compile_libs.sh, so that you can make a few changes to it. This step is already described in how to install cld2 on Ubuntu. I am on 32-bit, so I follow the same step and remove the -m64 flag.
    $ pwd
    /tmp/cld2/internal
    $ cat compile_libs.sh | sed 's/\ \-m64\ //g' 1> compile_libs_32bit.sh
- To make a standalone cld2 executable, I again followed the steps from the Slackware script example and made the following changes to my copied compile script. Here’s a diff of the changes from compile_libs.sh to my custom compile_libs_32bit.sh script: https://gist.github.com/visitsb/8affec514ef5829c6bd0/revisions
- That’s it! Now compile_libs_32bit.sh is ready to build a standalone cld2 executable on your machine. It is just a matter of executing your custom compile_libs_32bit.sh script now.
    $ chmod u+x compile_libs_32bit.sh
    $ ./compile_libs_32bit.sh
- It takes a few minutes to build, and voilà, you have a standalone cld2 executable built and installed on your machine.
    $ which cld2
    /usr/local/bin/cld2
    $ echo "Hello World こんにちは γει? σου" | cld2
    ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese* SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)
- For the record, here is what gets installed:
    $ which cld2
    /usr/local/bin/cld2
    $ ls -l /usr/include/cld2/*
    /usr/include/cld2/internal:
    total 52
    -rw-r--r--. 1 root root 28159 Jun 20 17:49 generated_language.h
    -rw-r--r--. 1 root root 5839 Jun 20 17:49 generated_ulscript.h
    -rw-r--r--. 1 root root 945 Jun 20 17:49 integral_types.h
    -rw-r--r--. 1 root root 8326 Jun 20 17:49 lang_script.h
    /usr/include/cld2/public:
    total 24
    -rw-r--r--. 1 root root 14850 Jun 20 17:49 compact_lang_det.h
    -rw-r--r--. 1 root root 7056 Jun 20 17:49 encodings.h
    $ ls -l /usr/lib/libcld2*
    -rwxr-xr-x. 1 root root 6457627 Jun 20 17:49 /usr/lib/libcld2_full.so
    -rwxr-xr-x. 1 root root 1742462 Jun 20 17:49 /usr/lib/libcld2.so
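If you want to use the result in a script, the summary line from the sample run above can be cut down to just the language name. The following is a minimal sketch, not part of cld2 itself: the helper name summary_language is hypothetical, and it assumes the output always contains "SummaryLanguage" followed by the language name, as in the sample. Other cld2 builds may print something different.

```shell
#!/bin/sh
# Hypothetical helper: read cld2 output on stdin and print the word that
# follows "SummaryLanguage", stopping at the first "(" or space.
# Assumes the output format matches the sample run shown in this post.
summary_language() {
    sed -n 's/.*SummaryLanguage \([^( ]*\).*/\1/p'
}

# Sample output copied from the run in this post.
sample='ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese* SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)'

# Prints the detected summary language for the sample line.
printf '%s\n' "$sample" | summary_language
```

With the executable installed, the same helper could presumably sit at the end of a pipeline such as echo "some text" | cld2 | summary_language, assuming cld2 writes its result to standard output as it appears to in the run above.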
Hope this helps someone, and kudos to cld2 for being awesome!
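A note on the -m64 step: a sed expression that deletes the flag together with the spaces on both sides of it will glue the neighbouring arguments together wherever the flag sits between two others. Whether that matters depends on how the flag appears in your copy of compile_libs.sh, which is not reproduced here. The sketch below uses a made-up compile line (the variable line and the helper strip_m64 are both hypothetical) to show the difference between deleting the match and replacing it with a single space.

```shell
#!/bin/sh
# Made-up compile line for illustration only; the real compile_libs.sh
# may place -m64 differently.
line='g++ -c -fPIC -O2 -m64 file.cc'

# Deleting the flag along with both surrounding spaces joins its neighbours:
printf '%s\n' "$line" | sed 's/ -m64 //g'

# Hypothetical helper: replace the flag with one space so the
# arguments on either side stay separate.
strip_m64() {
    sed 's/ -m64 / /g'
}

printf '%s\n' "$line" | strip_m64
```

If your generated compile_libs_32bit.sh ends up with arguments run together, regenerating it with the second form is one thing to try.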