
  • cld2 – Google’s Compact Language Detector 2 – standalone command-line tool on CentOS

    The cld2 project does not seem to document how to use it directly (or at least I could not find anything). Its language detection is among the better ones available, so I decided to put it to use.

    I came across a blog post on how to install cld2 on Ubuntu, but it stops short of using cld2 directly from the command line. Instead, it covers how to build a Python binding.

    Luckily, I also came across another blog where a Slackware script builds a command-line tool, which is exactly what I was looking for, except that I run CentOS, not Slackware.

    So, after a little digging through the various compile scripts in cld2’s SVN trunk, I got a rough idea of how to combine the approaches from these two blogs and gave it a try. It worked! Here’s what I did:

    1. Install g++, which is required to build cld2 on your CentOS machine.
      $ /usr/bin/sudo /usr/bin/yum install gcc-c++
      ...
      $ which g++
      /usr/bin/g++
      
    2. Check out the cld2 source from SVN onto your local CentOS machine. In my case, I used the /tmp folder.
      $ pwd
      /tmp
      $ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2
    3. Next, make a copy of one of the existing compile scripts, specifically compile_libs.sh, so that you can make a few changes to it. This step is already described in the post on how to install cld2 on Ubuntu. I am on a 32-bit system, so I use the same step to remove the -m64 flag.
      $ pwd
      /tmp/cld2/internal
      $ sed 's/ -m64 / /g' compile_libs.sh > compile_libs_32bit.sh
      
    4. To build a standalone cld2 executable, I again followed the steps from the Slackware script example and made a few more changes to my copied compile script. Here’s a diff of the changes from compile_libs.sh to my custom compile_libs_32bit.sh script:
      https://gist.github.com/visitsb/8affec514ef5829c6bd0/revisions
    5. That’s it! Now compile_libs_32bit.sh is ready to build a standalone cld2 executable on your machine. All that remains is to run your custom compile_libs_32bit.sh script:
      $ chmod u+x compile_libs_32bit.sh
      $ ./compile_libs_32bit.sh
      
    6. The build takes a few minutes, and voilà, you have a standalone cld2 executable built and installed on your machine.
      $ which cld2
      /usr/local/bin/cld2
      $ echo "Hello World こんにちは γει? σου" | cld2
      ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese*
        SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)
      
    7. For the record, here is what gets installed:
      $ which cld2
      /usr/local/bin/cld2
      $ ls -l /usr/include/cld2/*
      /usr/include/cld2/internal:
      total 52
      -rw-r--r--. 1 root root 28159 Jun 20 17:49 generated_language.h
      -rw-r--r--. 1 root root  5839 Jun 20 17:49 generated_ulscript.h
      -rw-r--r--. 1 root root   945 Jun 20 17:49 integral_types.h
      -rw-r--r--. 1 root root  8326 Jun 20 17:49 lang_script.h
      
      /usr/include/cld2/public:
      total 24
      -rw-r--r--. 1 root root 14850 Jun 20 17:49 compact_lang_det.h
      -rw-r--r--. 1 root root  7056 Jun 20 17:49 encodings.h
      $ 
      $ ls -l /usr/lib/libcld2*
      -rwxr-xr-x. 1 root root 6457627 Jun 20 17:49 /usr/lib/libcld2_full.so
      -rwxr-xr-x. 1 root root 1742462 Jun 20 17:49 /usr/lib/libcld2.so
      $ 
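
    Steps 1 and 2 depend on g++ and svn being available. If you want to check for both up front, here is a minimal sketch. The missing_tools helper is my own illustration, not something that ships with cld2, and it only looks the names up on the PATH.

```shell
# Print the name of each listed tool that is not found on the PATH.
# (missing_tools is a hypothetical helper, not part of cld2.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# g++ and svn are the two tools that steps 1 and 2 rely on.
missing="$(missing_tools g++ svn)"
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All build tools found."
fi
```

    Anything it reports as missing would need to be installed with yum before continuing.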
      
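
    To see what the -m64 removal in step 3 does without touching the real script, you can run the substitution on a single line. This is only a sketch: the sample line below is a made-up stand-in, and the actual contents of compile_libs.sh may differ. This variant leaves one space behind so that neighbouring arguments cannot run together.

```shell
# Remove every " -m64 " flag, leaving a single space in its place.
strip_m64() {
  sed 's/ -m64 / /g'
}

# Hypothetical stand-in for a compile line; not copied from compile_libs.sh.
sample='g++ -shared -fPIC -O2 -m64 cldutil.cc -o libcld2.so'

printf '%s\n' "$sample" | strip_m64
# prints: g++ -shared -fPIC -O2 cldutil.cc -o libcld2.so
```

    After running the real command, grep -- '-m64' compile_libs_32bit.sh should print nothing if every occurrence was removed.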

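
    If you only need the detected language rather than the full report, the output from step 6 can be filtered. Here is a minimal sketch that assumes the output always contains a "Summary: <language>" field like the one in my run above. I have not checked whether that format is stable across cld2 builds, and the summary_language helper is my own, not part of cld2.

```shell
# Print the word that follows "Summary: " on the first matching line,
# dropping the trailing "*" seen in the sample output.
# (summary_language is a hypothetical helper, not part of cld2.)
summary_language() {
  sed -n 's/.*Summary: \([^*[:space:]]*\).*/\1/p' | head -n 1
}

# First output line copied from the run in step 6.
sample_output='ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese*'

printf '%s\n' "$sample_output" | summary_language
# prints: Japanese
```

    With cld2 installed, the same filter can be placed at the end of the pipeline, for example: echo "Hello World" | cld2 | summary_language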
    Hope this helps someone, and kudos to cld2 for being awesome!