Tag: language

  • False positive demystified

    It happens sometimes: when colleagues whose first language isn’t English talk in a common language such as English, some phrases don’t readily convey what you want to say.

    The other day, when a friend of mine used “It’s a false positive” to suggest something, another colleague looked puzzled and asked, “What’s a false positive?” Up until that point, I must admit, I didn’t have a clear understanding either, but looking at the two words (false, positive) I could muster up an intuitive explanation on the spot.

    But at the back of my mind, I wasn’t really sure I had explained the meaning properly by simply using the dictionary definitions of the two words taken separately. I mean, it isn’t hard. But it was one of those phrases that you want to understand with some level of confidence and clarity. Maybe it’s just me.

    Recently, while working through some study materials for a course about machine learning, I came across a brilliant explanation that finally made the meaning of “false positive” clear in one fell swoop. Take a look at the table below:

                             Actual = Correct    Actual = Incorrect
    Prediction = Correct     True positive       False positive
    Prediction = Incorrect   False negative      True negative

    Does it make sense? Absolutely yes – but I’ll explain if it doesn’t.

    The prediction is the machine learning result saying something is correct or incorrect. In my example, the problem the machine learning is trying to solve is predicting whether a patient has cancer, looking at the size of the tumor and the patient’s age. The actual is what you know, based on evidence or tests done beforehand, about whether the patient did indeed have cancer. Lining up correct/incorrect for what the prediction says against what actually is made “false positive” much clearer to me.

    So, if you thought something was correct but it actually wasn’t, then it’s a false positive.

    The other way round, if you thought something was incorrect but it turned out to be correct then it’s a false negative.
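
    The table can also be written down as a few lines of Python. This is only a minimal sketch of my own, not something from the course materials: the function names and the sample data are hypothetical, and True stands for “correct” (in the cancer example, “the patient has cancer”).

```python
def classify_outcome(predicted: bool, actual: bool) -> str:
    """Return the name of the table cell for one prediction/actual pair."""
    if predicted and actual:
        return "true positive"
    if predicted and not actual:
        return "false positive"   # predicted correct, actually incorrect
    if not predicted and actual:
        return "false negative"   # predicted incorrect, actually correct
    return "true negative"


def count_outcomes(predictions, actuals):
    """Count how many pairs fall into each of the four cells."""
    counts = {
        "true positive": 0,
        "false positive": 0,
        "false negative": 0,
        "true negative": 0,
    }
    for predicted, actual in zip(predictions, actuals):
        counts[classify_outcome(predicted, actual)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data, made up purely for illustration.
    predictions = [True, True, False, False, True]
    actuals = [True, False, True, False, True]
    for name, count in count_outcomes(predictions, actuals).items():
        print(f"{name}: {count}")
```

    Each pair lands in exactly one cell, which is really all the table is saying.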

    Makes much more sense now.

  • cld2 – Google’s Compact Language Detector 2 – standalone command line on CentOS

    It appears that cld2 makes no mention of how one would go about using it (or at least that is the way it looks to me). Its language detection is among the better ones, so I decided to make use of it.

    I came across a blog describing how to install cld2 on Ubuntu, but it stopped short of using it directly from the command line. It only covers how to build a Python binding.

    Luckily, I also came across another blog where a Slackware script builds a command line tool, which is exactly what I was looking for, except that I had CentOS, not Slackware.

    So with a little bit of digging around the various compile scripts on cld2’s SVN trunk, I got a rough idea of how to combine the ideas from these two blogs, and gave it a try. I succeeded! Here’s what I did:

    1. Get g++; it is required to build cld2 on your CentOS machine.
      $ /usr/bin/sudo /usr/bin/yum install gcc-c++
      ...
      $ which g++
      /usr/bin/g++
      
    2. Get the cld2 source through SVN on your local CentOS machine. In my case I used the /tmp folder.
      $ pwd
      /tmp
      $ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2
    3. Next, make a copy of one of the existing compile scripts, specifically compile_libs.sh, so you can make a few changes to it. This step is already mentioned in how to install cld2 on Ubuntu. I use 32-bit, hence I follow the same step to remove the -m64 flag.
      $ pwd
      /tmp/cld2/internal
      $ sed 's/ -m64 / /g' compile_libs.sh > compile_libs_32bit.sh
      
    4. To make a standalone cld2 executable, I again followed the steps from the Slackware script example and changed my copied compile script accordingly. Here’s a diff of the changes I made from compile_libs.sh to my custom compile_libs_32bit.sh script:
      https://gist.github.com/visitsb/8affec514ef5829c6bd0/revisions
    5. That’s it! Now compile_libs_32bit.sh is ready to build a standalone cld2 executable on your machine. It is just a matter of executing your custom compile_libs_32bit.sh script:
      $ chmod u+x compile_libs_32bit.sh
      $ ./compile_libs_32bit.sh
      
    6. It takes a few minutes to build, and voila, you have a standalone cld2 executable built and installed on your machine.
      $ which cld2
      /usr/local/bin/cld2
      $ echo "Hello World こんにちは γει? σου" | cld2
      ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), 45/43 bytes of non-tag letters, Summary: Japanese*
        SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)
      
    7. For the record, here is what gets installed:
      $ which cld2
      /usr/local/bin/cld2
      $ ls -l /usr/include/cld2/*
      /usr/include/cld2/internal:
      total 52
      -rw-r--r--. 1 root root 28159 Jun 20 17:49 generated_language.h
      -rw-r--r--. 1 root root  5839 Jun 20 17:49 generated_ulscript.h
      -rw-r--r--. 1 root root   945 Jun 20 17:49 integral_types.h
      -rw-r--r--. 1 root root  8326 Jun 20 17:49 lang_script.h
      
      /usr/include/cld2/public:
      total 24
      -rw-r--r--. 1 root root 14850 Jun 20 17:49 compact_lang_det.h
      -rw-r--r--. 1 root root  7056 Jun 20 17:49 encodings.h
      $ 
      $ ls -l /usr/lib/libcld2*
      -rwxr-xr-x. 1 root root 6457627 Jun 20 17:49 /usr/lib/libcld2_full.so
      -rwxr-xr-x. 1 root root 1742462 Jun 20 17:49 /usr/lib/libcld2.so
      $ 
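
    If you want to use the result from a script, the ExtLanguage line shown in step 6 can be picked apart with a regular expression. This is a rough sketch in Python that assumes the output always looks like the sample above; the names parse_guesses and GUESS_PATTERN are my own, and I am only guessing that the number before “p” is some kind of score.

```python
import re

# Matches entries such as "Japanese(35% 3904p)" in the ExtLanguage line.
# Assumption: every guess is printed as NAME(PERCENT% NUMBERp).
GUESS_PATTERN = re.compile(r"(\w+)\((\d+)% (\d+)p\)")

# Sample output copied from step 6.
SAMPLE_OUTPUT = (
    "ExtLanguage Japanese(35% 3904p), GREEK(33% 1024p), ENGLISH(27% 1194p), "
    "45/43 bytes of non-tag letters, Summary: Japanese*\n"
    "  SummaryLanguage Japanese(un-reliable) at 8391021 of 43 562us (0 MB/sec), (null)\n"
)


def parse_guesses(output: str):
    """Return a list of (language, percent, score) tuples found in the output."""
    return [
        (name, int(percent), int(score))
        for name, percent, score in GUESS_PATTERN.findall(output)
    ]


if __name__ == "__main__":
    for name, percent, score in parse_guesses(SAMPLE_OUTPUT):
        print(f"{name}: {percent}% (score {score})")
```

    The SummaryLanguage line is skipped on purpose, since “Japanese(un-reliable)” does not fit the percent pattern.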
      

    Hope this helps someone, and kudos to cld2 for being awesome!