/dev/dumb

Read this first

Authorship Attribution of WhatsApp Users through Lexical Analysis

Imagine that you’re being messaged by a stranger. Your gut says that it’s someone you already know. How do you find out which one of your diabolical friends is pulling this off?

Problem

Let’s generalize the problem. Given the chat histories of two users and a large enough corpus from an anonymous user, we want to find which of the two has the writing style most similar to that of the anonymous user.
It’s a good ol’ binary classification problem.

To recognize the person we can analyze different features of their writing styles.

Possible approaches

Lexical features
Syntactic features
Bag of words model

Lexical analysis is what we’ll be using here. We can analyze properties such as sentence length variation, number of words per sentence, and use of punctuation and emoticons.
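As a rough illustration of what such features could look like in code, here is a minimal sketch. The function name `lexical_features`, the chosen feature set and the emoticon regex are all assumptions made for this example and are not taken from the post.

```python
import re
import statistics

# Hypothetical helper: the feature set and regexes are illustrative choices.
def lexical_features(text):
    """Return a few simple lexical statistics for a chunk of chat text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = [len(s.split()) for s in sentences]
    n_chars = max(len(text), 1)  # avoid dividing by zero on empty input
    return {
        "mean_words_per_sentence": statistics.mean(words_per_sentence) if words_per_sentence else 0.0,
        "stdev_words_per_sentence": statistics.pstdev(words_per_sentence) if words_per_sentence else 0.0,
        "punctuation_ratio": sum(text.count(c) for c in ",.;:!?") / n_chars,
        # Very small emoticon pattern: matches things like :) ;-) :D :P
        "emoticon_count": len(re.findall(r"[:;]-?[)(DP]", text)),
    }
```

Computing the same dictionary for each known user and for the anonymous corpus gives feature vectors that a classifier could then compare.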

Syntactically we can analyze the use of nouns, pronouns, adverbs, singular/plural words and so on.

Bag of words model is...

Continue reading →


The Michael Jordan in me

Ever heard of Michael B. Jordan? (Hint: He is not a basketball player)

He is the actor behind “that guy on fire” in Fantastic Four. Guess what? He looks 22% similar to me and here’s how I found it.

So I was reading about PCA and wanted to find which Hollywood actor looks closest to me. Well, who wouldn’t want to?

Eigenfaces are basically an approach that captures the variations (AKA principal components) in a set of images and uses this information to compare a new image against them. The way Eigenfaces work is out of the scope of this post.

There’s a distinction between face detection and recognition which many places on the interweb seem to get wrong. Face detection is finding a face in an image. This is usually done with a Haar classifier. Training a Haar classifier is detailed in one of my previous posts. Face recognition on the other hand is finding the similarity of a given image with another...

Continue reading →


Degree of separation of Sri Lankan Twitter community

I recently moved to Arch Linux :D While backing up the old project files I came across some of the experiments[1] [2] I’ve done on SNA. Thought I’d document at least some of the stuff before I forget everything :D

Inspired by the Erdős number and the Bacon number, I wanted to calculate the degree of separation of the Sri Lankan Twitter community.

The toughest and most tiresome part was the data collection phase. What I wanted was a graph of Sri Lankans with users as nodes and friendships as edges. Using Twecoll on the Bestatlk Twitter account, I scraped its followers and the followers of those followers. The whole process took an entire week due to Twitter’s API limitations. At the end, a GML file was generated.

I then ran the GML file through a Python script to extract the Twitter ids and generate a dictionary whose keys are user ids and whose values are lists of the Twitter ids of that user’s friends.
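The dictionary described above can be sketched as follows. The excerpt is cut off before the next step, so the breadth-first search here is only an assumption about how a degree of separation could be computed from such a dictionary. The function names are hypothetical, and edges are treated as undirected for simplicity even though Twitter follows are directed.

```python
from collections import deque

def build_friend_map(edges):
    """Map each user id to the list of ids it shares an edge with."""
    friends = {}
    for a, b in edges:
        # Assumption: treat every edge as undirected.
        friends.setdefault(a, []).append(b)
        friends.setdefault(b, []).append(a)
    return friends

def degrees_of_separation(friends, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        for friend in friends.get(user, []):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None
```

Breadth-first search visits users in order of distance from the source, so the first time the target shows up, the hop count is the shortest one.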

Next step was...

Continue reading →


How to Train Your Haar Classifier

I’ve been working with image processing in the last few months for a research project in SCoRe lab at UCSC. A part of the work required me to train a haar classifier.

Haar cascades are used to detect different objects in an image by using so-called Haar-like features. The detection algorithm is based on the work of Viola and Jones. Although it’s not the sharpest tool in the shed (compared to HOG+SVM), it was the first to work in real time. Haar classifiers are widely used for face detection. Most image processing libraries like OpenCV and SimpleCV support Haar classifiers, and there are lots of trained classifiers on the internet. You just have to download the XML file and use it in your code.

But sometimes it really boils down to training your own classifier, maybe because the ones on the web are not good enough or because there are no classifiers available for your case at all.

Training a...

Continue reading →


WhatsApp tailored notifications

I always mute WhatsApp group chat notifications (don’t we all?). But
I still wanted to get a notification when someone mentions my name.

Did a little bit of googling and found Yowsup, a Python API for WhatsApp.
You can install it with
pip install yowsup2

Then register it with a new number. Keep in mind that you can’t use your existing WhatsApp number. I used Ammamma’s (aka grandma’s) number (pretty sure she won’t come to WhatsApp).

Once that’s done, I modified the sample app given in Yowsup’s GitHub repo to track some keywords.
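The keyword check itself does not need anything from Yowsup. A minimal sketch, assuming a hypothetical `find_mentions` helper and a made-up keyword list, could look like the block below. It is plain Python that a message callback might call with the message text, and it is not the code from the modified sample app.

```python
import re

# Hypothetical keyword list; replace with the names or words you care about.
KEYWORDS = ["alice", "al"]

def find_mentions(message, keywords=KEYWORDS):
    """Return the keywords that occur in the message as whole words, ignoring case."""
    hits = []
    for kw in keywords:
        # \b keeps "al" from matching inside "also" or "Alice".
        if re.search(r"\b" + re.escape(kw) + r"\b", message, flags=re.IGNORECASE):
            hits.append(kw)
    return hits
```

Matching on word boundaries rather than plain substrings keeps short nicknames from triggering a notification on every unrelated word that happens to contain them.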

I also used Pushbullet, as mentioned in one of my previous posts, to issue push notifications.

from yowsup.layers.interface                           import YowInterfaceLayer, ProtocolEntityCallback
from yowsup.layers.protocol_messages.protocolentities  import TextMessageProtocolEntity
from yowsup.layers.protocol_receipts.protocolentities  import
...

Continue reading →


Harry Potter, P versus NP and time travel

Sorry for the clickbait of a title.

Disclaimer: Might contain HPMOR spoilers.

In the last KopiJS session, I came to know about Harry Potter and the Methods of Rationality. It’s a fanfic based on the HP universe. I’m not a fanfic kind of guy but after seeing Thameera, Chanux and Gaveen fan-boying, I decided to put aside The Sound and the Fury for a while and start reading HPMOR.

It’s a page turner. Insanely fun to read and far better than the original series (I’m already seeing die-hard HP fans waving pitchforks).

Harry is a rationalist. His father (uncle, rather) is a professor of biochemistry. Harry is well versed in scientific literature. He has read Gödel, Escher, Bach; Judgment Under Uncertainty: Heuristics and Biases; and volume one of The Feynman Lectures on Physics. He’s no ordinary 11-year-old.

In chapter 17 of the book (the one I just finished reading) he comes up with an...

Continue reading →


Automatically syncing Kindle clippings with Evernote

While looking for a code snippet I stumbled upon a certain hack I made a few years ago. This was back when I was first introduced to programming and Linux. I still remember reading about all the tech for hours to get it working. Thought it was worth documenting even though I’ve since found a better solution.

One of the extensively used features on my Kindle is the text highlights. I wanted a way to automatically sync these clippings to Evernote so I could check them on the go through Evernote’s Android app.

Solution

  1. Detect the Kindle when plugged
  2. rsync clippings.txt to Dropbox’s public folder via a shell script
  3. Make an IFTTT recipe to sync My clippings.txt with Evernote
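Step 2 can be approximated in Python for illustration. This is a sketch that swaps rsync for a plain file copy and is not the original shell script. The function name `sync_clippings` is hypothetical, and both paths are parameters because the real mount point and Dropbox folder are not given here.

```python
import os
import shutil

def sync_clippings(source_path, dest_dir):
    """Copy the clippings file into dest_dir if it is missing there or older.

    Returns True if a copy was made, False otherwise.
    """
    if not os.path.isfile(source_path):
        return False  # Kindle not mounted, or the file is not where we expected
    dest_path = os.path.join(dest_dir, os.path.basename(source_path))
    if os.path.exists(dest_path) and os.path.getmtime(dest_path) >= os.path.getmtime(source_path):
        return False  # destination is already up to date
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(source_path, dest_path)  # copy2 preserves the modification time
    return True
```

Because `copy2` keeps the modification time, running it again without new highlights does nothing, which is roughly the behaviour rsync gives for an unchanged file.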

The udev system is a way to handle peripherals such as USB devices. It’s your friend when you want to change what happens when a USB device is plugged in.
First thing we have to do is to find the device node of the...

Continue reading →


Getting real time alerts from server to phone with Pushbullet

So I wanted to receive alerts from the long-running scripts on my Raspberry Pi. I used to have Postfix send an email, but sometimes it took me hours to notice the mail.

I already had Pushbullet installed on Android, which is a service that supports notification/file/link sharing between your devices.
Turns out with a little scripting we can make it send error reports/alerts to the phone. The only thing we need is an Access Token.

Then use a simple shell script in PATH with a POST request and call it to send alerts when required.

#!/bin/sh

curl -u ACCESS_TOKEN_HERE: -X POST https://api.pushbullet.com/v2/pushes --header 'Content-Type: application/json' --data-binary '{"type":"note", "title": "Shell", "body": "'"$1"'"}' > /dev/null 2>&1

The following Python library does the same:

import os, inspect

class Pypush:
    """A simple wrapper around Pushbullet"""

    def __init__(self, key):
...
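Since the listing above is cut off, here is a separate minimal sketch of the same POST request using only the standard library. It is not the Pypush class. The names `build_push_request` and `send_push` are hypothetical, while the URL, the JSON fields and the token-as-username Basic auth mirror the curl command shown earlier.

```python
import base64
import json
import urllib.request

PUSHES_URL = "https://api.pushbullet.com/v2/pushes"

def build_push_request(access_token, body, title="Shell"):
    """Build (but do not send) the POST request for a Pushbullet note."""
    payload = json.dumps({"type": "note", "title": title, "body": body}).encode("utf-8")
    # Same as curl's `-u TOKEN:`: HTTP Basic auth, token as user, empty password.
    credentials = base64.b64encode((access_token + ":").encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        PUSHES_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + credentials,
        },
    )

def send_push(access_token, body, title="Shell"):
    """Send the note; raises urllib.error.URLError on network or HTTP errors."""
    request = build_push_request(access_token, body, title)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status
```

Building the payload with `json.dumps` escapes quotes and newlines in the message, which the shell one-liner does not do when it splices `$1` straight into the JSON string.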

Continue reading →


Moving from Openbox to i3wm on Crunchbang

I have been using #! for almost 9 months with pretty much the vanilla setup. A couple of months ago, on one fateful rainy Sunday, I decided to try out a tiling window manager. Chose to go with i3wm since most cool kids appeared to be big fans.

Having used i3 for a few months, I don’t think I will go back to a floating WM ever again. I absolutely love its flexibility and the ability to control pretty much everything with key combos. It was a bit difficult to get everything working, but after the initial yak-shaving it has been butter smooth. i3’s default key combos are intuitive and I almost don’t use the mouse anymore.

I’m documenting what I had to go through because installing it and getting everything to run smoothly was a tedious process, and I couldn’t find everything I had to do in a single place.

Installation was straightforward.

sudo apt-get install i3-wm

If you want the...

Continue reading →