At the start of this year I wrote about an interesting project by researchers at UC Berkeley who were trying to build machines capable of automatically detecting sarcasm. Whilst they admitted that their work needed refining, they were nonetheless confident that it shed some light on the attributes of sarcasm.
A second, recently published paper builds on their work. Its authors, from the Indian Institute of Technology Bombay, developed an approach that they believe can spot sarcasm with a very high success rate.
Sarcasm spotters
The approach makes no attempt to look for sentiment in what is said, focusing instead on how similar the words are to one another. The algorithm was trained on stories mined from Google News, with some 3 million words analyzed to gauge how often they appear alongside each other.
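The article doesn't describe the exact counting procedure, so the following is only a minimal sketch, assuming a simple sliding window: it counts how often pairs of words appear within two positions of each other. The four-sentence corpus, the function name `cooccurrence_counts` and the `window` size are all hypothetical, chosen purely for illustration; the researchers worked with millions of words.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each unordered pair of words appears within `window` positions of each other."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Look only ahead, so each nearby pair is counted once per occurrence.
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    # Sort the pair so ("warm", "sunny") and ("sunny", "warm") share one key.
                    counts[tuple(sorted((word, other)))] += 1
    return counts


# Hypothetical toy corpus standing in for the Google News stories.
corpus = [
    "the sunny warm beach",
    "a warm sunny afternoon",
    "cold rain fell again",
    "the cold wet rain",
]

counts = cooccurrence_counts(corpus)
print(counts[("sunny", "warm")])  # 2: neighbours in two sentences
print(counts[("rain", "sunny")])  # 0: never seen together
```

Pairs that keep turning up together ("sunny" and "warm", "cold" and "rain") build up high counts, and it is statistics like these that word vectors are derived from.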
This allowed the researchers to represent each word as a vector, with similar words represented by similar vectors. Comparing those vectors mathematically, for example by measuring the angle between them, then reveals how closely related any two words are.
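The article doesn't name the similarity measure used, but a common choice for word vectors is cosine similarity: the cosine of the angle between two vectors, close to 1 when they point the same way and close to 0 when they are unrelated. The sketch below assumes that measure. The three-dimensional vectors are invented by hand for illustration; real word vectors have hundreds of dimensions and are learned from a corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 means similar, near 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical hand-made word vectors, for illustration only.
vectors = {
    "love": [0.9, 0.8, 0.1],
    "adore": [0.85, 0.9, 0.05],
    "dentist": [0.1, 0.2, 0.95],
}

print(cosine_similarity(vectors["love"], vectors["adore"]))    # close to 1
print(cosine_similarity(vectors["love"], vectors["dentist"]))  # much lower
```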
The theory is that words whose vectors lie close together probably belong together, whereas words whose vectors lie far apart raise the chance that sarcasm is being used, especially when they sit alongside words that are otherwise closely grouped.
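One way to turn that theory into a rule is sketched below. This is not the researchers' published method: the scoring (the gap between the most similar and least similar word pair in a sentence), the 0.5 threshold, the function names and the hand-made vectors are all hypothetical, chosen only to mirror the idea of a tightly grouped cluster of words containing an outlier.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def incongruity(words, vectors):
    """Return (most similar pair score, least similar pair score) among words that have vectors."""
    known = [w for w in words if w in vectors]
    if len(known) < 2:
        return 0.0, 0.0  # not enough known words to compare
    scores = [cosine_similarity(vectors[a], vectors[b]) for a, b in combinations(known, 2)]
    return max(scores), min(scores)


def looks_sarcastic(words, vectors, gap_threshold=0.5):
    """Hypothetical rule: flag a sentence when a close pair of words coexists with a distant one."""
    most, least = incongruity(words, vectors)
    return most - least > gap_threshold


# Hypothetical hand-made vectors; words without a vector (like "what", "a") are skipped.
vectors = {
    "lovely": [0.9, 0.8, 0.1],
    "delightful": [0.85, 0.9, 0.05],
    "picnic": [0.7, 0.6, 0.3],
    "funeral": [0.1, 0.2, 0.95],
}

print(looks_sarcastic("what a lovely delightful picnic".split(), vectors))   # False
print(looks_sarcastic("what a lovely delightful funeral".split(), vectors))  # True
```

In the first sentence every pair of words is close, so nothing stands out; in the second, "funeral" sits far from the tightly grouped "lovely" and "delightful", which is exactly the pattern the theory associates with sarcasm.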
Testing the theory
The algorithm was put through its paces on a database of famous quotes pulled from the web. Some quotes were chosen specifically because they were sarcastic, while a control group of non-sarcastic quotes was included to test the algorithm against.
The researchers report that the algorithm was decent at spotting the 759 sarcastic quotes in the pool of 3,629 they analyzed, but there is clearly still work to be done before it can be put to practical use in the marketplace.
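Because only about one quote in five in that pool was sarcastic, raw accuracy on its own says little about how good a detector is: a trivial classifier that never flags sarcasm would already be right most of the time. The arithmetic below uses only the two figures reported above; the variable names are just for illustration.

```python
sarcastic = 759
total = 3629
non_sarcastic = total - sarcastic

# Share of the pool that is sarcastic, and the accuracy of a trivial
# classifier that labels every quote "not sarcastic".
sarcastic_share = sarcastic / total
majority_baseline = non_sarcastic / total

print(non_sarcastic)              # 2870
print(f"{sarcastic_share:.1%}")   # 20.9%
print(f"{majority_baseline:.1%}") # 79.1%
```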
What the study does show, however, is both the progress being made and the areas that this team, and others, still need to work on as they hone and refine their models.