Posted by Sorin Mustaca on January 18, 2015.
I have several WordPress blogs that I use for various types of posts. In one of them, I have written some posts in which I asked my readers whether something similar had happened to them. Each is a short article describing a situation, and at the end of the post there is a simple form containing a name (optional), an email address (optional), the answer (mandatory) and a comment (optional). The result is sent to me via email.
The form looks like this (click on the link to see the original):
I have about 9 such forms, each in a different post.
I received a few answers, most of them without comments. What drew my attention were two similar comments for the same form, posted 20 minutes apart. I immediately identified them as spam because of the nonsense in the Comment area.
Here is the text:
Please wait revert diflucan 0.5 gel innocence “This is an issue that we have raised in earnest with the United States in the past few days as we have all previous occasions of such arrests in which the Afghan laws were disregarded,” Karzai said, referring to the capture of commander Latif Mehsud.
The text is copied from a reputable news website, from an article published in 2013 (I actually googled it and found the original). I guess that the purpose of this post is these two words: “propranolol 40 mg”. The other post contains “diflucan 0.5 gel”. Propranolol is a beta blocker used to treat heart conditions and high blood pressure. Diflucan (fluconazole) is an antifungal drug. So, nothing suspicious: I was, honestly, expecting potency-related drugs like Cialis, Viagra and the like.
As a side note, what is special about the second drug is that Google returned a lot of Canadian online pharmacies for it, rather than the reputable online drug stores it returned for the first one.
The IP address in both posts is the same: 18.104.22.168. According to the whois information, it belongs to France Paris Online, the owner of www.free.fr. Basically, they are an ISP that offers residential broadband connections and free email accounts.
Why this spam?
So, after all this useless text, I can’t stop asking myself: what was the purpose of this post?
What do the spammers have to gain from this useless spam? As can be seen in the text above, this is not the classical referer spam, as there is no URL, no name, nothing that might link back or produce something of value for the spammer.
In comparison, a usual referer spam (the grammatically correct spelling is “referrer”, with a double “r”, so I will use that one from here on), also known as comment spam (not in a form, just a reply), looks like this:
and it is usually posted from an IP address located in China (according to the geolocation of the IP address).
Like you, I stopped thinking a long time ago that there is an actual spammer, as in a human being, clicking there. So, a crawler created the post automatically and it was … well … buggy (it malfunctioned). It should have posted a string as the comment, and the user name should have carried an advertising URL (or referrer URL), just as in the case of the normal referrer spam in the screenshot above. But the names of my “spammers” are Clinton and Mason, not the name of some website or similar.
Nevertheless, even the “proper” comment spam is pretty useless. I use Akismet with WordPress, and I don’t remember ever having had a referrer spam. It is true that I don’t have as many visitors and comments as a big website with thousands or millions of clicks a day, but still, no spam in about 10 years is no spam.
My experience with WordPress attacks is that most of them come from China and either try to log in through a brute-force attack or try to exploit known vulnerabilities in WordPress. So, less posting, more hacking.
Why do we still have referrer spam?
Referrer spam exists because it makes use of a weakness (in my personal opinion) of the web search engines, Google’s in particular. A URL is ranked higher in search results if multiple websites link to it. The more websites refer to it, the higher the URL climbs in the list. A higher ranking often results in the spammer’s commercial site being listed ahead of other (possibly legitimate) sites for certain searches, which increases the number of potential visitors and paying customers.
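To make the incentive concrete, here is a minimal sketch of a link-based ranking in the spirit of PageRank. It is a simplified textbook version, not Google's actual algorithm, and the site names, the damping value and the function name link_rank are my own assumptions for illustration. It only shows that a page referred to by five blogs ends up ranked above a page referred to by one.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score.

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    count = len(pages)
    rank = {page: 1.0 / count for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {page: (1.0 - damping) / count for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / count
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web: one blog links to an honest shop,
# five spammed blogs link to the spammer's shop.
graph = {"blog_honest": ["honest_shop"]}
for i in range(5):
    graph[f"blog_spammed_{i}"] = ["spam_shop"]

ranks = link_rank(graph)
print(ranks["spam_shop"] > ranks["honest_shop"])
```

Each extra referring page adds to the target's score, which is exactly what the spammer is after when a bot drops the same URL on as many blogs as it can find.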
So, “spreading the word” about a particular URL with the intention of tricking the search engines is the only reason referrer spam exists. Of course, the search engines have also adapted and are constantly “cleaning up” the referral lists for certain URLs based on their reputation. Unfortunately, this has pushed the spammers to produce even more spam in an even shorter time than before. It has also created a new industry: companies sell link advertisements on pages with a high PageRank. Google has publicly announced several times that if it determines that a website is trying to artificially raise the PageRank of its pages, it will devalue the website (ignore it in the PageRank calculation).
Initially, the antispam technologies used a simple (but efficient) Bayesian filter to determine whether the content of a post falls into the good or the bad category. This is actually why, nowadays, we have spam like the ones I got, which copy text from random sources. Random, neutral text makes it impossible for a Bayesian filter to determine, based only on the words used, whether the content is good or bad. Taking the example above, the only word that would trigger a negative reaction from the filter is the name of the drug. Everything else is neutral. This makes the classical approach useless. Fortunately, the combination of IP address and URL blacklists (or graylists), human input, various versions of CAPTCHA, Bayesian filters and associative filters (which try to match the text of the original post with that of the comment) has made referral spam relatively easy to detect.
Referrer spam is definitely no longer what it used to be some years ago. Because the antispam software is good and is keeping pace with the spam, my assumption is that referrer spam doesn’t earn enough to pay for the effort of having humans produce it (as was the case at the beginning). Moreover, I think that even when only bots are used to crawl the Internet 24/7 in search of blogs that accept non-moderated posts, the real value is still not that high, and the risk of being disqualified from PageRank means it does not pay off in the end. The only reason we keep seeing referrer spam is that there are still millions of websites with a low PageRank, most of the time malicious or at least with a questionable reputation, trying to make money.
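The effect of the copied text on a word-based filter can be sketched with a tiny naive Bayes classifier. This is only an illustration under my own assumptions: the training sentences, the class name NaiveBayesFilter and the word “pills” standing in for the drug name are all made up, and real products such as Akismet certainly work differently. Words the filter has never seen are skipped, which is how I model “neutral” here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase words only; punctuation and digits are ignored.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Toy word-based naive Bayes filter with add-one smoothing."""

    def __init__(self, spam_docs, ham_docs):
        self.spam = Counter(w for doc in spam_docs for w in tokenize(doc))
        self.ham = Counter(w for doc in ham_docs for w in tokenize(doc))
        self.vocab = set(self.spam) | set(self.ham)
        self.prior = math.log(len(spam_docs) / len(ham_docs))

    def spam_probability(self, text):
        spam_total = sum(self.spam.values()) + len(self.vocab)
        ham_total = sum(self.ham.values()) + len(self.vocab)
        log_odds = self.prior
        for word in tokenize(text):
            if word not in self.vocab:
                continue  # never seen in training: treated as neutral
            p_spam = (self.spam[word] + 1) / spam_total
            p_ham = (self.ham[word] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical training data, invented for this example.
SPAM_EXAMPLES = [
    "cheap pills online pharmacy",
    "buy cheap pills online now",
    "discount pharmacy cheap pills",
]
HAM_EXAMPLES = [
    "this happened to me in the past as well",
    "we have raised the same issue with our provider",
    "i have seen this issue on all my blogs",
]

spam_filter = NaiveBayesFilter(SPAM_EXAMPLES, HAM_EXAMPLES)
bare = spam_filter.spam_probability("pills")
padded = spam_filter.spam_probability(
    "pills this is an issue that we have raised in the past"
)
print(bare > 0.5, padded < 0.5)
```

In this toy setup the single suspicious word is flagged on its own, but once it is wrapped in a sentence made of ordinary words, the ordinary words outweigh it and the comment looks legitimate. That is the reason a filter based only on word statistics is no longer enough.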
Sorin Mustaca, CSSLP, Security+, Project+