When looking for links that matter, that is, links that get traction and pass-through value, it is important to know where you are posting and how that site:
- Allows search engines to index pages
- Allows search engines to follow links
- Passes PR value, and if so, how much
- Relates to your topic
Craigslist's robots.txt file
A few years ago I created a site called BlogX3.com and invited 25 real estate bloggers to participate. It took only 3 months to reach PR3, which, to me, was absolutely amazing. In fact, we saw search results in the hundreds within the first 90 days. Call it luck if you want, but in reality it happened because so many PR4 and PR5 sites were linking to “The24: America’s 24 Chosen Real Estate Bloggers” and those bloggers were linking outbound with good content. There was no Facebook, Twitter or LinkedIn back then. The moral is that there were high-PR links pointing to The24. (Unfortunately, it went down for two reasons: repeated hacking and denial-of-service attacks, and my being too busy to fight them.)
When planning your strategy, treat building PageRank differently from going after links on clean (non-black-hat) sites, and differently again from going after community-relevant content. Clicks from organic search depend on search engine placement, which comes from multiple factors including PR and search relevance. Click-throughs from referring pages depend on neither search relevance nor PageRank, but they do depend on traffic and topic relevance. So, let’s look at two online ad sites that both have a high volume of traffic, and at how they interact with search engine robots.
First is Craigslist
If you have not heard of Craigslist, welcome to Earth. Craigslist is an online ad site created in 1995 just for San Francisco. It has since gone global and now boasts a whopping 64,000,000 monthly unique visitors, 80,000,000 indexed pages, 113,000,000 backlinks and a PR of 7. Those are powerful numbers, and everyone in SEO and online marketing would like to take advantage of them. The pass-through value alone could be enormous, and beyond that, capturing even a fraction of a percent of that monthly traffic is huge. Hence the battle.
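To put “a fraction of a percent” in concrete terms, here is a quick back-of-the-envelope calculation using the 64,000,000 monthly uniques quoted above. The shares tried (0.01%, 0.1%, 0.5%) are illustrative assumptions, not measured capture rates.

```python
# Monthly unique visitors quoted in the text above.
MONTHLY_UNIQUES = 64_000_000


def visitors_for_share(percent):
    """Return how many visitors a given percentage of monthly uniques represents."""
    return round(MONTHLY_UNIQUES * percent / 100)


if __name__ == "__main__":
    # Illustrative shares only; actual click-through rates will vary.
    for pct in (0.01, 0.1, 0.5):
        print(f"{pct}% of monthly uniques = {visitors_for_share(pct):,}")
```

Even the smallest share in that list works out to thousands of visitors a month.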
Craigslist does allow robots but greatly limits what they can crawl. You can see from the image that its robots.txt disallows a few directories. The impact only becomes clear once you know what those directories contain. Let’s look at those triple-letter directories to see exactly what they hold:
- ccc = all community ads
- hhh = all housing ads
- sss = all for sale ads
- bbb = all service ads
- ggg = all gigs
- jjj = all jobs
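If you want to check for yourself whether a path is open to crawlers, Python’s standard-library `urllib.robotparser` can evaluate robots.txt rules. The rules below are a hypothetical file modeled on the directories listed above (plus “res”), not a copy of Craigslist’s actual robots.txt, and the user agent and sample paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modeled on the directories discussed above.
# This is NOT Craigslist's actual file; fetch the live one to check real rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ccc/
Disallow: /hhh/
Disallow: /sss/
Disallow: /bbb/
Disallow: /ggg/
Disallow: /jjj/
Disallow: /res/
"""


def build_parser(robots_txt):
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser, path, agent="Googlebot"):
    """True if the given user agent is allowed to fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in ("/sss/", "/hhh/apa/123.html", "/res/", "/about/"):
        print(path, "crawlable" if crawlable(parser, path) else "blocked")
```

For a live site you would call `set_url()` and `read()` on the parser instead of `parse()`, so the check reflects the rules actually being served.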
So what else is there, you’re wondering? Why, personals, of course. Even “res” stands for resumes, so those aren’t crawled either. Now you’re wondering, “Why even bother with Craigslist?”
Let me introduce you to our friend RSS. Because Craigslist is nice enough to publish RSS feeds by search category, many websites use that data to pad out their own content, republishing CL listings on their sites. A site republishing the feed may not block indexing, so search engines can pick up your keywords there and you may get links from those pages. The downside is that CL adds the rel="nofollow" attribute to links even in its RSS feed. However, if the content is indexed by Google et al., you’re more likely to get click-throughs from those visitors. And it does happen.
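One way to see whether a republished listing passes link value is to check each link for rel="nofollow". Below is a minimal sketch using Python’s standard-library `html.parser`; it assumes you already have the item’s HTML as an unescaped string (RSS descriptions are often entity-escaped, so a feed reader would need to unescape them first), and the sample listing and URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attr_map.get("rel") or "").lower().split()
        if href:
            self.links.append((href, "nofollow" in rel_values))


def audit_links(html_fragment):
    """Return a list of (href, is_nofollow) tuples found in the fragment."""
    parser = LinkAudit()
    parser.feed(html_fragment)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical republished listing.
    sample = (
        '<p>Nice 3BR condo. '
        '<a href="http://example.com/listing" rel="nofollow">details</a> '
        '<a href="http://example.com/about">about us</a></p>'
    )
    for href, nofollow in audit_links(sample):
        print(href, "nofollow" if nofollow else "followed")
```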
Still, on CL the best way to earn clicks is to write compelling titles and post in relevant categories.
What about this Backpage thing?
Backpage's robots.txt file
Backpage takes its own approach to the same format as CL. It was started by the Village Voice and partners with a dozen or so newspapers nationwide. BP handles search engines and its syndication feed a little differently. The RSS feed on BP, for example, strips all HTML from the content and delivers a text-only version. If you type your links like <a href="http://icobb.com">CLICK HERE</a>, the feed strips them down to just “CLICK HERE”, and the URL is lost. So on BP you want to create your links as <a href="http://icobb.com">http://icobb.com</a>, and the feed will be stripped to just “http://icobb.com”, which still shows readers where to go.
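To see why the anchor text matters, here is a rough approximation of a text-only feed: keep the text between tags and throw away every tag and attribute, href included. This is an assumption about how such stripping generally works, sketched with Python’s standard-library `html.parser`, not Backpage’s actual code.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep only text content, discarding every tag and attribute (including href)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(fragment):
    """Approximate a text-only feed by returning just the fragment's text."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(strip_html('<a href="http://icobb.com">CLICK HERE</a>'))        # CLICK HERE
    print(strip_html('<a href="http://icobb.com">http://icobb.com</a>'))  # http://icobb.com
```

Only the second form survives stripping with the destination intact, because the URL is part of the text rather than an attribute.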