Archive for the 'search engine optimization' Category

Should I cross link my sites for better rankings?

My loyal reader Jez asks a very interesting question. I am sure the same question is on the minds of others in the same situation.

Finally, I am in the process of creating multiple sites around a similar theme. I have unique content for all sites, and will host on different servers in Europe and the US, however the whois for each domain will show my name (The company I used does not allow me to hide this info).

Is the common whois likely to make much difference when I begin cross linking the sites?

Cross linking (or reciprocal linking) on a small scale (maybe 10 to 15 sites maximum) should not be a major concern. I’ve seen many sites do it and they are ranking for highly competitive phrases. Most of their link juice comes from non-cross-linked sites though.

When you try to do this on a massive scale, things start to get interesting. I know this from experience.

Back in 2003 and 2004, I managed to get a couple of my sites ranking on Google for “Viagra” and most variations. That is one of the most competitive industries, because you make really good money as an affiliate. I got those rankings through link exchanges exclusively. Being a developer, I created scripts to ‘borrow’ links from my competitors’ link directories and later traded links with my sites. When I hit the 5,000 links mark, my sites got banned and I lost all my rankings. Back then, Google was not as sophisticated as it is now.

Later, I carefully studied competitors that were doing a more advanced type of cross linking. They created large networks of sites that they owned, and they created complex interlinking structures to boost the rank of a few of their sites for highly competitive terms. Pair.com was a common web host as they provided IP addresses in different class C blocks.

That worked well for a while–until Google became a registrar. It is illegal to use fake domain registration information, and by having access to the domain ownership information Google could more easily identify complex cross linking. I think they became a registrar with that sole purpose. I don’t see them selling domains in the future. They haven’t yet. Have they?

Making your cross linked domains’ registration private won’t help much either. I think registrars have access to the real information anyways, but even if I am wrong, it would be suspicious for your site to have all inbound links coming from private registrations.

There are far more complex cross linking schemes in which a few owners cooperate to create massive collections of websites with well planned link boosting structures. The funny thing is that search engine researchers have already identified most of them. Check the paper “Link Spam Alliances“; it is a very interesting read.

So, if you want to cross link on a massive scale, you had better have a very intricate and complex linking plan to avoid detection.

Great Content + Bad Headline = Mediocre Results

You can spend a few hours researching, structuring, drafting and proofreading a great post, only to miss the mark completely by choosing a really bad title.

I recently submitted a carefully crafted rebuttal to the Seomoz article: Proof Google is Using Behavioral Data in Rankings. The post generated some controversy and some heated discussion as to the validity of the tests and results. I read everything. And, given my technical nature, I decided to dig deeper myself.

I ended up with slightly different conclusions about the experiments. If you want to find out what they are, please read the post at Youmoz.

Now, here’s the bad news.

As Kurt wisely points out, I tragically missed the mark by choosing an empty title: “Relevance feedback“.

Kurt (86)

Sat (6/16/07) at 05:38 PM

Good post… well thought out and presented… gave it a thumbs up.

Unfortunately, it will most likely get overlooked by most readers due to its title/headline.

Look at the article you’re referencing, “Proof Google is Using Behavioral Data in Rankings“. You know that headline will bring in some clicks. It was moved to the blog of SEOmoz from the Youmoz section (even with its flawed testing and logic). The mozzers aren’t stupid… they know this type of headline and article will stir up some controversy and bring in some links.

I’m no expert copywriter… far from it. I just hate to see a good post sit on the sidelines because of a bad headline.

The title I chose did not offer the reader any incentive to click or learn more. I guess that I operate in two modes: engineer and marketer and that I forgot to flip the switch while writing this post.

First, let me state that his remarks about the mozzers are valid for most journalists, trade publications, social media sites, etc. It is human nature to judge books by their cover. If the cover is crap, the content must be crap. That is how we normally think.

Again, whether you are writing:

1. A blog post
2. A book
3. An email
4. A fax cover letter
5. An article
6. A Digg submission
7. etc.

Write titles/subjects that entice users to read further.

What can you learn from my mistake?

1. Most people scan web pages. They don’t have the time to follow each link. The title must be a call to action: “this is interesting, click to learn more”.
2. The summary/excerpt is very important too. I chose a really bad first paragraph. If you write posts as a guest for other popular blogs, you want your title and first paragraph to be cliffhangers. You must get people to click further.
3. Content importance is second to title and excerpt! This is sad, but true. While crappy content won’t get the word out, crappy titles won’t even get the content read in the first place.

Deceptive titles are not a good idea

Am I suggesting you start writing bait and switch posts? Definitely not.

While controversy draws attention, writing titles that promise one thing while the content delivers another is the best way to brand yourself as a charlatan.

Ideally, you should spend enough time carefully writing your posts (especially, if they are to be published on other websites), and spend a few minutes carefully writing the titles as well. Be creative!

Google’s inner workings – part 1

Google keeps tweaking its search engine, and now it is more important than ever to better understand its inner workings.

Google lured Mr. Manber from Amazon last year. When he arrived and began to look inside the company’s black boxes, he says, that he was surprised that Google’s methods were so far ahead of those of academic researchers and corporate rivals.

While Google closely guards its secret sauce, for many obvious reasons, it is possible to build a pretty solid picture of Google’s engine. In order to do this we are going to start by carefully dissecting Google’s original engine: how Google was conceived back in 1998. Although a newborn baby, it had all the basic elements it needed to survive in the web world.

The plan is to study how it worked originally, and follow all the published research papers and patents in order to put together the missing pieces. It is going to be very interesting.

Google has added and improved many things over the years. The original paper only describes the workings of the web search engine. Missing features are the ability to search news, images, documents (PDF, Word, etc.), video, products, addresses, books, patents, maps, blogs, etc.

Also missing are substantial improvements such as local search, mobile search, personalized search, universal search, supplemental index, freshness, spam detection and PageRank improvements. One thing that will be hard to know is how Google uses the data it collects through other services, like Google Toolbar, Google Analytics, Google Adsense, Doubleclick, Gmail, Gtalk, Feedburner, etc. There is a lot of information that can be used both for better ads and for better search results.

No matter the type of search you are conducting, conceptually, search engines have three key components: the crawler, the indexer and the searcher.

The job of the crawler (also known as a search engine robot) is to collect all the information that will later be searched, whether it is images, video, text or RSS feeds. These documents are stored for later processing by the indexer module. Webmasters and site owners can control how crawlers access their websites via a robots.txt file and the robots exclusion protocol. In this file you basically tell the crawler which pages or sections it is not allowed to crawl. I posted an entry about this several days ago.
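To make the robots.txt idea concrete, here is a minimal sketch of how a well-behaved crawler might check a URL before fetching it. It uses Python's standard urllib.robotparser module. The robots.txt rules, the "ExampleBot" user agent and the example.com URLs are all made up for illustration; this is not how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch.
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/notes.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/blog/post.html")
print(blocked, allowed)
```

The first URL falls under a Disallow rule, so the crawler should skip it; the second is not covered by any rule and may be fetched.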

The indexer module is the one doing the heavy lifting. It has the daunting task of carefully organizing the information collected by the crawler. The power of the search engine lies in this specific task: the better classified the information is, the faster and better the search. Search engines conceptually classify documents similar to the way you file documents in a cabinet. Without some sort of labeling you will probably waste a lot of time finding your bank statements, notes, etc. Search engines label documents in a way that makes it easy for them to find them later by words or phrases (also known as keywords). In the case of text and similar documents the indexer breaks down the document into words and collects some additional information about those words, such as the frequency of the word in the document.
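As a rough illustration of that last step, the sketch below breaks a document into words and counts how often each one appears. The tokenizing rule (lowercase runs of letters and digits) and the function name are my own assumptions; a real indexer records much more than frequency.

```python
import re
from collections import Counter


def index_document(text: str) -> Counter:
    """Split a document into lowercase words and count each word's frequency."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)


freqs = index_document("Bank statements and bank notes go in the bank folder.")
print(freqs.most_common(3))
```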

The searcher module takes the user’s search, cleans it up to remove ambiguities, misspellings, etc., finds the documents in the index that most closely match the search, and ranks them according to the current ranking formula. The ranking formula is the most closely guarded secret of all major commercial search engines.

These basic components remain the same nowadays, but as they say, the devil is in the details. Today, Google’s inner workings are far more complex than what I am going to explain, but the basic principles are the same. I will quote the original paper as necessary.

There is quite a bit of recent optimism that the use of more hypertextual information can help improve search and other applications [Marchiori 97] [Spertus 97] [Weiss 96] [Kleinberg 98]. In particular, link structure [Page 98] and link text provide a lot of information for making relevance-related assessments and quality filtering. Google makes use of both link structure and anchor text (see Sections 2.1 and 2.2).

One notable improvement Google brought to the commercial search marketplace was the use of link structure and anchor/link text to improve the quality of results. This proved to be a significant factor that helped fuel their growth. Today, these elements remain significant, but Google makes use of very sophisticated filters to detect most attempts at manipulation. Proof that they remain significant are the successful Google bombs of late.

To support novel research uses, Google stores all of the actual documents it crawls in compressed form

Here is the reference to the caching feature we are accustomed to using.

Now, let’s see how they define PageRank — how important or high quality are pages for Google’s search engine.

…a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo’s homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.
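The recursive idea in that quote can be sketched in a few lines. This is a simplified, normalized version that I am assuming for illustration: the damping factor of 0.85, the fixed number of iterations, the handling of pages without outgoing links, and the toy link graph are all my choices, not details taken from the passage above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively propagate rank through a {page: [pages it links to]} graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy graph: three pages cite "popular", which in turn cites "endorsed".
toy_links = {
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
    "popular": ["endorsed"],
    "endorsed": [],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph "popular" scores well because many pages point to it, and "endorsed" scores well with a single link because that link comes from a high-ranking page. Those are the two cases the quote describes.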

Here is a clear description of why they use anchor text for searching.

The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases. This makes it possible to return web pages which have not actually been crawled. Note that pages that have not been crawled can cause problems, since they are never checked for validity before being returned to the user. In this case, the search engine can even return a page that never actually existed, but had hyperlinks pointing to it. However, it is possible to sort the results, so that this particular problem rarely happens.
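A small sketch of that idea: link text is filed under the page the link points to, so even a document that was never crawled (an image, say) ends up with descriptive words attached. The tuple layout, function name and example URLs are hypothetical.

```python
from collections import defaultdict


def index_anchor_text(links):
    """Group anchor text by the page each link points to.

    links: iterable of (source_url, target_url, anchor_text) tuples.
    """
    anchors_by_target = defaultdict(list)
    for _source, target, text in links:
        anchors_by_target[target].append(text.lower())
    return dict(anchors_by_target)


# Hypothetical links found while parsing two crawled pages.
found_links = [
    ("http://a.example/", "http://c.example/logo.png", "Acme company logo"),
    ("http://b.example/", "http://c.example/logo.png", "Acme logo"),
    ("http://a.example/", "http://b.example/", "my friend's blog"),
]
anchor_index = index_anchor_text(found_links)
print(anchor_index["http://c.example/logo.png"])
```

The image was never parsed for text, yet it can now be returned for a search on "acme logo" purely on the strength of its inbound anchors.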

Now, let’s read about the on-page elements Google considered that were not in regular use back then: proximity, capitalization and font weight, and page caching.

Aside from PageRank and the use of anchor text, Google has several other features. First, it has location information for all hits and so it makes extensive use of proximity in search. Second, Google keeps track of some visual presentation details such as font size of words. Words in a larger or bolder font weigh heavier than other words. Third, full raw HTML of pages is available in a repository.

Now let’s take a big-picture view of how everything fits together. This is very technical, but I will try to explain it the best I can.

In Google, the web crawling (downloading of web pages) is done by several distributed crawlers. There is a URLserver that sends lists of URLs to be fetched to the crawlers. The web pages that are fetched are then sent to the storeserver. The storeserver then compresses and stores the web pages into a repository. Every web page has an associated ID number called a docID which is assigned whenever a new URL is parsed out of a web page. The indexing function is performed by the indexer and the sorter. The indexer performs a number of functions. It reads the repository, uncompresses the documents, and parses them. Each document is converted into a set of word occurrences called hits. The hits record the word, position in document, an approximation of font size, and capitalization. The indexer distributes these hits into a set of “barrels”, creating a partially sorted forward index. The indexer performs another important function. It parses out all the links in every web page and stores important information about them in an anchors file. This file contains enough information to determine where each link points from and to, and the text of the link.

The URLresolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links which are pairs of docIDs. The links database is used to compute PageRanks for all the documents.

The sorter takes the barrels, which are sorted by docID (this is a simplification, see Section 4.2.5), and resorts them by wordID to generate the inverted index. This is done in place so that little temporary space is needed for this operation. The sorter also produces a list of wordIDs and offsets into the inverted index. A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher. The searcher is run by a web server and uses the lexicon built by DumpLexicon together with the inverted index and the PageRanks to answer queries.

Google uses distributed crawlers/downloaders. If you have ever looked at your server log files, you will have noticed that when Googlebot visits your site the hits come from different IPs. That is because the crawling is distributed among several computers. A URL server coordinates the crawling effort by feeding the crawlers the URLs to download.

All the fetched pages are sent to the storeserver for compression and storage, and every page has an associated ID (docID). For computers, it is easier and more efficient to use numbers to refer to things.

The indexer does some heavy work:

  • Reads, uncompresses and parses documents. Converts documents into hits (word occurrences).
  • Creates partially sorted forward indices.
  • Creates the anchors file (link text, and to and from links). The URLresolver fixes relative URLs and assigns docIDs.
  • Includes anchor text in the forward index, but using the docID of the page the link points to. This associates the text in the link with the document it points to.
  • Maintains a links database used to compute PageRanks.
  • Generates a lexicon–the list of all different words in the index.

Basically, the forward index allows you to find the words of a document given its docID. To be useful for searching, this needs to be inverted, i.e., to find documents by their words. The sorter performs this additional step, assisting the indexer by creating an inverted index that uses wordIDs as keys to the docIDs. The sorter also produces a list of wordIDs and offsets into the inverted index. DumpLexicon uses this list to update the lexicon used by the searcher.
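Here is a minimal sketch of the inversion step. For readability it keys the inverted index by the word itself rather than by a wordID, and it keeps everything in memory instead of in barrels; the sample documents are invented.

```python
from collections import defaultdict

# Forward index: docID -> words in the order they appear in the document.
forward_index = {
    1: ["google", "search", "engine"],
    2: ["search", "ranking"],
    3: ["google", "ranking", "google"],
}


def invert(forward):
    """Build word -> [(docID, position), ...] from docID -> [words]."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for position, word in enumerate(forward[doc_id]):
            inverted[word].append((doc_id, position))
    return dict(inverted)


inverted_index = invert(forward_index)
print(inverted_index["google"])
```

Keeping the position of each occurrence is what later makes proximity checks possible.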

Finally, the searcher combines the lexicon, the inverted index and the PageRanks to respond to queries.
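A toy version of the searcher might look like the sketch below. It finds the documents that contain every query word and orders them by PageRank alone. That ordering rule is purely my assumption for illustration, since the real ranking formula is secret and combines many more signals; the index and scores are made up.

```python
def search(query, word_to_docs, pageranks):
    """Return docIDs containing every query word, highest PageRank first."""
    terms = query.lower().split()
    if not terms:
        return []
    doc_sets = [set(word_to_docs.get(term, ())) for term in terms]
    matches = set.intersection(*doc_sets)
    return sorted(matches, key=lambda doc_id: pageranks.get(doc_id, 0.0), reverse=True)


# Hypothetical inverted index (word -> docIDs) and PageRank scores.
word_to_docs = {"google": {1, 3}, "ranking": {2, 3}, "search": {1, 2}}
pageranks = {1: 0.5, 2: 0.2, 3: 0.3}

print(search("google", word_to_docs, pageranks))
print(search("google ranking", word_to_docs, pageranks))
```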

Next, I’ll describe each of the processes in more detail. Can’t wait? Read the document yourself and draw your own conclusions :-)

Why Viralinks are a waste of time?

I’m new to blogging, and I’m catching up with a lot of interesting things. One of them is the Viralink, coined by Andy Coates.

I was exposed to the concept while reading John’s blog. One of the readers mentioned he was trying out a Viralink on his blog and he was getting a little bit of traffic.

What is a Viralink? A Viralink is basically a new scheme to build up the PageRank of the participating sites. The instructions at Andy’s blog explain everything better.

———copy and paste the Viralink and instructions below this line———

Below is a matrix of 120 stars, I have already added a link to my blog onto one of the stars, all you need to do is copy and paste the grid into your blog and add your own link to one of the other spare stars, and tell others to do the same!

Viralink

********************
********************
********************
********************
********************
********************

When I receive a ping back once you have added the Viralink to your site I will add your link to this grid, and each person who copies the grid from here will also link to your site!

Rules
No Porn Sites
Only 1 link per person (i.e don’t hog the viralink!)
Please don’t tamper with other peoples url’s
Enjoy!

———copy and paste the Viralink and instructions above this line———

I have to admit that it is a very clever idea. By participating in a Viralink, you can potentially get hundreds or thousands of links, and a very nice PageRank.

Now, let me give you the specific reasons why I think this is risky, and pretty much a waste of time.

  • There is no direct or indirect benefit for your readers. The links don’t even have text or descriptions. You can’t expect readers to mouse over the links, and try to guess from the URL whether they want to click to the linked page or not. This is simply designed to fool the search engines.
  • There is no anchor text benefit. I am not sure who wants to be #1 for the highly popular phrase ‘*’.

Quality guidelines

These quality guidelines cover the most common forms of deceptive or manipulative behavior, but Google may respond negatively to other misleading practices not listed here (e.g. tricking users by registering misspellings of well-known websites). It’s not safe to assume that just because a specific deceptive technique isn’t included on this page, Google approves of it. Webmasters who spend their energies upholding the spirit of the basic principles will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit.

If you believe that another site is abusing Google’s quality guidelines, please report that site at http://www.google.com/contact/spamreport.html. Google prefers developing scalable and automated solutions to problems, so we attempt to minimize hand-to-hand spam fighting. The spam reports we receive are used to create scalable algorithms that recognize and block future spam attempts.

Quality guidelines – basic principles

  • Make pages for users, not for search engines. Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as “cloaking.”
  • Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you’d feel comfortable explaining what you’ve done to a website that competes with you. Another useful test is to ask, “Does this help my users? Would I do this if search engines didn’t exist?”
  • Don’t participate in link schemes designed to increase your site’s ranking or PageRank. In particular, avoid links to web spammers or “bad neighborhoods” on the web, as your own ranking may be affected adversely by those links.
  • Don’t use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our Terms of Service. Google does not recommend the use of products such as WebPosition Gold™ that send automatic or programmatic queries to Google.
  • Your competitor will see this and report you.
  • This is very easy for search engines to detect automatically. They just need to look for blocks of links with ‘*’ in their anchor text.
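To show how little effort that last check takes, here is a rough sketch that counts the links on a page whose anchor text is a single ‘*’. It uses Python's standard html.parser module. The sample HTML is made up, and this is only my guess at the kind of check involved, not Google's actual method.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._in_link = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.anchors.append("".join(self._text).strip())
            self._in_link = False


def count_star_links(html):
    """Count links whose anchor text is exactly '*'."""
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    return sum(1 for text in collector.anchors if text == "*")


# Hypothetical fragment of a page carrying the grid.
sample_html = (
    '<p><a href="http://a.example/">*</a>'
    '<a href="http://b.example/">*</a>**'
    '<a href="http://c.example/">My blog</a></p>'
)
print(count_star_links(sample_html))
```

A page where this count is large relative to its normal links would be an obvious candidate for closer inspection.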

Doing this is probably very time consuming. Why not spend the time creating useful content that attracts links naturally (Linkbait)?

“Make Money Online” link bomb defused?

If you read my previous post advising John to change his link-building strategy, then you won’t be surprised to learn that John Chow is no longer ranking on the first page for the term “make money online“.

In January this year, Danny Sullivan from Search Engine Land reported that Google had devised an algorithm to catch “link bombs” – a common practice consisting of having many sites link to yours with a particular target phrase in their link text.

SEOs had a great deal of success with this particular link bomb regarding Stephen Colbert: “The greatest living American”, and it took Google a couple of weeks to defuse it.

This is Google’s official response to “link bombs”, written by Google Search Evangelist Adam Lasnik (source: searchengineland.com):

“Our effort to defuse Googlebombs continues to be purely algorithmic. We do not make manual changes. We prefer to tune these algorithms to avoid all false positives in exchange for less immediacy and slightly less thoroughness in catching all Google bombs.”

I wasn’t really sure if I understood the last part, did this mean that Google knew the link bomb fix wouldn’t catch ALL bombs in order to avoid having their filters exclude helpful uses of anchor text? Adam replied:

“Correct. We don’t want to impact situations with search results that may be associated with, say, breaking news events… things that have nothing to do with groups of folks (however playfully) attempting to game search results.”

Luckily for John, this ranking only represented about 150 visitors per day to his blog which receives several thousand visitors daily. Anyhow, he decided to stop the “review me for a link” campaign. It’s too bad Google spoiled all the fun.

This highlights the importance of having a diverse marketing mix and not relying solely on search engines for your traffic.

Why it’s good to mix your incoming link anchor text?

I’ve been reading John Chow’s blog for a while and it is very interesting how he is getting a lot of reviews with the anchor text “make money online” in exchange for a link from his blog. He is ranking #2 in Google for the phrase “make money online.”

I know a lot of SEOs read John’s blog and are not alerting him of some potential problems with this approach. I like the guy and I think he deserves to know.

It is not a good idea to have most of your incoming links with the same anchor text. Especially if most links are pointing to the home page, and the rest of the pages don’t get any links, or very few of them do. Search engines, notably Google, flag this as an attempt to manipulate their results.

Nobody knows for sure how it works but Google has proven in the past that they can detect this and act accordingly.

My advice is to request variations of the target phrase for the anchor text with each batch. For example: make money online free, making money online, make money at home online, work from home, etc… Use a keyword suggestion tool to get the variations and make sure you include synonyms too.

I would also require reviewers to include a link to their favorite post in the review. This way the rest of the pages will get links too and look more natural.
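If you want a quick check on your own link profile, the sketch below measures how concentrated your incoming anchor text is. Nobody outside Google knows what share would trigger a filter, so the code only reports the number and sets no threshold; the sample anchor texts are invented.

```python
from collections import Counter


def top_anchor_share(anchor_texts):
    """Return the most common anchor text and the fraction of links using it."""
    counts = Counter(text.strip().lower() for text in anchor_texts)
    if not counts:
        return None, 0.0
    text, count = counts.most_common(1)[0]
    return text, count / sum(counts.values())


# Hypothetical incoming links: eight identical anchors out of ten.
incoming = ["make money online"] * 8 + ["John Chow", "making money online"]
phrase, share = top_anchor_share(incoming)
print(f"{phrase!r} accounts for {share:.0%} of incoming anchor text")
```

The more variations and synonyms you request, the lower this share gets and the more natural the profile looks.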

This is documented on other sites. Please check:

http://www.marketingpilgrim.com/2007/01/google-defuses-googlebombs-does-this-change-link-building-practices.html

http://www.linkbuildingblog.com/2007/04/how_not_to_buil.html

http://diagnostics.googlerankings.com/anchor-text-link.html Case #2

http://www.webmasterworld.com/forum30/29269.htm

http://www.seobook.com/archives/000894.shtml

Why start SEO and Affiliate Marketing with PPC?

1. Accurate keyword research.  There are numerous keyword research tools that help you identify keywords that people are searching for, their volume of searches, level of competition, etc… Unfortunately, every single tool has a critical problem: the source of the information.

Wordtracker relies on information from meta search engine Dogpile, and similar sources. Yahoo mixes plurals, singulars, and phrases typed in different order; the information reported is from the previous month. Google tries to estimate traffic and fails to provide good predictions most of the time. There are other popular tools that have similar problems.

Running a test PPC campaign for a week or two will provide actual and dependable statistics about the amount and quality of the traffic to be expected for each keyword.

2. High click-through titles and descriptions. Page titles and meta descriptions are what people will normally see in the search results. We need to provide an incentive for the searcher to click-through.

Unfortunately it is very tricky to test changing titles and meta descriptions for SEO. We need to be able to rank first!

PPC management tools are designed so that we can easily split test multiple ads and the system will tell us which ads perform better. When we find the winning PPC ads we can use them to create our titles and meta descriptions.
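The comparison behind a split test is simple enough to sketch. The ad copy and the numbers below are made up, and the code just picks the highest click-through rate; with few impressions the difference may be noise, so treat the winner as a hint rather than proof.

```python
def click_through_rate(clicks, impressions):
    """Clicks divided by impressions; 0.0 when the ad was never shown."""
    return clicks / impressions if impressions else 0.0


def best_ad(ads):
    """Return the ad with the highest click-through rate."""
    return max(ads, key=lambda ad: click_through_rate(ad["clicks"], ad["impressions"]))


# Hypothetical results from a one-week test campaign.
ads = [
    {"title": "Relevance feedback", "clicks": 12, "impressions": 2000},
    {"title": "Is Google Really Using Behavioral Data?", "clicks": 85, "impressions": 2000},
]
winner = best_ad(ads)
print(winner["title"])
```

The winning title and description are the ones to reuse for the page title and meta description.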

3. High converting landing pages. Having a high conversion rate and high converting landing pages is not only important for our bottom line, it’s very important to retain top affiliates as well.

Another advantage of running test PPC campaigns is that we can tweak our landing pages until they give us the desired results.

Top affiliates measure a merchant’s effectiveness by their earnings per click (EPC): how much they make from every click they send. You can offer large commissions, incentives, etc… What really matters is how well their traffic will convert.
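A quick worked example shows why conversion matters more than the headline commission. The two merchants and all their numbers are hypothetical.

```python
def earnings_per_click(clicks, sales, commission_per_sale):
    """Total commissions earned divided by the clicks the affiliate sent."""
    return (sales * commission_per_sale) / clicks if clicks else 0.0


# Merchant A pays more per sale but converts poorly.
epc_a = earnings_per_click(clicks=1000, sales=10, commission_per_sale=50)
# Merchant B pays less per sale but converts four times as well.
epc_b = earnings_per_click(clicks=1000, sales=40, commission_per_sale=20)
print(epc_a, epc_b)
```

With these numbers merchant B earns the affiliate more per click despite paying less than half the commission.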

Even if you don’t plan to run a PPC campaign, it makes perfect sense to run at least one as a test to help you improve the results you will get with other channels.

Why Google needs a supplemental index?

Search engine researchers use two main concepts to measure the success of search engine algorithms: precision and recall.

Precision measures how accurate the algorithm is in finding the best matches for our searches.  Recall measures how comprehensive the algorithm is in finding as many relevant results as possible.
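In concrete terms, precision is the fraction of returned results that are relevant, and recall is the fraction of all relevant documents that were returned. The sketch below computes both from sets of document IDs; the sample IDs are made up.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 results returned, 6 relevant documents exist, 2 overlap.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7, 8])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Filtering pages out of the index tends to raise precision at the cost of recall, which is the trade-off a second index tries to soften.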

In their effort to fight spam Google has filtered out a lot of pages that would otherwise rank.  They have created a separate index for such pages which they call the supplemental index.

Basically this is an effort to improve the comprehensiveness of the search engine — the recall.  If there are no pages in the main index for a particular search, they will at least have the supplemental results.

SEO 2.0 is all about links

Similar to the versioning used for the web (web 1.0, web 2.0, web 3.0), I like to version SEO (Search Engine Optimization) as well.

In SEO 1.0, in order to achieve high rankings, SEOs simply needed to include the phrases they wanted to rank for enough times to get to the first positions.  Back then, it was all about keyword density, meta keyword tags, etc… What is commonly known now as on-page optimization.

With the arrival of Google, SEOs had to adapt.  Google is “a large scale hyper-textual search engine.”  This means that the search engine relies heavily on links and the information associated with them.

We responded with SEO 2.0.  That is, we trade, buy, beg, etc., for as many links as we can pointing to our sites with our desired keywords in the link text.

Google has updated its algorithm multiple times, but the links always carry a lot of weight.  We only need to look for them in the right places.

On-page factors are still very important, but as many have demonstrated it is possible to rank for keywords that are not included in the body of the page.

I consider the title, meta description and headings very important.  They do help the rankings, but more importantly, they should be treated the same way we treat Google ads in AdWords.

The page title and meta description are what visitors see when they search. We had better have great titles and descriptions that motivate the searcher to click through. Having the keywords in the heading of the page is also important so that the visitor feels she is on the right page and doesn’t hit the back button.

Now with the introduction of personalized search, a new battlefield has emerged.

Every user can have a potentially completely different page of results for the same search based on his searching and browsing habits, physical location, etc…

Users will be happier but as SEOs we will face our most difficult challenge.  Links will not be as important as before.

I think that visitor traffic and loyalty will be the primary measures search engines will use to identify worthy websites.

As SEOs, we need to start focusing more on learning marketing skills than on learning technical ones.  Understanding what visitors want, and giving it to them, will be paramount for success.