We frequently hear statements like this:
“No smart engineer would ever build a search engine that requires websites to follow certain rules or principles in order to be ranked or indexed.
Anyone with half a brain would want a system that can crawl through any architecture, parse any amount of complex or imperfect code, and still find a way to return the most relevant results, not the ones that have been ‘optimized’ by unlicensed search marketing experts.”
But Wait …
Imagine you posted a picture of your family dog online.
A human might describe it as “a black, medium-sized dog, looks like a Lab, playing fetch in the park.” On the other hand, the best search engine in the world would struggle to understand the photo at anywhere near that level of sophistication.
How do you make a search engine understand a photograph? Fortunately, SEO allows webmasters to provide clues that the engines can use to understand content.
In fact, adding proper structure to your content is essential to SEO.
Understanding both the abilities and limitations of search engines allows you to properly build, format, and annotate your web content in a way that search engines can digest.
Without SEO, a website can be invisible to search engines.
The Limits of Search Engine Technology
The major search engines all operate on the same principles, as explained in Chapter 1.
Automated search bots crawl the web, follow links, and index content in massive databases.
They accomplish this with dazzling artificial intelligence, but modern search technology is not all-powerful.
There are numerous technical limitations that cause significant problems in both inclusion and rankings.
We’ve listed the most common below:
Problems Crawling and Indexing
Online forms: Search engines aren’t good at completing online forms (such as a login), and thus any content contained behind them may remain hidden.
Duplicate pages: Websites using a CMS (Content Management System) often create duplicate versions of the same page; this is a major problem for search engines looking for completely original content.
Blocked in the code: Errors in a website’s crawling directives (robots.txt) may lead to blocking search engines entirely.
Poor link structures: If a website’s link structure isn’t understandable to the search engines, they may not reach all of a website’s content; or, if it is crawled, the minimally exposed content may be deemed unimportant by the engine’s index.
Non-text Content: Although the engines are getting better at reading non-HTML text, content in rich media format is still difficult for search engines to parse.
This includes text in Flash files, images, photos, video, audio, and plug-in content.
Problems Matching Queries to Content
Uncommon terms: Text that is not written in the common terms that people use to search.
For example, writing about “food cooling units” when people actually search for “refrigerators.”
Language and internationalization subtleties: For example, “color” vs. “colour.”
When in doubt, check what people are searching for and use exact matches in your content.
Incongruous location targeting: Targeting content in Polish when the majority of the people who would visit your website are from Japan.
Mixed contextual signals: For example, the title of your blog post is “Mexico’s Best Coffee” but the post itself is about a vacation resort in Canada which happens to serve great coffee.
These mixed messages send confusing signals to search engines.
Make sure your content gets seen
Getting the technical details of search engine-friendly web development correct is important, but once the basics are covered, you must also market your content.
The engines by themselves have no formulas to gauge the quality of content on the web.
Instead, search technology relies on the metrics of relevance and importance, and it measures those metrics by tracking what people do: what they discover, react to, comment on, and link to.
So, you can’t just build a perfect website and write great content; you also have to get that content shared and talked about.
Constantly Changing SEO
When search marketing began in the mid-1990s, manual submission, the meta keywords tag, and keyword stuffing were all regular parts of the tactics necessary to rank well.
In 2004, link bombing with anchor text, buying hordes of links from automated blog comment spam injectors, and the construction of inter-linking farms of websites could all be leveraged for traffic.
In 2011, social media marketing and vertical search inclusion were mainstream methods for conducting search engine optimization.
The search engines have refined their algorithms along with this evolution, so many of the tactics that worked in 2004 can hurt your SEO today.
The future is uncertain, but in the world of search, change is a constant.
For this reason, search marketing will continue to be a priority for those who wish to remain competitive on the web.
Some have claimed that SEO is dead, or that SEO amounts to spam.
As we see it, there’s no need for a defense other than simple logic: websites compete for attention and placement in the search engines, and those with the knowledge and experience to improve their website’s ranking will receive the benefits of increased traffic and visibility.
To perform better in search engine listings, your most important content should be in HTML text format.
Images, Flash files, Java applets, and other non-text content are often ignored or devalued by search engine crawlers, despite advances in crawling technology.
The easiest way to ensure that the words and phrases you display to your visitors are visible to search engines is to place them in the HTML text on the page.
However, more advanced methods are available for those who demand greater formatting or visual display styles:
Seeing your site as the search engines do
Many websites have significant problems with indexable content, so double-checking is worthwhile.
By using tools like Google’s cache, SEO-browser.com, and the MozBar you can see what elements of your content are visible and indexable to the engines.
Take a look at Google’s text cache of this page you are reading now.
See how different it looks?
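To get a feel for what a text-only view strips away, here is a minimal Python sketch, using only the standard library's html.parser, that keeps the HTML text a crawler could read and drops script contents and plug-in embeds. It is a rough approximation under simple assumptions (no JavaScript rendering, no parsing of images or plug-ins), not how any search engine's cache actually works; the sample page markup is hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects the HTML text that a text-only crawler could read.

    Rough approximation: text inside <script> and <style> is skipped, and
    plug-in embeds such as <object> contribute no text at all.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def text_only_view(html):
    """Return the readable HTML text of a page as one space-joined string."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the game names appear only in a script and a Flash embed.
page = """
<html><head><title>Juggling Pandas</title>
<script>var games = ["Axe Battling Monkeys"];</script></head>
<body>
  <object data="pandas.swf"></object>
  <p>Welcome to the games page.</p>
</body></html>
"""

print(text_only_view(page))  # Juggling Pandas Welcome to the games page.
```

The game names never reach the output, which mirrors the "barren wasteland" a cached text view can reveal.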
“I have a problem with getting found.
I built a huge Flash site for juggling pandas and I’m not showing up anywhere on Google.”
Whoa! That’s what we look like?
Using the Google cache feature, we can see that to a search engine, JugglingPandas.com’s homepage doesn’t contain all the rich information that we see.
This makes it difficult for search engines to interpret relevancy.
Hey, where did the fun go?
Uh oh … via Google cache, we can see that the page is a barren wasteland.
There’s not even text telling us that the page contains the Axe Battling Monkeys.
The site is built entirely in Flash, but sadly, this means that search engines cannot index any of the text content, or even the links to the individual games.
Without any HTML text, this page would have a very hard time ranking in search results.
It’s wise not only to check for text content but also to use SEO tools to double-check that the pages you’re building are visible to the engines.
This applies to your images, and as we see below, to your links as well.
Crawlable Link Structures
Just as search engines need to see content in order to list pages in their massive keyword-based indexes, they also need to see links in order to find the content in the first place.
A crawlable link structure—one that lets the crawlers browse the pathways of a website—is vital to them finding all of the pages on a website.
Hundreds of thousands of sites make the critical mistake of structuring their navigation in ways that search engines cannot access, hindering their ability to get pages listed in the search engines’ indexes.
Below, we’ve illustrated how this problem can happen:
In the example above, Google’s crawler has reached page A and sees links to pages B and E.
However, even though C and D might be important pages on the site, the crawler has no way to reach them (or even know they exist).
This is because no direct, crawlable links point to pages C and D.
As far as Google can see, they don’t exist! Great content, good keyword targeting, and smart marketing won’t make any difference if the crawlers can’t reach your pages in the first place.
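The crawler's view of this example can be sketched as a breadth-first search over a link graph. This Python sketch assumes a hypothetical site in which pages C and D link only to each other; starting from page A, the search finds every page a link-following crawler could discover and reports the rest as unreachable.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first search over crawlable links, like a crawler that can only
    discover pages by following links on pages it has already fetched."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: A links to B and E; C and D exist, but no crawlable
# link from A, B, or E points to either of them.
site = {
    "A": ["B", "E"],
    "B": ["A"],
    "E": ["A"],
    "C": ["D"],
    "D": ["C"],
}

found = reachable_pages(site, "A")
orphaned = set(site) - found
print(sorted(found))     # ['A', 'B', 'E']
print(sorted(orphaned))  # ['C', 'D']
```

Adding a single link from A, B, or E to C would make both C and D discoverable, since D is linked from C.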
If you require users to complete an online form before accessing certain content, chances are search engines will never see those protected pages.
Forms can include a password-protected login or a full-blown survey.
In either case, search crawlers generally will not attempt to submit forms, so any content or links that would be accessible via a form are invisible to the engines.
Robots don’t use search forms
Although this relates directly to the above warning on forms, it’s such a common problem that it bears mentioning.
Some webmasters believe if they place a search box on their site, then engines will be able to find everything that visitors search for.
Unfortunately, crawlers don’t perform searches to find content, leaving millions of pages inaccessible and doomed to anonymity until a crawled page links to them.
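A simplified link extractor shows why forms are a dead end for crawlers. This hypothetical Python sketch, built on the standard library's html.parser, collects only `<a href>` targets, as a basic crawler would; the search box and login form are parsed but never submitted, so any page reachable only through them is never discovered. Real crawlers are more sophisticated, so treat this as an illustration rather than a model of any particular engine.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, the way a simple crawler does.

    Forms (including search boxes) are never submitted, so a page that is
    reachable only through a <form action=...> is never discovered.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page with a search box, a login form, and one ordinary link.
html = """
<form action="/search" method="get">
  <input type="text" name="q"><button>Search</button>
</form>
<form action="/login" method="post">
  <input type="password" name="pw">
</form>
<a href="/about">About us</a>
"""

extractor = LinkExtractor()
extractor.feed(html)
print(extractor.links)  # ['/about']
```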
Links in Flash, Java, and other plug-ins
Search engines either do not crawl or give very little weight to the links embedded within Flash, Java, and other plug-in content.
The links embedded inside the Juggling Panda site (from our above example) are perfect illustrations of this phenomenon.
Although dozens of pandas are listed and linked to on the page, no crawler can reach them through the site’s link structure, rendering them invisible to the engines and hidden from users’ search queries.
Frames or iframes
Technically, links in both frames and iframes are crawlable, but both present structural issues for the engines in terms of organization and following.
Unless you’re an advanced user with a good technical understanding of how search engines index and follow links in frames, it’s best to stay away from them.
Links pointing to pages blocked by the Meta Robots tag or robots.txt
The Meta Robots tag and the robots.txt file both allow a site owner to restrict crawler access to a page.
Just be warned that many a webmaster has unintentionally used these directives as an attempt to block access by rogue bots, only to discover that search engines cease their crawl.
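Because one broad robots.txt rule can hide a whole site, it is worth testing rules before deploying them. This sketch uses Python's standard urllib.robotparser module to compare an overly broad rule set with a targeted one; the "BadBot" user agent and the example.com URL are hypothetical. The standard-library parser does not replicate every matching rule a given search engine applies, and it says nothing about the Meta Robots tag, so treat it as a quick sanity check only.

```python
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt, url, agents=("Googlebot", "BadBot")):
    """Report which user agents a robots.txt file lets fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# The mistake: "User-agent: *" applies to every crawler, not just rogue ones.
too_broad = """
User-agent: *
Disallow: /
"""

# Blocking only the unwanted (hypothetical) BadBot by name.
targeted = """
User-agent: BadBot
Disallow: /
"""

url = "https://www.example.com/pandas/"
print(crawler_access(too_broad, url))  # {'Googlebot': False, 'BadBot': False}
print(crawler_access(targeted, url))   # {'Googlebot': True, 'BadBot': False}
```

The first rule set blocks Googlebot along with the rogue bot, which is exactly the robots.txt mistake warned about above; naming the unwanted bot in its own User-agent group leaves other crawlers unaffected.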
Links on pages with many hundreds or thousands of links
Search engines will only crawl so many links on a given page.
This restriction is necessary to cut down on spam and conserve rankings.
Pages with hundreds of links on them are at risk of not getting all of those links crawled and indexed.
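Search engines do not state a single cutoff here, so the number in this sketch is purely an assumption. The Python sketch counts the `<a href>` links on a page and returns those that fall beyond a hypothetical per-page budget (`LINK_BUDGET = 100`), modeling a crawler that reads links in document order and stops once the budget is spent.

```python
from html.parser import HTMLParser

# Hypothetical per-page budget; not a published search engine limit.
LINK_BUDGET = 100


class LinkCounter(HTMLParser):
    """Collects <a href> targets in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_at_risk(html, budget=LINK_BUDGET):
    """Return the links past `budget`, which a crawler that stops after
    `budget` links per page would never follow."""
    counter = LinkCounter()
    counter.feed(html)
    return counter.hrefs[budget:]


# Hypothetical page with 250 links.
page = "".join(f'<a href="/item/{i}">Item {i}</a>' for i in range(250))
skipped = links_at_risk(page)
print(len(skipped), skipped[0])  # 150 /item/100
```

Under this assumption, the links that appear last on a long page are the ones most likely to be dropped, which is one reason to keep important links high on the page or split very long link lists across several pages.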
The rel="nofollow" attribute can be used with the following syntax:
<a href="http://moz.com" rel="nofollow">Lousy Punks!</a>
Links can have lots of attributes.
The engines ignore nearly all of them, with the important exception of the rel="nofollow" attribute.
In the example above, adding the rel="nofollow" attribute to the link tag tells the search engines that the site owners do not want this link to be interpreted as an endorsement of the target page.
Nofollow, taken literally, instructs search engines to not follow a link (although some do).
The nofollow tag came about as a method to help stop automated blog comment, guest book, and link injection spam, but has morphed over time into a way of telling the engines to discount any link value that would ordinarily be passed.
Links tagged with nofollow are interpreted slightly differently by each of the engines, but it is clear they do not pass as much weight as normal links.
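To audit which links on a page carry nofollow, you can parse the rel attribute. This Python sketch uses the standard library's html.parser; apart from the moz.com link, the sample links are hypothetical. Because rel holds a space-separated list of tokens, the sketch splits the value, so rel="nofollow noopener" still counts as nofollowed.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Splits <a href> links into followed and nofollowed ones.

    rel can hold several space-separated tokens, so the value is split
    before checking for "nofollow".
    """

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # A bare attribute (rel with no value) arrives as None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


html = """
<a href="http://moz.com" rel="nofollow">Lousy Punks!</a>
<a href="http://example.com/friends">Our friends</a>
<a href="http://example.com/ads" rel="nofollow noopener">Sponsored</a>
"""

classifier = LinkClassifier()
classifier.feed(html)
print(classifier.followed)    # ['http://example.com/friends']
print(classifier.nofollowed)  # ['http://moz.com', 'http://example.com/ads']
```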
Are nofollow links bad?
Although they don’t pass as much value as their followed cousins, nofollowed links are a natural part of a diverse link profile.
A website with lots of inbound links will accumulate many nofollowed links, and this isn’t a bad thing.
In fact, Moz’s Ranking Factors showed that high-ranking sites tended to have a higher percentage of inbound nofollow links than lower-ranking sites.
Keyword Usage and Targeting
Keywords are fundamental to the search process.
They are the building blocks of language and of search.
In fact, the entire science of information retrieval (including web-based search engines like Google) is based on keywords.
As the engines crawl and index the contents of pages around the web, they keep track of those pages in keyword-based indexes rather than storing 25 billion web pages all in one database.
Millions and millions of smaller databases, each centered on a particular keyword term or phrase, allow the engines to retrieve the data they need in a mere fraction of a second.
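A keyword-based index of this kind roughly corresponds to what information retrieval calls an inverted index: instead of scanning every page, the engine looks up each query word and gets back the pages that contain it. The following Python sketch is a drastically simplified, in-memory version with three hypothetical pages and plain AND matching; it ignores ranking, stemming, and everything else a production engine does.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every query word (AND matching)."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*postings) if postings else set()


pages = {
    "p1": "Training your dog to play fetch",
    "p2": "The history of the Eiffel Tower",
    "p3": "Dog breeding basics",
}
index = build_index(pages)
print(sorted(index["dog"]))                # ['p1', 'p3']
print(sorted(search(index, "dog fetch")))  # ['p1']
```

A query only touches the entries for its own words, which is why this layout stays fast no matter how many pages are indexed overall. It also shows why a page without the word "dog" in its crawlable text can never be retrieved for that query.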
Obviously, if you want your page to have a chance of ranking in the search results for “dog,” it’s wise to make sure the word “dog” is part of the crawlable content of your document.
Keywords dominate how we communicate our search intent and interact with the engines.
When we enter words to search for, the engine matches pages to retrieve based on the words we entered.
The order of the words (“pandas juggling” vs. “juggling pandas”), spelling, punctuation, and capitalization provide additional information that the engines use to help retrieve the right pages and rank them.
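Word order can only be used as a signal if the index remembers where each word occurs. This Python sketch stores, for each word, the positions where it appears on each page (a positional index), and shows that "juggling pandas" and "pandas juggling" match different hypothetical pages even though both pages contain both words. It demonstrates only strict phrase matching; how real engines weigh word order is not specified here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """Map word -> page ID -> positions where the word occurs on that page."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][page_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return pages where the phrase's words appear consecutively and in order."""
    words = tokenize(phrase)
    if not words or words[0] not in index:
        return set()
    matches = set()
    for page_id, starts in index[words[0]].items():
        for start in starts:
            # Each following word must sit exactly `offset` positions later.
            if all(
                start + offset in index.get(word, {}).get(page_id, [])
                for offset, word in enumerate(words[1:], start=1)
            ):
                matches.add(page_id)
                break
    return matches


# Hypothetical pages that both contain "juggling" and "pandas".
pages = {
    "p1": "Watch our juggling pandas perform",
    "p2": "Pandas juggling is harder than it looks",
}
index = build_positional_index(pages)
print(phrase_search(index, "juggling pandas"))  # {'p1'}
print(phrase_search(index, "pandas juggling"))  # {'p2'}
```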
Search engines measure how keywords are used on pages to help determine the relevance of a particular document to a query.
One of the best ways to optimize a page’s rankings is to ensure that the keywords you want to rank for are prominently used in titles, text, and metadata.
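A quick way to check the placement advice above is to parse a page and record where a target phrase occurs. This Python sketch uses the standard library's html.parser; the page and keyword are hypothetical. It checks the `<title>`, the meta description, and the body text with simple substring matching, so it misses a phrase split across tags (such as `Eiffel <b>Tower</b>`). Treat it as an audit aid under those assumptions, not a measure of how engines score keyword use.

```python
from html.parser import HTMLParser


class KeywordPlacement(HTMLParser):
    """Records whether a keyword phrase appears in the <title>, the meta
    description, or the body text. Matching is a case-insensitive substring
    test within one text chunk, so a phrase split across tags is missed."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.in_title = False
        self.in_body = False
        self.found = {"title": False, "meta description": False, "body text": False}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "body":
            self.in_body = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                content = (attributes.get("content") or "").lower()
                if self.keyword in content:
                    self.found["meta description"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.keyword not in data.lower():
            return
        if self.in_title:
            self.found["title"] = True
        elif self.in_body:
            self.found["body text"] = True


# Hypothetical page targeting "Eiffel Tower".
page = """<html><head>
<title>Visiting the Eiffel Tower</title>
<meta name="description" content="Tickets, history, and nearby hotels.">
</head><body>
<h1>Eiffel Tower guide</h1>
<p>Plan your trip to Paris.</p>
</body></html>"""

checker = KeywordPlacement("Eiffel Tower")
checker.feed(page)
print(checker.found)  # {'title': True, 'meta description': False, 'body text': True}
```

Here the report flags a meta description that never mentions the target phrase, the kind of gap this check is meant to surface.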
Generally speaking, as you make your keywords more specific, you narrow the competition for search results and improve your chances of achieving a higher ranking.
The map graphic to the left compares the relevance of the broad term “books” to the specific title A Tale of Two Cities.
Notice that while there are a lot of results for the broad term, there are considerably fewer results (and thus, less competition) for the specific result.
Since the dawn of online search, folks have abused keywords in a misguided effort to manipulate the engines.
This involves “stuffing” keywords into text, URLs, meta tags, and links.
Unfortunately, this tactic almost always does more harm than good for your site.
In the early days, search engines relied on keyword usage as a prime relevancy signal, regardless of how the keywords were actually used.
Today, although search engines still can’t read and comprehend text as well as a human, the use of machine learning has allowed them to get closer to this ideal.
The best practice is to use your keywords naturally and strategically (more on this below).
If your page targets the keyword phrase “Eiffel Tower” then you might naturally include content about the Eiffel Tower itself, the history of the tower, or even recommended Paris hotels.
On the other hand, if you simply sprinkle the words “Eiffel Tower” onto a page with irrelevant content, such as a page about dog breeding, then your efforts to rank for “Eiffel Tower” will be a long, uphill battle.
The point of using keywords is not to rank highly for all keywords, but to rank highly for the keywords that people are searching for when they want what your site provides.