Web Developer’s Guide to SEO

The end goal of search engine optimization is to drive relevant traffic, and top rankings on major search engines can send lots of it. Web development can bring about massive changes to a website’s SEO environment, often through very slight adjustments.

From a development perspective, it’s important to keep some essential SEO principles in mind when working on a website. Although this is certainly not a comprehensive listing of factors, it will apply to most of the activities a web developer engages in on a daily basis.

I. Robots.txt
Robots.txt controls what Google can and cannot crawl. Anything blocked in the robots file will not be crawled, and so will generally not be indexed into Google’s search engine results pages. For a typical website with everything visible, the file should look like this:

User-agent: *
Disallow:

If there is a folder you’d like to be blocked (/schools/, for example), insert it into the disallow rule as follows:

User-agent: *
Disallow: /schools/

For the sake of being ‘elegant’ we generally want everything visible to Google. But, as with many things in life, our websites aren’t perfect, and sometimes certain sections of the website need to be kept out of the crawl to avoid duplicate content penalties or to keep sensitive pages out of the search results.
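Before deploying a robots file, it can help to check how a parser reads it. The following is a minimal sketch using Python’s standard urllib.robotparser module. The helper name is_allowed and the example.com URLs are assumptions made for illustration, and Python’s parser may differ in edge cases from the one Googlebot uses.

```python
import urllib.robotparser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two files shown above: everything visible, and one folder blocked.
EVERYTHING_VISIBLE = "User-agent: *\nDisallow:\n"
SCHOOLS_BLOCKED = "User-agent: *\nDisallow: /schools/\n"

# Hypothetical URLs used only for this check.
print(is_allowed(EVERYTHING_VISIBLE, "http://www.example.com/schools/list.html"))
print(is_allowed(SCHOOLS_BLOCKED, "http://www.example.com/schools/list.html"))
print(is_allowed(SCHOOLS_BLOCKED, "http://www.example.com/about.html"))
```

The first and third checks should report the URL as allowed, and the second as blocked.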

II. Server Response Codes
404 Deathray
For the sake of usability, always make sure the page served with a 404 response code retains the ‘boilerplate’ of the website, such as the masthead, top navigation, and possibly important sidebars or footer links, to engage users and get them into the website.

403 Forbidden
Always avoid 403 Forbidden errors, as the search engines do not look favorably on them. If you have a part of the site that is, for whatever reason, forbidden, adjust the response code to either a) return a 404, or b) return a 200 response code.

301 Redirects
301 redirects are a signal to the search engine that the page has permanently changed its URL location. It is important to understand that the PageRank accumulated on this page will not all be passed to the new page; it will be diminished by about 10-15% for every 301 redirect it passes through. Also keep in mind that the new page needs to retain some of the older page’s ‘themes’. By themes we mean the general ideas expressed on the webpage.

An example:
Suppose I had a webpage about Cows that had accumulated several backlinks from external sources, say with the anchor text “Find All About Cows Here”, and I suddenly 301 redirected this page to a new webpage about Horses. A search engine would devalue or ignore all external backlinks to the page, because there has obviously been a massive content change and any backlinks pointing to the page on the subject of “Cows” are now irrelevant. So, only 301 redirect pages to places where it makes ‘sense’.
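The per-redirect loss mentioned above compounds when redirects are chained. Taking the 10-15% figure at face value (it is an estimate, not a published constant), the arithmetic can be sketched as follows. The function name retained_fraction is an assumption made for illustration.

```python
def retained_fraction(hops: int, loss_per_hop: float) -> float:
    """Fraction of PageRank left after `hops` chained 301 redirects,
    assuming every redirect loses the same share (loss_per_hop)."""
    if hops < 0:
        raise ValueError("hops must be zero or more")
    if not 0.0 <= loss_per_hop <= 1.0:
        raise ValueError("loss_per_hop must be between 0 and 1")
    # Each hop keeps (1 - loss) of what arrived, so the losses multiply.
    return (1.0 - loss_per_hop) ** hops


# Print the assumed 10% and 15% cases for one to three chained redirects.
for loss in (0.10, 0.15):
    for hops in (1, 2, 3):
        kept = retained_fraction(hops, loss)
        print(f"{hops} redirect(s) at {loss:.0%} loss each: {kept:.1%} retained")
```

Under that assumption, three chained redirects would leave roughly 61% to 73% of the original value, which is a good reason to point old URLs straight at their final destination.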

302 Redirects
302s should never be implemented for the purposes of our websites. Suffice it to say they differ from 301s only in that they are not declared to be permanent changes, and PageRank does not get passed on a 302 redirect. It’s quite possible for a webpage that has been 302 redirected to continue ranking.
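To make the response codes above concrete, here is a minimal sketch of a WSGI application that answers with a 301 for a moved page, a 200 for a known page, and a 404 that keeps the site boilerplate. Every path, the page content, and the markup are hypothetical; a real site would normally set these codes in its server configuration or framework.

```python
# Hypothetical data: one moved page and one live page.
PERMANENT_REDIRECTS = {"/all-about-cows-old.html": "/all-about-cows.html"}
KNOWN_PAGES = {"/all-about-cows.html": "<h1>All About Cows</h1>"}


def with_boilerplate(content: str) -> str:
    """Wrap page content in the masthead and top navigation shared by every page."""
    return (
        "<html><body>"
        "<div id='masthead'>Example Site</div>"
        "<div id='nav'><a href='/'>Home</a></div>"
        + content +
        "</body></html>"
    )


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in PERMANENT_REDIRECTS:
        # 301: the page has permanently moved, so send the new location.
        start_response("301 Moved Permanently",
                       [("Location", PERMANENT_REDIRECTS[path])])
        return [b""]
    if path in KNOWN_PAGES:
        status, html = "200 OK", with_boilerplate(KNOWN_PAGES[path])
    else:
        # 404: still serve the boilerplate so visitors can find their way in.
        status, html = "404 Not Found", with_boilerplate("<h1>Page not found</h1>")
    body = html.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

The app can be served locally with wsgiref.simple_server.make_server from the standard library, or called directly with a dictionary containing PATH_INFO.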

III. .htaccess
This server configuration file is used for telling Google what we’d like our response codes to be for certain webpages. It’s important to always be consistent in your conditions. Within .htaccess you can also use rewrite rules, which are very much like redirects except that they do not “pass” the visitor from one page to another. They simply serve the content under the rewritten URL. However, certain criteria do need to be met for a rewrite rule, such as the webpage maintaining the same file path.

Rewrites are helpful if we are interested in making minor URL changes to better optimize webpages, as they (theoretically) do not lose any PageRank in so doing, unlike a 301 permanent redirect.
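The difference between a rewrite and a redirect can be modelled outside the server. The sketch below is Python, not .htaccess syntax, and the URL pattern and the college.php script name are hypothetical. It only illustrates that a rewrite maps the public URL to an internal one without the visitor ever being sent a new location.

```python
import re

# Hypothetical rule: the public URL /colleges/<name> is served internally
# by /college.php?name=<name>. The visitor keeps seeing the public URL.
REWRITE_RULES = [
    (re.compile(r"^/colleges/([a-z0-9-]+)$"), r"/college.php?name=\1"),
]


def rewrite(path: str) -> str:
    """Return the internal path that should serve the requested public path."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    # No rule matched: the path is served as requested.
    return path


print(rewrite("/colleges/ohio-state"))  # /college.php?name=ohio-state
print(rewrite("/about.html"))           # /about.html
```

No 301 or 302 is involved here: the response code for the public URL stays whatever the internal page returns.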

IV. Canonical issues
A canonicalization issue occurs when I’m able to access ‘uscollegesearch.org’ and ‘www.uscollegesearch.org’ independently of each other; that is, both URL locations resolve. This is not a good thing, because a search engine will recognize these 2 locations as being independent of each other (i.e., as a separate subdomain). Another important consideration involves backlinks. Many webmasters will insert backlinks with and without the “www” prefix, so if we’d like to capture all of these backlinks in one concentrated location, we need to address this and make only the website with the “www” prefix accessible, typically by 301 redirecting the version without it.

A canonical issue can also be fixed within the <head> tags on the website, with the following code inserted directly:
<link rel="canonical" href="http://www.yoursite.com/page.php" />
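Choosing one canonical form comes down to a small URL transformation. Here is a minimal sketch with Python’s standard urllib.parse module that maps both host variants onto the “www” version. The function name canonical_url is an assumption made for illustration, and the sketch deliberately ignores ports, credentials in the URL, and other subdomains.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the 'www' form of a URL so both host variants map to one location."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    # Use "/" for an empty path and drop any fragment.
    return urlunsplit((parts.scheme, host, parts.path or "/", parts.query, ""))


print(canonical_url("http://uscollegesearch.org/page.php"))
# http://www.uscollegesearch.org/page.php
print(canonical_url("http://www.uscollegesearch.org/page.php"))
# http://www.uscollegesearch.org/page.php
```

The returned value is what would go in the redirect target or in the href of the canonical link tag.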

V. XML Sitemaps
Sitemaps are treated as a guide for search engines to use when crawling your website. We’ve seen increased crawl rate and increased indexing when using these, so always be sure to submit a proper sitemap for a website you manage. Not all web-based crawlers will give you an accurate sitemap, so generating the sitemap locally is always preferred. One locally hosted piece of software that is reliable is GSite Crawler. Once you have an XML Sitemap generated, upload it into the ‘Sitemaps’ section of Google Webmaster Tools and Yahoo Site Explorer.
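For a site whose list of URLs is already known, a basic sitemap can also be produced with a few lines of code. This is a minimal sketch using Python’s standard xml.etree.ElementTree module and the sitemaps.org 0.9 namespace. The function name build_sitemap and the example.com URLs are assumptions made for illustration, and optional elements such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a minimal XML sitemap listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs for illustration only.
print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/all-about-cows.html",
]))
```

Save the result as a UTF-8 file (sitemap.xml is the usual name) before uploading it.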

VI. Database generation
Always be mindful of database changes. With very little effort, these can produce broad, sweeping changes that have a dramatic effect on a website. A database-driven website is essentially a visual representation of its data source, the database. Be conscious of webpage elements such as URLs, Title Tags, Headlines, Images, and Link Anchor text when modifying the way a database is called.

VII. URL Structure
Dynamically or programmatically generated URLs using PHP or ASP scripting for the template are not helpful for SEO purposes and may be crawled poorly or ignored by search engines. URLs need to be static, descriptive, within 3-5 words in length (not counting folders occurring before the webpage), and consistent in their capitalization. We generally do not capitalize anything in the URLs. Remember that the “_” character is not treated as a word separator by Google, so when denoting a space, always use a “-” hyphen.
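The URL rules above (lowercase, hyphen-separated, a handful of words) are easy to apply automatically when pages are generated. The following is a minimal sketch. The function name slugify and the five-word default are assumptions made for illustration, and the sketch only handles plain ASCII letters and digits.

```python
import re


def slugify(title: str, max_words: int = 5) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Keep runs of letters and digits; spaces, underscores and punctuation
    # all act as word breaks.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words[:max_words])


print(slugify("Find All About Cows Here"))       # find-all-about-cows-here
print(slugify("Top_10 Colleges in the US for Engineering"))  # top-10-colleges-in-the
```

Cutting at a fixed word count can leave a dangling word, as in the second example, so in practice the result is worth a quick look by hand.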

VIII. Use of JavaScript and Flash
We try to keep the use of Flash and JavaScript to a minimum. Instead, we find it more practical to work with div manipulation to give ‘usability’ to websites. Both AJAX and div manipulations are crawlable, whereas other JavaScript and Flash content is not typically crawlable. Google has been making headway on properly crawling Flash, but a good rule of thumb is to have your SEO dudes optimize the primary content in addition to the Flash movie.

IX. Web Developers Toolbar
Whenever in doubt about what a search engine will see on a webpage or website you’re modifying, go into the Web Developer toolbar and:

-Disable images
-Disable javascript and flash
-Disable CSS styling

This will give you an idea of what Googlebot actually ‘views’ when visiting your website. There is also new and improved functionality in Google Webmaster Tools that enables you to see a printout of what Google claims its bots see.
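The same check can be roughly approximated in code. The sketch below uses Python’s standard html.parser module to keep only the text, plus image alt text, and to drop anything inside script, style, and object tags. It is an approximation made for illustration, not a reproduction of how Googlebot actually processes a page, and the class and function names are assumptions.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects page text while skipping script, style and object content."""

    SKIPPED = {"script", "style", "object"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img" and self._skip_depth == 0:
            # With images disabled, only the alt text remains.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_only(html: str) -> str:
    """Return the text a reader would see with images, scripts and CSS off."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for this example.
SAMPLE = (
    "<html><head><style>h1{color:red}</style>"
    "<script>document.write('hidden');</script></head>"
    "<body><h1>All About Cows</h1>"
    "<img src='cow.jpg' alt='A brown cow'>"
    "<p>Cows eat grass.</p></body></html>"
)
print(text_only(SAMPLE))  # All About Cows A brown cow Cows eat grass.
```

If important content is missing from this output, it is probably being supplied by a script, a Flash movie, or an image without alt text.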