Key Terminology: What is Information Architecture?
Information architecture (often referred to by the acronym “IA”) is the structure of a system, or how information is categorized, linked together, and made navigable by system users. In relation to search engine optimization, the information architecture of a website enables a search engine crawler to find and index individual web pages, or restricts it from doing so. In essence, IA sets the foundation for optimization, because you must be indexed before you can ever begin to hope or plan to rank.
Warning Signs That Your IA Is Acting As A Barrier to Search Engine Crawlers
1. Core site navigation not crawlable
2. Unclear hierarchy and text links
A clear hierarchy means a clear categorization of content levels from top to bottom (e.g., Home page > Widgets > Blue Widgets > Small Blue Widgets). Make sure that you link to each of these pages using text links and use Heading tags (H1, H2, etc.) on each page level.
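One way to keep the hierarchy visible as plain text links is a breadcrumb trail. The sketch below is only an illustration: the function name `breadcrumb_html` and the example paths are assumptions, not something the guidelines prescribe.

```python
from html import escape


def breadcrumb_html(trail):
    """Build a breadcrumb of plain text links.

    trail is a list of (label, url) pairs ordered from the top level
    down to the current page. Labels and URLs here are hypothetical.
    """
    links = [
        '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))
        for label, url in trail
    ]
    # "&gt;" renders as ">" between the levels of the hierarchy.
    return " &gt; ".join(links)


print(breadcrumb_html([
    ("Home", "/"),
    ("Widgets", "/widgets/"),
    ("Blue Widgets", "/widgets/blue/"),
]))
```

Because every level is an ordinary `<a>` element, a crawler can follow the trail from any page back up to the home page.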
3. Too many broken links or 404 pages
Having too many broken links within your site’s navigation or on the pages themselves is an unhealthy indication of the site’s overall crawlability. Use Google Webmaster Tools to help identify broken links on your site.
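If you want a quick check of your own alongside the webmaster tools, a minimal sketch might look like the following. It assumes you supply a `get_status` function (hypothetical here) that returns the HTTP status code for a URL, so the link extraction can be tried without touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_broken_links(page_url, html, get_status):
    """Return (url, status) pairs for links answering with 400 or above.

    get_status is a caller-supplied function (an assumption of this
    sketch) that takes an absolute URL and returns its status code.
    """
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        status = get_status(absolute)
        if status >= 400:
            broken.append((absolute, status))
    return broken
```

In practice `get_status` could wrap whatever HTTP client you already use; a dictionary lookup is enough for testing.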
4. Website designed all in Flash or consisting of all images
Search engines have trouble reading (crawling) content developed in Flash or trapped within an image. Since content is one of the most important ranking factors for a website, it makes sense to ensure search engines can crawl and index it.
5. Too many parameters or variables in dynamic URLs
Google recommends two or three at the most. Google’s Webmaster Guidelines explain it best: “If you decide to use dynamic pages (i.e., the URL contains a ‘?’ character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.”
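To spot URLs that carry too many parameters, a small check like this can be run over a list of your URLs. The limit of 3 reflects the upper end of the recommendation above; the function names are illustrative.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_PARAMS = 3  # assumed threshold, taken from the "2-3 at the most" advice


def count_query_params(url):
    """Return the number of query-string parameters in a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def has_too_many_params(url, limit=MAX_PARAMS):
    """True when the URL carries more parameters than the limit."""
    return count_query_params(url) > limit
```

For example, `has_too_many_params("http://example.com/p?a=1&b=2&c=3&d=4")` flags the URL, while a URL with two parameters passes.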
6. Using session IDs in URLs
Session IDs are unique to each visitor and crawler, so putting them in URLs produces many possible URLs containing the same content. Duplicate content is not ideal for search engines. They need to be able to come back and visit the same URL each time they crawl your site. The historical values of a site’s URLs are important to search engine optimization.
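A rough sketch of collapsing session-specific URLs into one stable URL is shown below. The parameter names in `SESSION_PARAMS` are assumptions; check which names your own platform actually uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names; adjust to match your platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two visitors who arrive with different `sid` values end up at the same URL, which is the one you want crawlers to see on every visit.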
7. Errors in the Robots.txt file
Make sure you have a simple robots.txt file available to the search engines, which check for this file before crawling your site to see whether you want certain areas of the site blocked from being indexed. If you already have a live robots.txt file, review it and ensure that you are not blocking areas of the website that should be indexed. More information about robots.txt files is available at www.robotstxt.org.
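One way to review a robots.txt file is to test it against a list of URLs you expect to be reachable. The sketch below uses Python's standard `urllib.robotparser`; the rules in `ROBOTS_TXT` and the example URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /admin/
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


important_pages = [
    "http://example.com/widgets/blue",
    "http://example.com/admin/login",
]
print(blocked_urls(ROBOTS_TXT, important_pages))
```

If a page you want indexed shows up in the returned list, the corresponding `Disallow` rule is worth a second look.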
8. Not installing 301 redirects after moving or renaming site pages
If you go through a site redesign (and your URLs have changed) or if you change domain names, you need to let the crawlers and visitors to your site know what the new URLs are. Do this by placing 301 redirects from the old URLs to the new ones.
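How you install redirects depends on your server, so treat the following as a sketch of the idea rather than a recipe: a small WSGI application that answers old paths with a 301 and the new location. The paths in `REDIRECTS` are hypothetical.

```python
# Hypothetical old-to-new path map for a redesigned site.
REDIRECTS = {
    "/products/blue-widgets.html": "/widgets/blue/",
    "/about-us.html": "/about/",
}


def lookup_redirect(path):
    """Return the new path for a moved page, or None if it has not moved."""
    return REDIRECTS.get(path)


def redirect_app(environ, start_response):
    """Minimal WSGI app: 301 for moved paths, 404 for everything else."""
    target = lookup_redirect(environ.get("PATH_INFO", ""))
    if target is not None:
        # 301 tells crawlers and browsers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

Keeping the map in one place makes it easy to confirm that every old URL has a destination before the new site goes live.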
9. Offering duplicate pages to users and crawlers
Shopping carts and content management systems (CMS) often do this by accident. Offering session IDs in URLs, printable versions of pages, archived pages, and duplicate shopping cart pages (blue widgets and red widgets could essentially contain the exact same copy) is not effective. When you do this, you leave it up to the search engines to decide which duplicate page to index, if any. It is much better to either use canonical tags or make the URLs unique.
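As a rough illustration of finding duplicates and pointing them at one preferred URL, the sketch below groups pages whose text is identical after whitespace and case are normalized, then builds a canonical tag. The function names and example URLs are assumptions; real duplicate detection usually needs looser matching than an exact hash.

```python
import hashlib
from collections import defaultdict
from html import escape


def group_duplicates(pages):
    """Group URLs whose page text is identical after normalization.

    pages maps URL -> body text. Returns a list of URL lists, one per
    group of duplicates.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def canonical_tag(url):
    """Build the <link> element that names the preferred URL."""
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))
```

Each page in a duplicate group would then carry the same canonical tag in its `<head>`, naming whichever URL you prefer to have indexed.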
10. Using frames of any kind
Frame-based web design was very popular before search engines became important. A framed page references other files (header, footer, body, sidebars, etc.) and pulls them together into a fully constructed page. The problem is that search engines usually try to index each of those pieces separately, so the pieces provide little value from an SEO perspective. If the search engines can’t see all the content together, already assembled, your site may not rank as well as it could.
11. Password-protected or login pages blocking access to content
Search engines cannot find content that is blocked using password protection of any kind. If you want that content indexed, then remove the protection.
12. Invalid source code
It is not absolutely necessary to have valid code to rank well; in fact, some of Google’s own pages are not completely valid. Still, try to keep your site’s markup as valid as common web design practice allows. Use a free tool such as the W3C Markup Validation Service (http://validator.w3.org/) to check it.
13. Paginated pages seen as duplicates
Many larger sites “paginate” (i.e., divide content/links into multiple pages) to make it easier for users to find what they are looking for. This can lead search engines to see the “paginated” pages as duplicates of one another. Ensure you use unique titles, descriptions, and product snippets on each of these pages.
14. Excessive amounts of hard-to-crawl code
15. No sitemap in sight
Sitemaps are the easiest way for search engines to find all the content on your site. You should create a sitemap that is visible to your end users as well as an XML-based one. You can then submit the XML-based sitemap to the webmaster tool services for each search engine. Learn more on proper XML sitemap protocol at http://sitemaps.org/protocol.php.
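For a sense of what the XML version looks like, here is a minimal sketch that builds one from a list of URLs. It emits only the required `urlset`, `url`, and `loc` elements from the sitemaps.org protocol; the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    "http://example.com/",
    "http://example.com/widgets/",
]))
```

The resulting text can be saved as a file such as sitemap.xml and submitted through each search engine's webmaster tools.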
16. Long page load times
Search engines are getting better at reviewing website load time, and are factoring faster loading pages into algorithmic considerations for determining rank. Review your site’s page speed with the help of these free tools from Google: http://code.google.com/speed/tools.html.
Want to learn more? Here are three great resources for continuing your education on improving information architecture:
1. Individual Search Engine Webmaster Tools – Search engines have become more open to helping webmasters and now provide valuable feedback on issues they find with your website, all for free. Get data about crawling, indexing, and search traffic. Receive notifications about problems on your site.
2. Information Architecture Tutorial at Webmonkey (www.webmonkey.com/2010/02/information_architecture_tutorial)
3. Web Design from Scratch at www.webdesignfromscratch.com/website-architecture/
Image: Nuts & Bolts Still Life from Shutterstock