Synopsis — As the concluding part of his series for Search Marketing Standard on faceted search, Jaimie Sirovich contributed an article to the Summer 2010 issue, detailing three major hazards that those choosing facets for navigational and SEO purposes need to be aware of, and how to deal with them. For those unfamiliar with the term, faceted search is defined in Wikipedia thus: “Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information. A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order. Each facet typically corresponds to the possible values of a property common to a set of digital objects.” For example, if you are selling cameras on your website, the facets that might be included in your navigation are color, brand, price, lens type, etc. Although mainly a concern for sites with large inventories that cross many categories, faceted search can also teach smaller sites much about the logical and effective classification of smaller inventories for potential customers and search engine optimization.
In this article, Jaimie follows up his two earlier posts with an analysis of the major problems one has to face when venturing down the faceted navigation path. First amongst these is a consideration of whether facets in fact provide an opportunity or are just a potential problem waiting to happen, due to issues of spider traps and duplicate content ramifications. Second, one must be prepared to deal with the massive amount of data needed to make sense of such navigation and to address the problem of user intent (e.g., when a user enters “red” into the search box, are they looking for a red camera or one that has red eye reduction capabilities?). Finally, he addresses the issue of how faceted navigation can impact matters such as site speed and hosting/server capabilities and costs.
Backed by extensive experience in implementing faceted navigation for a number of sites with large and varied inventory, Jaimie not only can attest to the problems, but also offers concrete solutions from real-world examples. The problems that can arise from presenting a varied inventory of products online may range from issues of usability by potential customers to how search engines themselves interpret the situation. It’s a complex problem that provides lessons not only for the Amazons and eBays of the internet, but anyone presenting more than a few items for sale to the online customer.
The complete article follows …
Faceted Search: Hazards Lie Ahead For Ecommerce
In a blog post on the Search Marketing Standard site (“Facets As A Navigational & SEO Powerhouse,” March 30, 2010), I discussed the academic roots and benefits of faceted navigation, which does seem to be a universally positive user experience. Five of the most important benefits include:
1. Facets improve findability — In searching for a camera, a user may find something that is [Yellow], [10 megapixel], and from [Sony] quickly and intuitively by clicking just three times.
2. Facets eliminate frustration — Users may select only valid combinations of properties, so will not be faced with a dead-end of zero products. If [Sony] has nothing in [Yellow], that choice will not be shown or otherwise disabled after the selection of [Sony].
3. Facets provide a guided means to navigate in any order — Users may believe [Yellow] is more important than [10 megapixel] and click accordingly. This contrasts with rigid, category-based schemes, where the structure of categorization presumes it knows how a user will navigate.
4. Facets remove noise from irrelevant results — Combined with keyword search, filtering by any facet (such as [$1000-$1500] or [Sony]) may zero in better on desired products.
… and, most relevant to search marketers …
5. Facets provide relevant landing pages for long-tail keywords, just as category-based navigation has always done — A [megapixels] facet could combine with categories “Film SLRs,” “Digital SLRs,” as well as “point and shoot cameras.” This helps catch the “X megapixel + camera-type” queries, while assisting search engines in providing users with exactly what they seek — relevant information.
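The second benefit, offering only valid combinations, can be illustrated with a minimal sketch. The catalog, attribute names, and function names below are hypothetical and exist only for illustration; a production system would compute the same counts in its search or database layer.

```python
from collections import Counter

# Hypothetical catalog; attribute names and values are illustrative only.
PRODUCTS = [
    {"brand": "Sony", "color": "Black", "megapixels": "10"},
    {"brand": "Sony", "color": "Silver", "megapixels": "12"},
    {"brand": "Canon", "color": "Yellow", "megapixels": "10"},
    {"brand": "Canon", "color": "Black", "megapixels": "12"},
]


def matching(products, selection):
    """Return products whose attributes match every selected facet value."""
    return [
        p for p in products
        if all(p.get(facet) == value for facet, value in selection.items())
    ]


def remaining_options(products, selection):
    """Count the values still available in each facet not yet selected."""
    options = {}
    for product in matching(products, selection):
        for facet, value in product.items():
            if facet not in selection:
                options.setdefault(facet, Counter())[value] += 1
    return options


options = remaining_options(PRODUCTS, {"brand": "Sony"})
print(dict(options["color"]))  # Yellow is absent: Sony has nothing in Yellow
```

Because only values with at least one matching product are returned, a user who has chosen [Sony] is never offered a choice that leads to zero products.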
But are there search-marketing-related hazards associated with facets? The answer is easy: yes.
1. Facets And SEO: Opportunity Or Problem?
Exposing some facet-based pages creates effective landing pages for longer tail external-search queries that categories can only address in contrived ways. However, the emphasis is on “some” pages. Without precautions, faceted navigation creates spider traps with seemingly infinite permutations of similar products. Some implementations attempt to address this by excluding all of their facet-based pages from search engines. This is clearly not ideal, as that dismisses many organic opportunities.
Having consulted on various facet-related duplicate content problems, I have found that the majority of the duplicate content derives from facets applied in various orders, or from multiple value selections within facets. For example, let’s say our camera shopper wants both [Sony] and [Yellow]. A faceted navigation URL might appear in one of two ways:
cameras.html?facets=color:YELLOW_brand:SONY
cameras.html?facets=brand:SONY_color:YELLOW
We now have two paths to substantially similar content. It’s easy to see that this will grow factorially worse as the number of selections and options increases. Set operations are commutative: “yellow, Sony digital cameras” is equivalent to “Sony, yellow digital cameras.” Further complications arise when multiple value selections are permitted:
digital-cameras.html?facets=color:YELLOW,BLUE,RED_brand:SONY
cameras.html?facets=color:BLUE,RED,YELLOW_brand:SONY
This results in a big mess of URLs — a spider trap. The order of application may be desirable for presenting breadcrumb navigation elements, but it’s a duplicate content disaster waiting to happen. Search engines are not interested in the order of selection, and it would be wise not to depend even on the vast intelligence of Google engineers to infer intent of these URLs.
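To make the factorial growth concrete, the following sketch enumerates every click order for a single set of selections. The `megapixels` facet and the function name are assumptions added for illustration; the URL format follows the examples above.

```python
from itertools import permutations

# Hypothetical single-value selections, written in the URL scheme shown above.
selections = {"color": "YELLOW", "brand": "SONY", "megapixels": "10"}


def order_variants(page, selected):
    """Build every URL that differs only in the order the facets were applied."""
    pairs = [f"{facet}:{value}" for facet, value in selected.items()]
    return [f"{page}?facets=" + "_".join(order) for order in permutations(pairs)]


urls = order_variants("cameras.html", selections)
print(len(urls))  # 3 facets -> 3! = 6 URLs for one result set
for url in urls:
    print(url)
```

Three selections already yield six URLs for an identical result set, and a fourth would yield twenty-four. Allowing multiple values within a facet, in any order, multiplies the count further.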
The most straightforward solution may seem to be employing that ever-popular rel="canonical" tag. However, on a multi-thousand product site, that will not scale. The raw number of pages that yield the same result will still frustrate a robot. Even a reasonably sized product set with 3-4 facets and many values will unwittingly create a spider trap of thousands of pages. Instead, the solution is eliminating duplicate content entirely by the following:
- Present the facet parameters in a predictable order regardless of click order (e.g., color, then brand, then price)
- If a breadcrumb with the facet values in the selected order is needed, use a session-based method to store the order in which the selections were made
- If multiple value selections within a facet are permitted, exclude all URLs with multiple values, perhaps by adding a parameter and using robots.txt to exclude it (cameras.html?facets=color:BLUE,RED,YELLOW_brand:SONY&exclude=yes)
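The first and third rules above can be sketched as a single URL builder. This is only one possible approach under stated assumptions: the function name, the default facet order, and the choice to sort values alphabetically are illustrative, while the URL format and the `exclude=yes` parameter follow the examples in this article.

```python
def canonical_facet_url(page, selected, facet_order=("color", "brand", "price")):
    """Build one URL per result set, regardless of the order of clicks.

    `selected` maps a facet name to the list of values chosen for it.
    Facets appear in `facet_order`; unknown facets follow alphabetically.
    """
    known = [facet for facet in facet_order if facet in selected]
    extra = sorted(facet for facet in selected if facet not in facet_order)

    parts = []
    has_multiple_values = False
    for facet in known + extra:
        values = sorted(set(selected[facet]))  # fixed value order, no repeats
        if not values:
            continue
        has_multiple_values = has_multiple_values or len(values) > 1
        parts.append(f"{facet}:{','.join(values)}")

    if not parts:
        return page
    url = f"{page}?facets=" + "_".join(parts)
    if has_multiple_values:
        # Marker parameter so that robots.txt can exclude multi-value pages.
        url += "&exclude=yes"
    return url


print(canonical_facet_url("cameras.html", {"brand": ["SONY"], "color": ["YELLOW"]}))
print(canonical_facet_url("cameras.html", {"color": ["YELLOW"], "brand": ["SONY"]}))
print(canonical_facet_url(
    "cameras.html", {"brand": ["SONY"], "color": ["YELLOW", "BLUE", "RED"]}))
```

The first two calls print the same URL, so click order no longer creates a second path to the same content. The third carries the marker parameter, which a robots.txt rule can then match; the exact pattern syntax depends on what the targeted crawlers support.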
Bots will no longer get dizzy while the user experience remains equivalent. Those who pay attention to these concerns will be rewarded accordingly — yet another example of leveraging data in search marketing, because, after all, it’s all about the data.
2. Facets And Data: It’s All About The Data
One of the beauties of keyword-based search is that implementation requires very little data massaging to get started. Simply place relevant copy in the search component, feed combinations of keywords from the user to the black box, and then hope for the best. Most implementations use relative frequencies and/or distance between keywords in each product as a cue to decide which keywords and records are most important. For example, “Sony” might appear less often than “Red,” and therefore receives more weight. If the words are juxtaposed, the score will reflect that. However, these are approximations, and even a well-executed internal keyword search will typically present some noise. Important queries are easily hand-coded or tweaked, but this does not scale. Keyword search does not know user intent, and does nothing to further aid the user in the decision-making process.
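One common way to turn relative frequency into a weight is inverse document frequency, sketched below. This is an assumption about how such a black box might work rather than a description of any particular product, and the product copy and function name are made up for illustration.

```python
import math

# Hypothetical product copy; a real catalog would be far larger.
PRODUCT_COPY = [
    "sony digital camera red",
    "canon digital camera red eye reduction",
    "nikon digital camera red eye reduction",
    "canon camera bag red",
]


def keyword_weights(documents):
    """Weight each keyword by inverse document frequency: rarer words score higher."""
    total = len(documents)
    doc_freq = {}
    for doc in documents:
        for word in set(doc.split()):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(total / count) for word, count in doc_freq.items()}


weights = keyword_weights(PRODUCT_COPY)
print(weights["sony"] > weights["red"])  # True: "sony" is rarer than "red"
```

Note that the weight for “red” is the same whether the word describes a color or belongs to “red eye reduction,” which is precisely the limitation on user intent described above.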
It’s roughly accurate to consider a keyword search as a single facet containing all keyword values. But with only one facet, search does not know whether “red” is actually the color “red” or “red eye reduction.” Knowing user intent for various keywords would increase search precision. Facets accomplish this.
[Figure: facet controls showing a color value, “Blue,” alongside a checked feature option, “[x] Red Eye”]
Unfortunately, this entails the use of structured data that may not be readily available, whether provided by manufacturers, purchased, or derived from prose via extraction tools. The data must also be combined from the various sources, and one must also take care to normalize redundant values such as “Red” vs. “Ruby Red,” a source of both duplicate content and user frustration. Neither users nor bots are interested in the slightly different names or inflections of names. For example, a classification query for a facet data-extraction tool might look like this:
color,bezel_color:(‘maroon’ red$)
This roughly means “exactly maroon, or anything that ends in red,” taken from either the color or the bezel_color attribute.
This is more manageable than manual work classifying thousands of SKUs into various facet attributes if the data are not available, but still involves some sweat. Some very complex edge cases even require the use of a more complex script.
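As a rough illustration of what such a rule does, the classification query above can be approximated with a regular expression. The query syntax belongs to the extraction tool; the Python names and the sample records here are hypothetical.

```python
import re

# Rough equivalent of the rule: exactly "maroon", or anything ending in "red".
RED_RULE = re.compile(r"^maroon$|red$", re.IGNORECASE)
SOURCE_ATTRIBUTES = ("color", "bezel_color")


def classify_red(product):
    """Return the normalized value "Red" if either source attribute matches."""
    for attribute in SOURCE_ATTRIBUTES:
        value = (product.get(attribute) or "").strip()
        if RED_RULE.search(value):
            return "Red"
    return None


print(classify_red({"color": "Ruby Red"}))      # Red
print(classify_red({"bezel_color": "Maroon"}))  # Red
print(classify_red({"color": "Blue"}))          # None
```

A rule this simple also hints at the edge cases: a value such as “Multicolored” ends in “red” too, so real data usually needs exceptions or a more elaborate script.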
3. Facets And Complexity: Be Quick!
Facets are an occupational hazard for any business employing a simple ecommerce platform, or relying on inexpensive shared web hosting. The implementation is difficult, and few platforms currently have a competent faceted navigation feature. For various reasons, the computational resources associated with facets vastly outstrip what any web host is actually willing to provide for $9.95/month, regardless of lofty promises of infinite bandwidth, support, etc. The presence of site speed itself as a search engine optimization factor further muddies the waters.
In a relational database, and certainly the ubiquitous MySQL, facet data must be stored in what’s called an “open” model, because of the unknown number of facets and variance across the range of products. Some products have a color, while color is not relevant to others. Digital cameras have a “number of megapixels” attribute, but refrigerators do not. If a business sells both, this requires more preparation and computational resources.
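One way to picture an “open” model is a table with one row per product, facet, and value. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table, column, and function names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (product, facet, value): products carry only relevant facets.
    CREATE TABLE product_facets (
        product_id INTEGER NOT NULL REFERENCES products(id),
        facet TEXT NOT NULL,
        value TEXT NOT NULL
    );
    CREATE INDEX idx_facet_value ON product_facets (facet, value, product_id);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    (1, "Compact camera"), (2, "Digital SLR"), (3, "Refrigerator"),
])
conn.executemany("INSERT INTO product_facets VALUES (?, ?, ?)", [
    (1, "color", "Yellow"), (1, "megapixels", "10"),
    (2, "color", "Black"), (2, "megapixels", "10"),
    (3, "color", "White"),  # no megapixels row: the facet does not apply
])


def products_matching(conn, selection):
    """Return names of products that carry every selected facet value."""
    if not selection:
        rows = conn.execute("SELECT name FROM products ORDER BY name")
        return [row[0] for row in rows]
    clauses = " OR ".join("(f.facet = ? AND f.value = ?)" for _ in selection)
    params = [item for pair in selection.items() for item in pair]
    sql = f"""
        SELECT p.name FROM products p
        JOIN product_facets f ON f.product_id = p.id
        WHERE {clauses}
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT f.facet) = ?
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, params + [len(selection)])]


print(products_matching(conn, {"megapixels": "10"}))
print(products_matching(conn, {"megapixels": "10", "color": "Yellow"}))
```

Every additional selection adds another condition against the same large table, and the grouping step must run on each page view, which suggests why the model demands more preparation and computational resources than a fixed set of columns.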
Proprietary database components exist to help and ease some of the hazards. However, since MySQL is the only commodity database generally available and affordable to the average business, few of these options are actually viable — and none a magic bullet. A tweaked MySQL installation will do the job, with the advantage of making use of generally available software that works in many environments.
One might succeed with a few hundred products without attention to technology, but facets shine the most in product databases with thousands of products. The result is either extremely expensive and/or slow queries, or the need for a lot of storage for a “warehouse” that speeds up most queries by storing data in a simplified result set. Either will make you an extremely unpopular neighbor in shared hosting, and/or requires maintenance beyond the grasp of most. The computation required for counting products within a particular selection could bring a busy shared database server to its knees for a few milliseconds, even if done in a mostly optimized fashion. Moore’s law won’t solve the entire problem. Facet functionality can be computationally intensive regardless of implementation.
4. So Where Does This Leave You?
If one has the wherewithal and the capital, facets present a great opportunity. Even as the various ecommerce platforms scramble to adopt some form of this navigation, they will require more computational power than ever before just to provide navigation. Most do not provide the tools required to massage data not provided in the ideal format from manufacturers, which may manifest itself as additional unanticipated work. Dedicated vendors for faceted navigation solutions exist, such as Endeca, as do custom development houses. Both types have invested the R&D to solve the facets problem, betting that it will increasingly become a requirement.
Users may not be able to identify navigation as “faceted,” but they will certainly bounce away if they cannot find something as easily as the next site. The barriers to entry and competitiveness in the ecommerce space are increasingly fierce. Paying careful attention to a proper facets implementation will help users find what they want — and the SEO concerns therein will yield a marketing edge. So stop dragging your feet — and start responding.