Bill Slawski is the Director of Internet Marketing for Key Relevance, Inc., and writes about search engines and search-related patents on his blog at seobythesea.com. With an undergraduate degree in English from the University of Delaware, and a Juris Doctor degree from Widener University School of Law, he began promoting web pages in 1996.
SMS: Bill, you are considered to be a “patent guru” by most experts in the search marketing industry. How did you come to develop such a unique approach to search engine marketing research, and how does this fit into the services you provide for your clients?
Bill: My interest in search-related patent filings came from a desire to try to get as much information as possible directly from the search engines themselves, and to try to understand their perspective on search and the web. I dislike the term “guru,” because it’s misleading. While I often write at my blog about the patent filings that I find, I urge people to read the documents themselves and form their own opinions about what those documents say. I look at patent filings because they are primary sources of information that haven’t been filtered by opinion or folklore.
Search engines do have trade secrets that aren’t revealed in patent applications, and many patents that are filed cover processes and methods that may never be developed, but it’s possible to glean some insights into the assumptions and thought processes that go into the development of a patent. It’s also pretty exciting to see a new program or method come out from one of the search engines that had been described in a patent application published days or months before.
I do consider looking through search-related patents as a due diligence requirement for providing services to clients, in that the information contained in those patents is publicly available, and they come straight from the search engines themselves. The act of reading patents often raises more questions than answers, but knowing that those questions exist can be helpful.
For instance, a couple of recent patent filings from Microsoft explored a number of ways that they might rank images for image search, and create an image score that could influence the rankings of web pages in web search. It’s impossible to say how many of the processes they described are actually being used today, but the patent filings provided some insights and questions to explore about how search engines might view images, beyond just looking at alt text and text that might be associated with pictures.
SMS: We’ve seen blended search. We’ve seen some personalization. Based on the patent filings you have seen, what are some of the new features search engines might have in store for us in the not-so-distant future?
Bill: Some interesting features that I’ve seen described in search-related patents? Well, one search feature that I thought was pretty interesting was an “inversion search” from Microsoft that would allow you to enter the URL for a web page, and see and search upon keywords that are related to that page. I think this would be pretty helpful to searchers who don’t know much about a topic, but wanted to explore it more deeply.
Another recent feature, from Google, is the creation of a database containing information found from different tables on the web as part of their WebTables project. Many web pages contain tables filled with data on different subjects – all the way from baseball statistics, to scientific data, to historical information, and many other topics. If you’re doing research, and you want facts and figures to back up that research, being able to find that kind of data might be very helpful, but may also be very difficult through traditional search.
Google has also recently presented their vision of an advanced local search for mobile devices that could use your changing position to update distances to locations and display information from specialized templates you select to show – for example – real estate prices in an area, cost of living information for those locations, weather and traffic data, fuel costs, political boundaries, and more.
Another mobile program – this one from Yahoo! – would allow you to create user-defined private maps, that you could share with friends, to help locate each other on those maps. This is a feature that could be useful to members of a group in places like resorts, shopping centers, and city locations. You would also be able to access tags that people leave about specific locations, and local search information relevant to the areas you visit.
The patents and white papers from the major commercial search engines are filled with many other possibilities – some will likely be developed, while others will remain intellectual property on paper only. Some interesting times are ahead for us from projects being worked on by the search engines.
SMS: Of course, every search marketer out there is interested in having their website indexed and ranked well. Right now, we know that factors like internal website structure, backlinks, and the age of a website are important in ranking. From what you have seen in the patents, what other factors can we expect to be added to the mix?
Bill: There are a number of different ways to classify ranking algorithms from search engines, but one that I find helpful is to break them down into link-based, content-based, and user-behavior-based signals. I think using this kind of classification can help us identify some of the new ranking factors that search engines may be adding to rank pages. The lists below include some signals that search engines may be looking at now, as well as others they may be looking at in the future.
Link-based signals – begin by looking at the number and importance of links between pages, but also consider:
- the age of links pointing to pages
- the frequency of growth and loss of links to a page
- the number of broken or redirected links on pages
- the use of anchor text in links
- the use of “related” anchor text in links, as determined by how frequently the text within those anchors co-occurs with query terms that the page may be found for in searches at the search engine
- the genre of sites that links originate from (such as blogs or news articles or web pages)
- the age of domains that incoming links originate from
- the number of links that come from pages that might be included in the top results from a search for a certain query
Content-based signals – begin by looking at the words that appear upon pages, but also consider:
- where those words appear within the layout of a page
- how those words are formatted through HTML
- a reading level score for a page
- spelling and grammar and sentence structure
- whether “related” words and phrases appear on the same page, as defined by how frequently those words co-occur with the query terms on other pages upon the web
- how facts about specific people and places and things are formatted and presented
- the “freshness” of news results
- what kinds of features and meta information about images might exist on the pages
- the rate of change of content upon a page and site
The content a search engine examines might begin to focus to a finer level of granularity to look at individual segments of pages, or expand out to a greater range to explore how related the content of a page might be to a whole site or to a set of inter-related sites.
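The “related words” idea from the content-based list can be pictured with a small sketch. This is only an illustration under stated assumptions, not a method taken from any patent filing: the toy corpus is made up, and the function names `cooccurrence_counts` and `related_score` are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(pages, query_term):
    """Count how many pages each word shares with query_term."""
    counts = Counter()
    for text in pages:
        words = set(text.lower().split())
        if query_term in words:
            # Count each co-occurring word once per page.
            counts.update(words - {query_term})
    return counts


def related_score(page_text, query_term, counts):
    """Sum the co-occurrence counts of the distinct words on one page."""
    words = set(page_text.lower().split()) - {query_term}
    return sum(counts[w] for w in words)


# Made-up corpus standing in for "other pages upon the web".
corpus = [
    "reds baseball cincinnati pitcher",
    "reds baseball season tickets",
    "reds cincinnati stadium",
    "garden tomatoes season",
]
counts = cooccurrence_counts(corpus, "reds")

# A page that uses words often seen alongside the query term scores
# higher than one that only mentions the term itself.
on_topic = related_score("reds baseball cincinnati schedule", "reds", counts)
off_topic = related_score("reds wine pairing guide", "reds", counts)
```

A real system would presumably normalize for word frequency and page length; the sketch leaves that out to keep the co-occurrence step visible.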
User-behavior-based signals – consider how people use websites by looking at such things as click-throughs in SERPs for specific queries and during query sessions covering visits to multiple pages, but also consider:
- query refinements by searchers
- bookmarking of pages
- tagging of pages
- browsing activity, including how long someone spends upon a page, how far they scroll down a page, and where they move their mouse pointer
- selection and use of alerts
- subscription to RSS feeds
- searching activity in vertical searches such as maps or news or images
- rankings and ratings of businesses offering goods and services
- sentiment analysis of reviews
While some search patent filings describe possible signals that might play a role in delivering the right pages to the right people and meeting a searcher’s intentions based upon a small number of words typed into a search box, it really is difficult to state with any certainty what might be added to the mix.
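As a way to keep the three classes of signals straight, here is a minimal sketch that stores one score per class and blends them. Everything in it is an assumption made for illustration: the `PageSignals` type, the `combined_score` function, the weights, and the numbers are hypothetical, and nothing in the patent filings says the signals are combined as a simple weighted sum.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    link: float      # e.g. number, age, and anchor text of links
    content: float   # e.g. word placement, reading level, freshness
    behavior: float  # e.g. click-throughs, bookmarks, time on page


def combined_score(signals, weights=(0.5, 0.25, 0.25)):
    """Blend the three signal classes; the weights are arbitrary."""
    w_link, w_content, w_behavior = weights
    return (
        w_link * signals.link
        + w_content * signals.content
        + w_behavior * signals.behavior
    )


# Made-up scores for two hypothetical pages.
pages = {
    "page-a": PageSignals(link=0.8, content=0.4, behavior=0.4),
    "page-b": PageSignals(link=0.2, content=0.9, behavior=0.9),
}

# Order page names from highest to lowest combined score.
ranking = sorted(pages, key=lambda name: combined_score(pages[name]), reverse=True)
```

Changing the weights changes the order, which is one reason it is hard to say from the outside how much any single signal matters.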
SMS: There has been a lot of talk about making search results more social, harnessing the “wisdom of the crowd.” What are some of the less-explicit ways search engines might be collecting user behavior in determining the popularity of a given page or a website? How might this affect the way we optimize websites?
Bill: Once search engines stopped looking only at the words that appeared upon web pages, and started looking at things such as the links to pages, they began to become social, harnessing the wisdom of the crowd that published links to the web. Early PageRank patents and papers do mention that actual user behavior could possibly play a role in how web pages are ranked by a search engine.
Search engines collect a lot of information about searchers and searching from query sessions found in their log files, from tracking browsing behavior with toolbars, from search and web histories collected when someone is logged into personalized search, from data purchased from ISPs, from profiles created on social sites, and in many other ways.
Search engines may create profiles about individual searchers from all of these sources, and about groups of searchers that share some common interests, but they may also create profiles for specific websites and for specific query terms.
Profiles for sites could be developed from analytics tools, from appearances in search results for specific terms, from visitor interactions as measured through toolbars, personalized web histories, bookmarks or other annotations, and in other ways.
Profiles for query terms might be developed by looking at the kinds of results showing up for those terms, how fresh or old those results might be, how often the terms appear in searches, what kinds of pages are selected during those searches, how the terms might be changed or refined by a searcher during a search session, etc.
The profiles developed for searchers (or groups of searchers) that may appear to share some common interests, for websites, and for query terms could play a role in which pages are presented to different searchers. For example, if a search engine knows a specific searcher likes baseball and lives in Ohio, when he or she types in a search for the word “Reds,” the search engine may assume they are more likely looking for information about the Cincinnati Reds than about communism. Therefore, they may be shown search results for web pages, blogs, and newspaper articles that other searchers interested in baseball and located in Ohio have chosen previously when typing that query into the search engine.
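The “Reds” example can be sketched in a few lines. This is a simplified illustration, not how any search engine is known to work: it assumes each result already carries topic labels and that a searcher's profile is just a set of interests. The `rerank` function, the labels, and the result titles are all hypothetical.

```python
def rerank(results, profile_interests):
    """Put results whose topics overlap the searcher's interests first.

    Python's sort is stable, so results with the same overlap keep
    their original relative order.
    """
    def overlap(result):
        return len(set(result["topics"]) & profile_interests)

    return sorted(results, key=overlap, reverse=True)


# Made-up results for the query "Reds", in their original order.
results = [
    {"title": "History of the Red Scare", "topics": ["politics", "history"]},
    {"title": "Cincinnati Reds schedule", "topics": ["baseball", "ohio"]},
    {"title": "Red wine guide", "topics": ["food"]},
]

# Hypothetical profile for a searcher who likes baseball and lives in Ohio.
profile = {"baseball", "ohio"}

ranked_titles = [r["title"] for r in rerank(results, profile)]
```

With this profile, the baseball result moves to the top while the other two keep their original order. A searcher with no matching interests would see the list unchanged.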
As far as optimizing a site when search engines may be considering user behavior more, it still helps to know something about the audience for your site, what they are interested in, and what words they will likely choose to find your site.
SMS: Let’s talk about social media. Clearly, social media profiles hold a lot of valuable information that can be used in refining search results. What are some of the ways you believe that search engines like Google and Yahoo! are trying to incorporate these “social graphs” into SERPs?
Bill: Looking at social media profiles is only a part of developing search around the interests of people who use a search engine, and those profiles can be a noisy source of information. Whenever you collect information that has been developed for one use, and try to use it for another, you run the risk of misapplying that information. For instance, when someone tags a page in a social bookmarking setting, the words that they use (a “toread” tag, for example) may have more to do with the relationship between them and the page than with the content of the page itself.
It may be more beneficial for a search engine to look at the actions of an individual to learn about their interests than to try to gather that information from a profile page. If someone frequently visits and searches for baseball sites, their activities may be a stronger indication of their interest in baseball than a listing of that interest in a MySpace profile.
Patent filings from Google, Microsoft, and Yahoo! do describe how they might create implicit profiles for searchers (and groups of searchers) who appear to share some common interests, based upon their activities on the web, and use that information when presenting search results. These activities may involve more than just browsing and searching; they can include user reviews, annotations, bookmarks, tags, and other activities that might take place at social media sites. It’s likely that those actions are more important than stated interests on a profile page.
SMS: Many of the new search technology developments have privacy implications. How do you see the struggle between search engines and privacy advocates unfolding?
Bill: The web is forcing us to think carefully about some of our notions regarding privacy. Information that has been publicly accessible, yet hard to access, is becoming easier to look at on the web (such as deed and property information, civil and criminal case information, news from small town newspapers, etc.). Publishing through blogs and content management systems allows for individuals and businesses to place more information online than ever before. Search engines can provide access to a lot of information that might have been difficult to find previously.
Search engines are also collecting a lot of information about individuals and what they look for and browse online. While they may make some of the information that they collect about us available to view through places like Google’s Web History feature, there are limits to our ability to control what information a search engine collects about what we do online.
Organizations like The Center for Democracy and Technology, the Electronic Frontier Foundation, the Electronic Privacy Information Center, and the ACLU provide a fair amount of information on many issues surrounding privacy on the web and on practices that search engines have adopted. It’s worth spending some time with these resources.
SMS: If someone doing search marketing wants to get more involved in looking at and researching patent filings, can you give us an idea of a couple of places to check on a regular basis to keep up to date (aside from your blog of course)?
Bill: It’s not a very large niche, but I can recommend a couple of other people who do write about search-related patents: David Harry at The Firehorse Trail (www.huomah.com) and Stephen E. Arnold at Beyond Search (arnoldit.com/wordpress).