Showing posts with label search quality. Show all posts

Giving you fresher, more recent search results

Donal Trung 8:19 AM
Search results, like warm cookies right out of the oven or cool refreshing fruit on a hot summer’s day, are best when they’re fresh. Even if you don’t specify it in your search, you probably want search results that are relevant and recent.

If I search for [olympics], I probably want information about next summer’s upcoming Olympics, not the 1900 Summer Olympics (the only time my favorite sport, cricket, was played). Google Search uses a freshness algorithm, designed to give you the most up-to-date results, so even when I just type [olympics] without specifying 2012, I still find what I’m looking for.

Given the incredibly fast pace at which information moves in today’s world, the most recent information can be from the last week, day or even minute, and depending on the search terms, the algorithm needs to be able to figure out if a result from a week ago about a TV show is recent, or if a result from a week ago about breaking news is too old.

We completed our Caffeine web indexing system last year, which allows us to crawl and index the web for fresh content quickly on an enormous scale. Building upon the momentum from Caffeine, today we’re making a significant improvement to our ranking algorithm that impacts roughly 35 percent of searches and better determines when to give you more up-to-date relevant results for these varying degrees of freshness.
  • Recent events or hot topics. For recent events or hot topics that begin trending on the web, you want to find the latest information immediately. Now when you search for current events like [occupy oakland protest], or for the latest news about the [nba lockout], you’ll see more high-quality pages that might only be minutes old. 
  • Regularly recurring events. Some events take place on a regularly recurring basis, such as annual conferences like [ICALP] or an event like the [presidential election]. Even if you don’t specify it in your keywords, it’s implied that you expect to see the most recent event, and not one from 50 years ago. There are also things that recur more frequently, so now when you’re searching for the latest [NFL scores], [dancing with the stars] results or [exxon earnings], you’ll see the latest information.
  • Frequent updates. There are also searches for information that changes often, but isn’t really a hot topic or a recurring event. For example, if you’re researching the [best slr cameras], or you’re in the market for a new car and want [subaru impreza reviews], you probably want the most up-to-date information.
There are plenty of cases where results that are a few years old might still be useful for you. [fast tomato sauce recipe] certainly saved me after a call from my wife reminded me I had volunteered to make dinner! On the other hand, when I search for the [49ers score], a result that is a week old might be too old.

Different searches have different freshness needs. This algorithmic improvement is designed to better differentiate between these kinds of searches and the level of freshness you need, and to make sure you get the most up-to-the-minute answers.
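As a rough illustration only, and not a description of the actual ranking algorithm, the idea that different searches need different levels of freshness can be sketched as an age penalty whose speed depends on the kind of query. The category names, the half-life values and the functions `freshness_weight` and `score` are all assumptions made up for this sketch.

```python
# Hypothetical half-lives in hours for each kind of query. The categories
# echo the post; the numbers are illustrative assumptions only.
HALF_LIFE_HOURS = {
    "hot_topic": 6.0,
    "recurring_event": 24.0 * 7,
    "frequent_updates": 24.0 * 90,
    "evergreen": 24.0 * 365 * 5,
}


def freshness_weight(age_hours, category):
    """Exponential decay: the weight halves once per half-life of the category."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS[category])


def score(relevance, age_hours, category):
    """Scale a relevance score by the category-specific freshness weight."""
    return relevance * freshness_weight(age_hours, category)


WEEK = 24.0 * 7
# A week-old recipe keeps almost all of its score ...
recipe = score(1.0, WEEK, "evergreen")
# ... while a week-old game score is penalized heavily.
game_score = score(1.0, WEEK, "hot_topic")
```

With these made-up half-lives, a week-old page barely loses weight for an evergreen query such as a recipe, but is almost entirely discounted for a query like a live game score.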

Update 11/7/11: To clarify, when we say this algorithm impacted 35% of searches, we mean at least one result on the page was affected, as opposed to when we've said "noticeably impacted" in the past, which means changes significant enough that an average user would notice. Using that same scale, this change noticeably impacts 6 to 10% of searches, depending on the language and domain you're searching on.



(Cross-posted on the Inside Search blog)

Another look under the hood of search

Donal Trung 10:04 AM
(Cross-posted on the Inside Search blog and the Public Policy blog)

Over the past few years, we’ve released a series of blog posts to share the methodology and process behind our search ranking, evaluation and algorithmic changes. Just last month, Ben Gomes, Matt Cutts and I participated in a Churchill Club event where we discussed how search works and where we believe it’s headed in the future.

Beyond our talk and various blog posts, we wanted to give people an even deeper look inside search, so we put together a short video that gives you a sense of the work that goes into the changes and improvements we make to Google almost every day. While an improvement to the algorithm may start with a creative idea, it always goes through a process of rigorous scientific testing. Simply put: if the data from our experiments doesn’t show that we’re helping users, we won’t launch the change.



In the world of search, we’re always striving to deliver the answers you’re looking for. After all, we know you have a choice of a search engine every time you open a browser. As the Internet becomes bigger, richer and more interactive it means that we have to work that much harder to ensure we’re unearthing and displaying the best results for you.

Inside Google's search office

Donal Trung 2:11 PM
(Cross-posted on the Inside Search Blog)

I’ve been working with Matt Cutts and Ben Gomes in the same office for over 10 years. We work on search every day, and earlier this week, we took our office talk to the stage at an event hosted by the Churchill Club. Search Engine Land’s Danny Sullivan moderated our in-depth discussion on search, how it works, and what’s ahead for us in the future. We also reminisced about first joining Google, the time my car ran out of gas as Ben and I discussed a change to the algorithm, and other great memories over the years.

Come sit inside our office for a chat about Google Search:


  • To hear more about the principles that drive changes to the algorithm and how these changes are tested and implemented, go to 15:40
  • To hear the discussion on why we don’t hand-pick results, start watching at 41:04
  • For more on my vision for the future of search, jump to 1:12:28
  • Guess who Danny thinks is the brains, looks, and brawn of this operation at 1:08 (hint: I’m the brains).

Google Commerce Search 3.0: You won’t believe it’s online shopping

Donal Trung 8:00 AM
When we first introduced Google Commerce Search—our search solution for e-commerce websites—our focus was on improving search quality and speed to help online shoppers find what they’re looking for. Retailers such as Woodcraft Supply, BabyAge.com and HealthWarehouse.com implemented Google Commerce Search on their respective websites; Woodcraft increased search revenues 34 percent, BabyAge increased site searches 64 percent and HealthWarehouse saw online conversions increase 19 percent—and all have reported an increase in customer satisfaction.

Today we’re building on the capabilities that have proved useful to our retail partners with the third-generation Google Commerce Search (GCS). With this new version, we hope to help create an even more interactive and engaging experience for shoppers and retailers.



Here are some of the cool new features in GCS 3.0:
  • Search as You Type provides instant gratification to shoppers, returning product results with every keystroke, right from the search bar
  • Local Product Availability helps retailers bridge online and offline sales by showing shoppers when a product is also available in a store nearby—in-line with the search results
  • Enhanced Merchandising tools allow retailers to create product promotions that display in banners alongside related search queries, and to easily set query-based landing pages (for example, when a visitor types [shoes], they’re directed to a “shoe” page)
  • Product Recommendations (Labs) helps shoppers make purchase decisions by showing them what others viewed and ultimately bought
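To make the Search as You Type idea concrete, here is a minimal sketch of prefix lookup over a sorted list of product names. It is not how Google Commerce Search is implemented; `build_index`, `suggest` and the sample product names are hypothetical.

```python
from bisect import bisect_left


def build_index(product_names):
    """Lowercase and sort the names so that prefix matches are contiguous."""
    return sorted(name.lower() for name in product_names)


def suggest(index, typed, limit=5):
    """Return up to `limit` product names that start with the typed text."""
    prefix = typed.lower()
    if not prefix:
        return []
    matches = []
    # Binary search finds the first name that is >= the prefix.
    for name in index[bisect_left(index, prefix):]:
        if not name.startswith(prefix):
            break
        matches.append(name)
        if len(matches) == limit:
            break
    return matches


index = build_index(["Stroller", "Strap covers", "Crib", "Car seat"])
```

Calling `suggest(index, "str")` after each keystroke returns the names that begin with the text typed so far.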

Search As You Type on www.babyage.com

With this release we're also welcoming three new retail partners: Forever 21, General Nutrition Company (GNC) and L’Occitane. GNC implemented Google Commerce Search in less than a week on their mobile website, while Forever 21 and L’Occitane are currently working to implement various new features of GCS, such as Search as You Type and Local Product Availability. Here’s what Christine Burke, VP of International E-Commerce at cosmetics staple L’Occitane, had to say about GCS 3.0:
L’Occitane is unique in that our beauty products center around ingredients—such as lavender, shea butter and verbena. As our customers visit our re-designed website to shop and research our products, we’re excited about the speed and accuracy of on-site search results that will be provided to us through Google Commerce Search. We’re also very excited about the possibility of the new local inventory feature, which can help us connect our customers with their favorite products in one of our 170 U.S. boutiques.
For more information, visit google.com/commercesearch.

Hide sites to find more of what you want

Donal Trung 11:00 AM
Over the years we’ve experimented with a number of ways to help you personalize the results you find on Google, from SearchWiki to stars in search to location settings. Now there’s yet another way to find more of what you want on Google by blocking the sites you don’t want to see.

You’ve probably had the experience where you’ve clicked a result and it wasn’t quite what you were looking for. Many times you’ll head right back to Google. Perhaps the result just wasn’t quite right, but sometimes you may dislike the site in general, whether it’s offensive, pornographic or of generally low quality. For times like these, you’ll start seeing a new option to block particular domains from your future search results. Now when you click a result and then return to Google, you’ll find a new link next to “Cached” that reads “Block all example.com results.”


As always, Matt’s been gracious enough to let us use him as an example. His site is awesome, though, and we doubt many people will want to block it!

Once you click the link to “Block all example.com results” you’ll get a confirmation message, as well as the option to undo your choice. You’ll see the link whether or not you’re signed in, but the domains you block are connected with your Google Account, so you’ll need to sign in before you can confirm a block.


Once you’ve blocked a domain, you won’t see it in your future search results. (Side note: Sometimes you may have to search on a new term, rather than simply refreshing your browser, before you'll notice the domain has been successfully removed.) The next time you’re searching and a blocked page would have appeared, you’ll see a message telling you results have been blocked, making it easy to manage your personal list of blocked sites. This message will appear at the top or bottom of the results page depending on the relevance of the blocked pages.
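The blocking behavior described above can be sketched as a simple filter over result URLs. This is only an illustration under stated assumptions: `domain_of`, `filter_blocked` and the example URLs are hypothetical, and stripping a leading "www." is a simplification of real domain matching.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, dropping a leading 'www.' (a simplification)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_blocked(result_urls, blocked_domains):
    """Split results into those kept and a count of those hidden by the block list."""
    kept, hidden = [], 0
    for url in result_urls:
        if domain_of(url) in blocked_domains:
            hidden += 1
        else:
            kept.append(url)
    return kept, hidden


kept, hidden = filter_blocked(
    ["http://www.example.com/a", "http://good.example.org/b", "http://example.com/c"],
    {"example.com"},
)
```

A non-zero `hidden` count is what would trigger the message telling you that results have been blocked.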


You can see a list of your blocked sites in a new settings page, which you can access by visiting your Search Settings or clicking on the “Manage blocked sites” link that appears when you block a domain. On the settings page you can find details about the sites you’ve blocked, block new sites, or unblock sites if you’ve changed your mind.


We’re adding this feature because we believe giving you control over the results you find will provide an even more personalized and enjoyable experience on Google. In addition, while we’re not currently using the domains people block as a signal in ranking, we’ll look at the data and see whether it would be useful as we continue to evaluate and improve our search results in the future. The new feature is rolling out today and tomorrow on google.com in English for people using Chrome 9+, IE8+ and Firefox 3.5+, and we’ll be expanding to new regions, languages and browsers soon. We hope you find it useful, and we’ll be listening closely to your suggestions.

Finding more high-quality sites in search

Donal Trung 6:50 PM

Our goal is simple: to give people the most relevant answers to their queries as quickly as possible. This requires constant tuning of our algorithms, as new content—both good and bad—comes online all the time.

Many of the changes we make are so subtle that very few people notice them. But in the last day or so we launched a pretty big algorithmic improvement to our ranking (a change that noticeably impacts 11.8% of our queries), and we wanted to let people know what’s going on. This update is designed to reduce rankings for low-quality sites: sites that add little value for users, copy content from other websites or are just not very useful. At the same time, it will provide better rankings for high-quality sites: sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.

We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.

It’s worth noting that this update does not rely on the feedback we’ve received from the Personal Blocklist Chrome extension, which we launched last week. However, we did compare the Blocklist data we gathered with the sites identified by our algorithm, and we were very pleased that the preferences our users expressed by using the extension are well represented. If you take the top several dozen or so most-blocked domains from the Chrome extension, then this algorithmic change addresses 84% of them, which is strong independent confirmation of the user benefits.

So, we’re very excited about this new ranking improvement because we believe it’s a big step in the right direction of helping people find ever higher quality in our results. We’ve been tackling these issues for more than a year, and working on this specific change for the past few months. And we’re working on many more updates that we believe will substantially improve the quality of the pages in our results.

To start with, we’re launching this change in the U.S. only; we plan to roll it out elsewhere over time. We’ll keep you posted as we roll this and other changes out, and as always please keep giving us feedback about the quality of our results because it really helps us to improve Google Search.

Update April 11: We’ve rolled out this algorithmic change globally to all English-language Google users and incorporated new signals as we iterate and improve. We’ll continue testing and refining the change before expanding to additional languages. You can learn more on our Webmaster Central Blog.

New Chrome extension: block sites from Google’s web search results

Donal Trung 12:00 PM
(Cross-posted on the Google Chrome Blog)

We’ve been exploring different algorithms to detect content farms, which are sites with shallow or low-quality content. One of the signals we're exploring is explicit feedback from users. To that end, today we’re launching an early, experimental Chrome extension so people can block sites from their web search results. If installed, the extension also sends blocked site information to Google, and we will study the resulting feedback and explore using it as a potential ranking signal for our search results.

You can download the extension and start blocking sites now. It looks like this:


When you block a site with the extension, you won’t see results from that domain again in your Google search results. You can always revoke a blocked site at the bottom of the search results, so it's easy to undo blocks:


You can also edit your list of blocked sites by clicking on the extension's icon in the top right of the Chrome window.


This is an early test, but the extension is available in English, French, German, Italian, Portuguese, Russian, Spanish and Turkish. We hope this extension improves your search experience, and thanks in advance for participating in this experiment. If you’re a tech-savvy Chrome user, please download and try the Personal Blocklist extension today.

Microsoft’s Bing uses Google search results—and denies it

Donal Trung 2:56 PM
By now, you may have read Danny Sullivan’s recent post: “Google: Bing is Cheating, Copying Our Search Results” and heard Microsoft’s response, “We do not copy Google's results.” However you define copying, the bottom line is, these Bing results came directly from Google.

I’d like to give you some background and details of our experiments that led us to understand just how Bing is using Google web search results.

It all started with tarsorrhaphy. Really. As it happens, tarsorrhaphy is a rare surgical procedure on eyelids. And in the summer of 2010, we were looking at the search results for an unusual misspelled query [torsorophy]. Google returned the correct spelling—tarsorrhaphy—along with results for the corrected query. At that time, Bing had no results for the misspelling. Later in the summer, Bing started returning our first result to their users without offering the spell correction (see screenshots below). This was very strange. How could they return our first result to their users without the correct spelling? Had they known the correct spelling, they could have returned several more relevant results for the corrected query.



This example opened our eyes, and over the next few months we noticed that URLs from Google search results would later appear in Bing with increasing frequency for all kinds of queries: popular queries, rare or unusual queries and misspelled queries. Even search results that we would consider mistakes of our algorithms started showing up on Bing.

We couldn’t shake the feeling that something was going on, and our suspicions became much stronger in late October 2010 when we noticed a significant increase in how often Google’s top search result appeared at the top of Bing’s ranking for a variety of queries. This statistical pattern was too striking to ignore. To test our hypothesis, we needed an experiment to determine whether Microsoft was really using Google’s search results in Bing’s ranking.

We created about 100 “synthetic queries”—queries that you would never expect a user to type, such as [hiybbprqag]. As a one-time experiment, for each synthetic query we inserted as Google’s top result a unique (real) webpage which had nothing to do with the query. Below is an example:


To be clear, the synthetic query had no relationship with the inserted result we chose—the query didn’t appear on the webpage, and there were no links to the webpage with that query phrase. In other words, there was absolutely no reason for any search engine to return that webpage for that synthetic query. You can think of the synthetic queries with inserted results as the search engine equivalent of marked bills in a bank.
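The "marked bills" check can be sketched as a lookup: for each synthetic query, see whether the planted page later shows up in another engine's results. The query strings below are the ones quoted in this post; the URLs are placeholders rather than the pages actually used, and `leaked_queries` and the observed-results dictionary are hypothetical.

```python
# Synthetic queries quoted in the post, mapped to the page planted as the
# top result. The URLs are placeholders, not the pages actually used.
PLANTED = {
    "hiybbprqag": "https://theater.example/seating",
    "delhipublicschool40 chdjob": "https://creditunion.example/",
    "juegosdeben1ogrande": "https://jewelry.example/bling",
}


def leaked_queries(planted, observed_results):
    """Return the synthetic queries whose planted URL appears in observed results.

    `observed_results` maps a query to the list of URLs another engine
    returned for it; a missing query means no results were observed.
    """
    leaked = []
    for query, planted_url in planted.items():
        if planted_url in observed_results.get(query, []):
            leaked.append(query)
    return sorted(leaked)


# Made-up observations for illustration only.
observed = {
    "hiybbprqag": ["https://theater.example/seating"],
    "juegosdeben1ogrande": ["https://other.example/"],
}
```

Because nothing links the query to the page except the planted result, any query returned by `leaked_queries` points to the planted result as the source.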

We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.

We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted. We were surprised that within a couple weeks of starting this experiment, our inserted results started appearing in Bing. Below is an example: a search for [hiybbprqag] on Bing returned a page about seating at a theater in Los Angeles. As far as we know, the only connection between the query and result is Google’s result page (shown above).


We saw this happen for multiple queries. For the query [delhipublicschool40 chdjob] we inserted a search result for a credit union:


The same credit union soon showed up on Bing for that query:


For the query [juegosdeben1ogrande] we inserted a page of hip hop bling jewelry:


And the same hip hop bling page showed up in Bing:


As we see it, this experiment confirms our suspicion that Bing is using some combination of Internet Explorer 8’s “Suggested Sites” feature and the Bing Toolbar, or possibly some other means, to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results: a cheap imitation.

At Google we strongly believe in innovation and are proud of our search quality. We’ve invested thousands of person-years into developing our search algorithms because we want our users to get the right answer every time they search, and that’s not easy. We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor. So to all the users out there looking for the most authentic, relevant search results, we encourage you to come directly to Google. And to those who have asked what we want out of all this, the answer is simple: we'd like for this practice to stop.

Google search and search engine spam

Donal Trung 9:00 AM
January brought a spate of stories about Google’s search quality. Reading through some of these recent articles, you might ask whether our search quality has gotten worse. The short answer is that according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness. Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English. However, we have seen a slight uptick of spam in recent months, and while we’ve already made progress, we have new efforts underway to continue to improve our search quality.

Just as a reminder, webspam is junk you see in search results when websites try to cheat their way into higher positions in search results or otherwise violate search engine quality guidelines. A decade ago, the spam situation was so bad that search engines would regularly return off-topic webspam for many different searches. For the most part, Google has successfully beaten back that type of “pure webspam”—even while some spammers resort to sneakier or even illegal tactics such as hacking websites.

As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites.
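As a toy illustration of one on-page signal mentioned above, repeated spammy phrases, the sketch below measures how much of a text consists of repeated word pairs. It is not the document-level classifier itself; `repetition_score`, the choice of word pairs and the sample comments are assumptions.

```python
import re
from collections import Counter


def repetition_score(text, n=2):
    """Share of n-grams in the text that repeat an earlier n-gram (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence after the first one counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


spammy = repetition_score("cheap pills cheap pills cheap pills buy now")
normal = repetition_score("thanks for the detailed write up")
```

A real classifier would combine many signals; this single ratio only shows why heavily repeated phrases are easy to separate from ordinary comments.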

As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.

One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
  • Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
  • Displaying Google ads does not help a site’s rankings in Google; and
  • Buying Google ads does not increase a site’s rankings in Google’s search results.
These principles have always applied, but it’s important to affirm they still hold true.

People care enough about Google to tell us—sometimes passionately—what they want to see improved. We deeply appreciate this feedback. Combined with our own scientific evaluations, user feedback allows us to explore every opportunity for possible improvements. Please tell us how we can do a better job, and we’ll continue to work towards a better Google.

A recent improvement for Arabic searches

Donal Trung 12:28 PM
This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.

We've learned that when performing a search on Google, people sometimes forget to separate words with spaces. Moreover, people often mistakenly repeat a letter within a single word. For instance, when writing the query [amazingly beautiful poem], you might write it as [amazingly beautiifullpoem].

These types of errors are much more common in languages like Arabic, where most of the letters are cursive. That means that the shapes of the letters change, based on the position of the letter in the word (initial, middle, final or isolated). Moreover, some Arabic letters are considered word breaks, meaning that the following letter must be in an "initial" shape. In other words, if the last letter of one word is a word break, the following word may not be separated with a space.

For example, the queries [وزارةالتعليم] and [وزارة التعليم] have an identical meaning (Ministry of Education) and they're both written in a common form for Arabic documents. But they have different, albeit correct, formats — the first query is written as a single word, while the second is written as two. Google needs to understand that while they're written differently, they mean the same thing and should yield the exact same search results. In this example, both queries were written correctly, just in different formats. But sometimes people just make errors — like repeating the same letter twice. For example, you might write [راائعة الجماال], repeating the letter "ا" twice in both query words. In this case the correct spelling should be [رائعة الجمال]. It's important that Google search recognizes your query — despite spelling errors.

To address issues like this, we recently developed a search ranking improvement that targets certain Arabic queries. Our algorithm employs rules of Arabic spelling and grammar along with signals from historical search data to decide when to leave out spaces between words or when to remove unnecessarily repeated letters. Now, when you type a query leaving out spaces or repeating a letter, we'll return better results based not only on what you typed, but also on what our algorithm understands is the "correct" query. For example, here's what happens when you type [قصيدة راائعةالجماال] ([amazingly beautiful poem] in Arabic) with repeated letters and dropped spaces between words.


As you can see, the Google results contain the corrected query, the terms قصيدة رائعة الجمال, in bold.
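The two corrections described in this post, removing repeated letters and restoring a dropped space, can be sketched with a tiny vocabulary standing in for the spelling rules and historical search data. This is a deliberately simplified assumption: collapsing every repeated letter would damage words that legitimately contain doubled letters, and `collapse_repeats`, `split_in_two` and `normalize_query` are hypothetical names.

```python
import re


def collapse_repeats(token):
    """Replace any run of the same character with a single character."""
    return re.sub(r"(.)\1+", r"\1", token)


def split_in_two(token, vocabulary):
    """Try to split a token into two known words; return None if impossible."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in vocabulary and right in vocabulary:
            return [left, right]
    return None


def normalize_query(query, vocabulary):
    """Fix repeated letters and one missing space per token, where possible."""
    fixed = []
    for token in query.split():
        if token in vocabulary:
            fixed.append(token)
            continue
        candidate = collapse_repeats(token)
        if candidate in vocabulary:
            fixed.append(candidate)
            continue
        parts = split_in_two(candidate, vocabulary)
        # Leave the token untouched when no correction is found.
        fixed.extend(parts if parts else [token])
    return " ".join(fixed)


# Stand-in vocabulary built from the example in the post.
vocabulary = {"قصيدة", "رائعة", "الجمال"}
corrected = normalize_query("قصيدة راائعةالجماال", vocabulary)
```

The same function handles the English example from earlier in the post when given the vocabulary `{"amazingly", "beautiful", "poem"}`.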

For most people, this might seem like a small enhancement. But for us, it’s a big change. Our tests show we've improved search for 10% of Arabic language queries. Which, when you think about it, is a lot of people.

Understanding the web to make search more relevant

Donal Trung 10:30 AM
Last year at our second Searchology event, we announced Google Squared and Rich Snippets, two approaches to improve search by better understanding the web. Today, we're kicking off the new year with two improvements based on those technologies. First, we're applying the research behind Google Squared to add a new "answer-highlighting" feature to search, and second we're expanding Rich Snippets to include events.

Answer highlighting in search results

Most information on the web is unstructured. For example, blogs integrate paragraphs of text, videos and images in ways that don't follow simple rules. Product review sites each have their own formats, rating scales and categories. Unstructured data is difficult for a computer to interpret, which means that we humans still have to do a fair amount of work to synthesize and understand information on the web.

Google Squared is one of our early efforts to automatically identify and extract structured data from across the Internet. We've been making progress, and today the research behind Google Squared is, for the first time, making search better for everyone with a new feature called "answer highlighting."

Answer highlighting helps you get to information more quickly by seeking out and bolding the likely answer to your question right in search results. The feature is meant for searches with factual answers, such as [meet john doe director], [john lennon died], or [what was the political party of president ford]. If the pages returned for these queries contain a simple answer, the search snippet will more often include the relevant text and bold it for easy reference.

Consider the example, [empire state height]. The first search result used to look like this:

With today's improvements, the answer (1250 ft, or 381 m) is highlighted right in the search result:

This kind of quick answer only makes sense for certain kinds of searches. For example, the answer to [history of france] can't readily fit in a search snippet. However, for the kinds of information you can easily put in a table, we've been able to take what we've learned from Google Squared to make search better for a wide range of queries. Answer highlighting is rolling out during the next couple days on google.com in English.
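A minimal sketch of the bolding step, assuming the likely answer has already been narrowed to a measurement: find the first number-plus-unit in a snippet and wrap it in bold tags. The pattern, the function `highlight_answer` and the sample snippet text are assumptions for illustration; the real feature draws on the Google Squared research rather than a single regular expression.

```python
import re

# Matches a number followed by a short length unit, e.g. "1250 ft" or "381 m".
MEASUREMENT = re.compile(r"\b\d[\d,.]*\s?(?:ft|m|km|mi)\b")


def highlight_answer(snippet, pattern=MEASUREMENT):
    """Wrap the first match of `pattern` in <b> tags; leave other text as is."""
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", snippet, count=1)


# Made-up snippet text using the figures quoted in the post.
highlighted = highlight_answer("The building is 1250 ft, or 381 m, tall.")
```

A snippet with no match is returned unchanged, which mirrors the point that this kind of quick answer only makes sense for certain searches.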

Rich Snippets for events

Sometimes the easiest way to understand somebody is by having a conversation. The web is similar. As much as we're happy with the progress we're making with Google Squared, we also appreciate that a great way to understand web pages is to simply ask webmasters to teach us (and other search engines) about their content. To that end, we continue to make improvements to our search results with Rich Snippets, enabling webmasters to annotate pages with structured data in a standard format.

So far we've launched improved search result snippets for reviews and people. When your search results contain web pages with review information, you might see the number of user reviews on the page and the average rating in the search result. When your search contains a public profile page about a person from a social networking site, you may see the person's location and occupation, or a list of her friends.

Today, we're announcing support for a new Rich Snippets format for events. The new format improves search results by including links to specific event names, dates and locations. Here's an example of a new event result from livenation.com if you search for [irving plaza]:


The new result format provides a fast and convenient way to identify pages with events and click directly to the ones you find interesting. If you're into Hip Hop Karaoke, you can quickly find out when and where the next show is in Irving Plaza, and click for more info. We've been working with a few sites to ramp them up for our initial launch, but it will take time for other webmasters to start implementing the new markup. Check out our blog post on Webmaster Central for more details.

Helping computers understand language

Donal Trung 11:51 AM
This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.

An irony of computer science is that tasks humans struggle with can be performed easily by computer programs, but tasks humans can perform effortlessly remain difficult for computers. We can write a computer program to beat the very best human chess players, but we can't write a program to identify objects in a photo or understand a sentence with anywhere near the precision of even a child.

Enabling computers to understand language remains one of the hardest problems in artificial intelligence. The goal of a search engine is to return the best results for your search, and understanding language is crucial to returning the best results. A key part of this is our system for understanding synonyms.

What is a synonym? An obvious example is that "pictures" and "photos" mean the same thing in most circumstances. If you search for [pictures developed with coffee] to see how to develop photographs using coffee grinds as a developing agent, Google must understand that even if a page says "photos" and not "pictures," it's still relevant to the search. While even a small child can identify synonyms like pictures/photos, getting a computer program to understand synonyms is enormously difficult, and we're very proud of the system we've developed at Google.

Our synonyms system is the result of more than five years of research within our web search ranking team. We constantly monitor the quality of the system, but recently we made a special effort to analyze the impact and quality of our synonyms. Most of the time, you probably don't notice when your search involves synonyms, because it happens behind the scenes. However, our measurements show that synonyms affect 70 percent of user searches across the more than 100 languages Google supports. We took a set of these queries and analyzed how precise the synonyms were, and were happy with the results: For every 50 queries where synonyms significantly improved the search results, we had only one truly bad synonym.
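As a rough worked example of that ratio: if we count only the queries that fell into one of those two buckets (significantly improved, or truly bad), the bad share works out to 1 in 51, or about 2 percent. The function name below is made up for illustration, and the calculation says nothing about queries where synonyms had little effect either way.

```python
def bad_synonym_share(helpful: int, bad: int) -> float:
    """Share of 'truly bad' cases among queries that were either clearly helped or clearly hurt."""
    return bad / (helpful + bad)


share = bad_synonym_share(helpful=50, bad=1)
print(f"{share:.1%}")  # 1/51, rounded to one decimal place
```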

An example of a bad synonym from this analysis is in the search [dell system speaker driver precision 360], where Google thinks "pc" is a synonym for "precision." Note that you can still see that on Google today, because while we know it's a bad synonym, we don't typically fix bad synonyms by hand. Instead, we try to discover general improvements to our algorithms to fix the problems. We hope it will be fixed automatically in some future changes.

We also recently made a change to how our synonyms are displayed. In our search result snippets, we bold the terms of your search. Historically, we have bolded synonyms such as stemming variants — like the word "picture" for a search with the word "pictures." Now, we've extended this to words that our algorithms very confidently think mean the same thing, even if they are spelled nothing like the original term. This helps you to understand why that result is shown, especially if it doesn't contain your original search term. In our [pictures developed with coffee] example, you can see that the first result has the word "photos" bolded in the title:


(Note that because our synonyms depend on the other words in your search and use many signals, you won't necessarily always see the word "photos" bolded for "pictures", only when our algorithms think it is useful and important to bold.)
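The bolding behavior can be sketched in a few lines. This is a toy illustration only: the `SYNONYMS` table is hand-written and hypothetical, whereas the real system decides per query, using the other words in the search and many signals.

```python
import re

# Hypothetical, hand-written synonym table for illustration.
SYNONYMS = {"pictures": {"picture", "photos", "photo"}}


def bold_snippet(query: str, snippet: str) -> str:
    """Wrap query terms and their known synonyms in <b> tags."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())

    def mark(match):
        word = match.group(0)
        return f"<b>{word}</b>" if word.lower() in terms else word

    return re.sub(r"[A-Za-z0-9]+", mark, snippet)


print(bold_snippet("pictures developed with coffee", "Developing photos with coffee"))
```

Here "photos" is bolded even though it never appears in the query, which is the behavior described above.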

We use many techniques to extract synonyms, which we've blogged about before. Our systems analyze petabytes of web documents and historical search data to build an intricate understanding of what words can mean in different contexts. In the above example "photos" was an obvious synonym for "pictures," but it's not always a good synonym. For example, it's important for us to recognize that in a search like [history of motion pictures], "motion pictures" means something special (movies), and "motion photos" doesn't make any sense.

Another example is the term "GM." Most people know the most prominent meaning: "General Motors." For the search [gm cars], you can see that Google bolds the phrase "General Motors" in the search results. This is an indication that for that search we thought "General Motors" meant the same thing as "GM." Are there any other meanings? Many people can think of the second meaning, "genetically modified," which is bolded when GM is used in queries about crops and food, like in the search results for [gm wheat]. It turns out that there are more than 20 other possible meanings of the term "GM" that our synonyms system knows something about. GM can mean George Mason in [gm university], gamemaster in [gm screen star wars], Gangadhar Meher in [gm college], general manager in [nba gm] and even gunners mate in [navy gm].
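One simple way to picture context-dependent synonyms is a lookup keyed on both the ambiguous term and a neighboring word. The table below is hand-built from the examples in this post and the `expand` function is hypothetical; the real system learns these relationships from web documents and search data rather than from a fixed list.

```python
# Hypothetical lookup: (ambiguous term, context word) -> expansion.
EXPANSIONS = {
    ("gm", "cars"): "general motors",
    ("gm", "wheat"): "genetically modified",
    ("gm", "university"): "george mason",
    ("gm", "screen"): "gamemaster",
    ("gm", "college"): "gangadhar meher",
    ("gm", "nba"): "general manager",
    ("gm", "navy"): "gunners mate",
}


def expand(term: str, query: str):
    """Return the expansion of `term` suggested by another word in the query, if any."""
    for word in query.lower().split():
        if word == term:
            continue
        expansion = EXPANSIONS.get((term, word))
        if expansion:
            return expansion
    return None  # no context we recognize, so leave the term alone


for q in ["gm cars", "gm wheat", "nba gm", "gm stock"]:
    print(q, "->", expand("gm", q))
```

Returning nothing for an unrecognized context matters as much as the hits: guessing "General Motors" for every "GM" is exactly the kind of bad synonym the system tries to avoid.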

Here are screenshots of those disambiguations of GM in action:


As a nomenclatural note, even obvious term variants like "pictures" (plural) and "picture" (singular) would be treated as different search terms by a dumb computer, so we also include these types of relationships within our umbrella of synonyms. Pictures/picture are typically called stemming variants, which refers to the fact that they share the same word stem, or root. The same systems that need to understand that "pictures" and "photos" mean the same thing also need to understand that "pictures" and "picture" mean the same thing. This is even more obvious to a human, but it is still a difficult task for a computer. One example of why this is difficult is the pair "animal" and "animation," which share the same stem and etymology but don't mean the same thing in standard use. Another tricky case that is very dependent on the other words in the query is "arm" vs. "arms." Arms might seem like the plural of arm, but consider how it might be used in a search: [arm reduction] vs. [arms reduction]. Google search is smart enough to know that the former is about removing fat from one's arm, and the latter is about reducing stockpiles of weaponry, and that arm/arms are dangerous synonyms in that case because they would change the meaning. These subtle differences between words that seem related are what make synonymy very hard to get right.
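To see why stemming alone isn't enough, consider a deliberately naive suffix-stripping stemmer. Everything here is a toy assumption: the suffix list, the exception sets and the function names are invented for illustration and are not how our systems work.

```python
def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three letters of stem."""
    for suffix in ("ation", "al", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical, hand-written exceptions where a shared stem is misleading.
ALWAYS_DISTINCT = {frozenset({"animal", "animation"})}
DISTINCT_IN_CONTEXT = {(frozenset({"arm", "arms"}), "reduction")}


def is_safe_variant(a: str, b: str, context_words=()) -> bool:
    """True if a and b share a naive stem and no listed exception applies."""
    if naive_stem(a) != naive_stem(b):
        return False
    pair = frozenset({a, b})
    if pair in ALWAYS_DISTINCT:
        return False
    return not any((pair, w) in DISTINCT_IN_CONTEXT for w in context_words)


print(naive_stem("animal"), naive_stem("animation"))     # both collapse to the same stem
print(is_safe_variant("pictures", "picture"))
print(is_safe_variant("arm", "arms", ["reduction"]))
```

The stemmer happily merges "animal" with "animation" and "arm" with "arms", so the burden falls on the exception lists. Maintaining such lists by hand does not scale, which is why this has to be learned from data.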

Here are some other examples of synonyms we thought were interesting:

[song words], "lyrics" is bolded for "words".
[what state has the highest murder rate], "homicide" is bolded for "murder".
[himalayan kitten breeder], Google knows that "cat breeder" is the same as "kitten breeder".
[dura ace track bb axle njs], Google knows that "bb" here means "bottom bracket".
[software update on bb color id], "blackberry" is bolded for "bb".
[bb cream dark], Google knows here that bb means "blemish balm".
[southeastern usa bb fitness & figure], "bodybuilding" is bolded for "bb".

Lastly, language is used with as much variety and subtlety as is present in human culture, and our algorithms still make mistakes. We flinch when we find such mistakes; we're always working to fix them. One of the best ways for us to discover these problems is to get feedback from real users, which we then use to inspire improvements to our computer programs. If you have specific complaints about our synonyms system, you can post a question at the web search help center forum or you can tweet them with the hash tag #googlesyns. You can also turn off a synonym for a specific term by adding a "+" before it or by putting the words in quotation marks.
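The two opt-out forms mentioned above can be pictured as a small query-parsing step that marks which terms may be expanded with synonyms. This is a sketch under our own assumptions; `parse_query` and its output shape are hypothetical and not a description of the actual query parser.

```python
import re


def parse_query(query: str):
    """Return (term, allow_synonyms) pairs; '+term' and quoted words are exact-match only."""
    result = []
    for match in re.finditer(r'"([^"]+)"|(\+?)(\S+)', query):
        if match.group(1) is not None:
            # Quoted phrase: every word inside is matched exactly.
            for word in match.group(1).split():
                result.append((word.lower(), False))
        else:
            # Bare term: a leading '+' turns synonyms off for that term.
            result.append((match.group(3).lower(), match.group(2) != "+"))
    return result


print(parse_query('+pictures developed "with coffee"'))
```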