Given a query, how does Google determine which documents to display?

Question


Asked by Edward Gong on 04 December 2013 at 22:20 (580 views)

I'm curious about the intricacies of search. I understand that tf-idf is used to evaluate how important a word is to a document within a corpus. I also understand that the PageRank algorithm ranks the relative importance of a web page, using its probability of being viewed as a heuristic. However, I'm not sure how the two interact for a specific query.
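For reference, here is a minimal tf-idf sketch. It assumes whitespace-tokenized documents, raw term counts for tf, and log(N / df) for idf; this is one of several common variants, and the toy corpus is made up for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """tf-idf weight of `term` in `doc`.

    `doc` is a list of tokens and `corpus` is a list of such lists.
    tf = raw count of the term in the doc; idf = log(N / df).
    """
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)  # number of docs containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

corpus = [
    "pagerank ranks pages by links".split(),
    "tf idf weights terms in pages".split(),
    "links between pages form a graph".split(),
]
print(tf_idf("links", corpus[0], corpus))  # log(3/2), about 0.405
print(tf_idf("pages", corpus[0], corpus))  # 0.0: "pages" occurs in every document
```

A term that appears in every document gets an idf of zero, which is why tf-idf alone says nothing about how reputable a page is.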

Intuitively, I would expect a language model to be used to rank documents, which relates to tf-idf. But how does the PageRank algorithm relate to document retrieval?
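The "language model" approach mentioned here is usually called query likelihood: score each document by the probability that a unigram model of that document generates the query. This is a minimal sketch assuming Jelinek-Mercer smoothing with an illustrative mixing weight `lam`; it is not a claim about what Google uses.

```python
import math
from collections import Counter

def query_likelihood(query, doc, corpus, lam=0.5):
    """log P(query | doc) under a smoothed unigram language model.

    Jelinek-Mercer smoothing mixes the document's term distribution with the
    whole collection's, so a query term missing from the doc doesn't zero the score.
    `lam` is an illustrative choice, not a tuned value.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(t for d in corpus for t in d)
    coll_len = sum(coll_counts.values())
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)
        p_coll = coll_counts[term] / coll_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0:
            return float("-inf")  # term absent from the entire collection
        score += math.log(p)
    return score

corpus = [
    "pagerank ranks pages by links".split(),
    "tf idf weights terms in pages".split(),
    "links between pages form a graph".split(),
]
query = ["links", "graph"]
ranking = sorted(range(len(corpus)),
                 key=lambda i: query_likelihood(query, corpus[i], corpus),
                 reverse=True)
print(ranking)  # [2, 0, 1]
```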


There is 1 answer

**bdean20** · Answer 1 · 2013-12-05T05:28:26+00:00

Ranking and retrieval are separate functions of a search engine.

The purpose of the retrieval component is to decide which documents are worth ranking. The purpose of the ranking component is to decide which of those documents are most relevant to the query. PageRank is applied in the ranking phase as one of the factors that determine how relevant a document is. This works because, in the context of a web search engine, you typically want pages that other people have also found useful.
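A toy sketch of the retrieval/ranking split described above. It assumes a boolean-AND inverted index for retrieval and, for ranking, a hypothetical linear blend (weight `alpha`) of a tf-idf text score with a precomputed PageRank value per document. Google's real combination of signals is not public; the names, weights, and scores here are illustrative only.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Inverted index: term -> set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for t in tokens:
            index[t].add(doc_id)
    return index

def retrieve(query, index):
    """Retrieval phase: candidate docs that contain every query term."""
    postings = [index.get(t, set()) for t in query]
    return set.intersection(*postings) if postings else set()

def text_score(query, tokens, n_docs, index):
    """Simple tf-idf sum over the query terms."""
    counts = Counter(tokens)
    return sum(counts[t] * math.log(n_docs / len(index[t]))
               for t in query if t in index)

def rank(query, docs, pr_scores, alpha=0.7):
    """Ranking phase: hypothetical blend of text relevance and PageRank."""
    index = build_index(docs)
    candidates = retrieve(query, index)

    def score(d):
        return (alpha * text_score(query, docs[d], len(docs), index)
                + (1 - alpha) * pr_scores[d])

    return sorted(candidates, key=score, reverse=True)

docs = {
    "a": "pagerank ranks pages by links".split(),
    "b": "links between pages form a graph".split(),
    "c": "tf idf weights terms in pages".split(),
}
pr_scores = {"a": 0.2, "b": 0.5, "c": 0.3}  # made-up PageRank values
print(rank(["links", "pages"], docs, pr_scores))  # ['b', 'a']
```

Here "a" and "b" have identical text scores for the query, so the (made-up) PageRank value decides the order, while "c" is never ranked because retrieval already excluded it. In practice the two scores live on different scales and the blend would be learned rather than hand-set.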

You can also use PageRank to decide whether to rank a document at all, but I believe Google's approach focuses on giving pages stronger or weaker PageRank scores (based on incoming and outgoing links and the strength of those links) rather than on filtering.
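For concreteness, this is a minimal power-iteration sketch of textbook PageRank. It assumes an unweighted link graph (the "strength of links" mentioned above is not modeled), the conventional damping factor 0.85, and that pages with no outgoing links spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to.

    Dangling pages (no outlinks) distribute their rank uniformly,
    so the scores always sum to 1.
    """
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            outs = links.get(p, [])
            targets = outs if outs else list(pages)  # dangling: spread everywhere
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

r = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(r, key=r.get, reverse=True))  # ['c', 'a', 'b']
```

Page "c" ends up highest because it receives links from both other pages; "b" is lowest because it only gets half of "a"'s outgoing rank.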

As for the title question: it's very complicated, and I don't work for Google, so this is mostly speculation, but I believe their system is built around a few fundamental concepts:

- Is the query correct? (spell-checking, query suggestion)
- Is the content on this page relevant to the query? (tf-idf and others**, phrase/proximity search)
- Does this page have a high reputation? (PageRank, feedback from Google's analytics)
- Do the links to this page match the content in the query? (link analysis)
- Does this person (or people like them) want to see the content on this page? (personalisation, localisation, etc.)
- Are there already too many results from one website? (diversification, uniquing)
- What does the user mean by this query? (relevance feedback, stemming, query expansion)
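To make the "too many results from one website" point concrete, here is a minimal sketch of diversification and uniquing applied to an already-ranked list. It assumes the site is identified by the URL's host and uses a hypothetical cap `max_per_host`; real engines may demote rather than drop extra results.

```python
from urllib.parse import urlparse

def diversify(ranked_urls, max_per_host=2):
    """Keep ranking order, drop exact duplicate URLs (uniquing),
    and allow at most `max_per_host` results per host (diversification)."""
    per_host = {}
    seen = set()
    result = []
    for url in ranked_urls:
        if url in seen:
            continue  # uniquing: exact duplicate URL
        host = urlparse(url).netloc
        if per_host.get(host, 0) < max_per_host:
            result.append(url)
            seen.add(url)
            per_host[host] = per_host.get(host, 0) + 1
    return result

ranked = ["https://a.com/1", "https://a.com/2", "https://a.com/3", "https://b.com/1"]
print(diversify(ranked))  # ['https://a.com/1', 'https://a.com/2', 'https://b.com/1']
```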

I'm sure there are more, but that's just off the top of my head.

** Many different methods have been used for information retrieval. If you already know tf-idf, BM25 is a good one to look at next.
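A minimal Okapi BM25 sketch, assuming whitespace-tokenized documents and the commonly cited defaults k1 = 1.2 and b = 0.75. The idf uses the "+1 inside the log" form (as in Lucene) so it never goes negative; other variants exist. Compared with plain tf-idf, repeated terms saturate and long documents are normalized.

```python
import math
from collections import Counter

def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n  # average document length
    counts = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        tf = counts[term]
        # tf saturates via k1; b controls document-length normalization
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    "pagerank ranks pages by links".split(),
    "tf idf weights terms in pages".split(),
    "links between pages form a graph".split(),
]
query = ["links", "graph"]
print(sorted(range(len(corpus)), key=lambda i: bm25(query, corpus[i], corpus), reverse=True))
# [2, 0, 1]
```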

Note: In a different search context, these methods may not work very well, because some types of search are better suited to other models. For example, if your data is structured according to a schema, your best bet is to use a database.

TechQA.
