🍮 Blog 7

1. What are some of the reasons that might warrant the need to use a search system on a website?
A user might use a site's search system to find specific content faster, without having to go through all of the site's pages. Also, if the user is unable to find something by browsing the website, they can use the search bar to find it.

2. Why is an Information Architect interested in search systems?
An Information Architect is interested in search systems because they aid users when browsing a website, and it is the Information Architect's responsibility to make that experience easier and more effective. Furthermore, Information Architects can help integrate search systems into the overall website structure and optimize them.

3. Describe the core components of a search engine.
The core components of a search engine are:
1) The web crawler: combs through the pages on the internet and gathers information for the search engine.
2) The database: All of the information that a web crawler retrieves is stored in a database. Every time you use a search engine, it is this database you are searching, not the live internet.
3) The search algorithm: Each search engine interprets the terms you enter into the search box in a different way. Features that affect the search include operators, phrase searching and truncation.
4) The ranking algorithm: How a search engine ranks the results of your search is possibly its most important component. Most searches will retrieve thousands of results. Since most users only look through the first couple of pages, it is very important that the most relevant results are displayed first.

Reference: practice.sph.umich.edu/micphp/files/Retrieving_Online_Info/R_O_I/CD_Master/CD/content/Search_Engines.pdf

4. What is a search zone? What are the approaches for creating search zones?
Search zones are subsets of a web site that have been indexed separately from the rest of the site’s content. When a user searches a search zone, he has, through interaction with the site, already identified himself as interested in that particular information.

Approaches for creating a search zone are the following:
– Content type
– Audience
– Role
– Subject/topic
– Geography
– Chronology
– Author
– Department/business unit

Reference: seanconnolly.ca/web/0596527349/I_0596527349_CHP_8_SECT_4.html

5. Explain the difference between recall and precision in terms of search results.

RECALL
Recall is the ratio of the number of relevant records retrieved from a search to the total number of relevant records in the database. Recall is usually expressed as a percentage.

PRECISION
Precision is the ratio of the number of relevant records retrieved from the search to the total number of irrelevant and relevant records retrieved. Precision, like recall, is usually expressed as a percentage.
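
The two definitions can also be written as formulas. This is only a sketch in LaTeX notation; the term names inside the fractions are shorthand chosen here and are not taken from the lecture notes.

```latex
\[
\text{Recall} = \frac{\text{relevant retrieved}}
                     {\text{relevant retrieved} + \text{relevant not retrieved}} \times 100\%
\]
\[
\text{Precision} = \frac{\text{relevant retrieved}}
                        {\text{relevant retrieved} + \text{irrelevant retrieved}} \times 100\%
\]
```

The numerator is the same in both formulas; only the denominator changes. Recall divides by everything relevant in the database, while precision divides by everything the search returned.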

Reference: Lecture notes – Measuring Search Effectiveness

6. Consider the following search engines:

a. Search engine A retrieves 600 documents out of a total of 8,200 documents. Out of the 600 documents retrieved, only 500 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.

Recall = 500 / ( 500 + (923 – 500) ) * 100 = 500 / (500 + 423) * 100 = ( 500 / 923 ) * 100 = 54.2%

Precision = 500 / (500 + (600 – 500) ) * 100 = 500 / (500 + 100) * 100 = (500 / 600) * 100 = 83%

b. Search engine B retrieves 131 documents out of a total of 8,200 documents. Out of the 131 documents retrieved, all 131 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.

Recall = 131 / (131 + 923 – 131) * 100 = 131 / (131 + 792) * 100 = (131 / 923) * 100 = 14%

Precision = 131 / (131 + (131 – 131) ) * 100 = 131 / (131 + 0) * 100 = (131 / 131) * 100 = 100%

c. Search engine C retrieves 700 documents out of a total of 8,200 documents. Out of the 700 documents retrieved, 0 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.

Recall = 0 / (0 + (923 – 0)) * 100 = 0 / (0 + 923) * 100 = (0 / 923) * 100 = 0%

Precision = 0 / ( 0 + (700 – 0) ) * 100 = 0 / ( 0 + 700) * 100 = (0 / 700) * 100 = 0%

d. Search engine D retrieves 5,000 documents out of a total of 8,200 documents. Out of the 5,000 documents retrieved, 923 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.

Recall = 923 / ( 923 + ( 923 – 923)) * 100 = ( 923 / 923 ) * 100 =   100%

Precision = 923 / ( 923 + (5000 – 923)) * 100 = 923 / (923 + 4077) * 100 = (923 / 5000) * 100 = 18 %
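
All four calculations follow the same pattern, so they can be checked with a few lines of Python. This is a minimal sketch: the function name `recall_precision` and the `engines` dictionary are made up for illustration, and the numbers are the ones given in the question.

```python
def recall_precision(relevant_retrieved, retrieved, total_relevant):
    """Return (recall, precision) as percentages."""
    # Recall: share of all relevant documents that the search found.
    recall = relevant_retrieved / total_relevant * 100
    # Precision: share of the retrieved documents that are relevant.
    precision = relevant_retrieved / retrieved * 100
    return recall, precision


# (relevant retrieved, total retrieved, total relevant) for each engine
engines = {
    "A": (500, 600, 923),
    "B": (131, 131, 923),
    "C": (0, 700, 923),
    "D": (923, 5000, 923),
}

for name, counts in engines.items():
    recall, precision = recall_precision(*counts)
    print(f"Search engine {name}: recall {recall:.1f}%, precision {precision:.1f}%")
```

Note that the collection size of 8,200 documents is not needed for either measure; only the retrieved and relevant counts matter.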

7. What is the purpose of a stemming tool? Explain the difference between strong and weak stemming. Provide examples of strong and weak stemming.
Stemming tools allow users to enter a term (e.g., “lodge”) and retrieve documents that contain variant terms with the same stem (e.g., “lodging”, “lodger”).
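Strong stemming matches a wide range of variants of the stem. For example, a search for “dance” also retrieves:
– dancing
– dancer
Weak stemming matches only close variants such as plurals. For example, a search for “dance” also retrieves:
– dances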
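
The difference can be illustrated with a toy suffix-stripping sketch in Python. This is an assumption-laden simplification, not how a real stemmer (or the book) implements it: the functions `weak_stem` and `strong_stem` and the suffix list are invented for this example.

```python
def weak_stem(word):
    """Toy weak stemmer: only strips a plural 's'."""
    word = word.lower()
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def strong_stem(word):
    """Toy strong stemmer: strips a wider set of suffixes."""
    word = word.lower()
    # Longer suffixes are checked first so "dancers" loses "ers", not just "s".
    for suffix in ("ing", "ers", "er", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["dance", "dances", "dancing", "dancer"]
print({w: weak_stem(w) for w in words})
print({w: strong_stem(w) for w in words})
```

With the weak version only “dance” and “dances” share a stem, so a search for one finds the other. With the strong version all four words reduce to the same stem, so they all match each other.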

Reference: Section 8.6. Query Builders – Information Architecture for the World Wide Web

8. What are two main issues to consider when displaying the results of a search?
The two main issues to consider when displaying the results of a search are:
– Which content components to display
– How to list or group those results

Reference: Lecture Notes – Search Systems

9. How many documents should you display in a search result?
The number of documents displayed can depend on the preceding two factors. If your engine is configured to display a lot of information for each retrieved document, it might be best to have a smaller retrieval set, and vice versa.

Reference: Chapter 8 Search Systems – Information Architecture for the World Wide Web

10. Describe some approaches for sorting and ranking search results for display.
Some of the approaches that can be used for sorting are:
– Sorting by alphabet
– Sorting by chronology

Some of the approaches that can be used for ranking are:
– Ranking by relevance
– Ranking by popularity
– Ranking by users’ or experts’ ratings
– Ranking by pay-for-placement

Reference: Chapter 8 Search Systems – Information Architecture for the World Wide Web

11. When sorting search results alphabetically, why is it a good idea to omit articles such as “a” and “the”?
It is best to omit “a” and “the” when sorting because a user is most likely to look for a title such as “The Little Mermaid” under “L” rather than under “T”.
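
A minimal Python sketch of this idea is shown below. The `sort_key` function, the `ARTICLES` set and every title other than “The Little Mermaid” are assumptions made up for the example.

```python
ARTICLES = {"a", "an", "the"}


def sort_key(title):
    """Build a sort key that ignores a leading article."""
    words = title.lower().split()
    # Drop the first word only if it is an article and something follows it.
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


titles = ["The Little Mermaid", "Mulan", "A Bug's Life", "Aladdin"]
print(sorted(titles, key=sort_key))
```

The titles themselves are displayed unchanged; only the key used for ordering leaves out the article, so “The Little Mermaid” lands between the “B” and “M” entries.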

Reference: Chapter 8 Search Systems – Information Architecture for the World Wide Web

12. How does “best bets” ranking operate?
Indexing by humans is another means of establishing relevance. Keyword and descriptor fields can be searched, leveraging the value judgments of human indexers. For example, manually selected recommendations, popularly known as “Best Bets”, can be returned as relevant results.

Reference: Section 8.6 Query Builders – Information Architecture for the World Wide Web

13. What are four key factors to consider when designing a search system interface?
Key factors to consider when designing a search system interface include:
– what to search
– how to get results
– how to present results

Reference: Lecture Notes – Search Systems

14. What are some of the ways search system designers can help a user when no results are returned for a query?
When no results are returned, the designer can help the user in the following ways:
– Provide a means of revising the search
– Provide search tips or other advice on how to improve the search
– Provide a means of browsing
– Provide a human contact if searching and browsing don’t work

Reference: Lecture Notes – Search Systems; Section 8.8 Designing the Search Interface – Information Architecture for the World Wide Web

15. Describe how Google’s PageRank algorithm operates.
Google's PageRank algorithm ranks a retrieved document by factoring in how many links point to it. Google also distinguishes the quality of these links: a link from a site that itself receives many links is worth more than a link from a little-known site.
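
The idea can be sketched with the simplified textbook version of PageRank below. This is not Google's production algorithm: the `pagerank` function, the damping factor of 0.85, the iteration count and the four-page link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to.

    Every linked-to page must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small base score regardless of its links.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: "obscure" links out but nothing links to it.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "obscure": ["blog"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Each page passes its own score along its outgoing links, which is why a link from a well-linked page counts for more than a link from a page that nobody links to.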

Reference: Section 8.6 Query Builders – Information Architecture for the World Wide Web

16. What is SERP?
SERP stands for “Search Engine Results Page”. It is the page displayed after a user searches for a word or phrase on a search engine.


17. Describe the main Boolean operators used in search engine queries.
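Boolean operators are used in search engine queries to combine search terms and narrow or broaden a search:
– AND: retrieves only documents that contain both terms
– OR: retrieves documents that contain either term
– NOT / AND NOT / ANDNOT: excludes documents that contain the term that follows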
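
One way to picture these operators is as set operations on the lists of documents that contain each term. The Python sketch below assumes a tiny hand-written `postings` table and three hypothetical helper functions; a real search engine would build the table from its index.

```python
# Hypothetical postings: term -> set of document ids containing that term.
postings = {
    "search": {1, 3},
    "web": {2, 3},
    "users": {1, 2},
}


def query_and(term_a, term_b):
    """Documents that contain both terms (set intersection)."""
    return postings[term_a] & postings[term_b]


def query_or(term_a, term_b):
    """Documents that contain either term (set union)."""
    return postings[term_a] | postings[term_b]


def query_and_not(term_a, term_b):
    """Documents that contain the first term but not the second (set difference)."""
    return postings[term_a] - postings[term_b]


print("search AND web:", query_and("search", "web"))
print("search OR web:", query_or("search", "web"))
print("search AND NOT web:", query_and_not("search", "web"))
```

AND and NOT shrink the result set, while OR makes it larger, so OR is the operator to reach for when a search returns too little.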

Reference: Section 8.2. Search System Anatomy – Information Architecture for the World Wide Web

18. What is meant by the terms Deep and Surface Web? How might documents end up in the Deep Web?
– Surface Web: documents on the Web that search engines index and retrieve for us
– Deep Web: parts of the Web that are either intentionally hidden or inaccessible to search engines

Reference: Lecture Notes – Search Engine Architecture

19. What are the two primary goals when designing a search engine’s architecture?
The two primary goals to achieve while designing a search engine are
– Effectiveness (quality): retrieve the most relevant information for a given query
– Efficiency (speed): display search results as quickly as possible

Reference: Lecture Notes – Search Engine Architecture

20. Describe the search engine indexing process.
Indexing is one of the major functions of a search engine and is responsible for building the structures that enable searching. Indexing consists of processing all the pages previously “crawled” and creating a large index containing text and metadata from each document.
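
A common structure for this is an inverted index, which maps each term to the documents that contain it. The Python sketch below is a minimal illustration only: `build_index` and the three sample documents are made up, and a real index would also store metadata and term positions.

```python
from collections import defaultdict


def build_index(documents):
    """Map every term to the set of document ids in which it appears."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


# Hypothetical "crawled" pages, keyed by document id.
documents = {
    1: "search systems help users",
    2: "users browse web pages",
    3: "search engines index web pages",
}

index = build_index(documents)
print(index["search"])
print(index["pages"])
```

Because the index is built ahead of time, answering a query only requires looking up the term rather than scanning every page again.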

Reference: Lecture Notes – Search Engine Architecture

21. What is the purpose of a web crawler?
The primary responsibility of a web crawler is to identify and acquire documents for the search engine

Reference: Lecture Notes – Search Engine Architecture

22. What is the purpose of a web feed in terms of a search engine?
The purpose of a web feed is to access a real-time stream of documents

Reference: Lecture Notes – Search Engine Architecture

23. Describe the main processes involved with text transformation in a search engine.
The main purpose of text transformation is to transform documents into index terms or features.
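
As a rough sketch of what that can look like, the Python below splits text into lowercase tokens and drops common stopwords. The function `to_index_terms` and the small `STOPWORDS` set are assumptions for this example; the lecture notes may describe further steps, such as stemming, that are not shown here.

```python
import re

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is"}


def to_index_terms(text):
    """Turn raw text into a list of index terms."""
    # Tokenize: lowercase the text and keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stopping: remove very common words that carry little meaning.
    return [token for token in tokens if token not in STOPWORDS]


print(to_index_terms("The Purpose of a Web Crawler"))
```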

Reference: Lecture Notes – Search Engine Architecture

24. Describe the mechanisms a search engine can use to evaluate its performance.
One mechanism a search engine can use to evaluate its performance is performance analysis, which involves monitoring and improving overall system performance.

Reference: Lecture Notes – Search Engine Architecture

25. Design a search system for your Drupal web site.
