Driving Traffic to Your Site: Understanding and Using Search Engines for Search and Site Optimization
Remember when you first discovered the World Wide Web and asked the question "How do I find pages about the topic I'm looking for?" The solution to this problem came in the form of powerful search engines that index and catalog the Internet. Search of the WWW is all about finding information on topics of interest, about new sites, and even relocating sites that you once visited, but are now lost to you in cyberspace. In this chapter we consider search in depth, focusing on identifying different types of search engines and how they work, commands and strategies for more effective searches, and finally how to improve search rankings for your Web site by building in specific information that search engines look for.
Part 1: An Introduction to Web Search Sites
Web Search Sites include portal and search engine sites that provide search or directory information about the World Wide Web. Many types of search sites exist, including not only those that offer a general search of the entire web, but also those that focus on specific information categories such as news items, children's sites, people, acronyms, apartments, attorneys, company information, computer information, definitions, domain names, maps, ISPs, jobs, legislative voting records, mailing lists, newsgroups, online events, stock filings, and zip codes, to name a few. In each of these cases, the World Wide Web has been searched and cataloged for specific types of information, and the search engine queries the resulting database.
The term "search engine" refers to three components of the search and retrieval process: (1) the Web crawler or web spider, (2) the search index, the database of information that supports the search, and (3) the search engine software that queries the index and reports information from the database. The search engine uses a computer program (called a spider, crawler, or bot) that is sent to a specific site on the Internet and copies the information found there. This file is then analyzed to identify the information contained by the site. The search engine indexes such things as the site title, keywords and descriptions identified in META tags, and links and files referenced by the site (URLs, ftp addresses, and graphics files referenced). The search engine may then return to the site to copy and further index pages referenced by the links.
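The indexing step described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular search engine: the `PageIndexer` class, the `index_page` function, and the sample page are all hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class PageIndexer(HTMLParser):
    """Hypothetical indexer: collects the title, META keywords/description, and links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])   # linked pages
        elif tag == "img" and attrs.get("src"):
            self.links.append(attrs["src"])    # referenced graphics files

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_page(html):
    """Return the fields a spider might record for one copied page."""
    parser = PageIndexer()
    parser.feed(html)
    return {"title": parser.title.strip(), "meta": parser.meta, "links": parser.links}


# Made-up sample page used only for illustration.
SAMPLE_PAGE = """<html><head><title>Snowbird Ski Guide</title>
<meta name="keywords" content="skiing, Snowbird, Utah">
<meta name="description" content="A guide to skiing at Snowbird.">
</head><body><a href="lifts.html">Lifts</a> <img src="map.gif"></body></html>"""

record = index_page(SAMPLE_PAGE)
```

The links collected in `record["links"]` are what a spider would use if it returned to the site to copy and index further pages.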
In performing this search process, web crawlers or web spiders engage in either shallow or deep searches of the website. The difference is in the number of levels of directories considered in the search of the site. When a website is set up on a server, a directory path is defined that holds the files for the website. Shallow spiders report and index either the URL given or all pages in the same directory, while deep spiders evaluate all of the pages in all directories of the site. This is an important distinction because deep spiders will index all pages in the server's sub-directories even though they may not be linked from the home page or intended for general viewing. When setting up your server, be careful to include only desired subdirectories and files in your server's accessible path.
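The difference between shallow and deep spiders can be illustrated with a small Python sketch. It assumes a made-up list of files in a server's accessible path and does not model how any real spider discovers pages; the function names and paths are hypothetical.

```python
import posixpath

# Hypothetical list of files in a server's accessible path.
SITE_FILES = [
    "/index.html",
    "/products.html",
    "/private/drafts.html",
    "/archive/1999/news.html",
]


def shallow_spider(start_path, files):
    """Index only the pages in the same directory as the starting page."""
    start_dir = posixpath.dirname(start_path)
    return sorted(f for f in files if posixpath.dirname(f) == start_dir)


def deep_spider(start_path, files):
    """Index every page in the starting directory and all of its subdirectories."""
    start_dir = posixpath.dirname(start_path).rstrip("/") + "/"
    return sorted(f for f in files if f.startswith(start_dir))


shallow_pages = shallow_spider("/index.html", SITE_FILES)
deep_pages = deep_spider("/index.html", SITE_FILES)
```

In this sketch the deep spider also picks up `/private/drafts.html`, which is why unwanted subdirectories should be kept out of the accessible path.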
The term "search engine" is also used loosely to cover both what are properly called "directories" and true "search engines". The distinction is that a directory is organized by subject, and most sites are suggested by users and manually cataloged into the directory. Yahoo is a directory: sites are registered, reviewed, and included in the database. Directories typically do not employ spiders that visit and catalog your Web site. Search engines, on the other hand, employ an automated process that directs a spider to visit the site, after which the information is analyzed and cataloged into the database for query. Hybrid search engines are those that have an associated directory along with the search database.
How Do Search Engines Rank Pages?
Many search engines provide relevance or confidence rankings that indicate how closely they believe a specific site matches your search query. Relevance ratings differ by search engine, but are generally based on an evaluation of popularity of the site, keyword frequency and keyword location within the document. Keywords in the title, as part of the name of linked files, text, or graphics will increase relevance ratings. The repetition of keywords for the purpose of increasing relevancy ratings is called "spamming", and if detected by the search engine, will often result in exclusion of the page from the index. For the Web master, improving search engine placement is a worthy goal that leads to increased traffic and sales. In part three of this chapter we will discuss ways to build a Web site that increases rankings and placement.
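To make the idea of keyword frequency and keyword location concrete, here is a minimal Python sketch of a relevance score. It is an assumption-laden illustration: the `relevance` function, the title weight of 3.0, and the sample pages are invented for this example, site popularity is left out entirely, and real search engines use their own undisclosed formulas.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, title, body, title_weight=3.0):
    """Score a page by keyword frequency in the body plus a bonus for title matches.

    title_weight is an arbitrary value assumed for illustration.
    """
    title_words = tokenize(title)
    body_words = tokenize(body)
    score = 0.0
    for word in tokenize(query):
        # Keyword frequency: share of body words that match the keyword.
        score += body_words.count(word) / max(len(body_words), 1)
        # Keyword location: a keyword in the title counts for more.
        if word in title_words:
            score += title_weight
    return score


# Made-up pages used only for illustration.
PAGES = [
    {"title": "Utah Travel", "body": "Utah has many resorts including Snowbird."},
    {"title": "Snowbird Ski Resort",
     "body": "Skiing at Snowbird is popular. Snowbird has steep terrain."},
]

ranked = sorted(PAGES, key=lambda p: relevance("snowbird", p["title"], p["body"]),
                reverse=True)
```

Because frequency is measured as a share of the page's words, stuffing a page with repeated keywords would raise this score, which is exactly the behavior search engines try to detect and penalize as spamming.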
Introduction to Top Search Engines
What qualifies as a top search engine? One that provides the most complete index of sites and at the same time provides the most accurate search in producing valued results. In terms of index size, AltaVista and FAST Search stand out, each indexing more than 200 million pages.
Perhaps the best reference for search engine information is found at http://searchenginewatch.com/sereport/bysubject.html . Search Engine Watch provides an excellent overview of 21 search engine sites. This overview is included below.
"AOL Search
http://search.aol.com/
AOL Search allows its members to search across the web and AOL's own content from one place. The "external" version, listed above, does not list AOL content. The main listings for categories and web sites come from the Open Directory (see below). Inktomi (see below) also provides crawler-based results, as backup to the directory information. Before the launch of AOL Search in October 1999, the AOL search service was Excite-powered AOL NetFind.
AltaVista
http://www.altavista.com/
AltaVista is consistently one of the largest search engines on the web, in terms of pages indexed. Its comprehensive coverage and wide range of power searching commands makes it a particular favorite among researchers. It also offers a number of features designed to appeal to basic users, such as "Ask AltaVista" results, which come from Ask Jeeves (see below), and directory listings primarily from the Open Directory. AltaVista opened in December 1995. It was owned by Digital, then run by Compaq (which purchased Digital in 1998), then spun off into a separate company which is now controlled by CMGI.
Ask Jeeves
http://www.askjeeves.com/
Ask Jeeves is a human-powered search service that aims to direct you to the exact page that answers your question. If it fails to find a match within its own database, then it will provide matching web pages from various search engines. The service went into beta in mid-April 1997 and opened fully on June 1, 1997. Results from Ask Jeeves also appear within AltaVista.
Direct Hit
http://www.directhit.com/
Direct Hit is a company that works with other search engines to refine their results. It does this by monitoring what users click on from the results they see. Sites that get clicked on more than others rise higher in Direct Hit's rankings. Thus, the service dubs itself a "popularity engine." Direct Hit's technology is currently best seen at HotBot. It also refines results at Lycos and is available as an option at LookSmart and MSN Search. The company also crawls the web and refines this database, which can be viewed via the link above.
Excite
http://www.excite.com/
Excite is one of the most popular search services on the web. It offers a medium-sized index and integrates non-web material such as company information and sports scores into its results, when appropriate. Excite was launched in late 1995. It grew quickly in prominence and consumed two of its competitors, Magellan in July 1996, and WebCrawler in November 1996. These continue to run as separate services.
FAST Search
http://www.alltheweb.com/
Formerly called All The Web, FAST Search aims to index the entire web. It was the first search engine to break the 200 million web page index milestone. The Norwegian company behind FAST Search also powers the Lycos MP3 search engine. FAST Search launched in May 1999.
Go / Infoseek
http://www.go.com/
Go is a portal site produced by Infoseek and Disney. It offers portal features such as personalization and free e-mail, plus the search capabilities of the former Infoseek search service, which has now been folded into Go. Searchers will find that Go consistently provides quality results in response to many general and broad searches, thanks to its ESP search algorithm. It also has an impressive human-compiled directory of web sites. Go officially launched in January 1999. It is not related to GoTo, below. The former Infoseek service launched in early 1995.
GoTo
http://www.goto.com/
Unlike the other search engines, GoTo sells its listings. Companies can pay money to be placed higher in the search results, which GoTo feels improves relevancy. Non-paid results come from Inktomi. GoTo launched in 1997 and incorporated the former University of Colorado-based World Wide Web Worm. In February 1998, it shifted to its current pay-for-placement model and soon after replaced the WWW Worm with Inktomi for its non-paid listings. GoTo is not related to Go, above.
Google
http://www.google.com/
Google is a search engine that makes heavy use of link popularity as a primary way to rank web sites. This can be especially helpful in finding good sites in response to general searches such as "cars" and "travel," because users across the web have in essence voted for good sites by linking to them.
HotBot
http://www.hotbot.com/
Like AltaVista, HotBot is another favorite among researchers due to its large index of the web and many power searching features. In most cases, HotBot's first page of results comes from the Direct Hit service (see above), and then secondary results come from the Inktomi search engine, which is also used by other services. It gets its directory information from the Open Directory project (see below). HotBot launched in May 1996 as Wired Digital's entry into the search engine market. Lycos purchased Wired Digital in October 1998 and continues to run HotBot as a separate search service.
Inktomi
http://www.inktomi.com/
Originally, there was an Inktomi search engine at UC Berkeley. The creators then formed their own company with the same name and created a new Inktomi index, which was first used to power HotBot. Now the Inktomi index also powers several other services. All of them tap into the same index, though results may be slightly different. This is because Inktomi provides ways for its partners to use a common index yet distinguish themselves. There is no way to query the Inktomi index directly, as it is only made available through Inktomi's partners with whatever filters and ranking tweaks they may apply.
LookSmart
http://www.looksmart.com/
LookSmart is a human-compiled directory of web sites. In addition to being a stand-alone service, LookSmart provides directory results to MSN Search, Excite and many other partners. AltaVista provides LookSmart with search results when a search fails to find a match from among LookSmart's reviews. LookSmart launched independently in October 1996, was backed by Reader's Digest for about a year, and then company executives bought back control of the service.
Lycos
http://www.lycos.com/
Lycos started out as a search engine, depending on listings that came from spidering the web. In April 1999, it shifted to a directory model similar to Yahoo. Its main listings come from the Open Directory project, and then secondary results come from either Direct Hit or Lycos' own spidering of the web. In October 1998, Lycos acquired the competing HotBot search service, which continues to be run separately.
MSN Search
http://search.msn.com/
Microsoft's MSN Search service is a LookSmart-powered directory of web sites, with secondary results that come from AltaVista. RealNames and Direct Hit data is also made available. MSN Search also offers a unique way for Internet Explorer 5 users to save past searches.
Netscape Search
http://search.netscape.com/
Netscape Search's results come primarily from the Open Directory and Netscape's own "Smart Browsing" database, which does an excellent job of listing "official" web sites. Secondary results come from Google. At the Netscape Netcenter portal site, other search engines are also featured.
Northern Light
http://www.northernlight.com/
Northern Light is another favorite search engine among researchers. It features one of the largest indexes of the web, along with the ability to cluster documents by topic. Northern Light also has a set of "special collection" documents that are not readily accessible to search engine spiders. There are documents from thousands of sources, including newswires, magazines and databases. Searching these documents is free, but there is a charge of up to $4 to view them. There is no charge to view documents on the public web -- only for those within the special collection. Northern Light opened to general use in August 1997.
Open Directory
http://dmoz.org/
The Open Directory uses volunteer editors to catalog the web. Formerly known as NewHoo, it was launched in June 1998. It was acquired by Netscape in November 1998, and the company pledged that anyone would be able to use information from the directory through an open license arrangement. Netscape itself was the first licensee. Lycos and AOL Search also make heavy use of Open Directory data, while AltaVista and HotBot prominently feature Open Directory categories within their results pages.
RealNames
http://www.realnames.com/
The RealNames system is meant to be an easier-to-use alternative to the current web site addressing system. Those with RealNames-enabled browsers can enter a word like "Nike" to reach the Nike web site. To date, RealNames has had its biggest success through search engine partnerships. In particular, it is strongly featured in results at AltaVista, Go and MSN Search.
Snap
http://www.snap.com/
Snap is a human-compiled directory of web sites, supplemented by search results from Inktomi. Like LookSmart, it aims to challenge Yahoo as the champion of categorizing the web. Snap launched in late 1997 and is backed by Cnet and NBC.
WebCrawler
http://www.webcrawler.com/
WebCrawler has the smallest index of any major search engine on the web -- think of it as Excite Lite. The small index means WebCrawler is not the place to go when seeking obscure or unusual material. However, some people may feel that by having indexed fewer pages, WebCrawler provides less overwhelming results in response to general searches. WebCrawler opened to the public on April 20, 1994. It was started as a research project at the University of Washington. America Online purchased it in March 1995 and was the online service's preferred search engine until Nov. 1996. That was when Excite, a WebCrawler competitor, acquired the service. Excite continues to run WebCrawler as an independent search engine.
Yahoo
http://www.yahoo.com/
Yahoo is the web's most popular search service and has a well-deserved reputation for helping people find information easily. The secret to Yahoo's success is human beings. It is the largest human-compiled guide to the web, employing about 150 editors in an effort to categorize the web. Yahoo has over 1 million sites listed. Yahoo also supplements its results with those from Inktomi. If a search fails to find a match within Yahoo's own listings, then matches from Inktomi are displayed. Inktomi matches also appear after all Yahoo matches have first been shown. Yahoo is the oldest major web site directory, having launched in late 1994. "
Source: http://searchenginewatch.internet.com/links/Major_Search_Engines/The_Major_Search_Engines/index.html
MetaSearch Engines
Search engines that employ "meta search" operate under the assumption that if one search engine is good, then using multiple search engines at once is even better. MetaSearch engines include Cyber 411, Inference Find, Dogpile, MetaFind, SavvySearch, and MetaCrawler. As an example, MetaCrawler is the most popular meta search engine and is part of the go2net.com network of companies, which also includes Dogpile. MetaCrawler differs from traditional search engines in that it does not have its own database. Rather, MetaCrawler relies upon other search engines to answer each query. Results of these queries are organized, ranked using the MetaCrawler relevance criteria, and then presented to the user. MetaCrawler queries such search engines as Lycos, Infoseek, WebCrawler, Excite, AltaVista, Thunderstone, The Mining Company, Looksmart, and Yahoo. This approach provides verification through multiple sources, each of which attempts to produce quality results. In addition, a meta search engine offers a broader view of the web, because it draws on the indexes of multiple search engines.
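The way a meta search engine might combine results from several engines can be sketched in Python. This is a hypothetical scoring scheme invented for illustration, not MetaCrawler's actual relevance criteria: each URL earns 1/rank points from every engine that returns it, so pages confirmed by several engines rise to the top and duplicates collapse to one entry. The engine names and URLs are placeholders.

```python
def merge_results(results_by_engine):
    """Combine ranked URL lists from several engines into one ranked list."""
    scores = {}
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            # A higher position (smaller rank) contributes more points.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by descending score; break ties alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Placeholder results, as if returned by three different engines.
RESULTS = {
    "engine_a": ["http://a.example/", "http://b.example/", "http://c.example/"],
    "engine_b": ["http://b.example/", "http://d.example/"],
    "engine_c": ["http://b.example/", "http://a.example/"],
}

merged = merge_results(RESULTS)
```

Here `http://b.example/` ranks first because all three engines returned it, which mirrors the "verification through multiple sources" idea described above.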
All-In-One Search Page posts fields that search virtually any engine, directory, or specialized site you can think of. However, it's a megasearch--not a metasearch--tool, so it searches each engine separately.
Ask Jeeves lets you enter natural-language queries, and then posts additional plain-English questions that help you focus the search. Great for search rookies.
Dogpile conducts Web metasearches using 14 different engines, but it doesn't eliminate the duplicates.
EZ-Find at the River simply searches several engines--all of those we compared here except HotBot--one at a time.
Find-It is like EZ-Find in that it puts multiple engines at your disposal but doesn't search them simultaneously. Use its Advanced Search feature for the best results. A similarly named site, Find It, offers access to more engines but works the same way.
Mamma applies its metasearches to few sites (only seven), but it combines the results and reorders them using its own relevancy ratings.
MetaCrawler, one of the oldest and best metasearch tools on the Web, collates pages found from a slew of sites, including Lycos, Infoseek, Excite, and AltaVista. The MiniCrawler, which operates in a discreet window easily tucked away on the desktop, is a winner.
MetaFind works just like Dogpile except it doesn't sift through Usenet, newsfeeds, or FTP sites. MetaFind omits descriptions of found pages.
SavvySearch comes courtesy of the Colorado State University. This metasearcher's latest interface lets you pick the search engines; then it nicely organizes the results and drops duplicates. (source: http://www.cnet.com/Content/Reviews/Compare/Search2/ss13.html )
According to Online columnist Greg Notess, MetaSearch engines are not without problems: "All have significant limitations as a comprehensive search tool. They are subject to time outs, when search processing takes too long. Since most only retrieve the top 10-50 hits from each search engine, the total number of hits retrieved may be considerably less than found by doing a direct search on one of the search engines. Advanced search features on individual search engines are not usually available. Phrase and Boolean searching may not be properly processed or available. None of them search Northern Light and few search any of the Inktomi databases such as HotBot." ( http://www.notess.com/search/multi/ )
Which Search Engine is Best?
Many PC magazines have run "shoot-outs" that compare the accuracy and coverage of web search engines. PC Magazine recently compared search engines and found that the search engine of choice varied depending on the purpose. Beyond that, searches may be general or topic specific. For general search, Yahoo!, Northern Light, HotBot, MetaCrawler, and Ask Jeeves for Kids seem to fill the bill. For specialty topics, however, there is a search index for nearly every topic. One list of specialty topic search sites is included at the end of this chapter.
Directory search: Yahoo!
Yahoo is a directory search created by editors who review, categorize, and organize the directory based on the quality of the sites reviewed. As a directory, Yahoo is the best, providing browse or search that produces high-quality results.
Research: Northern Light
Northern Light offers full-text material from a collection of more than 4500 publications on a pay-per-view basis, as well as one of the best web search and indexing tools.
All-purpose search: HotBot
HotBot provides the easiest to use interface for advanced search features, and produces excellent results. Owned by Lycos and powered by the Inktomi search engine, HotBot has a large web index that can be searched with the aid of filters for media type, page depth and pornographic content.
Kids' search: Ask Jeeves for Kids
Ask Jeeves for Kids (http://www.ajkids.com) provides an easy to use interface for kids that not only helps them to limit the scope of their search, but also automatically limits the response so that they are not overwhelmed.
Metasearch: MetaCrawler
MetaCrawler searches AltaVista, Excite, Infoseek, InfoSpace’s Ultimate Directory, Lycos, Thunderstone, WebCrawler, and Yahoo! One of the helpful features of MetaCrawler is that it tests for dead links.
Part 2: Using Search Engines to Conduct Effective Searches
Few things are more frustrating than performing searches that produce thousands of links, but nothing of value. In this section we discuss the art of making your searches more effective in producing the results you want. While almost everyone has their own style of searching, several basic rules will help in almost any search situation. First, the most effective searches are specific about what you are looking for. Think of the search as a journey down a directory tree that becomes more and more specific. A search using Yahoo produces many directory trees and shows how this works: “Recreation” leads to “sports”, which leads to “snow skiing”, which leads to “ski areas”, which leads to “Snowbird”. If you want to know about “snow skiing at Snowbird”, then be specific. Additionally, Boolean logic and the plus and minus symbols allow you to further include or exclude information from your search. It is not uncommon for different search engines to produce different results. Don't be afraid to switch search engines or use a meta search engine if you don't find what you are looking for. Backward searches are also effective. Find a site that deals with the topic of interest, and then search for that URL to find other sites that reference it and have additional information.
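As a rough illustration of being specific, the following Python sketch assembles a query from a phrase, required terms, and excluded terms using the plus and minus symbols. The `build_query` helper is hypothetical, and the exact syntax accepted (for example, whether a plus sign may precede a quoted phrase) varies by search engine, so check the help files of the engine you use.

```python
def build_query(required=(), excluded=(), phrases=()):
    """Build an implied-Boolean query: +term to include, -term to exclude."""
    parts = ['+"%s"' % phrase for phrase in phrases]   # exact phrases
    parts += ["+" + term for term in required]          # terms that must appear
    parts += ["-" + term for term in excluded]          # terms that must not appear
    return " ".join(parts)


query = build_query(required=["Snowbird"], excluded=["lodging"],
                    phrases=["snow skiing"])
print(query)
```

The resulting string can be pasted into the search box of an engine that supports implied Boolean operators.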
Chris Sherman of about.com suggests seven hints that produce effective searches. (http://websearch.miningco.com/internet/websearch/library/weekly/aa010199.htm )
Hint 1: Study Search Engine Help Files: Understand how to use your search engine. The search engine help files detail the commands and limitations of the search engine. It often helps to find your most preferred engine and learn its syntax first. Other search engines will be similar. Online columnist Greg Notess (http://www.notess.com/search/features/) provides an excellent chart comparing and summarizing the primary search features of the Web search engines. Each section of his chart links to a more detailed explanation of the features.
Hint 2: Use the "Three Strikes" rule. All search engines are not alike. They index different web sites and have different search capabilities. If you don't get the results you want with one search engine, try another. It may be a waste of time thinking up different key words.
Hint 3: Don't Play Favorites: Use several search engines on a regular basis.
Hint 4: Use 2 or 3 word Phrases: All search engines respond better to phrase searches than a list of keywords.
Hint 5: Use Boolean Operators Selectively: Understand how each search engine uses the specific boolean operator you specify. They are not all the same. For a specific tutorial on Boolean logic, and its use with the major search engines, view the University of Albany's Library site tutorial at: http://www.albany.edu/library/internet/boolean.html. One table from their site identifies general types of boolean operators and their use in each major search engine.
Hint 6: Use Specialized Search Sites: Again, Chris Sherman points to a huge number of specialty sites for searching for everything from religion to politics, to graphics, products, people, competitors, news and so forth. http://websearch.about.com/internet/websearch/mlibrary.htm
Help Files for the Major Search Engines
Search Engine Watch provides a quick overview of the math, power searching, boolean, search assistance and display features and capabilities of the major search engines. See http://www.searchenginewatch.com/facts/ataglance.html . This tutorial offers an excellent one stop overview of search.
Feature | Search Engine
Boolean operators | AltaVista Advanced Search, C4, Dogpile, Excite, HotBot, HotBot SuperSearch, Northern Light, ProFusion, Snap Power Search
Full Boolean logic with parentheses, e.g., behavior and (cats or felines) | AltaVista Advanced Search, C4, Excite, HotBot, HotBot SuperSearch, Lycos Pro, MSN Web Search Advanced Search, Northern Light, Snap Power Search
Implied Boolean +/- | AltaVista, AOL.COM Search, C4, Chubba, Excite, HotBot, HotBot SuperSearch, Infoseek, Lycos, Lycos Pro, Mamma, MetaCrawler, Northern Light, Northern Light Power Search, PlanetSearch, Snap
Boolean logic by template terminology | AOL.COM Search Options, Excite (Power Search), HotBot, HotBot SuperSearch, HuskySearch, Infoseek Advanced Search, Lycos Pro, SavvySearch, Snap Power Search
Proximity operators | AltaVista (Advanced Search), Dogpile, Lycos Pro
(source: http://www.searchenginewatch.com/facts/ataglance.html )
For a more detailed view of the math commands for the leading search engines, Ken Bogucki's Usenet FAQ describes how to use various search engines and provides an explanation of the query syntax, with examples, and some helpful hints on searching the Web (http://www.faqs.org/faqs/www/wisefaq/). The following excerpt from Ken's site shows these commands. Note that the square brackets [] are not part of the query syntax.
AltaVista
Basic Help: http://altavista.digital.com/av/content/help.htm
Advanced Help: http://altavista.digital.com/av/content/help_advanced.htm
Refine Results: http://altavista.digital.com/av/content/help_refine.htm
[apples "orange juice"]    "apples" or the phrase "orange juice"
[+apples -"orange juice"]    "apples" & not the phrase "orange juice"
[app* (wildcard)]    "apples", "applets", "appraise"
(wildcard in Alta Vista requires Min. of three letters before the wildcard and will return from 0-5 characters Max.)
Complex Searches (Can use either logical word or symbol expressions)
AND or &, OR or |, NOT or !, NEAR or ~
[apple AND orange]    "apple" & the word "orange"
[apple OR orange]    "apple" or the word "orange"
[apples NOT oranges]    "apples" but not the word "oranges"
[apple NEAR juice]    "juice" within ten words of "apple"
RESTRICTING A SIMPLE AND COMPLEX SEARCH
[anchor:click-here]    pages with "click-here" in the hyperlink.
[applet:<java class>]    pages with the Java class in the applet tag
[domain:xyz]    pages in the domain "xyz"
[host:xyz.com]    sites at the host name xyz.com.
[image:a.jpg]    sites with an image tag, "a.jpg".
[link:xyz.com]    sites with a link to xyz.com.
[text:orange]    sites with "orange" in the visible text
[title:"A, B and C"]    sites with "A, B and C" in the title.
RANKING
Simple searches: The ranking is automatic.
Complex searches: Enter any word or groups of words in the ranking window. Alta Vista will sort the results based on these words.
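The NEAR operator listed above ("juice" within ten words of "apple") can be illustrated with a short Python sketch. This is only a conceptual model run against a single string of text; the `near` function is a hypothetical name, and it does not reflect how AltaVista evaluates proximity internally.

```python
import re


def near(text, first, second, distance=10):
    """Return True if `first` occurs within `distance` words of `second`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)


sentence = "The apple was pressed into fresh juice"
print(near(sentence, "apple", "juice"))               # within ten words
print(near(sentence, "apple", "juice", distance=2))   # stricter window
```

The `distance` parameter plays the role of the fixed ten-word window; other engines, such as WebCrawler with NEAR/(x), let the searcher choose it.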
Excite
Search Help: http://www.excite.com/Info/searching.html?a-tip-t
Help File for Webmasters: http://www.excite.com/info/listing.html
Concept Based Search
[+apples +pears]    "apples" and "pears"
[-apples +peach]    "peach" but not "apples"
[+apples -pears -berries]    "apples" but not "pears" or "berries"
Exact match queries use Logical Word Expressions to find Web documents. The Logical Word Operators are: AND, OR, AND NOT.
Using logical word expressions will turn off Excite's concept based option. Precise searches require the use of Logical Word Operators.
[apples AND peaches]    pages with "apples" and "peaches"
[apples OR peaches]    pages with either "apples" or "peaches"
[apples AND NOT peaches]    pages with "apples" but not with "peaches"
HotBot
Getting Started: http://www.hotbot.com/help/tips/getting_started.asp
Advanced Search Features List: http://www.hotbot.com/help/tips/search_features.asp
HotBot uses a graphic interface with pull down menus and check boxes to make searching easier. However, HotBot lacks some of the sophisticated query options available at other sites. Even some of the more elemental query options are missing from HotBot. For example, HotBot does not allow proximity searches ("apple" within 10 words of "juice") and HotBot does not support wild card searches. At most search engines, a search for "appl*" will yield results that contain "apple", "apples", "applejack", and "applesauce." This wild card search is not possible at Hot Bot.
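What a wildcard search such as "appl*" does can be sketched in Python with the standard `fnmatch` module. This is a conceptual illustration against a made-up word list, not a description of any engine's implementation; the `wildcard_search` helper is hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, words):
    """Return the words that match a pattern in which * stands for any characters."""
    pattern = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


# Made-up index terms used only for illustration.
INDEX_TERMS = ["apple", "apples", "applejack", "applesauce", "grape", "pineapple"]

matches = wildcard_search("appl*", INDEX_TERMS)
print(matches)
```

Note that "pineapple" is not returned, because the pattern must match from the start of the word.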
Infoseek
How Do I Search?: http://www.infoseek.com/Help?pg=HomeHelp.html
Advanced Search: http://www.infoseek.com/Help?pg=advanced_search.html
Search Tutorial: http://www.infoseek.com/Help?pg=tutorial.html
Simple Searches
[apples oranges]    either "apples" or "oranges".
[+apples oranges]    "apples", pages with "oranges" are ranked lower.
["apple juice"]    "apple" and "juice" appear next to each other.
Caps are used to indicate proper names and a case sensitive search:
[Johnny Appleseed]    will find the name "Johnny Appleseed".
[Johnny,Appleseed]    will find either name.
Note: commas are only used to separate names.
[apples -grapes]    "apples" but not "grapes".
Complex Searches
[fruit | apple | juice]    will find "fruit" then search results for "apple" then search those results for "juice".
[title:fruit]    "fruit" in the title of the page.
[url:www.orange.com]    sites with address "www.orange.com".
[url:fruit]    sites with "fruit" in the URL, "www.fruit.com" or "www.fruitandnuts.com".
[link:www.juice.com]    will find sites linked to the specified URL
[site:xyz.com]    will find all sites at the specified address.
Lycos Pro Advanced Search
Lycos Pro Search: http://www.lycos.com/help/lycospro-help.html
STANDARD SEARCH
Standard searches do not use logical word operators.
[apples oranges peaches]    pages where any of the words appear
[apples +berries]    "apples" and "berries"
[apples -berries]    "apples" but not "berries"
[app$ (wildcard)]    "apples", "applets" etc.
[apple.]    "apple" but not the word "apples"
CUSTOM SEARCHES
Complex searches are done through an intuitive menu interface.
Northern Light
Search Help: http://www.nlsearch.com/docs/prod_help.htm#simplesearch
Search Syntax | Meaning | Hints/Examples
AND + | must have | cats AND dogs
NOT - | must not have | dolphins NOT football
OR | any of the terms | encryption OR cryptography
( ) | parenthetical expression | (cats OR kittens) AND dogs
" " | phrase | "four score and seven years ago"
* | replaces multiple characters | chemi* [will find chemistry, chemical]
% | replaces one character | gene%logy [will find genealogy, geneology]
Fields | Searches for | Hints/Examples
COMPANY: | a company name* | COMPANY:"General Electric"
PUB: | a Special Collection™ publication name | PUB:Lancet
TEXT: | words anywhere | TEXT:mergers AND acquisitions
TICKER: | a company's stock ticker* | TICKER:MSFT
TITLE: | words in a document title | TITLE:"Northern Light" AND "research engine"
URL: | text in a URL | URL:nlsearch [will find www.nlsearch.com and all other URLs that contain the word nlsearch]
* Only certain pre-selected Special Collection documents are indexed on COMPANY and TICKER.
You can use any combination of fielded search and search syntax in any of the Northern Light search forms.
For example: PUB:Lancet AND TITLE:"heart surgery"
This search will find articles from the Lancet with the phrase "heart surgery" in the title.
TICKER:MSFT OR TEXT:MSFT
This search will find articles that are indexed on the Microsoft ticker symbol or include MSFT anywhere in the text.
Search Forms
SEARCH    Search entire WWW and Special Collection for words anywhere.
POWER SEARCH    Advanced searching of WWW and Special Collection with control of source, date, language, country, subject, and type.
PUBLICATION SEARCH    Search the Special Collection, with date sorting option.
INDUSTRY SEARCH    Search the WWW and Special Collection on specific industry categories, date range, and document types.
CURRENT NEWS    Search past 2 weeks of news stories and browse continually updated headlines, weather, sports, and financial information.
WEBCRAWLER http://www.webcrawler.com
[apples oranges or apples OR oranges]    pages that contain any of the words.
[apples AND oranges]    "apples" and "oranges"
[fruit NOT apples]    "fruit" but not "apples"
[cheese NEAR/(x) wine]    "wine" is within "x" words of "cheese"
[world ADJ war]    "world" & "war" are next to each other
[".. " Phrases searches]    "us army", "jack and jill went up the hill"
[(..)]    used to organize search expressions
Yahoo!
Searching Yahoo!: http://howto.yahoo.com/chapters/7/1.html
Advanced Search Syntax: http://www.search.yahoo.com/search/syntax?
Advanced Options:
[apples +oranges]    "apples" as well as "oranges"
[apples -oranges]    "apples" but not with "oranges".
[t:]    confines the search to certain Web titles.
[u:]    confines the search to certain URLs.
[" "]    phrase operator "orange juice", "apple juice", etc.
[pea* (wildcard)]    "pears", "peas", "peaches" etc.
Future Search Technology
One possible direction for future search technology is to employ Virtual DataBase (VDB) technology to expand the scope and capabilities of web data processing. According to Anand Rajaraman, one of the founders of Junglee systems, "this VDB technology lets applications ask powerful SQL queries of data that is scattered over a variety of data sources both in and out of the web. The VDB gathers, structures and integrates the data from these disparate data sources and provides the application programmer with the appearance of a single, unified relational database system. VDB technology enables the development of an exciting new breed of applications that use all the data." This technology will extend current search engines to
· Find information intelligently