CPE/CSC 580
Cal Poly
Spring 2001
Northern Light is a sophisticated search engine distinguished by its ability to present search results not only as a complete list of findings but also organized into “folders” that let the user focus on one part of the search and eliminate matches that do not meet the searcher’s criteria. Northern Light also features an extensive library of exclusive materials (the “Special Collection”) that are available for a fee.
Northern Light lets users submit URLs directly for possible inclusion in its database. Users are advised to submit only a single page of their site, since its crawler, “Gulliver,” follows all links on the submitted page. Northern Light also claims that its crawler constantly searches the web not only for new information to include but also to check for and delete dead links. Gulliver complies with the Robots Exclusion Protocol as well as other crawler protocols.
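Gulliver's internals are not described anywhere in Northern Light's materials, so the following is only a minimal sketch of the behavior described above: starting from one submitted page, follow every link breadth-first, but skip (and do not follow) any URL that the site's robots.txt disallows. The robots.txt text, the in-memory `PAGES` link graph, and the agent name passed to `crawl` are all hypothetical; the rule checking uses Python's standard `urllib.robotparser`.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "http://example.com/": ["http://example.com/about", "http://example.com/private/notes"],
    "http://example.com/about": ["http://example.com/team", "http://example.com/"],
    "http://example.com/team": [],
    "http://example.com/private/notes": ["http://example.com/secret"],
    "http://example.com/secret": [],
}


def crawl(seed, agent="Gulliver"):
    """Breadth-first crawl from a single submitted page, honoring robots.txt rules."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    seen, queue, indexed = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue  # disallowed: neither index this page nor follow its links
        indexed.append(url)
        for link in PAGES.get(url, []):
            if link not in seen:  # avoid revisiting pages reached by several links
                seen.add(link)
                queue.append(link)
    return indexed
```

Submitting only `http://example.com/` is enough to reach the about and team pages, which is why users need not submit every page. `/private/notes` is skipped, and so is `/secret`, because it is linked only from the disallowed page.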
Because Northern Light is a search engine, it offers no apparent organization of information before a query is made. As sites are added to its database, Northern Light reads and indexes the text on each site and analyzes it to determine its search terms. Northern Light does NOT use META tags, ALT tags, or comment tags in determining relevance. Documents are classified by those search terms and accessed through an index that points to all pages containing a given search term.
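An index that "points to all pages containing a given search term" is usually called an inverted index. The sketch below assumes a simple approach that is not Northern Light's documented method: it collects only the visible text of each page (so META content, ALT attributes, and HTML comments never reach the index) and maps each lowercase word to the set of documents containing it. The page contents and document ids are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects page text only; META content, ALT attributes, and comments are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Attributes (META content, ALT text) are deliberately discarded.
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)
    # handle_comment is not overridden, so comment text is dropped.


def terms(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def build_index(docs):
    """Map each term to the set of document ids whose visible text contains it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for term in terms(html):
            index[term].add(doc_id)
    return index


docs = {
    "a": '<html><head><meta name="keywords" content="rockets"></head>'
         '<body><p>Solar sails for space travel</p>'
         '<img src="x.png" alt="rockets"><!-- rockets --></body></html>',
    "b": "<p>Space rockets and launch schedules</p>",
}
index = build_index(docs)
```

Here `index["rockets"]` contains only `"b"`: page `"a"` mentions rockets solely in its META tag, ALT text, and a comment, none of which are indexed.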
The Special Collection is a separate database of premium content that is available for a fee. Search matches from this collection are integrated into the search results, unless specifically excluded by the user. Searches can also be tailored to focus exclusively on this collection. Northern Light describes this collection as “an online business library comprising 7100 trusted, full-text journals, books, magazines, newswires, and reference sources.”
Northern Light presents its search results in two ways. The middle of the page consists of numbered search results, ten to a page. Each result shows the link to the page right after its number. Below the link is a relevance percentage score, followed by the type of page (e.g. “Articles & General info”), a brief description of the site, and the date the site was most recently updated. The text of the link is then displayed at the bottom of the paragraph, after the site’s general classification (e.g. “Commercial site”).
One advantage of Northern Light is that it groups multiple findings from the same site or publication (if in the Special Collection) into their own folder, accessible from a link just below the paragraph. All of these findings are still counted in the total matches, however: the top of the search results lists “xxx items in (subject or sub-folder) in xxx sources for:”.
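The grouping and counting described above can be sketched as follows, assuming each result carries a source name and a relevance score. The `Result` type, its fields, and the sample titles and source names are hypothetical. Results are sorted by score, grouped so that each source is listed once with its remaining hits tucked into its folder, and the summary line still counts every individual hit.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    source: str   # site or publication the document came from
    score: float  # relevance score, higher is better


def group_by_source(results):
    """Group results by source; the best-scoring hit is shown, the rest go in its folder."""
    groups = {}  # dicts preserve insertion order, so sources appear in best-score order
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        groups.setdefault(r.source, []).append(r)
    return groups


def summary_line(groups, query):
    items = sum(len(hits) for hits in groups.values())  # every hit counts toward the total
    return f"{items} items in {len(groups)} sources for: {query}"


results = [
    Result("Article A", "Publication X", 0.9),
    Result("Article B", "Publication X", 0.7),
    Result("Article C", "Publication Y", 0.8),
]
groups = group_by_source(results)
```

With these inputs, `summary_line(groups, "solar sails")` reads "3 items in 2 sources for: solar sails", even though only two entries (one per source) would appear in the main list.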
Northern Light also will sub-classify all of its findings into “Custom Search Folders” that organize the search results into a number of sub-categories. The sub-folders are listed in the far left section of the page and can contain more than one level of depth. The methodology for determining what folders are created is covered in the next section.
Northern Light has invested considerable effort in helping users reach the findings most relevant to their needs, given the current limitations of technology. To this end, it conducts its own document evaluations to determine the appropriate keywords (as described above in Organization of Knowledge) for matching a document to a given query.
Queries are entered in a standard text box at the top of the page (and also at the bottom of each results page). Northern Light also offers specialized searches (e.g. Business Search, Stock Quotes, real-time news feeds) as well as “Special Editions” on selected topics. Northern Light supports natural-language queries as well as simple word searches. It also supports full Boolean operators, wildcards, and fielded searching.
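To illustrate how Boolean operators and wildcards combine with the term index described in the Organization of Knowledge section, here is a minimal sketch. It does not reproduce Northern Light's query syntax: the query is given as hypothetical nested tuples such as `("AND", a, b)`, a trailing `*` is expanded against the indexed terms using the standard `fnmatch` module, and the small `INDEX` is made up. Fielded searching is not modeled.

```python
from fnmatch import fnmatchcase

# Hypothetical inverted index: term -> ids of documents containing it.
INDEX = {
    "search": {1, 2, 3},
    "engine": {1, 3},
    "engines": {2},
    "crawler": {3},
    "folder": {2},
}
ALL_DOCS = set().union(*INDEX.values())


def lookup(term):
    """Return the documents for a term; '*' is a wildcard (e.g. 'engine*')."""
    if "*" in term:
        return set().union(*(docs for t, docs in INDEX.items() if fnmatchcase(t, term)))
    return INDEX.get(term, set())


def evaluate(query):
    """Evaluate ("AND", a, b, ...), ("OR", a, b, ...), ("NOT", a), or a bare term."""
    if isinstance(query, str):
        return lookup(query)
    op, *args = query
    sets = [evaluate(a) for a in args]
    if op == "AND":
        return set.intersection(*sets)
    if op == "OR":
        return set.union(*sets)
    if op == "NOT":
        return ALL_DOCS - sets[0]
    raise ValueError(f"unknown operator: {op}")
```

For example, `evaluate(("AND", "engine*", ("NOT", "crawler")))` matches both "engine" and "engines" and then removes document 3, leaving `{1, 2}`.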
The Custom Search Folders are generated on the fly and are unique to each search. Northern Light organizes the search results into folders by subject, type, source, and language based on the individual search. An additional folder called “all others” is also created for documents that don’t fall into any of the generated categories. These folders can be used to focus on a subset of the search results, and further layers of folders beneath them can be explored to narrow the search even more. I have found it useful to run an initial search and scan the Custom Search Folders for useful search terms. This often helps identify key phrases that are common in a subject area. After reviewing these folders, I then submit another query, often with much more focused results.
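Northern Light does not publish how it chooses its folders, so the following only sketches the general idea under stated assumptions: each result carries a hypothetical list of subject labels, a folder is generated for every label shared by at least `min_size` results (a made-up threshold), and any result that lands in no generated folder goes to "All others". Only one level and only the subject dimension are shown; deeper levels could apply the same function to the contents of a single folder.

```python
from collections import Counter


def build_folders(results, min_size=2):
    """Build one level of folders from subject labels shared by at least min_size results."""
    counts = Counter(label for r in results for label in set(r["subjects"]))
    generated = {label for label, n in counts.items() if n >= min_size}
    folders = {name: [] for name in sorted(generated)}
    others = []
    for r in results:
        # dict.fromkeys drops duplicate labels while keeping their order
        hits = [label for label in dict.fromkeys(r["subjects"]) if label in generated]
        for label in hits:
            folders[label].append(r["id"])  # a result may appear in several folders
        if not hits:
            others.append(r["id"])
    folders["All others"] = others
    return folders


results = [
    {"id": 1, "subjects": ["Space", "Physics"]},
    {"id": 2, "subjects": ["Space"]},
    {"id": 3, "subjects": ["Physics", "Cooking"]},
    {"id": 4, "subjects": ["Gardening"]},
]
folders = build_folders(results)
```

Because folders are derived from the labels of the current results, a different query yields a different set of folders, which matches the "generated on the fly" behavior described above.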
Northern Light looks at its collected body of knowledge in determining its relevancy rankings. One of the factors used to determine how relevant a document is to the current search is how frequently the search term(s) appear across the entire database. Very common words are given less weight than rarer words, so a term’s influence decreases as the number of documents containing it grows. In addition, Northern Light treats the number of other pages that link to the page in question as an indicator of relevance.
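The two factors above can be illustrated with a standard technique, tf-idf weighting plus a link-count bonus. This is a swapped-in textbook method, not Northern Light's actual formula, which is not published. A term's inverse document frequency is log(N / df), so a word found in every document contributes nothing. The `link_weight` value and the sample documents and link counts are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    df = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / df) if df else 0.0


def score(doc_id, query, docs, inlinks, link_weight=0.5):
    """Sum of tf * idf over the query terms, plus a bonus for inbound links."""
    tf = Counter(docs[doc_id])
    text_score = sum(tf[t] * idf(t, docs) for t in query)
    return text_score + link_weight * math.log1p(inlinks.get(doc_id, 0))


def rank(query, docs, inlinks):
    # sorted() is stable, so documents with equal scores keep their original order
    return sorted(docs, key=lambda d: score(d, query, docs, inlinks), reverse=True)


docs = {
    "a": ["search", "engine", "the"],
    "b": ["the", "search", "folder"],
    "c": ["the", "crawler"],
}
inlinks = {"b": 10}  # hypothetical: ten other pages link to "b"
```

Since "the" occurs in all three documents, its weight is log(3/3) = 0. For the query `["search"]`, documents "a" and "b" tie on text alone, but the links pointing at "b" move it to the top.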
Since Northern Light is a commercial search engine available to the public, it is geared more toward knowledge retrieval than knowledge entry. The site could be considered a “collaboration” among its authors, its web crawlers, and its users’ submissions. Most of this process is automated given the sheer volume, though individuals are given the means to add items to the database and to request amendments to existing ones.
A search engine serves as a knowledge bank, pointing users toward existing knowledge items in its index. Since Northern Light offers itself as a publicly accessible repository, it can serve (though in varying degrees of context and relevance) as an organizational memory of the web itself and, by extension, of a large subset of accumulated human knowledge. It is an ever-growing database in its attempt to locate, identify, review, categorize, and retain the accelerating amount of online content.
None at this time.
Last modified: Tue Apr 17 14:43:55 PDT 2001