Competency E – Databases

…design, query and evaluate information retrieval systems

Introduction

Information retrieval is a multifaceted set of interconnected rules and design decisions. Each aspect, whether designing the database schema, indexing it, or searching it, must be in working order for the others to achieve the desired results. Designing, querying, and evaluating databases is a “process of inquiry, of interacting with the materials of the situation to see how they respond” as a means of creating solutions (Weedman, 2008, p. 114).

Principles of Design

Weedman (2008) explains, “how we store [information] determines how we retrieve it” (p. 115). User needs are therefore an elemental part of designing an IR system, because how well the design anticipates them determines a user’s success or failure in resource discovery. To create a useful database, designers should be attuned to people’s search habits, flaws, and mistakes in order to prevent potential searching problems. While creating the data structure, a database designer must consider users’ needs when assigning attributes (access points) that will aid findability, and must anticipate the myriad ways in which a user may attempt a search. Another aspect is determining aboutness. A designer must consider how aboutness is subject to users’ needs and perspectives; there are many shades of gray. For example, one user may define an information object’s aboutness as intellectual freedom while another may define it in terms of privacy. When compiling a validation list for the data structure, the indexer must select precoordinate or postcoordinate terms that support findability. These terms are also referred to as descriptors because they describe elements of the content being retrieved. The indexer will want to ensure that the descriptor lists achieve a good balance between aggregation and discrimination.

Controlled vocabularies support information retrieval by limiting the variables an end user faces, for example when choosing search terms. Because language is versatile and somewhat ambiguous, validation lists, thesauri, and other controlled vocabularies help users find the information objects they need more quickly. Conversely, a user performing natural language searches may have difficulty finding relevant resources because of that same ambiguity and versatility. It is important for database designers to disambiguate key terms in order to clarify and standardize usage and promote findability.
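The way a controlled vocabulary narrows variant wording down to one descriptor can be illustrated with a minimal Python sketch. It is not drawn from any particular system: the thesaurus entries and the function name `normalize_term` are hypothetical, and the only assumption is that each variant term maps to a single preferred descriptor.

```python
# Hypothetical thesaurus: each variant term points to one preferred descriptor.
THESAURUS = {
    "ir": "information retrieval",
    "info retrieval": "information retrieval",
    "shoe print": "boot print",
    "footwear impression": "boot print",
}


def normalize_term(term: str) -> str:
    """Return the preferred descriptor for a search term.

    The term is lowercased and extra whitespace is collapsed before lookup.
    Terms that are not in the thesaurus are returned in that cleaned-up form,
    which stands in for a fallback to natural language searching.
    """
    key = " ".join(term.lower().split())
    return THESAURUS.get(key, key)


if __name__ == "__main__":
    for query in ["IR", "Shoe  Print", "privacy"]:
        print(f"{query!r} -> {normalize_term(query)!r}")
```

Two users who type “IR” and “info retrieval” are led to the same descriptor, which is the standardizing effect described above; a term outside the list, such as “privacy”, passes through unchanged.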

Principles of Querying

“Understanding how a system is designed is crucial to understanding how to use and evaluate it…” (Weedman, 2008, p. 115). As such, our search strategies must match the structure of the information system. For example, when performing simple web searches using Google, Yahoo, or another search engine, users may receive relevant results from entering a single word or phrase as a search term. However, for more complicated searches that require specificity and exhaustivity, it is advantageous for users to enter exact phrases in quotation marks (e.g. “information retrieval”), or to employ other methods such as proximity searching, which requires important words to appear within a certain distance of each other in the retrieved document. Proximity searching tends to produce more exact and relevant results because it excludes documents that contain the search terms only in unrelated parts of the text.
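A proximity search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `within_proximity`, the simple word tokenizer, and the two sample sentences are all hypothetical, and real search systems define distance and word order in their own ways.

```python
import re


def within_proximity(text: str, first: str, second: str, max_distance: int) -> bool:
    """Return True if `first` and `second` occur within `max_distance`
    words of each other, in either order."""
    # Split the text into lowercase words, ignoring punctuation.
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


# Hypothetical documents that both contain the two search words.
doc_a = "Information retrieval systems depend on good indexing."
doc_b = "The retrieval of lost luggage requires accurate passenger information."

if __name__ == "__main__":
    print(within_proximity(doc_a, "information", "retrieval", 3))
    print(within_proximity(doc_b, "information", "retrieval", 3))
```

Both sample documents would match a plain keyword search for “information” and “retrieval”, but only the first keeps the two words close together, so only the first satisfies the proximity condition.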

Many searchers employ Boolean logic to achieve highly consistent and relevant results. Users choose key descriptors to represent their search and use the AND, OR, and NOT operators to combine or limit descriptors and control results. Boolean searches are widely used to query databases in order to achieve a good balance between discrimination and aggregation, and can also be used with Web search engines like Google.
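The effect of the three Boolean operators can be shown with Python sets. The records and descriptors below are hypothetical, and the inverted index is a simplified sketch rather than a description of how any particular database implements Boolean searching.

```python
from collections import defaultdict

# Hypothetical records: record ID -> descriptors assigned by an indexer.
RECORDS = {
    1: {"privacy", "libraries", "surveillance"},
    2: {"intellectual freedom", "libraries"},
    3: {"privacy", "social media"},
    4: {"intellectual freedom", "privacy", "libraries"},
}


def build_index(records):
    """Map each descriptor to the set of record IDs that carry it."""
    index = defaultdict(set)
    for record_id, descriptors in records.items():
        for descriptor in descriptors:
            index[descriptor].add(record_id)
    return index


INDEX = build_index(RECORDS)

# AND narrows the result set (set intersection).
privacy_and_libraries = INDEX["privacy"] & INDEX["libraries"]
# OR broadens the result set (set union).
privacy_or_freedom = INDEX["privacy"] | INDEX["intellectual freedom"]
# NOT excludes records from the result set (set difference).
privacy_not_social = INDEX["privacy"] - INDEX["social media"]

if __name__ == "__main__":
    print("privacy AND libraries:", sorted(privacy_and_libraries))
    print("privacy OR intellectual freedom:", sorted(privacy_or_freedom))
    print("privacy NOT social media:", sorted(privacy_not_social))
```

AND and NOT favor discrimination by shrinking the result set, while OR favors aggregation by enlarging it, which is why combining them lets a searcher balance the two.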

Principles of Evaluation

Relevance is “essential to the evaluation of information systems” (Weedman, 2008, p. 123). Because relevance is a highly subjective concept that depends on the user’s search, “concrete knowledge of both the users and the subject domain are important” during evaluation (Weedman, 2008, p. 124). For example, usability is an important aspect of both Web and database design, and considerations like site layout and maintaining current links facilitate information retrieval.

The primary purpose of evaluation is to determine whether users can retrieve needed and useful information germane to their search. A critical and standardized method of evaluation is to assess precision and recall. Recall is the number of relevant objects retrieved divided by the number of relevant objects in the collection. Precision is the number of relevant objects retrieved divided by the total number of objects retrieved. Both metrics evaluate relevance; however, recall measures whether the system retrieves all of the relevant objects, whereas precision measures whether the system retrieves only the objects relevant to a user’s search needs.
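The two formulas can be worked through in a short Python sketch. The record IDs are invented for illustration, and returning 0.0 when a set is empty is an assumption made here to avoid dividing by zero, not a standard rule.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Relevant objects retrieved divided by all objects retrieved."""
    if not retrieved:
        return 0.0  # assumption: treat an empty result set as precision 0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    """Relevant objects retrieved divided by all relevant objects in the collection."""
    if not relevant:
        return 0.0  # assumption: treat a collection with no relevant objects as recall 0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical query: the system returns five records, and four records
# in the whole collection are actually relevant to the user's need.
retrieved = {1, 2, 3, 4, 5}
relevant = {2, 4, 6, 8}

if __name__ == "__main__":
    # Records 2 and 4 are both retrieved and relevant.
    print("precision:", precision(retrieved, relevant))  # 2 / 5
    print("recall:", recall(retrieved, relevant))        # 2 / 4
```

In this example only two of the five retrieved records are relevant (precision 0.4), and only two of the four relevant records were found (recall 0.5), so the query is neither very precise nor very complete.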

Evidence

Libr 202_Assignment 1_Final Report

My first piece of evidence to demonstrate my mastery of competency E is a group assignment completed for LIBR 202: Information Retrieval in which my teammate and I created, designed, queried, and evaluated a database. We collaborated on designing a database to be used by FBI forensic detectives seeking to match boot prints from crime scenes to those registered in our database. We each identified attributes and accompanying rules for searching the database based on our user community. In our statement of purpose, we explain the needs of our chosen user community and identify potential search criteria based on the purposes of their searches. The rules for our attributes “define how the data for each attribute are entered into the database” (Ellee Wilson, LIBR 202 lecture, Spring 2010). We began preliminary testing of our database in the Alpha phase, and then traded records with our subgroup for evaluation purposes. In this scenario, we switched roles from indexers to end users.

This report includes our rules for searching the database, thesauri, data structures, and validation lists, as well as our team evaluation of the usefulness of our database to its intended user community. It demonstrates my competency in designing, testing, and evaluating a database to determine its application and purpose in serving our users.

Libr 202_Querying Database

My second piece of evidence to demonstrate my mastery of competency E is a report on which my teammates and I collaborated for LIBR 202: Information Retrieval. In this report, we discuss our individual results from querying the boot database we designed using DB/TextWorks, as described in my first piece of evidence for this competency. We also discuss difficulties with achieving the expected recall and precision for one of the queries, and determine the cause of this problem to be inconsistencies in the ‘Historical Style’ validation list. We also evaluate the need for more tightly controlled data entry in the ‘Gender’ field and explain why a validation list or controlled vocabulary is needed. Overall, we discovered that consistency was vital to achieving greater precision and recall for our users, and that standardizing all database fields was advisable based on the query results.

This paper demonstrates my competency in determining and understanding results of querying a database, and offers suggestions for improvement to aid user retrieval.

Libr 202_The Green Team Assignment 2

My third piece of evidence to demonstrate my mastery of competency E is a report on which my teammates and I collaborated for LIBR 202: Information Retrieval. In this report, we outline and describe how we created, designed, queried, and evaluated a database of 14 seed articles about information seeking behaviors. In designing our database, all records were standardized to reflect key fields of author, title, abstract, journal, etc. Our database also contained two fields for two distinct controlled vocabularies, one using precoordinate terms and one using postcoordinate terms, as well as two natural language fields for title and abstract, respectively. We created records for each by “assigning appropriate terms from each of the vocabulary fields” (Ellee Wilson, LIBR 202 lecture, Spring 2010). We also developed a user model to reflect our user community with consideration for how they might search, and wrote a users’ guide to introduce our users to the database and instruct them on the best ways to navigate it.

To prepare for the evaluation phase of this assignment, we determined a set of criteria “for evaluating the subject access the fields in our database provided” (Ellee Wilson, LIBR 202 assignment, Spring 2010). These criteria involved evaluating the completeness of the results returned from searching the database in terms of precision, recall, and fallout.

This paper demonstrates my competency in creating, designing, querying, and evaluating a database in a team environment to aid retrievability by striking a healthy balance between precision and recall for our users.

Libr 259_Digital Curation Spreadsheet

My fourth piece of evidence to demonstrate my mastery of competency E is an assignment I created for LIBR 259: Digital Preservation in which I appraised and selected a small group of ten digital objects of various file types to “preserve” and make findable and readable for future generations. As I discussed in Comp G, each digital object was assigned a “unique identifier” and cataloged in an individual spreadsheet along with its preservation metadata, both to test the object’s stability and to serve as an access point to its historical metadata. Each week, I normalized and validated the ten personal digital objects with consideration for maintaining readability, renderability, and findability for my son and his future family.

This paper demonstrates my competency in determining the aboutness of digital objects in order to curate and preserve them for future generations. Assigning preservation metadata and logging each digital object’s unique characteristics was critical to achieving my primary goal of long-term information retrieval.

Conclusion

Information retrieval is integral to providing quality service to our patrons, and is the bedrock of our role as intellectual freedom fighters who remove barriers and “promote access to all” (ALA, 2012).

References

American Library Association. (2012). Access. Retrieved October 29, 2012, from http://www.ala.org/advocacy/access

Joudrey, Daniel N., & Taylor, Arlene G. (2009). The organization of information. Westport, CT: Libraries Unlimited.

Morville, Peter. (2005). Ambient Findability. Sebastopol, CA: O’Reilly Media, Inc.

Weedman, Judith. (2008). Information Retrieval: Designing, Querying, and Evaluating Information Systems. In Haycock and Sheldon (Eds.), The Portable MLIS (112). Westport, CT: Libraries Unlimited.
