The similarity relation among a number of texts is important not only for congress organizers (who need to group the proposed contributions into meaningful sessions) but for everybody who wants to find certain information within a larger number of texts. Existing information retrieval methods compare texts according to their similarity. Because these methods mostly remain on the surface of the words, the resemblance is not primarily a semantic one, but a stylistic and vocabulary-dependent one. Based on psychological considerations we have developed an algorithm called Hofmethode, which compares the semantic ‘environment’ of key words. Using the example of the SGP congress, we show in this paper how the Hofmethode can help both congress organizers and participants find the appropriate contributions.
Tag Archives: information retrieval
On the Need for Open-Source Ground Truths for Medical Information Retrieval Systems
Smart information retrieval systems are becoming increasingly prevalent due to the rate at which the amount of digitized raw data has increased, and continues to increase. This is especially true in the medical domain, where much data is stored in unstructured formats that contain “hidden” information. By hidden, we mean information that cannot ordinarily be found by performing a simple text search. To test the information retrieval systems that handle such data, a ground truth, or gold standard, is normally required in order to obtain performance values with respect to an information need. In this paper we emphasize the lack of freely available, annotated medical data and wish to encourage the community of developers working in this area to make available whatever data they can. We also highlight the importance of such annotated medical data, especially its potential impact on teaching and training in medicine. In addition, the paper points out some of the advantages that access to a freely available pool of annotated medical objects would provide to several areas of medicine and informatics. The paper then discusses some of the considerations that would have to be made for any future system that makes creating, sharing, and annotating such data easy to perform (by using an online, web-based interface, for example). Finally, the paper discusses in detail the benefits of such a system for teaching and examining medical students.
An Exploratory Study on the Explicitness of User Intentions in Digital Photo Retrieval
Search queries are typically interpreted as a specification of a user’s information need. The search query is either interpreted as is or based on the context of the user, for instance a user profile, his/her previously undertaken searches, or any other background information. The actual intent of the user – the goal s/he wants to achieve with information retrieval – is an important part of a user’s context. In this paper we present the results of an exploratory study on the interplay between the goals of users and their search behavior in multimedia retrieval.
Information Retrieval Services for Heterogeneous Information Spaces
Many enterprises lose work time because they lack global search solutions or because their solutions cannot satisfy users’ needs in a reasonable time. This results in costs for lost work time as well as increased response times. We present a novel approach to federated search engines that uses case-based reasoning to rerank results according to the searcher’s needs and therefore leads to higher-quality search results and faster information retrieval.
Analyzing Organizational Information Gaps
In this paper, we analyze the relation between private and public information spaces in organizations and its implications for organizational knowledge management. By private information spaces, we mean all (electronic) information that is accessible only to a single person in an organization (e.g., local files or personal e-mails). The organizational information space, in turn, consists of all electronic information that can be accessed by all or most members of an organization. Based on this distinction, we develop a notion of information gaps between the organizational and the individual worker’s information space. We derive four basic situations and discuss the implications for organizational knowledge management in each one. We support our claims by describing results from initial evaluation studies.
PrestoSpace Publication Platform: A System for Searching and Retrieving Enriched Audiovisual Materials
We present the Publication Platform, a component of the PrestoSpace project, which provides retrieval and browsing functionalities for enriched audio-visual material. The PrestoSpace Factory is a system for enriching audio-visual documents in order to provide automated content and semantic analysis. The Publication Platform provides a user interface for semantic queries and produces a Web page with the results of the AV analysis and additional information about related external documents.
A Similarity Approach on Searching for Digital Rights
We present an innovative approach that treats rights management metadata as metric objects, enabling similarity search on IPR attributes between digital items. We show how content-based similarity search can help both users, who must deal with a huge number of similar items with different licenses, and content providers, who want to detect fake copies or illegal uses. Our aim is the management of metadata related to Digital Rights in centralized systems or networks with indexing capabilities for both text and similarity searches, providing the basic infrastructure that enables private use as well as commercial exploitation.
Service Oriented Information Supply Model for Knowledge Workers
This paper suggests a powerful yet so far unused way to assist knowledge workers: while they are working on a problem, a system in the background continuously checks whether similar or helpful material has already been published elsewhere. The technique described aims to reduce the effort and time required to search for relevant data on the World Wide Web by moving from a “pull” paradigm, where the user has to become active, to a “push” paradigm, where the user is notified if something relevant is found. We claim that the approach facilitates work by providing context-aware passive web search, result analysis, extraction, and organization of information according to the tasks at hand. Instead of conventional information retrieval enhancements we suggest a model where relevant information automatically moves into the attention field of the knowledge worker.
The Suffix Tree Document Model Revisited
In text-based information retrieval, which is the predominant retrieval task at present, several document models have been proposed, such as boolean, probabilistic, or (extended) vector models [Baeza-Yates and Ribeiro-Neto 1999]. Interestingly, the suffix tree document model is usually not discussed in the literature on the subject, though it has a property that sets it apart from the other models: it encodes information about word order. The suffix tree document model owes much of its popularity to the Vivísimo search engine, which operationalizes on-the-fly categorization of Internet search results. While the classical document models can be considered as vectors of words, the suffix tree document model as well as the related similarity measures are graph-based. Both types of document models provide an efficient means to compute document similarities, and, according to various publications, both types work well in practice. However, there is no comparison between the two paradigms that explains the concepts of one in terms of the other, or that contrasts their advantages and disadvantages with respect to certain retrieval tasks. In this paper we start to close this gap by shedding light on the following questions: (1) How does similarity computation work in the suffix tree document model? (2) Based on the insights of Question 1, is it possible to combine concepts of both document model types within classification or categorization tasks? (3) Which of the document model types is more powerful with respect to unsupervised document classification?
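The difference between an order-blind word model and an order-aware one can be illustrated with a small Python sketch. It does not build a suffix tree and does not reproduce the paper's similarity measures. It relies only on the fact that a word-level suffix tree represents every contiguous word sequence of a document as a path from the root, and it enumerates those sequences directly up to a hypothetical length limit `max_len`. The function names and the use of Jaccard overlap are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def phrases(text, max_len=3):
    """All contiguous word sequences of up to max_len words.

    Stand-in for the paths of a word-level suffix tree (assumption:
    sequences are capped at max_len to keep the toy example small).
    """
    words = text.lower().split()
    return {
        tuple(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, min(i + max_len, len(words)) + 1)
    }


def bag_similarity(d1, d2):
    """Order-blind similarity: overlap of the word sets."""
    return jaccard(set(d1.lower().split()), set(d2.lower().split()))


def phrase_similarity(d1, d2):
    """Order-aware similarity: overlap of the shared word sequences."""
    return jaccard(phrases(d1), phrases(d2))


d1 = "dog bites man"
d2 = "man bites dog"
bag = bag_similarity(d1, d2)        # identical word sets
phrase = phrase_similarity(d1, d2)  # only the single words are shared
```

In this toy case the two sentences are indistinguishable as bags of words, while the phrase-based score drops because no multi-word sequence is shared.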
Fuzzy-Fingerprints for Text-Based Information Retrieval
This paper introduces a particular form of fuzzy-fingerprints, covering their construction, their interpretation, and their use in the field of information retrieval. Though the concept of fingerprinting in general is not new, the way of using fingerprints within a similarity search as described here is: instead of computing the similarity between two fingerprints in order to assess the similarity between the associated objects, the mere event of a fingerprint collision is used for the similarity assessment. The main impact of this approach is the small number of comparisons necessary to conduct a similarity search.
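The idea of treating a fingerprint collision itself as the similarity signal can be sketched in Python as follows. The fingerprint used here is a made-up toy (coarsely quantized shares of words by first-letter group) and is not the construction from the paper; `PREFIX_GROUPS`, the thresholds, and all function names are assumptions. The point is only that a query needs a single index lookup rather than a pairwise comparison with every document.

```python
from collections import defaultdict

# Hypothetical partition of the alphabet into three first-letter groups.
PREFIX_GROUPS = ("abcdefghi", "jklmnopqr", "stuvwxyz")


def fingerprint(text):
    """Toy fuzzy fingerprint: one coarse level (0, 1, 2) per letter group."""
    words = [w for w in text.lower().split() if w[0].isalpha()]
    if not words:
        return (0, 0, 0)
    levels = []
    for group in PREFIX_GROUPS:
        share = sum(w[0] in group for w in words) / len(words)
        # Quantize the share so that small differences still collide.
        levels.append(0 if share < 0.25 else 1 if share < 0.5 else 2)
    return tuple(levels)


def build_index(docs):
    """Map each fingerprint to the ids of the documents that produce it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        index[fingerprint(text)].append(doc_id)
    return index


def similar(index, query):
    """Documents whose fingerprint collides with the query's fingerprint."""
    return index.get(fingerprint(query), [])


docs = {
    "a": "the cat sat on the mat",
    "b": "the mat sat on the cat",
    "c": "a big bad bear ate apples",
}
index = build_index(docs)
matches = similar(index, "on the mat the cat sat")
```

A fingerprint this coarse will produce many false collisions on real text; a practical scheme would need a far more discriminating fingerprint.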
Personalized Information Retrieval in Bibster, a Semantics-Based Bibliographic Peer-to-Peer System
Bibster is a semantics-based Peer-to-Peer system for exchanging bibliographic data among researchers. Bibster exploits ontologies in data storage, query formulation, query routing, and answer presentation. While the original Bibster system assumed a globally shared domain ontology, we here describe extensions to the Bibster system that allow personalized ontologies to be learned from the local bibliographic metadata. These personal ontologies can be used not only for subsequently classifying the bibliographic metadata, but also for supporting an improved query refinement process.
Automated Retrieval of Information in the Internet by Using Thesauri and Gazetteers as Knowledge Sources
There is an immense number of information resources on the Internet that can be utilized free of charge, and many knowledge workers therefore try to make use of this information in their daily tasks. Nevertheless, it is very hard to find the relevant information on the Internet by using the full-text retrieval techniques offered by most existing search engines. This paper demonstrates that thesauri, which have been used in established online retrieval systems for a long time, also open up new methods for the automated search for information on the Internet. In addition, thesaurus-like structures known as gazetteers allow geographical references of information resources to be handled in a very effective way. The knowledge represented in thesauri and gazetteers can be used to process a variety of thematic and geographical queries and to retrieve the information of interest from the Internet. Comfortable ways of specifying queries can be offered to the users, e.g., by navigating in a hierarchical tree of descriptors, by using synonymous, related, or foreign-language terms rather than fixed elements of a controlled vocabulary, or by indicating a geographical region of interest on a cartographic map.
In addition to the general principles, examples of powerful query processors and advanced user interfaces are presented which demonstrate the effective usage of the knowledge stored in thesauri and gazetteers. The implemented solutions turn out to be considerably more comfortable than the “black box search” offered by most existing library catalogs and Internet search engines.
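How a thesaurus lets users enter synonymous or foreign-language terms instead of controlled-vocabulary descriptors can be sketched in Python as follows. The thesaurus entries, the field names `synonyms` and `narrower`, and both functions are hypothetical; the sketch does not describe the query processors presented in the paper.

```python
# Hypothetical toy thesaurus: descriptor -> synonyms and narrower terms.
THESAURUS = {
    "water pollution": {
        "synonyms": {"water contamination", "gewässerverschmutzung"},
        "narrower": {"groundwater pollution"},
    },
}


def to_descriptors(term):
    """Map a free-text term to the controlled descriptors it stands for."""
    term = term.lower()
    return [
        descriptor
        for descriptor, entry in THESAURUS.items()
        if term == descriptor or term in entry["synonyms"]
    ]


def expand(term):
    """Descriptors for the term, each followed by its narrower terms."""
    result = []
    for descriptor in to_descriptors(term):
        result.append(descriptor)
        result.extend(sorted(THESAURUS[descriptor]["narrower"]))
    return result


search_terms = expand("Water Contamination")
```

The expanded list could then be passed to a full-text search engine in place of the single term the user typed.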
The Role of Interaction Histories in Mental Model Building and Knowledge Sharing in the Legal Domain
This paper reports on a study examining attorneys’ and law librarians’ use of their memory and information they record externally in searching for, using, and sharing legal information. The paper suggests automatically and manually recording search histories and basing user interface tools on this information to support mental model building and knowledge sharing in the legal information domain. The research described is part of the author’s dissertation research [1] that examined the use of search histories in legal information seeking and use, and proposed interface design recommendations for information systems. While searching for and using information, attorneys learn about legal topics and use this knowledge in their work. They create mental models and share their new knowledge with colleagues. Computers can automatically record human-computer interaction events. This information can help searchers represent and share new knowledge. The recorded information can be provided back to the user through the user interface to support searching for and using information, learning about the subject matter and sharing this knowledge with others. In this study, attorneys and law librarians were interviewed and observed to assess their use of their memory and external memory aids while searching for and using legal information. The results reported here focus on the role of interaction histories and history-based interface tools in supporting mental model development of legal information seekers of a topical area and sharing this information with other users.
Instance Cooperative Memory to Improve Query Expansion in Information Retrieval Systems
The main goal of this research is to improve Information Retrieval Systems by enabling them to generate search outcomes that are relevant and customized to each specific user. Our proposal advocates the use of Instance Based Reasoning during the information retrieval process. When conducting a search, the system retrieves a previous similar search experience and traces back previous human reasoning and behavior and then replicates it in the current situation. Thus, user information retrieval experiences or instances are saved to be reused in future similar cases. The resulting cooperative memory is used for user query expansion.
In order to improve the information retrieval experience, we propose to conceptualize and model both the user profile and the information retrieval process. This leads us to define similarity functions between user profiles and between information retrieval situations. The reuse of past experiences serves to enrich the initial user query with words from documents found in similar cases. Unlike in the classical Rocchio method, these documents are those already judged as valid by users with similar profiles in similar search situations. The value this method brings to the user is an increasing relevance of the search outcomes together with reduced user interaction with the system.
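The reuse of past search instances for query expansion can be sketched in Python as follows. The abstract does not specify the similarity functions or the case structure, so everything here is an assumption: profiles and queries are modelled as sets of terms, similarity is an equally weighted Jaccard overlap, and the names `expand_query`, `cases`, and `extra_terms` are hypothetical. It is not the COSYDOR implementation.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap of two term collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_query(query, profile, cases, extra_terms=2):
    """Add frequent terms from documents validated in the most similar case.

    Each case is a dict with 'profile', 'query', and 'validated_docs'
    (assumed structure for this sketch).
    """
    if not cases:
        return list(query)

    def score(case):
        # Assumption: profile and query similarity count equally.
        return (0.5 * jaccard(profile, case["profile"])
                + 0.5 * jaccard(query, case["query"]))

    best = max(cases, key=score)
    counts = Counter(
        word
        for doc in best["validated_docs"]
        for word in doc.lower().split()
        if word not in query
    )
    return list(query) + [word for word, _ in counts.most_common(extra_terms)]


cases = [
    {
        "profile": {"medicine", "radiology"},
        "query": {"lung", "scan"},
        "validated_docs": ["lung scan tumor detection", "tumor imaging scan tumor"],
    },
    {
        "profile": {"law"},
        "query": {"patent", "search"},
        "validated_docs": ["patent claims search"],
    },
]
expanded = expand_query(["lung", "imaging"], {"medicine"}, cases)
```

Because the added terms come only from documents that earlier users accepted, the expansion needs no relevance feedback from the current user, which is the contrast with Rocchio drawn above.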
This method has been implemented in the COSYDOR (Cooperative System for Document Retrieval) prototype based on Intermedia (Oracle 8i). Tests and evaluations have been performed on the COSYDOR prototype using the test corpus of TREC (Text Retrieval Conference) and its standard procedures for performance analysis and benchmarking. The results of these analyses show a significant improvement of performance in the first search iterations compared to the Intermedia benchmark.