Developing and Applying a Company, Product and Business Event Ontology for Text Mining

The company, product and event (CoProE) ontology is being developed as a component of DAVID, a text mining system for business intelligence. The main design principle of the ontology, and of the text mining system as a whole, is heavy reuse of existing freely available resources. This paper introduces the ontology and the domain knowledge component that utilizes it. In addition to describing the ontology and its design principles, we consider how designing the domain knowledge model and the CoProE ontology facilitated the design of our whole business intelligence text mining system.

On the Need for Open-Source Ground Truths for Medical Information Retrieval Systems

Smart information retrieval systems are becoming increasingly prevalent as the amount of digitized raw data continues to grow. This is especially true in the medical domain, where much data is stored in unstructured formats that contain “hidden” information. By “hidden” we mean information that cannot ordinarily be found by a simple text search. To test the information retrieval systems that handle such data, a ground truth, or gold standard, is normally required in order to measure performance against an information need. In this paper we emphasize the lack of freely available, annotated medical data and encourage the community of developers working in this area to make available whatever data they can. We also stress the importance of such annotated medical data, especially its potential impact on teaching and training in medicine, and point out some of the advantages that access to a freely available pool of annotated medical objects would bring to several areas of medicine and informatics. The paper then discusses some of the considerations for any future system that would make creating, sharing, and annotating such data easy (for example, through an online, web-based interface). Finally, the paper discusses in detail the benefits of such a system for teaching and examining medical students.

Framework for Analyzing and Clustering Short Message Database of Ideas

We introduce a framework for Note, a new idea tool that gathers, fosters and manages innovative ideas. Note supports the development of organizational memory and is connected to the practices of organizational innovativeness. The tool uses text mining methods for idea processing, management and visualization, which makes it a new approach in idea management software. The tool is under development.

Text Mining for Indication of Changes in Long-Term Market Trends

The development of market trends is very important for investment decisions. In this contribution we present our results concerning the influence of news on market trends. We processed the stock news delivered by the Wall Street Journal with two text mining methods: Bayes classification and grammar-driven classification. We found some potential for predicting Dow Jones trends and present promising results.
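
As an illustration of the first of the two methods, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. It is not the authors' implementation: the function names, the "up"/"down" labels and the toy headlines are all hypothetical, invented only to show the technique.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy headlines; not data from the paper.
TOY_NEWS = [
    ("profits rise sharply", "up"),
    ("strong earnings growth", "up"),
    ("profits fall sharply", "down"),
    ("weak earnings losses", "down"),
]
model = train(TOY_NEWS)
```

Calling classify(model, "earnings rise") on this toy model selects the label whose training headlines share the most vocabulary with the input. A realistic setup would need far more data and a proper evaluation split.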

GCC: An Environment for Knowledge Management in Scientific Research and Higher Education Centres

Scientific research centres and universities are knowledge-intensive institutions where knowledge creation and distribution are constant, and this knowledge should be managed. In spite of this, scientific work has been known as solitary work, in which human interaction takes place only in small groups within a research domain. Currently, due to technology improvements, scientific data from different sources is available, communication between researchers is facilitated and scientific information creation and exchange are faster than in the past. However, the focus on information exchange is too limited to create systems that enable true cooperation and knowledge management in scientific environments. To facilitate more expressive exchange, sharing and dissemination of knowledge and its management, we have created a scientific knowledge management environment in which researchers may share their data, experiences, ideas, process definition and execution, and obtain all the necessary information to perform their tasks, make decisions, learn and disseminate knowledge.

Text Mining Supported Terminology Construction

In this paper we investigate the contribution of text mining techniques to a methodology for constructing terminologies from natural language corpora. The application area of our experiment is accidentology. In this context, the results of text mining techniques are used to guide the construction of a terminology of road accidents from a collection of accident reports. A model of our field, an ontology of accidentology, is used to carry out the text mining process. The Terminae methodology and the tool supporting it offer the general frame for the resource construction. We then present the text mining techniques we employed and the integration of their results into the different phases of the construction process. Suggestions for further research to improve our techniques are also presented.

Discourse Visualization Strategies for a Comprehensive Medial Analysis of Cultural Science Communities

Knowledge creation in the cultural sciences is very often of a discursive nature. The individuals participating in these discussions can be regarded as part of a community. In our collaborative research center on “Media and Cultural Communication” we are analyzing the impact of “Networked Multimedia Information Systems in Cultural Science Communities” on the organization and creation of knowledge. In this context, we are researching discourse visualization strategies for a comprehensive medial analysis of cultural science communities. The subject of the present investigation is the discourses in about 40 discussion forums, comprising more than 25,000 emails from more than 2,500 individuals collected so far. Research aspects focus on the detection of emergent phenomena by the concatenation of discussion maps originating in the area of data mining, and by classifying metadata for the analyzed documents. Interesting phenomena in cultural science discourses are the sudden emergence and disappearance of terms (burstiness) in a particular forum as well as their spread across various forums. The correlation of these burstiness phenomena with individuals is also an indicator of the roles and hierarchies of individuals within debates. Another phenomenon is the drifting of terms within a community or between communities, which can often be attributed to members who are active in several communities and “infect” them by spreading new terms. Fraunhofer FIT’s metadata-based tool SWAP-it is designed to visually support interactive text analysis on the web. It is based on web services and has been equipped with a community-specific terminology, allowing a comprehensive study of the previously described phenomena, also by self-supervision.

Automatic Discovery and Aggregation of Compound Names for the Use in Knowledge Representations

Automatic acquisition of information structures like Topic Maps or semantic networks from large document collections is an important issue in knowledge management. An inherent problem with automatic approaches is the treatment of multiword terms as single semantic entities. Taking company names as an example, we present a method for learning multiword terms from large text corpora by exploiting their internal structure. By iterating a search step and a verification step, the single words that typically form company names are learnt. These name elements are then used to recognize compounds for further processing. We report an evaluation of experiments on company name extraction and discuss some applications.
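
The iteration of a search step and a verification step can be sketched as follows. This is only one possible reading of the abstract, not the authors' algorithm: the adjacency-based search, the ratio threshold min_ratio, the seed word and the toy sentences (with invented company names) are all assumptions made for illustration.

```python
from collections import Counter


def learn_name_elements(sentences, seeds, min_ratio=0.6, max_rounds=5):
    """Grow a set of name elements from seed words (illustrative sketch)."""
    known = set(seeds)
    tokenized = [s.split() for s in sentences]
    total = Counter(tok for toks in tokenized for tok in toks)
    for _ in range(max_rounds):
        # Search step: count unknown capitalized words that occur
        # directly next to an already known name element.
        near = Counter()
        for toks in tokenized:
            for i, tok in enumerate(toks):
                if tok in known or not tok[:1].isupper():
                    continue
                neighbours = toks[max(i - 1, 0):i] + toks[i + 1:i + 2]
                if any(n in known for n in neighbours):
                    near[tok] += 1
        # Verification step (assumed criterion): keep a candidate only if
        # most of its occurrences are adjacent to known name elements.
        accepted = {w for w, c in near.items() if c / total[w] >= min_ratio}
        if not accepted:
            break
        known |= accepted
    return known


def extract_names(sentence, known):
    """Return maximal runs of two or more known elements as compound names."""
    names, run = [], []
    for tok in sentence.split() + [""]:  # "" flushes a run at sentence end
        if tok in known:
            run.append(tok)
        else:
            if len(run) > 1:
                names.append(" ".join(run))
            run = []
    return names


# Hypothetical toy corpus; the company names are invented.
CORPUS = [
    "Yesterday Altona Holding AG reported results",
    "Analysts expect Altona Holding to grow",
    "Shares of Nordwind Holding AG fell",
    "Yesterday analysts were cautious",
    "Yesterday markets were quiet",
]
elements = learn_name_elements(CORPUS, seeds={"AG"})
```

Starting from the single seed "AG", the loop first picks up "Holding" and then the words adjacent to it, while a frequent sentence opener such as "Yesterday" fails the verification ratio. extract_names then joins adjacent learnt elements into compound names.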

Topic Map Generation Using Text Mining

Starting from text corpus analysis with linguistic and statistical analysis algorithms, an infrastructure for text mining is described which uses collocation analysis as a central tool. This text mining method may be applied to different domains as well as languages. Some examples taken from large reference databases motivate the applicability to knowledge management using declarative standards of information structuring and description. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources and it is shown how text mining can be used for automatic topic map generation.
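
To make the idea of collocation analysis concrete, the following minimal sketch scores word pairs that co-occur within the same sentence. The abstract does not say which significance measure is used, so pointwise mutual information (PMI) serves here as a stand-in. The function name, the min_count parameter and the toy sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_collocations(sentences, min_count=2):
    """Score word pairs co-occurring in a sentence with PMI (a stand-in)."""
    n = len(sentences)
    word_freq = Counter()  # number of sentences containing each word
    pair_freq = Counter()  # number of sentences containing both words
    for s in sentences:
        words = sorted(set(s.lower().split()))
        word_freq.update(words)
        # Sorting gives each unordered pair one canonical (a, b) key.
        pair_freq.update(combinations(words, 2))
    scores = {}
    for (a, b), n_ab in pair_freq.items():
        if n_ab < min_count:
            continue  # pairs seen too rarely are not scored
        scores[(a, b)] = math.log2(n_ab * n / (word_freq[a] * word_freq[b]))
    return scores


# Hypothetical toy sentences, for illustration only.
SENTENCES = [
    "topic maps describe resources",
    "topic maps link topics",
    "text mining finds patterns",
    "text mining uses statistics",
]
scores = sentence_collocations(SENTENCES)
```

In a topic map setting, high-scoring pairs could serve as candidate associations between topics. That mapping is an assumption here and is not spelled out in the abstract.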