Workshop on

Integrating Data Mining and Knowledge Management

November 29, 2001

Organized in conjunction with

ICDM'01: The 2001 IEEE International Conference on Data Mining

Doubletree Hotel, San Jose, California, USA

November 29 - December 2, 2001

Workshop Home Page: http://cui.unige.ch/~hilario/icdm-01/cfp.html

Conference Home Page: http://kais.mines.edu/~xwu/icdm/icdm-01.html

Workshop Organizers:

Franz J. Kurfess
California Polytechnic State University
Department of Computer Science
San Luis Obispo, CA 93407
Tel: (805) 756 7179
Fax: (805) 756 2956
Email: fkurfess@csc.calpoly.edu

Melanie Hilario
CUI - University of Geneva
24 rue General-Dufour
CH-1211 Geneva 4, Switzerland
Tel: +41 22/705 7791
Fax: +41 22/705 7780
Email: hilario@cui.unige.ch

Contributions

System Frameworks

K. L. Ong, W. K. Ng, E. P. Lim.
A Web Mining Platform for Enhancing Knowledge Management on the Web

S. K. Gupta, V. Bhatnagar, S. K. Wasari.
A Proposal for a Data Mining Management System

A. Maedche, R. Volz.
The Ontology Extraction and Maintenance Framework Text-To-Onto

Case-Based Knowledge Management, Case Mining, and Protocol Analysis

T. Li, S. Zhu, M. Ogihara.
Mining Patterns from Case Base Analysis

W. Jing, Y. Lu, S. J. De.
Case-Based Knowledge Management and Case Mining in Optimization of GSM Network

P. C. Matthews, S. Ahmed, M. Aurisicchio.
Extracting Experience through Protocol Analysis

Mining of Association Rules and Structured Knowledge

A. Veloso, B. Possas, W. Meira Jr., M. B. de Carvalho.
Knowledge Management in Association Rule Mining

T. Semenova, M. Hegland, W. Graco, G. Williams.
Effectiveness of Mining Association Rules for Identifying Trends on Large Health Databases

J. J. Denimal, F. Boussu.
Improvement of Knowledge Management Based on the Use of Data Mining Techniques for the Textile and Garment Industry

F. Wu, G. Gardarin.
A Genetic Algorithm for Retrieving Sequence Strategies

Foreword

Knowledge management and data mining developed independently of each other, and their complementarity has not yet been fully recognized, much less exploited. Born of the understanding that knowledge is one of the most important assets of an organization, if not the most important, both knowledge management (KM) and data mining (DM) grew and flourished at the confluence of information technologies (machine learning, knowledge-based systems, databases), statistics and data analysis, and the business and management sciences.

KM and DM embody distinct perspectives on different knowledge-related issues. One is knowledge capture in the broadest sense of the term. The KM community has traditionally focused on knowledge acquisition from humans, either directly (e.g., via manual ontology construction, expert interviews, authoring tools) or indirectly, as when human know-how is mimicked by a program that observes a human expert in action (e.g., learning apprentices, programming by demonstration). In data mining, databases and data warehouses are the ultimate sources from which knowledge is extracted. Machine learning emerged precisely as a way of alleviating difficulties raised by knowledge elicitation from humans. In addition, the exponential growth of process-generated data has spurred the development of more scalable--hence more thoroughly automated--ways of generating useful knowledge from data. A second issue is knowledge refinement and revision. The typical KM approach is again manual: the knowledge engineer readjusts domain ontologies, rewrites rules in collaboration with domain experts, etc. While DM research has yielded a promising harvest of automated techniques for knowledge revision and theory refinement, these have been demonstrated on highly circumscribed domain theories and have yet to be validated on medium- and large-scale applications. What is needed, then, is an integrative approach that exploits synergies between knowledge management and data mining in order to monitor and manage the full lifecycle of knowledge--its capture and discovery, representation, storage, retrieval, revision or refinement, and reuse--in an organization or community of practice.

Integrating data mining and knowledge management is not a simple matter of juxtaposing KM and DM techniques borrowed from two independent toolboxes. It implies resolving many still open issues at the junction of the two fields, such as:

the representational mismatch: In knowledge management systems, priority is placed on the readability and usability of knowledge by humans. Many intuitive representational structures such as domain ontologies cannot be directly fed into data mining techniques. Conversely, many mining tools such as neural networks are black boxes whose results and internal knowledge representation schemes can be very difficult to interpret and verify, or to integrate into an organization's workflow and decision-making processes.
the use of domain- and problem-specific knowledge to prime and guide the knowledge discovery process: The better-known approaches to this problem are Bayesian methods, inductive logic programming, and knowledge-based neurocomputing. Techniques in this area are mostly in the experimental stage, and can benefit greatly from a combination of data- and knowledge-oriented perspectives.
knowledge deployment and reuse: In general, data mining tasks yield highly context-dependent patterns which can only be deployed on a one-shot basis. Mined patterns are rapidly outdated, especially in dynamically evolving environments. Should the same mining task arise again later, the learning process is reinitiated from scratch to mitigate population or context drift.
knowledge assimilation: This is the process whereby new knowledge is incorporated into the long-term knowledge store. A particularly important issue is the assimilation of models and patterns extracted via data mining.

The papers presented in this workshop address many interesting aspects of the above issues. Three papers propose architectures and frameworks for KM-DM integration. Gupta et al. describe how knowledge about the knowledge discovery process can be brought to bear in managing the data mining process itself, while Ong et al. propose a web mining platform for organizing and exploiting the vast knowledge stores available via the Web. In Maedche et al.'s Text-To-Onto framework for ontology extraction and maintenance, domain-specific knowledge is distilled into a formal conceptualization (an ontology) which can be used to guide knowledge discovery in both structured data and text; at the same time, this ontology can itself be extracted using data mining techniques. In the majority of papers, representational mismatch is avoided through the use of symbolic knowledge structures; this explains the predominance of case-based (Jing et al., Li et al.), protocol-based (Matthews et al.), and rule-based (Semenova et al., Veloso et al.) systems, both for managing and capturing knowledge. Interestingly, two papers illustrate how non-symbolic data mining techniques can lead to human-comprehensible results that can be meaningfully interpreted and exploited by decision makers in industry and finance. In Wu and Gardarin's paper, genetic algorithms are used to detect sequence strategies and express them in the form of highly readable rules such as "if the relative strength index of a stock is less than 30%, then sell it". Denimal and Boussu's approach is based on a combination of purely statistical techniques such as correspondence analysis and hierarchical classification, but careful interpretation and visualization of results led to an understanding of sales and fashion patterns that impacted decisions concerning which clothes to produce and market.

Finally, in many papers, the integrative thrust came from the need to solve both KM and DM problems in the context of concrete real-world tasks that span domains as varied as engineering design, health care, finance, and the textile and garment industry. Given the breadth and complexity of the applications and issues involved, this first workshop on integrating data mining and knowledge management will certainly not be the last.
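
As an illustration of what a highly readable rule can look like once it is made executable, the rule quoted above from Wu and Gardarin's paper ("if the relative strength index of a stock is less than 30%, then sell it") can be sketched as a small data structure. This is only a minimal sketch under stated assumptions, not the authors' method: the names ThresholdRule and relative_strength_index are hypothetical, the index is computed with a simple-average formula over the whole price window (one common variant; the paper may define it differently), and no genetic search is shown, only the form of the rule such a search might output.

```python
from dataclasses import dataclass
from typing import Sequence


def relative_strength_index(prices: Sequence[float]) -> float:
    """Simple-average RSI in percent over the whole window (an assumed variant).

    RSI = 100 * gains / (gains + losses), which equals the usual
    100 - 100 / (1 + RS) with RS = total gains / total losses.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    total = gains + losses
    if total == 0:
        return 50.0  # flat prices: treated as neutral (an assumption)
    return 100.0 * gains / total


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical symbolic form of a mined rule: 'if RSI < threshold then action'."""
    threshold: float
    action: str

    def decide(self, prices: Sequence[float]) -> str:
        # Fire the rule's action when the condition holds, otherwise do nothing.
        rsi = relative_strength_index(prices)
        return self.action if rsi < self.threshold else "hold"

    def describe(self) -> str:
        # Render the rule in the human-readable form quoted in the foreword.
        return (f"if the relative strength index of a stock is less than "
                f"{self.threshold:g}%, then {self.action} it")


rule = ThresholdRule(threshold=30.0, action="sell")
print(rule.describe())
print(rule.decide([10, 9, 8, 9, 7]))      # mostly falling prices
print(rule.decide([10, 11, 12, 11, 13]))  # mostly rising prices
```

Keeping the rule as explicit data (a threshold and an action) rather than as opaque model weights is what lets the same object be both executed and read back as a sentence, which is the kind of interpretability the foreword highlights.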