Workshop on
Integrating Data Mining and Knowledge Management
November 29, 2001
Organized in conjunction with
ICDM'01: The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001
Workshop Organizers:
Franz J. Kurfess
California Polytechnic State University
Department of Computer Science
San Luis Obispo, CA 93407
Tel: (805) 756 7179
Fax: (805) 756 2956
Email: fkurfess@csc.calpoly.edu
Melanie Hilario
CUI - University of Geneva
24 rue General-Dufour
CH-1211 Geneva 4, Switzerland
Tel: +41 22/705 7791
Fax: +41 22/705 7780
Email: hilario@cui.unige.ch
Contributions
System Frameworks
K. L. Ong, W. K. Ng, E. P. Lim.
A Web Mining Platform for Enhancing Knowledge Management
on the Web
S. K. Gupta, V. Bhatnagar, S. K. Wasari.
A Proposal for a Data Mining Management
System
A. Maedche, R. Volz.
The Ontology Extraction and Maintenance Framework
Text-To-Onto
Case-Based Knowledge Management, Case Mining, and Protocol Analysis
T. Li, S. Zhu, M. Ogihara.
Mining Patterns from Case Base Analysis
W. Jing, Y. Lu, S. J. De.
Case-Based Knowledge Management and Case Mining
in Optimization of GSM Network
P. C. Matthews, S. Ahmed, M. Aurisicchio.
Extracting Experience through Protocol Analysis
Mining of Association Rules and Structured Knowledge
A. Veloso, B. Possas, W. Meira Jr., M. B. de Carvalho.
Knowledge Management in Association Rule Mining
T. Semenova, M. Hegland, W. Graco, G. Williams.
Effectiveness of Mining Association Rules for
Identifying Trends on Large Health Databases
J. J. Denimal, F. Boussu.
Improvement of Knowledge Management Based on
the Use of Data Mining Techniques for the Textile and Garment Industry
F. Wu, G. Gardarin.
A Genetic Algorithm for Retrieving
Sequence Strategies
Foreword
Knowledge management and data mining developed independently of
each other, and their complementarity has not yet been fully recognized,
much less exploited. Born of the understanding that knowledge is one of the
most important assets of an organization, if not the most important, both knowledge management
(KM) and data mining (DM) grew and flourished at the confluence
of information technologies (machine learning, knowledge-based systems,
databases), statistics and data analysis, and the business and management
sciences.
KM and DM embody distinct perspectives on different
knowledge-related issues. One is knowledge capture in the broadest sense
of the term. The KM community has traditionally focused on knowledge
acquisition from humans, either directly (e.g., via manual ontology
construction, expert interviews, authoring tools) or indirectly,
as when human know-how is mimicked by a program that observes a human expert
in action (e.g., learning apprentices, programming by demonstration).
In data mining, databases and data warehouses are the ultimate sources
from which knowledge is extracted. Machine learning emerged precisely as
a way of alleviating difficulties raised by knowledge elicitation
from humans. In addition, the exponential growth of process-generated
data has spurred the development of more scalable--hence more thoroughly
automated--ways of generating useful knowledge from data. A second issue
is knowledge refinement and revision. The typical KM approach is
again manual: the knowledge engineer readjusts domain ontologies,
rewrites rules in collaboration with domain experts, etc.
While DM research has yielded a promising harvest of automated techniques
for knowledge revision and theory refinement, these have been demonstrated
on highly circumscribed domain theories and have yet to be validated on
medium and large-scale applications. What is needed, then, is an
integrative approach that exploits synergies between knowledge management
and data mining in order to monitor and manage the full lifecycle
of knowledge--its capture and discovery, representation, storage, retrieval,
revision or refinement, and reuse--in an organization or community of practice.
Integrating data mining and knowledge management is not a simple matter
of juxtaposing KM and DM techniques borrowed from two independent
toolboxes. It implies resolving many still open issues at the junction
of the two fields, such as:
- the representational mismatch: In knowledge management systems, priority
is placed on the readability and usability of knowledge by humans. Many
intuitive representational structures such as domain ontologies cannot
be directly fed into data mining techniques; conversely, many mining
tools such as neural networks are black boxes whose results and internal
knowledge representation schemes can be very difficult to interpret and
verify, or to integrate into an organization's workflow and decision-making
processes.
- the use of domain- and problem-specific knowledge to prime and guide the
knowledge discovery process: The better known approaches to this problem
are Bayesian methods, inductive logic programming, and knowledge-based
neurocomputing. Techniques in this area are mostly in the experimental
stage, and can benefit greatly from a combination of data- and
knowledge-oriented perspectives.
- knowledge deployment and reuse: In general, data mining tasks yield highly
context-dependent patterns which can only be deployed on a one-shot basis.
Mined patterns are rapidly outdated, especially in dynamically evolving
environments. Should the same mining task arise again later, the learning
process is reinitiated from scratch to mitigate population or context drift.
- knowledge assimilation: This is the process whereby new knowledge is incorporated
into the long-term knowledge store. A particularly important issue is the
assimilation of models and patterns extracted via data mining.
The papers presented in this workshop address many interesting aspects
of the above issues. Three papers propose architectures and frameworks
for KM-DM integration. Gupta et al.
describe how knowledge about the knowledge
discovery process can be brought to bear in managing the data mining process
itself, while Ong et al. propose
a web mining platform for organizing and
exploiting the vast knowledge stores available via the Web.
In Maedche et al.'s Text-To-Onto framework
for ontology extraction and maintenance,
domain-specific knowledge is distilled into a formal conceptualization
(an ontology) which can be used to guide knowledge discovery in both structured
data and text; at the same time, this ontology can itself be extracted
using data mining techniques. In the majority of papers, representational
mismatch is avoided through the use of symbolic knowledge structures; this
explains the predominance of
case-based (Jing et al., Li et al.),
protocol-based (Matthews et al.),
and rule-based (Semenova et al.,
Veloso et al.) systems,
both for managing and capturing knowledge. Interestingly, two papers
illustrate how non-symbolic data mining techniques can lead to human-comprehensible
results that can be meaningfully interpreted and exploited by decision
makers in industry and finance.
In Wu and Gardarin's paper,
genetic algorithms
are used to detect sequence strategies and express them in the form of
highly readable rules such as "if the relative strength index of a stock
is less than 30%, then sell it".
Denimal and Boussu's approach is
based on a combination of purely statistical techniques such as correspondence
analysis and hierarchical classification, but careful interpretation and
visualization of results led to an understanding of sales and fashion patterns
that impacted decisions concerning which clothes to produce and market.
Finally, in many papers, the integrative thrust came from the need to solve
both KM and DM problems in the context of concrete real-world tasks which
span domains as varied as engineering design, health care, finance, and
the textile and garment industry. Given the breadth and complexity
of the applications and issues involved, this first workshop on integrating
data mining and knowledge management will certainly not be the last.