Proceedings of the Second International Conference on Knowledge Discovery and Data Mining
Sponsored by the American Association for Artificial Intelligence
Edited by Evangelos Simoudis, Jiawei Han, and Usama Fayyad
Published by The AAAI Press, Menlo Park, California. This proceedings is also available in book and CD format.
Contents
KDD-96 Organization / x
Sponsoring Organizations / xi
Preface / xiii
KDD-96 Regular Papers
Combining Data Mining and Machine Learning
Sharing Learned Models among Remote
Database Partitions by Local Meta-Learning / 2
Philip K. Chan, Florida Institute of Technology and Salvatore J. Stolfo,
Columbia University
Combining Data Mining and Machine
Learning for Effective User Profiling / 8
Tom Fawcett and Foster Provost, NYNEX Science and Technology
Local Induction of Decision Trees:
Towards Interactive Data Mining / 14
Truxton Fulton, Simon Kasif, and Steven Salzberg, Johns Hopkins University;
David Waltz, NEC Research Institute
Knowledge Discovery in RNA Sequence
Families of HIV Using Scalable Computers / 20
Ivo L. Hofacker, University of Illinois; Martijn A. Huynen, Los Alamos
National Laboratory and Santa Fe Institute; Peter F. Stadler, University of
Vienna and Santa Fe Institute; Paul E. Stolorz, Jet Propulsion Laboratory,
California Institute of Technology
Parallel Halo Finding in N-body
Cosmology Simulations / 26
David W. Pfitzner, Mount Stromlo Observatory, Australia and John K. Salmon,
California Institute of Technology
Scalable Exploratory Data Mining
of Distributed Geoscientific Data / 32
Eddie C. Shek, University of California, Los Angeles and Hughes Research
Laboratories; Richard R. Muntz, Edmond Mesrobian, and Kenneth Ng, University of
California, Los Angeles
Data Mining Applications
Using a Hybrid Neural/Expert System
for Data Base Mining in Market Survey Data / 38
Victor Ciesielski and Gregory Palstra, Royal Melbourne Institute of
Technology, Australia
Discovering Knowledge in Commercial
Databases Using Modern Heuristic Techniques / 44
B. de la Iglesia, J. C. W. Debuse, and V. J. Rayward-Smith, University of
East Anglia, United Kingdom
KDD for Science Data Analysis:
Issues and Examples / 50
Usama Fayyad, Microsoft Research; David Haussler, University of California,
Santa Cruz; and Paul Stolorz, Jet Propulsion Laboratory, California Institute
of Technology
Data Mining and Model Simplicity:
A Case Study in Diagnosis / 57
Gregory M. Provan, Rockwell Science Center and Moninder Singh, University of
Pennsylvania
Automated Discovery of Medical
Expert System Rules from Clinical Databases Based on Rough Sets / 63
Shusaku Tsumoto and Hiroshi Tanaka, Tokyo Medical and Dental University,
Japan
Automated Discovery of Active
Motifs in Multiple RNA Secondary Structures / 70
Jason T. L. Wang, New Jersey Institute of Technology; Bruce A. Shapiro,
National Institutes of Health; Dennis Shasha, New York University; Kaizhong
Zhang, The University of Western Ontario, Canada; Chia-Yo Chang, New Jersey
Institute of Technology
Detecting Early Indicator Cars
in an Automotive Database: A Multi-Strategy Approach / 76
Ruediger Wirth and Thomas P. Reinartz, Daimler-Benz AG, Germany
Data Mining and Its Applications: A General Overview
Knowledge Discovery and Data Mining:
Towards a Unifying Framework / 82
Usama Fayyad, Microsoft Research; Gregory Piatetsky-Shapiro, GTE
Laboratories; and Padhraic Smyth, University of California, Irvine
An Overview of Issues in Developing
Industrial Data Mining and Knowledge Discovery Applications / 89
Gregory Piatetsky-Shapiro, GTE Laboratories; Ron Brachman, AT&T Research;
Tom Khabaza, ISL, United Kingdom; Willi Kloesgen, GMD, Germany; and Evangelos
Simoudis, IBM Almaden Research Center
Decision-Tree and Rule Induction
Linear-Time Rule Induction / 96
Pedro Domingos, University of California, Irvine
Learning from Biased Data Using
Mixture Models / 102
A. J. Feelders, Data Distilleries Ltd., The Netherlands
Discovery of Relevant New Features
by Generating Non-Linear Decision Trees / 108
Andreas Ittner, Chemnitz University of Technology and Michael Schlosser,
Fachhochschule Koblenz, Germany
Error-Based and Entropy-Based Discretization of Continuous Features / 114
Ron Kohavi, Silicon Graphics, Inc. and Mehran Sahami, Stanford
University
Learning, Probability, and Graphical Models
Rethinking the Learning of Belief
Network Probabilities / 120
Ron Musick, Lawrence Livermore National Laboratory
Clustering Using Monte Carlo Cross-Validation / 126
Padhraic Smyth, University of California, Irvine
Harnessing Graphical Structure
in Markov Chain Monte Carlo Learning / 134
Paul E. Stolorz, Jet Propulsion Laboratory, California Institute of
Technology and Philip C. Chew, University of Pennsylvania
Mining with Noise and Missing Data
Imputation of Missing
Data Using Machine Learning Techniques / 140
Kamakshi Lakshminarayan, Steven A. Harp, Robert Goldman, and Tariq Samad,
Honeywell Technology Center
Discovering Generalized Episodes
Using Minimal Occurrences / 146
Heikki Mannila and Hannu Toivonen, University of Helsinki, Finland
Pattern-Oriented Data Mining
Metapattern Generation for Integrated
Data Mining / 152
Wei-Min Shen, University of Southern California and Bing Leng, Inference
Corporation
Automated Pattern Mining with
a Scale Dimension / 158
Jan M. Zytkow, Wichita State University and Polish Academy of Sciences,
Poland; Robert Zembowicz, Wichita State University
Prediction and Deviation
A Linear Method for Deviation
Detection in Large Databases / 164
Andreas Arning, IBM German Software Development Laboratory, Germany; Rakesh
Agrawal and Prabhakar Raghavan, IBM Almaden Research Center
Planning Tasks for Knowledge Discovery
in Databases; Performing Task-Oriented User-Guidance / 170
Robert Engels, University of Karlsruhe, Germany
Predictive Data Mining with Finite
Mixtures / 176
Petri Kontkanen, Petri Myllymäki, and Henry Tirri, University of
Helsinki, Finland
An Empirical Test of the Weighted
Effect Approach to Generalized Prediction Using Recursive Neural Nets / 183
Rense Lange, University of Illinois at Springfield
Multiple Uses of Frequent Sets
and Condensed Representations: Extended Abstract / 189
Heikki Mannila and Hannu Toivonen, University of Helsinki, Finland
A Comparison of Approaches for
Maximizing Business Payoff of Prediction Models / 195
Brij Masand and Gregory Piatetsky-Shapiro, GTE Laboratories
Scalability and Extensibility of Data Mining Systems
Scaling Up the Accuracy of Naive-Bayes
Classifiers: A Decision-Tree Hybrid / 202
Ron Kohavi, Silicon Graphics, Inc.
Quakefinder: A Scalable Data Mining
System for Detecting Earthquakes from Space / 208
Paul Stolorz and Christopher Dean, Jet Propulsion Laboratory, California
Institute of Technology
Extensibility in Data Mining Systems / 214
Stefan Wrobel, Dietrich Wettschereck, Edgar Sommer, and Werner Emde, GMD,
FIT.KI, Germany
Spatial, Text and Multimedia Data Mining
Mining Knowledge in Noisy Audio
Data / 220
Andrzej Czyzewski, Technical University of Gdansk, Poland
A Density-Based Algorithm for
Discovering Clusters in Large Spatial Databases with Noise / 226
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu,
University of Munich, Germany
A Method for Reasoning with Structured
and Continuous Attributes in the INLEN-2 Multistrategy Knowledge
Discovery System / 232
Kenneth A. Kaufman, George Mason University and Ryszard S. Michalski, George
Mason University and Polish Academy of Sciences, Poland
Self-Organizing Maps of Document
Collections: A New Approach to Interactive Exploration / 238
Krista Lagus, Timo Honkela, Samuel Kaski, and Teuvo Kohonen, Helsinki
University of Technology, Finland
Systems for Mining Large Databases
The Quest Data Mining System / 244
Rakesh Agrawal, Manish Mehta, John Shafer, and Ramakrishnan Srikant, IBM
Almaden Research Center; Andreas Arning and Toni Bollinger, IBM German Software
Development Laboratory, Germany
DBMiner: A System for Mining Knowledge
in Large Relational Databases / 250
Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Wan Gong, Krzysztof
Koperski, Deyi Li, Yijun Lu, Amynmohamed Rajan, Nebojsa Stefanovic, Betty Xia,
and Osmar R. Zaiane, Simon Fraser University, Canada
DataMine: Application Programming
Interface and Query Language for Database Mining / 256
Tomasz Imielinski, Aashu Virmani, and Amin Abdulghani, Rutgers
University
KDD-96 Technology Spotlight (Concise) Papers
Application of Mathematical Theories
Evaluating the Interestingness
of Characteristic Rules / 263
Micheline Kamber, Simon Fraser University and Rajjan Shinghal, Concordia
University, Canada
The Field Matching Problem: Algorithms
and Applications / 267
Alvaro E. Monge and Charles P. Elkan, University of California, San Diego
Discovering Classification Knowledge
in Databases Using Rough Sets / 271
Ning Shan, Wojciech Ziarko, Howard J. Hamilton, and Nick Cercone, University
of Regina, Canada
Exceptional Knowledge Discovery
in Databases Based on Information Theory / 275
Einoshin Suzuki, Yokohama National University and Masamichi Shimura, Tokyo
Institute of Technology, Japan
Interactive Knowledge Discovery
from Marketing Questionnaire Using Simulated Breeding and Inductive
Learning Methods / 279
Takao Terano, The University of Tsukuba, Tokyo and Yoko Ishino, The
University of Tokyo, Japan
Representing Discovered Patterns
Using Attributed Hypergraph / 283
Yang Wang and Andrew K. C. Wong, University of Waterloo, Canada
Data Mining: Integration and Application
Developing Tightly-Coupled Data
Mining Applications on a Relational Database System / 287
Rakesh Agrawal and Kyuseok Shim, IBM Almaden Research Center
Mining Entity-Identification Rules
for Database Integration / 291
M. Ganesh and Jaideep Srivastava, University of Minnesota; Travis
Richardson, Apertus Technologies, Inc.
Undiscovered Public Knowledge:
A Ten-Year Update / 295
Don R. Swanson and Neil R. Smalheiser, University of Chicago
Genetic Algorithms
A Genetic Algorithm-Based Approach
to Data Mining / 299
Ian W. Flockhart, Quadstone Ltd. and Nicholas J. Radcliffe, Quadstone Ltd.
and University of Edinburgh, United Kingdom
Deriving Queries from Results
Using Genetic Programming / 303
Tae-Wan Ryu and Christoph F. Eick, University of Houston
Mining Association Rules
Maintenance of Discovered Knowledge:
A Case in Multi-Level Association Rules / 307
David W. Cheung, The University of Hong Kong; Vincent T. Ng, Hong Kong
Polytechnic University; and Benjamin W. Tam, The University of Hong Kong
Analysing Binary Associations / 311
Arno J. Knobbe and Pieter W. Adriaans, Syllogic, The Netherlands
Rule Induction and Decision Tree Induction
Growing Simpler Decision Trees
to Facilitate Knowledge Discovery / 315
Kevin J. Cherkauer and Jude W. Shavlik, University of Wisconsin
Efficient Specific-to-General
Rule Induction / 319
Pedro Domingos, University of California, Irvine
Data Mining and Tree-Based Optimization / 323
Robert Grossman, Magnify, Inc. and University of Illinois; Haim Bodek and
Dave Northcutt, Magnify, Inc.; Vince Poor, Princeton University
Induction of Condensed Determinations / 327
Pat Langley, Stanford University
SE-Trees Outperform Decision Trees
in Noisy Domains / 331
Ron Rymon, University of Pittsburgh
Learning Limited Dependence Bayesian
Classifiers / 335
Mehran Sahami, Stanford University
RITIO - Rule Induction Two In
One / 339
David Urpani, CSIRO; Xindong Wu, Monash University; and Jim Sykes, Swinburne
University of Technology, Australia
Spatial, Temporal, and Multimedia Data Mining
Mining Associations in Text in
the Presence of Background Knowledge / 343
Ronen Feldman, Bar-Ilan University, Israel and Haym Hirsh, Rutgers
University
Extraction of Spatial Proximity
Patterns by Concept Generalization / 347
Edwin M. Knorr and Raymond T. Ng, University of British Columbia,
Canada
Pattern Discovery in Temporal
Databases: A Temporal Logic Approach / 351
Balaji Padmanabhan and Alexander Tuzhilin, New York University
Special Data Mining Techniques
Exploiting Background Knowledge
in Automated Discovery / 355
John M. Aronis, University of Pittsburgh; Foster J. Provost, NYNEX Science
and Technology; and Bruce G. Buchanan, University of Pittsburgh
Data Mining with Sparse and Simplified
Interaction Selection / 359
Gerald Fahner, International Computer Science Institute
Inferring Hierarchical Clustering
Structures by Deterministic Annealing / 363
Thomas Hofmann and Joachim M. Buhmann, Rheinische
Friedrich-Wilhelms-Universität, Germany
Static Versus Dynamic Sampling
for Data Mining / 367
George H. John and Pat Langley, Stanford University
Efficient Search for Strong Partial
Determinations / 371
Stefan Kramer and Bernhard Pfahringer, Austrian Research Institute for
Artificial Intelligence, Austria
Reverse Engineering Databases
for Knowledge Discovery / 375
Stephen McKearney, Bournemouth University and Huw Roberts, BT Laboratories,
United Kingdom
Performing Effective Feature Selection
by Investigating the Deep Structure of the Data / 379
Marco Richeldi and Pier Luca Lanzi, CSELT, Italy
Invited Papers
Harnessing the Human in Knowledge
Discovery / 384
Georges G. Grinstein, University of Massachusetts at Lowell and The MITRE
Corporation
Efficient Implementation of Data
Cubes Via Materialized Views / 386
Jeffrey D. Ullman, Stanford University
Index / 389