
Proceedings of the Second International Conference on Knowledge Discovery and Data Mining

Sponsored by the American Association for Artificial Intelligence

Edited by Evangelos Simoudis, Jiawei Han, and Usama Fayyad

Published by The AAAI Press, Menlo Park, California. This proceedings is also available in book and CD format.




Contents

KDD-96 Organization / x

Sponsoring Organizations / xi

Preface / xiii

KDD-96 Regular Papers

Combining Data Mining and Machine Learning

Sharing Learned Models among Remote Database Partitions by Local Meta-Learning / 2
Philip K. Chan, Florida Institute of Technology and Salvatore J. Stolfo, Columbia University

Combining Data Mining and Machine Learning for Effective User Profiling / 8
Tom Fawcett and Foster Provost, NYNEX Science and Technology

Local Induction of Decision Trees: Towards Interactive Data Mining / 14
Truxton Fulton, Simon Kasif, and Steven Salzberg, Johns Hopkins University; David Waltz, NEC Research Institute

Knowledge Discovery in RNA Sequence Families of HIV Using Scalable Computers / 20
Ivo L. Hofacker, University of Illinois; Martijn A. Huynen, Los Alamos National Laboratory and Santa Fe Institute; Peter F. Stadler, University of Vienna and Santa Fe Institute; Paul E. Stolorz, Jet Propulsion Laboratory, California Institute of Technology

Parallel Halo Finding in N-body Cosmology Simulations / 26
David W. Pfitzner, Mount Stromlo Observatory, Australia and John K. Salmon, California Institute of Technology

Scalable Exploratory Data Mining of Distributed Geoscientific Data / 32
Eddie C. Shek, University of California, Los Angeles and Hughes Research Laboratories; Richard R. Muntz, Edmond Mesrobian, and Kenneth Ng, University of California, Los Angeles

Data Mining Applications

Using a Hybrid Neural/Expert System for Data Base Mining in Market Survey Data / 38
Victor Ciesielski and Gregory Palstra, Royal Melbourne Institute of Technology, Australia

Discovering Knowledge in Commercial Databases Using Modern Heuristic Techniques / 44
B. de la Iglesia, J. C. W. Debuse, and V. J. Rayward-Smith, University of East Anglia, United Kingdom

KDD for Science Data Analysis: Issues and Examples / 50
Usama Fayyad, Microsoft Research; David Haussler, University of California, Santa Cruz; and Paul Stolorz, Jet Propulsion Laboratory, California Institute of Technology

Data Mining and Model Simplicity: A Case Study in Diagnosis / 57
Gregory M. Provan, Rockwell Science Center and Moninder Singh, University of Pennsylvania

Automated Discovery of Medical Expert System Rules from Clinical Databases Based on Rough Sets / 63
Shusaku Tsumoto and Hiroshi Tanaka, Tokyo Medical and Dental University, Japan

Automated Discovery of Active Motifs in Multiple RNA Secondary Structures / 70
Jason T. L. Wang, New Jersey Institute of Technology; Bruce A. Shapiro, National Institutes of Health; Dennis Shasha, New York University; Kaizhong Zhang, The University of Western Ontario, Canada; Chia-Yo Chang, New Jersey Institute of Technology

Detecting Early Indicator Cars in an Automotive Database: A Multi-Strategy Approach / 76
Ruediger Wirth and Thomas P. Reinartz, Daimler-Benz AG, Germany

Data Mining and Its Applications: A General Overview

Knowledge Discovery and Data Mining: Towards a Unifying Framework / 82
Usama Fayyad, Microsoft Research; Gregory Piatetsky-Shapiro, GTE Laboratories; and Padhraic Smyth, University of California, Irvine

An Overview of Issues in Developing Industrial Data Mining and Knowledge Discovery Applications / 89
Gregory Piatetsky-Shapiro, GTE Laboratories; Ron Brachman, AT&T Research; Tom Khabaza, ISL, United Kingdom; Willi Kloesgen, GMD, Germany; and Evangelos Simoudis, IBM Almaden Research Center

Decision-Tree and Rule Induction

Linear-Time Rule Induction / 96
Pedro Domingos, University of California, Irvine

Learning from Biased Data Using Mixture Models / 102
A. J. Feelders, Data Distilleries Ltd., The Netherlands

Discovery of Relevant New Features by Generating Non-Linear Decision Trees / 108
Andreas Ittner, Chemnitz University of Technology and Michael Schlosser, Fachhochschule Koblenz, Germany

Error-Based and Entropy-Based Discretization of Continuous Features / 114
Ron Kohavi, Silicon Graphics, Inc. and Mehran Sahami, Stanford University

Learning, Probability, and Graphical Models

Rethinking the Learning of Belief Network Probabilities / 120
Ron Musick, Lawrence Livermore National Laboratory

Clustering Using Monte Carlo Cross-Validation / 126
Padhraic Smyth, University of California, Irvine

Harnessing Graphical Structure in Markov Chain Monte Carlo Learning / 134
Paul E. Stolorz, Jet Propulsion Laboratory, California Institute of Technology and Philip C. Chew, University of Pennsylvania

Mining with Noise and Missing Data

Imputation of Missing Data Using Machine Learning Techniques / 140
Kamakshi Lakshminarayan, Steven A. Harp, Robert Goldman, and Tariq Samad, Honeywell Technology Center

Discovering Generalized Episodes Using Minimal Occurrences / 146
Heikki Mannila and Hannu Toivonen, University of Helsinki, Finland

Pattern-Oriented Data Mining

Metapattern Generation for Integrated Data Mining / 152
Wei-Min Shen, University of Southern California and Bing Leng, Inference Corporation

Automated Pattern Mining with a Scale Dimension / 158
Jan M. Zytkow, Wichita State University and Polish Academy of Sciences, Poland; Robert Zembowicz, Wichita State University

Prediction and Deviation

A Linear Method for Deviation Detection in Large Databases / 164
Andreas Arning, IBM German Software Development Laboratory, Germany; Rakesh Agrawal and Prabhakar Raghavan, IBM Almaden Research Center

Planning Tasks for Knowledge Discovery in Databases; Performing Task-Oriented User-Guidance / 170
Robert Engels, University of Karlsruhe, Germany

Predictive Data Mining with Finite Mixtures / 176
Petri Kontkanen, Petri Myllymäki, and Henry Tirri, University of Helsinki, Finland

An Empirical Test of the Weighted Effect Approach to Generalized Prediction Using Recursive Neural Nets / 183
Rense Lange, University of Illinois at Springfield

Multiple Uses of Frequent Sets and Condensed Representations: Extended Abstract / 189
Heikki Mannila and Hannu Toivonen, University of Helsinki, Finland

A Comparison of Approaches for Maximizing Business Payoff of Prediction Models / 195
Brij Masand and Gregory Piatetsky-Shapiro, GTE Laboratories

Scalability and Extensibility of Data Mining Systems

Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid / 202
Ron Kohavi, Silicon Graphics, Inc.

Quakefinder: A Scalable Data Mining System for Detecting Earthquakes from Space / 208
Paul Stolorz and Christopher Dean, Jet Propulsion Laboratory, California Institute of Technology

Extensibility in Data Mining Systems / 214
Stefan Wrobel, Dietrich Wettschereck, Edgar Sommer, and Werner Emde, GMD, FIT.KI, Germany

Spatial, Text and Multimedia Data Mining

Mining Knowledge in Noisy Audio Data / 220
Andrzej Czyzewski, Technical University of Gdansk, Poland

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise / 226
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, University of Munich, Germany

A Method for Reasoning with Structured and Continuous Attributes in the INLEN-2 Multistrategy Knowledge Discovery System / 232
Kenneth A. Kaufman, George Mason University and Ryszard S. Michalski, George Mason University and Polish Academy of Sciences, Poland

Self-Organizing Maps of Document Collections: A New Approach to Interactive Exploration / 238
Krista Lagus, Timo Honkela, Samuel Kaski, and Teuvo Kohonen, Helsinki University of Technology, Finland

Systems for Mining Large Databases

The Quest Data Mining System / 244
Rakesh Agrawal, Manish Mehta, John Shafer, and Ramakrishnan Srikant, IBM Almaden Research Center; Andreas Arning and Toni Bollinger, IBM German Software Development Laboratory, Germany

DBMiner: A System for Mining Knowledge in Large Relational Databases / 250
Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Wan Gong, Krzysztof Koperski, Deyi Li, Yijun Lu, Amynmohamed Rajan, Nebojsa Stefanovic, Betty Xia, and Osmar R. Zaiane, Simon Fraser University, Canada

DataMine: Application Programming Interface and Query Language for Database Mining / 256
Tomasz Imielinski, Aashu Virmani, and Amin Abdulghani, Rutgers University

KDD-96 Technology Spotlight (Concise) Papers

Application of Mathematical Theories

Evaluating the Interestingness of Characteristic Rules / 263
Micheline Kamber, Simon Fraser University and Rajjan Shinghal, Concordia University, Canada

The Field Matching Problem: Algorithms and Applications / 267
Alvaro E. Monge and Charles P. Elkan, University of California, San Diego

Discovering Classification Knowledge in Databases Using Rough Sets / 271
Ning Shan, Wojciech Ziarko, Howard J. Hamilton, and Nick Cercone, University of Regina, Canada

Exceptional Knowledge Discovery in Databases Based on Information Theory / 275
Einoshin Suzuki, Yokohama National University and Masamichi Shimura, Tokyo Institute of Technology, Japan

Interactive Knowledge Discovery from Marketing Questionnaire Using Simulated Breeding and Inductive Learning Methods / 279
Takao Terano, The University of Tsukuba, Tokyo and Yoko Ishino, The University of Tokyo, Japan

Representing Discovered Patterns Using Attributed Hypergraph / 283
Yang Wang and Andrew K. C. Wong, University of Waterloo, Canada

Data Mining: Integration and Application

Developing Tightly-Coupled Data Mining Applications on a Relational Database System / 287
Rakesh Agrawal and Kyuseok Shim, IBM Almaden Research Center

Mining Entity-Identification Rules for Database Integration / 291
M. Ganesh and Jaideep Srivastava, University of Minnesota; Travis Richardson, Apertus Technologies, Inc.

Undiscovered Public Knowledge: A Ten-Year Update / 295
Don R. Swanson and Neil R. Smalheiser, University of Chicago

Genetic Algorithms

A Genetic Algorithm-Based Approach to Data Mining / 299
Ian W. Flockhart, Quadstone Ltd. and Nicholas J. Radcliffe, Quadstone Ltd. and University of Edinburgh, United Kingdom

Deriving Queries from Results Using Genetic Programming / 303
Tae-Wan Ryu and Christoph F. Eick, University of Houston

Mining Association Rules

Maintenance of Discovered Knowledge: A Case in Multi-Level Association Rules / 307
David W. Cheung, The University of Hong Kong; Vincent T. Ng, Hong Kong Polytechnic University; and Benjamin W. Tam, The University of Hong Kong

Analysing Binary Associations / 311
Arno J. Knobbe and Pieter W. Adriaans, Syllogic, The Netherlands

Rule Induction and Decision Tree Induction

Growing Simpler Decision Trees to Facilitate Knowledge Discovery / 315
Kevin J. Cherkauer and Jude W. Shavlik, University of Wisconsin

Efficient Specific-to-General Rule Induction / 319
Pedro Domingos, University of California, Irvine

Data Mining and Tree-Based Optimization / 323
Robert Grossman, Magnify, Inc. and University of Illinois; Haim Bodek and Dave Northcutt, Magnify, Inc.; Vince Poor, Princeton University

Induction of Condensed Determinations / 327
Pat Langley, Stanford University

SE-Trees Outperform Decision Trees in Noisy Domains / 331
Ron Rymon, University of Pittsburgh

Learning Limited Dependence Bayesian Classifiers / 335
Mehran Sahami, Stanford University

RITIO - Rule Induction Two In One / 339
David Urpani, CSIRO; Xindong Wu, Monash University; and Jim Sykes, Swinburne University of Technology, Australia

Spatial, Temporal, and Multimedia Data Mining

Mining Associations in Text in the Presence of Background Knowledge / 343
Ronen Feldman, Bar-Ilan University, Israel and Haym Hirsh, Rutgers University

Extraction of Spatial Proximity Patterns by Concept Generalization / 347
Edwin M. Knorr and Raymond T. Ng, University of British Columbia, Canada

Pattern Discovery in Temporal Databases: A Temporal Logic Approach / 351
Balaji Padmanabhan and Alexander Tuzhilin, New York University

Special Data Mining Techniques

Exploiting Background Knowledge in Automated Discovery / 355
John M. Aronis, University of Pittsburgh; Foster J. Provost, NYNEX Science and Technology; and Bruce G. Buchanan, University of Pittsburgh

Data Mining with Sparse and Simplified Interaction Selection / 359
Gerald Fahner, International Computer Science Institute

Inferring Hierarchical Clustering Structures by Deterministic Annealing / 363
Thomas Hofmann and Joachim M. Buhmann, Rheinische Friedrich-Wilhelms-Universität, Germany

Static Versus Dynamic Sampling for Data Mining / 367
George H. John and Pat Langley, Stanford University

Efficient Search for Strong Partial Determinations / 371
Stefan Kramer and Bernhard Pfahringer, Austrian Research Institute for Artificial Intelligence, Austria

Reverse Engineering Databases for Knowledge Discovery / 375
Stephen McKearney, Bournemouth University and Huw Roberts, BT Laboratories, United Kingdom

Performing Effective Feature Selection by Investigating the Deep Structure of the Data / 379
Marco Richeldi and Pier Luca Lanzi, CSELT, Italy

Invited Papers

Harnessing the Human in Knowledge Discovery / 384
Georges G. Grinstein, University of Massachusetts at Lowell and The MITRE Corporation

Efficient Implementation of Data Cubes Via Materialized Views / 386
Jeffrey D. Ullman, Stanford University

Index / 389