Proceedings of the First International Conference on Knowledge Discovery and Data Mining
Sponsored by the American Association for Artificial Intelligence
Edited by Usama Fayyad and Ramasamy Uthurusamy
Published by The AAAI Press, Menlo Park, California. These proceedings are also available in book and CD format.
Contents
Preface / v
Usama Fayyad and Ramasamy Uthurusamy
Active Data Mining / 3
Rakesh Agrawal and Giuseppe Psaila, IBM Almaden Research Center
Are We Losing Accuracy While Gaining Confidence in Induced Rules - An Assessment of PrIL / 9
F. Özden Gür Ali, GE Corporate Research and Development and William A. Wallace, Rensselaer Polytechnic Institute
STAR: A General Architecture for the Support of Distortion Oriented Displays / 15
Paul Anderson, Ray Smith, and Zhongwei Zhang, Monash University, Australia
Learning First Order Logic Rules with a Genetic Algorithm / 21
S. Augier, G. Venturini, and Y. Kodratoff, Université de Paris-Sud, France
Discovery and Maintenance of Functional Dependencies by Independencies / 27
Siegfried Bell, University Dortmund, Germany
Intelligent Instruments: Discovering How to Turn Spectral Data into Information / 33
Wray L. Buntine and Tarang Patel, NASA Ames Research Center
Learning Arbiter and Combiner Trees from Partitioned Data for Scaling Machine Learning / 39
Philip K. Chan and Salvatore J. Stolfo, Columbia University
Designing Neural Networks from Statistical Models: A New Approach to Data Exploration / 45
Antonio Ciampi, McGill University and Yves Lechevallier, INRIA-Rocquencourt, France
Capacity and Complexity Control in Predicting the Spread Between Borrowing and Lending Interest Rates / 51
Corinna Cortes, Harris Drucker, Dennis Hoover, and Vladimir Vapnik, AT&T Bell Laboratories
Limits on Learning Machine Accuracy Imposed by Data Quality / 57
Corinna Cortes, L. D. Jackel, and Wan-Ping Chiang, AT&T Bell Laboratories
Applying a Data Miner to Heterogeneous Schema Integration / 63
Son Dao and Brad Perry, Hughes Research Laboratories
Exploiting Upper Approximation in the Rough Set Methodology / 69
Jitender S. Deogun, University of Nebraska at Lincoln; Vijay V. Raghavan and Hayri Sever, University of Southwestern Louisiana
Analyzing the Benefits of Domain Knowledge in Substructure Discovery / 75
Surnjani Djoko, Diane J. Cook, and Lawrence B. Holder, University of Texas at Arlington
Knowledge Discovery in a Water Quality Database / 81
Saso Dzeroski, Jozef Stefan Institute and Jasna Grbovic, Hydrometeorological Institute of Slovenia
A Statistical Perspective on Knowledge Discovery in Databases / 87
John Elder, Rice University and Daryl Pregibon, AT&T Bell Laboratories
A Database Interface for Clustering in Large Spatial Databases / 94
Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu, University of Munich, Germany
Knowledge Discovery in Telecommunication Services Data Using Bayesian Network Models / 100
Kazuo J. Ezawa and Steve W. Norton, AT&T Bell Laboratories
Data Mining for Loan Evaluation at ABN AMRO: A Case Study / 106
A. J. Feelders and A. J. F. le Loux, University of Twente; J. W. van’t Zand, ABN AMRO Bank, The Netherlands
Knowledge Discovery in Textual Databases (KDT) / 112
Ronen Feldman and Ido Dagan, Bar-Ilan University, Israel
Optimization and Simplification of Hierarchical Clusterings / 118
Doug Fisher, Vanderbilt University
Structured and Unstructured Induction with EDAGs / 124
Brian R. Gaines, University of Calgary, Canada
Available Technology for Discovering Causal Models, Building Bayes Nets, and Selecting Predictors: The TETRAD II Program / 130
Clark Glymour, Carnegie Mellon University
Restructuring Databases for Knowledge Discovery by Consolidation and Link Formation / 136
Henry G. Goldberg and Ted E. Senator, U.S. Department of the Treasury - Financial Crimes Enforcement Network (FinCEN)
Discriminant Adaptive Nearest Neighbor Classification / 142
Trevor Hastie, Stanford University and Robert Tibshirani, University of Toronto
A Perspective on Databases and Data Mining / 150
Marcel Holsheimer and Martin Kersten, CWI Database Research Group, The Netherlands; Heikki Mannila and Hannu Toivonen, University of Helsinki, Finland
Estimating the Robustness of Discovered Knowledge / 156
Chun-Nan Hsu and Craig A. Knoblock, University of Southern California
Rough Sets Similarity-Based Learning from Databases / 162
Xiaohua Hu and Nick Cercone, University of Regina, Canada
Efficient Algorithms for Attribute-Oriented Induction / 168
Hoi-Yee Hwang and Wai-Chee Fu, Chinese University of Hong Kong
Robust Decision Trees: Removing Outliers from Databases / 174
George H. John, Stanford University
Conceptual Clustering in Structured Databases: A Practical Approach / 180
A. Ketterlin, P. Gançarski, and J. J. Korczak, LSIIT, Université Louis Pasteur, France
Anonymization Techniques for Knowledge Discovery in Databases / 186
Willi Klösgen, German National Research Center for Information Technology (GMD)
Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology / 192
Ron Kohavi and Dan Sommerfield, Stanford University
Exploiting Visualization in Knowledge Discovery / 198
Hing-Yan Lee, Hwee-Leng Ong, and Lee-Hian Quek, Information Technology Institute, Singapore
Knowledge-Based Scientific Discovery in Geological Databases / 204
Cen Li and Gautam Biswas, Vanderbilt University
Discovering Frequent Episodes in Sequences / 210
Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo, University of Helsinki, Finland
MDL-Based Decision Tree Pruning / 216
Manish Mehta, Jorma Rissanen, and Rakesh Agrawal, IBM Almaden Research Center
Decision Tree Induction: How Effective is the Greedy Heuristic? / 222
Sreerama K. Murthy and Steven Salzberg, Johns Hopkins University
An Iterative Improvement Approach for the Discretization of Numeric Attributes in Bayesian Classifiers / 228
Michael J. Pazzani, University of California, Irvine
Compression-Based Evaluation of Partial Determinations / 234
Bernhard Pfahringer and Stefan Kramer, Austrian Research Institute for Artificial Intelligence, Austria
Knowledge Discovery from Multiple Databases / 240
James S. Ribeiro, Kenneth A. Kaufman, and Larry Kerschberg, George Mason University
Discovering Enrollment Knowledge in University Databases / 246
Arun P. Sanjeev and Jan M. Zytkow, Wichita State University
Extracting Support Data for a Given Task / 252
Bernhard Schölkopf, Chris Burges, and Vladimir Vapnik, AT&T Bell Laboratories
Feature Extraction for Massive Data Mining / 258
V. Seshadri and Raguram Sasisekharan, AT&T Bell Laboratories; Sholom M. Weiss, Rutgers University
Using Rough Sets as Tools for Knowledge Discovery / 263
Ning Shan, Wojciech Ziarko, Howard J. Hamilton, and Nick Cercone, University of Regina, Canada
Data Surveying: Foundations of an Inductive Query Language / 269
Arno Siebes, CWI Database Research Group, The Netherlands
On Subjective Measures of Interestingness in Knowledge Discovery / 275
Avi Silberschatz, AT&T Bell Laboratories and Alexander Tuzhilin, New York University
Using Recon for Data Cleaning / 282
Evangelos Simoudis, IBM Almaden Research Center; Brian Livezey and Randy Kerber, Lockheed Palo Alto Research Laboratories
Discovery of Concurrent Data Models from Experimental Tables: A Rough Set Approach / 288
Andrzej Skowron, Warsaw University and Zbigniew Suraj, Pedagogical University, Poland
Learning Bayesian Networks with Discrete Variables from Data / 294
Peter Spirtes and Christopher Meek, Carnegie Mellon University
Fast Spatio-Temporal Data Mining of Large Geophysical Datasets / 300
Paul Stolorz, Jet Propulsion Laboratory, California Institute of Technology; H. Nakamura, University of Tokyo; E. Mesrobian, R. R. Muntz, E. C. Shek, J. R. Santos, J. Yi, K. Ng, S.-Y. Chien, C. R. Mechoso, and J. D. Farrara, University of California, Los Angeles
Accelerated Quantification of Bayesian Networks with Incomplete Data / 306
Bo Thiesson, Aalborg University, Denmark
Automated Selection of Rule Induction Methods Based on Recursive Iteration of Resampling Methods and Multiple Statistical Testing / 312
Shusaku Tsumoto and Hiroshi Tanaka, Tokyo Medical and Dental University, Japan
Automated Discovery of Functional Components of Proteins from Amino-Acid Sequences Based on Rough Sets and Change of Representation / 318
Shusaku Tsumoto and Hiroshi Tanaka, Tokyo Medical and Dental University, Japan
Fuzzy Interpretation of Induction Results / 325
Xindong Wu, Monash University, Australia and Petter Måhlén, Royal Institute of Technology, Sweden
Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment / 331
Osmar R. Zaïane and Jiawei Han, Simon Fraser University, Canada
Toward a Multi-Strategy and Cooperative Discovery System / 337
Ning Zhong, The University of Tokyo and Setsuo Ohsuga, Waseda University, Japan
Index / 344