Integrating Clustering and Classification for Estimating Process Variables in Materials Science

Aparna S. Varde, Elke A. Rundensteiner, Carolina Ruiz, David C. Brown, Mohammed Maniruzzaman and Richard D. Sisson Jr.

In domains such as Materials Science, an experimental result is often plotted as a two-dimensional graph of process variables to aid visual analysis. Performing laboratory experiments with specified input conditions and plotting such graphs consume significant time and resources, motivating the need for computational estimation. The goals are to estimate the graph obtained in an experiment given its input conditions, and to estimate the conditions needed to obtain a desired graph. State-of-the-art estimation approaches do not meet the requirements of the targeted applications. In this dissertation, an estimation approach, AutoDomainMine, is proposed. In AutoDomainMine, graphs obtained from existing experiments are clustered, and decision tree classification is used to learn the clustering criteria in order to build a representative pair of input conditions and graph for each cluster. Given the conditions of a new experiment, the relevant decision tree path is traced to estimate its cluster, and the representative graph of that cluster is the estimated graph. Given a desired graph, the closest matching representative graph is found, and the conditions of the corresponding representative pair are the estimated conditions. One sub-problem of this dissertation is preserving the semantics of graphs during clustering. This is addressed through our proposed technique, LearnMet, which learns domain-specific distance metrics for graphs. LearnMet iteratively compares actual clusters of graphs given by experts with predicted clusters obtained from any fixed clustering algorithm. It guesses an initial metric as a weighted sum of metrics, adjusts it in each epoch using the error between predicted and actual clusters until the error falls below a threshold, and returns the metric giving the lowest error. Another sub-problem is capturing the relevant details of each cluster through its representative while keeping that representative concise. This is addressed by our proposed methodology, DesRept, which designs domain-specific cluster representatives showing different levels of detail (e.g., medoid and summarized representatives) and returns the winner determined using an MDL-based encoding. AutoDomainMine is evaluated in the Heat Treating domain that motivated this dissertation. User studies comparing the estimation with laboratory experiments show that AutoDomainMine gives satisfactory estimation accuracy and efficiency. Applications of AutoDomainMine include our QuenchMiner decision support system, simulation tools and intelligent tutors.
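
The cluster-then-classify idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the toy experiments and the attribute names `quenchant` and `agitation` are hypothetical, graphs are compared with plain Euclidean distance (where the dissertation would use a metric learned by LearnMet), clustering is a tiny k-medoids routine, the classifier is a small ID3-style tree over categorical conditions, and each cluster's representative pair is simply its medoid (where DesRept would select among candidate representatives).

```python
import math
from collections import Counter

# Hypothetical toy data: (input conditions, graph sampled at four fixed x-values).
EXPERIMENTS = [
    ({"quenchant": "oil", "agitation": "low"}, [1.0, 2.0, 3.0, 2.0]),
    ({"quenchant": "oil", "agitation": "high"}, [1.1, 2.1, 3.2, 2.1]),
    ({"quenchant": "water", "agitation": "low"}, [5.0, 8.0, 6.0, 3.0]),
    ({"quenchant": "water", "agitation": "high"}, [5.2, 8.3, 6.1, 3.1]),
]

def distance(g1, g2):
    """Euclidean distance between graphs (assumed stand-in for a learned metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

def assign(graphs, medoids):
    """Label each graph with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: distance(g, graphs[medoids[c]]))
            for g in graphs]

def k_medoids(graphs, k, max_iters=20):
    """Tiny k-medoids; the first k graphs seed the clusters (an assumption)."""
    medoids = list(range(k))
    for _ in range(max_iters):
        labels = assign(graphs, medoids)
        updated = []
        for c in range(k):
            # Keep the old medoid if a cluster happens to lose all its members.
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoids[c]]
            updated.append(min(
                members,
                key=lambda i: sum(distance(graphs[i], graphs[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(graphs, medoids)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """ID3-style tree: a leaf is a cluster label, a node is (attr, branches, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def split_entropy(attr):
        total = 0.0
        for value in set(r[attr] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
            total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(attrs, key=split_entropy)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (best, branches, majority)

def classify(tree, row):
    """Trace the decision tree path for the given conditions to a cluster label."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree

conditions = [c for c, _ in EXPERIMENTS]
graphs = [g for _, g in EXPERIMENTS]
medoids, labels = k_medoids(graphs, k=2)
tree = build_tree(conditions, labels, ["quenchant", "agitation"])

def estimate_graph(new_conditions):
    """Conditions -> cluster (via the tree) -> representative graph."""
    return graphs[medoids[classify(tree, new_conditions)]]

def estimate_conditions(desired_graph):
    """Desired graph -> closest representative graph -> its conditions."""
    best = min(medoids, key=lambda m: distance(desired_graph, graphs[m]))
    return conditions[best]
```

Calling `estimate_graph({"quenchant": "water", "agitation": "high"})` returns the representative graph of the cluster the tree predicts for those conditions, and `estimate_conditions([1.0, 2.0, 3.1, 2.0])` returns the conditions paired with the nearest representative graph. Both directions reuse the same representative pairs, which is what lets one model answer both estimation questions.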

Subjects: 12. Machine Learning and Discovery; 10. Knowledge Acquisition

This page is copyrighted by AAAI. All rights reserved.