Scaling Up: Distributed Machine Learning with Cooperation

Foster John Provost, Daniel N. Hennessy

Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed processing to take advantage of the often dormant PCs and workstations available on local networks. Each workstation runs a common rule-learning program on a subset of the data. We first show that for commonly used rule-evaluation criteria, a simple form of cooperation can guarantee that a rule will look good to the set of cooperating learners if and only if it would look good to a single learner operating with the entire data set. We then show how such a system can further capitalize on different perspectives by sharing learned knowledge for significant reduction in search effort. We demonstrate the power of the method by learning from a massive data set taken from the domain of cellular fraud detection. Finally, we provide an overview of other methods for scaling up machine learning.

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.