Learning the Common Structure of Data

Kristina Lerman and Steven Minton, University of Southern California

The proliferation of online information sources has accentuated the need for tools that automatically validate and recognize data. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe two Web wrapper maintenance applications that employ this algorithm. The first application detects when a wrapper is not extracting correct data. The second application automatically identifies data on Web pages so that the wrapper may be re-induced when the source format changes.
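
As a rough illustration of learning structure from positive examples alone, the sketch below generalizes the first tokens of known-good field values into coarse token types and then measures how many newly extracted values begin with a learned pattern. It is not the authors' algorithm, which this abstract does not specify. The token classes, the `start_pattern`, `learn_patterns`, and `match_rate` helpers, the two-token prefix length, and the support threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, runs of digits, or one punctuation mark.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def token_type(token):
    """Map a token to a coarse syntactic class (an assumed, simplified hierarchy)."""
    if token.isdigit():
        return "NUMBER"
    if token.isalpha():
        if token.isupper():
            return "ALLCAPS"
        if token[0].isupper():
            return "CAPITALIZED"
        return "LOWER"
    return token  # punctuation is kept as a literal


def start_pattern(value, length=2):
    """Return the token-type sequence of the first `length` tokens of a value."""
    tokens = TOKEN_RE.findall(value)
    return tuple(token_type(t) for t in tokens[:length])


def learn_patterns(positives, min_support=0.2):
    """Keep the starting patterns shared by at least `min_support` of the examples."""
    counts = Counter(start_pattern(v) for v in positives)
    return {p for p, n in counts.items() if n / len(positives) >= min_support}


def match_rate(patterns, values):
    """Fraction of values whose starting pattern was seen during learning."""
    if not values:
        return 0.0
    return sum(start_pattern(v) in patterns for v in values) / len(values)


# Positive examples only: street addresses previously extracted by a working wrapper.
training = ["4676 Admiralty Way", "12 Main Street", "220 Lincoln Blvd", "1 Infinite Loop"]
patterns = learn_patterns(training)

still_ok = match_rate(patterns, ["900 Ocean Ave", "15 Elm Road"])
broken = match_rate(patterns, ["Call now!", "Home | About"])
print(patterns, still_ok, broken)
```

In a verification setting, a sharp drop in the match rate would suggest that the wrapper is no longer extracting the intended field. In a re-induction setting, the same patterns could be used to search a changed page for text that still looks like the field.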

This page is copyrighted by AAAI. All rights reserved.