A Co-operative Web Services Paradigm for Supporting Crawlers

To address these issues, we present a collaborative approach in which Websites coordinate with crawlers to provide increased capabilities. Our system supports a querying mechanism through which a crawler can issue queries to a Web service on the Website. To answer these queries, we exploit information already present in the Web logs and file system on the Web server. We also investigate a novel URL ordering algorithm that exploits the access count information recorded in the Web logs of individual Websites. In particular, we develop URL ordering algorithms based on internal and external counts and compare them empirically with a breadth-first search crawl.
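
As an illustration of the idea of count-based URL ordering, the following is a minimal sketch and not the algorithm evaluated in this work. It assumes a hypothetical link graph `links` and a hypothetical per-URL `access_counts` mapping that a Website could derive from its Web logs. The distinction between internal and external counts is not defined in this section, so a single count per URL stands in for either. The sketch keeps the crawl frontier in a priority queue keyed on access count and includes a breadth-first crawl for comparison.

```python
import heapq
from collections import deque


def crawl_order_by_access_count(start, links, access_counts):
    """Visit URLs reachable from `start`, most-accessed frontier URL first.

    `links` maps a URL to the URLs it links to; `access_counts` maps a URL
    to its access count. Both are hypothetical inputs for this sketch.
    """
    # heapq is a min-heap, so counts are negated to pop the largest first.
    # The running counter breaks ties in discovery order.
    frontier = [(-access_counts.get(start, 0), 0, start)]
    seen = {start}
    order = []
    counter = 1
    while frontier:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-access_counts.get(nxt, 0), counter, nxt))
                counter += 1
    return order


def crawl_order_bfs(start, links):
    """Baseline: visit URLs reachable from `start` in breadth-first order."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Toy site: illustrative values only, not data from the paper.
links = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/d"]}
access_counts = {"/": 100, "/a": 5, "/b": 50, "/c": 1, "/d": 40}

by_count = crawl_order_by_access_count("/", links, access_counts)
by_bfs = crawl_order_bfs("/", links)
print(by_count)  # ['/', '/b', '/d', '/a', '/c']
print(by_bfs)    # ['/', '/a', '/b', '/c', '/d']
```

On this toy site the count-based order reaches the frequently accessed pages `/b` and `/d` before the rarely accessed `/a`, whereas the breadth-first order visits pages level by level regardless of their counts.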

in the training phase. In contrast to generating decision trees, the k-Nearest
Neighbor (Dasarathy, 1991) algorithm classifies new data points by observing
the class labels for similar data points in the training set. One drawback with
k-Nearest ...