
CasperJS

A murb'ed feed, posted almost 2 years ago.

Crawling used to be relatively easy for quite some time: the web was curlable, and you could basically do it in a bash script. But with the advent of single-page apps, more and more ‘pages’ only exist because a script built them. Hence you need something more advanced, something that includes a JavaScript engine. CasperJS helps you here: it drives a headless browser (PhantomJS by default), so the page's scripts run before you query the DOM. Its documentation even hints at how to crawl Google. Let the example speak for itself:

var links = [];
var casper = require('casper').create();

function getLinks() {
    var links = document.querySelectorAll('h3.r a');
    return Array.prototype.map.call(links, function(e) {
        return e.getAttribute('href');
    });
}

casper.start('http://google.fr/', function() {
    // wait for the page to be loaded
    this.waitForSelector('form[action="/search"]');
});

casper.then(function() {
    // search for 'casperjs' from google form
    this.fill('form[action="/search"]', { q: 'casperjs' }, true);
});

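casper.then(function() {
    // aggregate results for the 'casperjs' search
    links = this.evaluate(getLinks);
});

casper.run(function() {
    // none of the steps above execute until run() is called;
    // print the collected links, then quit
    this.echo(links.length + ' links found:');
    this.echo(' - ' + links.join('\n - ')).exit();
});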

Once the script has run, the links array holds all the search results from the first page (as long as Google's markup still matches the 'h3.r a' selector).

CasperJS ships with a basic testing framework as well, so you can assert your truths. Happy crawling / black-box testing :)
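For the testing side, here is a minimal sketch of what such a test could look like, reusing the Google search form from the crawl example. The function name searchSuite and the file name in the comment are made up for illustration, and the selector and URL pattern assume Google still behaves as it does in that example, so treat it as a starting point rather than a guaranteed-green test.

```javascript
// Run with: casperjs test search-test.js   (hypothetical file name)

// Suite body: checks that the search form exists and that submitting it
// puts the search term into the resulting URL.
function searchSuite(test) {
    casper.start('http://google.fr/', function() {
        test.assertExists('form[action="/search"]', 'search form is found');
        this.fill('form[action="/search"]', { q: 'casperjs' }, true);
    });

    casper.then(function() {
        test.assertUrlMatch(/q=casperjs/, 'search term is in the URL');
    });

    casper.run(function() {
        // tell the tester that this suite is finished
        test.done();
    });
}

// `casper` is predefined when the file is run through `casperjs test`;
// the guard only keeps the file from throwing if it is loaded elsewhere.
if (typeof casper !== 'undefined') {
    casper.test.begin('Google search form works', 2, searchSuite);
}
```

The second argument to begin() is the number of assertions you plan to make (two here), which lets the tester flag a suite that silently stopped halfway.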

Go to the original link.