PhantomJS web scraping

PhantomJS is a headless WebKit with a JavaScript API. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. The official website is phantomjs.org.

PhantomJS is cross-platform: it can be compiled for Linux, Windows, FreeBSD, and Mac OS X. PhantomJS scripts can be written in JavaScript or CoffeeScript.

Here is a 14-line PhantomJS script, written in CoffeeScript, that finds pizzerias in New York (using Google Places), along with their addresses and telephone numbers:

page = new WebPage()
page.open 'http://www.google.com/m/local?site=local&q=pizza+in+new+york',
  (status) ->
    if status isnt 'success'
      console.log 'Unable to access network'
    else
      results = page.evaluate ->
        pizza = []
        list = document.querySelectorAll 'div.bf'
        for item in list
          pizza.push(item.innerText)
        return pizza
      console.log results.join('\n')
    phantom.exit()
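
Since PhantomJS scripts can also be written in JavaScript, the same script can be sketched in plain JavaScript. This is a rough equivalent, not a tested replacement: the URL and the 'div.bf' selector are taken from the script above, and whether they still match the live page is not something this sketch can promise. The function name extractListings is made up for illustration, page creation uses the module-style require('webpage').create() call instead of new WebPage(), and the typeof phantom guard is an addition so that the file does nothing when loaded outside PhantomJS.

```javascript
// Hypothetical helper name. This function is handed to page.evaluate(), which
// runs it inside the loaded web page, so it may only rely on page globals
// such as `document`, not on variables from the PhantomJS script itself.
function extractListings() {
  var pizza = [];
  var list = document.querySelectorAll('div.bf'); // selector from the original script
  for (var i = 0; i < list.length; i++) {
    pizza.push(list[i].innerText);
  }
  return pizza;
}

// Added guard (an assumption of this sketch): only scrape when the
// `phantom` global exists, i.e. when the file is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://www.google.com/m/local?site=local&q=pizza+in+new+york', function (status) {
    if (status !== 'success') {
      console.log('Unable to access network');
    } else {
      // The return value of the evaluated function is passed back to the script.
      var results = page.evaluate(extractListings);
      console.log(results.join('\n'));
    }
    phantom.exit();
  });
}
```

Keeping the extraction logic in a named function makes the boundary between the two contexts visible: everything inside extractListings runs in the page, and everything else runs in the PhantomJS script.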
