peteris.rocks

Configuring websites with PhantomJS

How to automatically configure websites using a headless browser


In this blog post, I want to show you how to use a headless browser like PhantomJS to automatically configure web-based software such as WordPress during an unattended installation.

PhantomJS

Let's get PhantomJS.

cd /tmp
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
tar xf phantomjs-*.tar.bz2

Let's create a template for running your automation code. It is based on this StackOverflow answer.

var system = require('system')
var page = require('webpage').create()

var arg1 = system.args[1]

var steps = [
  // your automation code here
]

var stepindex = 0
var loading = false
setInterval(executeRequestsStepByStep, 50)

function executeRequestsStepByStep() {
  // run the next step only when no page load is in progress
  if (!loading && steps[stepindex]) {
    steps[stepindex]()
    stepindex++
  }
  // no steps left: quit PhantomJS
  if (!steps[stepindex]) {
    phantom.exit()
  }
}

page.onLoadStarted = function() { loading = true }
page.onLoadFinished = function() { loading = false }
page.onConsoleMessage = function(msg) { console.log(msg) }
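
To see why the template waits between steps, here is a minimal sketch of the same polling logic with the PhantomJS parts removed, so it can be tried in plain Node.js. The names createRunner and tick are made up for this illustration and are not part of PhantomJS. A step simulates a page load by setting the loading flag itself.

```javascript
// Hypothetical stand-in for the template's setInterval loop.
// createRunner and tick are illustrative names, not PhantomJS APIs.
function createRunner(steps) {
  var runner = { index: 0, loading: false, finished: false }

  // One poll of the loop; the template does the equivalent every 50 ms.
  runner.tick = function() {
    if (runner.finished) return
    // Run the next step only when no page load is in progress.
    if (!runner.loading && steps[runner.index]) {
      steps[runner.index](runner)
      runner.index++
    }
    // Once no steps are left, the template calls phantom.exit();
    // this sketch only sets a flag.
    if (!steps[runner.index]) {
      runner.finished = true
    }
  }
  return runner
}

var log = []
var runner = createRunner([
  function(r) { log.push('open page'); r.loading = true }, // like page.open
  function(r) { log.push('fill form') }
])

runner.tick()           // runs step 1, which starts a "load"
runner.tick()           // still loading, so step 2 has to wait
runner.loading = false  // what page.onLoadFinished does in the template
runner.tick()           // runs step 2 and finishes
console.log(log.join(', '), '| finished:', runner.finished)
```

The point of the loading flag is that a step which navigates (page.open, a form submit) should be followed by a pause until the new page has arrived; steps that only read or log can run on consecutive polls.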

Then run it like this:

/tmp/phantomjs-2.1.1-linux-x86_64/bin/phantomjs yourscript.js arg1
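
The template reads system.args[1] without checking that it was supplied. Below is a small sketch of a guard you could put at the top of a script. parseArgs and the argument name domain are hypothetical and only serve this example. In PhantomJS, system.args[0] is the script name, so the real arguments start at index 1.

```javascript
// Hypothetical helper: maps positional arguments to names and
// fails early when one is missing. Not part of PhantomJS.
function parseArgs(args, names) {
  var values = Array.prototype.slice.call(args, 1) // args[0] is the script name
  if (values.length < names.length) {
    throw new Error('Usage: phantomjs ' + args[0] + ' ' + names.join(' '))
  }
  var result = {}
  names.forEach(function(name, i) { result[name] = values[i] })
  return result
}

// In a PhantomJS script you would pass require('system').args
// instead of this hard-coded array.
var opts = parseArgs(['yourscript.js', 'example.com'], ['domain'])
console.log(opts.domain)
```

In a real script you would probably wrap the call in try/catch, print the message, and call phantom.exit(1), since an uncaught error does not stop PhantomJS on its own.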

Finally, if you are performing an unattended installation, you probably won't want to keep PhantomJS around.

Clean up is very simple:

rm -rf /tmp/phantomjs-*

Searching on Google.com

Now, let's create a sample script google.js that will open google.com, search for peteris.rocks, and print the title of the first result.

In the template that we prepared earlier, the var steps = [] should be replaced with an array of one or more functions.

var steps = [
  function() {
    // step 1
  },
  function() {
    // step 2
  }
]

So the first step is to go to google.com. It is simple enough.

function() {
  console.log('Going to google.com')
  page.open('https://www.google.com')
}

Then we need to enter some text in the search box.

function() {
  console.log('Doing the search')
  page.evaluate(function() {
    document.querySelector('input[name="q"]').value = "peteris.rocks"
    document.querySelector('form').submit()
  })
}

PhantomJS injects the function passed to page.evaluate(function() { ... }) into the web page and executes it there.

What I did was open google.com in Google Chrome, hit F12, right click on the textbox and select Inspect. I noticed that the name of this input is q which I believe is unique and will work for all languages of google.com.

We can then use document.getElementById, document.getElementsByTagName, document.querySelector, or document.querySelectorAll to grab a reference to this element. document.querySelector is very nice and works well if you are familiar with jQuery selectors.

While you're in the page inspector, you can paste these lines of code in the console and try them for yourself to see if they work. If they do, add them to your script.

Note that everything inside page.evaluate(function() { ... }) is executed in the context of the web page. In the two steps above, console.log is called outside page.evaluate, so it runs in PhantomJS and not on the web page.
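
One consequence of this separation is that variables from the PhantomJS script are not visible inside page.evaluate. Values have to be passed as extra arguments and should be JSON-serializable. The sketch below imitates that behaviour in plain JavaScript so it can be tried without PhantomJS. evaluateLike is a made-up stand-in, not the real implementation: it turns the function into source text and rebuilds it, which loses the closure.

```javascript
// Made-up stand-in for page.evaluate (an assumption for illustration):
// the function travels as source text and the arguments as JSON,
// so nothing from the surrounding scope comes along.
function evaluateLike(fn) {
  var args = Array.prototype.slice.call(arguments, 1)
  var body = 'return (' + fn.toString() + ').apply(null, ' +
             JSON.stringify(args) + ')'
  var result = new Function(body)() // rebuilt in the global scope
  var json = JSON.stringify(result) // undefined for non-serializable results
  return json === undefined ? undefined : JSON.parse(json)
}

function demo() {
  var user = 'admin' // local to demo, like a variable in your script

  var viaClosure
  try {
    viaClosure = evaluateLike(function() { return user })
  } catch (e) {
    viaClosure = e.name // the rebuilt function cannot see `user`
  }

  // Passing the value as an argument works.
  var viaArgument = evaluateLike(function(u) { return u }, user)
  return { viaClosure: viaClosure, viaArgument: viaArgument }
}

console.log(demo())
```

With the real page.evaluate the pattern is the same: list the values after the function, as in page.evaluate(function(user) { ... }, user).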

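I repeated the inspection on the results page and noticed that each result title is an h3 tag containing a link (a). The next step grabs the text of the first such link. This time console.log is called inside page.evaluate, so the messages come from the web page and reach the terminal through the page.onConsoleMessage handler in the template.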

function() {
  page.evaluate(function() {
    console.log('Got search results back')
    console.log('First result title is ' + document.querySelector('h3 a').text)
  })
}

When you run the script, the output should be:

$ /tmp/phantomjs-2.1.1-linux-x86_64/bin/phantomjs google.js
Going to google.com
Doing the search
Got search results back
First result title is peteris.rocks: Pēteris Ņikiforovs

WordPress configuration

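Here is a script, wordpress.js, that will log in to the WordPress dashboard, deactivate all active plugins, and convert the site to a multisite network with subdomains.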

var domain = system.args[1]
var user = system.args[2]
var pass = system.args[3]

var steps = [
  function() {
    console.log('Going to the dashboard')
    page.open("http://"+domain+"/wp-login.php")
  },
  function() {
    console.log('Logging in')
    // notice that user and pass are not available in page.evaluate
    // so we need to pass them as params
    page.evaluate(function(user, pass) {
      document.getElementById('user_login').value = user
      document.getElementById('user_pass').value = pass
      document.getElementById('wp-submit').click()
    }, user, pass)
  },
  function() {
    console.log('Going to the plugins page')
    page.open("http://"+domain+"/wp-admin/plugins.php?plugin_status=active")
  },
  function() {
    console.log('Deactivating all plugins')
    page.evaluate(function() {
      document.getElementById('cb-select-all-1').click()
      document.getElementById('bulk-action-selector-top').value = 'deactivate-selected'
      document.getElementById('bulk-action-form').submit()
    })
  },
  function() {
    console.log('Going to the network page')
    page.open("http://"+domain+"/wp-admin/network.php")
  },
  function() {
    console.log('Converting the site to a network')
    page.evaluate(function() {
      document.querySelector('input[name=subdomain_install]').click()
      document.querySelector('#submit').click()
    })
  },
  function() {
    console.log('Done')
  }
]

Run it as

$ /tmp/phantomjs-2.1.1-linux-x86_64/bin/phantomjs wordpress.js blog.example.com admin password
Going to the dashboard
Logging in
JQMIGRATE: Migrate is installed, version 1.4.0
Going to the plugins page
JQMIGRATE: Migrate is installed, version 1.4.0
Deactivating all plugins
JQMIGRATE: Migrate is installed, version 1.4.0
Going to the network page
JQMIGRATE: Migrate is installed, version 1.4.0
Converting the site to a network
JQMIGRATE: Migrate is installed, version 1.4.0
Done

The JQMIGRATE: Migrate is installed, version 1.4.0 lines are messages that the web page itself wrote with console.log.

You can comment out this line

// page.onConsoleMessage = function(msg) { console.log(msg) }

and the output will be much nicer.

$ /tmp/phantomjs-2.1.1-linux-x86_64/bin/phantomjs wordpress.js blog.example.com admin password
Going to the dashboard
Logging in
Going to the plugins page
Deactivating all plugins
Going to the network page
Converting the site to a network
Done
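
If you would rather keep some page-side messages and drop only the noise, a filter in the handler is an alternative to commenting it out. This is a sketch: isNoise and the JQMIGRATE: prefix check are assumptions based on the output above, and the handler is only attached when a PhantomJS page object exists.

```javascript
// Hypothetical filter: treat messages that start with a known prefix as noise.
var noisyPrefixes = ['JQMIGRATE:']

function isNoise(msg) {
  return noisyPrefixes.some(function(prefix) {
    return String(msg).indexOf(prefix) === 0
  })
}

// Attach the filtered handler only inside PhantomJS, where the template
// defines `page`; elsewhere this block just defines the helper.
if (typeof page !== 'undefined') {
  page.onConsoleMessage = function(msg) {
    if (!isNoise(msg)) console.log(msg)
  }
}

console.log(isNoise('JQMIGRATE: Migrate is installed, version 1.4.0'))
console.log(isNoise('Got search results back'))
```

In the template, this would take the place of the original page.onConsoleMessage line.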

But what if you need to return something from the page.evaluate function but don't want to see the web page's console.log?

You can return a value from page.evaluate:

function() {
  console.log('Done')
  var code = page.evaluate(function() {
    return document.querySelector('textarea.code').value
  })
  console.log('Add this to your wp-config.php:')
  console.log(code)
}

which will produce

$ /tmp/phantomjs-2.1.1-linux-x86_64/bin/phantomjs wordpress.js blog.example.com admin password
Going to the dashboard
Logging in
Going to the plugins page
Deactivating all plugins
Going to the network page
Converting the site to a network
Done
Add this to your wp-config.php:
define('MULTISITE', true);
define('SUBDOMAIN_INSTALL', true);
define('DOMAIN_CURRENT_SITE', 'blog.example.com');
define('PATH_CURRENT_SITE', '/');
define('SITE_ID_CURRENT_SITE', 1);
define('BLOG_ID_CURRENT_SITE', 1);
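
If the page does not contain the expected textarea, document.querySelector returns null and reading .value throws inside the page. A more defensive variant is sketched below. readCode is a hypothetical name, and it takes an optional document-like argument only so that it can be exercised outside a browser. Because it uses no outside variables, it could be handed to page.evaluate directly.

```javascript
// Hypothetical page-side function. Inside page.evaluate no argument is
// passed, so it falls back to the page's global `document`.
function readCode(doc) {
  doc = doc || document
  var el = doc.querySelector('textarea.code')
  // Return a plain object so the result survives the trip back to PhantomJS.
  return el ? { found: true, code: el.value } : { found: false, code: null }
}

// Tiny fake documents to try the function without a browser.
var withCode = {
  querySelector: function() { return { value: "define('MULTISITE', true);" } }
}
var withoutCode = { querySelector: function() { return null } }

console.log(readCode(withCode).code)
console.log(readCode(withoutCode).found)
```

In the final step this could be used as var result = page.evaluate(readCode), followed by a check of result.found before printing anything.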

Final remarks

I was surprised to learn that it wasn't so hard to automate all these actions.

Nowadays PhantomJS is compiled statically, which means you don't have to install various dependencies just to run a simple script with it. All you need to do is download and extract the PhantomJS archive.