peteris.rocks

Real time stats with WebSockets and React

Roll your own alternative to Google Analytics Real Time Statistics


I was annoyed by Google Analytics referrer spam.

It also seemed that Google Analytics Real Time statistics were not quite accurate.

In this tutorial, we are going to roll our own real time analytics with node.js, WebSockets and React.

Overview

Just like with Google Analytics, we are going to have a simple script at /analytics.js that we are going to include on every page that we want to track.

The script is going to be very basic: it opens a WebSocket connection and that's basically it.

var socket = new WebSocket('ws://host/');

Obviously, we are not going to see users whose browsers do not support WebSockets or who cannot connect. But this is going to suffice if you use it for your personal blog. I think it is better to miss a few users than to push 70 KB of socket.io or SockJS JavaScript, which support falling back to XHR polling, onto every single user.
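If you want the script to fail silently in browsers without WebSocket support, a small guard is enough. This is a minimal sketch and not part of the original script; the name connectTracker and the idea of passing the global object in as a parameter are my own choices, made so the snippet can be exercised outside a browser.

```javascript
// Hypothetical helper: open the tracking socket only if the environment
// provides a WebSocket constructor. `root` would be `window` in a browser.
function connectTracker(root, url) {
  if (!root || typeof root.WebSocket !== 'function') {
    return null // unsupported browser: this visit is simply not counted
  }
  try {
    return new root.WebSocket(url)
  } catch (e) {
    return null // the constructor threw, e.g. because of a malformed URL
  }
}

// In a page this would be called as: connectTracker(window, 'ws://host/')
```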

Then we are going to have a server-side application that is going to track all WebSocket connections and serve a password protected dashboard to you. I assume that the dashboard will be used by you only and for just an hour or so at a time.

Track visits

Let's create a package.json file in a new empty folder stats.

npm init -y

We are going to use express as a web server and ws to handle websocket connections.

npm install --save express ws

Additionally, let's use geoip-lite to find out the location of an IP address and useragent to parse the User-Agent header to figure out the browser and operating system.

npm install --save geoip-lite useragent

We are going to start by setting up our infrastructure in server.js.

In this code, we create an HTTP server (server) that is going to be used by our web server (app) and also bound to our websockets server (wss).

let http = require('http')
let express = require('express')
let ws = require('ws')
let geoip = require('geoip-lite')
let useragent = require('useragent')

let app = express()
let server = http.Server(app)
let wss = new ws.Server({ server: server, path: '/', clientTracking: false, maxPayload: 1024 })

let config = {
  port: 8080,
  wshost: 'ws://localhost:8080'
}

app.disable('x-powered-by')
server.listen(config.port)

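Here's the explanation of the parameters to ws.Server:

server: the existing HTTP server to attach to, so that WebSocket and regular HTTP traffic share one port.

path: only accept connections whose request path matches this value.

clientTracking: whether ws should keep its own list of connected clients; it is turned off here because we keep track of users ourselves.

maxPayload: the maximum allowed message size in bytes.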

Now, let's start tracking visits to the website.

let users = {}
let userCount = 0
let userLastID = 0

setInterval(() => console.log(`Users online: ${userCount}`), 10 * 1000)

Here is the meat of our little project.

wss.on('connection', socket => {
  userCount++

  let id = userLastID++
  let ip = socket.upgradeReq.headers['x-real-ip'] || socket.upgradeReq.connection.remoteAddress
  let user = users[id] = {
    id: id,
    host: socket.upgradeReq.headers['host'],
    ip: ip,
    ipgeo: geoip.lookup(ip),
    ua: useragent.lookup(socket.upgradeReq.headers['user-agent']).toJSON(),
    date: Date.now(),
    updated: Date.now()
  }

  socket.once('close', () => {
    delete users[id]
    userCount--
  })
})

wss.on('error', err => console.error(err))

When someone connects to our websocket server, we register a new visit. When the socket is closed, we simply remove it.
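The register-on-connect, remove-on-close bookkeeping can be tried without a real WebSocket. The sketch below is an illustration only: createTracker and trackVisit are hypothetical names, a plain EventEmitter stands in for the socket, and the count is derived from the stored records instead of a separate counter.

```javascript
const EventEmitter = require('events')

// Hypothetical helper mirroring the connection handler above:
// store a record on connect, drop it when the socket emits 'close'.
function createTracker() {
  let users = {}
  let lastID = 0

  return {
    users,
    count: () => Object.keys(users).length,
    trackVisit(socket, info) {
      let id = lastID++
      users[id] = Object.assign({ id, date: Date.now() }, info)
      socket.once('close', () => { delete users[id] })
      return id
    }
  }
}

// Usage with a fake socket (an EventEmitter, not a real ws connection)
let tracker = createTracker()
let fakeSocket = new EventEmitter()
tracker.trackVisit(fakeSocket, { ip: '::1' })
console.log(tracker.count()) // one visit registered
fakeSocket.emit('close')
console.log(tracker.count()) // the visit is gone again
```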

Let's see if it works.

First, we are going to make our tracking script available as http://host/analytics.js which you will embed in all your webpages that you want to track.

It simply opens a websocket connection to our stats server.

app.get('/analytics.js', (req, res) => {
  let trackerjs = `var socket = new WebSocket('${config.wshost}');`

  res.set('Content-Type', 'application/javascript')
  res.send(trackerjs)
})

To make it easy to test, let's create a test page.

app.get('/test/*', (req, res) => {
  let html = `
    <!doctype html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>Test Page</title>
    </head>
    <body>
      <h1>test page</h1>
      <script src="/analytics.js"></script>
    </body>
    </html>`

  res.send(html)
})

Oh no! Spaghetti code!

If you haven't run away screaming and want to continue to follow the tutorial, the next step is to start the server

node server.js

and navigate to http://localhost:8080/test/123 and http://localhost:8080/test/456 in your browser.

You should see in the terminal that there are two users online.

Screenshot of VSCode and Chrome

This is not quite real time yet, but we can see that it works.

If you take a look at the user data that we store, there is one huge problem with it.

{ id: 0,
  host: 'localhost:8080',
  ip: '::1',
  ipgeo: null,
  ua:
   { family: 'Chrome',
     major: '55',
     minor: '0',
     patch: '2883',
     device: Device { family: 'Other', major: '0', minor: '0', patch: '0' },
     os: OperatingSystem { family: 'Windows 10', major: '0', minor: '0', patch: '0' } },
  date: 1482142408424,
  updated: 1482142408424 }

How do we know which page they are on and how they got there (referrer)?

socket.upgradeReq does not give us this information, so we need to roll our own solution.

When we connect to the websocket server, we are going to send a message with the page location and referrer which we can get with JavaScript.

Note that websockets transmit text, not objects, which can trip you up if you are used to Socket.io or SockJS. That's why we use JSON.stringify.

var socket = new WebSocket('ws://host/');
socket.onopen = function() {
  socket.send(JSON.stringify({
    type: 'init',
    url: document.location.href,
    ref: document.referrer
  }));
};

Let's update our /analytics.js endpoint.

app.get('/analytics.js', (req, res) => {
  let trackerjs = `
    var socket = new WebSocket('${config.wshost}');
    socket.onopen = function() {
      socket.send(JSON.stringify({
        type: 'init',
        url: document.location.href,
        ref: document.referrer
      }));
    };`

  res.set('Content-Type', 'application/javascript')
  res.send(trackerjs)
})

And handle it on the server side.

wss.on('connection', socket => {
  // ...

  socket.on('message', msg => {
    try {
      msg = JSON.parse(msg)
    } catch (e) {
      return
    }

    switch (msg.type) {
      case 'init':
        user.url = msg.url
        user.ref = msg.ref
        break
    }

    user.updated = Date.now()
  })

  // ...
})

That's it!

In my custom version of this project, I also implemented periodic updates that send the scrolling position and whether the page is in focus. I'll explain why at the end of this tutorial.
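The post does not show those periodic updates, so the following is only a guess at how the server side might handle them. The 'update' message type and the scroll and focus fields are invented for illustration; only 'init', url and ref come from the code above.

```javascript
// Sketch: apply a parsed tracker message to a user record.
// 'update', `scroll` and `focus` are hypothetical, not from the original code.
function applyMessage(user, msg) {
  if (!msg || typeof msg !== 'object') return user

  switch (msg.type) {
    case 'init':
      user.url = msg.url
      user.ref = msg.ref
      break
    case 'update':
      user.scroll = Number(msg.scroll) || 0 // scroll position in pixels
      user.focus = Boolean(msg.focus)       // is the tab in the foreground?
      break
    default:
      return user // unknown message type: ignore, leave `updated` untouched
  }

  user.updated = Date.now()
  return user
}
```

On the client, such a message could be sent from a setInterval callback using window.pageYOffset and document.hasFocus().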

Dashboard

We are going to create a dashboard with React.

It is going to be served by the same server that tracks visits.

Plumbing

First, we need to bring in the heavy machinery.

npm install --save react react-dom
npm install --save webpack webpack-dev-middleware webpack-hot-middleware
npm install --save bootstrap style-loader css-loader url-loader file-loader
npm install --save babel-core babel-loader babel-preset-es2015 babel-preset-react

Normally, we'd create dashboard.jsx and webpack.config.js and then run webpack to create a single file bundle.js with all dependencies in it.

But if you haven't noticed already, an unofficial goal of this tutorial is to try to cram everything into one file.

That's because I am curious to see if it's possible to have our server compile and serve everything for us instead of manually launching a new process to do that.

Here's how to do this. Add this plumbing code to server.js.

let webpack = require('webpack')

let isProd = process.env.NODE_ENV === 'production'
let config = {
  port: 8080,
  wshost: 'ws://localhost:8080',
  webpack: {
    entry: ['./dashboard.jsx', !isProd && 'webpack-hot-middleware/client'].filter(x=>x),
    output: { path: '/' },
    module: {
      loaders: [
        {
          test: /\.jsx?$/,
          loader: 'babel',
          exclude: /node_modules/,
          query: { presets: ['es2015', 'react'] }
        },
        { test: /\.css$/, loader: 'style!css' },
        { test: /\.png$/, loader: "url" },
        { test: /\.(woff|woff2)(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' },
        { test: /\.ttf(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' },
        { test: /\.eot(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' },
        { test: /\.svg(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' }
      ]
    },
    plugins: [
      isProd ? new webpack.DefinePlugin({ 'process.env': { 'NODE_ENV': "'production'" } }) : function() {},
      isProd ? new webpack.optimize.UglifyJsPlugin({ compress: { warnings: false } }) : function() {},
      new webpack.optimize.OccurenceOrderPlugin(),
      new webpack.HotModuleReplacementPlugin(),
      new webpack.NoErrorsPlugin()
    ],
    devtool: !isProd && 'source-map'
  }
}

I know it looks horrible, but this piece of configuration tells webpack to transpile and concatenate all dependencies (including CSS and image files) and our application code into one file, and also to optimize and minify it for production. It also enables hot reloading, which you'll see later.

Then add this at the end of server.js.

let webpackDevMiddleware = require('webpack-dev-middleware')
let webpackHotMiddleware = require('webpack-hot-middleware')
let compiler = webpack(config.webpack)

app.use(webpackDevMiddleware(compiler, {
  publicPath: config.webpack.output.publicPath,
  noInfo: true
}))

if (!isProd) {
  app.use(webpackHotMiddleware(compiler))
}

app.get('/', (req, res) => {
  let html = `
    <!doctype html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>Stats</title>
    </head>
    <body>
      <div id="root"></div>
      <script src="bundle.js"></script>
    </body>
    </html>
  `

  res.send(html)
})

The purpose of this code is to replace running webpack in another terminal or using webpack-dev-server. Instead, this functionality is provided by express middleware functions.

When you start the server, everything will be compiled and served from cache. If you make changes, the bundle will be automatically recompiled.

By the way, it is possible to omit the html, head and body tags and just have <title>, <div id="root"></div> and <script>.

dashboard.jsx is going to be a separate file.

import React from 'react'
import ReactDOM from 'react-dom'
import 'bootstrap/dist/css/bootstrap.css'

if (module.hot) {
  module.hot.accept()
}

class App extends React.Component {
  render() {
    return <h1>Stats</h1>
  }
}

ReactDOM.render(<App/>, document.getElementById('root'))

Run the server

node server.js

and navigate to http://localhost:8080.

Whenever you make changes to dashboard.jsx, the page is automatically updated as long as the NODE_ENV variable is not set to production. Pretty cool.

WebSocket connection

Let's create another WebSockets server in server.js that is going to be available at /dashboard.

When we connect to it, we send all the data that we currently have.

let wssadmin = new ws.Server({ server: server, path: '/dashboard' })

wssadmin.on('connection', socket => {
  socket.send(JSON.stringify(users))
})

setInterval(() => wssadmin.clients.forEach(s => s.send(JSON.stringify(users))), 1000)

Here is how to connect to our websocket server on the front-end.

class App extends React.Component {
  constructor(props) {
    super(props)

    this.state = {
      users: [],
      error: null
    }
  }

  componentDidMount() {
    this.ws = new WebSocket('ws://localhost:8080/dashboard')
    this.ws.onmessage = e => this.setState({ users: Object.values(JSON.parse(e.data)) })
    this.ws.onerror = e => this.setState({ error: 'WebSocket error' })
    this.ws.onclose = e => !e.wasClean && this.setState({ error: `WebSocket error: ${e.code} ${e.reason}` })
  }

  componentWillUnmount() {
    this.ws.close()
  }

  render() {
    return (
      <div className="container">
        <h1>Real Time Stats</h1>
        {this.state.error &&
          <div className="alert alert-danger">
            <a onClick={() => this.setState({ error: null })} className="pull-right">x</a>
            {this.state.error}
          </div>}
        <div className="well text-center">
          <span style={{ fontSize: '72px', fontWeight: 'bold' }}>{this.state.users.length}</span><br/>
          Users Online
        </div>
      </div>
    )
  }
}

But this is not very real time, is it? We send updates every 1 second.

If we wanted to make it truly real time, our code would look like this.

function broadcast() {
  wssadmin.clients.forEach(s => s.send(JSON.stringify(users)))
}

wssadmin.on('connection', socket => {
  socket.send(JSON.stringify(users))
})

wss.on('connection', socket => {
  broadcast()
  socket.on('message', () => broadcast())
  socket.once('close', () => broadcast())
})

Whenever someone visits our website and lets us know about it (wss), we send an update to the dashboard (wssadmin).

If you run this code like I did, you will quickly notice that after a while you get many updates per second, which is simply too much. Not only do you as a human not care about updates that frequent, but they also put a bit of load on the server.

My suggestion is to simply send updates every second or so.
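A middle ground is to keep the interval but skip ticks where nothing changed. The sketch below is one possible way to do that, not the code used on the site; createBroadcaster, markDirty and flush are hypothetical names, and send stands for whatever delivers the payload to the dashboard sockets.

```javascript
// Sketch: coalesce many changes into at most one broadcast per tick.
function createBroadcaster(getState, send) {
  let dirty = false

  return {
    // Call this from the 'connection', 'message' and 'close' handlers.
    markDirty() { dirty = true },
    // Call this from a timer; sends only if something changed since last time.
    flush() {
      if (!dirty) return false
      dirty = false
      send(JSON.stringify(getState()))
      return true
    }
  }
}

// Wiring it up (assumes the `users` object and `wssadmin` server from above):
// let b = createBroadcaster(() => users, data => wssadmin.clients.forEach(s => s.send(data)))
// setInterval(() => b.flush(), 1000)
```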

More stats

Let's add some more interesting stats to the dashboard.

Install lodash, which has useful helper functions.

npm install --save lodash

Display our data in tables.

import _ from 'lodash'

function groupsort(array, f) {
  return _.chain(array).countBy(f).toPairs().sortBy(p => p[1]).reverse().value()
}

// ..

class App extends React.Component {
  // ...

  render() {
    return (
      <div className="container">
        <h1>Real Time Stats</h1>
        {/* ... */}
        {this.renderTable('Pages', groupsort(this.state.users, u => u.url))}
        {this.renderTable('Referrers', groupsort(this.state.users, u => u.ref))}
        {this.renderTable('Countries', groupsort(this.state.users, u => u.ipgeo ? u.ipgeo.country : ''))}
      </div>
    )
  }

  renderTable(name, data) {
    return (
      <table className="table table-bordered table-condensed">
      <thead>
        <tr><th>{name}</th><th>Count</th></tr>
      </thead>
      <tbody>
        {data.map(item => <tr key={item[0]}><td>{item[0] || '(none)'}</td><td>{item[1]}</td></tr>)}
      </tbody>
      </table>
    )
  }
}

We are recomputing all views whenever we get an update from the server, instead of precomputing them on the server. This is a trade off I chose to make.
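For readers unfamiliar with lodash chains, here is roughly what groupsort computes, written without lodash. It is a sketch for illustration; the name groupsortPlain is mine, and the order of entries with equal counts may differ from the lodash version.

```javascript
// Count how often each key occurs and return [key, count] pairs,
// most frequent first. Keys are converted to strings, since lodash's
// countBy stores them as object property names.
function groupsortPlain(array, f) {
  let counts = new Map()
  for (let item of array) {
    let key = String(f(item))
    counts.set(key, (counts.get(key) || 0) + 1)
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1])
}

let visits = [{ url: '/a' }, { url: '/b' }, { url: '/a' }]
console.log(groupsortPlain(visits, v => v.url)) // [ [ '/a', 2 ], [ '/b', 1 ] ]
```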

Map

Let's add a map to visualize where users come from. We have the data thanks to the geoip-lite module.

We are going to use the fantastic Leaflet library and OpenStreetMap.

npm install --save leaflet react-leaflet

We can easily use it in our React component.

import { Map, Marker, Popup, TileLayer } from 'react-leaflet'
//import 'leaflet/dist/leaflet.css'

class App extends React.Component {
  // ...

  render() {
    return (
      <div className="container">
        <h1>Real Time Stats</h1>
        {/* ... */}
        <link rel="stylesheet" href="https://unpkg.com/[email protected]/dist/leaflet.css" />
        <Map center={[0,0]} zoom={1} style={{ height: '400px', marginBottom: '20px' }}>
          <TileLayer url='http://{s}.tile.osm.org/{z}/{x}/{y}.png'/>
          {this.state.users.filter(u => u.ipgeo).map(u => (
            <Marker key={u.id} position={u.ipgeo.ll}>
              <Popup>
                <span>{u.url}<br/>{[u.ipgeo.city, u.ipgeo.region, u.ipgeo.country].filter(x=>x).join(', ')}</span>
              </Popup>
            </Marker>))}
        </Map>
        {/* ... */}
      </div>
    )
  }

  // ...
}

There is something weird going on with leaflet.css and css-loader and/or url-loader in webpack. That's why as a temporary quick fix I've used <link rel="stylesheet" href="leaflet.css"> instead.

The map does not work with local IPs, but when I add a fake real IP, it looks something like this.
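One way to get markers on the map during local development is to substitute a public address whenever the visitor's IP is a loopback or private one. This is a sketch of that idea, not the author's code; devIp is a hypothetical helper, and 8.8.8.8 is just an arbitrary public address chosen for the example.

```javascript
// Sketch: replace loopback/private addresses with a stand-in public IP
// so that a GeoIP lookup has something to resolve during development.
function devIp(ip, fallback) {
  let v4 = String(ip || '').replace(/^::ffff:/, '') // unwrap IPv4-mapped IPv6
  let isLocal =
    v4 === '' ||
    v4 === '::1' ||
    /^127\./.test(v4) ||
    /^10\./.test(v4) ||
    /^192\.168\./.test(v4) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(v4)
  return isLocal ? fallback : v4
}

console.log(devIp('::1', '8.8.8.8'))         // 8.8.8.8
console.log(devIp('203.0.113.7', '8.8.8.8')) // 203.0.113.7
```

In the connection handler this would wrap the ip value before it is passed to geoip.lookup, and only when NODE_ENV is not production.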

Authentication

We can use Basic HTTP Authentication to protect the dashboard with a username and password.

I could not quickly find a handy node.js module for it, so I made my own.

let config = {
  auth: { user: 'user', password: 'pass' },
  // ...
}

function isAuth(req) {
  try {
    let header = req.headers.authorization
    let userpass = config.auth.user + ':' + config.auth.password
    return header && header.indexOf('Basic ') === 0 &&
      Buffer.from(header.split(' ')[1], 'base64').toString() === userpass
  } catch (e) {
    return false
  }
}

function basicAuth(req, res, next) {
  if (isAuth(req)) {
    return next()
  }

  res.set('WWW-Authenticate', 'Basic realm="Admin Area"')
  setTimeout(() => res.status(401).send('Authentication required'), req.headers.authorization ? 5000 : 0)
}

We can use it like this.

app.use(basicAuth, webpackDevMiddleware(compiler, { // ..
app.use(basicAuth, webpackHotMiddleware(compiler))
app.get('/', basicAuth, (req, res) => { // ...

wssadmin.on('connection', socket => {
  if (!isAuth(socket.upgradeReq)) return socket.close()
  // ...
})

isAuth checks if the Authorization header is present and valid. basicAuth is a middleware for express. Finally, authentication for WebSockets is rather hacky, but in the next version of the library (2.0) there will be an overridable shouldHandle function that will make it more elegant.
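To see what isAuth is decoding, it helps to build the header by hand. The sketch below is illustrative only: basicHeader and safeEqual are hypothetical helpers, and the constant-time comparison is an optional hardening that the code above does not do.

```javascript
const crypto = require('crypto')

// Build the value a client sends in the Authorization header.
function basicHeader(user, password) {
  return 'Basic ' + Buffer.from(user + ':' + password).toString('base64')
}

// Compare two strings in constant time for equal-length inputs.
// timingSafeEqual requires equal-length buffers, so check length first
// (a length mismatch is therefore still observable).
function safeEqual(a, b) {
  let bufA = Buffer.from(String(a))
  let bufB = Buffer.from(String(b))
  return bufA.length === bufB.length && crypto.timingSafeEqual(bufA, bufB)
}

console.log(basicHeader('user', 'pass')) // Basic dXNlcjpwYXNz
```

Inside isAuth, safeEqual could replace the === comparison between the decoded header and userpass.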

Final remarks

I was right. Google Analytics was lying. There were many, many more users on my site than Google Analytics showed.

But my total number of users online was useless.

That's because lots of people open a page, then keep it in another tab and don't look at it again for days.

This is why Google Analytics says "active users on site".
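One possible way to approximate that notion of "active" is to count only visitors whose record was updated recently, which is where periodic updates from the page become useful. This is a sketch under that assumption; activeUsers and the five minute threshold are hypothetical, while the updated field comes from the user records above.

```javascript
// Sketch: keep only users whose `updated` timestamp is recent enough.
function activeUsers(users, now, maxIdleMs) {
  return Object.keys(users)
    .map(id => users[id])
    .filter(u => now - u.updated <= maxIdleMs)
}

let now = Date.now()
let sample = {
  0: { id: 0, updated: now - 30 * 1000 },           // seen 30 seconds ago
  1: { id: 1, updated: now - 2 * 24 * 3600 * 1000 } // tab left open for two days
}
console.log(activeUsers(sample, now, 5 * 60 * 1000).length) // 1
```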

But it still was a fun little project.

The code is on GitHub. I've improved the code a little since the publication of the post.

Demo

You can see the live stats of peteris.rocks here: https://peteris.rocks/stats/