I was annoyed by Google Analytics referrer spam.
It also seemed that Google Analytics Real Time statistics are not quite accurate.
In this tutorial, we are going to roll our own real time analytics with node.js, WebSockets and React.
Overview
Just like with Google Analytics,
we are going to have a simple script at /analytics.js
that we are going to include on every page that we want to track.
The script is going to be very basic: it opens a WebSocket connection and that's basically it.
var socket = new WebSocket('ws://host/');
Obviously, we are not going to see users whose browsers do not support WebSockets or who cannot connect. But this is going to suffice if you use it for your personal blog. I think it is better to miss a few users than to shove 70 KB of JavaScript from socket.io or SockJS, which support falling back to XHR polling, down to every single user.
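Here is a minimal sketch of how the tracker could quietly skip browsers without WebSocket support instead of throwing. The name startTracking and the idea of passing the global object in as a parameter are my assumptions for illustration (they also let the snippet run outside a browser); the script in this tutorial simply calls new WebSocket(...) directly.

```javascript
// Hypothetical helper, not part of the tutorial's tracker.
// `root` is the global object (window in a browser); `url` is the stats server.
function startTracking(root, url) {
  // Old browsers have no WebSocket constructor: give up quietly.
  if (!root || typeof root.WebSocket !== 'function') {
    return null
  }
  try {
    return new root.WebSocket(url)
  } catch (e) {
    // The constructor throws on a malformed URL or a blocked scheme.
    return null
  }
}

// In a page this would be called as: startTracking(window, 'ws://host/')
```

Users who are skipped this way are simply not counted, which is the trade-off described above.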
Then we are going to have a server-side application that is going to track all WebSocket connections and serve a password protected dashboard to you. I assume that the dashboard will be used by you only and for just an hour or so at a time.
Track visits
Let's create a package.json file in a new empty folder stats.
npm init -y
We are going to use express as a web server and ws to handle websocket connections.
npm install --save express ws
Additionally, let's use geoip-lite to find out the location of an IP address and useragent to parse the User-Agent header to figure out the browser and operating system.
npm install --save geoip-lite useragent
We are going to start by setting up our infrastructure in server.js.
In this code, we create an HTTP server (server) that is going to be used by our web server (app) and also bound to our websockets server (wss).
let http = require('http')
let express = require('express')
let ws = require('ws')
let geoip = require('geoip-lite')
let useragent = require('useragent')
let app = express()
let server = http.Server(app)
let wss = new ws.Server({ server: server, path: '/', clientTracking: false, maxPayload: 1024 })
let config = {
port: 8080,
wshost: 'ws://localhost:8080'
}
app.disable('x-powered-by')
server.listen(config.port)
Here's the explanation of the parameters to ws.Server:
- server is an HTTP server to bind to. This is why we need a separate instance of http.Server instead of just using let app = express().
- path: '/' restricts the websocket server to ws://host/. We are going to use another websockets server for ws://host/dashboard in the same HTTP server later.
- clientTracking: false will not keep a list of connected clients. We don't need this because we are not going to send any messages to them.
- maxPayload: 1024 will limit how much data bored people can send to the server.
Now, let's start tracking visits to the website.
let users = {}
let userCount = 0
let userLastID = 0
setInterval(() => console.log(`Users online: ${userCount}`), 10 * 1000)
- users maps the ID of a visit to detailed stats about that visit.
- userCount will be the total number of users online. It's the same as Object.keys(users).length but much faster, obviously.
- userLastID is used to generate the next ID for a new visit.
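To make the relationship between these three variables concrete, here is a small sketch that wraps the same bookkeeping in one object. The name createRegistry and its methods are made up for illustration; the tutorial itself keeps using the three plain variables.

```javascript
// Sketch of the same bookkeeping wrapped in one object (names are illustrative).
function createRegistry() {
  const users = {}   // visit id -> visit details
  let count = 0      // kept in sync so we never have to count keys
  let lastId = 0     // source of the next visit id

  return {
    add(details) {
      const id = lastId++
      users[id] = Object.assign({ id: id }, details)
      count++
      return id
    },
    remove(id) {
      // Guard against a double remove so the counter cannot go negative.
      if (!(id in users)) return false
      delete users[id]
      count--
      return true
    },
    count() { return count },
    all() { return users }
  }
}

const registry = createRegistry()
const firstVisit = registry.add({ ip: '::1' })
registry.remove(firstVisit)
```

The counter always equals Object.keys(registry.all()).length, it just avoids building the key array every time.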
Here is the meat of our little project.
wss.on('connection', socket => {
userCount++
let id = userLastID++
let ip = socket.upgradeReq.headers['x-real-ip'] || socket.upgradeReq.connection.remoteAddress
let user = users[id] = {
id: id,
host: socket.upgradeReq.headers['host'],
ip: ip,
ipgeo: geoip.lookup(ip),
ua: useragent.lookup(socket.upgradeReq.headers['user-agent']).toJSON(),
date: Date.now(),
updated: Date.now()
}
socket.once('close', () => {
delete users[id]
userCount--
})
})
wss.on('error', err => console.error(err))
When someone connects to our websocket server, we register a new visit. When the socket is closed, we simply remove it.
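The IP lookup in the handler above can be pulled out into a small function. This is only a sketch: clientIp is a hypothetical name, and stripping the ::ffff: prefix is an extra step I am assuming you might want, not something the handler above does.

```javascript
// Hypothetical helper: pick the visitor's address from the upgrade request.
// Only trust x-real-ip when your own reverse proxy sets it; otherwise any
// client can send that header and spoof its address.
function clientIp(headers, remoteAddress) {
  const forwarded = headers['x-real-ip']
  let ip = (typeof forwarded === 'string' && forwarded.trim()) || remoteAddress || ''
  // Node reports IPv4 clients on a dual-stack socket as ::ffff:a.b.c.d.
  if (ip.indexOf('::ffff:') === 0) {
    ip = ip.slice('::ffff:'.length)
  }
  return ip
}

// With the variables from the handler above it would be called as:
// clientIp(socket.upgradeReq.headers, socket.upgradeReq.connection.remoteAddress)
```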
Let's see if it works.
First, we are going to make our tracking script available as http://host/analytics.js, which you will embed in all the webpages that you want to track.
It simply opens a websocket connection to our stats server.
app.get('/analytics.js', (req, res) => {
let trackerjs = `var socket = new WebSocket('${config.wshost}');`
res.set('Content-Type', 'application/javascript')
res.send(trackerjs)
})
To make it easy to test, let's create a test page.
app.get('/test/*', (req, res) => {
let html = `
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Test Page</title>
</head>
<body>
<h1>test page</h1>
<script src="/analytics.js"></script>
</body>
</html>`
res.send(html)
})
Oh no! Spaghetti code!
If you haven't run away screaming and want to continue to follow the tutorial, the next step is to start the server
node server.js
and navigate to http://localhost:8080/test/123
and http://localhost:8080/test/456
in your browser.
You should see in the terminal that there are two users online.
This is not quite real time yet, but we can see that it works.
If you take a look at the user data that we store, there is one huge problem with it.
{ id: 0,
host: 'localhost:8080',
ip: '::1',
ipgeo: null,
ua:
{ family: 'Chrome',
major: '55',
minor: '0',
patch: '2883',
device: Device { family: 'Other', major: '0', minor: '0', patch: '0' },
os: OperatingSystem { family: 'Windows 10', major: '0', minor: '0', patch: '0' } },
date: 1482142408424,
updated: 1482142408424 }
How do we know which page they are on and how they got there (referrer)?
socket.upgradeReq does not give us this information, so we need to roll our own solution.
When we connect to the websocket server, we are going to send a message with the page location and referrer which we can get with JavaScript.
Note that websockets transmit text, not JSON (this can trip you up if you are used to Socket.io or SockJS), which is why we use JSON.stringify.
var socket = new WebSocket('ws://host/');
socket.onopen = function() {
socket.send(JSON.stringify({
type: 'init',
url: document.location.href,
ref: document.referrer
}));
};
Let's update our /analytics.js endpoint.
app.get('/analytics.js', (req, res) => {
let trackerjs = `
var socket = new WebSocket('${config.wshost}');
socket.onopen = function() {
socket.send(JSON.stringify({
type: 'init',
url: document.location.href,
ref: document.referrer
}));
};`
res.set('Content-Type', 'application/javascript')
res.send(trackerjs)
})
And handle it on the server side.
wss.on('connection', socket => {
  // ...
  socket.on('message', msg => {
    try {
      msg = JSON.parse(msg)
    } catch (e) {
      return
    }
    // JSON.parse('null') succeeds, so guard before reading msg.type
    if (!msg || typeof msg !== 'object') return
    switch (msg.type) {
      case 'init':
        user.url = msg.url
        user.ref = msg.ref
        break
    }
    user.updated = Date.now()
  })
  // ...
})
That's it!
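Since anyone can open a socket and send whatever they like, it may be worth being a bit stricter about what ends up in the visit record. Below is a sketch of the same handling as a standalone function. The names applyMessage and cleanString and the 512 character cap are assumptions of mine, not part of the original code.

```javascript
// Hypothetical stricter handler: returns true when the message was applied.
const MAX_FIELD = 512 // assumed cap, not from the original code

function cleanString(value) {
  return typeof value === 'string' ? value.slice(0, MAX_FIELD) : ''
}

function applyMessage(user, raw, now) {
  let msg
  try {
    msg = JSON.parse(raw)
  } catch (e) {
    return false // not JSON, ignore
  }
  // JSON.parse can also yield null or a primitive such as a number.
  if (msg === null || typeof msg !== 'object') return false

  switch (msg.type) {
    case 'init':
      user.url = cleanString(msg.url)
      user.ref = cleanString(msg.ref)
      break
    default:
      return false // unknown type: leave the visit untouched
  }
  user.updated = now
  return true
}
```

Inside the message handler this could be used as applyMessage(user, msg, Date.now()).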
In my custom version of this project, I also implemented periodic updates that send the scrolling position and whether the page is in focus. I'll explain why at the end of this tutorial.
Dashboard
We are going to create a dashboard with React.
It is going to be served by the same server that tracks visits.
Plumbing
First, we need to bring in the heavy machinery.
npm install --save react react-dom
npm install --save webpack webpack-dev-middleware webpack-hot-middleware
npm install --save bootstrap style-loader css-loader url-loader file-loader
npm install --save babel-core babel-loader babel-preset-es2015 babel-preset-react
Normally, we'd create dashboard.jsx and webpack.config.js and then run webpack to create a single file bundle.js with all dependencies in it.
But if you haven't noticed already, an unofficial goal of this tutorial is to try to cram everything into one file.
That's because I am curious to see if it's possible to have our server compile and serve everything for us instead of manually launching a new process to do that.
Here's how to do this. Add this plumbing code to server.js.
let webpack = require('webpack')
let isProd = process.env.NODE_ENV === 'production'
let config = {
port: 8080,
wshost: 'ws://localhost:8080',
webpack: {
entry: ['./dashboard.jsx', !isProd && 'webpack-hot-middleware/client'].filter(x=>x),
output: { path: '/' },
module: {
loaders: [
{
test: /\.jsx?$/,
loader: 'babel',
exclude: /node_modules/,
query: { presets: ['es2015', 'react'] }
},
{ test: /\.css$/, loader: 'style!css' },
{ test: /\.png$/, loader: "url" },
{ test: /\.(woff|woff2)(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' },
{ test: /\.ttf(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' },
{ test: /\.eot(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' },
{ test: /\.svg(\?v=\d+\.\d+\.\d+)?$/, loader: 'file' }
]
},
plugins: [
isProd ? new webpack.DefinePlugin({ 'process.env': { 'NODE_ENV': "'production'" } }) : function() {},
isProd ? new webpack.optimize.UglifyJsPlugin({ compress: { warnings: false } }) : function() {},
new webpack.optimize.OccurenceOrderPlugin(),
new webpack.HotModuleReplacementPlugin(),
new webpack.NoErrorsPlugin()
],
devtool: !isProd && 'source-map'
}
}
I know it looks horrible, but this piece of configuration tells webpack to transpile and concatenate all dependencies (including CSS and image files) and our application code into one file, and to optimize and minify it for production. It also enables hot reloading, which you'll see later.
Then add this at the end of server.js.
let webpackDevMiddleware = require('webpack-dev-middleware')
let webpackHotMiddleware = require('webpack-hot-middleware')
let compiler = webpack(config.webpack)
app.use(webpackDevMiddleware(compiler, {
publicPath: config.webpack.output.publicPath,
noInfo: true
}))
if (!isProd) {
app.use(webpackHotMiddleware(compiler))
}
app.get('/', (req, res) => {
let html = `
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Stats</title>
</head>
<body>
<div id="root"></div>
<script src="bundle.js"></script>
</body>
</html>
`
res.send(html)
})
The purpose of this code is to replace running webpack in another terminal or using webpack-dev-server.
Instead, this functionality is expressed by express middleware functions.
When you start the server, everything will be compiled and served from cache. If you make changes, the bundle will be automatically recompiled.
By the way, it is possible to omit the html, head and body tags and just have <title>, <div id="root"></div> and <script>.
dashboard.jsx is going to be a separate file.
import React from 'react'
import ReactDOM from 'react-dom'
import 'bootstrap/dist/css/bootstrap.css'
if (module.hot) {
module.hot.accept()
}
class App extends React.Component {
render() {
return <h1>Stats</h1>
}
}
ReactDOM.render(<App/>, document.getElementById('root'))
Run the server
node server.js
and navigate to http://localhost:8080.
Whenever you make changes to dashboard.jsx, the page is automatically updated as long as the NODE_ENV variable is not set to production. Pretty cool.
WebSocket connection
Let's create another WebSockets server in server.js that is going to be available at /dashboard.
When we connect to it, we send all the data that we currently have.
let wssadmin = new ws.Server({ server: server, path: '/dashboard' })
wssadmin.on('connection', socket => {
socket.send(JSON.stringify(users))
})
setInterval(() => wssadmin.clients.forEach(s => s.send(JSON.stringify(users))), 1000)
Here is how to connect to our websocket server on the front-end.
class App extends React.Component {
constructor(props) {
super(props)
this.state = {
users: [],
error: null
}
}
componentDidMount() {
this.ws = new WebSocket('ws://localhost:8080/dashboard')
this.ws.onmessage = e => this.setState({ users: Object.values(JSON.parse(e.data)) })
this.ws.onerror = e => this.setState({ error: 'WebSocket error' })
this.ws.onclose = e => !e.wasClean && this.setState({ error: `WebSocket error: ${e.code} ${e.reason}` })
}
componentWillUnmount() {
this.ws.close()
}
render() {
return (
<div className="container">
<h1>Real Time Stats</h1>
{this.state.error &&
<div className="alert alert-danger">
<a onClick={() => this.setState({ error: null })} className="pull-right">x</a>
{this.state.error}
</div>}
<div className="well text-center">
<span style={{ fontSize: '72px', fontWeight: 'bold' }}>{this.state.users.length}</span><br/>
Users Online
</div>
</div>
)
}
}
But this is not very real time, is it? We send updates every 1 second.
If we wanted to make it truly real time, our code would look like this.
function broadcast() {
wssadmin.clients.forEach(s => s.send(JSON.stringify(users)))
}
wssadmin.on('connection', socket => {
socket.send(JSON.stringify(users))
})
wss.on('connection', socket => {
broadcast()
socket.on('message', () => broadcast())
socket.once('close', () => broadcast())
})
Whenever someone visits our website and lets us know about it (wss), we send an update to the dashboard (wssadmin).
If you run this code like I did, you will quickly notice that after a while you get many updates per second, which is simply too much. Not only do you as a human not care about updates that frequent, they also put extra load on the server.
My suggestion is to simply send updates every second or so.
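A middle ground is to remember that something changed and only push on the next tick of the timer. Here is a sketch of that idea. createBroadcaster, markDirty and flush are names I made up, and the send function is passed in so the snippet does not depend on ws.

```javascript
// Sketch: remember that something changed, push at most once per tick.
// `getState` returns the data to send; `send` delivers a string to every
// dashboard socket.
function createBroadcaster(getState, send) {
  let dirty = false
  return {
    // Call from the connection, message and close handlers.
    markDirty() { dirty = true },
    // Call from a timer; sends only when there is something new.
    flush() {
      if (!dirty) return false
      dirty = false
      send(JSON.stringify(getState()))
      return true
    }
  }
}

// Possible wiring with the variables from this tutorial (not executed here):
// const b = createBroadcaster(() => users, s => wssadmin.clients.forEach(c => c.send(s)))
// setInterval(() => b.flush(), 1000)
```

This way an idle site sends nothing at all, and a busy one still sends at most one update per second.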
More stats
Let's add some more interesting stats to the dashboard.
Install lodash, which has useful helper functions.
npm install --save lodash
Display our data in tables.
import _ from 'lodash'
function groupsort(array, f) {
return _.chain(array).countBy(f).toPairs().sortBy(p => p[1]).reverse().value()
}
// ..
class App extends React.Component {
// ...
render() {
return (
<div className="container">
<h1>Real Time Stats</h1>
{/* ... */}
{this.renderTable('Pages', groupsort(this.state.users, u => u.url))}
{this.renderTable('Referrers', groupsort(this.state.users, u => u.ref))}
{this.renderTable('Countries', groupsort(this.state.users, u => u.ipgeo ? u.ipgeo.country : ''))}
</div>
)
}
renderTable(name, data) {
return (
<table className="table table-bordered table-condensed">
<thead>
<tr><th>{name}</th><th>Count</th></tr>
</thead>
<tbody>
{data.map(item => <tr key={item[0]}><td>{item[0] || '(none)'}</td><td>{item[1]}</td></tr>)}
</tbody>
</table>
)
}
}
We are recomputing all views whenever we get an update from the server, instead of precomputing them on the server. This is a trade off I chose to make.
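If you would rather not pull in lodash for one helper, the grouping can be sketched in plain JavaScript. groupsortPlain is a hypothetical name. It should produce the same counts as groupsort above, although rows with equal counts may come out in a different order.

```javascript
// Dependency-free sketch of groupsort: count items by key, most frequent first.
function groupsortPlain(array, f) {
  const counts = new Map()
  for (const item of array) {
    const key = String(f(item)) // countBy also ends up with string keys
    counts.set(key, (counts.get(key) || 0) + 1)
  }
  // Sort the [key, count] pairs by count, highest first.
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1])
}

const pages = groupsortPlain(
  [{ url: '/a' }, { url: '/b' }, { url: '/a' }],
  u => u.url
)
console.log(pages) // [ [ '/a', 2 ], [ '/b', 1 ] ]
```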
Map
Let's add a map to visualize where users come from.
We have the data thanks to the geoip-lite module.
We are going to use the fantastic Leaflet library and OpenStreetMap.
npm install --save leaflet react-leaflet
We can easily use it in our React component.
import { Map, Marker, Popup, TileLayer } from 'react-leaflet'
//import 'leaflet/dist/leaflet.css'
class App extends React.Component {
// ...
render() {
return (
<div className="container">
<h1>Real Time Stats</h1>
{/* ... */}
<link rel="stylesheet" href="https://unpkg.com/[email protected]/dist/leaflet.css" />
<Map center={[0,0]} zoom={1} style={{ height: '400px', marginBottom: '20px' }}>
<TileLayer url='http://{s}.tile.osm.org/{z}/{x}/{y}.png'/>
{this.state.users.filter(u => u.ipgeo).map(u => (
<Marker key={u.id} position={u.ipgeo.ll}>
<Popup>
<span>{u.url}<br/>{[u.ipgeo.city, u.ipgeo.region, u.ipgeo.country].filter(x=>x).join(', ')}</span>
</Popup>
</Marker>))}
</Map>
{/* ... */}
</div>
)
}
// ...
}
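The marker logic in the render method above can also be written as a plain function, which makes it easier to see what happens to visits without location data. This is a sketch; toMarkers is a name I am introducing here, and the sample coordinates are made up.

```javascript
// Sketch: turn visits into plain marker descriptions for the map.
function toMarkers(users) {
  return users
    // geoip.lookup returns null when it cannot place an address.
    .filter(u => u.ipgeo && Array.isArray(u.ipgeo.ll))
    .map(u => ({
      key: u.id,
      position: u.ipgeo.ll, // [latitude, longitude]
      label: [u.ipgeo.city, u.ipgeo.region, u.ipgeo.country].filter(x => x).join(', ')
    }))
}

const markers = toMarkers([
  { id: 1, ipgeo: null },
  { id: 2, ipgeo: { ll: [56.95, 24.1], city: 'Riga', region: '', country: 'LV' } }
])
console.log(markers.length) // 1
```

Visits whose ipgeo is null are dropped, so they never get a marker.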
There is something weird going on with leaflet.css and css-loader and/or url-loader in webpack.
That's why as a temporary quick fix I've used <link rel="stylesheet" href="leaflet.css"> instead.
The map does not work with local IPs, but when I add a fake real IP, it will look something like this.
Authentication
We can use Basic HTTP Authentication to protect the dashboard with a username and password.
I could not quickly find a handy node.js module for it, so I made my own.
let config = {
auth: { user: 'user', password: 'pass' },
// ...
}
function isAuth(req) {
try {
let header = req.headers.authorization
let userpass = config.auth.user + ':' + config.auth.password
return header && header.indexOf('Basic ') === 0 &&
Buffer.from(header.split(' ')[1], 'base64').toString() === userpass
} catch (e) {
return false
}
}
function basicAuth(req, res, next) {
if (isAuth(req)) {
return next()
}
res.set('WWW-Authenticate', 'Basic realm="Admin Area"')
setTimeout(() => res.status(401).send('Authentication required'), req.headers.authorization ? 5000 : 0)
}
We can use it like this.
app.use(basicAuth, webpackDevMiddleware(compiler, { // ..
app.use(basicAuth, webpackHotMiddleware(compiler))
app.get('/', basicAuth, (req, res) => { // ...
wssadmin.on('connection', socket => {
if (!isAuth(socket.upgradeReq)) return socket.close()
// ...
})
isAuth checks if the Authorization header is present and valid.
basicAuth is a middleware for express.
Finally, authentication for WebSockets is very hackish, but in the next version of the library (2.0), there will be an overridable shouldHandle function that will make it more elegant.
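One possible hardening of the credential check is to compare in constant time with crypto.timingSafeEqual from the Node.js standard library. This is a sketch of a variant, not the code used in this project; checkBasicAuth and safeEqual are hypothetical names.

```javascript
const nodeCrypto = require('crypto')

// Hash both sides so the buffers have equal length, as timingSafeEqual requires.
function safeEqual(a, b) {
  const ha = nodeCrypto.createHash('sha256').update(a).digest()
  const hb = nodeCrypto.createHash('sha256').update(b).digest()
  return nodeCrypto.timingSafeEqual(ha, hb)
}

// Hypothetical variant of isAuth that takes the header and credentials directly.
function checkBasicAuth(header, user, password) {
  if (typeof header !== 'string' || header.indexOf('Basic ') !== 0) return false
  const decoded = Buffer.from(header.slice('Basic '.length), 'base64').toString()
  return safeEqual(decoded, user + ':' + password)
}

// Usage with the config from above would be:
// checkBasicAuth(req.headers.authorization, config.auth.user, config.auth.password)
```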
Final remarks
I was right. Google Analytics was lying. There were many, many more users on my site than Google Analytics showed.
But my total number of users online was useless.
That's because lots of people open a page, then keep it in another tab and don't look at it again for days.
This is why Google Analytics says "active users on site".
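A rough way to get closer to an "active users" number is to count only visits that were updated recently. The sketch below assumes the tracker sends periodic messages while the page is in focus, like the custom version I mentioned earlier, so that updated keeps moving for people who are actually looking at the page. The five minute threshold and the name countActive are arbitrary choices for illustration.

```javascript
// Sketch: count visits whose last update is recent enough to call them active.
const IDLE_MS = 5 * 60 * 1000 // assumed threshold: five minutes

function countActive(users, now, idleMs) {
  const limit = idleMs === undefined ? IDLE_MS : idleMs
  return Object.keys(users).filter(id => now - users[id].updated <= limit).length
}

// With the tutorial's users object: countActive(users, Date.now())
```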
But it still was a fun little project.
The code is on GitHub. I've improved the code a little since the publication of the post.
Demo
You can see the live stats of peteris.rocks here: https://peteris.rocks/stats/
- Username: peteris
- Password: weekendproject