The Prüfer sequence is a cool way of representing a labeled tree as a sequence of node labels. I wrote a JavaScript version to create arbitrary tree structures for testing node streams on a recent project. Here’s an implementation that converts a sequence into an array of connected edges:

function prufer(a) {
  var tree = [];
  // nodes 0..n-1, where n = sequence length + 2
  var T = Array.apply(null, Array(a.length + 2)).map(function(_, i) { return i; });
  // each node's degree is 1 plus its number of appearances in the sequence
  var deg = Array.apply(null, Array(T.length)).map(function() { return 1; });
  a.forEach(function(i) { deg[i]++; });

  for(var i = 0; i < a.length; i++) {
    for(var j = 0; j < T.length; j++) {
      if(deg[T[j]] === 1) {
        tree.push([a[i], T[j]]);
        deg[a[i]]--;
        deg[T[j]]--;
        break;
      }
    }
  }

  // the two remaining degree-1 nodes form the final edge
  var last = T.filter(function(x) { return deg[x] === 1; });
  tree.push(last);

  return tree;
}
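Here’s a quick usage example; the sequence [0, 1] decodes to the labeled path 2-0-1-3 on four nodes (sequence length + 2):

var edges = prufer([0, 1]);
console.log(edges); // [[0, 2], [1, 0], [1, 3]], i.e. the path 2-0-1-3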


Here’s the code on GitHub, also published as an npm module.

Stereo Vision with Oculus Rift

I was fortunate to work on this project as part of a hackathon at work. I’ve always had an interest in FPV flying, and I have a pair of FatShark goggles that I’ve used to fly with a mini CMOS NTSC camera. The quality is poor, and it can be difficult to gauge your surroundings since there are no depth cues. It works, but it’s far from ideal or immersive.

Some people online have tried to replicate a real-time view of their surroundings using stereo cameras and the Oculus Rift. I had a spare DK1 devkit I had ordered a while ago, so I decided to give it a shot, going for a pure JavaScript approach. I only had three days, so I wasn’t able to arrive at a perfect solution, but it did work.

The Hack

I needed this hack to be cheap, and since stereo cameras are expensive I had to make my own using two cheap Chinese IP cameras I had lying around. The idea is to run the cameras over the drone’s datalink to the ground, and use ffmpeg to serve each eye’s video over a web socket to the browser, where I can apply barrel distortion using WebGL.

At least, that’s how it was supposed to go. Even ignoring timing issues between the left and right camera frames, with no reliable way to sync them, it came surprisingly close to working and produced some promising initial results indoors.

I followed the 1/30th rule of stereography, which states that the interaxial separation (the distance between the eyes, or in this case the camera lenses) should be no more than 1/30th of the distance to the closest subject. With a separation of about 2.5 inches, the closest subject should be about 75 inches away, with the background at infinity. The convergence of the two final images can then be adjusted while wearing the Rift, along with the barrel distortion required by the Rift’s optics (which is what gives the immersive feeling).
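Since the rule is just a multiplication, here’s a trivial sketch of it for reference (the helper name is mine, not from any library):

// 1/30th rule: the closest subject should be at least 30x the interaxial separation
function minSubjectDistance(interaxialInches) {
  return interaxialInches * 30; // same units in, same units out
}

console.log(minSubjectDistance(2.5)); // 75 inches, matching the rig above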

Cheap cameras and their cheap mount

RTSP In The Browser

After hearing about a cool pure JavaScript MPEG1 decoder called jsmpeg, I wanted to try to do all the video processing in the browser. This meant somehow getting the video to the browser from the RTSP stream offered by the cameras. Unfortunately, although RTSP is a standard transport for streaming video and supports several standard video formats, it can’t be used natively in the browser without some kind of plugin, typically Flash.

No worries, though. FFmpeg, the swiss-army knife of video streaming and transcoding, comes to the rescue with the ability to transcode the RTSP stream into MPEG1 video, which we can use directly with jsmpeg. In fact, someone had already created the node-rtsp-stream module, which wraps and handles the ffmpeg process creation and sends the data over a websocket!
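Conceptually, the module just spawns ffmpeg and forwards its stdout to websocket clients. Here’s a simplified sketch of that idea (not the module’s actual code; the ws usage and exact ffmpeg flags are my assumptions):

var spawn = require('child_process').spawn;
var WebSocketServer = require('ws').Server;

var wss = new WebSocketServer({ port: 9991 });

// transcode the RTSP stream to MPEG1 video, written to stdout
var ffmpeg = spawn('ffmpeg', [
  '-i', 'rtsp://192.168.1.11:554/...', // camera stream URL (placeholder)
  '-f', 'mpeg1video',                  // format jsmpeg can decode
  '-r', '30',                          // frame rate
  '-'                                  // write output to stdout
]);

// forward each chunk of encoded video to every connected browser
ffmpeg.stdout.on('data', function(chunk) {
  wss.clients.forEach(function(client) {
    client.send(chunk, { binary: true });
  });
});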

In my case I had two cameras, so I simply ran two servers and two node/ffmpeg processes, each serving its own websocket:

var Stream = require('node-rtsp-stream');

// left eye
var leftStream = new Stream({
  name: 'video1',
  streamUrl: 'rtsp://192.168.1.11:554/user=admin&password=&channel=1&stream=0.sdp?real_stream--rtp-caching=100',
  wsPort: 9991
});

// right eye
var rightStream = new Stream({
  name: 'video2',
  streamUrl: 'rtsp://192.168.1.11:554/user=admin&password=&channel=1&stream=0.sdp?real_stream--rtp-caching=100',
  wsPort: 9992
});

Decoding MPEG1 video in the browser actually works quite well thanks to jsmpeg, which takes data in via websockets and renders each frame to a canvas element (we’ll turn that canvas into a three.js texture in a moment):

// create client and stream to canvas element
var client = new WebSocket('ws://127.0.0.1:9991'),
    canvas = document.createElement('canvas'),
    player = new jsmpeg(client, { canvas: canvas });

Ready for VR: Lens Distortion

Rift Lens Distortion

You may have noticed that when using a headset like the Rift, there is a lot of distortion when viewing content that isn’t specifically designed for it, and vice versa. You’re probably familiar with the Oculus screenshots showing two rounded views of an image. That’s because the lenses apply a pincushion distortion, an optical trick that makes a flat image seem immersive. To reverse the effect, a barrel distortion must be applied to the image.

This is most easily done with a fragment shader. A shader is best described as a small program that operates on either all vertices of the 3D scene or all rendered pixel values. These programs run in parallel on the GPU for every frame (and every pixel or vertex), so they are well suited to the task. In my case, I was interested in the fragment shader.

The fragment shader operates on pixel values, so perspective transformations, or in my case barrel distortion, can be applied in real-time, as fast as the video stream. Here’s a fragment shader that does a simple barrel distortion, along with the accompanying vertex shader:

Vertex shader:

<script id="vertexShader" type="x-shader/x-vertex">
  varying vec2 v_pos;

  void main() {
    v_pos = vec2(position.xy);
    gl_Position = vec4(position, 1.0);
    gl_PointSize = 1.0;
  }
</script>

Fragment shader:

<script id="fragmentShader" type="x-shader/x-fragment">
  precision highp float;
  uniform float barrel_power;

  vec2 distort(vec2 p) {
    float theta  = atan(p.y, p.x);
    float radius = length(p);
    radius = pow(radius, barrel_power);
    p.x = radius * cos(theta);
    p.y = radius * sin(theta);

    return 0.5 * (p + 1.0);
  }

  varying vec2 v_pos;
  uniform sampler2D video;

  void main() {
    vec2 d = distort(v_pos);
    if (d.x > 1.0 || d.x < 0.0 || d.y > 1.0 || d.y < 0.0) {
      gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0);
    } else {
      gl_FragColor = texture2D(video, d);
    }
  }
</script>

The jsmpeg client from the first step is rendering video to the canvas element, but we need a way to get it into a WebGL texture so the fragment shader can do its work. To do that, simply read the canvas’s raw RGB image data and create a texture in three.js:

// create texture from canvas content
var texture = new THREE.Texture(canvas);
texture.magFilter = THREE.LinearFilter;
texture.minFilter = THREE.LinearFilter;
texture.format = THREE.RGBFormat;

Video data is now being sent to the GPU in the form of a texture. Now, create a new three.js material with the barrel distortion shaders:

var vertShader = document.getElementById('vertexShader').innerHTML;
var fragShader = document.getElementById('fragmentShader').innerHTML;

var material = new THREE.ShaderMaterial({
  uniforms: {
    video: {
      type: 't',
      value: texture
    },
    barrel_power: {
      type: 'f',
      value: 1.5
    }
  },
  vertexShader: vertShader,
  fragmentShader: fragShader,
  wireframe: false
});

Now, simply create a flat plane geometry of arbitrary size (we’re using shaders to compose our image so it doesn’t really matter) and create a mesh of the geometry and material:

var plane  = new THREE.PlaneGeometry(256, 256, 4, 4);
var mesh = new THREE.Mesh(plane, material);

In standard three.js parlance, let’s create a camera, scene and renderer, add our mesh to the scene, and insert the three.js WebGL canvas into the document:

var camera = new THREE.PerspectiveCamera(75, window.innerWidth / window.innerHeight, 1, 10000);

var scene = new THREE.Scene();
scene.add(mesh);

var renderer = new THREE.WebGLRenderer();
renderer.setPixelRatio(window.devicePixelRatio);
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

Pulling it all together, the last thing to do is set up the render loop. On each frame, we mark the texture as needing an update and render the WebGL scene:

function render() {
  texture.needsUpdate = true; // re-upload the latest video frame
  renderer.render(scene, camera);
  requestAnimationFrame(render);
}
render();

How well does it work? If you have a modern browser you can check it out right here! Here’s an example of streaming a static video file instead of a live camera (aspect ratio and chromatic aberration are not corrected in this example):

Left & Right

Now that we’ve demonstrated the ability to apply a simple shader to our video, we can use a more complete example that takes care of things like the chromatic aberration caused by the lens and handles both cameras. This is accomplished by drawing both video feeds onto an enlarged canvas, side by side:

function render() {
  texture.needsUpdate = true;
  // left half of the double-width canvas
  ctx.drawImage(video, 0, 0, size.width, size.height);
  // right half (offset by one frame width)
  ctx.drawImage(video, size.width, 0, size.width, size.height);
  renderer.render(scene, camera);
  requestAnimationFrame(render);
}
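For completeness, here’s roughly how the enlarged canvas this loop draws into might be set up; the names (size, ctx, texture) match the loop above, but the block itself is my own sketch:

// double-width canvas: one eye's frame on each half
var size = { width: 640, height: 480 }; // per-eye frame size (illustrative)
var composite = document.createElement('canvas');
composite.width = size.width * 2;
composite.height = size.height;
var ctx = composite.getContext('2d');

// 'video' in the loop above is the jsmpeg output canvas (or a <video> element);
// the texture the shader samples now comes from the composite canvas
var texture = new THREE.Texture(composite);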

I’ve also used a more complete Rift shader that handles chromatic aberration and the center spacing of the Rift’s optics and LCD screen, but the same basic concepts hold true.

Test Flight

I mounted my camera rig on a gimbal and was able to do a few test flights before going for a fly at Treasure Island. The link drops out in some places, but overall it worked well. The effect was certainly immersive and has given me the motivation to continue working on the project. Here’s a rather compressed video of the flight:

fpv from Johnathan Leppert on Vimeo.

I have plans to use a board like Nvidia’s Jetson to run the necessary code directly on the GPU on the drone itself, and send back a composite stream suitable for viewing directly in a headset, possibly using USB 3.0 high-speed cameras or the built-in MIPI camera interface, which allows direct frame access on the GPU.

I posted the code for my hack to GitHub.

What happens when you have half a million addresses and need to find out where in the world they actually are? You geocode them! This isn’t the easiest thing to do, and the difficulty increases as your need for performance, quality and quantity goes up.

Geocoding Options

The first option, and probably the most obvious, is to simply use an external service. There are plenty to choose from, with many major companies offering up their geocoder APIs for your personal enjoyment (or dis-enjoyment). So pick your poison: Google, Yahoo and Bing all have nice APIs that will happily take an address and give you something that looks like a lat and long.

Of course, can you really make 500,000 web requests with no one the wiser? Of course not. There are limitations, both on the service side and potentially in your process for making these requests. For one, for most of the APIs I looked at, a commercial license suitable for bulk geocoding is pretty expensive. It’s also slow, and by slow I mean it takes time to make 500,000 web requests, wait for them to complete, and put the data some place (at even 100 ms per request, that’s roughly 14 hours of sequential wall-clock time). That’s not counting the fact that you’ll need to handle failures somehow, which probably means some sort of queue/worker system that will need supervision, management and so on. Yeah, not fun. So even if you manage to find an awesome deal on 500k web requests to someone’s API, you still need to manage the carrier pigeon effect. At least until connections get faster and buying bulk online data and services suddenly becomes cheap.

It’s also worth mentioning that many of these providers rely on the same sources of data that are freely available. In my tests, there were only marginal quality differences across the various solutions (commercial APIs, private data sources and home-brew systems). The largest factor in geocoding quality appears to be the quality of your input addresses first, and your data source second (along with how well your parser can understand bad addresses). Soundex helps with address quality, but some address data is just bad, and I suspect Google and others use some form of dynamic analysis based on past searches to obtain higher quality results and suggestions for incorrect cities, zip codes and so on. Here’s an interesting report comparing various geocoding options and the effectiveness of each, based on testing with 100 addresses (pay close attention to the shapefile-based data sources).

With these facts of life in mind, I decided to do what any crazy person would: set up my own! Google be damned. I can do it, and in one week. Well, maybe two. I know what you’re thinking: pretty foolish and maybe even a little masochistic, and you’d be right on both counts.

Fortunately, the latest version of PostGIS has, buried under extras, a quite comprehensive geocoder based on freely available US Census TIGER shapefile data. I’ll walk you through setting up your own geocoder, complete with soundex support and the ability to bulk geocode from a database table.

Postgres Setup

If you already have Postgres set up, you can skip or skim these steps. Just make sure you have a working installation and know the paths to all the relevant bits of where things are installed.

Download & Install Postgres

For maximum flexibility in the initial prototyping, I decided to compile Postgres from source, but most distributions also have a package available, and Postgres maintains binary and graphical installer packages for some platforms. To compile:

You’ll need build-essential, gcc, etc. packages to compile

export POSTGRES_VERSION=9.0.4
wget http://wwwmaster.postgresql.org/redir/180/f/source/v$POSTGRES_VERSION/postgresql-$POSTGRES_VERSION.tar.gz
tar -xvzf postgresql-$POSTGRES_VERSION.tar.gz
cd postgresql-$POSTGRES_VERSION
./configure --prefix=/usr/local
make
sudo make install

You’ll now need to compile and install PostGIS. Before you do this, it’s a good idea to spend some time setting up accounts in Postgres. By default, Postgres authentication is tied to system accounts, with administrative access granted to the postgres user, from which you can administer other users as necessary. For my setup, I made sure to have a system account for every Postgres user.

Create the postgres user:

sudo adduser postgres

You now need to create a directory to store the main Postgres DB files. I put mine under /usr/local, but you can put it generally anywhere as needed (the path is specified in configuration and/or during server startup).

sudo mkdir /usr/local/pgsql/data
sudo chown postgres /usr/local/pgsql/data
sudo chgrp postgres /usr/local/pgsql/data

Now you’ll need to create a postgres cluster and start the postmaster process. First, change to the postgres user:

su postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l /var/log/postgres start

You can now login using the psql client (as the postgres system user):

/usr/local/pgsql/bin/psql

If all is well, you should be at a postgres command prompt like the following:

psql (9.0.4)
Type "help" for help.


postgres=#

To quit, type “\q”. Now we’re ready to install and setup PostGIS!

Setup PostGIS

You’ll need libxml2-dev, libgeos-dev and libproj-dev (xslt and convert are also needed for documentation). I had to manually compile and install geos (which you can get from osgeo).

Since we’re interested in the Geocoder from 2.0, we’ll need to grab the latest development snapshot:

wget http://postgis.refractions.net/download/postgis-2.0.0SVN.tar.gz
tar -xvzf postgis-2.0.0SVN.tar.gz
cd postgis-2.0.0SVN
./configure --with-pgconfig=/usr/local/pgsql/bin/pg_config
make
sudo make install
sudo ldconfig

Create a spatial database template

So we can easily create spatially enabled databases, we’ll create one as a template and apply the spatial data types to it:

su postgres
/usr/local/pgsql/bin/createdb spatial_db_template
/usr/local/pgsql/bin/createlang plpgsql spatial_db_template
/usr/local/pgsql/bin/psql -d spatial_db_template -f /usr/local/pgsql/share/contrib/postgis-2.0/postgis.sql
/usr/local/pgsql/bin/psql -d spatial_db_template -f /usr/local/pgsql/share/contrib/postgis-2.0/spatial_ref_sys.sql

Now, to create a new PostGIS-enabled spatial database, we just do the following:

/usr/local/pgsql/bin/createdb -T spatial_db_template tiger

Add soundex ability

To add soundex ability to the db (used when matching parsed address parts; fuzzystrmatch includes soundex, metaphone and levenshtein functions), first build the module under the contrib directory of your Postgres source:

cd postgresql-$POSTGRES_VERSION/contrib/fuzzystrmatch
make
sudo make install

This will install the fuzzystrmatch module under the contrib folder of your installation. To add it to your database:

/usr/local/pgsql/bin/psql -d tiger -f /usr/local/pgsql/share/contrib/fuzzystrmatch.sql

Create a geocoder database

Now, we’ll need to create a new geocoder database:

/usr/local/pgsql/bin/createdb -T spatial_db_template geocoder
/usr/local/pgsql/bin/psql -d geocoder -f /usr/local/pgsql/share/contrib/fuzzystrmatch.sql
/usr/local/pgsql/bin/psql -d geocoder
CREATE SCHEMA tiger;
CREATE SCHEMA tiger_data;
ALTER DATABASE geocoder SET search_path=public, tiger;

Now we can begin to set up our geocoder. First we need to populate some street abbreviation and state lookup tables, and run tiger_loader.sql, which contains the actual code to load TIGER data into the geocoder table structure:

(I had my PostGIS sources under /tmp in this case):

/usr/local/pgsql/bin/psql -d geocoder -f /tmp/postgis-2.0.0SVN/extras/tiger_geocoder/tiger_2010/tables/lookup_tables_2010.sql
/usr/local/pgsql/bin/psql -d geocoder -f /tmp/postgis-2.0.0SVN/extras/tiger_geocoder/tiger_2010/tiger_loader.sql

Next, create the actual geocode function in the db (note you need to be in the tiger_2010 directory):

/usr/local/pgsql/bin/psql -d geocoder -f ./create_geocode.sql

You’re now ready to load US Census shapefile data into the geocoder! There’s a slightly contrived way of generating loader scripts from the db, but I found this cumbersome for loading many states, so I wrote my own in Perl. Someone from the PostGIS mailing list also wrote their own.

For the original loader script and instructions, check the original documentation.

I found it useful to get all the files into a single directory, and then just process them from there (instead of having a separate script to do each state):

wget http://www2.census.gov/geo/pvs/tiger2010st/ --no-parent --relative --recursive --level=2 --accept=zip,txt --mirror --reject=html

The loading process takes anywhere from 30 minutes to over an hour, depending on how many points of interest are in the shapefile and the complexity of the generated SQL. Once it’s finished, it’s quite simple to geocode addresses with pretty good quality:

/usr/local/pgsql/bin/psql -d geocoder
geocoder=# SELECT g.rating, ST_X(geomout) AS lon, ST_Y(geomout) AS lat, (addy).* FROM geocode('2500 Farmers Rd Columbus, Ohio') as g;
rating | lon | lat | address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev | zip | parsed
--------+------------+------------+---------+--------------+------------+------------------+---------------+----------+--------------+-------------+-------+--------
8 | -83.784252 | -83.784252 | 2500 | | Farmers | Rd | | | Sabina | OH | 45177 | t
9 | -83.784252 | -83.784252 | 2500 | | Farmers | Rd | | | Wilmington | OH | 45177 | t
11 | -83.784252 | -83.784252 | 2500 | | Farmers | Rd | | | Port William | OH | 45177 | t
13 | -83.087185 | -83.087185 | 2500 | | Farmers | Dr | | | Columbus | OH | 43235 | t
(4 rows)
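Calling the geocoder from an application is just a parameterized query. Here’s a minimal sketch using the node-postgres (pg) module; the connection string is a placeholder and the query shape is mine:

var pg = require('pg');

var client = new pg.Client('postgres://postgres@localhost/geocoder');
client.connect(function(err) {
  if (err) throw err;

  // take the best match (lowest rating) for the address
  var sql = "SELECT g.rating, ST_X(g.geomout) AS lon, ST_Y(g.geomout) AS lat, " +
            "pprint_addy(g.addy) AS address " +
            "FROM geocode($1) AS g ORDER BY g.rating LIMIT 1";

  client.query(sql, ['2500 Farmers Rd Columbus, Ohio'], function(err, result) {
    if (err) throw err;
    console.log(result.rows[0]); // { rating: ..., lon: ..., lat: ..., address: ... }
    client.end();
  });
});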

What’s even more interesting is bulk geocoding: loading an entire table of addresses and selecting them back out (or into another table) geocoded. Here’s some code that does that (modified from the PostGIS documentation: http://postgis.refractions.net/documentation/manual-svn/Geocode.html):

CREATE TABLE addresses_to_geocode(addid serial PRIMARY KEY, address text,
    lon numeric, lat numeric, new_address text, rating integer);

INSERT INTO addresses_to_geocode(address)
VALUES ('529 Main Street, Boston MA, 02129'),
       ('77 Massachusetts Avenue, Cambridge, MA 02139'),
       ('28 Capen Street, Medford, MA'),
       ('124 Mount Auburn St, Cambridge, Massachusetts 02138'),
       ('950 Main Street, Worcester, MA 01610');

-- only update the first two addresses (850 ms) --
-- for large numbers of addresses you don't want to update all at once,
-- since the whole geocode must commit at once
UPDATE addresses_to_geocode
SET (rating, new_address, lon, lat)
  = (g.rating, pprint_addy(g.addy), ST_X(g.geomout), ST_Y(g.geomout))
FROM (SELECT DISTINCT ON (addid) addid, (g1.geo).*
      FROM (SELECT addid, geocode(address) As geo
            FROM addresses_to_geocode As ag
            WHERE ag.rating IS NULL) As g1
      ORDER BY addid, rating LIMIT 2) As g
WHERE g.addid = addresses_to_geocode.addid;

result
-----
2 rows affected, 850 ms execution time.

SELECT * FROM addresses_to_geocode WHERE rating is not null;


addid | address | lon | lat | new_address | rating
------+----------------------------------------------+-----------+----------+-------------------------------------------+--------
1 | 529 Main Street, Boston MA, 02129 | -71.07187 | 42.38351 | 529 Main St, Boston, MA 02129 | 0
2 | 77 Massachusetts Avenue, Cambridge, MA 02139 | -71.09436 | 42.35981 | 77 Massachusetts Ave, Cambridge, MA 02139 | 0
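As the comment above notes, you don’t want one giant transaction for 500k rows. Here’s a hedged sketch of driving that UPDATE in batches from Node, using the pg module as before (the batch size and connection string are arbitrary):

var pg = require('pg');

var BATCH = 100; // rows geocoded per transaction

var client = new pg.Client('postgres://postgres@localhost/geocoder');

// same UPDATE as above, parameterized on the batch size
var sql = "UPDATE addresses_to_geocode " +
          "SET (rating, new_address, lon, lat) " +
          "  = (g.rating, pprint_addy(g.addy), ST_X(g.geomout), ST_Y(g.geomout)) " +
          "FROM (SELECT DISTINCT ON (addid) addid, (g1.geo).* " +
          "      FROM (SELECT addid, geocode(address) As geo " +
          "            FROM addresses_to_geocode As ag " +
          "            WHERE ag.rating IS NULL) As g1 " +
          "      ORDER BY addid, rating LIMIT $1) As g " +
          "WHERE g.addid = addresses_to_geocode.addid";

function geocodeBatch() {
  client.query(sql, [BATCH], function(err, result) {
    if (err) throw err;
    if (result.rowCount > 0) {
      geocodeBatch(); // keep going until no ungeocodable rows remain
    } else {
      client.end();
    }
  });
}

client.connect(function(err) {
  if (err) throw err;
  geocodeBatch();
});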

Performance Considerations

What’s it like to use it for real-life work? Well, your mileage may vary. I loaded all US states, and performance ranged anywhere from 40 ms to over 3000 ms for some difficult addresses on a dual quad-core machine with 32 GB of RAM and fast disks. Part of the complexity is in the address parsing and in attempting to match address parts against many tables, resulting in table scans and a lot of time and disk I/O. If you have many millions of addresses to geocode, this could be a problem.

I was able to increase performance by “pre-normalizing” my addresses and then sorting them by zip code. This seemed to help the most, along with increasing the shared memory cache in the Postgres configuration. I’m also looking at optimizing away some unnecessary table scans, and trying to make more use of partitioned tables and different indexes. So far it’s worked for my purposes, and I haven’t tried to cluster it yet, which may be an option for some (e.g. partitioning multiple states or sets of data over many machines).
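The pre-normalization itself can lean on the geocoder’s own normalize_address() function. Here’s a hedged sketch of the idea from Node, reusing the addresses_to_geocode table above (sorting by zip just groups lookups into the same TIGER tables so they stay warm in cache):

var pg = require('pg');

var client = new pg.Client('postgres://postgres@localhost/geocoder');
client.connect(function(err) {
  if (err) throw err;

  // normalize each raw address once, then process in zip-code order
  var sql = "SELECT addid, pprint_addy(normalize_address(address)) AS norm, " +
            "       (normalize_address(address)).zip AS zip " +
            "FROM addresses_to_geocode " +
            "ORDER BY zip";

  client.query(sql, function(err, result) {
    if (err) throw err;
    result.rows.forEach(function(row) {
      console.log(row.addid, row.zip, row.norm); // feed these to geocode() in order
    });
    client.end();
  });
});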

Happy geocoding!

If you’re using Socket.io and want to reverse proxy your web socket connections, you’ll quickly find it’s somewhat difficult. Since web sockets are established over HTTP 1.1 connections (where the handshake and upgrade are completed), your backend needs to support HTTP 1.1, and from what I have researched, the upgrade breaks the HTTP 1.0 spec (see this discussion on Stack Overflow).

Some people have been able to get HAProxy to proxy websocket connections effectively, but I couldn’t get it to work reliably when I tried the latest version in TCP mode.

If you’re using nginx, you won’t be able to proxy web socket connections using the standard proxy_pass directives. Fortunately, Weibin Yao has developed a tcp proxy module for nginx that allows you to reverse proxy general TCP connections, which is especially well suited to websockets.

Let’s get a simple web socket proxy up and running. Download the nginx sources and the tcp_proxy module:

Compile Nginx with tcp_proxy module

export NGINX_VERSION=1.0.4
curl -O http://nginx.org/download/nginx-$NGINX_VERSION.tar.gz
git clone https://github.com/yaoweibin/nginx_tcp_proxy_module.git
tar -xvzf nginx-$NGINX_VERSION.tar.gz
cd nginx-$NGINX_VERSION
patch -p1 < ../nginx_tcp_proxy_module/tcp.patch
./configure --add-module=../nginx_tcp_proxy_module/
make
sudo make install

Proxy Configuration

Create a simple vhost like the following (note that the tcp block must be defined at the same level as the http block, not inside it):

tcp {
  upstream websockets {
    ## node processes
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    server 127.0.0.1:8004;

    check interval=3000 rise=2 fall=5 timeout=1000;
  }

  server {
    listen 127.0.0.1:80;
    server_name _;

    tcp_nodelay on;
    proxy_pass websockets;
  }
}

http {
  ## status check page for websockets
  server {
    listen 9000;

    location /websocket_status {
      check_status;
    }
  }
}

You’ll notice there is an additional http section with a check_status directive. The tcp_proxy module provides a simple and convenient status check page which you can use to see whether your backend node processes are up.

Create a Simple Websocket Server

var http = require('http'), io = require('socket.io');

var server = http.createServer(function(req, res){
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end('Hello world');
});
server.listen(process.argv[2]);

// socket.io
var socket = io.listen(server);
socket.on('connection', function(client){
  // new client is here!
  console.log('client has connected');
  client.on('message', function() { /* handle client messages here */ });
});

Start four node processes, each listening on different ports:

node ./websocket-server.js 8001 &
node ./websocket-server.js 8002 &
node ./websocket-server.js 8003 &
node ./websocket-server.js 8004 &

You should now check your status page to verify all backends are up and running.

You can also hit the proxy over standard HTTP with a web browser and see “Hello world”, verifying that we’re reaching one of the node backends.

Test Websocket Proxying

Now let’s create a simple web socket client to test whether we can actually establish a websocket connection through the proxy:

<html>
  <head>
    <title>Websockets Proxy Test</title>
    <script src="sio.js"></script>
    <script>
      var socket = new io.Socket('ws://localhost');
      socket.connect();
      socket.on('connect', function() {
        console.log('connected!');
      });
    </script>
  </head>
  <body>
    <h1>Websockets Proxy Test</h1>
  </body>
</html>

For simplicity, I used a node static page server to serve this test page from the same node process and attached my Socket.IO instance to it:

var io = require('./Socket.IO-node');

var http = require("http"),
    url = require("url"),
    path = require("path"),
    fs = require("fs"),
    port = process.argv[2] || 8888;

var server = http.createServer(function(request, response) {
  var uri = url.parse(request.url).pathname,
      filename = path.join(process.cwd(), uri);

  path.exists(filename, function(exists) {
    if(!exists) {
      response.writeHead(404, {"Content-Type": "text/plain"});
      response.write("404 Not Found\n");
      response.end();
      return;
    }

    if (fs.statSync(filename).isDirectory()) filename += '/index.html';

    fs.readFile(filename, "binary", function(err, file) {
      if(err) {
        response.writeHead(500, {"Content-Type": "text/plain"});
        response.write(err + "\n");
        response.end();
        return;
      }

      response.writeHead(200);
      response.write(file, "binary");
      response.end();
    });
  });
});

server.listen(parseInt(port, 10));

// socket.io
var socket = io.listen(server);

Now, when we run our node instances:

node ./websocket-server.js 8001
Static file server running at
  => http://localhost:8001/
  CTRL + C to shutdown
     info  - socket.io started

And when we go to http://localhost, we can see successful websocket handshakes (which means we’re going over the nginx proxy):

 debug - client authorized
  info  - handshake authorized
  info  - handshaken 5ec456d53d8dc0b43f61b5f3acdf8b8e

Using this method you can successfully balance a cluster of web socket nodes, with simple failover provided by the tcp_proxy module’s health checks. Depending on where nginx sits in your application stack, using it instead of something like HAProxy or another balancer strategy could let you scale many more connections per server without introducing significant complexity.

One thing to note is that you can no longer guarantee a client will always connect to the same node process (for example, after a disconnection), so your application needs to account for this and implement session management across the cluster (such as using a Redis or memcache session store).
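As a hedged illustration of the idea, here’s a minimal sketch of keeping per-client session state in Redis with the node_redis client, so any node process in the cluster can pick it up; the key scheme and message shape are made up for the example:

var redis = require('redis');
var store = redis.createClient();

// persist session state keyed by a client-supplied session id,
// so whichever node the client reconnects to can restore it
socket.on('connection', function(client) {
  client.on('message', function(msg) {
    // assume the client sends { sessionId: ..., data: ... }
    store.set('session:' + msg.sessionId, JSON.stringify(msg.data));
  });
});

// on reconnect, look the session back up from any node
function loadSession(sessionId, cb) {
  store.get('session:' + sessionId, function(err, data) {
    cb(err, data && JSON.parse(data));
  });
}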

Happy hacking!