Recurrent Neural Network Tutorial for Artists

Ha, David

Recurrent Neural Network Tutorial for Artists

January 1, 2017

This post is not meant to be a comprehensive overview of recurrent neural networks. It is intended for readers without any machine learning background. The goal is to show artists and designers how to use a pre-trained neural network to produce interactive digital works using simple Javascript and p5.js library.

Introduction

Handwriting Generation with Javascript

Machine learning has become a popular tool for the creative community in recent years. Techniques such as style transfer, t-sne, autoencoders, generative adversarial networks, and countless other methods have made their way into the digital artist’s toolbox. Many techniques take advantage of convolutional neural networks for feature extraction and feature processing.

On the other end of the spectrum, recurrent neural networks, and other autoregressive models enable powerful tools that can generate realistic sequential data. Artists have employed such techniques to generate text, and music and sounds. One of the areas I feel lacking focus at the moment is on the generation of vector artwork, perhaps due to the lack of available data.

Handwriting is a form of sketch artwork. Recently, I have collaborated with Shan Carter, Ian Johnson, and Chris Olah to publish a post on distill.pub on handwriting generation. In particular, the experiments in the post help visualise the internals of a recurrent neural network trained to generate handwriting. The truth is, that project also served as a kind of meta-experiment for myself. Rather than directly working on the visualisation experiments and writeup, I set out to create a pre-trained handwriting model with an easy-to-use Javascript interface, and have my collaborators, who are highly talented data visualisation artists, experiment with the model to create something out of it. They ended up creating the beautiful interactive visualization experiments in the distill.pub post.

I decided to write this post and make available the same handwriting model used in the distill.pub project along with explanations, with the hope that other artists and designers can also take advantage of these technologies and even go deeper into the field.

Modelling a Handwriting Brain

There are many things going on in our brain when we are writing a letter. Based on what we set out to accomplish by writing, we make a plan about what we are going to write, select a suitable choice of vocabulary, how neat our handwriting needs to be, and then pick up then pen and start writing something on a pad of paper, making decisions about where to place the pen, where to move it, and when to pick it up.

It would be difficult to create a Javascript model to simulate the entire human brain for writing a letter, but we can instead try to model the handwriting brain approximately by focusing on the last part of the handwriting process, namely where to place the pen, where to move it, and when to pick it up. So our model of the handwriting process will only care about the location of the pen, and whether the pen is touching the paper pad.

We also make two assumptions about the model. The first assumption is that the decision of what the model will write next will only depend on whatever it wrote in the past. However, when we write things, while we remember precisely the details of the last pen stroke, we don’t actually remember exactly what we wrote many strokes ago, and only have a vague idea about what was written. This vague idea about what was written before can in fact be modelled within the context of a recurrent neural network.

With an RNN, we can store this type of vague knowledge directly into the neurons of the RNN, and we refer to this object as the hidden state of the RNN. This hidden state is just a vector of floating point numbers that keep track of how active each neuron is. What our model will write next, will therefore depend on its hidden state. This hidden state object will keep on getting updated after something is written, so it will be constantly changing. We will demonstrate how this works in the next section.

The second assumption about the model, is that that the model will not be absolutely certain about what it should write next. In fact, the decision of what the model will write next is random. For example, when the model is writing the character $y$ , it may decide to either continue writing the character to make the bottom hook of the $y$ character larger, or it can decide to suddenly finish off the character and move the pen to another location. Therefore, the output of our model will not be precisely what to write next, but actually a probability distribution of what to write next. We will need to sample from this probability distribution to decide what to actually write next.

These two assumptions can be summarised in the following diagram, which describes the process of using a Recurrent Neural Network model with a hidden state to generate a random sequence.

Generative Sequence Model Framework

Don’t worry if you don’t fully understand this diagram. In the next section, we will demonstrate what is going on line-by-line with Javascript.

Recurrent Neural Network for Handwriting

We have pre-trained a recurrent neural network model to preform the handwriting task described in the previous section. In this section, we will describe how to use this model in Javascript with p5.js. Below is the entire p5.js sketch for handwriting generation.

var x, y;
var dx, dy;
var pen;
var prev_pen;
var rnn_state;
var pdf;
var temperature = 0.65;
var screen_width = window.innerWidth;
var screen_height = window.innerHeight;
var line_color;

function restart() {
  x = 50;
  y = screen_height/2;
  dx = 0;
  dy = 0;
  prev_pen = 0;
  rnn_state = Model.random_state();
  line_color = color(random(255), random(255), random(255))
}

function setup() {
  restart();
  createCanvas(screen_width, screen_height);
  frameRate(60);
  background(255);
  fill(255);
}

function draw() {
  rnn_state = Model.update([dx, dy, prev_pen], rnn_state);
  pdf = Model.get_pdf(rnn_state);
  [dx, dy, pen] = Model.sample(pdf, temperature);
  if (prev_pen == 0) {
    stroke(line_color);
    strokeWeight(2.0);
    line(x, y, x+dx, y+dy);
  }
  x += dx;
  y += dy;
  prev_pen = pen;
  if (x > screen_width - 50) {
    restart();
    background(255);
    fill(255);
  }
}

We will explain how each line works. First, we will need to define a few variables to keep track of where the pen actually is (x, y). Our model will be working with smaller coordinate offsets (dx, dy) and determine where the pen should go next, and (x, y) will be the accumulation of (dx, dy).

var x, y; // absolute coordinates of where the pen is
var dx, dy; // offsets of the pen strokes, in pixels

In addition, our pen will not always be touching the paper. We would need a variable, called pen, to model this. If pen is zero, then our pen is touching the paper at the current time step. We also need to keep track of the pen variable at the previous time step, and store this into prev_pen.

// keep track of whether pen is touching paper. 0 or 1.
var pen;
var prev_pen; // pen at the previous timestep

If we have a list of (dx, dy, pen) variables generated by our model at every time step, it will be enough for us to use this data to draw out what the model has generated on the screen. At the beginning, all of these variables (dx, dy, x, y, pen, prev_pen) will be initialised to zero.

We will also define some variable objects that will be used by our RNN model:

var rnn_state; // store the hidden states the rnn

// store all the parameters of a mixture-density distribution
var pdf;

// controls the amount of uncertainty of the model
// the higher the temperature, the more uncertainty.
var temperature = 0.65; // a non-negative number.

As described in the previous section, the rnn_state variable will represent the hidden state of the RNN. This variable will hold all the vague ideas about what the RNN thought it has written in the past. To update rnn_state, we will use the update function in the model later on in the code.

rnn_state = Model.update([dx, dy, prev_pen], rnn_state);

The object rnn_state will be used to generate the probability distribution of what the model will write next. That probability distribution will be represented as the object called pdf. To obtain the pdf object from rnn_state, we will use the get_pdf function later, like this:

pdf = Model.get_pdf(rnn_state);

An additional variable called temperature allows us to control how confident or how uncertain we want to make the model. Combined with pdf object, we can use the sample function in the model to sample the next set of (dx, dy, pen) values from our probability distribution. We will use the following function later on:

[dx, dy, pen] = Model.sample(pdf, temperature);

The only other variables we need now are to control the colour of the handwriting, and also keep track of the screen’s dimensions of the browser:

// stores the browser's dimensions
var screen_width = window.innerWidth;
var screen_height = window.innerHeight;

// colour for the handwriting
var line_color;

Now we are ready to initialise all these variables we just declared for the actual handwriting generation. We will create a function called restart to initialise these variables since we will be reinitialising them many times later.

function restart() {
  // set x to be 50 pixels from the left of the canvas
  x = 50;
  // set y somewhere in middle of the canvas
  y = screen_height/2;

  // initialize pen's states to zero.
  dx = 0;
  dy = 0;
  prev_pen = 0;
  // note: we draw lines based off previous pen's state

  // randomise the rnn's initial hidden states
  rnn_state = Model.random_state();

  // randomise colour of line by choosing RGB values
  line_color = color(random(255), random(255), random(255))
}

After creating the restart function, we can define the usual p5.js setup function to initialise the sketch.

function setup() {
  restart(); // initialize variables for this demo
  createCanvas(screen_width, screen_height);
  frameRate(60); // 60 frames per second
  // clear the background to be blank white colour
  background(255);
  fill(255);
}

Our handwriting generation will take place in the draw function of the p5.js framework. This function is called 60 times per second. Each time this function is called, the RNN will draw something on the screen.

function draw() {

  // using the previous pen states, and hidden state
  // to get next hidden state 
  rnn_state = Model.update([dx, dy, prev_pen], rnn_state);

  // get the parameters of the probability distribution
  // from the hidden state
  pdf = Model.get_pdf(rnn_state);

  // sample the next pen's states
  // using our probability distribution and temperature
  [dx, dy, pen] = Model.sample(pdf, temperature);

  // only draw on the paper if pen is touching the paper
  if (prev_pen == 0) {
    // set colour of the line
    stroke(line_color);
    // set width of the line to 2 pixels
    strokeWeight(2.0);
    // draw line connecting prev point to current point.
    line(x, y, x+dx, y+dy);
  }

  // update the absolute coordinates from the offsets
  x += dx;
  y += dy;

  // update the previous pen's state
  // to the current one we just sampled
  prev_pen = pen;

  // if the rnn starts drawing close to the right side
  // of the screen, restart our demo
  if (x > screen_width - 50) {
    restart();
    // reset screen
    background(255);
    fill(255);
  }

}

At each frame, the draw function will update the hidden state of the model based on what it has previously drawn on the screen. From this hidden state, the model will generate a probability distribution of what will be generated next. Based on this probability distribution, along with the temperature parameter, we will randomly sample what action it will take in the form of a new set of (dx, dy, pen) variables. Based on this new set of variables, it will draw a line on the screen if the pen was previously touching the paper pad, and update the global location of the pen. Once the global location of the pen gets close to the right side of the screen, it will reset the sketch and start again.

Putting all of this together, we get the following handwriting generation sketch.

So there you have it, handwriting generation in your web browser in with a few lines of Javascript using p5.js.

Sampling from a Probability Distribution with Varying Temperature

The variable pdf is supposed to store the probability distribution of the next pen stroke at each time step. Under the hood, the object pdf actually just contains the parameters of a complicated probability distribution (i.e. the means and the standard deviations of a bunch of Normal Distributions). We have chosen to model the probability distribution of dx and dy as a Mixture Density Distribution.

But what exactly is a mixture density distribution? Well, statisticians (data scientists) like to model probability distributions with well known, mathematically tractable distributions such as the Normal Distribution, and they try to determine the parameters of the distribution (such as the mean and standard deviation for a Normal Distribution) to best fit the data. However, when dealing with something complicated, like the strokes of handwriting data, we find that a simple Normal Distribution is not good enough to model the data. Intuitively, handwriting strokes either stay close to the previous location, or jump to another location when a word or character is finished.

A straight forward way to deal with this problem is to model a probability distribution as the sum of many Normal distributions added together. In our case, we model the handwriting strokes as the sum of 20 Normal distributions. With a Mixture of 20 Normal distributions, our model can do an okay job of modelling the actual handwriting data. More technical details can be obtained in this other post.

When we take this probability distribution, and sample from this distribution to get the set of (dx, dy, pen) values to determine what to draw next, we use the temperature parameter to control the level of uncertainty of the model. If the temperature parameter is very high, then we are more likely to obtain samples in less probable regions of the probability distribution. If the temperature parameter is very low, or close to zero, then we will only obtain samples from the most probable parts of the distribution.

In the sketch below, you can visualise how the probability distribution becomes augmented by varying the temperature parameter. You can control the temperature parameter by dragging around the top orange bar.

Visualise a Mixture Density Distribution by adjusting the Temperature.

For simplicity, the above demo simulates a mixture of twenty, one-dimensional normal distributions with a temperature parameter. In the handwriting model, the probability distribution is a mixture of twenty, two-dimensional normal distributions. In the next sketch, you can modify the temperature of the handwriting model while it is writing something, to see how the handwriting changes with varying temperatures.

When the temperature is kept low, the handwriting model becomes very deterministic, so the handwriting is generally more neat and more realistic. Increasing the temperature will increase the likelihood of choosing less likely probable of the probability distribution, so the handwriting samples will tend to be more funky and uncertain.

Extending the Handwriting Demo

One of the more interesting aspects of combining machine learning with design is to explore the interaction between human and machine. The typical machine learning framework + python stack makes it difficult to deploy truly interactive web applications, as they often require dedicated web services to be written on the server side to process user interaction on the client side. The nice thing about Javascript frameworks such as p5.js is interactive programming can be done with ease, and deployed without much effort in a web browser.

A possible interactive extension we can build from the basic handwriting demo is to have the user interactively write some handwriting onto the screen, and when the user is idle, have the model continuously predict the rest of the handwriting sample. Another extension we can build, similar to the one is the distill.pub post, is to have the model sample multiple possible paths that follow the handwriting path created by the user.

There are countless other possibilities one can experiment with this model. It will also be interesting to combine this model with more advanced frameworks such as paper.js or d3.js to generate better looking strokes.

Use this code!

If you are an artist or designer interested in machine learning, you can fork the github repository containing the code used for this post, and use it to your liking.

This post only scratches the surface of recurrent neural networks. If you want to be more involved into the whole machine learning development process and train your own models, there are excellent resources to learn how to build models with TensorFlow, or keras. If you use keras to build and train your models, there is even a tool called keras.js that allow you to export pre-trained models for web browser usage, so you can build model interfaces like the Javascript handwriting model used in this post. I haven’t personally used keras.js, and I found it fun to just write the handwriting model from scratch in Javascript.

Update:

This model has already been ported to bl.ocks, and extended by a few people to do some very, interesting, things.