Machine learning has become a popular tool for the creative community in recent years. Techniques such as style transfer, t-SNE, autoencoders, generative adversarial networks, and countless other methods have made their way into the digital artist's toolbox. Many of these techniques take advantage of convolutional neural networks for feature extraction and processing.
On the other end of the spectrum, recurrent neural networks and other autoregressive models enable powerful tools that can generate realistic sequential data. Artists have employed such techniques to generate text, music, and sounds. One area that I feel lacks focus at the moment is the generation of vector artwork, perhaps due to the lack of available data.
I decided to write this post and make available the same handwriting model used in the distill.pub project, along with explanations, in the hope that other artists and designers can also take advantage of these technologies and even go deeper into the field.
Modelling a Handwriting Brain
There are many things going on in our brain when we are writing a letter. Based on what we set out to accomplish by writing, we make a plan about what we are going to write, select suitable vocabulary, and decide how neat our handwriting needs to be. We then pick up the pen and start writing on a pad of paper, making decisions about where to place the pen, where to move it, and when to pick it up.
We also make two assumptions about the model. The first assumption is that the decision of what the model will write next depends only on what it wrote in the past. When we write things, we remember precisely the details of the last pen stroke, but we don't actually remember exactly what we wrote many strokes ago; we only have a vague idea of what was written. This vague idea about what was written before can in fact be modelled within the context of a recurrent neural network.
With an RNN, we can store this type of vague knowledge directly in the neurons of the RNN, and we refer to this object as the hidden state of the RNN. This hidden state is just a vector of floating point numbers that keeps track of how active each neuron is. What our model writes next will therefore depend on its hidden state. The hidden state gets updated each time something is written, so it is constantly changing. We will demonstrate how this works in the next section.
The second assumption about the model is that the model will not be absolutely certain about what it should write next. In fact, the decision of what the model will write next is random. For example, when the model is writing a character, it may decide to either continue writing to make the bottom hook of the character larger, or it may decide to suddenly finish off the character and move the pen to another location. Therefore, the output of our model will not be precisely what to write next, but a probability distribution over what to write next. We will need to sample from this probability distribution to decide what to actually write.
These two assumptions can be summarised in the following diagram, which describes the process of using a Recurrent Neural Network model with a hidden state to generate a random sequence.
Recurrent Neural Network for Handwriting
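Stripped down to code, the loop described by the diagram looks roughly like this (a sketch; the variables and the pre-trained `model` object are introduced below):

```javascript
// one step of the generation loop:
// 1. fold the last thing written into the model's vague memory
rnn_state = model.update([dx, dy, pen], rnn_state);
// 2. turn that memory into a probability distribution over what comes next
pdf = model.get_pdf(rnn_state);
// 3. randomly sample the next pen movement from that distribution
[dx, dy, pen] = model.sample(pdf, temperature);
```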
We will explain how each line works. First, we need to define a few variables to keep track of where the pen actually is (`x, y`). Our model will work with smaller coordinate offsets (`dx, dy`) to determine where the pen should go next, and `(x, y)` will be the accumulation of these offsets.
In addition, our pen will not always be touching the paper. We need a variable, called `pen`, to model this. If `pen` is zero, then our pen is touching the paper at the current time step. We also need to keep track of the `pen` variable at the previous time step, and store this in `prev_pen`.
If we have the list of `(dx, dy, pen)` values generated by our model at every time step, this data is enough for us to draw out what the model has generated on the screen. At the beginning, all of these variables (`dx, dy, x, y, pen, prev_pen`) will be initialised to zero.
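In code, these can be declared as plain globals of the p5.js sketch (a sketch of the declarations, following the names described above):

```javascript
var x, y;     // absolute position of the pen on the screen
var dx, dy;   // offsets of the pen strokes, predicted by the model
var pen;      // zero when the pen is touching the paper
var prev_pen; // the value of pen at the previous time step
```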
We will also define some variable objects that will be used by our RNN model:
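One way to declare them (the `model` variable itself, and the default temperature value, are assumptions):

```javascript
var model;              // the pre-trained handwriting model (assumed loaded globally)
var rnn_state;          // the hidden state of the RNN
var pdf;                // the probability distribution of what to write next
var temperature = 0.65; // controls how uncertain the model's choices are
```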
As described in the previous section, the `rnn_state` variable will represent the hidden state of the RNN. This variable will hold all the vague ideas about what the RNN thinks it has written in the past. To update `rnn_state`, we will use the `update` function in the model later on in the code.
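Sketched out, the call might look like this (the global `model` object and the exact signature are assumptions based on the names used in this post):

```javascript
// feed what was just written back in to update the vague memory
rnn_state = model.update([dx, dy, pen], rnn_state);
```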
`rnn_state` will be used to generate the probability distribution of what the model will write next. That probability distribution will be represented as an object called `pdf`. To generate `pdf` from `rnn_state`, we will use the `get_pdf` function later on.
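Again as a sketch, under the same assumptions about `model`:

```javascript
// turn the hidden state into a distribution over the next stroke
pdf = model.get_pdf(rnn_state);
```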
An additional variable called `temperature` allows us to control how confident or how uncertain we want the model to be. Combined with `pdf`, it is passed to the `sample` function in the model to sample the next set of `(dx, dy, pen)` values from our probability distribution.
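The sampling step might look like this (how the three values are packed into the return value is an assumption):

```javascript
// sample the next stroke offsets and pen state from the distribution
var result = model.sample(pdf, temperature);
dx = result[0];
dy = result[1];
pen = result[2];
```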
The only other variables we need now control the colour of the handwriting and keep track of the dimensions of the browser window:
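For example (these particular names are assumptions, chosen to match the descriptions above):

```javascript
var line_color;                  // colour of the generated strokes
var screen_width, screen_height; // dimensions of the browser window
```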
Now we are ready to initialise all the variables we just declared for the actual handwriting generation. We will create a function called `restart` to initialise these variables, since we will be reinitialising them many times later.
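A minimal sketch of `restart`, assuming the model exposes a `zero_state()` helper that returns an all-zero hidden state:

```javascript
function restart() {
  // as described earlier, all of the stroke variables start out at zero
  x = 0;
  y = 0;
  dx = 0;
  dy = 0;
  pen = 0;
  prev_pen = 0;
  // reset the model's vague memory of the past;
  // zero_state() is an assumed helper, not a confirmed API
  rnn_state = model.zero_state();
}
```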
After creating the `restart` function, we can define the usual p5.js `setup` function to initialise the sketch.
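A minimal version might look like this (the ink colour is an arbitrary choice):

```javascript
function setup() {
  // size the canvas to the browser window
  screen_width = windowWidth;
  screen_height = windowHeight;
  createCanvas(screen_width, screen_height);
  frameRate(60);                   // draw() will be called 60 times per second
  background(255);                 // white "paper"
  line_color = color(70, 80, 200); // an arbitrary ink colour
  restart();
}
```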
Our handwriting generation will take place in the `draw` function of the p5.js framework. This function is called 60 times per second, and each time it is called, the RNN will draw something on the screen.
At each frame, the `draw` function will update the hidden state of the model based on what it has previously drawn on the screen. From this hidden state, the model will generate a probability distribution of what to draw next. Based on this distribution, along with the `temperature` parameter, we will randomly sample the action it will take, in the form of a new set of `(dx, dy, pen)` values. Using these new values, the model will draw a line on the screen if the pen was previously touching the paper, and update the global location of the pen. Once the global location of the pen gets close to the right side of the screen, the sketch will reset and start again.
Putting all of this together, we get the following handwriting generation sketch.
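The `draw` function below sketches how the pieces fit together, under the same assumptions about the `model` object as before:

```javascript
function draw() {
  // update the hidden state using the stroke we just drew
  rnn_state = model.update([dx, dy, pen], rnn_state);

  // get the probability distribution of what to draw next
  pdf = model.get_pdf(rnn_state);

  // randomly sample the next stroke from that distribution
  var result = model.sample(pdf, temperature);
  dx = result[0];
  dy = result[1];
  pen = result[2];

  // only draw a line if the pen was touching the paper
  if (prev_pen === 0) {
    stroke(line_color);
    strokeWeight(2.0);
    line(x, y, x + dx, y + dy);
  }

  // update the pen's absolute location and previous pen state
  x += dx;
  y += dy;
  prev_pen = pen;

  // once we get close to the right edge, reset and start again
  // (the 50-pixel margin is an arbitrary choice)
  if (x > screen_width - 50) {
    restart();
    background(255); // clear the paper
  }
}
```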
Sampling from a Probability Distribution with Varying Temperature
Our model represents the probability distribution of the offsets `dx` and `dy` as a Mixture Density Distribution.
But what exactly is a mixture density distribution? Well, statisticians (data scientists) like to model probability distributions with well-known, mathematically tractable distributions such as the Normal distribution, and they try to determine the parameters of the distribution (such as the mean and standard deviation of a Normal distribution) that best fit the data. However, when dealing with something complicated, like the strokes of handwriting data, we find that a simple Normal distribution is not good enough to model the data. Intuitively, handwriting strokes either stay close to the previous location, or jump to another location when a word or character is finished.
A straightforward way to deal with this problem is to model the probability distribution as a weighted sum of several Normal distributions. In our case, we model the handwriting strokes as a mixture of 20 Normal distributions. With this mixture, our model can do an okay job of modelling the actual handwriting data. More technical details can be found in this other post.
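To make the idea concrete, here is a minimal sketch of evaluating the density of a one-dimensional mixture of Normals (the helper names are hypothetical):

```javascript
// density of a single 1-D Normal distribution
function normalPdf(x, mean, stdev) {
  var z = (x - mean) / stdev;
  return Math.exp(-0.5 * z * z) / (stdev * Math.sqrt(2 * Math.PI));
}

// density of a mixture: a weighted sum of component densities,
// where the weights are positive and sum to one
function mixturePdf(x, weights, means, stdevs) {
  var p = 0;
  for (var i = 0; i < weights.length; i++) {
    p += weights[i] * normalPdf(x, means[i], stdevs[i]);
  }
  return p;
}

// a toy 2-component example: most mass near zero (small strokes),
// plus a far-away mode (the occasional jump to a new location)
var p = mixturePdf(0.1, [0.8, 0.2], [0.0, 5.0], [0.5, 1.0]);
```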
When we take this probability distribution and sample from it to get the set of `(dx, dy, pen)` values that determine what to draw next, we use the `temperature` parameter to control the level of uncertainty of the model. If the temperature parameter is very high, we are more likely to obtain samples from less probable regions of the probability distribution. If the temperature parameter is very low, or close to zero, we will only obtain samples from the most probable parts of the distribution.
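Here is one common way to implement temperature-adjusted sampling from a one-dimensional mixture of Normals, similar in spirit to what the demo below simulates (the exact adjustment scheme used by the model may differ; this sketch sharpens the mixture weights with 1/temperature and scales each standard deviation by the square root of the temperature):

```javascript
// sample one value from a 1-D mixture of Normals, with a temperature knob
function sampleMixture(weights, means, stdevs, temperature) {
  // adjust the weights: low temperature concentrates probability mass
  // on the most likely components, high temperature spreads it out
  var adjusted = weights.map(function(w) {
    return Math.pow(w, 1.0 / temperature);
  });
  var total = adjusted.reduce(function(a, b) { return a + b; }, 0);
  adjusted = adjusted.map(function(w) { return w / total; });

  // pick one component at random according to the adjusted weights
  var r = Math.random();
  var idx = adjusted.length - 1;
  var accum = 0;
  for (var i = 0; i < adjusted.length; i++) {
    accum += adjusted[i];
    if (r <= accum) { idx = i; break; }
  }

  // draw from the chosen Normal via the Box-Muller transform,
  // with its stdev scaled by sqrt(temperature)
  var u1 = 1 - Math.random(); // in (0, 1], avoids log(0)
  var u2 = Math.random();
  var z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return means[idx] + stdevs[idx] * Math.sqrt(temperature) * z;
}
```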
In the sketch below, you can visualise how the probability distribution changes as you vary the temperature parameter, which you can control by dragging the top orange bar.
For simplicity, the above demo simulates a mixture of 20 one-dimensional Normal distributions with a temperature parameter. In the actual handwriting model, the probability distribution is a mixture of 20 two-dimensional Normal distributions. In the next sketch, you can modify the temperature of the handwriting model while it is writing, to see how the handwriting changes at different temperatures.
When the temperature is kept low, the handwriting model becomes very deterministic, so the handwriting is generally neater and more realistic. Increasing the temperature increases the likelihood of sampling from less probable regions of the probability distribution, so the handwriting samples will tend to be funkier and more uncertain.
Extending the Handwriting Demo
A possible interactive extension we can build from the basic handwriting demo is to have the user interactively write some handwriting onto the screen, and when the user is idle, have the model continuously predict the rest of the handwriting. Another extension we can build, similar to the one in the distill.pub post, is to have the model sample multiple possible paths that continue the handwriting path created by the user.
There are countless other possibilities to experiment with using this model. It would also be interesting to combine it with more advanced frameworks such as paper.js or d3.js to generate better-looking strokes.
Use this code!
If you are an artist or designer interested in machine learning, you can fork the GitHub repository containing the code used for this post and use it to your liking.
This model has already been ported to bl.ocks and extended by a few people to do some very interesting things.