100 Days Of ML Code — Day 062

Recap From Day 061

On Day 061, we looked at using Wekinator to control a drum machine with a webcam, based on an example video from Wekinator.

Today, we will start looking at something new that builds on what we’ve seen before.

Working With Time

Dynamic time warping, a summary

In the past couple of days, we saw dynamic time warping, a method for computing the similarity between two sequences of data over time. These two sequences are typically two gestures captured through a mouse, a Wiimote, a game track, a video camera and so on.
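To make the idea of comparing two sequences concrete, here is a minimal sketch of a textbook DTW distance in plain Python. It is not the implementation Wekinator uses; the function name dtw_distance, the Euclidean point distance and the sample sequences are all assumptions chosen for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic DTW distance between two sequences of (x, y) points.

    Illustrative sketch only: uses Euclidean distance between points and
    the standard three-way recurrence (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b waits
                                 cost[i][j - 1],      # b advances, a waits
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical sample data: the same diagonal stroke at two speeds,
# and a different (horizontal) stroke.
fast = [(0, 0), (1, 1), (2, 2)]
slow = [(0, 0), (0, 0), (1, 1), (1, 1), (2, 2), (2, 2)]
other = [(0, 0), (1, 0), (2, 0)]

same_shape = dtw_distance(fast, slow)   # 0.0: only the timing differs
different = dtw_distance(fast, other)   # greater than 0: the paths differ
```

The slowed-down copy has a distance of zero because DTW is allowed to stretch one sequence in time to line it up with the other, which is exactly why it suits gestures performed at varying speeds.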

Since DTW (dynamic time warping) computes the similarity between temporal sequences, it can be used for gesture classification. Remember that to perform gesture classification with DTW, we first record several gesture templates, and each template is a gesture that can be recognized by the system.

For instance, our first template is a circle drawn with the mouse, so the template is represented as the mouse x,y values recorded while drawing the gesture. The first x,y value recorded is the bottom position of the circle, and the last x,y value recorded is also the bottom position, after we’ve drawn the entire circle. After that, another template can be recorded, let’s say a square, and then a triangle, to serve as our three gestures. At the end of the training, our vocabulary contains three gesture templates: circle, square and triangle. In performance, the idea is to draw a certain shape that will be recognized by DTW.

Let’s say that we are drawing a triangle. DTW matches the sequence of features given by the input gesture, the triangle, to the sequence of features given by each template by computing a similarity measure between the input sequence and that template. The classification outcome is the gesture for which the similarity measure is maximum, in other words, the one for which the distance between the two sequences is minimum. In the example that we saw previously, the triangle template is the third one, so DTW will return the index 3.
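The classification step described above can be sketched as follows. This is a simplified illustration, not Wekinator's code: the helpers circle, polygon and classify, the synthetic templates and the slightly shifted input gesture are all hypothetical, and the index is 1-based only to match the numbering used in this post.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

def circle(n=40, r=1.0):
    """Circle that starts and ends at its bottom position."""
    return [(r * math.sin(2 * math.pi * k / n),
             -r * math.cos(2 * math.pi * k / n)) for k in range(n + 1)]

def polygon(corners, steps=10):
    """Closed polygon traced corner to corner, `steps` samples per side."""
    closed = corners + corners[:1]
    points = []
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        for s in range(steps):
            t = s / steps
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    points.append(closed[-1])  # return to the starting corner
    return points

def classify(gesture, templates):
    """Return the 1-based index of the closest template and all distances."""
    distances = [dtw_distance(gesture, t) for t in templates]
    best = min(range(len(templates)), key=distances.__getitem__)
    return best + 1, distances

SQUARE = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
TRIANGLE = [(-1, -1), (1, -1), (0, 1)]

# Vocabulary in the order used in the post: circle, square, triangle.
templates = [circle(), polygon(SQUARE), polygon(TRIANGLE)]

# Hypothetical input: a triangle drawn more slowly (twice as many samples)
# and slightly offset from the recorded template.
drawn = [(x + 0.05, y - 0.03) for x, y in polygon(TRIANGLE, steps=20)]

label, distances = classify(drawn, templates)
```

Here label is 3, the triangle, because its template has the smallest DTW distance to the input even though the input was drawn at a different speed and in a slightly different place.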

DTW is a powerful technique for gesture recognition, and for temporal sequence matching in general, because it takes into account not only the current value or position of the captured gesture but also the past values. As a result, a hand pose, for instance, is interpreted in light of the whole path the hand took to reach that particular position.

If we imagine using DTW in a digital musical instrument, we could assign one sound to each gesture in the vocabulary: for instance, a guitar riff to the circle, a bass line to the square, and a drum sequence to the triangle. Then, during a continuous performance, each time a gesture is recognized by DTW, the associated sound is played. For instance, if we perform a circle and then a square, the guitar riff followed by the bass line will be heard.
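As a rough sketch of this triggering idea, the mapping from recognized gesture to sound can be a simple lookup. Everything here is hypothetical: the SOUNDS table, the trigger function and the play callback stand in for whatever audio engine the instrument would actually use.

```python
# Hypothetical mapping from the 1-based gesture index to a sound name.
SOUNDS = {
    1: "guitar riff",    # circle
    2: "bass line",      # square
    3: "drum sequence",  # triangle
}

def trigger(recognized, play, sounds=SOUNDS):
    """Call `play` once for each recognized gesture index, in order."""
    for index in recognized:
        play(sounds[index])

# Stand-in for an audio engine: just record which sounds were requested.
heard = []
trigger([1, 2], heard.append)  # a circle, then a square
```

After this runs, heard contains the guitar riff followed by the bass line. Note that the gesture only selects which sound starts; nothing about how the gesture was performed reaches the sound.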

The relationship between the gesture performed and the sound played is then based on triggering. We may find such a method too limited for the way we perform music, and we may want a system that allows more control over the sound: one that not only triggers sounds but also lets us modulate characteristics of the synthesized sound, for instance its pitch, its frequency spectrum, its amplitude and so on. And surely we would like to modulate the sound while we are performing the gesture.

In other words, we may want to use the expressive variations we make while executing a gesture, such as slowing down at some point and then going faster, or exaggerating the amplitude of the gesture, in order to continuously control other parameters of the sound synthesis. In that case, what we need is not only a method that tells us which gesture we are doing, as DTW, nearest neighbor or Naive Bayes do, but also one that tells us how we are doing it.

In the coming days, we will see methods that capture how we are performing a gesture while we are performing it. In turn, we will see that such methods can provide additional expressive control over the sounds, or other digital media, for real-time performance.

That’s all for day 062. I hope you found this informative. Thank you for taking time out of your schedule and allowing me to be your guide on this journey. And until next time, be legendary.