Recap From Day 029
In day 029, we learned what makes a good feature. We saw that, regardless of the learning algorithm we’re using, there are a few properties we want to look for when choosing our features.
Today, we’ll start working with audio input by looking at common audio features.
Working With Audio Input: Common Audio Features
One of the most exciting things about machine learning is that it makes it possible for us to build systems that respond to more complicated types of inputs, like real-time audio and video.
Let’s start with audio. Here is a quick list of things you could potentially do using machine learning on audio input. Each of these will tend to become easier the more training data you have, but each of these should be possible to start exploring even with a pretty small training data set that you take a few minutes to create on your own.
You might detect whether someone is speaking, or detect which word is being said, especially if you have a relatively small vocabulary of words. You could detect what instrument is playing, or what artist or genre of music is playing. Again, this is easiest with a relatively limited number of categories. You could detect what key a piece of music is in, or you could detect the mood of the piece. For example, is it low-energy or high-energy? Is it in a major or minor key?
Without machine learning, the examples above are very difficult problems to solve. If you know anything about human auditory perception, you know that our ears and brains are amazingly complex systems. Even though we, as humans, find things like speech, instrumentation and harmony relatively easy to perceive, there is no simple relationship between these perceptual qualities and what we can directly measure from an audio signal.
As we continue working with audio input, we’ll see some of the basic audio features that can be useful for solving the kinds of problems above. On their own, these features won’t necessarily give you something that is easy to relate to phenomena like pitch, harmony or speech, but used in conjunction with machine learning, they can become very useful.
To understand these features, you should have a basic grasp of what an audio signal really represents.
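As a rough illustration of the gap between what we hear and what we can measure, here is a minimal Python sketch that treats a digital audio signal as a plain list of amplitude samples. The function names (sine_wave, rms), the 8000 Hz sample rate and the 440 Hz tone are assumptions chosen for this example, not something from this series. The root-mean-square value it computes is just one simple number you can measure directly from the samples.

```python
import math


def sine_wave(frequency_hz, duration_s, sample_rate_hz=8000, amplitude=1.0):
    """Return a pure tone as a list of amplitude samples (hypothetical helper)."""
    n_samples = int(duration_s * sample_rate_hz)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def rms(samples):
    """Root-mean-square amplitude: a crude measure of how strong the signal is."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Half a second of a 440 Hz tone, sampled 8000 times per second (assumed values).
tone = sine_wave(440.0, 0.5)

print(len(tone))   # number of samples = duration * sample rate
print(tone[:5])    # the first few raw amplitude values
print(rms(tone))   # one number measured directly from the samples
```

Nothing in the raw sample values says "this is the note A" or "this is a voice". A number like the RMS value is easy to compute, but it only loosely relates to what we perceive, which is where machine learning comes in.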
Amazing to know that you’re still here. We’ve come to the end of day 030. I hope you found this informative. Thank you for taking time out of your schedule and allowing me to be your guide on this journey.