AI, Machine Learning and Deep Learning – Samee Ur Rehman

Why the fuss about deep learning?

samee99 — Wed, 07 Jul 2021 19:04:45 +0000

Unless you were living under a rock, you would have come across a lot of media coverage centered around Artificial Intelligence, Machine Learning and Deep Learning over the past decade. Companies like Google, Facebook and Tesla have leveraged AI to their advantage:

Tesla autopilot uses Deep Learning to successfully navigate roads

You might ask: computers have been around for a very long time, so why all the fuss about artificial intelligence, and in particular machine learning and deep learning, in the recent past? There has been so much of it that the media has started to paint doomsday scenarios in which AI one day takes over from humans as the dominant form of intelligence on Earth!

Some of the headlines in the past decade related to AI takeover

In reality, we are a fair way away from such a scenario. A statement by Andrew Ng on the subject provides a more realistic picture:

Fearing a rise of killer robots is like worrying about overpopulation on Mars.
Andrew Ng – CEO Landing AI, Adjunct Professor at Stanford University

Unfortunately, you won’t be able to order any goodies online on the Red Planet anytime soon!

In other words, we haven’t even solved some of the fundamental problems in AI, and that’s why a potential AI takeover is the wrong problem to focus on.

It does still raise the question: why the fuss about deep learning? After all, a quick search on Google Trends for the term “deep learning” shows an exponential increase in interest over the past decade:

To answer why, we have to first understand some of the historical limitations of machine learning. The TLDR for why deep learning is so hot right now is that Machine Learning requires Feature Engineering while Deep Learning does not.

Machine Learning requires Feature Engineering while Deep Learning does not.
TLDR

Let’s dig deeper to understand what we mean by that statement. Let’s assume you have heard all the hype about artificial intelligence and you decide that it’s time you start your own company that sells self-driving cars. As the founder, you get started with figuring out how to build an effective algorithm for safely navigating the car on the road. Such a learning algorithm should at least be able to identify when there is another car on the road. How would we do that?

Let’s say you installed a camera on the front bumper and used it to capture a 1-minute video, similar to the one from your competitor Tesla. Because you are a thoughtful engineer, you chop the video into 100 ms segments, each of which you can now treat as an image. You now have 600 images from the 1-minute video, and your task is to make the computer output whether there is a car in each image or not. The task seems simple enough:

Imagine you are the founder of a startup building a learning algorithm for a self-driving car. An important task would be to identify other cars in the environment: given an image, identify whether it contains a car or not.

Assume you know nothing about learning algorithms but can do a bit of coding; after all, you are the founder of a tech startup. You visually inspect your 600 images and manually label each as “1” if it contains a car and “0” if it does not. Now you decide to write a program to output car/no car depending on the input. So you write a simple program:

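data = 1  # label assigned by hand: 1 = car, 0 = no car

# Hard-coded rule: simply echo the human-provided label.
if data == 1:
    print('Car')
else:
    print('No car')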

You show the above piece of code to your co-founder, Bob, who recently read a blog about what machine learning is. Bob shakes his head.

Bob is not convinced

Bob points out that what you wrote looks like a traditional program, i.e. it’s a hard-coded set of instructions. The data you gave to the computer consisted of 1s and 0s and you instructed the computer to print the correct answer and that, he argues, looks a lot like the traditional programming described in the blog:

If the input data took the form of a binary switch, 1 when there is a car and 0 when there isn’t, then we wouldn’t even need a machine learning algorithm. A hard-coded set of instructions would do just fine.

Secondly, Bob argues that since you did the job of labeling images as car or no car and simply provided the labels (0 or 1) to the computer without the raw image data, you didn’t give the computer the chance to learn anything. You agree with Bob and ask him for help. Based on his knowledge, he draws the following:

You note that Bob has inverted the relationship between output and program and is now supplying the raw image data as input to the computer. Bob says that if the machine can learn such a program then this program can be deployed in a self-driving car to predict cars in new images that the camera collects! That’s true, you say…

Bob’s right!

But you point out to Bob that the raw data takes the form of an RGB image with many different pixels. Some of the pixels have something to do with a car, while others do not. Any single pixel, on its own, doesn’t tell me anything about a car anyway, you argue. How would I teach the computer to find a car in the image by looking at a bunch of pixels?

Bob is as clueless as you are.

All cars should have tires, windows and number plates, Bob says. What if we, the engineers, extract these main features of a car and ask the machine (computer) to perform classification, i.e. predict whether a car is present in the image or not using these features instead of the raw data:

You nod and realize that by providing the computer information about the high level representation of a car (i.e. tires, windows) instead of raw pixels, you make the learning algorithm’s job easier.

Mapping the raw data to the output has historically been a difficult task for machine learning algorithms. ML algorithms have therefore often depended on mapping manually extracted features to the output instead.

In the simplest case, you can see how this might work. You, as the engineer, would identify a part of an image containing a tire. You would provide this tire feature to the computer and ask the computer to check if a similar tire is present in another image. Presumably, if a tire is present, the likelihood of a car being present is high. Now the computer’s job is basically that of a detective: it needs to look at each patch of a new image and check whether it matches the tire patch:

No tire on this one!

You point out to Bob that the computer could take the pixels of the tire patch, slide the patch over the new image in steps, multiply its pixels with the image pixels underneath at each position, and sum the products. A large sum means the tire feature is activated at that position. The computer could then report that a car is in the image if the sum exceeds a certain threshold.
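Here is a minimal sketch of that sliding-patch idea in plain Python. The tiny 2x2 "tire" patch, the 4x4 grayscale images and the threshold of 4 are all made-up values for illustration, and the function names are hypothetical.

```python
def match_scores(image, patch):
    """Slide `patch` over `image`; at each position, sum the elementwise products."""
    patch_h, patch_w = len(patch), len(patch[0])
    image_h, image_w = len(image), len(image[0])
    scores = []
    for top in range(image_h - patch_h + 1):
        row = []
        for left in range(image_w - patch_w + 1):
            total = 0
            for i in range(patch_h):
                for j in range(patch_w):
                    total += patch[i][j] * image[top + i][left + j]
            row.append(total)
        scores.append(row)
    return scores


def contains_feature(image, patch, threshold):
    """Report True if the patch 'activates' strongly enough anywhere in the image."""
    return any(score >= threshold
               for row in match_scores(image, patch)
               for score in row)


# Made-up toy data: 1 = bright pixel, 0 = dark pixel.
tire_patch = [[1, 1],
              [1, 1]]
image_with_tire = [[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]]
image_without_tire = [[0, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 0],
                      [0, 0, 0, 1]]

print(contains_feature(image_with_tire, tire_patch, threshold=4))     # True
print(contains_feature(image_without_tire, tire_patch, threshold=4))  # False
```

Note that a raw sum of products also grows with overall image brightness, so a practical detector would likely normalize the patch and the image region first. That step is left out here to keep the sketch short.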

Extracting tire features manually from the input image data is hard work. Bob realizes that bicycles also have tires, so you might confuse a bicycle for a car using this approach as a bike would also activate the same features. But Bob thinks you are on to something. He asks if what you are doing looks something like this:

You again nod in agreement. Bob’s drawing reminds you a lot of neurons in the brain, where dendrites provide input data, the cell body performs some function (in our case product of tire features with input image followed by applying a threshold) and an axon provides the output:

You wonder if the artificial neural network you are building can learn weights (parameters) similar to synapses in the brain:

Artificial Neural Network with single artificial neuron with learnable weights
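A single artificial neuron of this kind can be sketched in a few lines of Python. This is only an illustration: the two input features (a tire score and a window score), the weights and the bias are hypothetical numbers picked by hand, not learned values.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, followed by a step (threshold) activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


# Hypothetical inputs: [tire_score, window_score], each between 0 and 1.
weights = [0.6, 0.4]   # how much each input matters (the "synapses")
bias = -0.5            # shifts the firing threshold

print(neuron([1.0, 1.0], weights, bias))  # both features present -> 1
print(neuron([0.0, 0.0], weights, bias))  # neither feature present -> 0
```

The inputs play the role of the dendrites, the weighted sum plus threshold is the cell body, and the returned 0 or 1 is what travels down the axon.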

So instead of supplying tire features, you ask whether artificial neural networks can learn weights (i.e. parameters) by reducing the error between the desired output and the predicted output. You argue that you already have the desired output for the 600 images, since you labeled each image as 1 if it contains a car and 0 if it doesn’t. Why not just use that desired output to adjust the weights of our Artificial Neural Network? You draw another figure to illustrate the point:

Adjust artificial neural network weights by minimizing difference between desired and predicted output
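To make the idea concrete, here is a small sketch that adjusts the weights of a single neuron whenever its prediction disagrees with the desired output. It uses the classic perceptron update rule, which is just one simple way of doing this and not necessarily what a real self-driving system would use. The four training samples (two hypothetical binary features each) and their labels are invented for the example.

```python
def predict(features, weights, bias):
    """Single neuron: weighted sum followed by a step activation."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train(samples, labels, epochs=10, learning_rate=1):
    """Nudge weights and bias in proportion to (desired - predicted)."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for features, desired in zip(samples, labels):
            error = desired - predict(features, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical features: [has_tire, has_window]; label 1 only if both are seen.
samples = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = [1, 0, 0, 0]

weights, bias = train(samples, labels)
predictions = [predict(s, weights, bias) for s in samples]
print(predictions)  # matches the labels once training has converged
```

When the prediction is already correct the error is zero and nothing changes, so the weights only move while the neuron is still making mistakes.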

Bob likes the idea but he sees a couple of issues. He doubts whether the computer can learn high-level representations like tires and windows directly from the raw pixels. He wonders if you could use many neurons, stacked one after another, instead of just one neuron to make the task easier for the machine.

You get the main idea but are not sure how stacking the neurons one after another in layers such that one neuron’s output is received as the next neuron’s input can lead to the machine’s learning task becoming easier. Bob thinks that the weights of the earlier neurons will learn lower level features (like edges) while weights of later neurons will learn object parts (e.g. tires) automatically using information provided by earlier neurons. You stare back cluelessly at Bob and he tries to help out with the following figure:

You finally get what Bob is doing. The question Bob poses is this: could the computer automatically learn higher level features (such as presence of tires) from simpler representations (e.g. edges corresponding to a sudden change of intensity in the image) which in turn are generated from the raw data? You aren’t convinced by the details just yet, but you see where Bob is going with this. Bob draws another figure:


Bob says that many neurons stacked in layers one after another can automatically learn the representation as well as the mapping from the representation to the output. He calls this Deep Learning and points out that, as opposed to classical machine learning, which requires a human to perform the feature extraction, this learning approach is completely automatic since the machine learns the representation (i.e. features) on its own.

You note that the input is again the raw image. If you understand it correctly, Bob is suggesting we ask the computer to perform end-to-end learning. The Deep Learning algorithm would perform representation learning (i.e. learn salient features like tires from simpler features like edges) as well as map the representations to the output. The human would not be required at all.

Deep Learning enables end-to-end learning.
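The sketch below shows what "stacking neurons in layers" looks like mechanically, using a toy 3-pixel image. The first layer holds two edge detectors and the second layer combines their outputs into a "bright blob" detector. Be aware that every weight here is hand-picked purely for illustration; the whole point of Deep Learning is that such weights would be learned from data instead.

```python
def step(value):
    return 1 if value > 0 else 0


def layer(inputs, weight_rows, biases):
    """One layer: each row of weights defines one neuron over the same inputs."""
    return [step(sum(w * x for w, x in zip(weights, inputs)) + b)
            for weights, b in zip(weight_rows, biases)]


# Layer 1 (hand-picked): low-level edge detectors over pixels [a, b, c].
EDGE_WEIGHTS = [[-1, 1, 0],   # fires on a dark-to-bright edge between a and b
                [0, 1, -1]]   # fires on a bright-to-dark edge between b and c
EDGE_BIASES = [-0.5, -0.5]

# Layer 2 (hand-picked): fires only when both edges are present.
PART_WEIGHTS = [[1, 1]]
PART_BIASES = [-1.5]


def network(pixels):
    edges = layer(pixels, EDGE_WEIGHTS, EDGE_BIASES)     # simpler representation
    return layer(edges, PART_WEIGHTS, PART_BIASES)[0]    # higher-level feature


print(network([0, 1, 0]))  # bright blob in the middle -> 1
print(network([1, 1, 1]))  # uniformly bright, no edges -> 0
print(network([0, 1, 1]))  # only one edge -> 0
```

The second layer never looks at pixels at all; it only sees what the first layer reports, which is the sense in which later neurons build on the information provided by earlier ones.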

But you also realize that to do what Bob is suggesting, you would need a lot of data and sufficient computational power, since the learning algorithm has to learn the representation itself as well:

Bob agrees and gives the analogy where the computing power (CPU, GPU, TPU) is the engine of a rocket and the input data is its fuel. You need a combination of both to thrust the rocket into space, i.e. to enable Deep Learning.

Bob uses the analogy where the computational power is the engine in a rocket and the input data is the fuel. You need a combination of both to thrust the rocket into space, i.e. to enable Deep Learning.

So Bob says that training artificial neural networks would only make sense if you have a lot of data; otherwise, machine learning algorithms that require feature extraction may just do better. He makes another plot:

Bob thinks that large Neural Networks having many layers of neurons can continue to learn with increasing data, while the performance of other learning algorithms tends to stall as the representation itself is a bottleneck.

You understand what Bob means and try to put his Deep Learning idea within the context of Artificial Intelligence, Machine Learning and Representation Learning:

You realize that the first program you wrote for identifying a car could be considered as a simple AI algorithm:

A rule based system such as the above may be considered as a very simple form of AI

The idea of using a tire as a feature and asking the computer to map the tire feature to the car would come under classical machine learning:

Finally, the idea of using multiple neurons stacked in many layers, where the multiple layers through automatic representation learning/feature extraction enable End to End Learning without human intervention, would be considered Deep Learning:

You realize that using Deep Learning, given enough data and computing power, you could tackle all kinds of problems without requiring any human expertise for feature extraction, e.g. without needing medical doctors to identify portions of a brain MRI image to which the computer should pay attention when performing a diagnosis.

And that’s why Bob, you and the rest of the world think that Deep Learning isn’t just hype but is in fact here to stay. And that’s why you are so confident about launching your own startup into orbit using Deep Learning!

Your self-driving car startup launches into orbit courtesy of Deep Learning.

What are the different types of machine learning algorithms?

samee99 — Mon, 21 Jun 2021 22:10:46 +0000

So we know what Machine Learning is. But what are the different ways in which machines learn?

What makes a machine stay engaged when learning about our world?

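There are four distinct ways in which machines learn: supervised learning, self-supervised learning, semi-supervised learning and reinforcement learning.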

If the above is a bunch of gibberish, let’s go into detail.

In the case of supervised learning, the learning engine has access to data (e.g. image of a cat) and desired output (e.g. label, binary digit 1, indicating presence of a cat).
In self-supervised learning, the learning engine has access to data (e.g. image of a cat) but NOT to the desired output (e.g. label, binary digit 1, indicating presence of a cat). Self-supervised learning is sometimes loosely referred to as unsupervised learning, although the two terms are not always used interchangeably.
Semi-supervised learning is a combination of supervised learning and self-supervised learning. So in this case, only part of the data is labeled. As an example, maybe 10% of cat/non-cat images are labeled while the rest are not.
Reinforcement learning is a completely different beast. It assumes that an agent (e.g. a robot) is in an environment (e.g. a maze). When the agent performs a certain action, e.g. move one step forward, it is provided a reward depending on whether or not that action helps it get closer to completing its task (e.g. get out of the maze).

Initial training for Reinforcement Learning is usually performed in simulated environments, to avoid situations like the above.

Below we see concrete examples for each of the four types.

On the far left, we see an example of supervised learning, where the learning algorithm is tasked with learning the decision boundary between the green circles and red triangles.

A clustering example is shown for self-supervised learning. Here the squares need to be divided into 3 distinct portions depending on their relative distance in the two-dimensional domain spanned by x1 and x2.
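As a rough illustration of such a clustering task, here is a tiny k-means sketch in plain Python. k-means is only one of many clustering methods and is used here as a stand-in. The nine 2D points and the hand-picked starting centroids are made up for the example; real implementations usually pick the starting centroids randomly.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points]


def update(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if members:
            new_centroids.append(
                tuple(sum(coords) / len(members) for coords in zip(*members)))
        else:
            new_centroids.append(old)  # keep an empty cluster where it was
    return new_centroids


def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        assignments = assign(points, centroids)
        centroids = update(points, assignments, centroids)
    return assign(points, centroids), centroids


# Made-up (x1, x2) points forming three loose groups; note there are no labels.
points = [(1, 1), (1.5, 2), (2, 1),
          (8, 8), (9, 8.5), (8.5, 9),
          (1, 8), (1.5, 9), (2, 8.5)]
initial_centroids = [(1, 1), (8, 8), (1, 8)]  # hand-picked starting guesses

cluster_ids, centroids = k_means(points, initial_centroids)
print(cluster_ids)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The algorithm never sees a desired output; it groups the points purely by their relative distance in the x1-x2 plane.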

For semi-supervised learning, the task is the same as the one shown for supervised learning, but in this case, some of the data is not labeled (i.e. the learning engine does not have access to whether the label is a green circle or a red triangle).

Finally, on the far right, we see an example of reinforcement learning, where an agent (e.g. a robot) needs to navigate through an environment (e.g. a maze) by performing actions and updating its state based on the relative reward it receives.

A well trained reinforcement learning algorithm!
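For the curious, here is a minimal sketch of reinforcement learning on a deliberately tiny "maze": a corridor of five cells with the exit at the far right. It uses tabular Q-learning, which is just one well-known algorithm and is swapped in here for illustration. The corridor layout, the reward of 1 at the exit, and the values of `alpha`, `gamma`, `episodes` and `seed` are all assumptions chosen for the example. The agent explores with purely random actions, which is fine because Q-learning learns the value of the greedy policy regardless of how it explores.

```python
import random

N_STATES = 5                      # corridor cells 0..4; the exit is cell 4
ACTIONS = (-1, +1)                # step left or step right
ACTION_NAMES = ("left", "right")


def step(state, action):
    """Environment: move within the corridor; reward 1 on reaching the exit."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=200, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the long-term reward of that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action_index = rng.randrange(len(ACTIONS))   # random exploration
            next_state, reward = step(state, ACTIONS[action_index])
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action_index] += alpha * (target - q[state][action_index])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q_table = train()
# Greedy policy for every non-exit cell: pick the action with the larger value.
policy = [ACTION_NAMES[values.index(max(values))] for values in q_table[:-1]]
print(policy)
```

After training, the greedy policy should point toward the exit from every cell, even though the agent was only ever rewarded at the very last step.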

In industry, supervised learning is responsible for most of the economic value generated so far. But creating a dataset for supervised learning is also particularly expensive in terms of manpower, as a human being has to go through the process of labeling the data. Also, remember that while it’s easy for just about any person to label images as cat/non-cat, datasets often require expert input. For example, labeling a dataset of radiology images requires a radiologist, or sometimes even a team of radiologists, and a radiologist’s time is expensive!

Labeling datasets for supervised learning is an expensive process, especially when it requires expert knowledge, e.g. radiologists for labeling radiology datasets.

That’s why the machine learning research community, as a whole, is trying to leverage the power of the other three learning strategies, particularly self-supervised learning, in order to train learning algorithms.

What is machine learning?

samee99 — Sat, 19 Jun 2021 06:34:29 +0000

What exactly is machine learning and what would the ultimate learning machine look like?

Ridiculously powerful human-like machines have captured our collective imagination for a long time.

So let’s get the formal definition out of the way.

“Machine Learning is the art of getting computers to learn without being explicitly programmed”
Arthur Samuel (1959)

The art of what, again?

If you read the definition above, the part about not explicitly programming the computer is what really distinguishes machine learning from traditional programming. In traditional programming, a computer takes data and a program, i.e. an explicit set of instructions, as input and generates an output.

On the other hand, in machine learning, a combination of data and the output is used to make the machine learn the program:

Note that one thing didn’t change. The data is still an input in both cases.
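A toy example may help make the contrast concrete. In the sketch below, the traditional program is a hand-written Celsius-to-Fahrenheit rule, while the "learned" program recovers the same rule from example inputs and outputs using an ordinary least-squares line fit. The temperature example and the function names are my own illustration, and least squares is only one simple way a machine could learn such a rule.

```python
def celsius_to_fahrenheit(celsius):
    """Traditional programming: a human writes the rule explicitly."""
    return 1.8 * celsius + 32


def fit_line(inputs, outputs):
    """Machine learning in miniature: recover slope and intercept from examples."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data (inputs) plus the desired outputs; no rule is given to the machine.
celsius = [-40, 0, 37, 100]
fahrenheit = [-40, 32, 98.6, 212]

slope, intercept = fit_line(celsius, fahrenheit)
print(round(slope, 3), round(intercept, 3))  # approximately 1.8 and 32


def learned_program(c):
    """The 'program' the machine learned from data and outputs."""
    return slope * c + intercept
```

In both cases the data goes in as input. What changes is whether the program is handed to the computer or comes out of it.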

But how is machine learning actually performed? For example, how can we make a computer learn to identify that an image contains a cat?

This is harder than it sounds. Human beings are very good at doing this sort of pattern recognition. But to a computer, an RGB image of a cat is just a bunch of pixels. Only very specific combinations of pixels result in an image of a cat. And an effective machine learning algorithm needs to be able to automatically identify which combinations and patterns of pixels, when put together, result in a cat appearing in an image.

It’s not trivial for a computer to learn that a bunch of pixels correspond to an image of a cat.

How would you go about teaching such a thing to a computer? Well, there are three components of learning: representation, evaluation and optimization.

Any machine/deep learning algorithm you come across will contain the above three components. This point is definitely worth repeating: any learning engine will contain the above three parts. So let’s take it step by step. What do we mean by representation?

Representation is the heart of any machine learning algorithm. A learner must be represented in some formal language the computer can handle. In practice, this formal language is basically some mathematical model that takes some input data, performs a computation on it, and returns an output. Neural Networks, Gaussian Processes, Support Vector Machines and Decision Trees are all examples of representation.

Having a representation is not enough though. All mathematical models have parameters that can change in value. Parameters are just variable values in the model that need to be adjusted for the task to be performed correctly. For example, the tuner knob on your car radio represents a parameter that can be tuned:

The value of the parameters of a model will change depending on the problem being solved. For the radio analogy above, you can imagine that you need to tune the knob different amounts depending on which channel you want to hear. Therefore, you need a mechanism or a method to identify which set of values for the parameters of the model will result in a learning algorithm that effectively performs the task at hand, e.g. learning whether an image contains a cat or not. In other words, we need an objective function that can distinguish good solutions from bad ones. This is what the evaluation component of our learning engine does:

Mean Squared Error, which is basically an average of the square of the error between the actual output and the machine learning engine prediction, is an example of such an objective function.
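As a quick sketch, Mean Squared Error can be computed in a couple of lines of Python. The labels and the two sets of predictions below are made-up numbers, chosen so that one set is clearly better than the other.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


labels = [1, 0, 1, 1]                       # 1 = cat, 0 = no cat
good_predictions = [0.9, 0.1, 0.8, 1.0]     # hypothetical model A
bad_predictions = [0.2, 0.9, 0.4, 0.5]      # hypothetical model B

print(mean_squared_error(labels, good_predictions))
print(mean_squared_error(labels, bad_predictions))
```

The smaller value belongs to the better set of predictions, which is exactly what lets the objective function tell good solutions from bad ones.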

Finally, once you have a representation, i.e. some mathematical model, and an evaluation method to identify which parameters might be good choices, you need to be able to search for the set of parameters that lead to the optimal result, i.e. the smallest error between actual output and the machine learning prediction. This is where the Optimization component comes in handy:

Optimization is basically a method to search for the best solution. There are many different optimization methods. Some depend on computing the gradient or slope of a function, while other approaches, such as Genetic Algorithms, take inspiration from how nature identifies the best solutions.
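To give a flavor of the gradient-based family, here is a bare-bones gradient descent sketch. The function being minimized, f(x) = (x - 3)^2, the starting point, the learning rate and the number of steps are all arbitrary choices for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope to walk downhill toward a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Minimize f(x) = (x - 3)^2, whose slope at x is 2 * (x - 3).
best_x = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(best_x, 4))  # 3.0
```

Each step moves the parameter a little in the direction that reduces the function, much like turning the radio knob a bit at a time until the signal is clearest.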

If we put all of the components together, we can write out the complete learning problem as the below optimization problem:

In other words, learning involves finding the optimal parameters, theta_star, that optimize a certain objective function (e.g. minimize the Mean Squared Error) given a certain representation (e.g. Neural Networks) with varying parameters, theta, and a set of data (e.g. a dataset containing images with and without cats, together with labels of whether the respective image contains a cat) to train the algorithm over.
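One common way to write this down, assuming Mean Squared Error as the objective, is shown below. The symbols are my own notation rather than anything fixed by this post: f_theta is the representation with parameters theta, and (x_i, y_i) for i = 1..N are the data points and their labels. Minimizing an error in this way is the same as maximizing a goodness function defined as the negative of that error.

```latex
\theta^{*} \;=\; \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - f_{\theta}(x_i) \bigr)^{2}
```

Here f_theta is the representation, the averaged squared difference is the evaluation, and the arg min over theta is the optimization.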

Hope your learning algorithm manages to tune the parameters just right, so you can enjoy the end result!

Or you can go ahead and learn about the different types of machine learning approaches.

References

Pedro Domingos. 2012. A few useful things to know about machine learning. Commun. ACM 55, 10 (October 2012), 78–87. DOI:https://doi.org/10.1145/2347736.2347755