2019.12.01(pm): Gradient descent method

This article describes the gradient descent method, one of the basic function optimization methods.

The gradient descent method is one of the most representative ways of applying derivatives to an optimization problem: it finds a local minimum of a function. It is also called the steepest descent method.

Gradient descent method

Imagine you are in a dense jungle where you cannot see far. The way to the summit is simple: even though we do not know where the peak actually is, we will reach the summit eventually as long as we keep climbing up the steepest slope.

Alternatively, if you want to find a deep valley, you can go down the mountain in the steepest downhill direction.

Moving from the current position in the direction of the gradient to find a maximum of a function is called the gradient ascent method, and moving in the opposite direction of the gradient to find a minimum is called the gradient descent method.

The gradient descent method, widely used as an optimization algorithm, exploits this property of the gradient to progressively find the parameter values that minimize a given cost function.

x_{k+1} = x_k − λ ∇f(x_k),   λ > 0

In other words, starting from an initial value x_0 = (x_1^0, …, x_n^0) and repeatedly moving x a small step in the direction opposite to the gradient according to the equation above, we can find an x at which f(x) reaches a (local) minimum.
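As a minimal sketch of this update rule (my own toy example; the function f(x, y) = x^2 + y^2, the step size, and the number of iterations below are arbitrary choices, not from the original article):

# Gradient descent on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y)
# and whose minimum lies at (0, 0).

def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 + y^2.
    return 2 * x, 2 * y

def gradient_descent(x0, y0, lam=0.1, steps=100):
    # Repeat x_{k+1} = x_k - lam * grad f(x_k).
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lam * gx, y - lam * gy
    return x, y

print(gradient_descent(3.0, -4.0))  # prints values very close to (0.0, 0.0)

With λ = 0.1, each coordinate shrinks by a factor of 0.8 per step, so after 100 steps the iterate is numerically at the origin.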

If the goal is to find a local maximum rather than a local minimum, x is updated with the sign of the step flipped (gradient ascent method):

x_{k+1} = x_k + λ ∇f(x_k),   λ > 0
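As a small sketch of this sign flip (again a made-up function and step size), gradient ascent on f(x) = -(x - 3)^2 + 5, which peaks at x = 3:

def grad_f(x):
    # Gradient of f(x) = -(x - 3)^2 + 5, maximal at x = 3.
    return -2 * (x - 3)

x = 0.0        # initial value x_0
lam = 0.1      # step size
for _ in range(100):
    x = x + lam * grad_f(x)   # ascent: step WITH the gradient
print(x)  # approaches 3.0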

As you can easily imagine, one problem with the gradient descent method is that it can fall into a local minimum. In the mountain analogy: you climb hard toward what you think is the summit, only to find that it is just a small hill and that a much higher mountain stands right next to it.
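To make this concrete with a made-up one-dimensional example (the double-well function and the starting points below are my own choices), two different initial values can end up in two different minima:

def grad_f(x):
    # Gradient of the tilted double well f(x) = (x^2 - 1)^2 + 0.3*x.
    return 4 * x * (x**2 - 1) + 0.3

def descend(x, lam=0.01, steps=500):
    # Plain gradient descent from the given starting point.
    for _ in range(steps):
        x = x - lam * grad_f(x)
    return x

print(descend(2.0))    # settles near x = +0.96, the shallower local minimum
print(descend(-2.0))   # settles near x = -1.04, the deeper (global) minimum

Which minimum gradient descent reaches depends entirely on where it starts.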

Another problem with gradient descent is that it can be computationally very slow, since every update evaluates the gradient over all of the data. A common remedy for this is the stochastic gradient descent (SGD) method. Next time, we will look at stochastic gradient descent.