2023/12/01 13:39

# 1. Introduction

## Why Is Gradient Descent Important?

1. Broad applicability: from simple linear regression to complex deep neural networks, gradient descent plays a crucial role.

2. Solving problems without closed-form solutions: for many complex problems we cannot find an analytical solution, and gradient descent provides an effective numerical method.

3. Scalability: gradient descent adapts well to large-scale datasets and high-dimensional parameter spaces.

# 2. The Mathematics of Gradient Descent

## Update Rule
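
At its core, gradient descent repeatedly takes a step opposite to the gradient of the cost function $J$:

$$
\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)
$$

where $\theta$ is the parameter vector, $\alpha$ is the learning rate, and $\nabla_\theta J(\theta)$ is the gradient of the cost at the current parameters.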

### Code Example: A Basic Gradient Descent Update

```python
import numpy as np

def gradient_descent_update(theta, grad, alpha):
    """
    Perform a single gradient descent update.

    Parameters:
    theta (ndarray): Current parameter values.
    grad (ndarray): Gradient of the cost function at theta.
    alpha (float): Learning rate.

    Returns:
    ndarray: Updated parameter values.
    """
    return theta - alpha * grad

# Initialize parameters
theta = np.array([1.0, 2.0])
# Hypothetical gradient at theta
grad = np.array([0.5, 1.0])
# Learning rate
alpha = 0.01

# Perform a single update
theta_new = gradient_descent_update(theta, grad, alpha)
print("Updated theta:", theta_new)
```

Output:

```
Updated theta: [0.995 1.99 ]
```
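
Applied repeatedly, this single-step rule drives the cost downhill. As a minimal sketch, using a hypothetical cost J(theta) = 0.5 * ||theta||^2, whose gradient is simply theta itself:

```python
import numpy as np

def gradient_descent_update(theta, grad, alpha):
    # One gradient descent step: theta <- theta - alpha * grad
    return theta - alpha * grad

# Minimize J(theta) = 0.5 * ||theta||^2; its gradient is theta itself
theta = np.array([1.0, 2.0])
alpha = 0.1
for _ in range(100):
    theta = gradient_descent_update(theta, theta, alpha)

print("theta after 100 steps:", theta)
```

Each step multiplies theta by (1 - alpha), so after 100 steps the parameters have shrunk to roughly 0.9^100 ≈ 3e-5 of their initial values.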


## Code Example: Batch Gradient Descent

```python
import torch

# Hypothetical data (features and labels)
X = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y = torch.tensor([[1.0], [2.0], [3.0]])

# Initialize parameters (e.g., at zero)
theta = torch.zeros((2, 1), requires_grad=True)

# Learning rate
alpha = 0.01

# Number of iterations
n_iter = 1000

# Cost function: Mean Squared Error
def cost_function(X, y, theta):
    m = len(y)
    predictions = X @ theta
    return (1 / (2 * m)) * torch.sum((predictions - y) ** 2)

for i in range(n_iter):
    J = cost_function(X, y, theta)
    J.backward()

    # Update parameters and reset the accumulated gradient
    with torch.no_grad():
        theta -= alpha * theta.grad
        theta.grad.zero_()

print("Optimized theta:", theta)
```

Output (truncated):

```
Optimized theta: tensor([[0.5780],
```
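
As a sanity check, the result of the loop above can be compared against the closed-form least-squares solution; a sketch using the same hypothetical `X` and `y`, with `torch.linalg.lstsq` solving the problem directly:

```python
import torch

X = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y = torch.tensor([[1.0], [2.0], [3.0]])

# Closed-form least-squares solution
theta_exact = torch.linalg.lstsq(X, y).solution
print("Closed-form theta:", theta_exact.flatten())

# Here y lies exactly in the column space of X, so the fit is perfect
print("Residual:", torch.sum((X @ theta_exact - y) ** 2).item())
```

Gradient descent with a finite number of steps only approximates this solution, which is why the printed theta in the example above has not fully converged.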


## Code Example: Stochastic Gradient Descent (SGD)

```python
import torch
import random

# Hypothetical data (features and labels)
X = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y = torch.tensor([[1.0], [2.0], [3.0]])

# Initialize parameters (e.g., at zero)
theta = torch.zeros((2, 1), requires_grad=True)

# Learning rate
alpha = 0.01

# Number of iterations
n_iter = 1000

for i in range(n_iter):
    # Randomly sample a data point
    idx = random.randint(0, len(y) - 1)
    x_i = X[idx]
    y_i = y[idx]

    # Compute cost for the sampled point
    J = (1 / 2) * torch.sum((x_i @ theta - y_i) ** 2)

    J.backward()

    # Update parameters and reset the accumulated gradient
    with torch.no_grad():
        theta -= alpha * theta.grad
        theta.grad.zero_()

print("Optimized theta:", theta)
```

Output (truncated):

```
Optimized theta: tensor([[0.5931],
```


## Pros and Cons

Although SGD avoids the per-step cost that batch gradient descent incurs on large datasets, it updates the model using only one sample at a time, so its optimization path is typically "noisy" or unstable. This is both a strength and a weakness: the noise can help the algorithm escape local optima, but it can also slow convergence.
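
This noise can be made concrete: the full-batch gradient is the average of the per-sample gradients, and individual samples scatter around that average. A small sketch with hypothetical data:

```python
import torch

torch.manual_seed(0)

# Hypothetical linear-regression data
X = torch.randn(100, 2)
y = X @ torch.tensor([[1.0], [-2.0]])

theta = torch.zeros(2, 1)

def grad_mse(X, y, theta):
    # Gradient of (1/(2m)) * sum((X @ theta - y)^2) w.r.t. theta
    return X.T @ (X @ theta - y) / len(y)

full_grad = grad_mse(X, y, theta)

# Per-sample gradients: each is an unbiased but noisy estimate
sample_grads = torch.stack(
    [grad_mse(X[i:i+1], y[i:i+1], theta) for i in range(len(y))]
)

print("Full-batch gradient:", full_grad.flatten())
print("Mean of per-sample gradients:", sample_grads.mean(dim=0).flatten())
print("Std of per-sample gradients:", sample_grads.std(dim=0).flatten())
```

The mean of the per-sample gradients matches the full-batch gradient, while the nonzero standard deviation is exactly the instability described above.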

## Code Example: Mini-Batch Gradient Descent

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical data (features and labels)
X = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y = torch.tensor([[1.0], [2.0], [3.0], [4.0]])

# Initialize parameters (e.g., at zero)
theta = torch.zeros((2, 1), requires_grad=True)

# Learning rate and batch size
alpha = 0.01
batch_size = 2

dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

for epoch in range(100):
    for X_batch, y_batch in loader:
        # Cost on the current mini-batch
        J = (1 / (2 * batch_size)) * torch.sum((X_batch @ theta - y_batch) ** 2)
        J.backward()

        # Update parameters and reset the accumulated gradient
        with torch.no_grad():
            theta -= alpha * theta.grad
            theta.grad.zero_()

print("Optimized theta:", theta)
```

Output (truncated):

```
Optimized theta: tensor([[0.6101],
```

