# An Illustrated Guide to OneFlow's Learning Rate Schedulers

06/23 08:03


- https://huggingface.co/spaces/basicv8vc/learning-rate-scheduler-online
- https://share.streamlit.io/basicv8vc/scheduler-online

## Learning Rate Schedulers

### The LRScheduler Base Class

LRScheduler(optimizer: Optimizer, last_step: int = -1, verbose: bool = False) is the base class of all learning rate schedulers. Of its initialization parameters, last_step and verbose usually do not need to be set: the former mainly matters when resuming from a checkpoint, while the latter prints the learning rate on every step() call, which can be handy for debugging. The most important method of LRScheduler is step(): it recomputes the learning rate from the user-specified initial value and applies the result to the next Optimizer.step().
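The contract described above can be sketched with a pure-Python mock (illustrative only, not OneFlow's actual implementation; MockOptimizer and MockLRScheduler are made-up names):

```python
# A minimal sketch of the LRScheduler contract: step() advances the step
# counter, recomputes the learning rate, and writes it into the optimizer's
# param groups so the next Optimizer.step() uses it.

class MockOptimizer:
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]

class MockLRScheduler:
    def __init__(self, optimizer, last_step=-1):
        self.optimizer = optimizer
        self.base_lr = optimizer.param_groups[0]["lr"]
        self.last_step = last_step

    def get_lr(self, step):
        # Subclasses override this; the base class keeps lr constant.
        return self.base_lr

    def step(self):
        self.last_step += 1
        self.optimizer.param_groups[0]["lr"] = self.get_lr(self.last_step)

opt = MockOptimizer(lr=0.1)
sched = MockLRScheduler(opt)
sched.step()
print(opt.param_groups[0]["lr"])  # 0.1
```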

### ConstantLR

```python
oneflow.optim.lr_scheduler.ConstantLR(
    optimizer: Optimizer,
    factor: float = 1.0 / 3,
    total_iters: int = 5,
    last_step: int = -1,
    verbose: bool = False,
)
```

ConstantLR is almost identical to a fixed learning rate; the only difference is that for the first total_iters steps, the learning rate is the initial learning rate * factor.
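The behavior above can be sketched in plain Python (an illustrative helper, not the OneFlow API):

```python
# Sketch of ConstantLR's schedule: lr is base_lr * factor for the first
# total_iters steps, then back to base_lr.

def constant_lr(base_lr, step, factor=1.0 / 3, total_iters=5):
    return base_lr * factor if step < total_iters else base_lr

lrs = [constant_lr(0.1, s, factor=0.5, total_iters=3) for s in range(5)]
print(lrs)  # [0.05, 0.05, 0.05, 0.1, 0.1]
```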

### LinearLR

```python
oneflow.optim.lr_scheduler.LinearLR(
    optimizer: Optimizer,
    start_factor: float = 1.0 / 3,
    end_factor: float = 1.0,
    total_iters: int = 5,
    last_step: int = -1,
    verbose: bool = False,
)
```

LinearLR is also close to a fixed learning rate; the only difference is that during the first total_iters steps the learning rate increases or decreases linearly, after which it stays fixed at the initial learning rate * end_factor.

*(Figure: LinearLR schedule)*
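The linear ramp can be sketched as follows (illustrative helper, not the OneFlow API):

```python
# Sketch of LinearLR: the factor moves linearly from start_factor to
# end_factor over total_iters steps, then stays at end_factor.

def linear_lr(base_lr, step, start_factor=1.0 / 3, end_factor=1.0, total_iters=5):
    progress = min(step, total_iters) / total_iters
    factor = start_factor + (end_factor - start_factor) * progress
    return base_lr * factor

lrs = [round(linear_lr(0.1, s, 0.5, 1.0, 4), 4) for s in range(6)]
print(lrs)  # [0.05, 0.0625, 0.075, 0.0875, 0.1, 0.1]
```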

### ExponentialLR

```python
oneflow.optim.lr_scheduler.ExponentialLR(
    optimizer: Optimizer,
    gamma: float,
    last_step: int = -1,
    verbose: bool = False,
)
```

ExponentialLR multiplies the learning rate by gamma on every step, i.e. the learning rate at step n is lr * gamma ** n.

*(Figure: ExponentialLR schedule)*

### StepLR

```python
oneflow.optim.lr_scheduler.StepLR(
    optimizer: Optimizer,
    step_size: int,
    gamma: float = 0.1,
    last_step: int = -1,
    verbose: bool = False,
)
```

StepLR is similar to ExponentialLR; the difference is that the learning rate is not adjusted on every step() call, but only once every step_size steps.

*(Figure: StepLR schedule)*
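The staircase decay can be sketched as follows (illustrative helper, not the OneFlow API):

```python
# Sketch of StepLR: lr is multiplied by gamma once every step_size steps.

def step_lr(base_lr, step, step_size, gamma=0.1):
    return base_lr * gamma ** (step // step_size)

lrs = [round(step_lr(1.0, s, step_size=3, gamma=0.1), 6) for s in range(7)]
print(lrs)  # [1.0, 1.0, 1.0, 0.1, 0.1, 0.1, 0.01]
```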

### MultiStepLR

```python
oneflow.optim.lr_scheduler.MultiStepLR(
    optimizer: Optimizer,
    milestones: list,
    gamma: float = 0.1,
    last_step: int = -1,
    verbose: bool = False,
)
```

StepLR adjusts the learning rate every step_size steps, whereas MultiStepLR adjusts it at user-specified milestones. Suppose milestones is [2, 5, 9]: the learning rate is lr on [0, 2), lr * gamma on [2, 5), lr * gamma ** 2 on [5, 9), and lr * gamma ** 3 on [9, ∞).

*(Figure: MultiStepLR schedule)*
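The milestone rule can be sketched as follows (illustrative helper, not the OneFlow API):

```python
from bisect import bisect_right

# Sketch of MultiStepLR: gamma is applied once for every milestone that
# has already been passed.

def multi_step_lr(base_lr, step, milestones, gamma=0.1):
    return base_lr * gamma ** bisect_right(milestones, step)

# milestones [2, 5, 9]: lr on [0, 2), lr*gamma on [2, 5), lr*gamma^2 on [5, 9), ...
lrs = [round(multi_step_lr(1.0, s, [2, 5, 9]), 6) for s in (0, 2, 5, 9)]
print(lrs)  # [1.0, 0.1, 0.01, 0.001]
```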

### PolynomialLR

```python
oneflow.optim.lr_scheduler.PolynomialLR(
    optimizer,
    steps: int,
    end_learning_rate: float = 0.0001,
    power: float = 1.0,
    cycle: bool = False,
    last_step: int = -1,
    verbose: bool = False,
)
```

*(Figure: PolynomialLR schedule)*
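Assuming the usual semantics of steps, end_learning_rate, and power for polynomial decay (and ignoring the cycle option), the schedule can be sketched as:

```python
# Sketch of polynomial decay: lr moves from base_lr down to end_lr along a
# polynomial curve and stays at end_lr after `steps` steps.

def polynomial_lr(base_lr, step, steps, end_lr=1e-4, power=1.0):
    decay = (1 - min(step, steps) / steps) ** power
    return (base_lr - end_lr) * decay + end_lr

lrs = [round(polynomial_lr(0.1, s, steps=4, end_lr=0.0, power=1.0), 4) for s in range(6)]
print(lrs)  # [0.1, 0.075, 0.05, 0.025, 0.0, 0.0]
```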

### CosineDecayLR

```python
oneflow.optim.lr_scheduler.CosineDecayLR(
    optimizer: Optimizer,
    decay_steps: int,
    alpha: float = 0.0,
    last_step: int = -1,
    verbose: bool = False,
)
```

### CosineAnnealingLR

```python
oneflow.optim.lr_scheduler.CosineAnnealingLR(
    optimizer: Optimizer,
    T_max: int,
    eta_min: float = 0.0,
    last_step: int = -1,
    verbose: bool = False,
)
```

CosineAnnealingLR is very similar to CosineDecayLR; the difference is that the former includes not only a cosine-decay phase but also a cosine-increase phase. During the first T_max steps, the learning rate decays from lr to eta_min along a cosine curve; once cur_step > T_max, it increases back to lr along the cosine curve, and this cycle repeats.

*(Figure: CosineAnnealingLR schedule)*
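The oscillating behavior falls out of the closed-form cosine formula, sketched here (illustrative helper, not the OneFlow API):

```python
import math

# Sketch of cosine annealing: lr decays to eta_min over T_max steps, then
# (because cosine is periodic) rises back, repeating with period 2*T_max.

def cosine_annealing_lr(base_lr, step, t_max, eta_min=0.0):
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

lrs = [round(cosine_annealing_lr(0.1, s, t_max=4), 4) for s in (0, 2, 4, 6, 8)]
print(lrs)  # [0.1, 0.05, 0.0, 0.05, 0.1]
```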

### CosineAnnealingWarmRestarts

```python
oneflow.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer: Optimizer,
    T_0: int,
    T_mult: int = 1,
    eta_min: float = 0.0,
    decay_rate: float = 1.0,
    restart_limit: int = 0,
    last_step: int = -1,
    verbose: bool = False,
)
```

*(Figure: T_mult=1, decay_rate=1)*

*(Figure: T_mult=1, decay_rate=0.5)*

### LambdaLR

```python
oneflow.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_step=-1, verbose=False)
```

LambdaLR is arguably the most flexible strategy, because the schedule is specified by the user-provided function lr_lambda. For example, to implement the Noam scheduler from the Transformer paper:

```python
def rate(step, model_size, factor, warmup):
    """
    We have to default the step to 1 for LambdaLR
    to avoid zero raising to a negative power.
    """
    if step == 0:
        step = 1
    return factor * (
        model_size ** (-0.5) * min(step ** (-0.5), step * warmup ** (-1.5))
    )
```

```python
model = CustomTransformer(...)
optimizer = flow.optim.Adam(
    model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9
)
lr_scheduler = LambdaLR(
    optimizer=optimizer,
    lr_lambda=lambda step: rate(step, d_model, factor=1, warmup=3000),
)
```

### SequentialLR

```python
oneflow.optim.lr_scheduler.SequentialLR(
    optimizer: Optimizer,
    schedulers: Sequence[LRScheduler],
    milestones: Sequence[int],
    interval_rescaling: Union[Sequence[bool], bool] = False,
    last_step: int = -1,
    verbose: bool = False,
)
```

### WarmupLR

```python
oneflow.optim.lr_scheduler.WarmupLR(
    scheduler_or_optimizer: Union[LRScheduler, Optimizer],
    warmup_factor: float = 1.0 / 3,
    warmup_iters: int = 5,
    warmup_method: str = "linear",
    warmup_prefix: bool = False,
    last_step=-1,
    verbose=False,
)
```

WarmupLR is a subclass of SequentialLR: it combines two LRSchedulers, the first of which must be either ConstantLR or LinearLR.
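Assuming warmup_method="linear", the warmup-then-defer behavior can be sketched as follows (illustrative helper with a hypothetical `inner` callable standing in for the wrapped scheduler, not the OneFlow API):

```python
# Sketch of linear warmup: lr ramps from base_lr * warmup_factor up to
# base_lr over warmup_iters steps, then the wrapped scheduler takes over.

def warmup_lr(base_lr, step, warmup_factor=1.0 / 3, warmup_iters=5,
              inner=lambda lr, s: lr):
    if step < warmup_iters:
        alpha = step / warmup_iters
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    # `inner` takes (base_lr, steps past warmup) in this sketch.
    return inner(base_lr, step - warmup_iters)

lrs = [round(warmup_lr(0.1, s, warmup_factor=0.1, warmup_iters=4), 4) for s in range(6)]
print(lrs)  # [0.01, 0.0325, 0.055, 0.0775, 0.1, 0.1]
```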

### ChainedScheduler

```python
oneflow.optim.lr_scheduler.ChainedScheduler(schedulers)
```

ChainedScheduler calls every scheduler in the list on each step, so their adjustments compose:

lr ==> LRScheduler_1 ==> LRScheduler_2 ==> ... ==> LRScheduler_N
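The composition can be sketched as a product of per-scheduler factors (illustrative factor functions only, not the OneFlow API):

```python
# Sketch of chaining: each scheduler contributes a multiplicative factor,
# and all factors are applied at every step.

def chained_lr(base_lr, step, factor_fns):
    lr = base_lr
    for fn in factor_fns:
        lr *= fn(step)
    return lr

# e.g. a constant warm-down factor chained with an exponential decay
constant = lambda step: 0.5 if step < 2 else 1.0
exponential = lambda step: 0.9 ** step

lrs = [round(chained_lr(1.0, s, [constant, exponential]), 4) for s in range(4)]
print(lrs)  # [0.5, 0.45, 0.81, 0.729]
```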

### ReduceLROnPlateau

```python
oneflow.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",
    factor=0.1,
    patience=10,
    threshold=1e-4,
    threshold_mode="rel",
    cooldown=0,
    min_lr=0,
    eps=1e-8,
    verbose=False,
)
```

```python
optimizer = flow.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = flow.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')
for epoch in range(10):
    train(...)
    val_loss = validate(...)
    # Note: this step should be called after validate().
    scheduler.step(val_loss)
```

## In Practice

- https://github.com/basicv8vc/oneflow-cifar100-lr-scheduler

(Published with the author's permission. Original article: https://zhuanlan.zhihu.com/p/520719314)
