Manual Warmup
- class pytorch_warmup.base.BaseWarmup(optimizer, warmup_params, last_step=-1)[source]
Base class for all warmup schedules.
The learning rate \(\alpha_{t}\) is dampened by multiplying it by the warmup factor \(\omega_{t} \in [0, 1]\) at each iteration \(t\). Thus, the modified learning rate
\[\hat{\alpha}_{t} = \alpha_{t} \cdot \omega_{t}\]
is used by the optimizer; a minimal numeric sketch of this rule follows the parameter list.
- Parameters:
optimizer (Optimizer) – Wrapped optimizer.
warmup_params (list) – Warmup parameters, one dict for each parameter group.
last_step (int) – The index of the last step. Default: -1.
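To make the dampening rule concrete, here is a minimal sketch (not part of the library) that computes the dampened learning rate by hand for a linear warmup factor; the base learning rate and warmup period values are purely illustrative.
>>> # Illustrative values, not library defaults
>>> base_lr = 0.1            # alpha_t, kept constant for simplicity
>>> warmup_period = 5        # tau
>>> for t in range(1, 8):
>>>     omega = min(1.0, t / warmup_period)      # warmup factor in [0, 1]
>>>     print(f'step {t}: lr = {base_lr * omega:.3f}')
step 1: lr = 0.020
step 2: lr = 0.040
step 3: lr = 0.060
step 4: lr = 0.080
step 5: lr = 0.100
step 6: lr = 0.100
step 7: lr = 0.100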
- dampen(step=None)[source]
Dampens the learning rate.
It is not recommended to call this method explicitly for PyTorch 1.4.0 or later. Please use the dampening() context manager, which calls this method correctly; a sketch of the explicit-call pattern for older PyTorch versions follows this entry.
- Parameters:
step (int) – The index of the current step. Default: None.
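For reference, the sketch below shows the explicit-call pattern that the dampening() context manager replaces. It assumes a per-iteration LR scheduler and targets PyTorch versions before 1.4.0, where dampen() would be called directly after lr_scheduler.step(); the training-loop variables are placeholders as in the other examples.
>>> # Assumed pattern for PyTorch versions before 1.4.0:
>>> # call dampen() right after the LR scheduler step.
>>> for batch in dataloader:
>>>     optimizer.zero_grad()
>>>     loss = ...
>>>     loss.backward()
>>>     optimizer.step()
>>>     lr_scheduler.step()
>>>     warmup_scheduler.dampen()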
- dampening()[source]
Dampens the learning rate after calling the step() method of the learning rate scheduler.
The step() method calls must be placed in the suite of a with statement using the dampening() context manager.
Examples
>>> # For no LR scheduler
>>> with warmup_scheduler.dampening():
>>>     pass

>>> # For a single LR scheduler
>>> with warmup_scheduler.dampening():
>>>     lr_scheduler.step()

>>> # To chain two LR schedulers
>>> with warmup_scheduler.dampening():
>>>     lr_scheduler1.step()
>>>     lr_scheduler2.step()

>>> # To delay an LR scheduler
>>> iteration = warmup_scheduler.last_step + 1
>>> with warmup_scheduler.dampening():
>>>     if iteration >= warmup_period:
>>>         lr_scheduler.step()
- load_state_dict(state_dict)[source]
Loads the warmup scheduler’s state.
- Parameters:
state_dict (dict) – Warmup scheduler state. Should be an object returned from a call to state_dict().
- state_dict()[source]
Returns the state of the warmup scheduler as a dict.
It contains an entry for every variable in self.__dict__ which is not the optimizer. A checkpoint sketch using state_dict() and load_state_dict() follows this entry.
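A hypothetical checkpoint round-trip using these two methods might look like the sketch below; the checkpoint keys and file name are illustrative, and the optimizer, LR scheduler, and warmup scheduler are assumed to be re-created before loading.
>>> import torch
>>> # Save: bundle the warmup state with the rest of the training state.
>>> torch.save({'optimizer': optimizer.state_dict(),
>>>             'lr_scheduler': lr_scheduler.state_dict(),
>>>             'warmup_scheduler': warmup_scheduler.state_dict()},
>>>            'checkpoint.pt')
>>> # Restore: load each component's state into a freshly built object.
>>> checkpoint = torch.load('checkpoint.pt')
>>> optimizer.load_state_dict(checkpoint['optimizer'])
>>> lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
>>> warmup_scheduler.load_state_dict(checkpoint['warmup_scheduler'])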
- warmup_factor(step, **params)[source]
Returns the warmup factor \(\omega_{t}\) at an iteration \(t\).
dampen() uses this method to get the warmup factor for each parameter group. It is unnecessary to call this method explicitly; a sketch of a custom subclass overriding it follows this entry.
- Parameters:
step (int) – The index of the current step.
params (dict) – The warmup parameters. For details, refer to the arguments of each subclass method.
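To illustrate how warmup_factor() and warmup_params interact, here is a hypothetical subclass implementing a polynomial warmup. It assumes that warmup_params is a list containing one dict per parameter group and that each dict's entries are passed to warmup_factor() as keyword arguments.
>>> from pytorch_warmup.base import BaseWarmup
>>> class PolynomialWarmup(BaseWarmup):
>>>     """Hypothetical schedule: omega_t = min(1, (t / tau) ** p)."""
>>>     def __init__(self, optimizer, warmup_period, power=2.0, last_step=-1):
>>>         # One dict of warmup parameters per parameter group (assumed format).
>>>         warmup_params = [dict(warmup_period=warmup_period, power=power)
>>>                          for _ in optimizer.param_groups]
>>>         super().__init__(optimizer, warmup_params, last_step)
>>>     def warmup_factor(self, step, warmup_period, power):
>>>         # dampen() passes each group's parameters as keyword arguments.
>>>         return min(1.0, (step / warmup_period) ** power)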
- class pytorch_warmup.base.ExponentialWarmup(optimizer, warmup_period, last_step=-1)[source]
Exponential warmup schedule.
The exponential warmup schedule uses the warmup factor
\[\omega_{t}^{\rm expo,\tau} = 1 - \exp \left( - \frac{1}{\tau} \cdot t \right)\]
at each iteration \(t\), where the constant \(\tau\) is analogous to a linear warmup period; a numeric illustration of this factor follows this entry.
- Parameters:
optimizer (Optimizer) – Wrapped optimizer. RAdam is not suitable because of the warmup redundancy.
warmup_period (int or list[int]) – The constant \(\tau\) analogous to a linear warmup period.
last_step (int) – The index of the last step. Default: -1.
Example
>>> lr_scheduler = CosineAnnealingLR(optimizer, ...)
>>> warmup_scheduler = ExponentialWarmup(optimizer, warmup_period=1000)
>>> for batch in dataloader:
>>>     optimizer.zero_grad()
>>>     loss = ...
>>>     loss.backward()
>>>     optimizer.step()
>>>     with warmup_scheduler.dampening():
>>>         lr_scheduler.step()
Warning
The warmup schedule must not be initialized before the learning rate schedule is initialized.
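As a quick sanity check on the role of \(\tau\) (not part of the library documentation), the exponential factor reaches roughly 0.63 at \(t = \tau\) and about 0.95 at \(t = 3\tau\):
>>> import math
>>> tau = 1000
>>> for t in (tau, 2 * tau, 3 * tau):
>>>     print(f't = {t}: omega = {1.0 - math.exp(-t / tau):.3f}')
t = 1000: omega = 0.632
t = 2000: omega = 0.865
t = 3000: omega = 0.950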
- class pytorch_warmup.base.LinearWarmup(optimizer, warmup_period, last_step=-1)[source]
Linear warmup schedule.
The linear warmup schedule uses the warmup factor
\[\omega_{t}^{\rm linear,\tau} = \min \left\{ 1, \frac{1}{\tau} \cdot t \right\}\]
at each iteration \(t\), where \(\tau\) is the warmup period.
- Parameters:
optimizer (Optimizer) – Wrapped optimizer. RAdam is not suitable because of the warmup redundancy.
warmup_period (int or list[int]) – The warmup period \(\tau\); a per-group example follows this entry.
last_step (int) – The index of the last step. Default: -1.
Example
>>> lr_scheduler = CosineAnnealingLR(optimizer, ...)
>>> warmup_scheduler = LinearWarmup(optimizer, warmup_period=2000)
>>> for batch in dataloader:
>>>     optimizer.zero_grad()
>>>     loss = ...
>>>     loss.backward()
>>>     optimizer.step()
>>>     with warmup_scheduler.dampening():
>>>         lr_scheduler.step()
Warning
The warmup schedule must not be initialized before the learning rate schedule is initialized.
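Since warmup_period also accepts a list, a separate warmup period can be assigned to each parameter group. The sketch below is a hypothetical setup with two parameter groups; the period values are illustrative and are assumed to follow the parameter-group order.
>>> import torch
>>> from pytorch_warmup.base import LinearWarmup
>>> model = torch.nn.Linear(10, 2)
>>> optimizer = torch.optim.SGD([
>>>     {'params': model.weight},
>>>     {'params': model.bias, 'lr': 0.01},
>>> ], lr=0.1)
>>> # One warmup period per parameter group (assumed to match group order).
>>> warmup_scheduler = LinearWarmup(optimizer, warmup_period=[2000, 500])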