New optimizers other than Adam
In this note, I plan to explore various new optimizers for training neural networks:
List
- [Old Optimizer, New Norm: An Anthology], by Jeremy Bernstein and Laker Newhouse
- [signSGD: Compressed Optimisation for Non-Convex Problems], by Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Anima Anandkumar (a minimal sketch follows this list)
- [Shampoo: Preconditioned Stochastic Tensor Optimization], by Vineet Gupta, Tomer Koren, and Yoram Singer
- [Muon: An optimizer for hidden layers in neural networks], by Keller Jordan (also sketched below)
- [Mango], by Qinzi Zhang and Ashok Cutkosky
- [Training Deep Learning Models with Norm-Constrained LMOs], by Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls, and Volkan Cevher
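To make the list concrete before I flesh it out, here are two minimal sketches in PyTorch-style Python. The class and function names are mine, not from the papers. First, signSGD: each step moves along the elementwise sign of the gradient, which is exactly steepest descent under the infinity norm (the perspective the anthology paper develops).

```python
import torch

class SignSGD(torch.optim.Optimizer):
    """Minimal signSGD sketch: update along the elementwise sign of the
    gradient, i.e. steepest descent under the max-norm."""

    def __init__(self, params, lr=1e-4):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # sign(grad) has the same shape as p; lr sets the step size
                    p.add_(p.grad.sign(), alpha=-group["lr"])
```

It drops in wherever `torch.optim.SGD` would: `opt = SignSGD(model.parameters(), lr=1e-4)`.

Second, the core move in Muon: orthogonalize each hidden layer's momentum matrix before applying it. Muon's released code uses a tuned quintic Newton-Schulz iteration; the sketch below uses the classical cubic iteration instead, which (after normalization) also converges toward the orthogonal polar factor UV^T, so read it as an illustration of the idea rather than the released implementation.

```python
def orthogonalize(G, steps=10, eps=1e-7):
    """Approximate the orthogonal polar factor U V^T of a 2D tensor G using
    the classical cubic Newton-Schulz iteration."""
    X = G / (G.norm() + eps)  # now spectral norm <= Frobenius norm <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate in the short-and-wide orientation (smaller X @ X.T)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # each singular value s -> 1.5s - 0.5s^3 -> 1
    return X.T if transposed else X

# Muon-style step for one hidden weight matrix W with momentum buffer buf:
#   buf = beta * buf + W.grad
#   W  -= lr * orthogonalize(buf)
```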
Push me :)
I must confess that I’m a bit lazy at the moment. If you’re really interested in any of these topics, feel free to give me a nudge (or a push!) via email, and I’ll expand on them further (you can even ask for a Chinese version).