Mode-Connectivity

This repo is a collection of AWESOME things from recent research on Mode Connectivity, mainly findings about the geometric properties of trained neural networks. On the methods side, we include papers that re-basin independently trained models and papers that directly induce a better-generalizing model from one-shot training. On the applications side, we cover downstream tasks such as federated learning, continual learning, and sparse neural networks, which may benefit from advances in mode connectivity research. More directions remain to be explored.

Please feel free to fork.

Geometric Properties

Linear Mode Connectivity (LMC)

  • Topology and Geometry of Half-Rectified Network Optimization.

    C. Daniel Freeman, Joan Bruna [ICLR17]
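
A quick empirical probe of LMC is to sweep the straight line between two checkpoints and measure the loss barrier along it. Below is a minimal PyTorch sketch, assuming two state dicts from identically structured trained networks and a mean-reduced loss; the helper names (`interpolate_state_dicts`, `loss_barrier`) are ours, not from any paper above.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """(1 - alpha) * A + alpha * B for floating-point tensors; integer
    buffers (e.g. BatchNorm's num_batches_tracked) are copied from A."""
    return {k: torch.lerp(sd_a[k], sd_b[k], alpha) if sd_a[k].is_floating_point()
            else sd_a[k] for k in sd_a}

@torch.no_grad()
def loss_barrier(model, sd_a, sd_b, loss_fn, loader, n_points=11):
    """Largest excess of the interpolated loss over the linear interpolation
    of the endpoint losses; a value near zero indicates the two modes are
    linearly connected in the sense of Frankle et al. (2020)."""
    losses = []
    for i in range(n_points):
        alpha = i / (n_points - 1)
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        model.eval()
        total, n = 0.0, 0
        for x, y in loader:
            total += loss_fn(model(x), y).item() * len(x)
            n += len(x)
        losses.append(total / n)
    return max(l - ((1 - i / (n_points - 1)) * losses[0]
                    + i / (n_points - 1) * losses[-1])
               for i, l in enumerate(losses))
```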

Nonlinear Mode Connectivity

  • Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs.

    Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Gordon Wilson [NeurIPS18][codes]

  • Essentially No Barriers in Neural Network Energy Landscape.

    Felix Draxler, Kambis Veschgini, Manfred Salmhofer, Fred A. Hamprecht [ICML18][codes]

  • 👀 Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling.

    Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson [ICML21][codes]
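
The curve-finding approach of Garipov et al. parameterizes a path between two fixed endpoints, e.g. a quadratic Bezier curve, and trains only the bend point so the whole path stays in a low-loss region. A minimal sketch of that objective using `torch.func.functional_call` (PyTorch 2.x), assuming `w0` and `w2` are dicts of floating-point parameters such as `dict(model.named_parameters())`; the function names are ours.

```python
import torch
from torch.func import functional_call

def bezier_point(w0, w1, w2, t):
    """Quadratic Bezier curve in weight space:
    phi(t) = (1 - t)^2 * w0 + 2 t (1 - t) * w1 + t^2 * w2."""
    return {k: (1 - t) ** 2 * w0[k] + 2 * t * (1 - t) * w1[k] + t ** 2 * w2[k]
            for k in w0}

def train_bend(model, w0, w2, loader, loss_fn, epochs=10, lr=1e-3):
    """Minimize E_{t ~ U[0,1]} loss(phi(t)) over the bend point w1, holding
    the endpoints fixed (the objective of Garipov et al., 2018)."""
    w0 = {k: v.detach() for k, v in w0.items()}  # endpoints stay fixed
    w2 = {k: v.detach() for k, v in w2.items()}
    w1 = {k: ((w0[k] + w2[k]) / 2).detach().clone().requires_grad_(True)
          for k in w0}                           # init bend at the midpoint
    opt = torch.optim.Adam(w1.values(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            t = torch.rand(()).item()            # sample one point on the curve
            weights = bezier_point(w0, w1, w2, t)
            loss = loss_fn(functional_call(model, weights, (x,)), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return w1
```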

Findings

  • Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances.

    Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea [ICML21][codes]

  • (Initialisations) Random initialisations performing above chance and how to find them.

    Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger [NeurIPS22 OPT][codes]

  • Linear mode connectivity and the lottery ticket hypothesis.

    Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, Michael Carbin [ICML20][codes]

  • (Functional behaviors of end points) On Convexity and Linear Mode Connectivity in Neural Networks.

    David Yunis, Kumar Kshitij Patel, Pedro Henrique Pamplona Savarese, Gal Vardi, Jonathan Frankle, Matthew Walter, Karen Livescu, Michael Maire [NeurIPS22 OPT]

  • Large Scale Structure of Neural Network Loss Landscapes.

    Stanislav Fort, Stanislaw Jastrzebski [NeurIPS19]

  • Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks.

    Xiang Wang, Annie N. Wang, Mo Zhou, Rong Ge [ICLR23]

  • Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes.

    James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse [ICML21][codes]

Theory

  • Explaining landscape connectivity of low-cost solutions for multilayer nets.

    Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge [NeurIPS19]

Methods for rebasin

  • (Width, Depth)(Simulated Annealing) The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks.

    Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur [ICLR22][codes]

  • Optimizing mode connectivity via neuron alignment.

    N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai [NeurIPS20][codes]

  • 🔥 👍 (Three methods) Git Re-Basin: Merging Models modulo Permutation Symmetries.

    Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa [ICLR23][codes][pytorch]

  • Re-basin via implicit Sinkhorn differentiation.

    Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli [paper22]

  • Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization.

    Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur [ICLR23][codes]
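
At its simplest, "weight matching" re-basin reduces to a linear assignment problem per layer: permute the hidden units of model B so that its incoming and outgoing weights correlate best with model A's. Below is a sketch for one hidden layer (Git Re-Basin iterates a coordinate-descent version of this across all layers); the function name is ours, and hidden-layer biases would be permuted the same way as the rows of W1.

```python
import torch
from scipy.optimize import linear_sum_assignment

def match_hidden_units(W1_a, W1_b, W2_a, W2_b):
    """Align B's hidden units with A's for one hidden layer.
    W1 has shape (hidden, in) and W2 has shape (out, hidden). Maximizing
    tr(P^T (W1_a W1_b^T + W2_a^T W2_b)) over permutations P is equivalent
    to minimizing ||W1_a - P W1_b||^2 + ||W2_a - W2_b P^T||^2."""
    sim = W1_a @ W1_b.T + W2_a.T @ W2_b          # (hidden, hidden) similarity
    _, col = linear_sum_assignment(sim.detach().cpu().numpy(), maximize=True)
    perm = torch.as_tensor(col)
    # Permute B's incoming rows and outgoing columns: the function B computes
    # is unchanged, but its weights now line up with A's unit ordering.
    return W1_b[perm], W2_b[:, perm]
```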

Model merging

  • 👍 [Stochastic Weight Averaging] Averaging Weights Leads to Wider Optima and Better Generalization.

    Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson [UAI18][codes]

  • Subspace Inference for Bayesian Deep Learning.

    Pavel Izmailov, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson [UAI19][codes]

  • Bayesian Nonparametric Federated Learning of Neural Networks.

    Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Trong Nghia Hoang, Yasaman Khazaeni [ICML19][codes]

  • Model fusion via optimal transport.

    Sidak Pal Singh, Martin Jaggi [NeurIPS20][codes]

  • [Averaging merge] Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time.

    Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt [ICML22][codes]

  • lo-fi: distributed fine-tuning without communication.

    Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos [TMLR23]

  • 👍 Learning Neural Network Subspaces.

    Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari [ICML21][codes]

  • Robust fine-tuning of zero-shot models.

    Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt [CVPR22][codes]

  • [Fisher merge] Merging Models with Fisher-Weighted Averaging.

    Michael Matena, Colin Raffel [NeurIPS22][codes]

  • [Regression Mean merge] Dataless Knowledge Fusion by Merging Weights of Language Models.

    Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng [ICLR23]

  • Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks.

    Aditya Kumar Akash, Sixu Li, Nicolás García Trillos [paper23][codes]

  • PopulAtion Parameter Averaging (PAPA).

    Alexia Jolicoeur-Martineau, Emy Gervais, Kilian Fatras, Yan Zhang, Simon Lacoste-Julien [paper23]

  • ZipIt! Merging Models from Different Tasks without Training.

    George Stoica, Daniel Bolya, Jakob Bjorner, Taylor Hearn, Judy Hoffman [paper23]
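
Many of the entries above boil down to some form of weight-space averaging. The simplest variant, a uniform "soup" over fine-tuned checkpoints that share an architecture, is a few lines over state dicts. A minimal sketch; `uniform_soup` is our name, not the paper's API.

```python
import torch

def uniform_soup(state_dicts):
    """Average the weights of several fine-tuned models of one architecture.
    Per Wortsman et al. (2022), this tends to help when the models share a
    pre-trained initialization, so they lie in one linearly connected region."""
    out = {}
    for k in state_dicts[0]:
        if state_dicts[0][k].is_floating_point():
            out[k] = torch.stack([sd[k] for sd in state_dicts]).mean(dim=0)
        else:                        # integer buffers: keep the first model's
            out[k] = state_dicts[0][k]
    return out
```

The "greedy soup" variant from the same paper instead adds checkpoints one at a time, keeping each only if held-out accuracy improves.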

Pretrained model connectivity

  • What is being transferred in transfer learning?

    Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang [NeurIPS20][codes]

  • Exploring Mode Connectivity for Pre-trained Language Models.

    Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou [EMNLP22][codes]

  • Knowledge is a Region in Weight Space for Fine-tuned Language Models.

    Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen [paper23]

Equivariant Network Design

  • 👀 Equivariant Architectures for Learning in Deep Weight Spaces.

    Aviv Navon, Aviv Shamsian, Idan Achituve, Ethan Fetaya, Gal Chechik, Haggai Maron [paper23][codes]

  • 👀 Permutation Equivariant Neural Functionals.

    Allan Zhou, Kaien Yang, Kaylee Burns, Yiding Jiang, Samuel Sokota, J. Zico Kolter, Chelsea Finn [paper23]

Related paper

  • Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors.

    Gintare Karolina Dziugaite, Daniel Roy [ICML18]

  • Sharpness-Aware Minimization for Efficiently Improving Generalization.

    Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur [ICLR21][codes]

  • Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization.

    Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz [paper23][codes]

Applications

  • (FL) Connecting Low-Loss Subspace for Personalized Federated Learning.

    Seok-Ju Hahn, Minwoo Jeong, Junghye Lee [KDD22][codes]

  • Linear Mode Connectivity in Multitask and Continual Learning.

    Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, Hassan Ghasemzadeh [ICLR21][codes]

  • All Roads Lead to Rome? On Invariance of BERT Representations.

    Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Ryan Cotterell, Bernhard Schölkopf [TACL23]

  • (Meta Learning) Subspace Learning for Effective Meta-Learning.

    Weisen Jiang, James Kwok, Yu Zhang [ICML22][codes]

  • (Incremental Learning) Towards better plasticity-stability trade-off in incremental learning: a simple linear connector.

    Guoliang Lin, Hanlu Chu, Hanjiang Lai [CVPR22][codes]

  • 👍 (Sparsity) Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

    Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite [ICLR23]

  • 👀 Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation.

    Giung Nam, Hyungi Lee, Byeongho Heo, Juho Lee [ICML22][codes]

  • LCS: Learning Compressible Subspaces for Efficient, Adaptive, Real-Time Network Compression at Inference Time.

    Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari [WACV23][codes]

  • (OOD) Diverse Weight Averaging for Out-of-Distribution Generalization.

    Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord [NeurIPS22][codes]

  • Linear Connectivity Reveals Generalization Strategies.

    Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra [ICLR23][codes]

Interesting paper

  • Editing models with task arithmetic.

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi [ICLR23][codes]
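
The core operation is simple enough to state in code: a "task vector" is the difference between fine-tuned and pre-trained weights, and editing amounts to adding scaled task vectors to the pre-trained model. A sketch under that reading of the paper; the function name is ours.

```python
import torch

def edit_with_task_vectors(pretrained, finetuned_models, scale=1.0):
    """Add scaled task vectors (finetuned - pretrained) to a pre-trained model.
    scale > 0 moves toward the tasks (multi-task merging); scale < 0 negates a
    task vector, which Ilharco et al. use to make the model 'forget' a task."""
    edited = {}
    for k, w in pretrained.items():
        if w.is_floating_point():
            edited[k] = w + scale * sum(ft[k] - w for ft in finetuned_models)
        else:
            edited[k] = w            # leave integer buffers untouched
    return edited
```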

Tools