Self-Compiled Paper Survey

For the emerging field of model merging, I have mapped out the survey logic of the main current papers and taken detailed notes on the key ones.

💡 I have organized and categorized the work in this area as follows:

(1) Survey Papers and Summary Blog Posts:

  • [Survey Paper] [arXiv 2408.07666] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
  • [Survey Paper Overview] [Zhihu] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
  • [A Good Explanation] [Zhihu] 模型融合(Model Merging):合理性、常见技术及其特性
  • [A Good Technical Blog] [Personal Blog] Model Merging: A Survey
  • [Other Survey Paper] [arXiv 2410.12927] SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
  • [Other Survey Paper] [arXiv 2408.07057] A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
  • [Other Survey Paper] [arXiv 2407.06089] Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
  • [Other Survey Paper] [arXiv 2310.08184] Learn From Model Beyond Fine-Tuning: A Survey

(2) Toolkit / Benchmark / Evaluation:

  • [Toolkit] [arXiv 2403.13257] Arcee's MergeKit: A Toolkit for Merging Large Language Models
  • [Benchmark] [arXiv 2406.03280] FusionBench: A Comprehensive Benchmark of Deep Model Fusion
  • [Evaluation] [arXiv 2409.18314] Realistic Evaluation of Model Merging for Compositional Generalization
  • [Evaluation] [arXiv 2410.03617] What Matters for Model Merging at Scale?
  • [Evaluation] [Comprehensive Comparison and Synergistic Application for Model Merging, MoE and Stacking] [arXiv 2410.05357] Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
  • [Evaluation] [Unified Framework of Delta Parameter Editing] [arXiv 2410.13841] A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

(3) Subspace-based Merging Methods (Sparse or Low-rank Subspace; a minimal code sketch follows this list):

  • [Paper] [arXiv 2412.12153] Revisiting Weight Averaging for Model Merging
  • [Paper] [arXiv 2412.00081] Task Singular Vectors: Reducing Task Interference in Model Merging
  • [Paper] [arXiv 2412.00054] Less is More: Efficient Model Merging with Binary Task Switch
  • [Paper] [arXiv 2411.16815] FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts
  • [Paper] [arXiv 2411.16139] Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics
  • [Paper] [arXiv 2410.19735] Model merging with SVD to tie the KnOTS
  • [Paper] [arXiv 2410.05583] NegMerge: Consensual Weight Negation for Strong Machine Unlearning
  • [Paper] [arXiv 2410.02396] Parameter Competition Balancing for Model Merging
  • [Paper] [arXiv 2408.13656] Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
  • [Paper] [arXiv 2408.09485] Activated Parameter Locating via Causal Intervention for Model Merging
  • [Paper] [arXiv 2406.11617] DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
  • [Paper] [arXiv 2405.17461] EMR-Merging: Tuning-Free High-Performance Model Merging
  • [Paper] [arXiv 2405.07813] Localizing Task Information for Improved Model Merging and Compression
  • [Paper] [arXiv 2403.02799] DPPA: Pruning Method for Large Language Model to Model Merging
  • [Paper] [arXiv 2311.03099] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  • [Paper] [arXiv 2306.01708] TIES-Merging: Resolving Interference When Merging Models
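
Most entries in this category build on the task-vector view: each fine-tuned model is represented as a delta from the shared pre-trained weights, and merging amounts to combining sparsified or otherwise disentangled deltas. Below is a minimal PyTorch sketch of task arithmetic with TIES-style trim / elect-sign / disjoint-merge steps; the function name, tensor layout, and hyperparameters (`density`, `lam`) are illustrative assumptions, not any paper's reference implementation.

```python
import torch

def ties_merge(base: dict, finetuned: list, density: float = 0.2, lam: float = 1.0) -> dict:
    """TIES-style merging sketch: trim each task vector to its largest-magnitude
    entries, elect a sign per parameter, then average the surviving entries that
    agree with the elected sign. Purely illustrative, per-tensor, no tuning."""
    merged = {}
    for name, theta0 in base.items():
        # Task vectors: fine-tuned weights minus the shared pre-trained weights.
        tvs = torch.stack([ft[name] - theta0 for ft in finetuned])
        # Trim: keep only the top `density` fraction of entries by magnitude.
        k = max(1, int(density * theta0.numel()))
        trimmed = torch.zeros_like(tvs)
        for i, tv in enumerate(tvs):
            thresh = tv.abs().flatten().topk(k).values.min()
            trimmed[i] = torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv))
        # Elect: pick the dominant sign of each parameter across experts.
        sign = torch.sign(trimmed.sum(dim=0))
        # Disjoint merge: average only the entries whose sign matches the elected one.
        mask = (torch.sign(trimmed) == sign) & (trimmed != 0)
        num = (trimmed * mask).sum(dim=0)
        den = mask.sum(dim=0).clamp(min=1)
        merged[name] = theta0 + lam * num / den
    return merged
```

Merging two fine-tuned checkpoints would then look like `ties_merge(base_sd, [sd_a, sd_b], density=0.2)`, where each argument is a `state_dict`-style mapping from parameter names to tensors. DARE ([arXiv 2311.03099] above) replaces the magnitude-based trimming with random drop-and-rescale of the delta parameters while keeping the same base-plus-combined-deltas structure.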

(4) Routing-based Merging Methods (a minimal code sketch follows this list):

  • [Paper] [arXiv 2410.21804] Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
  • [Paper] [arXiv 2408.10174] SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
  • [Paper] [arXiv 2406.15479] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
  • [Paper] [arXiv 2406.12034] Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
  • [Paper] [arXiv 2406.09770] Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
  • [Paper] [arXiv 2402.05859] Learning to Route Among Specialized Experts for Zero-Shot Generalization
  • [Paper] [arXiv 2402.00433] Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
  • [Paper] [arXiv 2310.01334] Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
  • [Paper] [arXiv 2306.03745] Soft Merging of Experts with Adaptive Routing
  • [Paper] [arXiv 2212.05055] Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
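
In contrast to the static merges in category (3), routing-based methods keep (possibly compressed) per-task experts and decide per input how to mix them. The sketch below is a rough illustration in the spirit of weight-ensembling MoE rather than any specific paper's method: a tiny router produces mixing coefficients from the input, and the effective weight of one linear layer is the frozen base weight plus a routed combination of task-vector deltas. The class name, shapes, and router design are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedLinear(nn.Module):
    """Weight-ensembling MoE sketch: the effective weight of one linear layer is
    the frozen pre-trained weight plus an input-dependent mixture of task-vector
    deltas (one delta per fine-tuned expert). Purely illustrative."""

    def __init__(self, base: nn.Linear, expert_deltas: list):
        super().__init__()
        self.base = base                               # frozen pre-trained layer
        # Stack expert deltas into [n_experts, out_features, in_features].
        self.register_buffer("deltas", torch.stack(expert_deltas))
        self.router = nn.Linear(base.in_features, len(expert_deltas))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq, in_features]; route on the mean token representation.
        coeff = F.softmax(self.router(x.mean(dim=1)), dim=-1)    # [batch, n_experts]
        # Input-dependent weight: W(x) = W_base + sum_i coeff_i * delta_i
        w = self.base.weight + torch.einsum("be,eoi->boi", coeff, self.deltas)
        return torch.einsum("bsi,boi->bso", x, w) + self.base.bias

# Toy usage with random "task vectors" standing in for fine-tuned-minus-base weights.
base = nn.Linear(16, 32)
deltas = [torch.randn(32, 16) * 0.01 for _ in range(3)]
layer = RoutedLinear(base, deltas)
out = layer(torch.randn(4, 10, 16))                              # [4, 10, 32]
```

Real routing-based methods differ mainly in where the router sits (per layer, per module, or per model), how the experts are compressed (e.g., low-rank or sparse deltas), and whether the routing is learned or zero-shot, which is the axis along which the papers above vary.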