Self-Compiled Paper Survey

For the emerging field of model merging, I have mapped out the survey logic of the main current papers and taken detailed notes on the key ones.

💡 I have organized and categorized the work in this area as follows:

(1) Survey Papers and Summary Blog Posts:

  • [Survey Paper] [arXiv 2408.07666] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
  • [Survey Paper Overview] [Zhihu] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
  • [A Good Explanation] [Zhihu] 模型融合(Model Merging):合理性、常见技术及其特性
  • [A Good Technical Blog] [Personal Blog] Model Merging: A Survey
  • [Other Survey Paper] [arXiv 2410.12927] SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
  • [Other Survey Paper] [arXiv 2408.07057] A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
  • [Other Survey Paper] [arXiv 2407.06089] Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
  • [Other Survey Paper] [arXiv 2310.08184] Learn From Model Beyond Fine-Tuning: A Survey

(2) Toolkit / Benchmark / Evaluation:

  • [Toolkit] [arXiv 2403.13257] Arcee's MergeKit: A Toolkit for Merging Large Language Models
  • [Benchmark] [arXiv 2406.03280] FusionBench: A Comprehensive Benchmark of Deep Model Fusion
  • [Evaluation] [arXiv 2409.18314] Realistic Evaluation of Model Merging for Compositional Generalization
  • [Evaluation] [arXiv 2410.03617] What Matters for Model Merging at Scale?
  • [Evaluation] [Comprehensive Comparison and Synergistic Application for Model Merging, MoE and Stacking] [arXiv 2410.05357] Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
  • [Evaluation] [Unified Framework of Delta Parameter Editing] [arXiv 2410.13841] A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

(3) Subspace-based Merging Methods (Sparse or Low-rank Subspace; a minimal code sketch follows this list):

  • [Paper] [arXiv 2412.12153] Revisiting Weight Averaging for Model Merging
  • [Paper] [arXiv 2412.00081] Task Singular Vectors: Reducing Task Interference in Model Merging
  • [Paper] [arXiv 2412.00054] Less is More: Efficient Model Merging with Binary Task Switch
  • [Paper] [arXiv 2411.16815] FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts
  • [Paper] [arXiv 2411.16139] Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics
  • [Paper] [arXiv 2410.19735] Model merging with SVD to tie the KnOTS
  • [Paper] [arXiv 2410.05583] NegMerge: Consensual Weight Negation for Strong Machine Unlearning
  • [Paper] [arXiv 2410.02396] Parameter Competition Balancing for Model Merging
  • [Paper] [arXiv 2408.13656] Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
  • [Paper] [arXiv 2408.09485] Activated Parameter Locating via Causal Intervention for Model Merging
  • [Paper] [arXiv 2406.11617] DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
  • [Paper] [arXiv 2405.17461] EMR-Merging: Tuning-Free High-Performance Model Merging
  • [Paper] [arXiv 2405.07813] Localizing Task Information for Improved Model Merging and Compression
  • [Paper] [arXiv 2403.02799] DPPA: Pruning Method for Large Language Model to Model Merging
  • [Paper] [arXiv 2311.03099] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  • [Paper] [arXiv 2306.01708] TIES-Merging: Resolving Interference When Merging Models
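
Most entries in this category build on the task-vector view: each fine-tuned model is represented as a delta from the shared pre-trained weights, and merging amounts to combining sparsified or otherwise disentangled deltas. Below is a minimal PyTorch sketch of task arithmetic with TIES-style trim / elect-sign / disjoint-merge steps; the function name, tensor layout, and hyperparameters (`density`, `lam`) are illustrative assumptions, not any paper's reference implementation.

```python
import torch

def ties_merge(base: dict, finetuned: list, density: float = 0.2, lam: float = 1.0) -> dict:
    """TIES-style merging sketch: trim each task vector to its largest-magnitude
    entries, elect a sign per parameter, then average the surviving entries that
    agree with the elected sign. Purely illustrative, per-tensor, no tuning."""
    merged = {}
    for name, theta0 in base.items():
        # Task vectors: fine-tuned weights minus the shared pre-trained weights.
        tvs = torch.stack([ft[name] - theta0 for ft in finetuned])
        # Trim: keep only the top `density` fraction of entries by magnitude.
        k = max(1, int(density * theta0.numel()))
        trimmed = torch.zeros_like(tvs)
        for i, tv in enumerate(tvs):
            thresh = tv.abs().flatten().topk(k).values.min()
            trimmed[i] = torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv))
        # Elect: pick the dominant sign of each parameter across experts.
        sign = torch.sign(trimmed.sum(dim=0))
        # Disjoint merge: average only the entries whose sign matches the elected one.
        mask = (torch.sign(trimmed) == sign) & (trimmed != 0)
        num = (trimmed * mask).sum(dim=0)
        den = mask.sum(dim=0).clamp(min=1)
        merged[name] = theta0 + lam * num / den
    return merged
```

Merging two fine-tuned checkpoints would then look like `ties_merge(base_sd, [sd_a, sd_b], density=0.2)`, where each argument is a `state_dict`-style mapping from parameter names to tensors. DARE ([arXiv 2311.03099] above) replaces the magnitude-based trimming with random drop-and-rescale of the delta parameters while keeping the same base-plus-combined-deltas structure.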

(4) Routing-based Merging Methods (a minimal code sketch follows this list):

  • [Paper] [arXiv 2410.21804] Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
  • [Paper] [arXiv 2408.10174] SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
  • [Paper] [arXiv 2406.15479] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
  • [Paper] [arXiv 2406.12034] Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
  • [Paper] [arXiv 2406.09770] Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
  • [Paper] [arXiv 2402.05859] Learning to Route Among Specialized Experts for Zero-Shot Generalization
  • [Paper] [arXiv 2402.00433] Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
  • [Paper] [arXiv 2310.01334] Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
  • [Paper] [arXiv 2306.03745] Soft Merging of Experts with Adaptive Routing
  • [Paper] [arXiv 2212.05055] Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
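
In contrast to the static merges in category (3), routing-based methods keep (possibly compressed) per-task experts and decide per input how to mix them. The sketch below is a rough illustration in the spirit of weight-ensembling MoE rather than any specific paper's method: a tiny router produces mixing coefficients from the input, and the effective weight of one linear layer is the frozen base weight plus a routed combination of task-vector deltas. The class name, shapes, and router design are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedLinear(nn.Module):
    """Weight-ensembling MoE sketch: the effective weight of one linear layer is
    the frozen pre-trained weight plus an input-dependent mixture of task-vector
    deltas (one delta per fine-tuned expert). Purely illustrative."""

    def __init__(self, base: nn.Linear, expert_deltas: list):
        super().__init__()
        self.base = base                               # frozen pre-trained layer
        # Stack expert deltas into [n_experts, out_features, in_features].
        self.register_buffer("deltas", torch.stack(expert_deltas))
        self.router = nn.Linear(base.in_features, len(expert_deltas))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq, in_features]; route on the mean token representation.
        coeff = F.softmax(self.router(x.mean(dim=1)), dim=-1)    # [batch, n_experts]
        # Input-dependent weight: W(x) = W_base + sum_i coeff_i * delta_i
        w = self.base.weight + torch.einsum("be,eoi->boi", coeff, self.deltas)
        return torch.einsum("bsi,boi->bso", x, w) + self.base.bias

# Toy usage with random "task vectors" standing in for fine-tuned-minus-base weights.
base = nn.Linear(16, 32)
deltas = [torch.randn(32, 16) * 0.01 for _ in range(3)]
layer = RoutedLinear(base, deltas)
out = layer(torch.randn(4, 10, 16))                              # [4, 10, 32]
```

Real routing-based methods differ mainly in where the router sits (per layer, per module, or per model), how the experts are compressed (e.g., low-rank or sparse deltas), and whether the routing is learned or zero-shot, which is the axis along which the papers above vary.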