A Self-Curated Survey of the Papers
For the emerging field of model merging, I have mapped out the lines of inquiry across the main papers and taken detailed notes on the most important ones.
💡 I organize the work in this field into the following categories:
(1) Survey Papers and Summary Blogs:
- [Survey] [arXiv 2408.07666] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- [Survey overview] [Zhihu] Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- [A good explainer] [Zhihu] Model Merging: Rationale, Common Techniques, and Their Properties
- [A good technical blog] [Personal blog] Model Merging: A Survey
- [Other survey] [arXiv 2410.12927] SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
- [Other survey] [arXiv 2408.07057] A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
- [Other survey] [arXiv 2407.06089] Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
- [Other survey] [arXiv 2310.08184] Learn From Model Beyond Fine-Tuning: A Survey
(2) Toolkit / Benchmark / Evaluation:
- [Toolkit] [arXiv 2403.13257] Arcee's MergeKit: A Toolkit for Merging Large Language Models
- [Benchmark] [arXiv 2406.03280] FusionBench: A Comprehensive Benchmark of Deep Model Fusion
- [Evaluation] [arXiv 2409.18314] Realistic Evaluation of Model Merging for Compositional Generalization
- [Evaluation] [arXiv 2410.03617] What Matters for Model Merging at Scale?
- [Evaluation] [Comprehensive Comparison and Synergistic Application for Model Merging, MoE and Stacking] [arXiv 2410.05357] Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
- [Evaluation] [Unified Framework of Delta Parameter Editing] [arXiv 2410.13841] A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models (a minimal delta-editing sketch follows this list)
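The last item above views post-training as producing a delta δ = θ_post − θ_pre that can be edited (pruned, rescaled, quantized) before being re-applied to the base model. As one concrete instance, here is a minimal PyTorch sketch of DARE-style drop-and-rescale editing, the operation popularized by arXiv 2311.03099 and analyzed under this unified view; the function name `dare_edit` and the state-dict usage in the comments are my own illustration, not an API from either paper.

```python
import torch

def dare_edit(delta: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Drop-And-REscale (DARE): zero each delta entry with probability p,
    then rescale the survivors by 1/(1-p) so the expected update is unchanged."""
    keep = torch.bernoulli(torch.full_like(delta, 1.0 - p))
    return delta * keep / (1.0 - p)

# Hypothetical usage on one tensor from two checkpoints' state dicts:
# delta  = post_trained["lm_head.weight"] - pre_trained["lm_head.weight"]
# merged = pre_trained["lm_head.weight"] + dare_edit(delta, p=0.9)
```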
(3) Subspace-based Merging Methods (Sparse or Low-rank Subspace):
- [Paper] [arXiv 2412.12153] Revisiting Weight Averaging for Model Merging
- [Paper] [arXiv 2412.00081] Task Singular Vectors: Reducing Task Interference in Model Merging
- [Paper] [arXiv 2412.00054] Less is More: Efficient Model Merging with Binary Task Switch
- [Paper] [arXiv 2411.16815] FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts
- [Paper] [arXiv 2411.16139] Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics
- [Paper] [arXiv 2410.19735] Model merging with SVD to tie the Knots
- [Paper] [arXiv 2410.05583] NegMerge: Consensual Weight Negation for Strong Machine Unlearning
- [Paper] [arXiv 2410.02396] Parameter Competition Balancing for Model Merging
- [Paper] [arXiv 2408.13656] Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
- [Paper] [arXiv 2408.09485] Activated Parameter Locating via Causal Intervention for Model Merging
- [Paper] [arXiv 2406.11617] DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
- [Paper] [arXiv 2405.17461] EMR-Merging: Tuning-Free High-Performance Model Merging
- [Paper] [arXiv 2405.07813] Localizing Task Information for Improved Model Merging and Compression
- [Paper] [arXiv 2403.02799] DPPA: Pruning Method for Large Language Model to Model Merging
- [Paper] [arXiv 2311.03099] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
- [Paper] [arXiv 2306.01708] TIES-Merging: Resolving Interference When Merging Models (a minimal TIES-style sketch follows this list)
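Most entries in this category share the recipe of sparsifying task vectors and resolving sign conflicts before averaging. To make that concrete, below is a minimal PyTorch sketch of the TIES-Merging procedure from the last item: trim each task vector to its top-`density` fraction by magnitude, elect a per-coordinate sign, then average only the sign-consistent entries. The function name, the `density`/`lam` arguments, and the state-dict interface are my own illustrative choices, not the authors' reference implementation; it also assumes all parameters are floating-point tensors with identical keys across checkpoints.

```python
import torch

def ties_merge(base: dict, finetuned: list[dict],
               density: float = 0.2, lam: float = 1.0) -> dict:
    """TIES-style merging: trim, elect sign, disjoint mean (cf. arXiv 2306.01708)."""
    merged = {}
    for name, theta0 in base.items():
        # Task vectors: delta of each fine-tuned model relative to the base.
        deltas = [ft[name] - theta0 for ft in finetuned]
        trimmed = []
        for d in deltas:
            # Trim: keep only the top `density` fraction of entries by magnitude.
            k = max(1, int(density * d.numel()))
            thresh = d.abs().flatten().kthvalue(d.numel() - k + 1).values
            trimmed.append(torch.where(d.abs() >= thresh, d, torch.zeros_like(d)))
        stacked = torch.stack(trimmed)                 # (num_tasks, *param_shape)
        # Elect: per coordinate, keep the sign carrying the larger total mass.
        sign = torch.sign(stacked.sum(dim=0))
        agree = (torch.sign(stacked) == sign) & (stacked != 0)
        # Disjoint mean: average only the sign-consistent, non-zero entries.
        num = (stacked * agree).sum(dim=0)
        den = agree.sum(dim=0).clamp(min=1)
        merged[name] = theta0 + lam * num / den
    return merged
```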
(4) Routing-based Merging Methods:
- [Paper] [arXiv 2410.21804] Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
- [Paper] [arXiv 2408.10174] SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
- [Paper] [arXiv 2406.15479] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
- [Paper] [arXiv 2406.12034] Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
- [Paper] [arXiv 2406.09770] Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
- [Paper] [arXiv 2402.05859] Learning to Route Among Specialized Experts for Zero-Shot Generalization
- [Paper] [arXiv 2402.00433] Merging Multi-Task Models via Weight-Ensembling Mixture of Experts (see the sketch after this list)
- [Paper] [arXiv 2310.01334] Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
- [Paper] [arXiv 2306.03745] Soft Merging of Experts with Adaptive Routing
- [Paper] [arXiv 2212.05055] Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
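To illustrate what "routing-based merging" means mechanically, here is a minimal PyTorch sketch in the spirit of the weight-ensembling MoE entry above (arXiv 2402.00433): the merged layer keeps one shared backbone weight and routes each input to a convex combination of the experts' task vectors. The class name, the per-sample softmax router, and the `bias=True` assumption are all mine; the papers above differ in where the router sits and how it is trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightEnsemblingLinear(nn.Module):
    """One linear layer merged in a weight-ensembling MoE style: a lightweight
    router maps each input to mixing coefficients over the experts' task
    vectors, which are added on top of the shared pre-trained weight."""

    def __init__(self, base: nn.Linear, experts: list[nn.Linear]):
        super().__init__()
        self.register_buffer("weight0", base.weight.detach().clone())
        self.register_buffer("bias0", base.bias.detach().clone())  # assumes bias=True
        # Task vectors: each expert's weight relative to the shared base,
        # stacked into shape (num_experts, out_features, in_features).
        self.register_buffer("deltas", torch.stack(
            [e.weight.detach() - base.weight.detach() for e in experts]))
        # Trainable router: per-sample coefficients over the experts.
        self.router = nn.Linear(base.in_features, len(experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, in_features)
        coef = F.softmax(self.router(x), dim=-1)          # (batch, num_experts)
        out = F.linear(x, self.weight0, self.bias0)       # shared backbone output
        # Apply each expert's task vector functionally, then mix per sample.
        expert_out = torch.einsum("bi,eoi->beo", x, self.deltas)
        return out + torch.einsum("be,beo->bo", coef, expert_out)
```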