SiM: Training-Free Task Classification for Multi-Task Model Merging

Researchers have introduced SiM, a novel method for merging multiple task-specific models into a single multi-task model without requiring additional training or task ID information at inference time. SiM formulates routing as a training-free task classification problem, utilizing singular value decomposition (SVD) to approximate task manifolds and score tasks based on projection residuals.


Why it matters

This approach addresses a key challenge in multi-task model merging, where combined models often underperform individual experts due to parameter interference. By eliminating the need for extra training or task IDs, SiM offers a more efficient and accessible way for developers to create versatile models that retain expert-level performance across various tasks.

What changed

The advent of foundation models and the pre-training-finetuning paradigm has spurred efforts to consolidate multiple task-specific models into a single, unified multi-task model. However, these merged models persistently underperform their individual expert counterparts, a gap often attributed to parameter interference. Existing solutions, such as dynamic model merging with routing mechanisms, typically require either extensive retraining on large labeled datasets or knowledge of task IDs during inference.

This new research proposes SiM (Singular Value Decomposition-based Manifold approximation), a method that aims to close the performance gap between merged and individual expert models without additional training or access to task IDs at inference time. SiM reframes routing as a training-free, per-input task classification problem. It builds an SVD-based low-rank approximation of each task's manifold, scores each task by the projection residual of a test input's features onto that manifold, and routes the input accordingly.
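The residual-based routing described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`fit_task_basis`, `projection_residuals`, `route`), the rank, the feature dimension, and the synthetic data are all assumptions made for illustration, and routing to the task with the smallest residual is an inferred reading of "scores tasks based on projection residuals".

```python
from typing import Sequence

import numpy as np


def fit_task_basis(features: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis (d x rank) for one task's support features.

    `features` has shape (n_support, d); each row is a backbone feature vector.
    """
    # The leading right singular vectors span the dominant directions of the rows.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:rank].T


def projection_residuals(x: np.ndarray, bases: Sequence[np.ndarray]) -> np.ndarray:
    """Norm of the part of x that each task subspace fails to reconstruct."""
    return np.array([np.linalg.norm(x - b @ (b.T @ x)) for b in bases])


def route(x: np.ndarray, bases: Sequence[np.ndarray]) -> int:
    """Assumed decision rule: pick the task with the smallest residual."""
    return int(np.argmin(projection_residuals(x, bases)))


# Synthetic demo (assumption): two tasks whose features lie near different
# 3-dimensional subspaces of a 16-dimensional feature space.
rng = np.random.default_rng(0)
true_subspaces = [np.linalg.qr(rng.normal(size=(16, 3)))[0] for _ in range(2)]
support_sets = [
    rng.normal(size=(32, 3)) @ q.T + 0.01 * rng.normal(size=(32, 16))
    for q in true_subspaces
]
task_bases = [fit_task_basis(s, rank=3) for s in support_sets]

test_input = true_subspaces[1] @ np.ones(3)  # a feature vector from task 1
print("residuals:", projection_residuals(test_input, task_bases))
print("routed to task:", route(test_input, task_bases))
```

Nothing is trained here: the only per-task state is a small basis matrix computed once from the support features, which matches the training-free framing in the summary.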

Crucially, the task manifolds can be pre-computed offline from a pre-trained backbone model. This pre-computation requires only a small per-task support set (e.g., 32 examples per task) and is performed prior to the merging process. This eliminates the need for any router training or the availability of data during the merging phase itself. Furthermore, SiM integrates seamlessly with existing subspace- or mask-based merging techniques that represent task-experts using lightweight compressed task vectors, thereby avoiding the storage overhead of full expert parameters.
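To make the idea of lightweight compressed task vectors concrete, here is a minimal sketch of one mask-based scheme: a single merged task vector is stored alongside one boolean mask per task, and the routed task's mask selects which coordinates to apply. The masking rule and the names `compress_task_vectors` and `weights_for_task` are hypothetical and chosen for illustration. The excerpt does not specify which merging technique SiM is paired with, so this should not be read as the paper's method.

```python
import numpy as np


def compress_task_vectors(pretrained: np.ndarray, experts: list):
    """Store one merged task vector plus a boolean mask per task.

    A task vector is the difference between expert and pre-trained weights.
    """
    task_vectors = [expert - pretrained for expert in experts]
    merged = np.sum(task_vectors, axis=0)
    # Hypothetical rule: keep a coordinate for a task when that task's own
    # update is at least as large as the combined update of all other tasks.
    masks = [np.abs(tv) >= np.abs(merged - tv) for tv in task_vectors]
    return merged, masks


def weights_for_task(pretrained, merged, masks, task_id: int) -> np.ndarray:
    """Approximate one expert from the shared merged vector and its mask."""
    return pretrained + masks[task_id] * merged


# Toy example with flattened weights (assumption): the two experts modify
# disjoint coordinates, so the masked reconstruction is exact.
pretrained = np.zeros(4)
experts = [np.array([2.0, 0.0, 0.0, 0.0]), np.array([0.0, 0.0, 3.0, 0.0])]
merged, masks = compress_task_vectors(pretrained, experts)

routed_task = 0  # in practice this index would come from the router
print(weights_for_task(pretrained, merged, masks, routed_task))
```

Under this scheme the per-task storage is one bit per parameter rather than a full copy of the expert weights. When experts update overlapping coordinates, the reconstruction is only approximate.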

Why it matters for builders

For AI builders, SiM could make multi-task models cheaper to build and closer in quality to dedicated experts. Merging expert models without further training or explicit task identification at runtime simplifies the development pipeline. Developers may be able to get merged-model performance nearer to that of specialized models without the cost of additional data collection and training cycles. The integration with compressed task vectors also suggests more memory-efficient deployments.

Practical impact

Experiments across computer vision and natural language processing benchmarks, run under task-unknown inference conditions, reportedly show that SiM substantially improves the performance of merged models and consistently narrows the gap between the merged model and individual task experts. This suggests that developers could use SiM to build merged models that handle a wider range of tasks with accuracy closer to that of dedicated experts.

Caveats and source limits

The findings presented are based on research published on arXiv and have not yet undergone peer review or been validated through independent third-party benchmarks. The specific performance gains and the exact size of the performance gap reduction are not quantified with precise numbers in the provided excerpt. The excerpt mentions experiments across computer vision and natural language processing, but details on the specific datasets, architectures, or the exact nature of the "task-unknown inference" scenario are not elaborated upon. The proposed method's scalability and performance on extremely large numbers of tasks or highly complex tasks are also not detailed.
