Task Vector Quantization for Memory-Efficient Model Merging

Abstract

Model merging enables efficient multi-task models by combining task-specific fine-tuned checkpoints. However, storing multiple task-specific checkpoints requires significant memory, limiting scalability and restricting model merging to larger models and diverse tasks. In this paper, we propose quantizing task vectors (i.e., the difference between pre-trained and fine-tuned checkpoints) instead of quantizing fine-tuned checkpoints. We observe that task vectors exhibit a narrow weight range, enabling low precision quantization (≤ 4 bit) within existing task vector merging frameworks. To further mitigate quantization errors within ultra-low bit precision (e.g., 2 bit), we introduce Residual Task Vector Quantization, which decomposes the task vector into a base vector and offset component. We allocate bits based on quantization sensitivity, ensuring precision while minimizing error within a memory budget. Experiments on image classification and dense prediction show our method maintains or improves model merging performance while using only 8% of the memory required for full-precision checkpoints.

Motivation

# Tasks	Baseline	TVQ (ours)			RTVQ (ours)
# Tasks	FP32	INT8	INT4	INT2	B3O2
8	9.1 GB	2.3 GB	1.1 GB	0.6 GB	0.7 GB
14	16.0 GB	4.0 GB	2.0 GB	1.0 GB	1.2 GB
20	22.8 GB	5.7 GB	2.9 GB	1.4 GB	1.7 GB

Model merging aims to combine multiple well-trained models into a single set of parameters. However, storing the checkpoints required to combine these models entails significant memory overhead. For instance, a ViT-L/14 model needs 1.14 GB per fine-tuned checkpoint, totaling 22.8 GB for 20 tasks. In resource-constrained environments like an edge device, such high memory demands for checkpoints hinder scaling to larger models and more tasks. Our approach overcomes this limitation, achieving up to a 13x reduction in storage costs while maintaining the original model performance.

TL;DR takeaways

Weight distribution for Q error – full model

Weight distribution for Q error – segmentation layers

We observe that task vectors (i.e., the difference between fine-tuned and pretrained checkpoints) exhibit a weight range that is an order of magnitude narrower than that of fine-tuned weights. Since the upper bound of quantization is determined by the range of weight value, quantizing task vectors results in a smaller quantization error compared to quantizing full fine-tuned checkpoints.

Based on this observation, we propose to quantize task vectors rather than the entire set of fine-tuned weights, which enables effective low-bit precision quantization (e.g., 4-bit).

To further reduce quantization error in extremely low bit precision, we introduce Residual Task Vector Quantization (RTVQ). The key idea is to decompose the multiple task vectors into a shared base vector and a several offset vectors. By assigning higher precision to the base vector and lower precision to the offset vectors, it reduces quantization error while keeping memory overhead minimal, as only one base vector is shared across all tasks.

A key advantage of our approach is that it operates soley on the checkpoints. As a result, our quantization method can be seamlessly integrated into existing model merging frameworks without any modifications. Extensive experiments on image classification and dense vision tasks validate its effectiveness.

Experimental Results

To assess the impact of quantization on model merging, we first quantize fine-tuned checkpoints (FQ), task vectors (TVQ), and residual task vectors (RTVQ), then apply these weights to various merging methods. We compare all methods with their full-precision (FP32) counterparts across multiple tasks. Our goal is not to maximize absolute performance but to show that even with highly compact quantization, the model remains effective across multiple tasks.

Merging 8 Classification Tasks

ViT-L/14

Method	Baseline	FQ		TVQ (ours)				RTVQ (ours)
Method	FP32	INT8	INT4	INT8	INT4	INT3	INT2	RTVQ (ours)
Individual	94.2	94.2 (0.0)	4.3 (-89.9)	94.1 (-0.1)	94.2 (0.0)	94.2 (0.0)	90.9 (-3.3)	–
Task arithmetic	84.3	84.1 (-0.2)	4.5 (-79.8)	84.3 (0.0)	84.4 (0.1)	84.8 (0.5)	77.9 (-6.4)	84.8 (0.5)
Ties merging	84.5	78.3 (-6.2)	4.2 (-80.3)	84.5 (0.0)	84.6 (0.1)	85.3 (0.8)	78.0 (-6.5)	81.6 (-2.9)
LiNeS	86.9	86.4 (-0.5)	5.4 (-81.5)	86.9 (0.0)	86.9 (0.0)	87.7 (0.8)	81.8 (-5.1)	87.7 (0.8)
Consensus TA	86.6	84.6 (-2.0)	4.2 (-82.4)	86.6 (0.0)	86.6 (0.0)	87.1 (0.5)	79.0 (-7.6)	87.3 (0.7)
AdaMerging	90.8	90.8 (0.0)	4.8 (-86.0)	90.9 (0.1)	90.9 (0.1)	91.0 (0.2)	89.4 (-1.4)	90.9 (0.1)
EMR-Merging	93.5	92.8 (-0.7)	4.4 (-89.1)	93.5 (0.0)	93.9 (0.4)	93.9 (0.4)	87.6 (-5.9)	90.3 (-3.2)

ViT-B/32

Method	Baseline	FQ		TVQ (ours)				RTVQ (ours)
Method	FP32	INT8	INT4	INT8	INT4	INT3	INT2	RTVQ (ours)
Individual	90.5	90.4 (-0.1)	4.2 (-86.3)	90.5 (0.0)	90.5 (0.0)	90.7 (0.2)	83.5 (-7.0)	–
Task arithmetic	69.2	68.1 (-1.1)	4.2 (-65.0)	69.0 (-0.2)	69.1 (-0.1)	71.2 (2.0)	62.1 (-7.1)	70.2 (1.0)
Ties merging	72.9	64.8 (-8.1)	4.2 (-68.7)	72.7 (-0.2)	72.0 (-0.9)	73.6 (0.7)	62.6 (-10.3)	72.7 (-0.2)
LiNeS	74.1	73.9 (-0.2)	4.3 (-69.8)	74.2 (0.1)	74.2 (0.1)	74.7 (0.6)	60.7 (-13.4)	74.2 (0.1)
Consensus TA	74.9	70.6 (-4.3)	3.7 (-71.2)	74.9 (0.0)	74.9 (0.0)	74.8 (-0.1)	58.5 (-16.4)	72.7 (-2.2)
AdaMerging	81.8	81.6 (-0.2)	4.5 (-77.3)	81.6 (-0.2)	81.5 (-0.3)	82.0 (0.2)	78.1 (-3.7)	82.8 (1.0)
EMR-Merging	88.3	88.7 (0.4)	3.9 (-84.4)	88.4 (0.1)	89.8 (1.5)	90.0 (1.7)	77.2 (-11.1)	83.2 (-5.1)

Directly quantizing fine-tuned checkpoints maintains acceptable performance at 8-bit precision but suffers a significant drop in accuracy at 4-bits. In contrast, TVQ show much better stability. Even at 4-bit and 3-bit precision, model accuracy stays close to FP32. However, at 2-bit, performance drops sharply, indicating that excessive compression introduces substantial quantization noise. RTVQ overcomes these limitations with a robust approach. It decomposes each task vector into a shared base vector (3 bits) and a residual vector (2 bits), requiring about 2.375 bits per task. This vector decomposition reduces the performance drop seen in 2-bit TVQ while preserving most of the accuracy benefits of higher-bit quantization.

Merging 14 and 20 Classification Tasks

Scaling to 14 and 20 classification tasks

With 14 and 20 tasks, storing full-precision fine-tuned checkpoints becomes more impractical and highlights the need for effective quantization. Note that, as the base vector is globally shared among tasks, RTVQ scales favorably by effectively reducing the per-task bit requirement as task numbers increase.

Merging Dense Prediction Tasks

Method	Segmentation ↑				Depth Estimation ↓				Normal Estimation ↓
Method	FP32	TVQ-INT4	TVQ-INT2	RTVQ	FP32	TVQ-INT4	TVQ-INT2	RTVQ	FP32	TVQ-INT4	TVQ-INT2	RTVQ
Individual	52.0	52.0 (0.0)	37.7 (-14.3)	–	41.5	41.4 (-0.1)	62.5 (+21.0)	–	24.2	24.2 (0.0)	34.2 (+10.0)	–
Task arithmetic	31.6	31.5 (-0.1)	36.4 (+4.8)	36.1 (+4.5)	24.0	24.0 (0.0)	26.2 (+2.2)	24.6 (+0.6)	30.6	30.6 (0.0)	36.1 (+5.5)	32.6 (+2.0)
Ties merging	39.9	40.0 (+0.1)	36.1 (-3.8)	37.0 (-2.9)	27.3	27.2 (-0.1)	26.5 (-0.8)	24.6 (-2.7)	36.2	36.2 (0.0)	37.0 (+0.8)	32.6 (-3.6)
MagMax	24.7	25.4 (+0.7)	29.9 (+5.2)	29.4 (+4.7)	23.9	24.2 (+0.3)	25.6 (+1.7)	24.7 (+0.8)	30.3	30.0 (-0.3)	32.2 (+1.9)	31.1 (+0.8)
Breadcrumbs	34.1	34.3 (+0.2)	32.2 (-1.9)	34.0 (-0.1)	27.2	27.2 (0.0)	28.4 (+1.2)	27.7 (+0.5)	36.9	37.0 (+0.1)	40.6 (+3.7)	38.3 (+1.4)
EMR-Merging	41.5	44.8 (+3.3)	21.3 (-20.2)	34.1 (-7.4)	19.4	18.8 (-0.6)	25.5 (+6.1)	22.1 (+2.7)	26.5	26.6 (+0.1)	45.2 (+18.7)	35.0 (+8.5)

Key Observations

Quantized task vectors deviate from their original directions in the loss landscape, sometimes shifting toward directions more beneficial for other tasks.

Our quantization process naturally prunes the task vector’s less impactful parameters by mapping small-magnitude weights to zero, leading to a sparsity of 56.7%.

BibTeX

@article{kim2025task,
  title={Task vector quantization for memory-efficient model merging},
  author={Kim, Youngeun and Lee, Seunghwan and Jung, Aecheon and Ryu, Bogon and Hong, Sungeun},
  journal={arXiv preprint arXiv:2503.06921},
  year={2025}
}