Rethinking Cross-task Performance
We hypothesize that a model's cross-task performance closely relates to its merging performance. In a preliminary study on 20 vision tasks with ViT-B/32, we observe a strong positive correlation (Pearson r = 0.863, p < 0.001) between average cross-task performance and average merging performance.
Concretely, given a model pair (A, B), we evaluate: (1) cross-task performance by attaching A's encoder to B's classifier and evaluating on B's task, and (2) merging performance by weight-averaging the two encoders and evaluating the merged encoder on B's task.
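The two evaluation protocols above can be sketched as follows. This is a minimal illustration with hypothetical linear encoders and classifiers standing in for the ViT-B/32 models; the function and variable names are placeholders, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_model(rng, d_in=16, d_enc=8, n_classes=4):
    """Hypothetical linear model: an encoder matrix and a task-specific classifier matrix."""
    return {
        "encoder": rng.normal(size=(d_in, d_enc)),
        "classifier": rng.normal(size=(d_enc, n_classes)),
    }

model_a = make_model(rng)  # trained on task A (simulated here by random weights)
model_b = make_model(rng)  # trained on task B
x = rng.normal(size=(2, 16))  # toy batch standing in for task B's data

# (1) Cross-task evaluation: attach A's encoder to B's classifier
# and evaluate on B's task.
cross_logits = (x @ model_a["encoder"]) @ model_b["classifier"]

# (2) Merging evaluation: weight-average the two encoders, then
# evaluate the merged encoder with B's classifier on B's task.
merged_encoder = 0.5 * (model_a["encoder"] + model_b["encoder"])
merge_logits = (x @ merged_encoder) @ model_b["classifier"]

print(cross_logits.shape, merge_logits.shape)  # both (2, 4): one logit row per example
```

In a real run, each set of logits would be scored against task B's labels to obtain the cross-task and merging accuracies that are correlated in the preliminary study.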