Source: FUTURE | 远见. Compiled by 闵青云.
Recently, the team of Song Mingli at Zhejiang University published their latest research, entitled "Federated selective aggregation for on-device knowledge amalgamation"¹, in Chip. Xie Donglin and Yu Ruonan are the co-first authors, and Song Mingli is the corresponding author. Chip is the world's only comprehensive international journal dedicated to chip-related research, and is one of the "three categories of high-quality papers" journals selected for China's national high-starting-point new journal program.

Thanks to the growth of the open-source community, a large number of pre-trained models are now publicly available online, embodying expertise from different domains. In practical application scenarios, however, our needs are often more complex. To enable flexible combination of model knowledge and improve the reusability of downstream models, knowledge amalgamation adopts a "multiple teachers, one student" training paradigm: it amalgamates the knowledge of several teachers and transfers it to a single student model, so that the student can handle customized, complex tasks. Traditional knowledge amalgamation methods²⁻⁴ all rest on an important assumption, namely that both the pre-trained models and the training data are directly accessible. Owing to privacy protection, commercial interests, and other concerns, however, models and data are not always made public⁵.
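As a rough, hypothetical sketch of the "multiple teachers, one student" idea (not the paper's actual algorithm), the soft predictions of several teachers can be averaged into a single target that one student is trained to imitate. The temperature value and the plain output averaging below are my own simplifying assumptions:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def amalgamation_target(teacher_logits, temperature=2.0):
    """Average the teachers' softened predictions into one soft target.
    (Simplification: real amalgamation methods also align internal
    features, not just output distributions.)"""
    return np.mean([softmax(l, temperature) for l in teacher_logits], axis=0)

def distillation_loss(student_logits, soft_target, temperature=2.0):
    """Cross-entropy between the student's softened output and the
    amalgamated target (equals the KL divergence up to a constant)."""
    student_probs = softmax(student_logits, temperature)
    return float(-np.sum(soft_target * np.log(student_probs + 1e-12)))

# Two toy "teachers", each confident on a different class of a 4-way task.
teacher_a = np.array([4.0, 0.5, 0.1, 0.1])  # expert on class 0
teacher_b = np.array([0.1, 0.2, 3.5, 0.3])  # expert on class 2
target = amalgamation_target([teacher_a, teacher_b])

# Logits that reproduce the target exactly attain the minimum loss.
perfect = distillation_loss(2.0 * np.log(target), target)
```

A student trained against `target` inherits both teachers' expertise from their outputs alone; the federated setting additionally requires that this exchange happen without exposing the teachers or their data.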

Fig. 1 | The federated knowledge amalgamation framework.
To this end, the research team investigated a new knowledge amalgamation problem, termed federated knowledge amalgamation, in which several pre-trained models and their training data are distributed across different server nodes; these resources are not made public and can only be used for local training. To solve this problem, the paper proposes a federated knowledge amalgamation method based on selective knowledge aggregation (Fig. 1). The method automatically selects suitable pre-trained models, learns the selected knowledge indirectly through proxy models, and aggregates it into the model for the target task, thereby customizing the target model. Specifically, the paper first proposes a saliency-based model selection strategy, then transfers knowledge between server nodes via proxy models, and finally uploads the knowledge to the central node for further aggregation (Fig. 2).
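The selection-plus-aggregation pipeline can be sketched under loud assumptions: the finite-difference "saliency" below is only a stand-in for the paper's saliency-based criterion, and the score-weighted parameter average is a generic FedAvg-style aggregation, not FedSA's actual update rule:

```python
import numpy as np

def saliency_score(model_fn, inputs, eps=1e-3):
    """Stand-in saliency: mean finite-difference sensitivity of a model's
    output to the target-task inputs. (Hypothetical proxy; the paper's
    actual selection criterion may differ.)"""
    inputs = np.asarray(inputs, dtype=float)
    base = model_fn(inputs)
    sens = []
    for j in range(inputs.shape[1]):
        bumped = inputs.copy()
        bumped[:, j] += eps
        sens.append(np.abs(model_fn(bumped) - base) / eps)
    return float(np.mean(sens))

def select_and_aggregate(proxy_params, scores, k=2):
    """Keep the k highest-scoring proxy models and aggregate their
    parameters with a score-weighted average (FedAvg-style)."""
    order = np.argsort(scores)[::-1][:k]
    weights = np.array([scores[i] for i in order])
    weights = weights / weights.sum()
    stacked = np.stack([proxy_params[i] for i in order])
    return order, np.tensordot(weights, stacked, axes=1)

# Three toy linear "teachers"; only the first two react to the inputs.
params = [np.array([2.0, 0.0, 0.0]),
          np.array([0.0, 1.5, 0.0]),
          np.array([0.1, 0.1, 0.1])]
x = np.random.default_rng(0).normal(size=(8, 3))  # target-task samples
scores = [saliency_score(lambda v, w=w: v @ w, x) for w in params]
selected, aggregated = select_and_aggregate(params, scores, k=2)
```

Because only the selected proxies' parameters travel to the central node, the raw teachers and their private data never leave their servers, which is the source of the communication savings the article describes.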

Fig. 2 | The federated knowledge amalgamation algorithm based on selective knowledge aggregation.
Theory and experiments show that, across different tasks and datasets, the proposed method consistently outperforms its competitors. Moreover, it significantly reduces the communication and computation overhead of knowledge amalgamation, making it better suited to deployment on edge-side devices.
References:
1. Xie, D. L. et al. Federated selective aggregation for on-device knowledge amalgamation. Chip 2, 100053 (2023).
2. Shen, C., Wang, X., Song, J., Sun, L. & Song, M. Amalgamating knowledge towards comprehensive classification. In Proceedings of the AAAI Conference on Artificial Intelligence, 3068–3075 (2019).
3. Ye, J. et al. Student becoming the master: knowledge amalgamation for joint scene parsing, depth estimation, and more. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2829–2838 (IEEE, 2019).
4. Zhang, H. et al. Knowledge amalgamation for object detection with transformers. IEEE Trans. Image Process. 32, 2093–2106 (2023).
5. Ye, J., Ji, Y., Wang, X., Gao, X. & Song, M. Data-free knowledge amalgamation via group-stack dual-GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12516–12525 (IEEE, 2020).
Paper link:
https://www.sciencedirect.com/science/article/pii/S2709472323000163

