
https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat
A fast, affordable, scalable, and open system framework for enabling an end-to-end Reinforcement Learning from Human Feedback (RLHF) training experience, to generate high-quality ChatGPT-style models at all scales.

Several open-source projects already target instruction tuning and RLHF, including:
https://github.com/databrickslabs/dolly
https://github.com/OptimalScale/LMFlow
https://github.com/CarperAI/trlx
https://github.com/huggingface/peft
DeepSpeed Chat Features
Quick Start
# Install DeepSpeed from source
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
pip install .

# Install DeepSpeed-Chat and its dependencies
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
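Before training, it is worth verifying that DeepSpeed and a CUDA-capable PyTorch build are visible from Python. A quick check (my own snippet, not part of the repo):

import deepspeed
import torch

# Confirm the installed DeepSpeed version and that a GPU is visible.
print("DeepSpeed:", deepspeed.__version__)
print("CUDA available:", torch.cuda.is_available())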
# Step 1: supervised fine-tuning (SFT) of the actor model
cd training/step1_supervised_finetuning/
# Run the training script
bash training_scripts/single_gpu/run_1.3b.sh
# Evaluate the model
bash evaluation_scripts/run_prompt.sh
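Conceptually, evaluating the SFT model means feeding it prompts and inspecting its completions. A minimal sketch with Hugging Face transformers, assuming the checkpoint path and the prompt below (both illustrative, not from the repo):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the checkpoint written by the training script (illustrative).
model_path = "./output"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# DeepSpeed Chat formats dialogue as "Human: ... Assistant: ...".
prompt = "Human: Explain what DeepSpeed is in one paragraph. Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))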
# Step 2: reward model fine-tuning
cd training/step2_reward_model_finetuning
# Run the training script
bash training_scripts/run_350m.sh
# Evaluate the model
bash evaluation_scripts/run_eval.sh
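The reward model is trained to score the human-preferred response higher than the rejected one. A minimal sketch of the pairwise ranking loss used for this kind of training, as in InstructGPT (Ouyang et al., 2022); the function and tensor names are mine, not DeepSpeed Chat's:

import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    # Maximize log sigmoid(r_chosen - r_rejected), i.e. push the score of
    # the human-preferred ("chosen") response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Scalar rewards for a batch of two (chosen, rejected) response pairs.
loss = pairwise_reward_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
print(loss)  # the second pair is mis-ranked, so the loss is well above zero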
# Step 3: RLHF fine-tuning with PPO
cd training/step3_rlhf_finetuning/
# Run the training script
bash training_scripts/single_gpu/run_1.3b.sh
Internally, the Step 3 script builds the RLHF pipeline from two APIs, DeepSpeedRLHFEngine and DeepSpeedPPOTrainer:

engine = DeepSpeedRLHFEngine(
    actor_model_name_or_path=args.actor_model_name_or_path,
    critic_model_name_or_path=args.critic_model_name_or_path,
    tokenizer=tokenizer,
    num_total_iters=num_total_iters,
    args=args)
trainer = DeepSpeedPPOTrainer(engine=engine, args=args)

for prompt_batch in prompt_train_dataloader:
    out = trainer.generate_experience(prompt_batch)
    actor_loss, critic_loss = trainer.train_rlhf(out)
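Inside train_rlhf, the actor is updated with a PPO objective. A minimal sketch of the standard clipped PPO actor loss, assuming log-probabilities and advantages have already been computed from the generated experience (this is the textbook form, not DeepSpeed Chat's exact implementation):

import torch

def ppo_actor_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the policy
    # that generated the experience.
    ratio = torch.exp(logprobs - old_logprobs)
    # Clipping keeps a single update from moving the policy too far.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()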
Once all three steps finish, you can chat with your trained model:

python chat.py --path ${PATH-to-your-actor-model}

Figure 2: Step 3 throughput compared with two other system frameworks (Colossal AI's Coati and Hugging Face DDP) for accelerating RLHF training on a single NVIDIA A100-40G commodity GPU. No icon indicates an out-of-memory (OOM) scenario.

Figure 3: End-to-end training throughput comparison for Step 3 of the training pipeline (the most time-consuming portion) across different model sizes on a single DGX node with 8 NVIDIA A100-40G GPUs. No icon indicates an OOM scenario.

Figure 4: Superior generation-phase acceleration from DeepSpeed Chat's Hybrid Engine: time-per-sequence breakdown for training an OPT-1.3B actor model + OPT-350M reward model on a single DGX node with 8 A100-40G GPUs.
Supported Models
Author: 致Great. Source: blog.csdn.net/yanqianglifei/article/details/130141730

