Generative AI Advanced Fine-Tuning for LLMs

Issued by IBM

The badge earner understands instruction-tuning and the concept of large language models (LLMs) as policies. They know how to perform reward modeling using Hugging Face. They also understand reinforcement learning from human feedback (RLHF) and proximal policy optimization (PPO), and how to find an optimal solution to the direct preference optimization (DPO) problem.