Replies: 5 comments
-
I came across the OIG dataset, a large open-source instruction dataset of approximately 43M instructions. I believe it could be a valuable resource for anyone here working on chatbot technology and related projects. The dataset was released by LAION.ai together with its volunteers, Ontocord, Together, and other members of the open-source community. The purpose of the release is to create equal access to chatbot technology and encourage improvements from contributors.
-
As we work towards training the RLHF LLaMA model, I propose that we stack all the datasets we have found for this task. By combining these datasets, we can increase the variety and size of our training data, which will ultimately improve the accuracy and performance of our model. I suggest that we carefully review each dataset to ensure that it is relevant to our task and meets our quality standards. We should also consider the format and structure of each dataset and determine the best way to combine them into a single training set.
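To make the proposal concrete, here is a minimal sketch of the stacking step, assuming the Hugging Face `datasets` library and two hypothetical local JSON files that each use their own field names; the file names and field mappings are placeholders, not a vetted list.

```python
# Minimal sketch: map each source onto a common schema, then concatenate.
# Assumes the Hugging Face `datasets` library; file/field names are placeholders.
from datasets import load_dataset, concatenate_datasets

def to_common_schema(example, instruction_key, output_key):
    # Normalize one source's fields to a shared (instruction, output) schema.
    return {"instruction": example[instruction_key], "output": example[output_key]}

# Hypothetical files; swap in whichever datasets we settle on after review.
ds_a = load_dataset("json", data_files="dataset_a.json", split="train")
ds_b = load_dataset("json", data_files="dataset_b.json", split="train")

ds_a = ds_a.map(lambda ex: to_common_schema(ex, "instruction", "response"),
                remove_columns=ds_a.column_names)
ds_b = ds_b.map(lambda ex: to_common_schema(ex, "prompt", "completion"),
                remove_columns=ds_b.column_names)

# Single shuffled training set with a uniform schema.
combined = concatenate_datasets([ds_a, ds_b]).shuffle(seed=42)
combined.to_json("combined_instructions.jsonl")
```

A quality-filtering pass (deduplication, length limits, language checks) could be slotted in before the concatenation.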
-
How is any of this related to llama.cpp?
-
Not an expert, but this seems well explained. The question of how to improve the quality of inference results by applying an RLHF algorithm to datasets like the ones described is very interesting. But is that the role of the llama.cpp project? I think llama.cpp exists to let us convert, quantize, and above all run (inference phase) models in the most optimized way. Maybe optimizing current datasets with RLHF should be the goal of another open-source project, and/or this conversation should be moved to the "discussion" tab.
-
A similar approach to RLHF (but lower quality) is used to train Alpaca and Vicuna, among others. Vote to close this issue.
-
Link to ColossalChat
Add RLHF like ColossalChat on a bigger dataset to achieve ChatGPT quality
Although models in the GPT series, such as ChatGPT and GPT-4, are highly powerful, they are unlikely to be fully open-sourced. Fortunately, the open-source community has been working hard to address this.
For example, Meta has open-sourced the LLaMA model, which comes in parameter sizes ranging from 7 billion to 65 billion. The 13 billion parameter variant can outperform the 175 billion parameter GPT-3 on most benchmarks. However, since it has no instruction-tuning stage, its actual generated results are not satisfactory.
Stanford’s Alpaca generates training data in a self-instructed manner by calling OpenAI’s API. With only 7 billion parameters, this lightweight model can be fine-tuned at a fraction of the cost to achieve conversational performance similar to a very large language model like GPT-3.5 with 175 billion parameters.
However, existing open-source solutions can only be considered as supervised fine-tuned models in the first stage of RLHF (Reinforcement Learning from Human Feedback), with subsequent alignment and fine-tuning stages not performed. Additionally, Alpaca’s training dataset is limited to English, which to some extent restricts the model’s performance.
The impressive results of ChatGPT and GPT-4, by contrast, are due to the introduction of RLHF into the training process, which makes the generated content more consistent with human values.
Open-Source Training Dataset
ColossalChat releases a bilingual dataset comprising approximately 100,000 Q&A pairs in English and Chinese. The seed data was collected and cleaned from real-life question scenarios on social media platforms, then expanded using self-instruct techniques; annotation costs were approximately $900. Compared to datasets generated by other self-instruct methods, this dataset contains more realistic and diverse seed data and covers a wider range of topics. It is suitable for both fine-tuning and RLHF training. With this high-quality data, ColossalChat can achieve better dialogue interactions and also support Chinese.
RLHF Algorithm Replication
The RLHF algorithm replication involves three stages (rough sketches of each stage are given at the end of this post):
In RLHF-Stage1, the model is fine-tuned with supervised instruction tuning on the datasets mentioned above.
In RLHF-Stage2, human annotators rank different outputs for the same prompt, and these rankings supervise the training of a reward model that assigns a score to each response.
In RLHF-Stage3, reinforcement learning is applied, which is the most complex part of the training process:
In the PPO part, ColossalChat follows a two-stage process: first, the make-experience stage, which uses the SFT (Supervised Fine-Tuning), Actor, RM (Reward Model), and Critic models to compute generated experience and store it in a buffer; then the parameter-update stage, which computes the policy loss and value loss from that experience.
In the PTX part, ColossalChat calculates the cross-entropy loss between the Actor's output response and the response part of the input corpus. This loss adds pre-training gradients to the PPO gradient to preserve the language model's original capabilities and prevent forgetting. Finally, the policy loss, value loss, and PTX loss are summed for backpropagation and the parameter update.
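For Stage 1, a compact sketch of supervised instruction fine-tuning, assuming Hugging Face `transformers` and a JSON-lines instruction dataset; the checkpoint path, prompt template, and hyperparameters are placeholders, not ColossalChat's actual recipe.

```python
# Stage 1 sketch: supervised instruction fine-tuning with the Hugging Face Trainer.
# Checkpoint path, prompt template, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "path/to/base-llama-checkpoint"   # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token      # many causal LM tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

data = load_dataset("json", data_files="combined_instructions.jsonl", split="train")

def tokenize(example):
    # Simple instruction/response template; loss is taken over the whole sequence.
    text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    tokens = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", per_device_train_batch_size=4,
                           num_train_epochs=3, learning_rate=2e-5),
    train_dataset=data,
)
trainer.train()
```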
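For Stage 2, a minimal PyTorch sketch of the idea: the reward model is trained so the human-preferred response scores higher than the rejected one, using the standard pairwise ranking loss. This follows the description above, not ColossalChat's actual code; the backbone interface is assumed to be a Hugging Face-style transformer.

```python
# Stage 2 sketch: reward model with a scalar value head and a pairwise ranking loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, backbone, hidden_size):
        super().__init__()
        self.backbone = backbone               # assumed HF-style transformer body
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        # Score each sequence from the last token's hidden state.
        return self.value_head(hidden[:, -1]).squeeze(-1)

def ranking_loss(chosen_scores, rejected_scores):
    # -log(sigmoid(r_chosen - r_rejected)), averaged over the batch:
    # pushes the preferred response to receive a higher reward.
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```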
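For Stage 3, a rough sketch of how the losses described above could be combined: a clipped PPO policy loss, a value loss, and the PTX cross-entropy loss summed into one objective. It mirrors the description in this post rather than ColossalChat's implementation; the loss coefficients are illustrative assumptions.

```python
# Stage 3 sketch: clipped PPO policy loss + value loss + PTX loss, summed.
import torch
import torch.nn.functional as F

def ppo_policy_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Standard clipped surrogate objective on the generated (experience) tokens.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def value_loss(values, returns):
    # Critic regression towards the computed returns.
    return F.mse_loss(values, returns)

def ptx_loss(actor_logits, pretrain_labels):
    # Cross-entropy on the response part of the pre-training/SFT corpus,
    # keeping the actor close to its original language-modeling behaviour.
    return F.cross_entropy(actor_logits.view(-1, actor_logits.size(-1)),
                           pretrain_labels.view(-1), ignore_index=-100)

def total_loss(logprobs, old_logprobs, advantages, values, returns,
               actor_logits, pretrain_labels,
               value_coef=0.5, ptx_coef=0.9):   # coefficients are assumptions
    return (ppo_policy_loss(logprobs, old_logprobs, advantages)
            + value_coef * value_loss(values, returns)
            + ptx_coef * ptx_loss(actor_logits, pretrain_labels))
```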