Flatting Packing / maybe fix #5443 and #5426 #5458

AlongWY · 2024-09-17T18:10:04Z

What does this PR do?

Support flatting_packing.
Fix knapsack, which may be the cause of "Running tokenizer on dataset gradually slows down" #5443.
Avoid wrongly truncating supervised examples, see "Model performance metrics drop noticeably when using neat_packing for SFT training" #5426.
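
For readers unfamiliar with the knapsack step: packing groups tokenized examples into bins that each hold at most `cutoff_len` tokens. The sketch below illustrates that idea with first-fit-decreasing bin packing. It is not the code changed in this PR; the name `pack_lengths` and the choice of first-fit-decreasing are assumptions made for illustration.

```python
from typing import List


def pack_lengths(lengths: List[int], capacity: int) -> List[List[int]]:
    """Group sequence lengths into bins whose totals never exceed `capacity`.

    Hypothetical helper: first-fit-decreasing bin packing, shown only to
    illustrate knapsack-style packing.
    """
    bins: List[List[int]] = []
    totals: List[int] = []  # running token count of each bin
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            continue  # a sequence longer than one bin can never be packed
        for i, total in enumerate(totals):
            if total + length <= capacity:
                bins[i].append(length)
                totals[i] += length
                break
        else:
            # no existing bin has room, so open a new one
            bins.append([length])
            totals.append(length)
    return bins
```

With `capacity=8`, the lengths `[5, 3, 4, 2, 6]` end up in three bins, two of which are completely full. The inner scan over all open bins is quadratic in the worst case, which is one way a packing step can become slow on large datasets.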

Before submitting

Did you read the contributor guideline?
Did you write any new necessary tests?

AlongWY · 2024-09-17T20:11:29Z

src/llamafactory/data/processors/supervised.py

-        if total_length >= cutoff_len:
-            break
-
-        source_len, target_len = infer_seqlen(len(source_ids), len(target_ids), cutoff_len - total_length)


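This is what causes instruction data to be truncated abnormally (#5426). Perhaps we should introduce a new parameter that controls whether truncation is allowed? My samples are 2-turn tool calls, and once they are truncated the model only learns to output tool_calls, without the final answer. In addition, the current truncation implementation cuts the user and assistant content itself. With the mistral template, for example, it produces [INST] xxxxxxx while the xxxxx[/INST] part is gone, which is clearly incorrect.

I don't think the problem is here? Non-packing has the same behavior.

That said, I do think we need a parameter to control this, because in some cases a sample must not be truncated in the middle.

If the prompt is not truncated, where would the assistant part go?

Just skip it and drop the sample.

Added a parameter that controls whether truncation is allowed. By default, truncation is not allowed.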
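
The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name `fit_example`, the flag name `allow_truncation`, and the naive left-to-right truncation are all assumptions (the thread does not show the real parameter name).

```python
from typing import List, Optional, Tuple

Turn = Tuple[List[int], List[int]]  # (prompt token ids, response token ids)


def fit_example(
    turns: List[Turn], cutoff_len: int, allow_truncation: bool = False
) -> Optional[List[Turn]]:
    """Return the turns that fit in `cutoff_len`, or None to drop the example."""
    total = sum(len(source) + len(target) for source, target in turns)
    if total <= cutoff_len:
        return turns  # fits as-is
    if not allow_truncation:
        return None  # drop the whole example instead of cutting it mid-turn
    kept: List[Turn] = []
    budget = cutoff_len
    for source_ids, target_ids in turns:
        if budget <= 0:
            break
        source = source_ids[:budget]  # naive cut: prompt first, then response
        budget -= len(source)
        target = target_ids[:budget]
        budget -= len(target)
        kept.append((source, target))
    return kept
```

For a two-turn example of 20 tokens and `cutoff_len=12`, truncation keeps the first turn but leaves the second turn with a clipped prompt and an empty response, which mirrors the "tool_calls without the final answer" problem described above. With the default `allow_truncation=False` the example is dropped instead.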

AlongWY · 2024-09-17T22:25:42Z

src/llamafactory/data/processors/supervised.py

+                packed_input_ids.append(batch_input_ids[index])
+                packed_labels.append(batch_labels[index])
+                packed_images.append(batch_images[index])
+                packed_videos.append(batch_videos[index])


Deferred processing: position ids are not returned at this point. They are assembled and returned in the collator.
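
A minimal sketch of what "assemble position ids in the collator" can look like, using plain Python lists so it runs without a tensor library. The name `flatten_collate` is hypothetical, and masking the first label of each example (so the last token of one example does not predict the first token of the next) is an assumption about how such collators usually behave, not something shown in this diff.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def flatten_collate(examples: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Concatenate examples into one row and restart position ids per example."""
    input_ids: List[int] = []
    labels: List[int] = []
    position_ids: List[int] = []
    for example in examples:
        ids = example["input_ids"]
        input_ids.extend(ids)
        # mask the first label so the loss never crosses an example boundary
        labels.extend([IGNORE_INDEX] + list(example["labels"][1:]))
        # positions restart at 0, which marks where each example begins
        position_ids.extend(range(len(ids)))
    # batch dimension of 1: the whole batch is a single flattened row
    return {
        "input_ids": [input_ids],
        "labels": [labels],
        "position_ids": [position_ids],
    }
```

Because the position ids restart at 0 at every boundary, the attention layer can recover the example boundaries from them, so no separate attention mask has to be stored in the dataset.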

AlongWY · 2024-09-17T22:29:42Z

src/llamafactory/train/sft/workflow.py

+        data_args.flatting_packing and
+        (getattr(model.config, "_attn_implementation", None) != "flash_attention_2")
+    ):
+        logger.warning("The `flatting_packing` only support `flash_attention_2`! Maybe cause Out of memory!")


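Maybe we should force-enable fa2, but by this point it is already too late.

Flat packing should not be strictly tied to fa2. In essence it is just a 4d attention mask.

I think it is tied. packing-with-FA2 is computed directly by flash-attention and no longer needs a 4d attention mask. In essence it is equivalent, but fa2 cannot take a 4d attention mask as input. See this transformers pull request for details.

I know. Their implementation is tied to fa2, but in principle sdpa and eager work just as well.

That might work too.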
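
The two views in this thread describe the same constraint in different forms. The sketch below, in plain Python, builds both from the lengths of the packed sequences: cumulative boundaries (the kind of input that variable-length flash-attention kernels consume) and an explicit block-diagonal causal mask (the kind of mask an sdpa or eager path would need). The helper names `cu_seqlens` and `block_causal_mask` are chosen for illustration and are not taken from this PR.

```python
from itertools import accumulate
from typing import List


def cu_seqlens(seq_lens: List[int]) -> List[int]:
    """Cumulative sequence boundaries of a packed row."""
    return [0] + list(accumulate(seq_lens))


def block_causal_mask(seq_lens: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k."""
    # segment id of every token in the packed row
    segment = [i for i, n in enumerate(seq_lens) for _ in range(n)]
    total = len(segment)
    return [
        # same packed sequence, and the key is not in the future
        [segment[q] == segment[k] and k <= q for k in range(total)]
        for q in range(total)
    ]
```

For two packed sequences of lengths 2 and 3, the boundaries are `[0, 2, 5]` and the first token of the second sequence can attend only to itself. The explicit mask grows quadratically with the packed length, which is consistent with the out-of-memory warning in the diff above for non-fa2 attention.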

cx9208 · 2024-09-18T01:53:17Z

What is the difference between flatting_packing and neat_packing? From the option description alone (Enable sequence packing with flattening) I still don't quite understand.

AlongWY · 2024-09-18T02:02:36Z

This implements packing-with-FA2. In my tests, this approach gives higher training throughput than neat_packing.

AlongWY · 2024-09-18T10:14:02Z

I'm still revising the mistral function call and will submit it later.

hiyouga · 2024-09-18T11:37:42Z

Could you open another PR for the function call updates?

AlongWY · 2024-09-18T12:46:16Z

OK, so I'll reorganize the code then?

2. fix knapsack, may cause hiyouga#5443 3. avoid supervised examples wrongly truncation

AlongWY · 2024-09-18T13:34:53Z

This should now be a clean commit. The tool-calling PR is at #5473.

muziyongshixin · 2024-09-26T03:24:00Z

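> This implements packing-with-FA2. In my tests, this approach gives higher training throughput than neat_packing.

Has the convergence of flatting packing been verified?

I tried neat_packing and flatting_packing on the same dataset with the same training configuration, and found that the initial loss of flatting_packing is significantly higher than that of neat_packing (2.1 vs 0.9).
flatting_packing also takes more training steps than neat_packing (10454 vs 9850).
The results after training are also worse than with neat_packing.

Model: YI-9B, lr=1e-5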

muziyongshixin · 2024-09-26T04:58:33Z

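> > This implements packing-with-FA2. In my tests, this approach gives higher training throughput than neat_packing.
>
> Has the convergence of flatting packing been verified?
>
> I tried neat_packing and flatting_packing on the same dataset with the same training configuration, and found that the initial loss of flatting_packing is significantly higher than that of neat_packing (2.1 vs 0.9). flatting_packing also takes more training steps than neat_packing (10454 vs 9850). The results after training are also worse than with neat_packing.
>
> Model: YI-9B, lr=1e-5

I found the cause of the high initial loss with flatten_packing: transformers needs to be upgraded to the latest version, 4.45.0, together with accelerate==0.34.2.
After the upgrade the initial loss is at about the same level as neat_packing, around 0.9. The step count also dropped slightly (10454 -> 10198), and judging by the estimated training time there is a slight speedup (150h -> 131h). I'm not sure where these changes come from.
The final results after training still need to be verified.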
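
A small sketch for checking installed versions against the ones reported in this comment before enabling flatting_packing. The minimums (transformers 4.45.0, accelerate 0.34.2) are taken from the comment above and are not verified independently; the helper names are hypothetical, and only the leading numeric part of a version string is compared, so pre-release suffixes such as `.dev0` are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep only the leading numeric release segment of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Compare release segments after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_meets_minimum(package: str, required: str) -> Optional[bool]:
    """Return None when the package is not installed at all."""
    try:
        return meets_minimum(metadata.version(package), required)
    except metadata.PackageNotFoundError:
        return None
```

For example, `installed_meets_minimum("transformers", "4.45.0")` returns `False` on an older install, which would be a cue to upgrade before comparing loss curves between the two packing modes.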

juncaofish · 2024-10-08T01:55:44Z

Any updates for this PR?

Arcmoon-Hu · 2024-10-16T11:30:43Z

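> > > This implements packing-with-FA2. In my tests, this approach gives higher training throughput than neat_packing.
> >
> > Has the convergence of flatting packing been verified?
> >
> > I tried neat_packing and flatting_packing on the same dataset with the same training configuration, and found that the initial loss of flatting_packing is significantly higher than that of neat_packing (2.1 vs 0.9). flatting_packing also takes more training steps than neat_packing (10454 vs 9850). The results after training are also worse than with neat_packing.
> > Model: YI-9B, lr=1e-5
>
> I found the cause of the high initial loss with flatten_packing: transformers needs to be upgraded to the latest version, 4.45.0, together with accelerate==0.34.2. After the upgrade the initial loss is at about the same level as neat_packing, around 0.9. The step count also dropped slightly (10454 -> 10198), and judging by the estimated training time there is a slight speedup (150h -> 131h). I'm not sure where these changes come from. The final results after training still need to be verified.

Has any kind soul finished the experiment? How do the results compare?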

AlongWY · 2024-10-18T14:34:23Z

@hiyouga Are there any problems with the current implementation?

merge

AlongWY marked this pull request as draft September 17, 2024 19:00

AlongWY mentioned this pull request Sep 17, 2024

Model performance metrics drop noticeably when using neat_packing for SFT training #5426

AlongWY force-pushed the main branch from dfd9ab3 to 558b983 Compare September 17, 2024 22:11

AlongWY changed the title ~~Support Mistral-format function call and Flatting Packing~~ Flatting Packing / mistral style function call / maybe fix #5443 and #5426 Sep 17, 2024

AlongWY marked this pull request as ready for review September 17, 2024 22:13

hiyouga added the pending This problem is yet to be addressed label Sep 18, 2024

hiyouga self-requested a review September 18, 2024 02:43

1. support flat_packing

7cab73b

2. fix knapsack, may cause hiyouga#5443 3. avoid supervised examples wrongly truncation

AlongWY force-pushed the main branch from 746a8f6 to 7cab73b Compare September 18, 2024 13:32

AlongWY changed the title ~~Flatting Packing / mistral style function call / maybe fix #5443 and #5426~~ Flatting Packing / maybe fix #5443 and #5426 Sep 18, 2024

AlongWY mentioned this pull request Sep 18, 2024

Running tokenizer on dataset gradually slows down #5443

Merge pull request #1 from hiyouga/main

342556d

merge

hiyouga force-pushed the main branch from 5569125 to b4c7dd3 Compare October 29, 2024 07:32
