[Draft] Support RoPE Op and fuse #13978

Closed
wants to merge 8 commits into from

Conversation

ys44kim
Contributor

@ys44kim ys44kim commented Sep 11, 2024

This PR is a draft of the compiler changes for fusing the RoPE op.
Please refer to the issue below.
#13972

When I submit this for real later, I will follow the rules below:

  • single module per PR
  • single context of change per PR

const int32_t i0_n = input_shape.Dims(0);
const int32_t multihead_n = input_shape.Dims(1);
const int32_t i2_n = input_shape.Dims(2);
const int32_t head_n = input_shape.Dims(3);
Contributor Author

I think these variable names are ambiguous.
`i0` and `i2` stand for input dim 0 and input dim 2.
I would like to change them to clearer names, but since I don't fully understand the meaning of these dimensions,
I used simple names (e.g. `i0`, `i2`).
If you recommend names with a clear meaning, I will change them.
Also:
this kernel was added to verify the fused op, so optimization is not considered; it is implemented as-is from the graph.
It may already be fast enough, but
there is no environment (input, golden data?) for measuring latency yet (or maybe I just don't know of one),
so I don't know in what setup the data should be measured to be meaningful. I plan to look into this further.

Contributor

rank0 and input rank2.

Usually we use the term "dim" (dimension) for this.
As for the meaning of each dim, maybe the source paper or the Python script for the model can explain it.

for measuring latency yet

We don't measure this in the frontend :)

i don't know what environment the data should be measured to be meaningful.

As of now, we don't have one. We just use random input data and compare against the source model (onnx-runtime for ONNX models, the TFLite interpreter for TFLite models), treating the model as a computational black box.

@@ -619,6 +620,7 @@ union BuiltinOptions {
BitcastOptions,
BitwiseXorOptions,
RightShiftOptions,
RoPEOptions = 250,
Contributor Author

I will change the op number after RMSNorm is added, because this number overlaps with RMSNorm's.

@@ -0,0 +1,211 @@
operand {
Contributor Author

The RoPE part was extracted from the TG model (layers.tg.dynamic.1.q.opt.circle), and I created a tflite recipe by referring to the graph.
The sin/cos table was divided into 0~80 over the range -1 to 1, made into constants, and added as inputs.
The TG model is a quantized model whose data type is INT16. I tried to create a recipe with this data type, but
an error occurred because the luci-interpreter's handling of INT16 is abnormal, so I created the recipe with Float32.
I think modifying the interpreter to support INT16 should be classified as a separate task; whether it is even necessary needs to be checked first.
Through luci-pass-value-test, I confirmed that the tflite model and the circle model fused with the RoPE op produce the same results.

@ys44kim ys44kim added the DRAFT A draft issue or PR for sharing one's current working status and discussion. label Sep 11, 2024
@ys44kim ys44kim marked this pull request as draft September 11, 2024 01:11
@ys44kim ys44kim mentioned this pull request Sep 11, 2024
15 tasks
add param to support two modes(neox, gpt) of rope

ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>
table RoPEOptions {
mode: RoPEMode;
}

// An OperatorCode can be an enum value (BuiltinOperator) if the operator is a
Contributor Author

add rope mode (neox, gpt)
#13972 (comment)

ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>
@@ -1521,7 +1521,13 @@ table InstanceNormOptions {
fused_activation_function:ActivationFunctionType;
}

enum RoPEMode : int {
NEOX,
Contributor Author

@seanshpark
When expressing this as an enum, should I exclude '0'?
The mode in test.recipe is set to 0, and the option is then serialized abnormally:

flatc --json ./res/CircleSchema/0.9/circle_schema.fbs -- ./build/compiler/common-artifacts/RoPE_000.circle

          builtin_options_type: "RoPEOptions",
          builtin_options: {
          }

If the values start from '1', it is normal.
Also, is there code missing somewhere, or does it structurally not support 0?
I thought you mentioned this yesterday, but I don't remember. --;

Contributor

should I exclude '0'?

I think we can follow what other enums do.
Others explicitly assign values, like

  NEOX = 0,

And some have NONE = 0 or DEFAULT = 0. But as I understand it, we don't need a none or default here.

and, Is there a missing code? Or does it not support 0 structurally?

I can't spot the problem right now. I'll look into the code...
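Following that suggestion, the schema change could carry explicit values. A hedged sketch only — the second value's name is taken from the commit message's "(neox, gpt)" wording and is not confirmed as final:

```
enum RoPEMode : int {
  NEOX = 0,
  GPT = 1,
}

table RoPEOptions {
  mode: RoPEMode;
}
```

With an explicit `= 0`, a recipe that sets `mode: 0` serializes the default value, which flatc omits from JSON output; that is consistent with the "abnormal" empty `builtin_options` seen above.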

Contributor Author

@seanshpark
Thanks.
I checked with circledump, and it is displayed normally:

./build/compiler/circledump/circledump ./build/compiler/common-artifacts/RoPE_000.circle
O(0:0) ROPE
    mode(GPT_NEOX)
    I T(0:0) ifm
    I T(0:1) sin_table
    I T(0:2) cos_table
    O T(0:3) ofm

ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>
ys44kim added a commit to ys44kim/ONE that referenced this pull request Sep 13, 2024
this commit supports the RoPE operation in luci IR

ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>

draft : Samsung#13978
issue : Samsung#13972
ys44kim added a commit to ys44kim/ONE that referenced this pull request Sep 13, 2024
this commit supports RoPE for circle chef

ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>

draft : Samsung#13978
issue : Samsung#13972
ys44kim added a commit to ys44kim/ONE that referenced this pull request Sep 24, 2024
ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>

draft: Samsung#13978
issue: Samsung#13972
ys44kim added a commit to ys44kim/ONE that referenced this pull request Sep 24, 2024
ys44kim added a commit to ys44kim/ONE that referenced this pull request Sep 24, 2024
ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>

draft: Samsung#13978
issue: Samsung#13972
ys44kim added a commit to ys44kim/ONE that referenced this pull request Sep 24, 2024
this commit supports the RoPE operation in luci IR

ONE-DCO-1.0-Signed-off-by: youngsik kim <[email protected]>

draft : Samsung#13978
issue : Samsung#13972