ValueError: Unable to create node 'LabelEncoder' with name='LabelEncoder3' and attributes={'keys_floats': array([False, True]), 'values_int64s': array([0, 1])}. #1047

TopCoder2K · 2023-12-03T08:29:14Z

Hi there!

I'm trying to convert a scikit-learn Pipeline to the ONNX format, but I get a strange error. The Pipeline is the following:

        preprocessor = ColumnTransformer(
            transformers=[
                ("cat", OrdinalEncoder(dtype=np.int64), categorical_features),
                ("num", "passthrough", numerical_features),
            ],
            sparse_threshold=1,
            verbose_feature_names_out=False,
        ).set_output(transform="pandas")
        model = make_pipeline(
            self.preprocessor, RandomForestRegressor(**cfg.model.hyperparams)
        )

The dataset is obtained via fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas"), followed by a little preprocessing (e.g. the "heavy_rain" category is merged with the "rain" category). The result of print(X_train.iloc[:1]) is

   season  month  hour  holiday  weekday  workingday weather  temp  feel_temp  humidity  windspeed
0  spring      1     0    False        6       False   clear  9.84     14.395      0.81        0.0

When I run model_onnx = to_onnx(model, X=X_train.iloc[:1], verbose=1), I get the error in the title. The type guessing seems to work fine:

[to_onnx] initial_types=[('season', StringTensorType(shape=[None, 1])), ('month', Int64TensorType(shape=[None, 1])), ('hour', Int64TensorType(shape=[None, 1])), ('holiday', BooleanTensorType(shape=[None, 1])), ('weekday', Int64TensorType(shape=[None, 1])), ('workingday', BooleanTensorType(shape=[None, 1])), ('weather', StringTensorType(shape=[None, 1])), ('temp', DoubleTensorType(shape=[None, 1])), ('feel_temp', DoubleTensorType(shape=[None, 1])), ('humidity', DoubleTensorType(shape=[None, 1])), ('windspeed', DoubleTensorType(shape=[None, 1]))]

The remaining part of the output is:

[convert_sklearn] parse_sklearn_model
[convert_sklearn] convert_topology
[convert_operators] begin
[convert_operators] iteration 1 - n_vars=0 n_ops=5
[call_converter] call converter for 'SklearnConcat'.
[call_converter] call converter for 'SklearnOrdinalEncoder'.
Traceback (most recent call last):
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/common/_container.py", line 707, in add_node
    node = make_node(op_type, inputs, outputs, name=name, **new_attrs)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/onnx/helper.py", line 164, in make_node
    node.attribute.extend(
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/onnx/helper.py", line 165, in <genexpr>
    make_attribute(key, value)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/onnx/helper.py", line 894, in make_attribute
    raise ValueError(
ValueError: Could not infer the attribute type from the elements of the passed Iterable value.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/topcoder2k/CodeProjects/MLOps/mlops-course/commands.py", line 59, in <module>
    fire.Fire()
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/topcoder2k/CodeProjects/MLOps/mlops-course/commands.py", line 30, in train
    Trainer(config_name, **kwargs).train()
  File "/home/topcoder2k/CodeProjects/MLOps/mlops-course/mlopscourse/train.py", line 69, in train
    model_onnx = to_onnx(model.model, X=X_train.iloc[:1], verbose=1)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/convert.py", line 306, in to_onnx
    return convert_sklearn(
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/convert.py", line 208, in convert_sklearn
    onnx_model = convert_topology(
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/common/_topology.py", line 1532, in convert_topology
    topology.convert_operators(container=container, verbose=verbose)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/common/_topology.py", line 1349, in convert_operators
    self.call_converter(operator, container, verbose=verbose)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/common/_topology.py", line 1132, in call_converter
    conv(self.scopes[0], operator, container)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/common/_registration.py", line 27, in __call__
    return self._fct(*args)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/operator_converters/ordinal_encoder.py", line 89, in convert_sklearn_ordinal_encoder
    container.add_node(
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/skl2onnx/common/_container.py", line 709, in add_node
    raise ValueError(
ValueError: Unable to create node 'LabelEncoder' with name='LabelEncoder3' and attributes={'keys_floats': array([False,  True]), 'values_int64s': array([0, 1])}.

The skl2onnx version from the poetry.lock is:

[[package]]
name = "skl2onnx"
version = "1.15.0"
description = "Convert scikit-learn models to ONNX"
optional = false
python-versions = "*"
files = [
    {file = "skl2onnx-1.15.0-py2.py3-none-any.whl", hash = "sha256:13a9ea5d50619ce42381c67001db8c87ce574a459a8f0738b45d2f4b93f465f6"},
    {file = "skl2onnx-1.15.0.tar.gz", hash = "sha256:05b2c2643ad0357ec1ea684d138438a2df657df828e57d07cb78c2e76be20e37"},
]

UPD:

I've also decided to leave here the python and scikit-learn versions:

python --- 3.9.13
scikit-learn --- 1.3.1


xadupre · 2023-12-07T11:10:27Z

Did you try X=X_train[:1] instead? onnx needs a column name for each of them.

TopCoder2K · 2023-12-08T19:28:38Z

@xadupre, yes, I tried. I've just tried one more time and got exactly the same error (I compared it to the output above using www.diffchecker.com and the only difference is on the screenshot).

xadupre · 2023-12-10T09:52:30Z

I created PR #1049 to look into your issue but I can't replicate it. I assume the test I used is different from yours. Could you let me know what is different?

TopCoder2K · 2023-12-10T20:58:39Z

@xadupre, thank you for the tests!

I looked at the tests and they gave me an idea of what is wrong. I hadn't noticed that month, hour, holiday, weekday, workingday are also treated as categorical features in my code. So, I set the initial types myself:

                initial_types = [
                    ('season', StringTensorType(shape=[None, 1])),
                    ('month', StringTensorType(shape=[None, 1])),
                    ('hour', StringTensorType(shape=[None, 1])),
                    ('holiday', StringTensorType(shape=[None, 1])),
                    ('weekday', StringTensorType(shape=[None, 1])),
                    ('workingday', StringTensorType(shape=[None, 1])),
                    ('weather', StringTensorType(shape=[None, 1])),
                    ('temp', DoubleTensorType(shape=[None, 1])),
                    ('feel_temp', DoubleTensorType(shape=[None, 1])),
                    ('humidity', DoubleTensorType(shape=[None, 1])),
                    ('windspeed', DoubleTensorType(shape=[None, 1]))
                ]
                model_onnx = to_onnx(model.model, initial_types=initial_types, verbose=1)

but the error persists:

ValueError: Unable to create node 'LabelEncoder' with name='LabelEncoder3' and attributes={'keys_floats': array([False,  True]), 'values_int64s': array([0, 1])}.

What am I missing? It seems there is a problem when converting the "boolean" column holiday that is treated as categorical by the OrdinalEncoder.

xadupre · 2023-12-11T08:25:23Z

Booleans are not supported by the onnx LabelEncoder (see https://onnx.ai/onnx/operators/onnx_aionnxml_LabelEncoder.html#l-onnx-docai-onnx-ml-labelencoder). You should convert them into int64 before calling the converter.

TopCoder2K · 2023-12-11T15:42:08Z

Oh, I didn't know this...

You should convert them into int64 before calling the converter.

Sorry, I don't understand. Should I convert the entire column before training? Or can I just convert the provided X_train[:1] and then, at inference time, safely send booleans?

And why is the holiday column treated as boolean if I set its type to string?

xadupre · 2023-12-11T15:43:50Z

You can use boolean for training but the input schema for the converter must be integers and onnxruntime will expect integers as well when running inference.
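A minimal sketch of the conversion suggested above, assuming the boolean columns are stored with pandas' plain `bool` dtype. The two-row frame is made up for illustration and is not the real dataset.

```python
import pandas as pd

# Hypothetical frame mimicking the boolean columns shown earlier in the thread.
X = pd.DataFrame(
    {
        "season": ["spring", "summer"],
        "holiday": [False, True],
        "workingday": [False, True],
        "temp": [9.84, 12.3],
    }
)

# Find every boolean column and cast it to int64 (False -> 0, True -> 1).
bool_columns = X.select_dtypes(include="bool").columns
X_int = X.astype({column: "int64" for column in bool_columns})

print(X_int.dtypes)
```

The same cast would have to be applied to whatever is fed to the ONNX model at inference time, since the converted model's input schema is then int64 for those columns.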

TopCoder2K · 2023-12-12T08:22:00Z

Ah, I see. It's more coherent to convert the entire dataset before the training then. I did this and

model_onnx = to_onnx(model.model, X=X_train[:1], verbose=1)

has worked correctly:

[convert_sklearn] parse_sklearn_model
[convert_sklearn] convert_topology
[convert_operators] begin
[convert_operators] iteration 1 - n_vars=0 n_ops=5
[call_converter] call converter for 'SklearnConcat'.
[call_converter] call converter for 'SklearnOrdinalEncoder'.
[call_converter] call converter for 'SklearnConcat'.
[call_converter] call converter for 'SklearnConcat'.
[call_converter] call converter for 'SklearnRandomForestRegressor'.
[convert_operators] end iter: 1 - n_vars=58
[convert_operators] iteration 2 - n_vars=58 n_ops=5
[convert_operators] end iter: 2 - n_vars=58
[convert_operators] end.
[_update_domain_version] +opset 0: name='ai.onnx.ml', version=2
[_update_domain_version] +opset 1: name='', version=18
[convert_sklearn] end

Thank you for your help! Closing the issue.

TopCoder2K · 2023-12-12T09:06:55Z

Oh, no, there is another error when I try to load the onnx model.

                model_onnx = to_onnx(model.model, X=X_train[:1], verbose=1)
                sess = InferenceSession(
                    model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
                )

gives

File "/home/topcoder2k/CodeProjects/MLOps/mlops-course/mlopscourse/train.py", line 70, in train
    sess = InferenceSession(
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/topcoder2k/.cache/pypoetry/virtualenvs/mlopscourse-4vHWhyzM-py3.9/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 454, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (LabelEncoder5) Op (LabelEncoder) [ShapeInferenceError] Input type is not int64 tensor but keys_int64s is set

What can be wrong with converting the LabelEncoder?.. The initial types are

[to_onnx] initial_types=[('season', StringTensorType(shape=[None, 1])), ('month', Int64TensorType(shape=[None, 1])), ('hour', Int64TensorType(shape=[None, 1])), ('holiday', Int64TensorType(shape=[None, 1])), ('weekday', Int64TensorType(shape=[None, 1])), ('workingday', Int64TensorType(shape=[None, 1])), ('weather', StringTensorType(shape=[None, 1])), ('temp', DoubleTensorType(shape=[None, 1])), ('feel_temp', DoubleTensorType(shape=[None, 1])), ('humidity', DoubleTensorType(shape=[None, 1])), ('windspeed', DoubleTensorType(shape=[None, 1]))]

xadupre · 2023-12-12T09:12:59Z

Based on the error message, I assume one input is expected to be an integer by the LabelEncoder but it is not.

TopCoder2K · 2023-12-12T09:29:41Z

Sorry, I don't know onnx well, but at what point is something sent to the LabelEncoder? I do not provide any data. If X_train[:1] is used, I'm sure there are integers where necessary:

season         object
month           int64
hour            int64
holiday         int64
weekday         int64
workingday      int64
weather        object
temp          float64
feel_temp     float64
humidity      float64
windspeed     float64

xadupre · 2023-12-12T09:34:21Z

Could you use a tool like netron to look at your model and search for node LabelEncoder5? It should show you the data it processes and lead you to the input it is connected to.

TopCoder2K · 2023-12-12T09:46:22Z

I don't see any problems here, do you? (The LabelEncoder5 is under Y = 5)

Here is also Concat:

TopCoder2K · 2023-12-12T09:49:43Z

Cast for the workingday is:

xadupre · 2023-12-12T09:53:22Z

The concat node is casting every column to a single type. I assume it is string. So every label encoder is expecting string input. It seems skl2onnx is unable to handle this scenario. It usually follows the scikit-learn implementation. Sometimes, we do not test it against all cases, sometimes scikit-learn is changing its implementation. You may try the latest version released today to see if it fixes it. Otherwise, I suggest splitting the ordinal encoder into 2, one for string columns, another one for integers. It should not impact your pipeline but the converted models should have two concat nodes, one for strings, another one for integers.

TopCoder2K · 2023-12-12T11:16:50Z

The concat node is casting every column to a single type

Oh, it's strange that this is done without any warning... Thank you for pointing it out!

You may try the latest version released today to see if it fixes it

I've tried 1.16.0 and the RE has gone! But I've tried to mimic your tests and got too much difference between the predictions of the original and the onnx versions of the model. Why can this be? The code is:

                model_onnx = to_onnx(model.model, X=X_train[:1], verbose=1)
                sess = InferenceSession(
                    model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
                )
                dict_data = {
                    column: X_train[column].values.reshape((-1, 1))
                    for column in X_train.columns
                }
                got = sess.run(
                    None,
                    dict_data,
                )
                preds = model.model.predict(X_train)
                for col in X_train.columns:
                    assert X_train[col][6354] == dict_data[col][6354][0]
                print(
                    (preds - got[0].ravel()).sum(),
                    (preds - got[0].ravel()).argmax(),
                    preds[6354],
                    got[0].ravel()[6354]
                )

output:

128.99087821669974 6354 422.1 390.79
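Note that a signed sum lets positive and negative errors cancel each other, so absolute differences are usually easier to interpret. Below is a small sketch of such a comparison; `compare_predictions` is a hypothetical helper, and the numbers are made up apart from the one pair reported above.

```python
import numpy as np


def compare_predictions(expected, got):
    """Summarize absolute differences between two prediction arrays."""
    expected = np.asarray(expected, dtype=np.float64).ravel()
    got = np.asarray(got, dtype=np.float64).ravel()
    abs_diff = np.abs(expected - got)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "worst_index": int(abs_diff.argmax()),
    }


# Illustrative values: the middle pair is the one reported in the thread,
# the others are invented. The second argument has shape (n, 1), like the
# raw regressor output that is flattened with ravel() in the code above.
report = compare_predictions([10.0, 422.1, 35.5], [[10.0], [390.79], [35.6]])
print(report)
```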

xadupre · 2023-12-12T11:19:47Z

I'll need to get the full example to understand what is going on (without the data). What is RE?

TopCoder2K · 2023-12-12T11:38:02Z

What is RE?

Oops, sorry, I used the wrong abbreviation. I meant the error: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (LabelEncoder5) Op (LabelEncoder) [ShapeInferenceError] Input type is not int64 tensor but keys_int64s is set. It is gone!

I'll need to get the full example to understand what is going on (without the data)

The repository is public and contains little code, since it is designed for educational purposes, but I'm not sure you have time to figure out even this code. The above code is just a modification of the 68th line. I'm running the commands.py with

poetry run python3 commands.py train --config_name rf_config

xadupre · 2024-01-24T14:39:20Z

If you could create a failed unittest like in this file https://github.com/onnx/sklearn-onnx/blob/main/tests/test_issues_2024.py, it would save me some time.

TopCoder2K · 2024-01-24T17:24:44Z

I can try!
I'll endeavor to create a minimal reproducible example over the weekend and adjust it to look like the ones in the link.

xadupre · 2024-01-24T18:14:38Z

Thanks :)

TopCoder2K · 2024-01-27T14:50:16Z

I decided to create an example on Google's Colab as a first attempt, since you may have some improvement remarks. Moreover, if I don't save and then load the dataset, I obtain a strange error:

sess = InferenceSession(
    model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
)
dict_data = {
    column: X_train[column].values.reshape((-1, 1))
    for column in X_train.columns
}
# ONNX inference
got = sess.run(
    None,
    dict_data,
)

produces

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-42-ecf344d75e61> in <cell line: 12>()
     10 }
     11 # ONNX inference
---> 12 got = sess.run(
     13     None,
     14     dict_data,

/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    218             output_names = [output.name for output in self._outputs_meta]
    219         try:
--> 220             return self._sess.run(output_names, input_feed, run_options)
    221         except C.EPFail as err:
    222             if self._enable_fallback:

RuntimeError: Input must be a list of dictionaries or a single numpy array for input 'holiday'.

whereas with

X_train.to_csv("train_split.csv")
X_train_copy = pd.read_csv("train_split.csv", index_col=0)
dict_data = {
    column: X_train_copy[column].values.reshape((-1, 1))
    for column in X_train.columns
}

everything works fine...
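One way to investigate the difference is to compare each column's dtype before and after the CSV round trip. The sketch below assumes, purely as a guess that the thread does not confirm, that a column such as holiday is stored as a pandas categorical, in which case `.values` is not a plain numpy array. The frame is made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame: 'holiday' is a pandas categorical (an assumption).
X = pd.DataFrame(
    {
        "holiday": pd.Categorical([0, 1, 0]),
        "temp": [9.84, 9.02, 8.2],
    }
)

# Round trip through CSV in memory instead of writing a file to disk.
buffer = io.StringIO()
X.to_csv(buffer)
buffer.seek(0)
X_copy = pd.read_csv(buffer, index_col=0)

# Show the container type of .values and the dtype before and after.
for column in X.columns:
    print(
        column,
        type(X[column].values).__name__,
        X[column].dtype,
        "->",
        X_copy[column].dtype,
    )

# np.asarray always yields a plain numpy array, whatever the pandas dtype.
feed = {column: np.asarray(X[column]).reshape((-1, 1)) for column in X.columns}
```

If the dtypes differ, building the input dictionary with `np.asarray` (or `Series.to_numpy()`) may remove the need for the CSV detour, though that is untested against the notebook in question.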

If you know how to fix this, I can create a PR with adjusted code added as a unit test.

TopCoder2K · 2024-02-27T09:43:57Z

@xadupre, have you looked at the notebook? Have you observed the same behaviour?

TopCoder2K · 2024-03-29T13:29:46Z

@xadupre, have you managed to take a look at the notebook?

TopCoder2K closed this as completed Dec 12, 2023

TopCoder2K reopened this Dec 12, 2023

github-project-automation bot added this to Can Fix Aug 29, 2024

github-project-automation bot moved this to To do in Can Fix Aug 29, 2024
