
Booster.predict returns same score for any distinct instance when executed via JEP #2500

Closed
ManuelMourato25 opened this issue Oct 9, 2019 · 5 comments

@ManuelMourato25

I am trying to call LightGBM via JEP (Java Embedded Python) in order to predict scores for a couple of records.
However, when I execute the Booster.predict command via JEP, it returns a constant score of 0.04742587 for every distinct record passed.
The same does not happen when I invoke LightGBM from a Python script.
Any ideas on what the issue might be?
Note: this issue does not occur when using JEP to invoke other models, such as xgboost.

Environment info

Operating System: Ubuntu 18.04

CPU/GPU model: Running on local machine with 8 cores / 32 GB RAM

Python version: 3.6.9
GCC version: 7.4.0
Java version: 1.8
Jep version: 3.7.1

LightGBM version or commit hash: 2.3.1

Steps to reproduce

COMMON STEPS:

  1. Extract the model inside the following zip into a file named m0_test.model:

m0_test.zip

  2. Save the following script as classifier.py, in any folder you wish (replace <PATH_TO_MODEL> accordingly):
import lightgbm as lgb
import numpy as np
import pandas as pd
import random as rd


class Classifier:
    def __init__(self):
        # Necessary file paths
        model_file_name = '<PATH_TO_MODEL>/m0_test.model'

        # Load model.
        self.model = Classifier.get_model(model_file_name)

    def getClassDistribution(self, instances):
        score = self.get_transaction_score(instances)

        print('Score:'+str(score))

        return [np.array([score, 1 - score])]

    def get_transaction_score(self, instances):
        gbm = self.model
        # Get prediction.
        fraud_prediction = gbm.predict(instances, num_iteration=gbm.best_iteration)
        return fraud_prediction.flatten()

    @staticmethod
    def get_model(model_name):
        with open(model_name, 'r') as file:
            data = file.read()
            lgb_model = lgb.Booster(model_str=data)
            return lgb_model

JEP EXECUTION:

  1. Install JEP:
git clone https://github.com/ninia/jep.git 
cd jep 
echo "Installing JEP..." 
git checkout v3.7.1 
python setup.py build install 
  2. Set the following environment variables (replace the placeholders inside <> accordingly):
LD_LIBRARY_PATH=<PATH_TO_JEP>/lib/python3.6/site-packages/jep
LD_PRELOAD=<PATH_TO_PYTHON_SO>/lib/libpython3.6m.so
JEP_LOCATION=<PATH_TO_JEP>/lib/python3.6/site-packages/jep
JEP_JAR=${JEP_LOCATION}/jep-3.7.0.jar
TO_PRELOAD=${LD_PRELOAD}
  3. Compile and run the following Java class (replace <PATH_TO_FOLDER_CONTAINING_CLASSIFIER.PY> accordingly):
package com.feedzai.tests;
import jep.Jep;

public class TestLightGBM {

    public static void main(String[] args) {

        try  {

            Jep jep = new Jep();
            jep.eval("import sys");
            jep.eval("from java.lang import System");
            jep.eval("sys.path.append(\"<PATH_TO_FOLDER_CONTAINING_CLASSIFIER.PY>\")");
            jep.eval("from classifier import Classifier");

            jep.eval("import numpy as np");

            jep.eval("record1 = np.array([[ 4.61935575 ,-5.18927169,  2.74834851,  1.0087401 ,  1.95090556, -3.33563201,"
                   + "1.  ,        2.     ,     2.     ,     2.        ]]) ");
            jep.eval("record2 = np.array([[2.30000000e+01, 2.60000000e+02, 1.00000000e+06, 2.29400000e+02,"
                    +"2.30209137e+07 ,1.09000000e+04 ,2.00000000e+00 ,2.00000000e+00,"
                    +"1.00000000e+00, 2.00000000e+00]])");
            jep.eval( "record3 = np.array([[2.20000000e+01, 2.37000000e+02 ,1.00000100e+06 ,3.50400000e+02,"
                   + "1.90109777e+07, 1.00320000e+04, 1.00000000e+00, 3.00000000e+00,"
                 +   "2.00000000e+00, 5.00000000e+00]])");

            jep.eval( "record4 = np.array([[9.80000000e+01, 2.57000000e+02, 1.00000200e+06, 3.33400000e+02,"
                    +"1.41323727e+07, 1.19990000e+04, 0.00000000e+00, 0.00000000e+00,"
                   +" 2.00000000e+00, 3.00000000e+00]])");
            jep.eval( "record5 = np.array([[1.30000000e+01, 3.17000000e+02, 1.00000300e+06, 6.99400000e+02,"
                   + "3.30892917e+07, 1.03560000e+04, 2.00000000e+00, 2.00000000e+00,"
                   +" 1.00000000e+00, 1.00000000e+00]])");


            jep.eval("test = Classifier()");
            jep.eval("test.getClassDistribution(record1)[0].item(0)");
            jep.eval("test.getClassDistribution(record2)[0].item(0)");
            jep.eval("test.getClassDistribution(record3)[0].item(0)");
            jep.eval("test.getClassDistribution(record4)[0].item(0)");
            jep.eval("test.getClassDistribution(record5)[0].item(0)");

        } catch (Exception e) {
            System.out.println("Error");
            System.out.println(e);
        }
    }
}

  4. Observe that the returned score is constant, independent of the record classified:
Finished loading model, total used 200 iterations
Score:[0.04742587]
0.04742587317756678
Score:[0.04742587]
0.04742587317756678
Score:[0.04742587]
0.04742587317756678
Score:[0.04742587]
0.04742587317756678
Score:[0.04742587]
0.04742587317756678

PYTHON EXECUTION:

  1. Open a Python console.
  2. Enter the following code, equivalent to the example above (replace <PATH_TO_FOLDER_CONTAINING_CLASSIFIER.PY> accordingly):
import sys
sys.path.append("<PATH_TO_FOLDER_CONTAINING_CLASSIFIER.PY>") # ex: /opt/folder/
from classifier import Classifier
import numpy as np
record1 = np.array([[ 4.61935575 ,-5.18927169,  2.74834851,  1.0087401 ,  1.95090556, -3.33563201,
   1.  ,        2.     ,     2.     ,     2.        ]])
record2 = np.array([[2.30000000e+01, 2.60000000e+02, 1.00000000e+06, 2.29400000e+02,
  2.30209137e+07 ,1.09000000e+04 ,2.00000000e+00 ,2.00000000e+00,
  1.00000000e+00, 2.00000000e+00]])
record3 = np.array([[2.20000000e+01, 2.37000000e+02 ,1.00000100e+06 ,3.50400000e+02,
  1.90109777e+07, 1.00320000e+04, 1.00000000e+00, 3.00000000e+00,
  2.00000000e+00, 5.00000000e+00]])
record4 = np.array([[9.80000000e+01, 2.57000000e+02, 1.00000200e+06, 3.33400000e+02,
  1.41323727e+07, 1.19990000e+04, 0.00000000e+00, 0.00000000e+00,
  2.00000000e+00, 3.00000000e+00]])
record5 = np.array([[1.30000000e+01, 3.17000000e+02, 1.00000300e+06, 6.99400000e+02,
  3.30892917e+07, 1.03560000e+04, 2.00000000e+00, 2.00000000e+00,
  1.00000000e+00, 1.00000000e+00]])

test = Classifier()
print(test.getClassDistribution(record1)[0].item(0))
print(test.getClassDistribution(record2)[0].item(0))
print(test.getClassDistribution(record3)[0].item(0))
print(test.getClassDistribution(record4)[0].item(0))
print(test.getClassDistribution(record5)[0].item(0))

  3. The instances are now correctly scored:
>>> print(test.getClassDistribution(record1)[0].item(0))
Score:[1.07931287e-09]
1.0793128698291579e-09
>>> print(test.getClassDistribution(record2)[0].item(0))
Score:[1.25450336e-09]
1.2545033620809255e-09
>>> print(test.getClassDistribution(record3)[0].item(0))
Score:[1.1813487e-09]
1.1813487042059903e-09
>>> print(test.getClassDistribution(record4)[0].item(0))
Score:[1.1813487e-09]
1.1813487042058477e-09
>>> print(test.getClassDistribution(record5)[0].item(0))
Score:[9.23272512e-10]
9.232725116775672e-10

@ManuelMourato25
Author

Linked with the following issue: ninia/jep#205

@StrikerRUS
Collaborator

Since the Python script returns correct values, I guess the issue is on the JEP side (which we do not maintain). Maybe it is even a compatibility problem between numpy and JEP. Also, please note that for prediction from a Java application you can use our SWIG bindings, https://github.com/Azure/mmlspark, or https://github.com/jpmml/jpmml-lightgbm.

Since you've already filed this issue with JEP, I'm going to close it here. However, feel free to update this issue if anything is needed from us for better compatibility with JEP.

@ndjensen

@StrikerRUS, can you take a look at ninia/jep#205 (comment) to see if LightGBM should be improved to handle locales better?

@StrikerRUS
Collaborator

StrikerRUS commented Oct 21, 2019

@ndjensen Thanks a lot for keeping us updated! Could you please elaborate a little more on which aspects of locale handling could be improved?

import locale

import numpy as np
import lightgbm as lgb

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

locale.getdefaultlocale()
>>> ('ru_RU', 'cp1251')

X, y = load_boston(True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

est = lgb.LGBMRegressor().fit(X_train, y_train)
pred_default = est.predict(X_test)

locale.setlocale(locale.LC_ALL, 'pt_PT')
>>> 'pt_PT'

est = lgb.LGBMRegressor().fit(X_train, y_train)
pred_with_custom_locale = est.predict(X_test)

np.testing.assert_allclose(pred_default, pred_with_custom_locale)

IDK, maybe this issue is somehow related to JEP: #1481.

@ndjensen

I don't entirely understand it; you can read the other comments on the Jep ticket, but I think if you save a model file in one locale and load it in a different locale, the model file is not read correctly. @ManuelMourato25 already attached a model file, and it is not read correctly when the locale is Portuguese. @ManuelMourato25, please feel free to correct me if I'm not explaining it correctly.
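
The locale sensitivity described here comes down to how decimal numbers in the model text are parsed. A minimal Python sketch (hedged: it assumes a pt_PT locale is installed on the machine, which it may not be, and uses Python's locale.atof as a stand-in for locale-aware C parsing) illustrates how a value like those stored in a LightGBM model file can be misread:

```python
import locale

s = "0.04742587"  # a decimal value as it appears in a LightGBM model file

# In the default "C" numeric locale, '.' is the decimal separator,
# so the string parses to the intended value.
locale.setlocale(locale.LC_NUMERIC, "C")
value = locale.atof(s)
print(value)  # 0.04742587

# In a locale whose decimal separator is ',' (e.g. Portuguese),
# locale-aware parsing no longer treats '.' as the decimal point,
# so the same string may be read as a different number entirely.
try:
    locale.setlocale(locale.LC_NUMERIC, "pt_PT.UTF-8")
    print(locale.atof(s))  # likely not 0.04742587
except locale.Error:
    pass  # the pt_PT locale may not be installed on this machine
```

This is consistent with the symptom above: model coefficients silently truncated or misparsed at load time would make every prediction collapse to the same constant.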

@lock lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020