Improve performance of exp() and exp2() #1712

Merged on May 3, 2023 (23 commits)

Conversation

gptsarthak
Contributor

No description provided.

@gptsarthak
Contributor Author

159/202 Test #92: elemental_04 .....................***Exception: SegFault 0.20 sec

I cannot figure out what's causing the segfault. It does not produce any error locally on my Linux system.

Collaborator

@czgdp1807 czgdp1807 left a comment

Instead of this, I would recommend implementing exp using the IntrinsicFunction API, as we have already done for abs, sin, and cos. Then you can implement generate_Exp (similar to generate_ListIndex) in the visit_IntrinsicFunction method in asr_to_llvm.cpp. Same for the C backend. Then we won't have to use the lfortran_intrinsics.c runtime library. Each backend will be able to use its preferred way of implementing exponentiation.

References,

For IntrinsicFunction - https://github.com/lcompilers/lpython/blob/main/src/libasr/pass/intrinsic_function_registry.h (see ListIndex specifically as we will be implementing exp in the backend as we have done for list.index).

void generate_ListIndex(ASR::expr_t* m_arg, ASR::expr_t* m_ele) {
    ASR::ttype_t* asr_el_type = ASRUtils::get_contained_type(ASRUtils::expr_type(m_arg));
    int64_t ptr_loads_copy = ptr_loads;
    ptr_loads = 0;
    this->visit_expr(*m_arg);
    llvm::Value* plist = tmp;
    ptr_loads = !LLVM::is_llvm_struct(asr_el_type);
    this->visit_expr_wrapper(m_ele, true);
    ptr_loads = ptr_loads_copy;
    llvm::Value *item = tmp;
    tmp = list_api->index(plist, item, asr_el_type, *module);
}

void visit_IntrinsicFunction(const ASR::IntrinsicFunction_t& x) {
    switch (static_cast<ASRUtils::IntrinsicFunctions>(x.m_intrinsic_id)) {
        case ASRUtils::IntrinsicFunctions::ListIndex: {
            switch (x.m_overload_id) {
                case 0: {
                    ASR::expr_t* m_arg = x.m_args[0];
                    ASR::expr_t* m_ele = x.m_args[1];
                    generate_ListIndex(m_arg, m_ele);
                    break ;
                }
                default: {
                    throw CodeGenError("list.index only accepts one argument",
                                        x.base.base.loc);
                }
            }
            break ;
        }
        default: {
            throw CodeGenError( ASRUtils::IntrinsicFunctionRegistry::
                                get_intrinsic_function_name(x.m_intrinsic_id) +
                                " is not implemented by LLVM backend.", x.base.base.loc);
        }
    }
}

https://llvm.org/docs/LangRef.html#llvm-exp-intrinsic, https://llvm.org/docs/LangRef.html#llvm-exp2-intrinsic, https://llvm.org/doxygen/IRBuilder_8cpp_source.html#l00956

Something like,

builder->CreateUnaryIntrinsic(llvm::Intrinsic::exp, tmp)
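
As a rough illustration of the lowering suggested above, here is a toy sketch in Python. It is not LPython's backend: it only builds the textual form of a call to the f64 overloads of the LLVM intrinsics linked above, and the names emit_unary_intrinsic and LLVM_F64_INTRINSICS are hypothetical.

```python
# Toy emitter: maps an intrinsic name to its LLVM intrinsic for f64.
# `emit_unary_intrinsic` and `LLVM_F64_INTRINSICS` are hypothetical names;
# a real backend would go through the LLVM IRBuilder instead of strings.
LLVM_F64_INTRINSICS = {
    "exp": "llvm.exp.f64",
    "exp2": "llvm.exp2.f64",
}


def emit_unary_intrinsic(name: str, operand: str, result: str) -> str:
    """Return one line of textual LLVM IR calling the intrinsic on a double."""
    if name not in LLVM_F64_INTRINSICS:
        # Plays the role of the 'default' branch in the dispatch quoted above.
        raise NotImplementedError(f"{name} is not implemented by this toy backend.")
    return f"{result} = call double @{LLVM_F64_INTRINSICS[name]}(double {operand})"


print(emit_unary_intrinsic("exp", "%x", "%y"))
```

The operand is passed as a value of type double, not as a pointer, which is the form these intrinsics are declared on.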

@czgdp1807 czgdp1807 marked this pull request as draft April 17, 2023 04:56
@certik
Contributor

certik commented Apr 20, 2023

exp and exp2 should be implemented using the IntrinsicFunction mechanism.

@gptsarthak
Contributor Author

gptsarthak commented Apr 20, 2023

@certik I just started working on it right now, that was just a merge commit. I'll implement them now.

@gptsarthak
Contributor Author

Implemented exp() and exp2() using IntrinsicFunction:

# no import

def test():
    x : f64 = 0.5
    print(exp(x))
    print(exp2(x))

test()
(lp) sarthak@pop-os:~/lpython$ src/bin/lpython --fast examples/test.py
1.64872127070012819e+00
1.41421356237309515e+00
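
The two printed values can be cross-checked against CPython. This is only a sanity check under the assumption that the reference values are math.exp(0.5) and 2 ** 0.5; exp2 is computed as 2.0 ** x so the check does not depend on math.exp2 being available.

```python
import math

x = 0.5
exp_x = math.exp(x)   # e ** 0.5
exp2_x = 2.0 ** x     # 2 ** 0.5, i.e. sqrt(2)

# Values printed by the LPython run above, copied verbatim.
lpython_exp = 1.64872127070012819e+00
lpython_exp2 = 1.41421356237309515e+00

assert math.isclose(exp_x, lpython_exp, rel_tol=1e-14)
assert math.isclose(exp2_x, lpython_exp2, rel_tol=1e-14)
```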

@gptsarthak
Contributor Author

gptsarthak commented Apr 20, 2023

One thing I observed after implementing them is that Python's math module did not have exp2() before Python 3.11; NumPy has it. On the other hand, we have it in our math module, and since both functions are implemented using IntrinsicFunction, they work without any import for now.

Just a thought: since there is repetition of code in eval and create, can we use macros instead? Something like #define eval_real(function). We could use them for any function that accepts only a float as input, like log() or sqrt(), should we ever decide to implement them using intrinsics.
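
The idea of generating the repeated eval code from one template can be sketched in Python with a factory function standing in for the proposed macro. Everything here (make_real_eval and the generated evaluators) is hypothetical and only illustrates the shape of the deduplication, not the C++ that was eventually written.

```python
import math
from typing import Callable, Optional


def make_real_eval(name: str, fn: Callable[[float], float]):
    """Build a compile-time evaluator for a unary real function.

    Hypothetical helper: it plays the role that the proposed
    eval_real(function) macro would play in C++.
    """
    def eval_real(arg: Optional[float]) -> Optional[float]:
        if arg is None:
            # The argument is not a compile-time constant: nothing to fold.
            return None
        if not isinstance(arg, float):
            raise TypeError(f"{name}() expects a real argument")
        return fn(arg)

    eval_real.__name__ = f"eval_{name}"
    return eval_real


# One line per function instead of one hand-written evaluator each.
eval_exp = make_real_eval("exp", math.exp)
eval_exp2 = make_real_eval("exp2", lambda v: 2.0 ** v)
eval_sqrt = make_real_eval("sqrt", math.sqrt)
```

Adding log() or sqrt() then costs a single call to the factory, which is the saving the macro is meant to provide.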

Contributor

@certik certik left a comment

I think it looks good. Yes, we should simplify using macros, but in subsequent PRs.

@gptsarthak gptsarthak marked this pull request as ready for review April 21, 2023 06:30
@gptsarthak
Contributor Author

If we do

from math import exp
print(exp(0.5))

or do

import math 
print(math.exp(0.5))

the old implementation is still used.
Do we need to fix this? Most users would import the modules.
One solution I can think of is the following.
In intrinsic_function_registry.h:

{"math_exp", {&Exp::create_Exp, &Exp::eval_Exp}},

In math.py as well as lpython_intrinsic_numpy.py

def exp(x: f64) -> f64:
    """
    Return `e` raised to the power `x`.
    """
    return math_exp(x)
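
A minimal sketch of what the alias idea amounts to, assuming the registry is a name-to-(create, eval) table as the entry above suggests. The Python names (IntrinsicEntry, REGISTRY, create_exp) are hypothetical stand-ins, and a plain dict replaces the ASR node.

```python
import math
from typing import Callable, Dict, NamedTuple


class IntrinsicEntry(NamedTuple):
    create: Callable[[float], dict]      # builds a node for the call
    evaluate: Callable[[float], float]   # folds a constant argument


def create_exp(arg: float) -> dict:
    # Stand-in for a node constructor; a dict is used only for illustration.
    return {"intrinsic": "Exp", "args": [arg], "value": math.exp(arg)}


# Both the bare name and the proposed alias resolve to the same entry,
# so math.py could forward to `math_exp` without a second implementation.
EXP_ENTRY = IntrinsicEntry(create=create_exp, evaluate=math.exp)
REGISTRY: Dict[str, IntrinsicEntry] = {
    "exp": EXP_ENTRY,
    "math_exp": EXP_ENTRY,
}


def is_intrinsic_function(name: str) -> bool:
    return name in REGISTRY
```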

Collaborator

@czgdp1807 czgdp1807 left a comment

During the AST-to-ASR transition, we also need to create an IntrinsicFunction node in the ASR. See,

if (!s) {
    std::set<std::string> not_cpython_builtin = {
        "sin", "cos", "gamma", "tan", "asin", "acos", "atan"
    };
    if (ASRUtils::IntrinsicFunctionRegistry::is_intrinsic_function(call_name)
            && not_cpython_builtin.find(call_name) == not_cpython_builtin.end()) {
        ASRUtils::create_intrinsic_function create_func =
            ASRUtils::IntrinsicFunctionRegistry::get_create_function(call_name);
        Vec<ASR::expr_t*> args_; args_.reserve(al, x.n_args);
        visit_expr_list(x.m_args, x.n_args, args_);
        tmp = create_func(al, x.base.base.loc, args_,
            [&](const std::string &msg, const Location &loc) {
            throw SemanticError(msg, loc); });
        return ;
    } else if (intrinsic_procedures.is_intrinsic(call_name)) {

Adding "exp" to that set should probably work. Also, we should raise an error if exp is used without "from math import exp" or "from numpy import exp". For "exp2", we should raise an error if "from numpy import exp2" is absent. Also, math.exp should only accept scalars (error otherwise), while numpy.exp and numpy.exp2 can accept arrays.
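
The rules in the comment above can be written down as a small check. This is a sketch under assumptions: check_call, ALLOWED_MODULES and this SemanticError class are hypothetical, and only the first error message is taken from the compiler output quoted later in the thread; the other two are made up for illustration.

```python
from typing import Optional

# Which modules are allowed to provide each function, per the rules above.
ALLOWED_MODULES = {
    "exp": {"math", "numpy"},
    "exp2": {"numpy"},
}


class SemanticError(Exception):
    pass


def check_call(name: str, imported_from: Optional[str], arg_is_array: bool) -> str:
    """Validate a call to exp/exp2 and return the module that provides it."""
    if imported_from is None:
        # No import at all: the name must not resolve implicitly.
        raise SemanticError(f"Function '{name}' is not declared and not intrinsic")
    if imported_from not in ALLOWED_MODULES.get(name, set()):
        # Illustrative message, e.g. for `from math import exp2`.
        raise SemanticError(f"'{name}' cannot be imported from '{imported_from}'")
    if imported_from == "math" and arg_is_array:
        # Illustrative message: math functions are scalar-only.
        raise SemanticError(f"math.{name} only accepts scalars")
    return imported_from
```

For example, check_call("exp", "numpy", True) is accepted, while check_call("exp", "math", True) and check_call("exp2", "math", False) are rejected.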

@czgdp1807 czgdp1807 marked this pull request as draft April 21, 2023 11:10
@gptsarthak
Contributor Author

gptsarthak commented Apr 21, 2023

@czgdp1807 So I did this

std::set<std::string> not_cpython_builtin = {
                "sin", "cos", "gamma", "tan", "asin", "acos", "atan", "exp", "exp2"
            };

and it raised the error as expected.

semantic error: Function 'exp' is not declared and not intrinsic
  --> examples/hi.py:12:7
   |
12 | print(exp(0.90))
   |       ^^^^^^^^^ 

But the problem is that when we do import it, the old implementation inside math.py is used.
How should we deal with this? We need to use the intrinsic functions to get the performance increase.

@czgdp1807
Collaborator

czgdp1807 commented Apr 21, 2023

Well, try to fix this. Remove the Python implementations of exp and exp2 from LPython and then try. This needs debugging (and is doable).

@gptsarthak
Contributor Author

gptsarthak commented Apr 22, 2023

> Well, try to fix this. Remove the Python implementations of exp and exp2 from LPython and then try. This needs debugging (and is doable).

@czgdp1807 Please check if this is the correct way to handle this.

> Also math.exp should only accept scalars (error otherwise). numpy.exp and numpy.exp2 can accept arrays.

I'm working on this. Done, please review. The conditions are complex because of functions that exist in both modules.
What's left is vectorization.
How do we vectorize these functions for arrays in the NumPy case? I could not figure it out. I know it's something related to elemental.

@czgdp1807
Collaborator

czgdp1807 commented Apr 24, 2023

> What's left is vectorization.
> How do we vectorize these functions for arrays in the NumPy case? I could not figure it out. I know it's something related to elemental.

We already have the support for that. See,

void replace_IntrinsicFunction(ASR::IntrinsicFunction_t* x) {
    if( !ASRUtils::IntrinsicFunctionRegistry::is_elemental(x->m_intrinsic_id) ) {
        return ;
    }
    LCOMPILERS_ASSERT(current_scope != nullptr);
    const Location& loc = x->base.base.loc;
    std::vector<bool> array_mask(x->n_args, false);
    bool at_least_one_array = false;
    for( size_t iarg = 0; iarg < x->n_args; iarg++ ) {
        array_mask[iarg] = ASRUtils::is_array(
            ASRUtils::expr_type(x->m_args[iarg]));
        at_least_one_array = at_least_one_array || array_mask[iarg];
    }
    if (!at_least_one_array) {
        return ;
    }
    std::string res_prefix = "_elemental_func_call_res";
    ASR::expr_t* result_var_copy = result_var;
    bool is_all_rank_0 = true;
    std::vector<ASR::expr_t*> operands;
    ASR::expr_t* operand = nullptr;
    int common_rank = 0;
    bool are_all_rank_same = true;
    for( size_t iarg = 0; iarg < x->n_args; iarg++ ) {
        result_var = nullptr;
        ASR::expr_t** current_expr_copy_9 = current_expr;
        current_expr = &(x->m_args[iarg]);
        self().replace_expr(x->m_args[iarg]);
        operand = *current_expr;
        current_expr = current_expr_copy_9;
        operands.push_back(operand);
        int rank_operand = PassUtils::get_rank(operand);
        if( common_rank == 0 ) {
            common_rank = rank_operand;
        }
        if( common_rank != rank_operand &&
            rank_operand > 0 ) {
            are_all_rank_same = false;
        }
        array_mask[iarg] = (rank_operand > 0);
        is_all_rank_0 = is_all_rank_0 && (rank_operand <= 0);
    }
    if( is_all_rank_0 ) {
        return ;
    }
    if( !are_all_rank_same ) {
        throw LCompilersException("Broadcasting support not yet available "
                                  "for different shape arrays.");
    }
    result_var = result_var_copy;
    if( result_var == nullptr ) {
        result_var = PassUtils::create_var(result_counter, res_prefix,
                        loc, operand, al, current_scope);
        result_counter += 1;
    }
    *current_expr = result_var;
    Vec<ASR::expr_t*> idx_vars, loop_vars;
    std::vector<int> loop_var_indices;
    Vec<ASR::stmt_t*> doloop_body;
    create_do_loop(loc, common_rank,
        idx_vars, loop_vars, loop_var_indices, doloop_body,
        [=, &operands, &idx_vars, &doloop_body] () {
        Vec<ASR::expr_t*> ref_args;
        ref_args.reserve(al, x->n_args);
        for( size_t iarg = 0; iarg < x->n_args; iarg++ ) {
            ASR::expr_t* ref = operands[iarg];
            if( array_mask[iarg] ) {
                ref = PassUtils::create_array_ref(operands[iarg], idx_vars, al);
            }
            ref_args.push_back(al, ref);
        }
        Vec<ASR::dimension_t> empty_dim;
        empty_dim.reserve(al, 1);
        ASR::ttype_t* dim_less_type = ASRUtils::duplicate_type(al, x->m_type, &empty_dim);
        ASR::expr_t* op_el_wise = ASRUtils::EXPR(ASR::make_IntrinsicFunction_t(al, loc,
                                    x->m_intrinsic_id, ref_args.p, ref_args.size(), x->m_overload_id,
                                    dim_less_type, nullptr));
        ASR::expr_t* res = PassUtils::create_array_ref(result_var, idx_vars, al);
        ASR::stmt_t* assign = ASRUtils::STMT(ASR::make_Assignment_t(al, loc, res, op_el_wise, nullptr));
        doloop_body.push_back(al, assign);
    });
    use_custom_loop_params = false;
    result_var = nullptr;
}

And I think we should have an elemental flag in ASR::IntrinsicFunction. @certik

For now add your function below,

static inline bool is_elemental(int64_t id) {
    ASRUtils::IntrinsicFunctions id_ = static_cast<ASRUtils::IntrinsicFunctions>(id);
    return ( id_ == ASRUtils::IntrinsicFunctions::Abs ||
             id_ == ASRUtils::IntrinsicFunctions::Cos ||
             id_ == ASRUtils::IntrinsicFunctions::Gamma ||
             id_ == ASRUtils::IntrinsicFunctions::LogGamma ||
             id_ == ASRUtils::IntrinsicFunctions::Sin );
}
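
Conceptually, the elemental pass quoted above rewrites a call on an array into a loop that applies the scalar intrinsic to each element. The following Python sketch shows that idea for the rank-1 case only; the table INTRINSICS with its per-entry elemental flag and the function apply_intrinsic are hypothetical, and plain lists stand in for arrays.

```python
import math

# name -> (scalar implementation, elemental flag). The flag mirrors the
# suggestion of carrying "elemental" as a property of the intrinsic itself.
INTRINSICS = {
    "exp": (math.exp, True),
    "exp2": (lambda v: 2.0 ** v, True),
    "sin": (math.sin, True),
}


def apply_intrinsic(name, arg):
    """Apply an intrinsic to a scalar, or element by element to a list."""
    impl, elemental = INTRINSICS[name]
    if not isinstance(arg, list):
        return impl(arg)  # scalar call: nothing to rewrite
    if not elemental:
        raise TypeError(f"{name} is not elemental and cannot take an array")
    # Array call: expand into an explicit loop writing into a result array,
    # as the pass does with its generated do-loop and result variable.
    result = [0.0] * len(arg)
    for i in range(len(arg)):
        result[i] = impl(arg[i])
    return result


print(apply_intrinsic("exp", [0.0, 1.0]))
```

Registering Exp and Exp2 as elemental corresponds to setting the flag to True here; without it, the array call is never expanded.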

@gptsarthak
Contributor Author

> For now add your function below,

@czgdp1807 I already tried that; it did not work. LLVM throws a codegen error. I think it's because other functions like sin and log_gamma use the C implementations by calling _lfortran_dsin in instantiate_functions(), while exp() uses the LLVM implementation in the backend.
If we use exp(array[0]):

code generation error: asr_to_llvm: module failed verification. Error:
Intrinsic has incorrect return type!
double* (double*)* @llvm.exp.p0f64
FPExt only operates on FP
  %58 = fpext double* %57 to double

If we do exp(array):

code generation error: asr_to_llvm: module failed verification. Error:
Intrinsic has incorrect return type!
double* (double*)* @llvm.exp.p0f64
Stored value type does not match pointer operand type!
  store double* %108, double* %92, align 8
 double

@czgdp1807
Collaborator

I don't see either Exp or Exp2 in the following function,

static inline bool is_elemental(int64_t id) {
    ASRUtils::IntrinsicFunctions id_ = static_cast<ASRUtils::IntrinsicFunctions>(id);
    return ( id_ == ASRUtils::IntrinsicFunctions::Abs ||
             id_ == ASRUtils::IntrinsicFunctions::Cos ||
             id_ == ASRUtils::IntrinsicFunctions::Gamma ||
             id_ == ASRUtils::IntrinsicFunctions::LogGamma ||
             id_ == ASRUtils::IntrinsicFunctions::Sin );
}

> If we use exp(array[0]):

So it's probably related to incorrect processing of ArrayItem in generate_Exp. The C code needs pointers, but the LLVM code needs a value.
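
The pointer-versus-value distinction can be illustrated by analogy in Python with ctypes. This is only an analogy, not a reproduction of the LLVM error: passing a pointer to a function that expects a double is rejected, while loading the element first works.

```python
import ctypes
import math

# A C-style array of doubles and a pointer to its first element.
array = (ctypes.c_double * 3)(0.0, 0.5, 1.0)
ptr = ctypes.cast(array, ctypes.POINTER(ctypes.c_double))

# Passing the pointer itself is a type error, much like handing a double*
# to an intrinsic that is declared on double.
try:
    math.exp(ptr)
    passed_pointer = True
except TypeError:
    passed_pointer = False

# Loading the element first yields a plain float, which is what exp needs.
loaded = ptr[1]
result = math.exp(loaded)
```

In the backend, the equivalent of ptr[1] is emitting a load for the ArrayItem before the intrinsic call is generated.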

@gptsarthak
Contributor Author

@Thirumalai-Shaktivel please review my approach to solving the problem mentioned in #1712 (comment) .

How do I fix the error in the macOS CI test?

@Thirumalai-Shaktivel
Collaborator

LGTM!

@certik
Contributor

certik commented Apr 26, 2023

Just do what the error is saying:

/Users/runner/work/lpython/lpython/integration_tests/_lpython-tmp-test-c/elemental_08.c:100:36: error: implicitly declaring library function 'exp' with type 'double (double)' [-Werror,-Wimplicit-function-declaration]
        ASSERT(_lcompilers_abs_f32(exp(array->data[(i - array->dims[0].lower_bound)]) - result->data[(i - result->dims[0].lower_bound)]) <= eps);
                                   ^
/Users/runner/work/lpython/lpython/integration_tests/_lpython-tmp-test-c/elemental_08.c:100:36: note: include the header <math.h> or explicitly provide a declaration for 'exp'

I think we need to include math.h in the generated file.
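
One way to think about the fix is that the C emitter adds the header only when the generated body actually uses a function from it. The sketch below is a toy string-based emitter in Python; emit_c_source and MATH_H_INTRINSICS are hypothetical names, and the set of intrinsics needing math.h is an assumption.

```python
from typing import List

# Intrinsics whose C lowering calls a <math.h> function (an assumed list).
MATH_H_INTRINSICS = {"exp", "exp2", "expm1"}


def emit_c_source(body_lines: List[str], used_intrinsics: List[str]) -> str:
    """Assemble a C translation unit, adding headers the body depends on."""
    headers = ["#include <stdio.h>"]
    if any(name in MATH_H_INTRINSICS for name in used_intrinsics):
        # Without this, clang reports the implicit declaration of exp.
        headers.append("#include <math.h>")
    return "\n".join(headers + [""] + body_lines) + "\n"


source = emit_c_source(
    ["int main(void) {", '    printf("%f\\n", exp(0.5));', "    return 0;", "}"],
    used_intrinsics=["exp"],
)
print(source)
```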

@gptsarthak
Contributor Author

gptsarthak commented Apr 26, 2023

I believe this PR is ready. Let me know if there's anything else that needs to be done.

Maybe expm1 can also become an IntrinsicFunction. It should just reuse the Exp intrinsic.

Edit: We can do that in a separate PR, I guess. I'll complete this today.
We also need to address an issue caused by using trigonometric functions with arrays via IntrinsicFunction; I'll open an issue for that soon.
We can then map the arcsin-like functions to asin in intrinsic_function_registry, and decide whether we should remove their implementations in the runtime modules.
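
Reusing exp for expm1 is simple to express, but it is worth noting that exp(x) - 1 loses precision for small x compared with a dedicated expm1 routine. The thread does not discuss this; the sketch below only demonstrates the numerical effect in CPython, and expm1_via_exp is a hypothetical name.

```python
import math


def expm1_via_exp(x: float) -> float:
    """expm1 defined by directly reusing exp: exp(x) - 1."""
    return math.exp(x) - 1.0


small = 1e-10
naive = expm1_via_exp(small)
accurate = math.expm1(small)  # dedicated routine from the standard library

# For small x the subtraction cancels most significant digits.
relative_error = abs(naive - accurate) / accurate
print(relative_error)
```

For arguments that are not close to zero, the two definitions agree to within rounding.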

@gptsarthak gptsarthak marked this pull request as ready for review April 26, 2023 04:13
@gptsarthak gptsarthak marked this pull request as draft April 26, 2023 04:32
ASRUtils::create_intrinsic_function create_func =
    ASRUtils::IntrinsicFunctionRegistry::get_create_function(call_name);
Vec<ASR::expr_t*> args_; args_.reserve(al, x.n_args);
visit_expr_list(x.m_args, x.n_args, args_);
if (ASRUtils::is_array(ASRUtils::expr_type(args_[0])) &&
        current_scope->resolve_symbol("numpy") == nullptr ) {
Contributor

This seems to be the wrong check. I think what matters is not whether "import numpy" is present, but whether exp is imported from numpy.

Collaborator

Consider the following case,

  1. The user hasn't installed numpy from pip, conda or anywhere on the internet.
  2. They have written a numpy package locally with their own API.
  3. Now they do "from numpy import exp" (exp is defined by them in their numpy package).
  4. So will ASR::IntrinsicFunction be used, or will their exp be used?
  5. I think we should test this case.

Contributor

@czgdp1807 we already have this issue with sin and other functions I think, so we can probably fix this in a separate PR.

@certik
Contributor

certik commented Apr 26, 2023

Thanks, I think it looks pretty good overall, there are just some smaller things to resolve.

@gptsarthak
Contributor Author

gptsarthak commented Apr 27, 2023

Changes:

  • Implemented expm1() as an IntrinsicFunction.
  • Added a macro to create the namespace for exponent-related functions in intrinsic_function_registry. This avoids repetition of code.
  • Modified the C backend to include the math header only for IntrinsicFunction.
  • Added a map structure to keep track of imported functions and the names of the modules they are imported from.
  • An error is thrown if a function imported from the math module is given an array (vector) argument.
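
The map of imported functions described in the list above can be sketched in Python using the standard ast module. This is not the C++ structure added in the PR; collect_imported_functions is a hypothetical helper that only shows what such a map contains.

```python
import ast
from typing import Dict


def collect_imported_functions(source: str) -> Dict[str, str]:
    """Map each name bound by `from <module> import <name>` to its module."""
    imported: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module is not None:
            for alias in node.names:
                # Respect `as` renames: the local name is what calls will use.
                imported[alias.asname or alias.name] = node.module
    return imported


program = "from math import exp\nfrom numpy import exp2, exp as np_exp\n"
imports = collect_imported_functions(program)
print(imports)
```

Looking up the called name in this map answers the question raised in review: whether exp was imported from numpy, rather than whether numpy happens to be imported at all.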

Regarding #1712 (comment), right now I could not think of a way to solve this, but as Ondrej said, we can tackle this in another issue/PR.

@gptsarthak gptsarthak marked this pull request as ready for review April 27, 2023 15:49
@Thirumalai-Shaktivel
Collaborator

LGTM!

@gptsarthak gptsarthak requested a review from czgdp1807 May 3, 2023 14:49
Collaborator

@czgdp1807 czgdp1807 left a comment

Let's wait for #1673 to be merged.

@czgdp1807 czgdp1807 merged commit 51ad8b4 into lcompilers:main May 3, 2023