Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

built-in function: Support locate(substr, str[, pos]) function #820

Closed
4 tasks
jingchen2222 opened this issue Dec 1, 2021 · 9 comments · Fixed by #3943
Closed
4 tasks

built-in function: Support locate(substr, str[, pos]) function #820

jingchen2222 opened this issue Dec 1, 2021 · 9 comments · Fixed by #3943
Milestone

Comments

@jingchen2222
Copy link
Collaborator

jingchen2222 commented Dec 1, 2021

Is your feature request related to a problem? Please describe.

locate(substr, str[, pos])

Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.

Examples:

> SELECT locate('bar', 'foobarbar');
 4
> SELECT locate('bar', 'foobarbar', 5);
 7

Reference:

https://spark.apache.org/docs/2.3.0/api/sql/#locate

Describe the solution you'd like

We do provide a built-in function development guide for you.
Please check OpenMLDB Build-In Function Develop Guide or the Chinese version for help.

Please make sure you:

  • Add built-in C++ function in src/udf/udf.h and src/udf/udf.cc
  • Register function to default library from function void DefaultUdfLibrary::IniStringUdf() in src/udf/default_udf_library.cc.
  • Add related unit test in src/codegen/udf_ir_builder_test.cc
  • Documenting function

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@jingchen2222 jingchen2222 added enhancement New feature or request good first issue Good for newcomers labels Dec 1, 2021
@Mouray-Hutchinson
Copy link

Hello, I would like some clarification on the parameter and return types of this function. I've written the function as 1
but I am aware that it is likely something like 2 or 3. Please let me know so I can commit my changes.

  1. unsigned int locate(std::string substr, std::string str, unsigned int pos =0 ) {..}
  2. int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 ) {...}
  3. int32_t locate(string substr,string str,int32_t pos =0 ) {...}

@jingchen2222
Copy link
Collaborator Author

jingchen2222 commented Jan 24, 2022

Hi Mouray-Hutchinson, very glad that you would like to help.
Firstly, I would like to remove the SELECT POSITION('bar' IN 'foobarbar') since it might be conflict with our IN expression. Maybe we can simply support locate('bar', 'foobarbar') and locate('bar', 'foobarbar', 5) for this issue.
Secondly, we had better implement a function looks like int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 ) in udf.cc

Thank you again. Please feel free to reach out to me regarding any questions or updates.

@Mouray-Hutchinson
Copy link

@jingchen2222 two questions

  1. What should the function return if the sub string is not found? I have it returning -1
  2. If I used

t32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 )

then is the line following correct syntax for within the test?
Test File Location: udf_ir_builder_test.cc

CheckUdf<Nullable, Nullable,Nullable<int32_t>>(udf_name, StringRef("all"), StringRef("helloAll"),int32_t(-1) );

CheckUdf<Nullable, Nullable,Nullable<int32_t>>(udf_name, StringRef("abcd"), StringRef("abcdefg"),int32_t(1) );

hutchinsonmouray added a commit to hutchinsonmouray/OpenMLDB that referenced this issue Jan 29, 2022
@jingchen2222
Copy link
Collaborator Author

jingchen2222 commented Feb 9, 2022

locate

@Mouray-Hutchinson Sorry for the late reply.

Locate(substr, str)

Returns the position of the first occurrence of substring substr in string str. This function can be regarded as a variant of
Locate(substr, str, position=1)

Locate(substr, str, position)

Locate(substr, str, position)
Returns the position of the first occurrence of substring substr in string str, starting at position pos.

The rules are:

  • Returns 0 if substr is not in str.
  • Returns NULL if any argument is NULL.
  • The string is case-sensitive

Locate will be implemented in a way very similar to Mysql except that our string comparison is case-sensitive.
Check https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_locate for more details.

> SELECT LOCATE('bar', 'foobarbar');
-- 4

> SELECT LOCATE('bar', 'foobarbar', 5);
-- 7



-- Returns 0 if substr is not in str.
> SELECT LOCATE('xbar', 'foobar');
-- 0  

-- String is case-sensitive
> SELECT LOCATE('all', 'helloAll', 5);
-- 0

-- Returns NULL if any argument is NULL.
> SELECT LOCATE('bar', NULL);
-- NULL
> SELECT LOCATE(NULL,'foobar');
-- NULL

@jingchen2222
Copy link
Collaborator Author

jingchen2222 commented Feb 9, 2022

Hi Mouray-Hutchinson, very glad that you would like to help. Firstly, I would like to remove the SELECT POSITION('bar' IN 'foobarbar') since it might be conflict with our IN expression. Maybe we can simply support locate('bar', 'foobarbar') and locate('bar', 'foobarbar', 5) for this issue. Secondly, we had better implement a function looks like int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 ) in udf.cc

Thank you again. Please feel free to reach out to me regarding any questions or updates.

@Mouray-Hutchinson
The position should be 1 by default.
Secondly, we had better implement a function looks like int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =1 ) in udf.cc

@lumianph
Copy link
Collaborator

hi @Mouray-Hutchinson do you have any progress for this? we are looking forward to your contribution.
thanks.

@lumianph lumianph added call-for-contributions execute-engine hybridse sql engine and removed enhancement New feature or request good first issue Good for newcomers labels Apr 21, 2022
@aceforeverd aceforeverd modified the milestones: v0.5, v0.6 May 7, 2022
@aceforeverd aceforeverd added the udf label May 7, 2022
@lumianph lumianph added the good first issue Good for newcomers label May 27, 2022
@lumianph lumianph removed the good first issue Good for newcomers label Jun 16, 2022
@mangoGoForward
Copy link
Contributor

I'd like to tackle it, please assign to me~

@lumianph
Copy link
Collaborator

lumianph commented Aug 9, 2022

I'd like to tackle it, please assign to me~

thank you !!

@mangoGoForward mangoGoForward removed their assignment Oct 11, 2022
@mangoGoForward
Copy link
Contributor

I tried but failed, if anyone others want to contribute, please pick it up

howdb pushed a commit to howdb/OpenMLDB that referenced this issue May 30, 2024
howdb added a commit to howdb/OpenMLDB that referenced this issue May 30, 2024
tobegit3hub added a commit that referenced this issue Jul 18, 2024
* feat: sbin use the generated zk conf (#3901)

Co-authored-by: lijiangnan <[email protected]>

* refactor!: relocate go sdk (#3889)

* refactor!: relocate go sdk

moving to https://github.com/4paradigm/openmldb-go-sdk

* go readme

* ci: fix sdk workflow

* docs: fix example (#3907)

raw SQL request mode example was wrong because execute_mode should be request

* fix: make clients use always send auth info (#3906)

* fix: make clients use auth by default

* fix: let skip auth flag only affect verify

* feat: tablets get user table remotely (#3918)

* fix: make clients use auth by default

* fix: let skip auth flag only affect verify

* feat: tablets get user table remotely

* fix: use FLAGS_system_table_replica_num for user table

* fix: recoverdata support load disk table (#3888)

* docs: add map desc in create table (#3912)

* ci(#3904): python mac jobs fix (#3905)

* fix(#3909): checkout execute_mode in config clause in sql client (#3910)

* feat: merge dag sql (#3911)

* feat: merge AIOS DAG SQL

* feat: mergeDAGSQL

* add AIOSUtil

* feat: add AIOS merge SQL test case

* feat: split margeDAGSQL and validateSQLInRequest

* fix: gcformat space and continuous sign (#3921)

* fix: gcformat space

* fix: gcformat continuous sign use hash

* fix: delete incorrect comments

* feat: merge 090 features to main (#3929)

* Set s3 and aws dependencies ad provided (#3897)

* feat: execlude zookeeper for curator (#3899)

* Execlude zookeeper when using curator

* Fix local build java

* Run script to update post release version (#3931)

* feat: crud users synchronously (#3928)

* fix: make clients use auth by default

* fix: let skip auth flag only affect verify

* feat: tablets get user table remotely

* fix: use FLAGS_system_table_replica_num for user table

* feat: consistent user cruds

* fix: pass instance of tablet and nameserver into auth lambda to allow locking

* feat: best effort try to flush user data to all tablets

* fix: lock scope

* fix: stop user sync thread safely

* fix: default values for user table columns

* feat(parser): simple ANSI SQL rewriter (#3934)

* feat(parser): simple ANSI SQL rewriter

* feat(draft): translate request mode query

* feat: request query rewriter

* test: tpc rewrite cases

* feat(rewrite): enable ansi sql rewriter in `ExecuteSQL`

You may explicitly set this feature on via `set session ansi_sql_rewriter
= 'true'`

TODO: this rewriter feature should be off by default

* build(deps-dev): bump urllib3 from 1.26.18 to 1.26.19 in /docs (#3948)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.18 to 1.26.19.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(udf): isin (#3939)

* feat(#3916): support @@execute_mode = 'request' (#3924)

* feat(udf): array_combine & array_join (#3945)

* feat(udf): array_combine

* feat(udf): new functions

- array_combine
- array_join

* feat: casting arrays to array<string> for array_combine

WIP, string allocation need fix

* fix: array_combine with non-string types

* feat(array_combine): handle null inputs

* fix(array_combine): behavior tweaks

- use empty string if delimiter is null
- restrict to array_combine(string, array<T> ...)

* feat: support batchrequest in ProcessQuery (#3938)

* feat: user authz (#3941)

* feat: change user table to match mysql

* feat: support user authz

* fix: cean up created users

* build(deps-dev): bump requests from 2.31.0 to 2.32.2 in /docs (#3951)

Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump org.apache.derby:derby (#3949)

Bumps org.apache.derby:derby from 10.14.2.0 to 10.17.1.0.

---
updated-dependencies:
- dependency-name: org.apache.derby:derby
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump org.postgresql:postgresql (#3950)

Bumps [org.postgresql:postgresql](https://github.com/pgjdbc/pgjdbc) from 42.3.3 to 42.3.9.
- [Release notes](https://github.com/pgjdbc/pgjdbc/releases)
- [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md)
- [Commits](pgjdbc/pgjdbc@REL42.3.3...REL42.3.9)

---
updated-dependencies:
- dependency-name: org.postgresql:postgresql
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: iot table (#3944)

* feat: iot table

* fix

* fix

* fix delete key entry

* fix comment

* ut

* ut test

* fix ut

* sleep more for truncate

* sleep 16

* tool pytest fix and swig fix

* fix

* clean

* move to base

* fix

* fix coverage ut

* fix

---------

Co-authored-by: Huang Wei <[email protected]>

* feat(open-mysql-db): pandas support (#3868)

* feat(open-mysql-db): refactor

1. remove unnecessary instance var port
2. fix cause null bug
3. remove unnecessary throws
4. fix ctx.close() sequence bug
5. config sessionTimeout and requestTimeout
6. add docs of SqlEngine

* feat(open-mysql-db): refactor

* feat(open-mysql-db): revert passsword

* feat(open-mysql-db): mock commit and schema table count

* feat(open-mysql-db): replace data type text with string

* feat(open-mysql-db): remove null

---------

Co-authored-by: yangwucheng <[email protected]>

* fix: drop aggr tables in drop table (#3908)

* fix: drop aggr tables in drop table

* fix

* fix test

* fix

* fix

---------

Co-authored-by: Huang Wei <[email protected]>

* ci(#3954): fix checkout action on old glibc OS (#3955)

* ci(#3954): fix checkout action on old glibc OS

* ci: include checkout fix in all workflows

* ci: fix python-sdk

* test: node-2 to node-3 (#3957)

node-3 is not available, moving to node-2

* feat: support locate(substr, str[, pos]) function(#820) (#3943)

* fix(scripts): deploy spark correctly (#3958)

$SPARK_HOME may be a symbolic link referring to a invalid directory, so
we'd try 'rm -f' first

* Add changelog for 0.9.1 (#3959)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: venessa <[email protected]>
Co-authored-by: lijiangnan <[email protected]>
Co-authored-by: aceforeverd <[email protected]>
Co-authored-by: oh2024 <[email protected]>
Co-authored-by: HuangWei <[email protected]>
Co-authored-by: wyl4pd <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Huang Wei <[email protected]>
Co-authored-by: yangwucheng <[email protected]>
Co-authored-by: yangwucheng <[email protected]>
Co-authored-by: howd <[email protected]>
tobegit3hub added a commit that referenced this issue Jul 26, 2024
* feat: sbin use the generated zk conf (#3901)

Co-authored-by: lijiangnan <[email protected]>

* refactor!: relocate go sdk (#3889)

* refactor!: relocate go sdk

moving to https://github.com/4paradigm/openmldb-go-sdk

* go readme

* ci: fix sdk workflow

* docs: fix example (#3907)

raw SQL request mode example was wrong because execute_mode should be request

* fix: make clients use always send auth info (#3906)

* fix: make clients use auth by default

* fix: let skip auth flag only affect verify

* feat: tablets get user table remotely (#3918)

* fix: make clients use auth by default

* fix: let skip auth flag only affect verify

* feat: tablets get user table remotely

* fix: use FLAGS_system_table_replica_num for user table

* fix: recoverdata support load disk table (#3888)

* docs: add map desc in create table (#3912)

* ci(#3904): python mac jobs fix (#3905)

* fix(#3909): checkout execute_mode in config clause in sql client (#3910)

* feat: merge dag sql (#3911)

* feat: merge AIOS DAG SQL

* feat: mergeDAGSQL

* add AIOSUtil

* feat: add AIOS merge SQL test case

* feat: split margeDAGSQL and validateSQLInRequest

* fix: gcformat space and continuous sign (#3921)

* fix: gcformat space

* fix: gcformat continuous sign use hash

* fix: delete incorrect comments

* feat: merge 090 features to main (#3929)

* Set s3 and aws dependencies ad provided (#3897)

* feat: execlude zookeeper for curator (#3899)

* Execlude zookeeper when using curator

* Fix local build java

* Run script to update post release version (#3931)

* feat: crud users synchronously (#3928)

* fix: make clients use auth by default

* fix: let skip auth flag only affect verify

* feat: tablets get user table remotely

* fix: use FLAGS_system_table_replica_num for user table

* feat: consistent user cruds

* fix: pass instance of tablet and nameserver into auth lambda to allow locking

* feat: best effort try to flush user data to all tablets

* fix: lock scope

* fix: stop user sync thread safely

* fix: default values for user table columns

* feat(parser): simple ANSI SQL rewriter (#3934)

* feat(parser): simple ANSI SQL rewriter

* feat(draft): translate request mode query

* feat: request query rewriter

* test: tpc rewrite cases

* feat(rewrite): enable ansi sql rewriter in `ExecuteSQL`

You may explicitly set this feature on via `set session ansi_sql_rewriter
= 'true'`

TODO: this rewriter feature should be off by default

* build(deps-dev): bump urllib3 from 1.26.18 to 1.26.19 in /docs (#3948)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.18 to 1.26.19.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(udf): isin (#3939)

* feat(#3916): support @@execute_mode = 'request' (#3924)

* feat(udf): array_combine & array_join (#3945)

* feat(udf): array_combine

* feat(udf): new functions

- array_combine
- array_join

* feat: casting arrays to array<string> for array_combine

WIP, string allocation need fix

* fix: array_combine with non-string types

* feat(array_combine): handle null inputs

* fix(array_combine): behavior tweaks

- use empty string if delimiter is null
- restrict to array_combine(string, array<T> ...)

* feat: support batchrequest in ProcessQuery (#3938)

* feat: user authz (#3941)

* feat: change user table to match mysql

* feat: support user authz

* fix: cean up created users

* build(deps-dev): bump requests from 2.31.0 to 2.32.2 in /docs (#3951)

Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump org.apache.derby:derby (#3949)

Bumps org.apache.derby:derby from 10.14.2.0 to 10.17.1.0.

---
updated-dependencies:
- dependency-name: org.apache.derby:derby
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump org.postgresql:postgresql (#3950)

Bumps [org.postgresql:postgresql](https://github.com/pgjdbc/pgjdbc) from 42.3.3 to 42.3.9.
- [Release notes](https://github.com/pgjdbc/pgjdbc/releases)
- [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md)
- [Commits](pgjdbc/pgjdbc@REL42.3.3...REL42.3.9)

---
updated-dependencies:
- dependency-name: org.postgresql:postgresql
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: iot table (#3944)

* feat: iot table

* fix

* fix

* fix delete key entry

* fix comment

* ut

* ut test

* fix ut

* sleep more for truncate

* sleep 16

* tool pytest fix and swig fix

* fix

* clean

* move to base

* fix

* fix coverage ut

* fix

---------

Co-authored-by: Huang Wei <[email protected]>

* feat(open-mysql-db): pandas support (#3868)

* feat(open-mysql-db): refactor

1. remove unnecessary instance var port
2. fix cause null bug
3. remove unnecessary throws
4. fix ctx.close() sequence bug
5. config sessionTimeout and requestTimeout
6. add docs of SqlEngine

* feat(open-mysql-db): refactor

* feat(open-mysql-db): revert passsword

* feat(open-mysql-db): mock commit and schema table count

* feat(open-mysql-db): replace data type text with string

* feat(open-mysql-db): remove null

---------

Co-authored-by: yangwucheng <[email protected]>

* fix: drop aggr tables in drop table (#3908)

* fix: drop aggr tables in drop table

* fix

* fix test

* fix

* fix

---------

Co-authored-by: Huang Wei <[email protected]>

* ci(#3954): fix checkout action on old glibc OS (#3955)

* ci(#3954): fix checkout action on old glibc OS

* ci: include checkout fix in all workflows

* ci: fix python-sdk

* test: node-2 to node-3 (#3957)

node-3 is not available, moving to node-2

* feat: support locate(substr, str[, pos]) function(#820) (#3943)

* fix(scripts): deploy spark correctly (#3958)

$SPARK_HOME may be a symbolic link referring to a invalid directory, so
we'd try 'rm -f' first

* Add changelog for 0.9.1 (#3959)

* fix: select from JOB_INFO should always in online mode (#3963)

* fix: select from JOB_INFO should always in online mode

Fix error when user set default `execute_mode` to offline:

```sql
set global execute_mode = 'offline';
select 1;
```

* fix: query mode on user & pre_agg tables

* build(docker): centos7 EOL (#3965)

* build(docker): centos7 EOL

* fix vault address for aarch64

* ci(docker): disable arm64 image

Dont have arm machine to test

* fix(docker): numpy version lock (#3966)

* Update docs version to 0.9.1 (#3960)

* add blog post (#3936)

* refactor: fix compile for mcjit and improve to tests (#3952)

* refactor: rm SQL_CASE_BASE_DIR

* fix: compile on mcjit

* feat: setup SqlCaseBaseDir for hybridse

TODO: also setup for tests in src/

* docs: add blog post (#3913)

* Include new posts

* update links

* minor change

* ci: update create-pull-request action to v6 in udf-doc-gen workflow & rm deprecated file sync (#3964)

* Updated create-pull-request action to v6 in udf-doc-gen workflow

* Removed references to docs/en/reference/sql/udfs_8h.md as the file no longer exists

* build: upgrade openmldb sdk version in self host (#3962)

* docs: add changelog for 0.9.2 (#3968)

* docs: update version 0.9.2 in docs (#3970)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: venessa <[email protected]>
Co-authored-by: lijiangnan <[email protected]>
Co-authored-by: aceforeverd <[email protected]>
Co-authored-by: oh2024 <[email protected]>
Co-authored-by: HuangWei <[email protected]>
Co-authored-by: wyl4pd <[email protected]>
Co-authored-by: tobe <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Huang Wei <[email protected]>
Co-authored-by: yangwucheng <[email protected]>
Co-authored-by: yangwucheng <[email protected]>
Co-authored-by: howd <[email protected]>
Co-authored-by: Siqi Wang <[email protected]>
Co-authored-by: Jayaprakash0511 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants