
added tokenization and transformation #4

Merged
merged 1 commit into from
Jun 18, 2021

woutslakhorst
Member

Tokenization happens first. A whitespace tokenizer has been provided.

This allows "care organization A" to be indexed with the terms care, organization and a (the transformation step lowercases each token).

The transformation step is pluggable, so other transformers (e.g. pronunciation-based ones) can be used.
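For readers following along, the order described above (tokenize first, then run each token through a chain of pluggable transformers) can be sketched in Go. This is a minimal sketch under assumptions: the `Tokenizer` and `Transformer` function types and the `WhiteSpaceTokenizer`, `ToLower` and `Terms` functions are hypothetical names chosen for illustration, not the API introduced by this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical types for illustration; not this repository's actual API.

// Tokenizer splits a raw value into tokens.
type Tokenizer func(value string) []string

// Transformer normalizes a single token (e.g. lowercasing).
type Transformer func(token string) string

// WhiteSpaceTokenizer splits on runs of Unicode whitespace and drops empty tokens.
func WhiteSpaceTokenizer(value string) []string {
	return strings.Fields(value)
}

// ToLower is an example transformer that lowercases a token.
func ToLower(token string) string {
	return strings.ToLower(token)
}

// Terms tokenizes value first, then applies every transformer, in order, to each token.
func Terms(value string, tokenize Tokenizer, transformers ...Transformer) []string {
	tokens := tokenize(value)
	terms := make([]string, 0, len(tokens))
	for _, token := range tokens {
		for _, transform := range transformers {
			token = transform(token)
		}
		terms = append(terms, token)
	}
	return terms
}

func main() {
	fmt.Println(Terms("care organization A", WhiteSpaceTokenizer, ToLower)) // [care organization a]
}
```

Because transformers are passed as a variadic list, an additional one (for example a phonetic encoder for pronunciation matching) could be appended without touching the tokenizer.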

@reinkrul
Member

Maybe you can talk me through this?

Member

reinkrul left a comment

I can't really judge the low-level correctness of some of this (especially findR), because the control flow is deeply nested and I find it hard to understand what some of the variables mean.

index.go (resolved)
} else {
err = fn(cKey, entry)
// value/search terms transformation (eg lowercase)
tokens := indexParts[0].Tokenize(seek)
Member

This is quite complex; could it use some refactoring?

Member Author

woutslakhorst, Jun 15, 2021

It does require a refactor, but this PR is not the place to do it. See #5.

transform.go (resolved)
transform.go (resolved)

if a, ok := val.([]interface{}); ok {
var ra []interface{}
for _, ai := range a {
-interm := j.matchRecursive(parts, ai)
+interm, err := j.matchRecursive(parts, ai)
Member

All these one- or two-character variable names make the code hard to understand for someone who doesn't work on it; I have to analyze it closely to work out what each one actually means.

Member Author

That will be part of the refactor.

woutslakhorst merged commit 00711d8 into master on Jun 18, 2021
woutslakhorst deleted the preprocessing branch on June 18, 2021 07:17