Implement offline search API #11

Open
rapee opened this issue Oct 16, 2019 · 10 comments

Comments

@rapee
Contributor

rapee commented Oct 16, 2019

We want to search across the datasets (people / zipcode / party / vote logs). Can we do this offline, i.e. without calling an external API?

const results = searchByKeyword("ประชา")

Response body: see #11 (comment)

@p16i
Contributor

p16i commented Oct 19, 2019

One way might be to let the user choose which attribute to search, and then we load the corresponding index file.

@rapee
Contributor Author

rapee commented Oct 19, 2019

You mean choose the attribute first, then search the corresponding index file? @heytitle

@p16i
Contributor

p16i commented Oct 19, 2019

Yes. We can also create a combined index file, but I'm not sure how hard it is to construct such a file.

rapee added this to the Version 2 Release milestone on Nov 9, 2019
@rapee
Contributor Author

rapee commented Nov 23, 2019

One idea is to create a simple lookup index like this for all the content types we care about.

[
  { q: "สว", type: "page", url: "/cabinet" },
  { q: "พลเอก ประยุทธ์ จันทร์โอชา", type: "people", url: "/people/ประยุทธ์-จันทร์โอชา" },
  { q: "เพื่อไทย", type: "party", url: "/party/เพื่อไทย" },
  { q: "ขยายสัญญาสัมปทานทางด่วนและรถไฟฟ้าบีทีเอส", type: "votelog", url: "/votelog/3" },
  // ... go on
]

Then, partially matching q against the search keyword should give a simple search function that just works.
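
A minimal sketch of that partial matching, assuming the index is already loaded in memory as an array. The `lookupIndex` name and the optional `index` parameter are assumptions for illustration; the issue only shows `searchByKeyword(keyword)`.

```javascript
// Hypothetical in-memory lookup index, same shape as the example above.
const lookupIndex = [
  { q: "สว", type: "page", url: "/cabinet" },
  { q: "พลเอก ประยุทธ์ จันทร์โอชา", type: "people", url: "/people/ประยุทธ์-จันทร์โอชา" },
  { q: "เพื่อไทย", type: "party", url: "/party/เพื่อไทย" },
];

// Return every entry whose `q` contains the keyword as a substring.
// Lowercasing only matters for Latin text; it leaves Thai unchanged.
function searchByKeyword(keyword, index = lookupIndex) {
  const needle = keyword.trim().toLowerCase();
  if (needle === "") return []; // an empty keyword would otherwise match everything
  return index.filter((entry) => entry.q.toLowerCase().includes(needle));
}

console.log(searchByKeyword("ประยุทธ์").map((entry) => entry.url));
```

This is a linear scan over the whole index, which is probably fine for a small index but has not been measured here.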

@rapee
Contributor Author

rapee commented Nov 23, 2019

For word variations like "สว", we could use a RegExp:

[
  // Backslashes are doubled because q is a string literal; the pattern is ส\.?ว\.?
  { q: "ส\\.?ว\\.?", match: "regexp", type: "page", url: "/cabinet" },
  // ... go on
]
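
One way such entries might be consumed, as a rough sketch. The helper names `entryMatches` and `searchMixed` are hypothetical, and anchoring the pattern so that the whole keyword must be a variation of the term is an assumption, not something decided in this thread. Inside a JavaScript string literal the backslashes need to be doubled so that `\.` reaches the RegExp as an escaped dot.

```javascript
// Hypothetical helper: decide whether one index entry matches the keyword.
function entryMatches(entry, keyword) {
  if (entry.match === "regexp") {
    // Anchored, so the keyword as a whole must be one of the variations.
    return new RegExp(`^(?:${entry.q})$`, "i").test(keyword);
  }
  // Plain entries fall back to substring matching.
  return entry.q.includes(keyword);
}

// Example index mixing a regexp entry with a plain entry.
const mixedIndex = [
  { q: "ส\\.?ว\\.?", match: "regexp", type: "page", url: "/cabinet" },
  { q: "เพื่อไทย", type: "party", url: "/party/เพื่อไทย" },
];

function searchMixed(keyword, index = mixedIndex) {
  const trimmed = keyword.trim();
  if (trimmed === "") return [];
  return index.filter((entry) => entryMatches(entry, trimmed));
}

console.log(searchMixed("ส.ว.").map((entry) => entry.url));
```

With an anchored pattern, a partially typed keyword will not match a regexp entry; dropping the `^` and `$` would trade that for looser matches.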

@Th1nkK1D
Contributor

@rapee let me see if I understand correctly:

  1. Generate a "lookup index" from the data we have. It might be an array of objects, like in your previous comment.
  2. Create a function searchByKeyword that filters the data in the index file using regex.
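
Step 1 could be sketched roughly as below. The input field names (`title`, `name`, `lastname`, `id`) and the `buildLookupIndex` function are assumptions for illustration; the real dataset fields may differ. The URL patterns follow the example entries earlier in this thread.

```javascript
// Hypothetical source records; the real dataset fields may differ.
const sampleData = {
  people: [{ title: "พลเอก", name: "ประยุทธ์", lastname: "จันทร์โอชา" }],
  parties: [{ name: "เพื่อไทย" }],
  votelogs: [{ id: 3, title: "ขยายสัญญาสัมปทานทางด่วนและรถไฟฟ้าบีทีเอส" }],
};

// Flatten every content type into { q, type, url } entries.
function buildLookupIndex({ people = [], parties = [], votelogs = [] }) {
  return [
    ...people.map((p) => ({
      q: `${p.title} ${p.name} ${p.lastname}`,
      type: "people",
      url: `/people/${p.name}-${p.lastname}`,
    })),
    ...parties.map((party) => ({
      q: party.name,
      type: "party",
      url: `/party/${party.name}`,
    })),
    ...votelogs.map((v) => ({
      q: v.title,
      type: "votelog",
      url: `/votelog/${v.id}`,
    })),
  ];
}

console.log(JSON.stringify(buildLookupIndex(sampleData), null, 2));
```

The resulting array could be written to a JSON file at build time, so the browser only has to load and filter it.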

@rapee
Contributor Author

rapee commented Nov 25, 2019

That’s right @Th1nkK1D

@p16i
Contributor

p16i commented Nov 26, 2019

IMHO, we also have to check the size of the index file. It shouldn't be too large. @rapee any idea about a reasonable file size?

Given that this index file doesn't change regularly, we might cache it using local storage.
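
A minimal sketch of that caching, assuming the index is served as JSON. The `loadIndexCached` function, the `search-index:` key prefix, and the `/search-index.json` URL in the usage comment are all hypothetical. The storage object is passed in so the function also works outside a browser.

```javascript
// Load the index from a Storage-like object (anything with getItem/setItem,
// such as window.localStorage), falling back to `fetchIndex` on a cache miss.
// `version` is part of the key, so a new index version bypasses old entries.
async function loadIndexCached(storage, fetchIndex, version) {
  const key = `search-index:${version}`;
  const cached = storage.getItem(key);
  if (cached !== null) {
    try {
      return JSON.parse(cached);
    } catch (err) {
      // Corrupted cache entry: ignore it and fetch again.
    }
  }
  const index = await fetchIndex();
  try {
    storage.setItem(key, JSON.stringify(index));
  } catch (err) {
    // Storage may be full or disabled; the search still works without the cache.
  }
  return index;
}

// Possible browser usage (the URL is an assumption):
// loadIndexCached(window.localStorage,
//   () => fetch("/search-index.json").then((res) => res.json()),
//   "v1");
```

Older versions stay in storage under their own keys in this sketch; cleaning them up is left out.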

In terms of indexing, we might need to construct a search tree somehow for efficient look-ups. I'm not sure whether Huffman encoding is suitable here.
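
As one possible shape for such a search tree, here is a small prefix trie sketch. It is a swapped-in illustration, not something proposed in this thread, and the `PrefixTrie` class is hypothetical. It only answers prefix queries, not arbitrary substring matches. Huffman coding is a compression scheme, so it would affect the size of the file rather than look-up speed.

```javascript
// Minimal prefix trie: each node maps one character to a child node and
// keeps the entries whose term passes through that node.
class PrefixTrie {
  constructor() {
    this.root = { children: new Map(), entries: [] };
  }

  // Register `entry` under every non-empty prefix of `term`.
  insert(term, entry) {
    let node = this.root;
    for (const ch of term) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), entries: [] });
      }
      node = node.children.get(ch);
      node.entries.push(entry);
    }
  }

  // Return the entries whose term starts with `prefix` ([] for no match
  // and for an empty prefix).
  findByPrefix(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (node === undefined) return [];
    }
    return node.entries;
  }
}

const trie = new PrefixTrie();
trie.insert("เพื่อไทย", { type: "party", url: "/party/เพื่อไทย" });
console.log(trie.findByPrefix("เพื่อ").length);
```

Storing entries at every node keeps look-ups simple at the cost of memory, which may or may not be acceptable for the index size discussed here.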

@Th1nkK1D
Contributor

Idea proposed with prototype v1 implemented in PR #156
Feel free to discuss :)

@rapee
Contributor Author

rapee commented Dec 3, 2019

I think it's best not to exceed 500 KB. Storing it in local storage is a good idea.
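
A quick way to check an index against that budget, as a sketch. The function names are hypothetical, and treating 500 KB as 500 × 1024 bytes of UTF-8 encoded JSON is an assumption. Thai characters take 3 bytes each in UTF-8, so byte size can be well above the character count.

```javascript
// Size budget for the serialized index (assumes 1 KB = 1024 bytes).
const MAX_INDEX_BYTES = 500 * 1024;

// Size of the index in bytes when serialized as UTF-8 JSON.
function indexSizeInBytes(index) {
  return new TextEncoder().encode(JSON.stringify(index)).length;
}

// True when the serialized index fits the budget.
function isWithinBudget(index, maxBytes = MAX_INDEX_BYTES) {
  return indexSizeInBytes(index) <= maxBytes;
}

const sampleIndex = [{ q: "เพื่อไทย", type: "party", url: "/party/เพื่อไทย" }];
console.log(indexSizeInBytes(sampleIndex), isWithinBudget(sampleIndex));
```

A check like this could run in the build step that generates the index, failing the build when the file grows past the limit.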

I haven't benchmarked search performance yet, but given that the current number of indexed records is within 1,000, it might not be a top priority. I think we should implement the first version of the search component first; then we can explore Huffman and other ideas.

rapee removed this from the Version 2 Release milestone on Jul 16, 2020