-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/boolean search syntax #310
base: main
Are you sure you want to change the base?
Feature/boolean search syntax #310
Conversation
- Additionally add tests for SearchQuery in test_search.py
The search feature is really cool but i tested it on a bigger library (about 2.2 million entries) and it makes searching basically unusable, because it took 21 min to search for an entry with a tag. This previously took 1.71s so if you could somehow improve the Performance it would be really nice. |
Okay, I definitely wasn't expecting that. At it's very worst the syntax parsing should only increase the runtime by a small percentage. I wouldn't have been surprised if it decreased the runtime even. But I was able to reproduce a 6x slowdown even with only about 1.4k entries, so something has definitely gone wrong. I will try to do some profiling to figure out what is causing this. Thanks. |
* improve noticeable performance loss from calling .resolve() in Library.search_library
Remove unimplemented feature search for missing file
Okay, I believe I figured out what was causing the slowdown. Do you mind if I ask what you were searching for in your library? I suggest going back to the original TagStudioDev:main branch and trying a search for "missing" or "no file". If my suspicions are correct, then I bet those searches would take about 20 minutes to complete, even in the official project. The reason for this is that Library.search_library() contains a costly .resolve() operation in order to check an entry's filepath against Library.missing_files. The worst part is that Library tracking missing or displaced files seems to have been broken and unimplemented in for TagStudio a while. The Library.missing_files attribute stays empty, even when I delete files and refresh the library and directories. My fix is to remove the unimplemented feature from searches for now. I left the rest of the Library.missing_files related code untouched. Please let me know if this resolves your slowdown @Thesacraft. Thanks. |
Where does the syntax come from? I sense inspiration from some boorus, but I can't say it's a good thing. Provided readme section seems insufficient for me to grasp it even if project owners consider the syntax good enough to adopt. |
Move str(Path) and os.path.join() operations inside search.py to reduce the chance that they are run
Thank you for trying my project and sharing your searches with me @Thesacraft. I wanted to understand your situation better, so since my last post I have created my own library with 1 million items and performed lots of different searches. Now I have some search results I want to share with you! Have you tried searching for "missing" in the main project like I suggested earlier? I tried that search with the main project, and it took a very similar amount of time to regular searches in my PR before I pushed your performance update. I also wanted to replicate and experiment with your latest result, and I found that I could also get the main project to produce similar times to after my performance update. I suggest also searching "filename: ?" in the main project if you want to replicate this second search for yourself. I have since pushed another smaller performance update for you to try. I think this will give you another boost to performance, but so far the performance improvements of my updates are all for search dependent reasons, just like in the main project. Right now, the only improvements left that I see involve making the code harder to understand and modify, or starting to sacrifice boolean search syntax entirely. I'd have looked into some of this earlier if I didn't expect non-syntax search performance to change entirely if the project switches to using an SQL database. Non-syntax search performance isn't something I've considered much, so I'm happy to listen to any other performance concerns with my PR.
Thank you for reading the documentation I wrote for my search syntax and thank you for posting your reply @KillyMXI. I don't want you to feel pressure to trust anything or to be convinced of anything. When I started this feature, I designed it for myself and based on my own needs and preferences. There wasn't a CONTRIBUTING.md and I hadn't even decided I wanted to contribute my code to the main project. You are discerning to correctly realize that I took inspiration from online image boorus, but my personal search preferences are just a starting point. Now that the feature is working, I have no issue at all with changing the syntax based on feedback. I just see this as an opportunity to help a project that I have already benefited from. I would also love to collaborate to improve my README cheat-sheet section. Thanks again for your consideration into my syntax and explanation. Does any of this sound like something you would want to help improve? |
I just tested the searches you suggested and it seems like you are completle right. But with the new peformance improvment a simple Tag search is down to about 5.4s in my library which is pretty good. So i think you shouldn't really sacrifice the boolean syntax to get it faster. I think a better solution is to maybe add a loading bar or some visual indicator (in bigger librarys) and maybe detach the search from the main thread to stop it from blocking the rendering of the application. But this is probably out of scope for this pr and could be added in the future. I really appreciate the work you put in to make it more peformant. |
Full disclosure, I can't pretend that I understand the needs of developers and users of TagStudio. This issue just happens to appear just at the moment when I noticed it, and it is also tangential to some of my interests. I was unaware of previous issue and PR and associated discussions - didn't notice them before. At the same time, I'm not engaged enough to properly catch up with all of that and confidently deciding what help I can offer. (To be honest, I should focus on my own projects that desperately need some attention...) So, I was here mostly out of curiosity. When I found that there is something I have issue with - I decided to comment. What I can add:
Most importantly, this exists along with previous discussions (that might hint at other needs for the query language syntax and implementation) and someone else will be making decisions what to accept. Mine is just a bystander opinion, also not closely familiar with previous discussions and needs for the project. |
Replacing instances of the term "operand" with "in" and "input" for the sake of brevity and clarity. Changing _TagNode.match for the sake of brevity and consistency.
7d29d10
to
b9d9887
Compare
…StudioDev#272 and TagStudioDev#325) Adds ability to check the existence of fields of any type using the following syntax: ```has_<field>``` ```has_<field>:<True|False>``` Adds the ability to search the content of text_line and text_box fields using the following syntax: ```<field>:<text>``` (Replace whitespace with underscore _ in `text`) Updated test_search.py for new behavior.
Adds Boolean search syntax to the library search bar. For an explanation of how the search syntax works, check out the new "Search Cheat-Sheet" section in README.md.
Some existing features were changed. Tags with spaces now require underscores when being used in searches. Special consideration was made for the upcoming SQLite migration to hopefully reduce necessary refactoring.
I am happy to discuss these details or any other ones if there are any questions or concerns.