Some Indexers need an overwrite_db or last_indexed_time parameter. #279
Hi, sorry for the late response! It's actually surprising it takes 17 minutes for 8K notes/24K URLs -- do you know how many lines these are? Unless your laptop is really weak, I would expect it to index much faster. Maybe you can log indexing times for individual notes, figure out the one that takes longest, and then we can profile it? Otherwise, you suggest you could do something like
It kinda makes sense, but one downside is that some URLs might have been removed from the note and would still be present in the Promnesia database, because the 'interface' of indexers in Promnesia currently only supports adding new visits. So it would produce some phantom visits. We might think of changing the interface somehow, but I'd much rather speed up the indexer, for simplicity's sake.
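To make the phantom-visit problem concrete, here is a toy model (hypothetical names, not Promnesia's actual code) where the database is just a set of `(note, url)` pairs; an append-only interface cannot retract a visit whose URL was deleted from the note, while a hypothetical replace-style interface can:

```python
# Toy sketch (NOT Promnesia's real interface) of append-only indexing
# versus replacing a note's visits on re-index.
db = set()

def index_append_only(note, urls):
    """Current-style interface: indexers can only add new visits."""
    for u in urls:
        db.add((note, u))

def index_with_replacement(note, urls):
    """Hypothetical richer interface: drop the note's old visits first."""
    stale = {pair for pair in db if pair[0] == note}
    db.difference_update(stale)
    for u in urls:
        db.add((note, u))

# First pass: the note contains two URLs.
index_append_only('note.md', ['https://a.example', 'https://b.example'])
# Second pass: one URL was removed from the note...
index_append_only('note.md', ['https://a.example'])
# ...but the old pair is still in db: a phantom visit.
phantom_present = ('note.md', 'https://b.example') in db

# The replace-style interface clears it out on re-index.
index_with_replacement('note.md', ['https://a.example'])
phantom_gone = ('note.md', 'https://b.example') not in db
```

This is why the choice is between complicating the interface (so indexers can retract visits) and simply making the indexer fast enough that full re-indexing stays cheap.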
My laptop is a Dell Inspiron 7501 (i7-10750H CPU @ 2.60GHz, 16GB RAM). I don't think this laptop is a slow environment, but some machines such as RPis and AWS Lightsail (1 core) could be slow.
Many notes came from Evernote. I used Joplin as an archiving tool and wrote a journal at work. Some notes are web-clipped, and they seem to contain many useless links. Recently I have been switching from Joplin to org-roam and learning the Zettelkasten method, and I use Joplin as a way-back machine now.
The Joplin indexer was a proof of concept, and it is just an initial version, so I think I can profile the indexing.
Right. Incremental and partial updates need at least two pieces of metadata.
Yeah, you are right, I can optimize the indexer further. But I think Promnesia needs incremental-update support for slow machines and for indexing efficiently.
Yep, looks decent, surprising it takes so much time!
Yep, definitely agree it makes sense to make it as fast as we can :) I just mean there is a tradeoff between that and the simplicity of the architecture.
Yeah -- the problem is the latter: currently there is no way to tell, for a visit in the database, which file it came from. To be more precise, no reliable way. Maybe a good compromise would be adding cachew support for file-based indexers, so each file would have a cache of its Visits (keyed on the file timestamp), and it would automatically recompute when necessary.
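The per-file, timestamp-keyed cache idea could be sketched roughly like this. This is a hand-rolled illustration of the concept rather than cachew's actual API, and the names (`cached_visits`, `extract`) are made up for the example:

```python
import json
import tempfile
from pathlib import Path

def cached_visits(note: Path, extract, cache_dir: Path):
    """Recompute visits for `note` only when its mtime changes.
    A minimal stand-in for the idea behind per-file caching."""
    cache = cache_dir / (note.name + '.cache.json')
    mtime = note.stat().st_mtime
    if cache.exists():
        data = json.loads(cache.read_text())
        if data['mtime'] == mtime:
            return data['visits']  # cache hit: skip re-parsing the note
    visits = list(extract(note))   # cache miss or stale: recompute
    cache.write_text(json.dumps({'mtime': mtime, 'visits': visits}))
    return visits

# Demo: the (hypothetical) extractor runs once; the second call is cached.
calls = []
def extract(note):
    calls.append(str(note))
    return ['https://example.com']

tmp = Path(tempfile.mkdtemp())
note = tmp / 'note.md'
note.write_text('see https://example.com')
first = cached_visits(note, extract, tmp)
second = cached_visits(note, extract, tmp)
```

With this scheme an unchanged note costs one `stat` call and a small JSON read instead of a full re-parse, which is exactly the speedup sought for large collections.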
I had already seen cachew, and I thought it was not the right solution for caching. I guess I didn't look closely.
Related: #243
I have made a Joplin indexer, but there is a problem: the indexer needs an incremental-update parameter when the database is large. I have 8000+ notes in my Joplin database, and the Joplin indexer finds 24000+ URLs which can become `Visits`. It takes 17 minutes on my laptop.

Joplin has an `update_time` field in its `notes` table, so I think I can implement incremental indexing (updating) in the indexer. However, there is no `overwrite_db` parameter in the Indexer for when a user passes the `--overwrite` parameter and wants to restart the indexing. Alternatively, if `last_indexed_time` from the `promnesia` framework were passed to `iter_all_visits`, it would be much more helpful.