more flexible searching by post URL, partial URL searching #4748

Open
5 tasks done
Die4Ever opened this issue May 28, 2024 · 5 comments
Labels
area: search enhancement New feature or request

Comments

@Die4Ever

Requirements

  • Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a feature request? Do not put multiple feature requests in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
  • Do you agree to follow the rules in our Code of Conduct?

Is your proposal related to a problem?

Currently, if I want to check whether something has already been posted, the URL has to be an exact match, but URLs can vary in small ways.

YouTube example:

https://www.youtube.com/watch?v=pd5iofvLrIU

https://youtu.be/pd5iofvLrIU

https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo

Describe the solution you'd like.

If I could search by a part of the URL, like just pd5iofvLrIU, then it would find them all. I guess it would be a text index with word delimiters like :, /, ?, =, and &. For known domains this could probably even happen automatically, with the partial matches shown as suggested or lower-ranked search results. This could also be used to search by domain name.
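A minimal sketch of the text-index idea above, assuming a plain in-memory inverted index. The function names and the example.com URL are hypothetical; this is not how Lemmy's search is implemented, only an illustration of splitting on the proposed delimiters.

```python
import re
from collections import defaultdict

# Delimiters proposed in the issue: ':', '/', '?', '=' and '&'.
_DELIMITERS = re.compile(r"[:/?=&]+")


def url_tokens(url):
    """Split a URL into a set of tokens on the proposed delimiters."""
    return {token for token in _DELIMITERS.split(url) if token}


def build_index(urls):
    """Map every token to the set of URLs that contain it."""
    index = defaultdict(set)
    for url in urls:
        for token in url_tokens(url):
            index[token].add(url)
    return index


def search(index, term):
    """Return all indexed URLs containing `term` as a whole token."""
    return index.get(term, set())


POST_URLS = [
    "https://www.youtube.com/watch?v=pd5iofvLrIU",
    "https://youtu.be/pd5iofvLrIU",
    "https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo",
    "https://example.com/unrelated",  # hypothetical non-matching post
]
INDEX = build_index(POST_URLS)
```

With this index, search(INDEX, "pd5iofvLrIU") returns all three YouTube URLs, and search(INDEX, "youtu.be") returns the two short links. Because "." is not one of the delimiters, a host only matches as a whole token, so "youtube.com" would not match "www.youtube.com" in this sketch.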

Also, when making a post, it currently searches by title, which helps reduce reposts and gets people to crosspost instead. It would be cool if it also did that automatic search by URL.

Describe alternatives you've considered.

This could maybe be a plugin?

Additional context

No response

@Die4Ever Die4Ever added the enhancement New feature or request label May 28, 2024
@Die4Ever
Author

Some related issues:

LemmyNet/lemmy-ui#1922

LemmyNet/lemmy-ui#154

#2542

@dessalines
Member

I don't see how we'd be able to do this, since there are infinitely many variations of these URLs, and different definitions of what constitutes a match and what doesn't.

@Die4Ever
Author

It might also be good to normalize YouTube URLs, since there are a lot of different formats they can end up in and that breaks crosspost detection. Probably the only query params that need to be respected are the video ID, playlist, index, and timestamp? The si query parameter can be thrown away; I think it's just a tracking ID for sharing.
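A rough sketch of that normalization, using only the Python standard library. The parameter names v, list, index and t, and the set of hostnames, are assumptions about YouTube's URL scheme rather than anything confirmed in this thread; normalize_youtube_url is a hypothetical helper, not an existing Lemmy function.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed parameter names for video ID, playlist, index and timestamp.
KEPT_PARAMS = ("v", "list", "index", "t")
# Assumed hostnames that serve the long /watch form.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def normalize_youtube_url(url):
    """Rewrite a YouTube video URL to one canonical /watch form.

    URLs that are not recognised as YouTube video links are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    query = parse_qs(parts.query)

    if host == "youtu.be":
        # Short links carry the video ID in the path instead of ?v=.
        video_id = parts.path.strip("/")
        if video_id:
            query["v"] = [video_id]
    elif host not in YOUTUBE_HOSTS or parts.path != "/watch":
        return url

    if "v" not in query:
        return url

    # Keep only the allow-listed parameters, in a fixed order, so that
    # equivalent links produce identical strings. Everything else,
    # including si, is dropped.
    kept = [(name, query[name][0]) for name in KEPT_PARAMS if name in query]
    return "https://www.youtube.com/watch?" + urlencode(kept)
```

Under these assumptions, all three URLs from the original post normalize to https://www.youtube.com/watch?v=pd5iofvLrIU, so an exact-match lookup on the normalized form would treat them as the same post. An allow-list was chosen here over a block-list so that unknown tracking parameters are dropped by default.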

@Nutomic
Member

Nutomic commented Sep 9, 2024

We could handle this by resolving the post URL and following redirects, e.g. curl -Ls -o /dev/null -w %{url_effective} "https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo". This gives https://www.youtube.com/watch?si=QO2iK1Zw0Z7NDRHo&v=pd5iofvLrIU&feature=youtu.be, which isn't fully normalized, but the advantage is that it works for all websites.

Otherwise we would need a Rust library to normalize YouTube URLs, but I couldn't find one on crates.io.
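For illustration, the redirect-following approach from the curl command above could look roughly like this in code. This is a sketch in Python using only the standard library, not a proposal for the Rust backend; resolve_final_url and its opener parameter are hypothetical names, and the opener is injectable only so the function can be exercised without network access.

```python
from urllib.request import urlopen


def resolve_final_url(url, opener=urlopen, timeout=10.0):
    """Return the URL reached after following HTTP redirects.

    urlopen follows redirects by default; the response's `url` attribute
    holds the address of the resource that was finally retrieved.
    """
    with opener(url, timeout=timeout) as response:
        return response.url
```

Calling resolve_final_url("https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo") makes a real outbound request, which is why a timeout is passed. As noted in the comment above, the resolved URL can still carry parameters such as si, so tracking parameters would need to be stripped separately.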

@dessalines
Member

dessalines commented Sep 10, 2024

That one would be handled when we eventually add the clearurls crate, as discussed in #4905


4 participants