-
-
Notifications
You must be signed in to change notification settings - Fork 884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add controversial ranking #3205
Add controversial ranking #3205
Conversation
I can't read this code but this should be disabled on clients (web and api) if the instance doesn't allow downvotes as this ranking method becomes useless. Is this taken into account like the issue mentions? |
My thinking is that this should be done on the front end if possible, I've drafted a PR but I haven't written the hiding logic yet. The front end is going to do that job any way, so it doesn't really matter if the controversy rank exists or not in the API responses, does it? |
if downvotes <= 0 or upvotes <= 0 then | ||
return 0; | ||
else | ||
return floor((upvotes + downvotes) * case when upvotes > downvotes then downvotes::float / upvotes::float else upvotes::float / downvotes::float end); | ||
end if; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that came up in #2515 was what the sorting algorithm looks like. The algorithm used by Reddit (in 2015) looks to me like it would work better. Comparison here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=400c37593a339dcdb53851e685ecaf75
Note that I changed this from the one in the issue to use a float for the Reddit one, which I think works better for controversy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the rest of them as well. I think as long as there is a maximum diagonal where values grow according to how close they are to the diagonal, then it's good enough. The Reddit one is quite larger in the diagonal since it raises the activity to the power of the ratio instead of multiplying, so it might avoid some incorrect sorts in close situations, but both are fine I think. My thinking was that since we're going to be running this function pretty heavily, we might as well make it as performant as possible, and powers are pretty bad in that sense.
@@ -347,6 +347,9 @@ impl<'a> CommentQuery<'a> { | |||
|
|||
query = match self.sort.unwrap_or(CommentSortType::Hot) { | |||
CommentSortType::Hot => query.then_order_by(comment_aggregates::hot_rank.desc()), | |||
CommentSortType::Controversial => { | |||
query.then_order_by(comment_aggregates::controversy_rank.desc()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't do the Reddit method of controversy calculation, should this add a second ORDER BY
for activity as @dcormier suggested? Same applies for the other places that do this ordering.
@@ -100,6 +100,7 @@ impl<'a> PersonQuery<'a> { | |||
SortType::Hot | SortType::Active | SortType::TopAll => { | |||
query.order_by(person_aggregates::comment_score.desc()) | |||
} | |||
SortType::Controversial => query.order_by(person_aggregates::comment_score.asc()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right to me, should we maintain a person_aggregates::controversy_score
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what sense is a person controversial though? I find that idea pretty controversial itself :P
I just added that as a dumb fallback since there probably won't be such an option ever implemented in the UI, I could also put it along with SortType::Hot even if it's quite a different concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, so this was added because the same SortType
is used here both for posts and for persons, that seems less than ideal. You could add it to the match pattern above together with hot, active, and top all, which also don't really apply to a person.
Might be better though to split that out, so that there's a PostSortType
and PersonSortType
, alongside the existing CommentSortType
. That's a bigger change though 🙂 WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would make more sense to add a separate sort type for person, and then also get rid of the unused options (newcomments, hot, active)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that would be quite better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made PersonSortType like this:
pub enum PersonSortType {
New,
Old,
MostComments,
TopDay,
TopWeek,
TopMonth,
TopYear,
TopAll,
}
Do we need the same with CommunitySortType? If so, what values should be removed from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue with this PR, is that
src/scheduled_tasks.rs
Outdated
scheduler.every(CTimeUnits::minutes(5)).run(move || { | ||
update_hot_ranks(&mut conn_2, true); | ||
update_controversy_ranks(&mut conn_2, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the main issue with this PR, that should get you to rethink it. The only reason hot ranks need to be recalculated, is because its a decaying time function. I forget the term in SQL, but its essentially based on an unstable function.
A controversial sort, which works only on upvotes, downvotes, and score, is not that. It can be done entirely with vote triggers.
Just like the post_aggregates.score
column, you need to update the vote triggers to run whatever logic necessary to update that controversial_rank column.
If votes are disabled, that comes back with |
Marking as draft for now, will open it again once I divide the Sort types and do some more testing |
SortType::TopYear => PersonSortType::TopYear, | ||
SortType::TopMonth => PersonSortType::TopMonth, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit awkward that this is needed, but I guess there is no other way to make search all work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the same should be done with CommunitySortType or is it too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably should, but you don't need to do that in this PR unless you want to. A lot of the Post sorts don't make sense for communities either.
crates/db_schema/src/newtypes.rs
Outdated
@@ -35,7 +35,7 @@ impl fmt::Display for PostId { | |||
/// The person id. | |||
pub struct PersonId(pub i32); | |||
|
|||
#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize, Default)] | |||
#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, PartialOrd, Serialize, Deserialize, Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PartialOrd seems to be unused
…s don't seem to get updated with floats, divide SortTypes
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
crates/db_schema/src/lib.rs
Outdated
New, | ||
Old, | ||
MostComments, | ||
TopDay, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above, but yeah I think you can get rid of all these Top
sorts, and just add CommentScore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the post_to_person_sort_type util, does it make sense to use CommentScore as a default so there aren't so many mappings because of the SortType's Top sorts?
pub fn post_to_person_sort_type(sort: SortType) -> PersonSortType {
match sort {
SortType::Active | SortType::Hot | SortType::Controversial => PersonSortType::CommentScore,
SortType::New | SortType::NewComments => PersonSortType::New,
SortType::MostComments => PersonSortType::MostComments,
SortType::Old => PersonSortType::Old,
_ => PersonSortType::CommentScore,
}
}
SortType::TopYear => PersonSortType::TopYear, | ||
SortType::TopMonth => PersonSortType::TopMonth, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably should, but you don't need to do that in this PR unless you want to. A lot of the Post sorts don't make sense for communities either.
c3a7270
to
11579d1
Compare
…SortType and added CommentScore instead
…com:iByteABit256/lemmy into lemmy-2515-controversial-posts-and-comments
@dessalines I've made the changes you requested in case you hadn't noticed |
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
migrations/2023-06-16-220750_create_controversial_indexes/up.sql
Outdated
Show resolved
Hide resolved
My bad for being so slow to get to this, I apologize. |
No worries 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks!
* Added controversy rank property to posts and comments, and ability to sort by it * Triggers instead of schedules tasks, integer -> double, TODO: comments don't seem to get updated with floats, divide SortTypes * Created PersonSortType * PersonSortType::MostComments case * Removed unused PartialOrd trait * Added new person sort type mappings * SortType -> PersonSortType * fixes * cargo fmt * fixes after merge with main * Fixed bug in controversy rank trigger, removed TopX sorts from PersonSortType and added CommentScore instead * Uncovered enum case * clippy * reset translation changes * translations * translations * Added additional hot ordering on controversial posts and comments * featured local and featured community added to controversy rank index, additional order_by removed (?), added post_score and post_count to PersonSortType * Woodpecker rerun * cargo fmt * woodpecker rerun * fixed controversy_rank order * fix * Readded migration as latest, removed second update statement for setting controversy rank
This reverts commit c5450e7.
I guess I'm kinda late with this.. but since controversy_rank seems purely computed from other columns in the same table, it looks like this could be done with a GENERATED ALWAYS AS controversial_rank(upvotes, downvotes) column which would have avoided the trigger changes and the expensive migration |
Didn't know about GENERATED ALWAYS AS, it would have been quite better indeed :/ It would store less data that way too The good thing is that if the votes don't change that often for a large amount of posts, then those calculations will be avoided using triggers |
Closes #2515
Added controversy rank property to posts and comments, and ability to sort by it.
Out of the ones previously discussed, I used the controversy_rank_ratio method, but this is still open for debate.
Any input is welcome.