Controversial posts and comments #2515

Closed
ghost opened this issue Oct 26, 2022 · 18 comments · Fixed by #3205
Labels
area: sorting enhancement New feature or request extra: good first issue Good for newcomers

Comments

@ghost

ghost commented Oct 26, 2022

Add a sort order that puts first the posts and comments with the most total votes but a score close to zero. I guess this should only be available on instances that have downvotes enabled.

@ghost ghost added the enhancement New feature or request label Oct 26, 2022
@dessalines dessalines transferred this issue from LemmyNet/lemmy-ui Oct 27, 2022
@dessalines
Member

I don't have time to do this but someone else could.

@Nutomic Nutomic added the extra: good first issue Good for newcomers label Jan 15, 2023
@iByteABit256
Contributor

Can I give this a try?

@dcormier

@iByteABit256 I was just digging into this. 🫠 I came back to propose a calculation for "controversialness". You can have this one. I'll share where I was at, anyway.

I was thinking something like this (but implemented in SQL, similar to the existing hot_rank SQL function).

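// Total votes divided by the absolute score; a score of 0 is treated as 1
// so that evenly split votes rank highest and nothing divides by zero.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}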

Some examples of how this would work with various inputs can be seen here.
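
A few worked inputs make the behaviour of this formula concrete. The sketch below repeats the function from this comment and adds a hypothetical helper, `rank_from_votes`, which is not part of the proposal and assumes the score is simply upvotes minus downvotes. It is a Rust illustration only, not the SQL that would eventually be used.

```rust
/// Controversy rank as proposed in this comment: total votes divided by the
/// absolute score, with a score of 0 treated as 1 to avoid dividing by zero.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

/// Hypothetical convenience wrapper (an assumption for this example):
/// derives the score as upvotes minus downvotes before ranking.
fn rank_from_votes(upvotes: u32, downvotes: u32) -> u32 {
    let score = upvotes as i32 - downvotes as i32;
    controversy_rank(upvotes, downvotes, score)
}
```

With these definitions, an even 100/100 split ranks 200, a one-sided 100/0 or 0/100 ranks 1, and 50/45 and 45/50 both rank 19, so the direction of the score does not matter.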

@iByteABit256
Contributor

iByteABit256 commented Jun 16, 2023

Not bad, but it has a flaw that small changes in like/dislike ratio can lead to huge changes in "controversialness".

For example, a 98/100 ratio isn't that different from 99/100, but it would get roughly half the rank.

My thinking was something like this:

fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
  if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

This seems more intuitive to me and gives more balanced scores. What do you think?

@dcormier

dcormier commented Jun 16, 2023

it has a flaw that small changes in like/dislike ratio can lead to huge changes in "controversialness".

Does it matter? Will that value be shown, or used for anything other than sorting the comments?

My thinking was something like this:

fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
  if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

The results for that are surprising. 100 upvotes and 100 downvotes results in 0 controversialness, the same as something with 0 upvotes and 100 downvotes. Similarly, I would expect these to have the same level of controversialness, but they don't:

    assert_eq!(5, controversy_rank(50, 45, 5));
    assert_eq!(0, controversy_rank(45, 50, -5));
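
The asymmetry comes from integer division: `upvotes / downvotes` truncates to 0 whenever downvotes outnumber upvotes, and the score factor is 0 for an even split. The sketch below reproduces the quoted function unchanged and adds a hypothetical `breakdown` helper (introduced only for this illustration) that exposes the two factors next to the final rank.

```rust
/// The ratio-based proposal quoted above, reproduced unchanged.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

/// Hypothetical helper for illustration: returns
/// (integer quotient, absolute score, resulting rank).
fn breakdown(upvotes: u32, downvotes: u32, score: i32) -> (u32, u32, u32) {
    // Integer division truncates toward zero, so 45 / 50 is 0.
    let quotient = if downvotes != 0 { upvotes / downvotes } else { 0 };
    let abs_score = score.unsigned_abs();
    (quotient, abs_score, controversy_rank(upvotes, downvotes, score))
}
```

For 50/45 the quotient is 1 and the rank is 5, while for 45/50 the quotient truncates to 0 and so does the rank. For 100/100 the quotient is 1 but the absolute score is 0, which again gives 0.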

@iByteABit256
Contributor

iByteABit256 commented Jun 16, 2023

You're right, it needs some work. Also, what I was thinking for the multiplier was actually (upvotes + downvotes) to represent activity, since a 50-50 post with 2 total votes is much less controversial than a 50-50 post with 1000 votes.

Your way definitely gives good enough results, though. I just want to explore it a bit before implementing it.

@dcormier

dcormier commented Jun 16, 2023

Yeah, that's what I was thinking, too. The total number of votes should be significant, here.

It's definitely worth exploring.

Here's something to show the output better and let you fiddle with the math more. I originally was just using a spreadsheet to try different approaches.

@iByteABit256
Contributor

iByteABit256 commented Jun 16, 2023

Printing it out as a table made it much clearer. I think it's good enough to keep.

All of the high scores are highly controversial, and the rank clearly scales with the amount of activity.

@dcormier

dcormier commented Jun 16, 2023

I agree. It seems good. I'd like to see some more people chime in with opinions, but maybe that'll come with a PR. At the very least, it's something that can be moved forward with.


Edit: Playing with the output visualization more because I was bored and it was pleasing. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=130155f2c33aa262c403427b8235dd82

@iByteABit256
Contributor

That's proof enough haha, really cool!

@dcormier

dcormier commented Jun 16, 2023

Looking at that output makes me think it might be worthwhile to subsort by something like activity (descending), or some other existing sort type. Things that aren't controversial flatten out to the same rank fairly quickly, and without a secondary sort they may end up in an inconsistent order (if that's important in any way).
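
A minimal sketch of that subsort idea, assuming total vote count as a stand-in for "activity" (the real activity measure may differ). The `Item` struct and `sort_controversial` function are hypothetical names for this example, and the rank assumes the score is upvotes minus downvotes.

```rust
/// Hypothetical item type for this example.
#[derive(Debug, Clone)]
struct Item {
    id: u32,
    upvotes: u32,
    downvotes: u32,
}

/// Total votes divided by the absolute vote difference (0 treated as 1).
fn controversy_rank(item: &Item) -> u32 {
    let total = item.upvotes + item.downvotes;
    let diff = item.upvotes.abs_diff(item.downvotes);
    total / if diff == 0 { 1 } else { diff }
}

/// Sort by controversy rank (descending), breaking ties by total
/// votes (descending) so equally ranked items get a stable order.
fn sort_controversial(items: &mut [Item]) {
    items.sort_by(|a, b| {
        controversy_rank(b)
            .cmp(&controversy_rank(a))
            .then_with(|| (b.upvotes + b.downvotes).cmp(&(a.upvotes + a.downvotes)))
    });
}
```

Items with 10/0 and 100/0 votes both rank 1, so the tie-breaker decides: the item with 100 total votes is placed ahead of the one with 10.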

@jamesmcm

It might also be helpful if the "controversy score" were visible in the UI when sorting this way too.

@qznc

qznc commented Jun 17, 2023

To throw in another idea:

min(upvotes, downvotes)

However, its primary advantage is that it is simpler, and so easier for users to understand.

@qznc

qznc commented Jun 17, 2023

@ghost
Author

ghost commented Jun 17, 2023

That seems like it has worked well in the past.

cpdef double controversy(long ups, long downs):
    """The controversy sort."""
    if downs <= 0 or ups <= 0:
        return 0

    magnitude = ups + downs
    balance = float(downs) / ups if ups > downs else float(ups) / downs

    return magnitude ** balance
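
For experimenting alongside the other Rust snippets in this thread, the same function can be sketched in Rust. This is a direct translation of the snippet above for illustration only; the eventual implementation is expected to live in SQL.

```rust
/// Rust translation of the controversy function above: total votes raised
/// to the power of the minority/majority vote ratio.
fn controversy(ups: i64, downs: i64) -> f64 {
    // Anything without both upvotes and downvotes is not controversial.
    if downs <= 0 || ups <= 0 {
        return 0.0;
    }

    let magnitude = (ups + downs) as f64;
    // Balance is in (0, 1]: 1.0 for an even split, near 0 when one-sided.
    let balance = if ups > downs {
        downs as f64 / ups as f64
    } else {
        ups as f64 / downs as f64
    };

    magnitude.powf(balance)
}
```

An even 100/100 split gives about 200, 50/45 and 45/50 give the same value, and a lopsided 100/1 stays just above 1.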

@iByteABit256
Contributor

iByteABit256 commented Jun 17, 2023

Here is a comparison between @dcormier's, @qznc's and Reddit's methods.

Reddit's looks like the most correct overall. @dcormier's looks almost as good and should be much cheaper to compute, since it doesn't involve float arithmetic or powers. @qznc's is the cheapest of the three, but judging from this comparison its results are noticeably worse.


Edit: Added an alteration of my own method after realising the main problem with it and how Reddit solved it

Edit: Changed the debug build to a release build and computed the absolute value manually instead of using Rust's abs(); the manual version seemed to be much faster. The results of the first 3 all seem good enough. Time seems to be slightly better on the ratio method, but take that with a grain of salt. After all, this is going to be implemented in SQL, not Rust.
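
To make the difference between the three methods concrete, the sketch below puts them side by side and checks a single property: whether an evenly split 50/50 item outranks a lopsided 1000/100 item. The function names and the `prefers_balanced` check are hypothetical, the ratio method assumes the score is upvotes minus downvotes, and this is only one possible way the results can differ, not necessarily what the linked comparison showed.

```rust
/// Ratio method: total votes over absolute vote difference (0 treated as 1).
fn ratio_rank(ups: u32, downs: u32) -> f64 {
    let diff = ups.abs_diff(downs).max(1);
    ((ups + downs) / diff) as f64
}

/// Minimum method: the smaller of the two vote counts.
fn min_rank(ups: u32, downs: u32) -> f64 {
    ups.min(downs) as f64
}

/// Reddit-style method: total votes raised to the minority/majority ratio.
fn reddit_rank(ups: u32, downs: u32) -> f64 {
    if ups == 0 || downs == 0 {
        return 0.0;
    }
    let magnitude = (ups + downs) as f64;
    let balance = ups.min(downs) as f64 / ups.max(downs) as f64;
    magnitude.powf(balance)
}

/// Hypothetical check: does the method rank an even 50/50 split
/// above a lopsided 1000/100 split?
fn prefers_balanced(rank: fn(u32, u32) -> f64) -> bool {
    rank(50, 50) > rank(1000, 100)
}
```

Under these assumptions the ratio and Reddit-style methods both put the 50/50 item first, while the minimum method ranks the 1000/100 item higher because it only looks at the smaller vote count.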

@dcormier

dcormier commented Jun 19, 2023

That's not a very effective way of benchmarking in this case, unfortunately. The results are wildly different from run to run, and even within the same run. I.e., not only do the numbers change quite a bit from run to run, but two algorithms that had similar times in one run might have disparate times in another. Using cargo bench with the built-in #[bench] harness (requires nightly) or Criterion.rs would show differences more clearly.

Regardless, it's probably not worth benchmarking that in Rust. The existing hot_rank function used to produce the value to sort by when sorting on hot lives in SQL, not Rust. I would expect this function to end up being similar.

The Reddit method produces a gentler curve, which is nice.

@iByteABit256
Contributor

I had a pretty lucky streak when I first wrote it, but yeah, unfortunately the timings seem completely inconsistent now that I've tried it a few more times.


7 participants