Controversial posts and comments #2515
Comments
I don't have time to do this but someone else could. |
Can I give this a try? |
@iByteABit256 I was just digging into this. 🫠 I came back to propose a calculation for "controversialness". You can have this one. I'll share where I was at, anyway. I was thinking something like this (but implemented in SQL, similar to the existing
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}
Some examples of how this would work with various inputs can be seen here. |
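To make the proposal above concrete, here is a small runnable sketch that prints the rank for a handful of inputs. The function body is the one from the comment; the sample vote counts and the table layout are assumptions chosen only for illustration, and the real implementation is expected to be in SQL.

```rust
/// Total votes divided by the absolute score (divides by 1 when the score is 0).
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

fn main() {
    // (upvotes, downvotes) pairs picked purely for illustration.
    let samples: [(u32, u32); 6] = [(100, 100), (100, 90), (100, 50), (100, 0), (10, 10), (0, 100)];

    println!("{:>7} {:>9} {:>6} {:>5}", "upvotes", "downvotes", "score", "rank");
    for (up, down) in samples {
        // Assumes score is simply upvotes minus downvotes.
        let score = up as i32 - down as i32;
        println!(
            "{:>7} {:>9} {:>6} {:>5}",
            up,
            down,
            score,
            controversy_rank(up, down, score)
        );
    }
}
```

With this formula an evenly split 100/100 item ranks far above a one-sided 100/0 item, and more total votes push an evenly split item higher.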
Not bad, but it has a flaw: small changes in the like/dislike ratio can lead to huge changes in "controversialness". For example, a 98/100 ratio isn't that different from 99/100, but it would have half the score. My thinking was something like this:
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}
Which seems more intuitive to me and gives more balanced scores. What do you think? |
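The sensitivity described in this comment can be checked with a quick sketch. It uses the division-based formula from the earlier comment and assumes that "98/100" means 98 downvotes against 100 upvotes; that reading, and the score being upvotes minus downvotes, are assumptions.

```rust
/// The earlier division-based proposal: total votes over absolute score.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

fn main() {
    let upvotes = 100u32;
    // Three nearly identical vote splits.
    for downvotes in [98u32, 99, 100] {
        let score = upvotes as i32 - downvotes as i32;
        println!(
            "{} up / {} down -> rank {}",
            upvotes,
            downvotes,
            controversy_rank(upvotes, downvotes, score)
        );
    }
}
```

Moving from 98 to 99 downvotes roughly doubles the rank (99 to 199), while moving from 99 to 100 barely changes it (199 to 200), because the divisor jumps between 2, 1, and 1.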
Does it matter? Will that value be shown, or used for anything other than sorting the comments?
The results for that are surprising. 100 upvotes and 100 downvotes results in 0 controversialness, the same as if something has 0 upvotes and 100 downvotes. Similarly, I would expect these to have the same level of controversialness, but they don't:
assert_eq!(5, controversy_rank(50, 45, 5));
assert_eq!(0, controversy_rank(45, 50, -5)); |
You're right, it needs some work. Also, what I was thinking for the multiplier was actually Your way definitely gives good enough results though, I just want to explore it a bit before implementing it |
Yeah, that's what I was thinking, too. The total number of votes should be significant, here. It's definitely worth exploring. Here's something to show the output better and let you fiddle with the math more. I originally was just using a spreadsheet to try different approaches. |
Printing it out as a table made it much clearer. I think it's good enough to keep. All of the high scores are highly controversial, and the amount of activity clearly scales with it. |
I agree. It seems good. I'd like to see some more people chime in with opinions, but maybe that'll come with a PR. At the very least, it's something that can be moved forward with. Edit: Playing with the output visualization more because I was bored and it was pleasing. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=130155f2c33aa262c403427b8235dd82 |
That's proof enough haha, really cool! |
Looking at that output makes me think it might be worthwhile to subsort by something like activity (descending), or some other existing sort type. Things that aren't controversial get to a pretty flat curve fairly quickly, and otherwise may result in inconsistent ordering (if that's important in any way). |
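A minimal sketch of the subsort idea, assuming total vote count as a stand-in for "activity" and the comment id as a final tie-breaker. The `Comment` struct and `sort_by_controversy` function are hypothetical names for illustration, not Lemmy's actual schema or API, and the rank formula is the division-based one from earlier in the thread.

```rust
use std::cmp::Reverse;

/// Hypothetical, minimal stand-in for a comment row.
#[derive(Debug, Clone)]
struct Comment {
    id: u32,
    upvotes: u32,
    downvotes: u32,
}

fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

/// Sorts by controversy (descending), then total votes (descending),
/// then id (ascending) so that ties always come out in the same order.
fn sort_by_controversy(comments: &mut [Comment]) {
    comments.sort_by_key(|c| {
        let score = c.upvotes as i32 - c.downvotes as i32;
        let rank = controversy_rank(c.upvotes, c.downvotes, score);
        (Reverse(rank), Reverse(c.upvotes + c.downvotes), c.id)
    });
}

fn main() {
    let mut comments = vec![
        Comment { id: 4, upvotes: 3, downvotes: 0 },
        Comment { id: 3, upvotes: 100, downvotes: 0 },
        Comment { id: 2, upvotes: 5, downvotes: 5 },
        Comment { id: 5, upvotes: 50, downvotes: 0 },
        Comment { id: 1, upvotes: 10, downvotes: 10 },
    ];
    sort_by_controversy(&mut comments);
    for c in &comments {
        println!("{:?}", c);
    }
}
```

The three one-sided comments all get rank 1 here, which is the flat part of the curve; the secondary keys are what keep their relative order stable.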
It might also be helpful if the "controversy score" were visible in the UI when sorting this way too. |
To throw in another idea:
However, its primary advantage is that it is simpler, and so easier for users to understand. |
That seems like it has worked well in the past.
|
Here is a comparison between @dcormier's, @qznc's and Reddit's methods. Reddit's looks the most correct overall, while @dcormier's looks almost as good and is much more performant, since it doesn't involve floating-point arithmetic or powers. @qznc's is the most performant, but the results are noticeably worse, judging from this
Edit: Added an alteration of my own method after realising the main problem with it and how Reddit solved it.
Edit: Changed the debug build to a release build and computed the absolute value manually instead of using Rust's abs(); the manual version seemed to be much faster. The results of the first 3 all seem good enough; time seems to be slightly better on the ratio method, but take that with a grain of salt. After all, this is going to be implemented in SQL, not Rust. |
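For reference, Reddit's approach as it appeared in its formerly open-source sorting code is, as far as I recall, the total vote count raised to the power of a "balance" ratio between 0 and 1. The Rust sketch below is an approximation of that idea under that assumption, not the code compared in this thread; the function name `reddit_controversy` is made up for illustration.

```rust
/// Sketch of a Reddit-style controversy score: (ups + downs) ^ balance,
/// where balance is the smaller vote count divided by the larger one.
fn reddit_controversy(upvotes: u32, downvotes: u32) -> f64 {
    // Items with no upvotes or no downvotes are not controversial at all.
    if upvotes == 0 || downvotes == 0 {
        return 0.0;
    }
    let magnitude = f64::from(upvotes + downvotes);
    let balance = if upvotes > downvotes {
        f64::from(downvotes) / f64::from(upvotes)
    } else {
        f64::from(upvotes) / f64::from(downvotes)
    };
    magnitude.powf(balance)
}

fn main() {
    // Vote counts picked purely for illustration.
    for (up, down) in [(100u32, 100u32), (50, 45), (45, 50), (100, 50), (0, 100)] {
        println!("{:>3} up / {:>3} down -> {:.2}", up, down, reddit_controversy(up, down));
    }
}
```

Because the balance term uses the smaller count over the larger one, swapping upvotes and downvotes gives the same result, and the exponent is why this version needs floating-point arithmetic and powers.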
That's not a very effective way of benchmarking in this case, unfortunately. The results are wildly different from run to run, and even within the same run. I.e., not only do the numbers change quite a bit from run to run, but within the same run two algorithms that had similar times in one run might have disparate times in another. Using
Regardless, it's probably not worth benchmarking that in Rust. The existing
The Reddit method produces a more gentle curve, which is nice. |
I had a pretty lucky streak when I first wrote it, but yeah, unfortunately it seems completely indeterminate now that I've tried it again a few times. |
Posts and comments ordered by most total votes, but with a score close to zero. I guess this should only be available on instances with downvotes enabled.