Web3 recommendations: balancing trust and relevance #6942
finding anything in a sea of misinformation |
Today we did some initial brainstorming, which we will continue next week. The thesis addresses a somewhat overlooked problem in Web3: attack-resilient search. Structured overlays, search, and indexing are among the fundamental topics of the peer-to-peer literature. Some of the classical work:
Now the topic is being revived in a new context: |
Continued brainstorm... Note: doing theory just for the sake of theory and a Cum Laude is not the most fun. For scientists, most engineering time is time taken away from true science. I fundamentally disagree, but that's sadly the ivory-tower culture. Outcome: simulations are somewhat nice, Internet deployment is a waste of time ⚡ ...
OR
OR
OR
??? |
I'm done reading through all the suggested readings/topics.
Interesting article on the state of metadata in the current world. If an experimental thesis means that I will lose my cum laude, I'd rather do that than not work with code for 10 months :) |
|
Trust for Web3 in a world of misinformation, spam, and fraud. The scientific challenge within this master thesis is enabling anybody to freely publish without any censorship, creating a commons where everybody is free to read, react, and interact, while at the same time preventing the destruction of this commons by utilising a trust network for filtering misinformation, spam, and deceptive content. We believe this problem is unsolvable within the context of profit-driven big tech companies and can only be solved within the collaborative Web3 context. Our infrastructure for the common good is owned by both everybody and nobody. This collective sense of ownership is believed to be a critical element of its feasibility. |
btw we have access to industrial GPU infrastructure
btw2 there is already a complaint that our older GPU stuff is making too much noise in the noisy datacenter ⛑️ |
Question to think about for this week: how does one quantify trust in information propagation? While trust in systems that rely on work can be broken down to a "fundamental unit" where Alice can verify Bob having done work for her (regardless of whether that claim is fake or not), how can a user ever verify whether the information provided by another peer is misinformation (especially fake news, which can't be discerned from real news)? These two weeks I focused on two types of papers, one set focused on decentralised search (i.e. finding information in a sea of information) and one on trust (disinformation, freeriding peers, peers spreading false information). Most relevant papers on decentralised search: I found the idea of constructing content overlay networks mentioned in the first paper interesting. The concept involves creating graphs with close and far neighbours; the close neighbours consist of peers that own content similar to your node's, and together the close neighbours form a content overlay. When querying for content outside your overlay, you'd rely on a caching/gossip mechanism to find out about the relevant overlays. This could be a starting point for implementation; the paper suggests using Neo4j to support graphs for large networks. Most interesting papers I read on trust: |
|
Master thesis idea of the week: "creating trust with weak signals and fixating collaborators". For "with honors" your style needs to be like the Harvard professor Nowak: Five rules for the evolution of cooperation. And focus more on this line of work: An optimal strategy to solve the Prisoner's Dilemma. PD flavours: iterated, group, etc. Quantifying social group evolution
Thinking: evolution of cooperation feels more fundamental. However, they ignore the problem of fake identities. Sybils are the core cost driver for Big Tech and other Internet police. Generalised: You scratch my back and I'll scratch yours later. Quite elaborate scope "unwritten rules for the evolution of cooperation in Web3":
|
Here's the first draft of my literature survey; there are a few things pending at the end, but the structure/skeleton is mostly fleshed out |
Great read !!!
Finally: 80% coding for good in a master thesis is OK. It would come at a price in grade! |
writing down your insightful coffee remark: no algorithm exists yet which balances trust and relevance. Practical P2P semantic search (storyline idea). Numerous projects have designed peer-to-peer semantic search engines. Some of this pioneering work also implemented and even deployed their ideas. Yet none of this prior work from the past 20 years proved to be practical, viable, or resilient against spammers. With Tribler we deployed a keyword search engine for Bittorrent video streams in 2006. Our work has been used by over 2 million people. After 17 years of incremental upgrades we moved beyond keyword matching and added search for concepts. Our algorithm, called SemanticTrust, uniquely integrates knowledge graphs, relevance ranking, crowdsourcing, and reputation calculations. We fully implemented this algorithm in Kotlin and Internet-deployed it on a P2P network of various $100 smartphones. We demonstrate the efficiency of our SemanticTrust algorithm by running it on 100 nodes which form a P2P network. Our results show low resource utilisation, quick ClickLog dissemination speeds, SemanticTrust efficiency, responsiveness to new item discovery, etc. Grow to TikTok sophistication. We conclude that our practical P2P semantic search engine is a key step towards De-Google The Internet [see rant]. The example query used in related work 20 years ago is as follows: (repeating: phd-level remarks) Above is the science; now the engineering excellence, with a focus on another 17 years of relentless improvements: maintainability. The best code is deleted code 😸
|
Was a bit busy with exams the last two weeks, so slightly slower progress on the literature survey. Apart from this, I spent time looking into the mentioned thesis idea; I find the storyline very interesting. Spent some time reading up on the Tribler codebase too. Had an interesting discussion with @grimadas about how MeritRank could be used in a search setting; I have some gaps to fill that I'll work on in the next two weeks along with the rest of the literature survey. Will also start working on the mentioned ToDos. Was asked to add this declaration here: The most interesting part of my thesis for me is building systems and reading existing code. |
Master thesis: keyword search MusicDAO {practical P2P semantic search} OR learn-through-consumption {musicDAO endless TikTok-scroll}
|
Practical P2P semantic search - related work: "Situating Search"
Solve within your Master thesis: technology as a solution to a political problem ... |
Progress on Literature Survey: Literature_Survey third draft.pdf
Still Pending:
Work on Thesis these last two weeks:
|
Master thesis ideas:
|
Brain Dump of discussion today: Tentative thesis goal: Top N recommendation system similar to The order of recommendation will take into account a combination of two values: trust and relevance. Trust value is created using a reputation mechanism which uses MeritRank, for this:
The relevance metric is less fleshed out but also of secondary importance for the thesis since we can argue that it is a black box and can be substituted out for any other ML/non-ML model. For the first iteration, maybe just use item similarity with current items of the user to generate this. What will be important is how we choose to combine the score obtained for relevance with the trust value to decide the ordering of recommendations. Let me know if I missed or misunderstood something @synctext @grimadas |
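The combination step described above (trust from MeritRank, relevance from a swappable black-box model) could look something like this sketch. The convex-combination form, the `alpha` weight, and the assumption that both scores are normalised to [0, 1] are my illustrative choices, not decisions made in the thread:

```python
def rank_items(items, trust, relevance, alpha=0.7):
    """Order candidate items by a convex combination of trust and relevance.

    `alpha` is a hypothetical tuning weight controlling how much trust
    dominates; both score dicts are assumed to be normalised to [0, 1].
    The relevance model is treated as a black box, as suggested above.
    """
    def score(item):
        return alpha * trust[item] + (1 - alpha) * relevance[item]
    return sorted(items, key=score, reverse=True)

# Toy usage: a highly trusted but less relevant song still ranks first
# with a trust-heavy alpha.
ranking = rank_items(
    ["song_a", "song_b", "song_c"],
    trust={"song_a": 0.9, "song_b": 0.2, "song_c": 0.6},
    relevance={"song_a": 0.1, "song_b": 0.95, "song_c": 0.5},
)
```

How exactly the two scores are scaled and weighted is precisely the open design question the comment raises.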
Hopefully final version of my literature survey In this cycle, I spent some time slowly extending the current MusicDao implementation to have a recommendations tab. |
Great literature survey! Ready for arXiv uploading. For a With Honors thesis: think about convincing your thesis committee that you did great science: "Algorithms 1" awesomeness. ToDo: convert comments on this issue into your first thesis chapter: "Problem Description", IEEE format, thesis skeleton. addition: thesis title brainstorm:
addition2: how about always showing the recommender as the MusicDAO front page, first sorted by popularity. Using learning-by-consumption, you sort by the most recommended songs as the user plays stuff. |
Survey is on Arxiv |
Great work! (Source: “Universal Trust Machine”, https://arxiv.org/abs/2301.06938): Somewhat Related work: TasteBuddy-based Version Selection Strategy for BitTorrent Users against Content Pollution
Conclusions: you can spend 90% of your AI time on data engineering! Solution: shortcut with 106,574 tracks ToDo: state-of-the-art in content recommendation (Collaborative Filtering)? |
Spent the last two weeks researching Collaborative Filtering and coming up with a high level strategy of how recommendations would be implemented in the Superapp. After reading existing methods, I think the approach we should take is: Distributed Probabilistic Memory, User Based Collaborative Filtering inspired partly by this paper
The cited paper uses the below for its new user profile construction: This approach needs to be adapted to work in a distributed context, i.e. instead of a database we will have gossip about various user profiles, but apart from that we can use this formula to construct a first minimal working implementation. |
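Since the cited paper's exact formula is only linked above and not reproduced here, the following is a stand-in sketch of standard memory-based, user-based collaborative filtering (mean-centred prediction weighted by Pearson similarity), which is the family of techniques being discussed; it is not necessarily the paper's formula:

```python
import math

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = math.sqrt(sum((u[i] - mu_u) ** 2 for i in common)
                    * sum((v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, peers, item):
    """Mean-centred, similarity-weighted rating prediction for an unseen item.

    `user` and each entry of `peers` are {item: rating} dicts; in the
    distributed setting the peer profiles would arrive via gossip rather
    than from a central database.
    """
    mu = sum(user.values()) / len(user)
    num = den = 0.0
    for peer in peers:
        if item not in peer:
            continue
        s = pearson(user, peer)
        num += s * (peer[item] - sum(peer.values()) / len(peer))
        den += abs(s)
    if den == 0:
        return mu  # no informative neighbours: fall back to the user's mean
    return mu + num / den
```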
Main thesis focus:
Excellent recent work from EPFL:
Sprint Target: get baseline implementation working and post screenshot from emulator |
In this sprint I designed and implemented Flooding based Distributed Memory, User Based Distributed Collaborative Filtering v1.0. The following is an overview of how it works:
The Recommendation with the maximum value is returned by the Node.
Note that in the case that the Node has no items listened to, or no common items with any other users, this is no better than a random selection, so the cold start problem still needs to be tackled; the Beta parameter also needs to be tuned. I was able to find the dataset for the Million Songs Challenge (in fact, the motivation for using a memory-based algo came from the winning submission). I will try to get some simulations running with this data to see if I can compare it to benchmarks. Obviously, this algorithm is not guaranteed to calculate globally optimal recommendations; however, since each node only has to perform a single computation against its user ratings, it should work quite fast in a distributed context. While I have implemented the algo, I haven't plugged it into MusicDAO yet because I've been having trouble getting the creative commons content working or being able to seed music to other users. Getting the integration and some simulations working will be the target of my next sprint. In the above, the "ratings" are implicit ratings which are calculated using a mixture of: a) number of plays, b) time spent on each song, c) total song "listen" time, thus giving an indication of the user's "value" of the song based on the amount of time they have invested into it |
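The implicit-rating mixture above only names its three ingredients (play count, per-song listen time, total listen time); a minimal sketch of one way to combine them, with weights and normalisations that are my assumptions, could be:

```python
def implicit_rating(plays, listen_seconds, total_listen_seconds, max_plays,
                    w_plays=0.5, w_time=0.5):
    """Derive an implicit rating in [0, 1] from listening behaviour.

    Mixes a normalised play count with the song's share of the user's
    total listening time. The weights `w_plays`/`w_time` and the
    normalisation by `max_plays` are illustrative assumptions.
    """
    play_part = plays / max_plays if max_plays else 0.0
    time_part = (listen_seconds / total_listen_seconds
                 if total_listen_seconds else 0.0)
    return w_plays * play_part + w_time * time_part
```

Tuning these weights is part of the open parameter-tuning work mentioned above.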
|
Mid sprint progress (I've iterated through many ideas this week so I figured I'd post a
Updated idea for how the distributed recommendation will work:
Following the above principles, the flow for a new user who joins the network:
Problems to be figured out:
|
Update to the above: On further consideration I realised that calculating UserTrust using highest-similarity peers as "pre-trusted peers" is problematic. Instead, I am now leaning towards performing a "Distributed PageRank" combined with EigenTrust, which is used to generate a consensus on the "top x online trusted peers" at any time in the network (recalculated as trusted peers go offline) and which can be used to introduce new nodes to the network. Further, I have created a first version of a formula which reflects a user's trust in, and similarity to, a certain item. The function will take as input two parameters: 1) explicit like or dislike (a lack of either can be viewed as an implicit like), 2) gamma: playing time of the song compared to total playing time of all songs. The formula is a piecewise function that can be viewed here. The function maps the above parameters to a value between -inf and +inf. The intuition behind this formula is as follows: there are 4 possibilities:
Of course the above can be exploited for "review bombing" by sybils, which is what we hope MeritRank will help mitigate |
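The actual piecewise formula is only linked above, not reproduced here; purely as an illustration of the stated shape (inputs: like/dislike/implicit-like plus the playing-time fraction gamma; output unbounded in both directions), one hypothetical function with those properties might be:

```python
import math

def affinity(signal, gamma, implicit_weight=0.5):
    """Map an interaction to a score in (-inf, +inf).

    HYPOTHETICAL sketch, not the thesis's actual formula. `signal` is
    "like", "dislike", or None (treated as an implicit like); `gamma` is
    the song's share of the user's total playing time, in [0, 1).
    The magnitude grows without bound as gamma approaches 1, so heavily
    played songs produce strong signals in either direction.
    """
    magnitude = -math.log(1.0 - gamma)
    if signal == "dislike":
        return -magnitude
    if signal == "like":
        return magnitude
    return implicit_weight * magnitude  # implicit like: weaker positive
```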
|
Mid-sprint update: I have devised a hybrid Personalised PageRank - Personalised SALSA algorithm for the ItemTrust calculation, which uniquely combines the tastes of a user's semantic neighbours with the user's trust in other users to quantify how much the user trusts an unseen item. The algorithm is inspired by GraphJet, the algorithm used by Twitter to generate real-time recommendations for Tweets. The original algorithm is a random-walk-based Monte Carlo estimation of user-item similarity which uses SALSA; the illustration below demonstrates how it works in Twitter: The flow of the original random walk is below:
For the distributed implementation, the above approach will be modified in a few ways:
Further, in the original algorithm we want to guarantee "recent recommendations", hence only the "x most recent" interactions are stored (to guarantee freshness). Similarly, our algorithm will guarantee freshness of recommendations by only storing the "x most recent edges"; this will also ensure our storage isn't burdened. Each node will periodically gossip a random node and its taste profile (read: song recommendations). This "edge gossip" mechanism will be weighted by the recency of the recommendations, i.e. an edge with a newer timestamp is more likely to be gossiped, but older edges can also be shared with a low probability.

Once a node receives this gossip, if it has at least one item in common with the gossiped user and room for more profiles, it simply stores the user. If it does not, it calculates a score weighted by: a) the similarity of the user, b) the freshness of the recommendation (timestamp of the recommendation), and if this score is higher than that of the lowest node in the user-to-item network, the new node is marked to be added to the network. After the number of "to be added" nodes reaches a certain threshold (x), we can batch-remove x nodes and add x nodes, recalculating all dirty random walks. This ensures that we aren't constantly recalculating random walks, and since there is bound to be overlap between dirty nodes in dirty random walks, it should be more efficient too. The gossip is also made "less random" with the following mechanism: with probability "beta", instead of randomly gossiping, look for two nodes in your network which have at least one item in common and gossip their profiles to each other.
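The recency-weighted edge selection described above could be sketched as follows. The exponential decay and the `half_life` knob are my assumptions; the thread only says that newer edges are more likely to be gossiped while older ones still get through with low probability:

```python
import math
import random

def pick_edge_to_gossip(edges, half_life=3600.0, rng=random):
    """Sample one edge to gossip, biased toward fresh timestamps.

    `edges` is a list of (edge, unix_timestamp) pairs. Each edge's weight
    halves for every `half_life` seconds of age relative to the newest
    edge, so fresh edges dominate but stale ones are never excluded.
    """
    now = max(ts for _, ts in edges)
    weights = [math.exp(-(now - ts) * math.log(2) / half_life)
               for _, ts in edges]
    return rng.choices([e for e, _ in edges], weights=weights, k=1)[0]
```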
Further, since each random walk is calculated independently of the others, the algorithm can be easily parallelised (also potentially distributed across similar nodes - but that's probably beyond the scope of this thesis). Hence, each user will store two graphs: 1) a node-to-node directed graph which indicates its trust in other users, used for the calculation of UserTrust through Personalised PageRank; 2) a bipartite undirected node-to-item graph which indicates its taste in items, used for the calculation of ItemTrust through the hybrid algorithm mentioned above. As a further optimisation, the node-to-node graph can be stored on disk and only fetched into memory when we need to recompute Personalised PageRank, while the node-to-item graph, which only contains a subset of the nodes in the first graph, can always be kept in memory since it's much lighter and queried much more frequently. Cold Start Problem: a global PageRank can essentially be mathematically shown to be the result of averaging each user's Personalised PageRank. Thus a consensus mechanism across all nodes can be used to indicate the "top trustworthy online users". A fresh user will simply share its taste profile with these top trustworthy users, who will return semantically similar users to be added to its node-to-items network. As we slowly determine the user's taste as they listen to more songs, the gossip mechanism will ensure that semantically dissimilar nodes in its network are slowly replaced with similar ones. Update to above: I'm done implementing the graphs mentioned above, currently working on implementing Personalised PageRank. Once that is done, I will proceed with implementing the hybrid algorithm above. |
|
Progress for this sprint:
The last two weeks I narrowed down the focus of my thesis through some reading and realised that the most significant aspect of recommendation systems in Instagram/TikTok is the ability to generate "fresh" recommendations. I.e. while similarity is important, what is more important is to act on "recent signals" (where signals are interactions of users with songs); this is also the most significant aspect of the algorithm that my system is inspired by: GraphJet. In the original algorithm, this involves deciding a "time window" and only considering interactions of users with tweets inside it. Practically implementing this for our system requires two things: 1) an edge-gossiping algorithm that prioritises gossiping newer edges, and 2) a mechanism to remove edges older than the time window. All edges already have a timestamp associated with them, so I came up with and implemented a novel "time-biased gossiping algorithm" as below:
This allows us to assign weights to each edge biased by their recency, ensuring that our gossiper is more likely to gossip newer edges; yet, in order to keep the network in sync, older edges are also gossiped. After implementing this, we face a similar issue to the original algorithm: often, old users will find that they have no interactions (since they haven't consumed any content recently inside the time window). GraphJet solves this by conducting random walks from a pool of trusted users. I introduced a dynamic "exploratory probability"; this probability determines whether the SALSA random walk should start from a non-root node instead. The choice of non-root node is based on the personalised PageRank calculated earlier, so in effect we are also choosing trusted nodes to start the walk from, while allowing a certain amount of "serendipity". This probability changes depending on how many interactions the user has inside the network, so in the extreme case where the user has no interactions it will be 100%; this in effect also helps to partially solve the cold start problem. Beyond this, I have also been thinking of a set of experiments to further verify certain properties of the system:
Other pending items:
|
|
After discussions with @grimadas, I ended up designing multiple sets of experiments to run on our recommendation system, the two most important being: 1) the influence of sybils on the top-x ranking after increasing the decays in the system, and 2) the influence on the relevance of the rankings as we increase the decays. If the system works as it is supposed to, then as we increase the decays in 1) we should see fewer sybils in the system, but also a lower relevance, since our aggressive tactics would mark a lot of non-sybil recommendations as sybil too, thus degrading their ranking. In order to run the experiments, I constructed a sample TrustNetwork out of a subset of data from the Million Song Dataset, consisting of 400000 songs, 50000 users, and their interactions (read: play counts). The user-to-user trust in the network was bootstrapped by adding edges from each node to its 5 most similar nodes using the improved Pearson coefficient we will use for bootstrapping new users into the network; the node-to-song edges were made using affinity. For the experiments in 1), I made 30% of the nodes in the network sybils and randomly carried out either linear, cycle, or parallel sybil attacks through them, randomly adding 1 to 5 sybil nodes to each (therefore carrying out the Giga Sybil Attack we spoke about). After this, I ran the personalised recommendation algorithm on 100 random non-sybil nodes and measured, with increasing decay parameters: 1) how much ranking score sybil songs gained overall, 2) the score of sybil songs in the Top 100, Top 500, Top 1000, and Top 5000 rankings (because our main objective is the user's "Ranking Table" and removing sybils from it, this metric effectively conveys what we need to test). The results of these experiments are shown in the graph below: As expected, as we increase the decays, the number of sybils goes down dramatically. This shows that our random walks are capable of detecting sybil nodes.
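The linear variant of the sybil attacks used above can be sketched on a toy trust graph as follows; the dict-of-sets graph representation and the node-naming scheme are my stand-ins for the actual TrustNetwork:

```python
def add_linear_sybil_attack(graph, attacker, chain_length):
    """Attach a linear chain of sybil nodes to `attacker`.

    `graph` maps node -> set of trusted neighbours (a toy stand-in for
    the TrustNetwork). The attacker trusts the first sybil, and each
    sybil trusts only the next one, forming the linear attack structure
    that MeritRank's alpha decay is meant to penalise.
    """
    prev = attacker
    for i in range(chain_length):
        sybil = f"{attacker}_sybil_{i}"
        graph.setdefault(prev, set()).add(sybil)
        graph.setdefault(sybil, set())
        prev = sybil
    return graph
```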
Here is a more zoomed-in view with lower z-axis values, which shows the micro difference between Beta Decay (from MeritRank) and the exploration probability I introduced: I also carried out statistical tests to show the impact of both parameters numerically. Next, for the tests in 2), I found a similarity metric for comparing two sets of rankings in this paper. It's called "Rank-Biased Overlap"; it is in effect an improved Kendall Tau for lists which do not necessarily share all items, and it weights ranking differences near the top more heavily. I ran experiments on the base network with increasing amounts of decay and compared how much the ranking changes as we increase the decays; here are the results: As expected, with increasing decays the similarity drops, but notably the similarity does not change too much (except for high values of exploration probability). This is a good indication that non-sybil rankings are not too heavily influenced by our decay parameters. I also carried out ANOVA and regression tests to show the significance (and non-significance) of the various decay parameters in the experiments. I have also started working on a thesis draft, though I did not have much time between running and rerunning experiments, so here's a very early draft with a basic skeleton of what I expect: Thesis_first_draft.pdf |
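For reference, the truncated form of Rank-Biased Overlap is compact enough to sketch directly; this is the basic prefix-overlap sum (without the paper's extrapolation for unseen tails), truncated here to the shorter list's length:

```python
def rbo(a, b, p=0.9):
    """Rank-Biased Overlap (truncated form) between two ranked lists.

    At each depth d the prefix overlap |a[:d] & b[:d]| / d contributes
    with geometric weight p**(d-1), so disagreements near the top cost
    more. Identical lists approach 1 as they grow; disjoint lists give 0.
    """
    k = min(len(a), len(b))
    score = 0.0
    for d in range(1, k + 1):
        overlap = len(set(a[:d]) & set(b[:d]))
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score
```

The persistence parameter `p` sets how top-weighted the comparison is; the value used in the experiments above is not stated, so `0.9` here is just the paper's common default.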
|
Integrated Web3Recommend with MusicDAO in a "Discover" section Disclaimer: Since the only device running this is my phone, you will only see one recommendation so far (which is not playable since torrenting across super apps doesn't quite work 🥲 ) https://github.com/rmadhwal/trustchain-superapp/releases/tag/2 Based on feedback from last week and discussion with @grimadas, new set of experiments and new set of (non-3d) graphs:
Interpretation of results: With alpha decay set to 0.1, in more than half of the root nodes the missing song is within the top 100 recommendations (the dataset consists of 300k songs, so this is the top 0.03% of the results), and about 80% of the time it is in the top 100k (top 33% of recommendations). Then I compared the new algorithm with vanilla GraphJet for varying alpha decays: As can be seen from the graphs, the algorithm performs just as well as or much better than GraphJet (except for a few Alpha Decay values where the top 100 differs by a small amount). Notably, for 0.1 Alpha Decay there is almost a 20% difference! As can be seen from the results, the false-positive impact seems to increase a lot more with increasing Alpha Decay, while increasing Beta Decay seems to have an almost negligible impact. Thus, a low Alpha Decay such as 0.1 and a high Beta Decay such as 0.8 seem like a good recommendation for the system if they can also provide sybil resistance.
So if our implementation is correct we should observe this. First, without any alpha decay to confirm that a linear attack can indeed cause a lot of nefarious ranking manipulation without MeritRank:
First, with increasing Alpha Decays: Next, with increasing Beta Decays: Note that the formula for calculating the scores in a ranking is based on adding up the ranks of the sybil items, so in the top 100 recommendations, if item #1 is a sybil, that would contribute "100" to the sybil score, and so on. Beyond this, I also have an ANOVA analysis which demonstrates that Alpha and Beta decay can confidently limit the impact of sybils on recommendations. Pending: experiments for scalability; however, since the implementation is already done, these should be relatively trivial and involve very simple measurements. I will measure memory impact with increasing users to demonstrate that the algorithm is reasonably scalable. Progress on thesis draft: second_draft.pdf Rewrote the abstract based on suggestions, considerably rewrote the intro, and added a problem description; however, it is still not as refined as I want it to be. Will focus on getting this done in the next sprint. |
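The sybil-score convention stated above (a sybil at rank 1 of a top-k list contributes k, rank 2 contributes k-1, and so on) can be written down directly; only the function and variable names are mine:

```python
def sybil_score(ranking, sybil_items, k=100):
    """Score how much sybil items pollute the top-k of a ranking.

    Following the convention above: a sybil at rank 1 contributes k,
    at rank 2 contributes k - 1, etc., so higher-placed sybils weigh
    more heavily on the score.
    """
    return sum(k - rank for rank, item in enumerate(ranking[:k])
               if item in sybil_items)
```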
|
Still have to complete 2-3 mini sections; most notably, the experiment results need to be interpreted, the bootstrapping mechanisms need to be added, and a good conclusion is needed. It also needs more polish (and much better formatting). But this should give a pretty decent idea of what my final thesis will look like. |
|
Updated the draft based on feedback, submitted the first version of the thesis for feedback from the thesis committee. |
Final thesis is on the TU Delft Repo Defence PPT v1 - still have to add all citations and a few more illustrations |
|
USA House of Representatives - INVESTIGATION OF COMPETITION IN DIGITAL MARKETS :
(literature survey, start Q1'22, background data science track, practical is OK, government agency embedding?). ideas:
Background: operational permissionless plug-in system for Android. Foundational block for zero-server app store.