Check dev experience on starting an issue #26
Toss out CVs; I never believe them. GitHub, though, offers proof of work. I am a big fan of using GitHub for this purpose.
How can you determine this?
In theory we can make configs for all of these, but it would be optimal to choose a single strategy and focus on that for this plugin.
Actually, feeding a CV into ChatGPT is much easier than parsing a contributor's GitHub where, as you've already mentioned, we would have to parse the number of Solidity commits, etc.
People lie on CVs all the time. It's useless data compared to a portfolio of work.
I think you can write anything in a CV. GitHub is reliable; maybe we could try using the statistics? https://docs.github.com/en/rest/metrics/statistics?apiVersion=2022-11-28
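For reference, a minimal sketch of what hitting that statistics endpoint could look like with Octokit; the owner/repo/login parameters and the idea of counting a single contributor's commits are assumptions, not a decided design:

```typescript
// A minimal sketch, not the plugin's actual implementation. Assumes a personal
// access token in GITHUB_TOKEN and placeholder owner/repo/login values.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function contributorCommitTotal(owner: string, repo: string, login: string): Promise<number> {
  // GET /repos/{owner}/{repo}/stats/contributors can return 202 while GitHub
  // is still computing the statistics, so a real version would need to retry.
  const { data } = await octokit.rest.repos.getContributorsStats({ owner, repo });
  if (!Array.isArray(data)) return 0;
  const entry = data.find((c) => c.author?.login === login);
  return entry?.total ?? 0;
}
```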
We could leverage Gitroll (there is a commercial version too), which essentially builds a CV from a user's GitHub history. That was a public scan, which does not include private contributions; you can log in and do a personal one for more info. It's not very fast at completing scans, although once a scan has been completed you can search for it. So first see if a scan exists, and if not, fire one (scanning my profile takes 3-5 minutes). Double-checking the search-for function, it may have been removed; when I used this before there was no commercial version, so things have changed a bit. I also don't know how valid it is, but it's a good point of reference if we implement something in-house:
https://github-readme-stats.vercel.app/api/top-langs/?username=keyrxng
We could fetch top language stats from this endpoint and parse it for the Solidity percentage. It seems brittle, but you'd expect anyone with a reasonable amount of pushed Solidity code to have more than 15-20%, I think.
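If we ever wanted an in-house equivalent instead of parsing that endpoint's output, one rough sketch (assuming GitHub's per-repo language byte counts are a good enough proxy, and using 20% only as an illustrative cutoff) could be:

```typescript
// A rough in-house equivalent of the top-langs endpoint, not a final design.
// Assumes per-repo language byte counts are a good enough proxy for experience.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function solidityShare(username: string): Promise<number> {
  const repos = await octokit.paginate(octokit.rest.repos.listForUser, { username, per_page: 100 });
  const totals: Record<string, number> = {};
  for (const repo of repos.filter((r) => !r.fork)) {
    // GET /repos/{owner}/{repo}/languages returns bytes of code per language
    const { data } = await octokit.rest.repos.listLanguages({ owner: username, repo: repo.name });
    for (const [lang, bytes] of Object.entries(data)) {
      totals[lang] = (totals[lang] ?? 0) + bytes;
    }
  }
  const sum = Object.values(totals).reduce((a, b) => a + b, 0);
  return sum ? (100 * (totals["Solidity"] ?? 0)) / sum : 0;
}

// e.g. gate assignment on something like: (await solidityShare("keyrxng")) >= 20
```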
Didn't know about Gitroll; it's pretty fun. The risk is that it takes forks into account, and I know some people just keep forking repos for some reason; I don't know if it only counts the lines you actually wrote. But it could be a starting point that saves tons of development hours.
Surprised you missed it; it had 15 minutes of fame on Twitter and everyone was doing it, but that was some time ago.
https://gitroll.io/profile/uOp67oGeYgBNu5MjHSCmHHoqY0qV2/repos
As far as I can tell it detects forks but does not count them as code you've authored. The repos you see in that link are only my repos so I think it only accounts for code you've authored in your own repos.
But yeah, that's why I mentioned it. I doubt it suits our needs out of the box, but it's something to work towards maybe.
Ah, I see what you mean now: download the source code, push it as their own, and suddenly they're a 10x dev. I get you.
Well, either way, we only have the contributor's public contributions and repos available (unless the bot has private repo access for a user via the authToken?), which is susceptible to the same sort of manipulation. If we only have public data, we are doing a disservice to eligible devs who push private web3 code, which is pretty commonplace for a lot of web3 projects (the blue chips are always public).
/start
! Too many assigned issues, you have reached your max limit
@gentlementlegen can you price this with a few hours? I've spent a couple on it so far and have a couple more to go expanding the test suite and optimizing/refining things.
Seems like this is roughly a one-day task.
@rndquu do you have any remarks on your vision and how granular the guarding will be? Ref: ubiquity-os-marketplace/command-start-stop#17 (comment)
I think labeling is nice to have because even our devpool.directory supports this right now. You can see a "UI/UX" and "Research" task. I suppose that in the config, the partner should be able to associate an arbitrary label name to a type of code for the check. This is obviously restricted to coding tasks, but seems straightforward to implement if GitHub recognizes code types for the statistics.
Granular enough to reduce the number of false positives. Not sure how @Keyrxng is going to implement this plugin but at first glance we could:
I don't think the first version of this plugin must be super accurate, but it must be accurate enough to allow assigning this issue only to somebody with Solidity experience.
@rndquu the open PR implements things similarly but in reverse, and it meets the requirements of the spec; it just needs to be refined a little.
This approach gives us configurability at the repo level but not the org level, although a default baseline could be set. The reason I chose not to use the repo code stats for V1 is because of situations like the one in the screenshot below. It's not a problem for the Solidity repos, but it will be a problem in others and will need to be addressed specifically, as off the top of my head I don't know how we'd handle this elegantly. Overall, I think the current PR meets the requirements for V1. V2 will likely revolve around the real XP system, which would probably follow the same mapped config setup as labels/tags, so it should be thought out and tasked, I think.
@Keyrxng What unit is it exactly when you say …
1-100; maybe it can be made clearer what sort of threshold it is. rndquu's language stats look like: I had only looked at mine and @rndquu's stats during QA before, but looking over more of them is making me doubtful that this is going to be as effective as I first thought. V1 will probably need to do some manual user repo parsing as well. Since newcomers won't have any tasks to compare, I think it would help if it's possible to list open/merged user PRs that are Solidity based (via …). Any thoughts?
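One rough way to do that PR listing would be the issue/PR search API; the query string, function name, and the idea of counting merged PRs are assumptions, and `language:` in issue/PR search filters by the repository's primary language, so it is only an approximation:

```typescript
// An illustrative fallback for newcomers: count merged PRs the user authored
// in Solidity repositories (by the repo's primary language).
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function mergedSolidityPrCount(login: string): Promise<number> {
  const { data } = await octokit.rest.search.issuesAndPullRequests({
    q: `is:pr is:merged author:${login} language:solidity`,
    per_page: 1, // only the total_count matters here
  });
  return data.total_count;
}
```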
This seems to be the percentage of a certain language you mostly use across your repos, isn't it? Which means a beginner who only did one project in TypeScript would get a …
No, that sounds right as far as I know, and it's why I also included the other markers at first, as they'd help balance out that scenario and similar ones. It looks like this will need to do at least a little bit of manual validation on the user's PRs/repos anyway, but without a concrete XP-based system we are up against it, as there are lots of ways to spoof your GitHub stats and data.
I think we should get all of the user's commits and determine which languages they are committing in. We can set hard limits for the "ranks". Example: you need 1000 commits containing TypeScript code to be "pro" rank and 250 to be "intermediate" or "mid" rank. We can do a ton of requests because we have a six-hour runtime.
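A hedged sketch of what those rank thresholds might look like in code; the numbers mirror the example above and the rank names are placeholders, not agreed values:

```typescript
// Illustrative thresholds only.
type Rank = "beginner" | "intermediate" | "pro";

const RANK_THRESHOLDS: Record<Rank, number> = {
  beginner: 0,
  intermediate: 250, // commits containing the target language
  pro: 1000,
};

function rankFor(commitCount: number): Rank {
  if (commitCount >= RANK_THRESHOLDS.pro) return "pro";
  if (commitCount >= RANK_THRESHOLDS.intermediate) return "intermediate";
  return "beginner";
}
```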
Not really ideal for users waiting any length of time for a response, plus we are bound by the fact that … The endpoint I'm using to gather user stats is open source and self-hostable if we want to go down that route and keep the plugin fast. It could actually be killing two birds with one stone re: bringing in more devs; I don't know exactly how, but it could be leverage for that purpose somehow. It would be similar to having our own https://gitroll.io/
A cache might be useful; then we only need to run it once per user. We can rerun if they previously were not at a high enough level? But yes, I agree any lag is not attractive.
I don't think it's a good idea to maintain all this infra for this plugin.
I thought of this, and we might be able to get away with it for one user on the worker, but if it's a team then that's potentially tens of thousands of commits. I'm making assumptions here obviously and will only know after testing, but I expect it to be problematic. From the little work I've done on the faucet, I read that worker limits, while they appear to be time-based, are more memory-based than anything else. I have less experience here than any of you folks obviously, but if that is your opinion as well then that may be a separate issue.
Agreed, it's not ideal for just this alone, so I will proceed with the other suggestions.
As far as I understand, those stats are taken from commits, not from forked repositories, so it requires quite an effort to spoof them. Anyway, I think it's enough for v1. Setting a label like …
@Keyrxng this can run async in the background via the GitHub action runner. Here is a user flow:
Then the action runner can check all their commits at its leisure.
poor guy probs gets tagged thousands of times per week 😂
So should … Or should it be a separate plugin which runs after and has its own config, etc.? I feel like it sort of defeats the purpose to have a rapid assignment comment and then potentially eject them a minute or two later. By that point they may have already gone ahead and forked repos/checked out branches, etc. So maybe the assignment comment needs to be updated so we inform them ahead of time that they are being XP-checked and are temporarily assigned until it's verified?
I just had a thought based on this issue that we should also build in a …

@0x4007 It would be easy enough to build into the open PR if you could define a label schema; one possible shape is sketched below.
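Purely as an illustration (the field names and values are made up, not a settled schema), the label-to-language mapping could be shaped like this:

```typescript
// Hypothetical config shape: a partner maps an arbitrary label to the languages
// and minimum share that satisfy the guard.
interface LabelGuard {
  label: string;       // label attached to the issue, e.g. "Solidity"
  languages: string[]; // languages that satisfy the guard
  minPercent: number;  // minimum share of the user's language stats (0-100)
}

const exampleConfig: LabelGuard[] = [
  { label: "Solidity", languages: ["Solidity"], minPercent: 20 },
  { label: "UI/UX", languages: ["TypeScript", "CSS"], minPercent: 10 },
];
```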
Separate plugin. It's literally a couple of minutes max of wasted effort. I think this is acceptable. Can post a warning while it works. Certainly not perfect but it seems like an acceptable trade off.
Private repo is sufficient.
I just realized that with this plugin enabled we should reply with a...

# Please wait...
# Analyzing your profile to see if you qualify for this task.

...comment before assigning. If they pass, then assign. If not, then edit the message to say that they require more experience. Perhaps something like:

! You need more TypeScript projects on your GitHub profile in order to be eligible for this task.

It could also be really interesting to include a gif of a loading spinner for some of these transient comments.
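A minimal sketch of that comment flow, assuming the plugin already knows the issue context; the `guardAssignment` helper and the `qualifies()` callback are placeholders for whatever check the plugin actually runs:

```typescript
// Post a transient "analyzing" comment, run the check, then edit the comment.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function guardAssignment(owner: string, repo: string, issueNumber: number, qualifies: () => Promise<boolean>): Promise<boolean> {
  const { data: comment } = await octokit.rest.issues.createComment({
    owner,
    repo,
    issue_number: issueNumber,
    body: "# Please wait...\n# Analyzing your profile to see if you qualify for this task.",
  });

  const ok = await qualifies();

  // Edit the same comment with the outcome instead of posting a new one
  await octokit.rest.issues.updateComment({
    owner,
    repo,
    comment_id: comment.id,
    body: ok
      ? "You qualify for this task, assigning you now."
      : "! You need more TypeScript projects on your GitHub profile in order to be eligible for this task.",
  });
  return ok;
}
```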
Review has taken me in the direction of this running async after … However, if the self-assign checks fail then that event won't fire, and so would we delete the comment from within the …? Maybe we add a config item to …
So this should run before …
That would be cool, why not task it out and make an on-brand logo loader?
@Keyrxng, this task has been idle for a while. Please provide an update.
Looks like the repo for this was deleted maybe, as I cannot find it lmao: https://github.com/ubiquibot/task-xp-guard/pull/1 https://github.com/ubq-testing/task_xp_guard - with no repo to PR against, I'm unsure what to do here, as I'm aware I shouldn't be creating so many new repos.
You make your own repo. On your own org. We copy it when it's finished.
The most recent PR was deleted so I can't reference the conversation, so I want to clarify and summarize here, as I am currently refactoring my working approach to use your commit-counting strategy. In short, I should collect a user's commits and then reduce them down to a tally of file extensions, counting each extension once per commit regardless of what the commit contains; all that matters is that the extension appears in the commit. "Deletions" should be ignored from the tally.
query userRepositories($login: String!) {
user(login: $login) {
repositories(first: 100, ownerAffiliations: [OWNER, COLLABORATOR], isFork: false) {
nodes {
name
url
}
}
}
}
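To make the accumulation step concrete, here is a hedged sketch of how that query could feed the extension tally (one count per extension per commit, deletions ignored). It uses the REST commit endpoints for the file lists, which is exactly the per-commit call volume discussed below, so treat it as an illustration of the strategy rather than a recommendation; the `nameWithOwner` field is a small variation on the query above so collaborator repos resolve to the right owner:

```typescript
// Illustration only: the heavy per-commit REST approach, shown to make the
// tally logic concrete.
import { graphql } from "@octokit/graphql";
import { Octokit } from "@octokit/rest";

const token = process.env.GITHUB_TOKEN;
const gql = graphql.defaults({ headers: { authorization: `token ${token}` } });
const octokit = new Octokit({ auth: token });

const USER_REPOSITORIES_QUERY = `query userRepositories($login: String!) {
  user(login: $login) {
    repositories(first: 100, ownerAffiliations: [OWNER, COLLABORATOR], isFork: false) {
      nodes { nameWithOwner }
    }
  }
}`;

async function tallyExtensions(login: string): Promise<Record<string, number>> {
  const { user } = await gql<{ user: { repositories: { nodes: { nameWithOwner: string }[] } } }>(USER_REPOSITORIES_QUERY, { login });
  const tally: Record<string, number> = {};

  for (const node of user.repositories.nodes) {
    const [owner, repo] = node.nameWithOwner.split("/");
    const commits = await octokit.paginate(octokit.rest.repos.listCommits, { owner, repo, author: login, per_page: 100 });

    for (const commit of commits) {
      // One extra call per commit: the list endpoint does not include files
      const { data } = await octokit.rest.repos.getCommit({ owner, repo, ref: commit.sha });
      const seen = new Set<string>();
      for (const file of data.files ?? []) {
        if (file.status === "removed") continue; // ignore deletions
        const ext = file.filename.split(".").pop() ?? "";
        if (ext && !seen.has(ext)) {
          seen.add(ext); // count each extension at most once per commit
          tally[ext] = (tally[ext] ?? 0) + 1;
        }
      }
    }
  }
  return tally;
}
```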
I'm at the accumulation stage right now and using @0x4007 as the account to test with.
I've removed hundreds of bad entries manually (left a couple in) and will need to add handling for them, but I'm assuming all three aspects are acceptable: 1. repo collection, 2. tally output, 3. time taken. Then how are we transforming these numbers into stats? Just determining the weight across the number of different languages? In a team scenario the call count and time taken could easily 2-3x, so it's eating a huge chunk of our rate limit every time a task is started, and the delay is ~5 minutes per contributor. If we try to make things go faster we'll hit the secondary rate limit, which will knock out all other plugins. I think a cache was mentioned, but I'm not sure how that would work; when would we update a cache entry, every n tasks? GraphQL does not expose what we need, or it does but not in the way we need it, i.e. it exposes the tree with all files and we can pinpoint files with a path via the tree at that point in time, but it doesn't contain the files in the commit. I've been using the explorer for manual introspection, and GPT is also telling me it's not possible; maybe we are both wrong, but I don't think this is a feasible approach without removing the calls to … I tried to use Blame and Contribution Graph data from GraphQL too, but all roads lead nowhere.
Maybe we can optimize the rate limits in a different way. For now we can let the job run slowly, and as for the cache, we just need to store the totals somewhere. The first run will be the heaviest; maybe a separate app can handle that. Later runs could "sync" from the last time it ran. For example, since you just checked today, if I start a task tomorrow it will only check my last one day of commits, which should be a relatively tiny amount. Seems like the stats you aggregated look pretty great. They show I'm pretty experienced with TypeScript, which is what I expected. There are a lot of non-extensions in there which should be fixed up later. Also, I'm not seeing CSS, which is unexpected. Eventually we should read private repos as well by authenticating through ubiquity-os.
I'm sorry, I do not understand how that approach is better/more effective than what I had implemented, if the primary goal is detecting the languages that appear most across a user's commits, across repos they own and have collaborated on. My approach indexed the same repos and obtained the same final result (that you know TS) in under 10 seconds and used fewer than 10 API calls.
Where would that be then, a Supabase DB for this plugin or another storage solution?
So we'd check commits from the beginning of time on the first run. We'd save the final output and the date we last ran a check. Then any subsequent usage of … would only need to sync commits made since that date.
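A minimal sketch of that incremental sync, assuming some storage (Supabase, git-based, or otherwise) holds the previous tally and timestamp; the `load`/`save`/`countSince` callbacks are placeholders:

```typescript
// Incremental sync sketch: only commits newer than the cached run are re-counted.
interface LanguageCache {
  lastRun: string;               // ISO timestamp of the previous scan
  tally: Record<string, number>; // extension -> commit count so far
}

async function syncTally(
  login: string,
  load: (login: string) => Promise<LanguageCache | null>,
  save: (login: string, cache: LanguageCache) => Promise<void>,
  countSince: (login: string, since?: string) => Promise<Record<string, number>>
): Promise<Record<string, number>> {
  const cached = await load(login);
  const delta = await countSince(login, cached?.lastRun);
  const tally: Record<string, number> = { ...(cached?.tally ?? {}) };
  for (const [ext, count] of Object.entries(delta)) {
    tally[ext] = (tally[ext] ?? 0) + count;
  }
  await save(login, { lastRun: new Date().toISOString(), tally });
  return tally;
}
```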
Ideally git-based storage, but for now I guess Supabase or whatever works.
Not ideal, as partners will need to create an additional app just to handle rate limits from a single plugin, wouldn't they? Or would it be our token that's used across all partners?
So if we are looking at a 5-minute delay on the first run, then really we shouldn't assign the task right away, or are we assigning right away?
This doesn't work with an additional app unless it too had private access. Wouldn't this plugin need the …
As I understand this comment ^
These two plugins were decoupled and run async as two separate plugins in the chain, meaning they should both receive the payload at the same time. Well, as two separate plugins that receive the same payload, it's possible that this plugin forwards the same payload onto the … Otherwise, the only way I can see it happening is if … This is what the log above looks like converted via gpt-4o:
Your TS score here is lower than the 80% in the picture because there are more "languages" than what was included in my approach. I guess we can address handling language/extension specifics in another task, minus the obvious ones that I exclude before merge.
I linked https://gitroll.io/ before and we could do something similar
pros:
cons:
Removing my assignment until the spec is made clearer.
@rndquu rfc
There are certain kinds of tasks which must be completed only by experienced developers.
Check the following issues for example:
It would be great to know that the collaborator solving the above issues has prior experience with Solidity and Ethereum.
A possible solution could be to use ChatGPT to parse a collaborator's GitHub or CV to make sure the contributor is experienced enough to /start an issue.