How should we analyze feedback? #5
...Or even: how do we collect data of any kind and have some kind of agreement on what it generally might mean? For instance, a lot of conversation is about what developers will 'get' or what they will find 'confusing'. Today, we can provide implementations through Babel, which really helps the economics of being part of the 'discussion'. People don't have to read theoretical code and imagine theoretical examples; they can touch them and try them in real use cases solving real problems, and then, in theory, we should be able to get 'data' there and 'do science' to move that argument away from pure speculation. There might be some people who think 'developers are going to be very confused by this', but if we have a better way to collect data and opinions, that doesn't have to persist. I don't know what this looks like really, but it feels very worth thinking about. Do we have some kind of data?
Well, once we have things in Babel, then what do we do?
No, that's what I mean... how do we know and analyze what's happening once it's in Babel?
Agree. But we should also note that some footguns/confusions only occur in edge cases. It's not easy to find them with "data", and it's very possible to neglect them because the "data" told you it's an edge case. I think some edge cases can be ignored (if they can be avoided in some practical way), and some cannot (if no tool can protect us, or protecting against them would have a very big cost that even exceeds the benefit of the feature that causes the footgun).
We spend tons and tons of time in TC39 theorizing about edge cases, how practical they are to avoid, how bad they are, etc. It's not clear to me how to combine this with real data. Certainly, I don't want to fall into the pattern of basically rejecting all data because "it didn't take footgun theorizing into account".
@littledan I never meant to reject data. Actually, I want both: they are two sources of information that compensate for each other's bias and possible shortcomings.
One example of this question will happen for the pipeline proposals championed by @littledan. There are currently two front-running, competing proposals, each making different trade-offs. It’s difficult for anyone to weigh multiple interacting pros and cons and make decisions based on them when using only abstract reasoning. So @mAAdhaTTah and I are working on a Babel plugin that implements both proposals. Once it’s done, stakeholders could actually try out both proposals and see how they feel in concrete code. Hopefully, TC39 would then have more information to make an eventual decision regarding the proposals.

But, once the Babel plugin is implemented, what actually happens then? What kind of stakeholder data should be collected, and how should it be analyzed? Twitter threads, informal surveys, and opinion polls are easy to do but are coarse (measuring only superficial gut impressions) and come with many pitfalls (“opinion polls: potential for inaccuracy” and “total survey error” list several examples). More-rigorous types of data are more difficult to obtain. For instance, some months ago, there was talk with @littledan and @codehag of running small A/B tests or some other sort of randomized controlled trial on programmer test subjects, in which they might write or read functions with different versions of the pipeline operator and perhaps the proposed …

This question, of course, is broader than one mere operator. But that operator may end up being an interesting pilot case for new approaches. It’s all still very preliminary, and the Babel plugin isn’t done yet, anyway. Just one example, in any case.
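To make the comparison concrete, here is a rough sketch of the same (made-up) helper chain written in each of the two front-running styles. The helpers and the input are hypothetical, the exact syntax of both proposals is still in flux, and neither variant runs without a transpiler plugin implementing the respective proposal:

```js
// Hypothetical input and helpers, purely for illustration.
const articleText = '  hello   world, and welcome to the pipeline thread  ';
const normalize = s => s.replace(/\s+/g, ' ').trim();
const truncate = (s, n) => s.slice(0, n);
const shout = s => s.toUpperCase();

// F#-style ("minimal") proposal: each step to the right of |> must be a
// unary function, so extra arguments need a wrapper arrow function.
const headlineA = articleText
  |> normalize
  |> (s => truncate(s, 80))
  |> shout;

// Smart/Hack-style proposal: each step is an ordinary expression that
// refers to the piped value via a topic reference (written `#` in the
// proposal drafts of the time).
const headlineB = articleText
  |> normalize(#)
  |> truncate(#, 80)
  |> shout(#);
```

Reading the two variants side by side like this is roughly the kind of concrete comparison the Babel plugin is meant to enable.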
Yes, exactly. I think that that is the next question for standards in general. It's an entirely new 'aspect' of standardization made possible by the idea of 'speculatively polyfilling' and transpiling proposals and allowing developers to 'participate' mostly by just 'using', and therefore helping the economics and depth. A/B studies would be interesting, or even just workshops or meetups that review, but I think that the depth afforded by 'real use' is really important.

Looking back on 'failed' standards, there is often a lot of excitement for them. There are seemingly good arguments about why they were going to be so great, and even early-adopter mavens who seemed really happy about them. But then, once they start to get really used, in real life, by everyone... not so much. At the end of the day, we'll only truly know whether something 'excels' or 'struggles' or even just outright fails 'in the large' when a fairly significant number of people have actually spent time trying to use it to solve real-world problems. Things like Babel give us the ability to do that in a way we never could before, in theory, if only we had data we could analyze and people to start figuring out how to actually do it.

Like, on pipeline, two things that came up were "developers won't get/like X" and "lodash is one of the popular ways in which you achieve a similar thing". So, I'm just kind of spitballing here, but it seems like there are several metrics that could be interesting. First, just what is the uptake - do we even have a way to measure it? Can we? If so, I feel like there are plenty of other questions you could ask: Are projects that currently use that bit of lodash switching to one of these over time? Are projects that started with one of these switching to another - and so on.
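For reference, here is a small, hypothetical sketch of "that bit of lodash" next to a pipeline-operator version (F#-style shown); the helpers are made up, and the operator half needs the proposal's transpiler plugin to run. Tracking how often code bases drift from the first pattern to the second is one possible shape the uptake metric could take:

```js
import flow from 'lodash/flow';

// Hypothetical helpers, purely for illustration.
const trim = s => s.trim();
const lower = s => s.toLowerCase();
const dashes = s => s.replace(/\s+/g, '-');

// Today: lodash's flow() composes the steps into a single function.
const slugify = flow(trim, lower, dashes);
slugify('  Hello World  '); // => 'hello-world'

// With a pipeline operator, the same data flow is written inline at the
// use site instead of going through a combinator.
const slug = '  Hello World  '
  |> trim
  |> lower
  |> dashes; // => 'hello-world'
```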
Indeed, these approaches are not mutually exclusive. All of these approaches could be tried simultaneously, although they all would take months or even years to fully develop.

Of course, one big disadvantage of deploying competing proposals to real code is that only one proposal can be chosen while writing the code. This makes it difficult to actually compare proposals. It also can bias people’s initial impressions of the proposals based on what their organization chooses. And from the early user’s perspective, there is also the big risk that the particular proposal that they choose is not the one that will be eventually chosen by TC39.

Some of these disadvantages could be mitigated by code transformers that could convert the same code between the different proposals. Not only would this allow real code bases to quickly change proposals when TC39 makes its final decision, it would allow people to simultaneously read and compare versions of the same code using the different proposals. But it still may be difficult to analyze even real-world experience from many heterogeneous sources and many heterogeneous code bases...
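One concrete shape such switching could take, sketched here under the assumption that the eventual Babel plugin exposes an option for selecting the variant: a code base's choice of proposal would then live in one line of configuration, and a codemod would rewrite the pipeline bodies to match whenever that line changes.

```js
// babel.config.js — a minimal sketch, assuming the in-progress
// @babel/plugin-proposal-pipeline-operator plugin exposes a `proposal`
// option for choosing which variant to parse and transpile.
module.exports = {
  plugins: [
    // Today the team writes F#-style ("minimal") pipelines...
    ['@babel/plugin-proposal-pipeline-operator', { proposal: 'minimal' }],
    // ...and if TC39 ultimately picks the other variant, this line changes
    // and a codemod rewrites the pipeline bodies across the code base:
    // ['@babel/plugin-proposal-pipeline-operator', { proposal: 'smart' }],
  ],
};
```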
There's been some experimentation, especially in the class fields proposal driven by @hax and @Igmat, with writing articles or giving presentations, followed by a multiple-choice poll about the preferred outcome. I've done a bunch of Twitter polls myself. What kinds of things do you all think you can learn from this sort of data? Are there any best practices that we should recommend to people who are trying to collect this information? Or is its value inherently limited? How does this data compare with findings from in-depth, interactive discussions with people?
FWIW, the CSSWG occasionally uses this sort of thing for things that are mostly bikesheddy, or where debate gets into what developers will or won't "get" or "prefer". Members of the WG typically RT/share the poll, giving it a bunch of fairly diverse eyeballs from people who follow CSS for different reasons. We all realize it is not binding or scientific, and sometimes there is nearly as much debate in crafting the question in the first place and interpreting the results. It's not perfect, but occasionally it is helpful in stemming off unproductive conversation paths and getting focus. I don't know about real 'best practices', but I think the things that help there are agreeing how to ask the question (and that it is a valuable question in the first place) and getting different kinds of people to share it. I think it's just different data points really, valuable in different ways at different times. In-depth conversation is better for collecting actually 'constructive' feedback, and you can do it earlier too - but the sample is often small and kind of biased toward the people well prepared to discuss it, which is probably not reflective of the larger community either. Finding ways to plug into early use measurement or something would also be different - measuring a broader scope of people and something more concrete, but available much later.
@littledan Actually, I believe @Igmat's and my presentations about class fields are much more "in-depth" than any earlier forms of outreach. In particular, a big problem of the class fields proposal is the "whole cost" of too many issues (even if each issue is small), which simple articles/polls can never demonstrate.
I think it could be great to work together with people outside of the committee to refine the materials before distributing them more broadly. I have my issues with these particular presentations, which I think could be resolved if we work together on materials for something in the future.
As I have stated before, I never wanted to speak publicly about the "dark side" of any proposal, and I never did such a thing before. The only reason I am doing it now is that I believe the TC39 process has big flaws around controversial issues and that it will never be possible to fix them. So I was forced to warn the community that a broken proposal will land and that they will face many footguns.
In this repository, I'm trying to improve exactly this issue about collecting community feedback and making sure the committee doesn't ignore it. In my opinion, we'll be more effective working collaboratively with champions, rather than presenting it as a challenge/"dark side".
I want to expand on an idea that I originally shared in tc39/proposal-class-fields#203 (comment). The main part of it is these 3 questions:
I would propose the following:
* - to make such polls valuable, the article on top of which the poll is run should be accepted by the proposal champions. That doesn't mean it should contain only positive aspects of the proposal; rather, it should fairly describe all its advantages/disadvantages. @littledan, what do you think about this?
Right, so, I think champions will not be interested in polls whose goal is to move a feature back in the stage process. Instead, I think we should work on outreach to help us determine whether proposals should move forward, and how proposals could solve developers' problems.
I haven't proposed demoting at all, just delaying in some circumstances. If feedback is mostly negative (which is a VERY rare situation), then something went wrong and we have to carefully weigh every decision we've taken so far.
No, I'm talking about an article + poll pair whose goal is neither moving back nor moving forwards. The goal of such an activity MUST NOT be "to prove somebody's opinion" (whether that of a committee member or of the community), but rather to get an OBJECTIVE numeric assessment of how the proposal would be accepted in the wild. Does that make sense?
Oh, I didn't mention it in my proposal, but I was talking about article+polls in …
I don't think we'll ever reach objectivity, either in the design itself or in our understanding of JavaScript developers' feelings and preferences. Articles and polling populations will always be influenced by the perspectives of those involved, as will any other mechanism of outreach. The best we can do is acknowledge that and do the best we can from there.
Hm, I think that if both sides are involved in preparing the research, it could be objective enough to reference in the further decision-making process. But let's set that aside for a while. Do you have any other ideas for how to involve the larger community and guarantee to them that they'll be heard (which doesn't mean that something will necessarily be changed, but rather that their opinion is really taken into account)? To be more clear - my intention is to AVOID situations where the committee and the community waste their time on useless debates; we all have a similar goal - improving the language design.
Yeah, it sounds like you're getting at a bunch of interesting questions that I don't know the answer to:
One possible way to convince the community would be the presence of somebody who represents the community during the committee's meetings. This person could probably present the community's feedback to the committee too.
Many delegates already serve that purpose, including the JS Foundation, which is an entire member dedicated to that goal. Additionally, proposal champions typically all present this feedback when discussing their proposals. Can you elaborate on how your suggestion would be different, so we can get a better idea of what needs fixing with the current setup?
Right, I don't think there should be any one person who's the channel of community feedback; I'd say all TC39 delegates are trying to do right by JavaScript programmers. But I don't think @Igmat was implying otherwise. If we want TC39 delegates present at meetups, maybe we should make a list of TC39 delegates who are interested in being reached out to for this purpose. I'm not sure whether the committee might be interested in maintaining such a list; I'll ask people at this TC39 meeting. If they're not, maybe we could maintain the list here.
@littledan Is the list available now? It would also be a good list for invitations to our tech conferences :-)
cc @gesa
Discussed in the educators meeting in December, cc @bkardell