Capturing sentiment alongside awareness/usage #183
Replies: 14 comments 30 replies
-
Thanks for creating a separate thread, it was getting hard to navigate the other one! I won't address everything point by point because I mostly agree with you about the tradeoffs of each solution. But my overall position would be that, based purely on my own subjective intuition and experience, I do not feel Option 2 is clearly superior to Option 1 given our requirements. I'm happy to be proven wrong, and it's true we have a chance to do just that with user testing. BUT – I do not think changing the UI should be our priority right now when the survey outline isn't even done yet.

To be honest, when I heard about the original end-of-August deadline, it already seemed very tight WITHOUT any user testing or UI changes. So if the decision is left up to me, I would recommend against investing our resources into implementing this new UI, testing it, discussing the test results, and implementing new improvements – at least at this time (we can of course revisit this topic for future surveys, when we have a little more time).

That being said, I'm well aware I'm not the only person involved in the project (and I'm very grateful that this is the case!), so I don't intend to force a decision that others are not happy with. What does everybody think?
-
Hi, I'm coming at this from a position of user-first principles, and I'd like to share an industry-proven design approach for choosing between the two UIs for capturing sentiment alongside awareness of web platform features and tools. The right way to run variant testing is to start as early as pre-launch, and to keep testing continuously post-launch once there's enough of an audience to build on sentiment and awareness data and make informed decisions.

But if we must decide which UI to use right now, let's base our assumptions on a top-down model of user archetype(s). The context is answering multiple choice questions. We have Option 1 (5-answer multiple choice questions) and Option 2 (quick sentiment). Of these two UIs, Option 2 merges in the standard pros of an already familiar set of UI patterns used in other surveys.

Guiding user behaviour to improve their experience is a controversial topic, so here are some key considerations on why and how familiarity and comfort matter in UX. Familiarity plays a crucial role in the usability and overall UX of a product or service: familiar designs are more likely to be easily understood and navigated by users. Familiarity is closely tied to cognitive processes such as attention, perception, and memory. When users encounter a design that's familiar to them, their attention is more easily captured, and they are more likely to perceive and remember the design's elements and interactions. Two other factors influence familiarity: design elements and patterns, and navigation and information architecture. Familiar design elements and patterns include common layout structures, navigation systems, and common design elements. A clear and well-organised navigation system makes it easier to find information and understand the overall structure of an interface. Familiarity affects ease of learning, speed of completion, and error recovery/prevention. Where am I going with this?
Keep the design simple and straightforward, avoid unnecessary complexity, and give clear and concise information. As Lea mentioned, if we implement Option 2 in time, we can even present user study participants with both (within-subjects design) and see how they do across both. And with limited capacity, you could fall back on guiding user behaviour with familiarity and comfort, because that's what matters in UX.
-
I agree with you on all points, Lea. I'd just add that the first one increases scrolling by 66%, but the second one increases clicks by at least that much. I also used to feel strongly about providing a neutral option, but I'm now convinced by the evidence that it's often not a good idea; as you pointed out, this is not a Likert scale, so that may not be applicable here. In the end, I'd probably still prefer the second option, but only just, and I'm not sure the impact will be materially different. I think focusing on the questions for the user testing is probably going to be more impactful.
-
I’m pasting here the heuristic eval by @michaelquiapos, as I think it's a shame if it's lost in an email thread. I’ve styled the items common across both options in
I’ll post some responses in a reply below.
-
I fleshed out the sentiment chips idea a bit more today, and made a higher fidelity prototype: https://lea.verou.me/files/stateof/mocks/sentiment/

Interaction Videos

3-point Feature questions
Desktop: feature-desktop.mp4
Mobile: feature-mobile.mp4

Checkbox questions (mini-feature questions)
Desktop: minifeature-desktop.mp4
Mobile: minifeature-mobile.mp4

Followup comments
option-comments.mp4

Changes from earlier prototype

There are several changes wrt the UI & UX:
Non-user facing changes:
Caveats: Prototype does not currently work in Firefox or older browser versions. This is fine for the user study, though we'd need to fix it pre-launch.

Architecture

Changes to Data model

Each question that includes sentiment chips needs to store a separate

Changes to question spec (Specifying sentiment labels)

Sentiment label pairs (e.g. Interested/Not interested or Want to use again/Don’t want to use again) can be defined at the question level, or the option level. Both are needed: in features each option has a different pair; in checkbox questions the whole question has the same pair. If an option doesn't have a defined pair, it is inherited from the question. This also allows individual options to opt out of sentiment chips (e.g. you don't want "None of the above" to have sentiment chips). You can see how this works in the sample questions. There are three predefined sentiment pairs:
These can be used by simply referencing their name. My prototype uses an array ( Potential compromises to reduce implementation effort:
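To illustrate the inheritance rule described above (option-level pair wins, question-level pair is the fallback, and an option can opt out), here is a minimal sketch. All names here (`PREDEFINED_PAIRS`, `resolve_sentiment_pair`, the `"sentiment"` key) are hypothetical illustrations, not the actual question spec:

```python
# Hypothetical sketch of the pair-resolution rule; the real question spec
# uses its own names and structures.
PREDEFINED_PAIRS = {
    "interest": ("Interested", "Not interested"),
    "usage": ("Want to use again", "Don't want to use again"),
    # ...the third predefined pair would be registered here
}

def resolve_sentiment_pair(question: dict, option: dict):
    """Return the (positive, negative) label pair for an option.

    The option-level definition takes precedence; otherwise the pair is
    inherited from the question. An explicit None on the option opts it
    out of sentiment chips (e.g. for "None of the above").
    """
    for scope in (option, question):
        if "sentiment" in scope:
            pair = scope["sentiment"]
            if pair is None:           # explicit opt-out
                return None
            if isinstance(pair, str):  # reference a predefined pair by name
                return PREDEFINED_PAIRS[pair]
            return tuple(pair)         # inline (positive, negative) pair
    return None
```

A checkbox question would define one pair at the question level, while a feature question would give each option its own.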
Changes to Markup

While I tried to work with the existing markup & CSS as much as possible, it will require one small markup change: moving the option

Logic

The logic is spread across:
The prototype is implemented with Vue, though Vue is only used for reactivity in the HTML: there are no Vue components, no plugins, even the app spec is empty, so it should be easy to follow even for someone who doesn't speak Vue. Basically all you'd need to understand is:

Mini Vue Template Syntax Cheatsheet
A few notes:
-
Hey everybody, the work done here is impressive! I can't add anything relevant on the UX side, but I just wanted to give a bit more context on the data analysis side.

If I sum up the point of sentiment analysis as "do people like/want this feature or not", we don't need that much data to answer this question reliably, just enough to achieve significance in a binomial hypothesis test. For instance, if we get 500 respondents, with 270 having a positive opinion on a feature, the test is significant (p-value is roughly 4%). With 100 respondents, 60 positive answers is enough. My maths might be wrong so I'll double-check properly later on, to compute how many answers we need to reach a 5% significance level depending on the ratio of positive answers. But you get the idea: we need more data for controversial features, less for non-controversial ones; sometimes even a very small number of respondents can be enough.

This is something we could take into account to find a compromise between mental workload and precision of the responses. Perhaps some features are deemed "risky" by vendors, and that's where we could accept a higher workload for respondents. We could even indicate explicitly that a question is controversial and their answer therefore all the more appreciated. For other, less controversial features, a more discreet UI could be sufficient to get significant data.

Then there is selection bias: more involved respondents are more likely to answer, which affects the result in an unknown fashion. But I think we might want to keep that a separate issue and not worry too much about it for now. We are currently working on detecting sub-populations among respondents via clustering, which could be a direction to better understand those biases.
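To make the back-of-the-envelope check above concrete, here is a small sketch (not part of the survey codebase; the function name is mine) of a one-sided exact binomial test against a 50/50 null hypothesis, using only the Python standard library:

```python
from math import comb

def binom_pvalue(positives: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= positives) for X ~ Binomial(n, p0).

    Under H0 the population is split p0 / (1 - p0); a small p-value means the
    observed count of positive answers is unlikely under that split.
    """
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(positives, n + 1))
```

With these numbers, `binom_pvalue(270, 500)` lands just under the 5% threshold (around the ~4% mentioned above), and `binom_pvalue(60, 100)` is also below 5%, while 55 positives out of 100 would not be significant.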
-
I've deployed the different UIs under consideration here: https://survey-staging.devographics.com/en-US/survey/state-of-html/2023/
(We're not really considering the "Tell us more…" UI for this survey but I will probably use it for other surveys)
-
Hi again,

Here are more details about the data analysis concerns regarding capturing sentiment. This is strictly from the data analyst point of view; many other dimensions should be taken into account, but I hope this can fuel the decisions being made.

0) Awareness vs sentiment

To capture awareness, it makes total sense to collect as much data as possible from the ecosystem, as the sheer number of respondents is valuable information. To capture people's sentiment, proportions are more interesting than headcounts. And statistics can tell us exactly how many responses to collect to precisely compute a proportion. Sometimes it's much less than we expect.

1) Yes/no answers

Say that we want to answer the following question: "Do people have a positive opinion of the

We want to collect enough responses to answer this question confidently. The number of responses we need actually depends on how controversial the question is. For instance, if you ask 10 people randomly and 9 have a positive opinion, you are done. If you ask 10 people randomly and get 6 positive opinions, it's not enough; you can't conclude anything. Here are the exact numbers, for a 95% confidence level (the industry standard for A/B tests):
If you have less data than that for each level of positiveness in the table, you can still conclude, but with less "confidence". The math behind it: this is a binomial hypothesis test; I computed a rough value using a normal approximation, and an exact value using R from there.

2) Estimates

Now, say that you have 63 positive responses out of a total of 107 responses (so 69.8% of positive responses). You'll want to answer the following question: "Is 69.8% of positive sentiment a good estimate, given that I got a total of 107 responses?" The table above confirms that the sentiment is positive (= over 50% of the population is happy with the

I didn't crunch the exact numbers, but basically the values given above are a minimum. More data is still welcome, so that the actual value (69.8% here) can be considered reliable.

What this implies for the UX, and takeaways
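As a side note on the "rough value using a normal approximation" mentioned above, here is one possible sketch of that shortcut (the function name and exact formula are my own illustration, not the author's R code). It estimates the smallest number of positive answers out of n that is significant at the one-sided 5% level:

```python
from math import ceil, sqrt

Z_95 = 1.6449  # one-sided 95% quantile of the standard normal distribution

def min_positives_approx(n: int) -> int:
    """Normal approximation (with continuity correction) to the smallest
    number of positive answers out of n that rejects H0: 50% positive,
    at the one-sided 5% significance level."""
    return ceil(n / 2 + 0.5 + Z_95 * sqrt(n) / 2)
```

For example, this gives 269 for n = 500 (consistent with 270/500 being significant) and 59 for n = 100 (consistent with 60/100). For small n the exact binomial threshold can differ by one, so treat these as rough guides rather than exact values.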
-
Hey Eric, You're right about all those things, but unfortunately that only works when your sampling is random and representative of the audience. We have convenience sampling, so anyone can participate, but the audience itself might not represent the population, so we can't calculate confidence intervals, just descriptive stats.
-
Applying Lea's sentiment chip idea to the 5-option format to improve readability!
-
Oh, I can't remember if we discussed this, and it's not a huge detail, but does it make sense to have sentiment next to "don't know what it is"? What would people base their sentiment on?
-
Hi there! I just checked the survey. I am seeing this.
-
If I decide to skip a question after initially selecting an option, I can no longer deselect my initially selected option. Could deselecting be made possible? This happens on questions whose options are radio buttons.
-
Starting a new thread here since Idea: Quick Context (#163) was getting too long and had evolved quite a bit from its original proposal (which I don’t support anymore now that we have a clearer picture of the requirements).
Problem description
Option 1: 5-answer multiple choice questions
This is a template already used in some questions in other surveys, mainly around tooling.
Pros:
Cons:
Option 2: Sentiment Chips
Update: 👉🏼 Latest prototype 👈🏼
feature-desktop.mp4
feature-mobile.mp4
Older prototype
in-answer-lite.mp4
This interaction preserves the 3-answer format for awareness, with followups for sentiment expressed as green/red¹ buttons next to each awareness answer. The followups are visible on hover/focus without any UI shifts. Followups of selected answers are always visible. Clicking on a followup also selects the answer associated with it, facilitating responses to both with a single click. To save space on mobile, followups could be coded as 👍🏼 👎🏼 buttons, with only the description of the selected sentiment visible.
We could also reword the 3 answers to match the shorter ones in the 5-answer template ("Know what it is, but haven't used it" → "Heard of it", "I've used it" → "Used it").
Pros:
Cons:
Footnotes
¹ Color scheme would need to be swapped in many Asian locales.