
The Design of Everyday (AI) Things

UI/UX Principles for Building Great Generative AI Tools

Image source: DreamStudio

Don Norman’s 1988 design classic, “The Design of Everyday Things,” laid out user experience principles that have influenced great hardware and software design ever since. While Norman drew on analog examples like door handles and light switches, his principles are broadly applicable to software, including generative AI products. With all the hype about generative AI, it’s easy to forget that products powered by even the most sophisticated models will fail if they lack good UI/UX.

Many new AI tools have generated loads of interest, followed by lackluster user retention (as detailed here by Sequoia). AI hype drives “tourist” signups, but new users struggle to understand or derive real value from the product. This is the classic “trough of disillusionment”, which occurs when a core technology (generative models) jumps ahead while supporting technology (UI/UX design) lags behind.

This post details how to apply three core UX concepts to generative AI products: 1) feedback, 2) affordances, and 3) constraints. Applying these concepts to generative AI leads to conclusions we’ll explore, including:

- Don’t aim for a Hole-In-One
- User feedback isn’t free
- Treat chatbot interfaces skeptically

The examples to follow are drawn from workplace productivity tools (and partly inspired by learnings from my work at Tome, an AI-powered medium for shaping & sharing ideas), but the strategies apply broadly, from dev tools to social media to e-commerce.

Topic 1: Feedback

Providing quick, clear feedback to the user about a requested action is critical for any technology. Feedback is especially important for generative AI systems, due to the latency and complexity of their output. And feedback goes both ways. A system must solicit relevant feedback from the user without being annoying, to generate better output for that particular user in the near-term, and to enable better versions of the product in the medium or long-term.

Build for latency

Generative AI model response times tend to range from single to double digit seconds. At first glance, waiting ten seconds for a compelling memo, a brilliant image, or a beautiful video might seem like a non-issue. Without generative AI, these actions take hours — who cares if it takes 10 seconds, 1 second, or 100 milliseconds?

But users aren’t economists optimizing for opportunity cost. They’ve been conditioned by non-AI tools to expect software so fast it’s perceived as instant. This leads to a number of user challenges with obviously non-instant AI products:

- Confusion about whether the system is working and whether they need to retry/restart.
- High perceived cost of iteration. And since most of the time the first artifact an AI generates is not exactly what the user wants, the user wants to iterate.
- High likelihood the user starts to multi-task. Once a user has switched away from your app, there is no guarantee they will ever come back.

There are good strategies for mitigating latency effects that pre-date generative AI. These include loading animations, progress bars, and background processing (in which the user is routed to another task and receives a notification when the current task finishes). A newer tactic, specific to LLM features, is streaming text word-by-word (or character-by-character) to the UI, rather than rendering the full output all at once. Since many models can generate words faster than a user can read, this can reduce perceived latency to near zero.
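The streaming tactic can be sketched in a few lines. The snippet below is a minimal illustration, not any real model API: a fake token generator stands in for an LLM stream, and the renderer records time-to-first-token, which is the metric streaming actually improves.

```python
import time

def fake_model_stream(text, delay=0.01):
    """Stand-in for a streaming model API (hypothetical); yields one token at a time."""
    for token in text.split():
        time.sleep(delay)  # simulates per-token generation latency
        yield token

def render_streaming(stream):
    """Render tokens as they arrive, tracking time to first token vs. total time."""
    start = time.monotonic()
    first_token_at = None
    rendered = []
    for token in stream:
        if first_token_at is None:
            first_token_at = time.monotonic() - start
        rendered.append(token)  # in a real UI: append to the view as it arrives
    total = time.monotonic() - start
    return " ".join(rendered), first_token_at, total

text = "Streaming keeps perceived latency near zero for readers"
out, ttft, total = render_streaming(fake_model_stream(text))
print(f"time to first token: {ttft:.3f}s of {total:.3f}s total")
```

With batch rendering the user waits `total` before seeing anything; with streaming they wait only `ttft`, which stays roughly constant no matter how long the output is.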

Don’t aim for a Hole-In-One

One particularly effective strategy for mitigating latency is to break workflows into small steps, with system feedback provided and user feedback solicited at each step. This lets the user progress toward an output with increasing confidence that the system will deliver precisely what they want. In a well-designed iterative workflow, the latency of initial steps is low, and with each successive step the user grows more confident that the final output will match their intent. If you’re quite confident that you’re going to get back the artifact you want, you’ll happily wait ten seconds for the final step to run.

Iterative workflows have an even more powerful benefit than increasing latency tolerance: they enable users to generate output that better matches their expectations. Generative models can sometimes produce exactly what the user wants from just a simple user prompt. And going straight from input to “nailed it” final output is an amazing user experience; it’s like hitting a hole in one. And like hitting a hole in one, it’s very rare.

The challenge is not how “smart” the model is, but rather what context and information the model needs to produce the user’s vision. Consider a sales manager who wants to summarize her team’s quarterly performance. She’s seen dozens of quarterly sales reports, and she’s deeply familiar with her company’s norms that govern such artifacts (norms like tone, detail level, length, and visual layout). If she needed a colleague to write her such a report, she’d simply ask for “a quarterly sales report” and expect the colleague to already know these norms.

So when this sales manager wants to get such a report from an AI tool, it’s not obvious to her what norms she needs to tell the tool, and what it already knows. This is where iterative workflows are particularly helpful. She can start with something easy and familiar, like requesting “a quarterly sales report”, and the tool can then help her home in on precisely what she has in mind. Zach Lloyd calls this pattern “ask and adjust” in this well-reasoned piece on AI design.

Tome’s outline editor is an example of an intermediate step in an iterative AI workflow, situated in between a prompt and the final output, a multi-page presentation. https://tome.page

User feedback isn’t free

In many classical ML products, each user interaction generates a new piece of training data for the model, improving the next version of the product. Every user click on a search result helps the search model improve. Every email a user marks as spam helps a spam classification model improve.

But many generative AI products lack the inherent “physics” whereby user interactions mechanically lead to model improvement. For AI products whose output is a sophisticated piece of text, imagery, etc., it can be hard to distinguish a frustrated exit (the user couldn’t get the output they wanted and quit) from a satisfied exit (the user got what they wanted and left). Some products solicit opt-in feedback (e.g. thumbs up/down), but completion rates tend to be very low and the feedback itself often suffers from selection bias.

It’s far better to design a workflow where the user’s natural next action indicates their perception of the preceding AI output. One pattern, most commonly seen with text models, is in-line suggestions: if the user accepts the suggestion and keeps writing, that’s a strong signal they viewed the suggestion positively. Another pattern is to instrument what AI output gets saved, edited, or shared. These are not perfectly correlated with user satisfaction — a user may share an image because it’s particularly bizarre — but they’re decent proxies when used in the aggregate.
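One way to operationalize this is to log the user’s natural next actions as weighted implicit signals and aggregate them into a rough satisfaction proxy. The event names and weights below are purely illustrative assumptions, not from any particular product:

```python
from collections import Counter

# Hypothetical event names and weights; a real product would choose its own
# based on how well each action correlates with measured satisfaction.
SIGNAL_WEIGHTS = {
    "suggestion_accepted": 1.0,   # strong positive: user kept writing with it
    "output_saved": 0.8,
    "output_shared": 0.6,         # noisy: users also share bizarre outputs
    "output_deleted": -0.5,
    "session_abandoned": -0.3,    # ambiguous: could be satisfied or frustrated
}

def satisfaction_proxy(events):
    """Aggregate implicit signals into a rough per-feature quality score."""
    counts = Counter(events)
    score = sum(SIGNAL_WEIGHTS.get(e, 0.0) * n for e, n in counts.items())
    return score / max(1, len(events))

events = ["suggestion_accepted", "output_saved", "output_shared", "session_abandoned"]
print(round(satisfaction_proxy(events), 3))
```

No single event is trustworthy on its own; the point is that in aggregate, across many sessions, these proxies can rank which AI features are working and which are not.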

Topic 2: Affordances

An affordance is a cue (normally visual) that suggests how and when to use a feature. Good affordances make it intuitive for users to interact with a product, without extensive instructions or experience. We’ll explore affordances for generative AI at three steps in the user journey: discovering entry points to AI, providing the right input for the AI, and using the AI output.

Discovering AI entry points

Many work tools are adding lots of AI features, and these features are applicable at different points in the creative process. High-level entry points for using AI features include:

1) Help me start from scratch
2) Extend what I’ve started
3) Edit what I’ve created

These different entry points have led to significantly different interfaces, even at this early point in AI interface evolution. For (1), free text or “blank canvas” interfaces have emerged as early leading paradigms. For (2), in-line generation (aka autocomplete) tends to dominate text generation features (like GitHub Copilot), while “show me more like this” tends to dominate image generation features (like Midjourney). For (3), interfaces tend to focus on highlighting, selecting, or uploading existing content (like Grammarly).

Whimsical’s AI Mind Map helps users get started from scratch. https://whimsical.com

For a user who has discovered one AI entry point in a tool with multiple AI features, it’s easy to conclude “this is where the AI lives” and fail to discover the other features. Great products mitigate this by introducing users to their various AI entry points at the times in the user’s workflow when each entry point is most likely to be useful.

Entering input for AI

The core input of many generative AI workflows is free text input, aka “prompting”. Unfortunately, good prompting is complicated, fast-evolving, and inconsistent across tools. Good products help users craft prompts with strategies including example prompts and tooltips.

Perplexity includes a handful of example prompts on its landing page, to illustrate use cases that extend beyond typical search engines. https://www.perplexity.ai/

Good interfaces also help the user understand the context the AI has — and what it lacks. When working with a powerful AI, a reasonable user may conclude that whatever they can see in the app, the AI must also be able to see and understand. For example, if I can see my past conversation with the AI, surely the AI must also be aware of it (this is a behavior ChatGPT popularized). But not every AI works like this! Some systems are aware of the user’s previous prompts, some are aware of even more context than past prompts — and some are only aware of the user’s current interaction and nothing else. The user should not have to figure out what the system knows and what it does not through trial-and-error.

Using AI output

It’s tempting to think that when a system has produced generative AI output, and the output is good, success is at hand. But even when the output is good, this can be a confusing moment for the user.

First, new users are often left wondering how to persist the output. Even when the output is good, many users immediately want to iterate and see if they can go from good to great. But fear of losing their existing work can lead to hesitation and frustration.

Second, users may be confused how to improve the output. Assuming they used a “start from scratch” AI feature, should they go back to the beginning? Do they need to move to a different AI entry point like “extend” or “edit”? Many users will have encountered products like ChatGPT where the output is not directly editable; if output is editable, users likely need an editing affordance.

Topic 3: Constraints

Constraints restrict input and output to help users work faster and better. Good constraints are clear to the user. If a system can help a user achieve a goal — but only part way or part of the time — it can be better to prevent that path altogether, rather than deliver an unreliable experience.

LLMs open up vast new user experiences (it’s why I love working on them!) and product creators should be eager to relax traditional constraints from deterministic software. Nonetheless, regardless of how intelligent LLMs become, there will always be a place for some thoughtful constraints.

Karina Nguyen on Twitter: “Design has been operating under the ethos “solving user problems while navigating constraints” but the open-ended, ambiguous nature of LLMs makes this a very limited mindset. Truth is, constraints will always evolve and if you are a designer, your goal is to imagine your ideal…”

Input: don’t fear controls

Inspired by the success of ChatGPT, many generative AI tools use a free text box as their only, or primary, user input. But many aspects of a user’s intent are best expressed via categorical or numeric inputs. When creating a document, most users have attributes in mind like language (a categorical) and length (a numeric value). Users may not mention these attributes in a free text prompt, but that doesn’t mean they don’t care about them. By soliciting this input via discrete, bounded controls (like a drop-down or slider), a system gathers the input it needs to deliver what the user has in their head. And there are time-honored principles for helping users navigate discrete controls: setting good defaults, grouping controls logically, and explaining controls with tooltips or labels.
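As a sketch, discrete controls with good defaults can be merged with the user’s free-text intent when the prompt is constructed. The control names, default values, and prompt wording below are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class DocControls:
    # Discrete, bounded inputs with sensible defaults (names are illustrative).
    language: str = "English"     # drop-down
    length_words: int = 500       # slider
    tone: str = "professional"    # drop-down

def build_prompt(user_text, controls=None):
    """Combine free-text intent with control values the user may never type."""
    controls = controls or DocControls()
    return (
        f"{user_text}\n"
        f"Write in {controls.language}, about {controls.length_words} words, "
        f"in a {controls.tone} tone."
    )

print(build_prompt("Summarize Q3 sales performance for my team."))
```

The user only typed one sentence, but the system still passes the model a well-specified request; a user who does care about length or tone can adjust the control instead of guessing what to write in the prompt.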

When it comes to controls, setting good default values is a critical part of the design. The vast majority of the time (well in excess of 90%) users will not change the defaults, even if they would benefit from doing so. One opportunity to combine good defaults with user preference variance is to adjust defaults dynamically, either via hard-coded rules or AI.

Output: not everything that can be generated should be

For generative AI products, there are many situations in which the underlying model can produce some content, but where the user would prefer nothing over struggling with misleading or jarring output.

For most work-related tasks, users would prefer “I don’t know” to a potentially false answer they must verify or refute. This Harvard study of consultants at BCG shows how AI can diminish work quality when it answers questions beyond its “confidence frontier” and users, unaware of where that frontier lies, don’t sufficiently scrutinize the output.

Methods for reducing hallucination are fast-evolving (for example, retrieval-augmented generation), and I suspect hallucination will be a mostly “solved” problem a few years from now — but today, output where factuality is critical remains an important place to consider constraints.

Legal and ethical concerns are a second reason to constrain user-facing output. Just because the underlying model can generate text or images on a topic does not mean it’s conscionable to do so. However, much of the time that a system classifies a user request as “out of bounds”, the user’s intent is actually benign. With a bit of help, the user could rephrase their request to stay in bounds. For example, some image generation tools deny prompts that include the word “child”. But if a user wants to generate a picture of a family with children, they could prompt “family of four” or “parents with son and daughter”. The key is that the constraints are clear to the user.
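A constraint that explains itself can be sketched simply: instead of a silent denial, the check names the blocked term and suggests an in-bounds rephrasing. The keyword blocklist below is purely illustrative; real systems use trained classifiers rather than word matching:

```python
# Illustrative blocklist mapping blocked terms to rephrasing suggestions.
# (Hypothetical; real moderation uses classifiers, not keyword lists.)
BLOCKED_TERMS = {
    "child": "Try describing the scene instead, e.g. 'family of four' "
             "or 'parents with son and daughter'.",
}

def check_prompt(prompt):
    """Return (allowed, message). On denial, the message says why and how to fix it."""
    for term, suggestion in BLOCKED_TERMS.items():
        if term in prompt.lower().split():
            return False, f"The word '{term}' isn't allowed here. {suggestion}"
    return True, "ok"

ok, msg = check_prompt("a child playing in a park")
print(ok, msg)
```

The benign user gets a path forward instead of a dead end, which is the difference between a clear constraint and a frustrating one.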

As generative AI products surge in popularity, good product designers and product managers remember: success stems not just from how smart the AI is, but from how the product guides the user through AI-enabled workflows. Core design concepts like feedback, affordances, and constraints remain as important as ever, but the tactics and patterns for implementing them are evolving quickly. Using these design patterns well is critical for any AI company that aspires to outlast the initial hype cycle and deliver an enduring, widely-used product.

The Design of Everyday (AI) Things was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
