AI Product Design

Skeleton & Loading Patterns for AI

How to design loading states for AI-powered features, where generation can take 2–30 seconds. Covers streaming, skeletons, progress patterns, and perceived performance.

#skeleton, #loading, #streaming, #progress, #ai loading, #perceived performance, #generation

What is it?

AI generation loading patterns are the UI states presented during the time between a user submitting a prompt and the AI completing its response. Unlike traditional API calls that take 100–500ms, AI generation can take 2–30 seconds or more. This extended wait time requires purpose-built loading design: streaming output, skeleton screens, progress indicators, and in-progress cancellation controls.

Why it matters

Users of traditional web apps often abandon waits longer than about 3 seconds, and AI generation regularly exceeds this. Without well-designed loading states, users assume the product is broken, click away, resubmit, or grow anxious about the interaction. Streaming output, which shows text token by token as it is generated, is typically the most effective technique for making AI generation feel fast, because it converts an empty wait into an active experience.

Best Practices

Stream AI text output whenever technically possible. Streaming dramatically reduces perceived wait time by giving users something to read immediately.
Show a "typing" or "thinking" indicator within 500ms of submission, before streaming begins, so users do not assume nothing happened.
Use skeleton screens for layout-structured AI outputs (reports, summaries with headers). Users see the structure before the content fills in.
Provide a stop/cancel generation button. Users who see the AI going in the wrong direction should be able to stop it without waiting.
Show progress indicators for multi-step AI processes: "Analyzing document (1/3)... Extracting insights (2/3)... Generating summary (3/3)..."
Manage user expectations around generation time for complex requests. "This may take 30–60 seconds" is better than silent waiting.
Preserve the input and context during generation. Users should not be able to accidentally lose their prompt while waiting.
For very long generations, offer an email/notification completion pattern rather than requiring users to wait in-browser.
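
Several of the practices above (an immediate thinking indicator, streamed output, a stop control, and a preserved prompt) can be modeled as one small state machine. The following is a minimal sketch, not a definitive implementation: the phase names, the event shapes, and the `reduce` and `canCancel` functions are all hypothetical and would need adapting to a real UI framework and streaming API.

```typescript
// Hypothetical state for a single generation request; all names are illustrative.
type Phase = "idle" | "thinking" | "streaming" | "done" | "cancelled";

interface GenerationState {
  phase: Phase;
  prompt: string; // kept through every phase so the user never loses it
  output: string; // text received so far
}

type GenerationEvent =
  | { type: "submit"; prompt: string }
  | { type: "chunk"; text: string }
  | { type: "complete" }
  | { type: "cancel" };

const initialState: GenerationState = { phase: "idle", prompt: "", output: "" };

function inFlight(state: GenerationState): boolean {
  return state.phase === "thinking" || state.phase === "streaming";
}

function reduce(state: GenerationState, event: GenerationEvent): GenerationState {
  switch (event.type) {
    case "submit":
      // Enter "thinking" immediately so an indicator can render right away.
      return { phase: "thinking", prompt: event.prompt, output: "" };
    case "chunk":
      // Ignore late chunks that arrive after a cancel or completion.
      if (!inFlight(state)) return state;
      return { ...state, phase: "streaming", output: state.output + event.text };
    case "complete":
      return inFlight(state) ? { ...state, phase: "done" } : state;
    case "cancel":
      // Keep the prompt and the partial output so the user can edit and retry.
      return inFlight(state) ? { ...state, phase: "cancelled" } : state;
  }
}

// The stop button is only meaningful while work is in flight.
function canCancel(state: GenerationState): boolean {
  return inFlight(state);
}
```

In this sketch the UI would render the thinking indicator for `thinking`, the growing `output` for `streaming`, and show the stop button whenever `canCancel` returns true. Because `prompt` is carried through every transition, a cancelled or finished request still has the original input available.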

Common Mistakes

A spinner and blank content for 15 seconds — the worst AI loading experience. Users assume it's broken.
No cancel generation control — users must wait out wrong-direction generations.
Streaming that jumps and re-renders — visually disorienting. Streaming should feel smooth and progressive.
No feedback during multi-step processing — users can't tell if it's still working after 30 seconds.
Streaming too fast to read — when generation is fast, controlling output speed may improve readability.
Losing user context (their prompt, their scroll position) when generation completes.
The loading state disappearing before content is fully ready — blank flash before content appears.
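
Two of the mistakes above, streaming that re-renders and streaming that is too fast to read, can both be addressed by buffering incoming text and revealing it append-only at a capped rate. The sketch below shows one way to do that. The function names and the `maxCharsPerSecond` parameter are assumptions for illustration, and a comfortable reveal rate would have to be tuned with real users.

```typescript
// Number of characters that may be visible after `elapsedMs`, capped by how
// much text has actually arrived. The rate parameter is an illustrative knob.
function visibleLength(
  bufferedLength: number,
  elapsedMs: number,
  maxCharsPerSecond: number
): number {
  const allowed = Math.floor((elapsedMs * maxCharsPerSecond) / 1000);
  return Math.max(0, Math.min(bufferedLength, allowed));
}

// Append-only reveal: returns just the new characters to add to the display,
// so text that is already rendered is never replaced.
function nextSlice(
  buffer: string,
  alreadyShown: number,
  elapsedMs: number,
  maxCharsPerSecond: number
): string {
  const target = visibleLength(buffer.length, elapsedMs, maxCharsPerSecond);
  return target > alreadyShown ? buffer.slice(alreadyShown, target) : "";
}
```

A render loop could call `nextSlice` on each animation frame and append the result to the existing text node. When generation is slower than the cap, the function simply returns whatever has arrived, so pacing never adds delay beyond the configured rate.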

Checklist

Streaming output is implemented for text generation where technically feasible

A thinking/typing indicator appears within 500ms of submission

A stop/cancel generation button is present during generation

Progress indicators describe multi-step AI processes

Long generation times are communicated in advance

User input and context are preserved during and after generation

Skeleton screens match the expected output structure for structured content

Notification/email completion is offered for very long operations
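
The progress-indicator item in the checklist can be sketched as a small formatter that turns a list of step names and the current step into a label such as "Extracting insights (2/3)...". This is an illustrative helper, not a prescribed API. The `progressLabel` name and its clamping behavior are assumptions.

```typescript
// Builds a "Label (n/total)..." string for a multi-step process.
// Out-of-range indexes are clamped so the label never shows "0/3" or "4/3".
function progressLabel(steps: readonly string[], currentIndex: number): string {
  if (steps.length === 0) {
    throw new RangeError("steps must not be empty");
  }
  const i = Math.min(Math.max(currentIndex, 0), steps.length - 1);
  return `${steps[i]} (${i + 1}/${steps.length})...`;
}

// Example step list taken from the guidance in this document.
const summarySteps = ["Analyzing document", "Extracting insights", "Generating summary"];
```

Calling `progressLabel(summarySteps, 1)` yields "Extracting insights (2/3)...". Keeping the step names in one list means the total in the label always matches the number of steps actually run.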

Research & Theory

Perceived Performance and Progressive Rendering

Research from Nielsen Norman Group and Google suggests that users perceive progressively loading content as significantly faster than content that appears all at once after a delay, even when total load time is identical.

Why it's relevant

Streaming AI output is the most direct application of this principle. Incremental rendering can make a 10-second generation feel considerably shorter than the same wait behind a spinner.

Response Time Guidelines — 10-Second Limit (Nielsen)

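Nielsen's response-time guidelines put the limit for holding a user's attention on a task at about 10 seconds. When a page takes longer than that, users lose focus and become uncertain whether the task will complete, and engagement drops sharply.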

Why it's relevant

AI generation frequently exceeds 10 seconds. Streaming and progress indicators maintain the sense that work is happening, preventing the engagement dropoff.
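
One way to apply these limits is to pick the feedback pattern from the expected generation time. The sketch below is a rough heuristic under stated assumptions: the 500ms and 10-second cut-offs echo figures mentioned in this document, the 60-second cut-off is an arbitrary illustrative choice, and the `chooseWaitFeedback` function and its return values are hypothetical names.

```typescript
type WaitFeedback =
  | "none"
  | "thinking-indicator"
  | "progress-with-estimate"
  | "notify-when-done";

// Maps an expected duration to a loading pattern. Thresholds are assumptions
// and should be tuned per product.
function chooseWaitFeedback(expectedMs: number): WaitFeedback {
  if (expectedMs < 500) return "none"; // finishes before an indicator is needed
  if (expectedMs <= 10000) return "thinking-indicator"; // then stream output
  if (expectedMs <= 60000) return "progress-with-estimate"; // set expectations up front
  return "notify-when-done"; // do not make users wait in the browser
}
```

Streaming can be combined with any of these patterns. The function only decides which additional signal, if any, the interface shows alongside it.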

Real-World Examples

ChatGPT

Token-by-token streaming with a blinking cursor during generation. Stop generating button. Smooth scrolling that follows the output. Typing indicator before streaming begins.

Vercel v0

Multi-phase loading: "Analyzing prompt → Generating component → Rendering preview." Progress is shown per phase. Users can see the component structure before it fills in.

Perplexity

Search results stream progressively. Sources panel populates as sources are found. Related questions appear at the end. The page feels complete and alive throughout generation, not just before and after.