Skeleton & Loading Patterns for AI
How to design loading states for AI-powered features — where generation can take 2–30 seconds. Streaming, skeletons, progress patterns, and perceived performance.
What is it?
AI generation loading patterns are the UI states presented during the time between a user submitting a prompt and the AI completing its response. Unlike traditional API calls that take 100–500ms, AI generation can take 2–30 seconds or more. This extended wait time requires purpose-built loading design: streaming output, skeleton screens, progress indicators, and in-progress cancellation controls.
Why it matters
In traditional web apps, many users abandon a task once a wait stretches past roughly 3 seconds, and AI generation regularly exceeds this. Without well-designed loading states, users assume the product is broken, navigate away, resubmit the prompt, or lose confidence in the feature. Streaming output, which shows text token by token as it is generated, is the single most effective technique for making AI generation feel fast, because it converts an empty wait into an active experience.
Best Practices
- Stream AI text output whenever technically possible. Streaming dramatically reduces perceived wait time by giving users something to read immediately.
- Show a "typing" or "thinking" indicator within the first 500ms after submission, before streaming begins, so users do not conclude that nothing happened.
- Use skeleton screens for layout-structured AI outputs (reports, summaries with headers). Users see the structure before the content fills in.
- Provide a stop/cancel generation button. Users who see the AI going in the wrong direction should be able to stop it without waiting.
- Show progress indicators for multi-step AI processes: "Analyzing document (1/3)... Extracting insights (2/3)... Generating summary (3/3)..."
- Manage user expectations around generation time for complex requests. "This may take 30–60 seconds" is better than silent waiting.
- Preserve the input and context during generation. Users should not be able to accidentally lose their prompt while waiting.
- For very long generations, offer an email/notification completion pattern rather than requiring users to wait in-browser.
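Several of the practices above (an indicator before the first token, a stop control, and a preserved prompt) can be modeled as one small state machine. The TypeScript below is a minimal sketch under assumptions, not a prescribed implementation: the names `GenerationState`, `GenerationEvent`, `reduce`, `showsIndicator`, and `showsStopButton` are hypothetical, and wiring these events to a real streaming API is left out.

```typescript
// Hypothetical state model for one AI generation request (illustrative names).
type Phase = "idle" | "thinking" | "streaming" | "done" | "cancelled";

interface GenerationState {
  phase: Phase;
  prompt: string; // kept for the whole request so it is never lost
  output: string; // text received so far
}

type GenerationEvent =
  | { type: "submit"; prompt: string }
  | { type: "token"; text: string }
  | { type: "complete" }
  | { type: "cancel" };

const initialState: GenerationState = { phase: "idle", prompt: "", output: "" };

function inFlight(state: GenerationState): boolean {
  return state.phase === "thinking" || state.phase === "streaming";
}

function reduce(state: GenerationState, event: GenerationEvent): GenerationState {
  switch (event.type) {
    case "submit":
      // Ignore duplicate submissions while a request is already running.
      if (inFlight(state)) return state;
      return { phase: "thinking", prompt: event.prompt, output: "" };
    case "token":
      // The first token switches the UI from the indicator to streamed text.
      if (!inFlight(state)) return state;
      return { ...state, phase: "streaming", output: state.output + event.text };
    case "complete":
      if (!inFlight(state)) return state;
      return { ...state, phase: "done" };
    case "cancel":
      // Only the phase changes: the prompt and partial output are preserved.
      if (!inFlight(state)) return state;
      return { ...state, phase: "cancelled" };
  }
}

// The "thinking" indicator is shown only until the first token arrives.
function showsIndicator(state: GenerationState): boolean {
  return state.phase === "thinking";
}

// The stop button is available for as long as the request is running.
function showsStopButton(state: GenerationState): boolean {
  return inFlight(state);
}
```

Because `cancel` changes only the phase, the partial output and the original prompt remain available for the user to edit and resubmit.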
Common Mistakes
- A spinner and blank content for 15 seconds — the worst AI loading experience. Users assume it's broken.
- No cancel generation control — users must wait out wrong-direction generations.
- Streaming that jumps and re-renders — visually disorienting. Streaming should feel smooth and progressive.
- No feedback during multi-step processing — users can't tell if it's still working after 30 seconds.
- Streaming too fast to read — when generation is fast, controlling output speed may improve readability.
- Losing user context (their prompt, their scroll position) when generation completes.
- The loading state disappearing before content is fully ready — blank flash before content appears.
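Two of the mistakes above, text arriving faster than it can be read and a loading state that vanishes before content is visible, can be addressed with a small pacing helper. This is a sketch only: `visibleLength`, `shouldShowLoading`, and the 80 characters-per-second rate in the usage note are illustrative assumptions, not recommended values.

```typescript
// Hypothetical pacing helper: how many characters may be visible right now,
// so that a fast generation is revealed at a readable rate.
function visibleLength(
  receivedLength: number, // characters received from the model so far
  elapsedMs: number,      // time since the first character was shown
  charsPerSecond: number  // assumed maximum reveal rate (tune per product)
): number {
  const paced = Math.floor((elapsedMs / 1000) * charsPerSecond);
  // Never reveal more than has arrived, and never a negative amount.
  return Math.min(receivedLength, Math.max(0, paced));
}

// Keep the placeholder on screen until there is something to show,
// which avoids a blank flash between the loading state and the content.
function shouldShowLoading(visibleChars: number, generationDone: boolean): boolean {
  return visibleChars === 0 && !generationDone;
}
```

With an assumed rate of 80 characters per second, `visibleLength(500, 1000, 80)` reveals 80 characters after one second even though 500 have arrived, while `visibleLength(50, 1000, 80)` reveals all 50 because the model is the bottleneck.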
Research & Theory
Perceived Performance and Progressive Rendering
Research from Nielsen Norman Group and Google indicates that users perceive progressively loading content as significantly faster than content that appears all at once after a delay, even when total load time is identical.
Why it's relevant
Streaming AI output is the most direct application of this principle. Rendering text as it arrives can make a 10-second generation feel considerably shorter than the same wait behind a spinner.
Response Time Guidelines — 10-Second Limit (Nielsen)
When a response takes more than about 10 seconds, users lose focus and become uncertain whether the task will complete. Engagement drops sharply.
Why it's relevant
AI generation frequently exceeds 10 seconds. Streaming and progress indicators maintain the sense that work is happening, preventing the engagement dropoff.
Real-World Examples
ChatGPT
Token-by-token streaming with a blinking cursor during generation. Stop generating button. Smooth scrolling that follows the output. Typing indicator before streaming begins.
Vercel V0
Multi-phase loading: "Analyzing prompt → Generating component → Rendering preview." Progress is shown per phase. Users can see the component structure before it fills in.
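A phase-based indicator like the one described above could be driven by a helper along these lines. This is an illustrative sketch, not V0's implementation: `progressLabel` and `stepStatuses` are hypothetical names, and the label format follows the "Analyzing document (1/3)..." example from the best practices.

```typescript
// Hypothetical helpers for a multi-step progress indicator.
type StepStatus = "done" | "active" | "pending";

// Builds a label such as "Extracting insights (2/3)...".
function progressLabel(steps: readonly string[], currentIndex: number): string {
  if (steps.length === 0) throw new RangeError("steps must not be empty");
  // Clamp so an out-of-range index never produces "(4/3)" or "(0/3)".
  const i = Math.min(Math.max(currentIndex, 0), steps.length - 1);
  return `${steps[i]} (${i + 1}/${steps.length})...`;
}

// Marks each step as done, active, or pending for a per-phase display.
function stepStatuses(stepCount: number, currentIndex: number): StepStatus[] {
  const statuses: StepStatus[] = [];
  for (let i = 0; i < stepCount; i++) {
    statuses.push(i < currentIndex ? "done" : i === currentIndex ? "active" : "pending");
  }
  return statuses;
}
```

Rendering every step with its status, rather than only the current label, lets users see how much work remains.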
Perplexity
Search results stream progressively. Sources panel populates as sources are found. Related questions appear at the end. The page feels complete and alive throughout generation, not just before and after.