Recreating Penpot-MCP Demos (updated: 13/02/2026)

ucan · January 17, 2026, 4:17pm

Update: 12/02/2026
Here is the GitHub repo containing the images, prompts and Svelte code.

Penpot-MCP Experiments with Opus-4.5 and Qwen3 (locally)

I used Anthropic’s Opus 4.5 (as of 06/01/2026) in VSCode’s integrated Copilot Chat, and Qwen3:32b with Open WebUI running locally.

Hi everyone!

I’ve been experimenting with Claude Opus-4.5 and the results are quite impressive. In this test, the design and code were based on an existing Svelte app. Most of the time, Opus generated high-quality results without needing any follow-up prompts.

Below are a few examples of the results. You’ll notice that some designs, like the listening activity stats, weren’t perfect on the first try: the line graph was disconnected, and there were minor layering issues where the element order was incorrect. However, these were easy to fix.

Comparison: Reference vs. Result

Home Page

Reference	Result
[Image: Home Page Reference]	[Image: Home Page Result]

Listening Activity

Reference	Result
[Image: Listening Activity Reference]	[Image: Listening Activity Result]

Manage Accounts

Reference	Result
[Image: Manage Accounts Reference]	[Image: Manage Accounts Result]

The Challenge with Open-Weight Models

While proprietary models like Opus-4.5 perform well, relying on them can be frustrating if you prefer open-source and self-hosted solutions. I tried to achieve similar results using open-weight models like Qwen 3, but I couldn’t reach the desired outcome.

Often, these models would claim a task was “fully accomplished” when no output had actually been generated. Other times, they would “die” mid-process or produce inconsistent results. For instance, while they could create a simple card, they struggled to reproduce that result reliably.

Enhancing Model Memory and Task Tracking

Through my research into GitHub Copilot’s chat logs, I found that they use a manage_todo_list tool to help the model track progress. The model first creates a to-do list with all statuses set to “not started,” then updates them to “in progress” and “completed” as it works through them one by one.
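A minimal sketch of such a tool as a plain Python class, written in the spirit of the Open WebUI custom tools mentioned below. All names here (`TodoList`, `start`, `complete`, `render`) are hypothetical; this is not Copilot’s actual implementation. It enforces the same not started → in progress → completed flow, allows only one task in progress at a time and, as an extra assumption, refuses to mark a task completed unless the model reports what it actually created or changed:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Todo:
    id: int
    title: str
    status: str = "not started"  # -> "in progress" -> "completed"
    result: str = ""  # what the model says it produced, e.g. an element id


class TodoList:
    """Hypothetical manage_todo_list-style tool (not Copilot's real code)."""

    def __init__(self, titles: list[str]) -> None:
        # The model declares every step up front; all start as "not started".
        self.items = [Todo(i + 1, title) for i, title in enumerate(titles)]

    def _get(self, todo_id: int) -> Todo | None:
        return self.items[todo_id - 1] if 1 <= todo_id <= len(self.items) else None

    def start(self, todo_id: int) -> str:
        todo = self._get(todo_id)
        if todo is None:
            return f"error: no task {todo_id}"
        # Only one task may be in progress at a time.
        if any(t.status == "in progress" for t in self.items):
            return "error: finish the task in progress first"
        if todo.status != "not started":
            return f"error: task {todo_id} is already {todo.status}"
        todo.status = "in progress"
        return f"started: {todo.title}"

    def complete(self, todo_id: int, result: str) -> str:
        todo = self._get(todo_id)
        if todo is None or todo.status != "in progress":
            return f"error: task {todo_id} is not in progress"
        if not result.strip():
            # Guard against "completed" with no work behind it.
            return "error: report what was created or changed first"
        todo.status = "completed"
        todo.result = result
        return f"completed: {todo.title}"

    def render(self) -> str:
        # Re-inject this into the context so the model sees the current state.
        return "\n".join(f"[{t.status}] {t.id}. {t.title}" for t in self.items)
```

Every method returns a short string instead of raising, so the message can be passed straight back to the model as the tool result.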

I’m currently using Open WebUI to interact with these models. Open WebUI lets you import custom tools (Python scripts) to customize the workflow to your needs.

I’ve been experimenting with custom tools, memory_enhancement and manage_todo_list, designed to help smaller models ‘take notes’ on declared variables, break complex tasks into smaller ones, and keep track of them.
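As an illustration of the note-taking idea, here is a hypothetical memory store (not the author’s memory_enhancement tool; `ElementMemory`, `remember`, `recall` and `summary` are invented names). It maps a logical name to the id of an element the model already created, and it answers with a warning, rather than silently accepting, when the model tries to register a second element under the same name, which is the “create new instead of update” pattern:

```python
from __future__ import annotations


class ElementMemory:
    """Hypothetical memory_enhancement-style store for elements already created."""

    def __init__(self) -> None:
        # logical name (e.g. "settings-card/title") -> {"id": ..., "note": ...}
        self._notes: dict[str, dict[str, str]] = {}

    def remember(self, name: str, element_id: str, note: str = "") -> str:
        existing = self._notes.get(name)
        if existing and existing["id"] != element_id:
            # A second element under the same name is usually a duplicate.
            return (f"warning: '{name}' already refers to {existing['id']}; "
                    f"update that element instead of creating {element_id}")
        self._notes[name] = {"id": element_id, "note": note}
        return f"saved: {name} -> {element_id}"

    def recall(self, name: str) -> str:
        entry = self._notes.get(name)
        return entry["id"] if entry else "unknown"

    def summary(self) -> str:
        # Meant to be re-injected into the prompt before each step.
        return "\n".join(f"{name} = {entry['id']} {entry['note']}".rstrip()
                         for name, entry in sorted(self._notes.items()))
```

Whether a model actually reads `summary()` before acting is exactly the open question described in the next paragraph; the store only makes the state available.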

The results did not go as expected: I did not notice any improvement. Even though the tools were used, the model ignored what it had done with them earlier. For example, it ignored the list of declared variables and created new elements instead of updating existing ones, or it simply marked tasks as ‘in progress’ and then ‘completed’ without doing anything.

I’m curious if there are plans to give Penpot-MCP tools for workflow management that help the model plan and keep track of the design process, such as:

Task Tracking: Breaking tasks into atomic, testable steps, adding them to a task list, and updating the list accordingly.

Error Handling: Preventing the model from repeatedly retrying arbitrary code by analyzing the error type, querying relevant API documentation, and updating its reasoning context.

Memory Management: Specifically for Penpot’s working environment, to prevent losing track of state across multiple code blocks or design elements.
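The error-handling idea can be sketched as a retry guard: normalise each error into a signature (dropping numbers and UUIDs that change between attempts), count how often the same signature recurs, and once a budget is exceeded tell the model to stop and consult the API documentation. This is a hypothetical helper, not part of Penpot-MCP; `RetryGuard`, `error_signature` and the budget of 2 are assumptions:

```python
import re
from collections import Counter

MAX_SAME_ERROR = 2  # assumption: how many identical failures to tolerate

# UUIDs first, then any run of digits; both vary between otherwise identical errors.
_VOLATILE = re.compile(r"[0-9a-f]{8}-[0-9a-f-]{27}|\d+")


def error_signature(message: str) -> str:
    """Reduce an error message to its first line with volatile parts masked."""
    text = message.strip()
    first_line = text.splitlines()[0] if text else ""
    return _VOLATILE.sub("#", first_line.lower())


class RetryGuard:
    """Hypothetical guard against blindly re-running failing code."""

    def __init__(self) -> None:
        self.seen = Counter()

    def on_error(self, message: str) -> str:
        sig = error_signature(message)
        self.seen[sig] += 1
        if self.seen[sig] > MAX_SAME_ERROR:
            return (f"stop: same error seen {self.seen[sig]} times; "
                    f"look up the API docs for: {sig}")
        return f"retry allowed ({self.seen[sig]}/{MAX_SAME_ERROR})"
```

Masking numbers means “shape 12 has no property …” and “shape 47 has no property …” count as the same mistake, which is the repeated-retry pattern the guard is meant to catch.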

My Goal: Fully Local Design Workflow

I’m still testing local models because the ultimate goal is a self-hosted Penpot instance running with a local LLM. I’ve updated my prompts to include more detail and task-tracking tools to counter hallucination and lost context, though results are still hit-or-miss (often resulting in stacked cards rather than updated elements).

If anyone has resources or tips for working with Penpot-MCP and local LLMs, I’d love to hear them! I’ve also shared a GitHub repository containing the Svelte app, the prompts, the reference images and the custom Open WebUI tools I used.

Failed & Work-in-Progress Attempts

Below are the examples where the initial logic failed or where I am currently testing alternative layouts.

Failed Graph (Non-continuous lines):

Screenshots showing the progress and the final result (last image) for the task ‘Create an alternative layout for the home page’:

The final result!

It is an alternative layout for sure, but I would not recommend just prompting ‘Create an alternative’ and hoping for a good result. Also, while producing this result, Opus-4.5 deleted the entire design multiple times and started over.

I think it’s because of the way Penpot components work: “Main components cannot be detached from themselves … Make sure that you are trying to detach a component copy.”

So, after creating a component (which does not fulfil the user’s request), Opus just deletes the whole component instead. In this case, the entire design was deleted 5-6 times before finalising the design.
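One way to discourage the delete-and-restart loop is a pre-flight check that the agent consults before changing a component. The sketch below is a toy model, not the Penpot API: `Kind` and `plan_change` are hypothetical, and the rule it encodes comes only from the message quoted above (only copies can be detached), plus the assumption that editing a main component in place is preferable to deleting it:

```python
from enum import Enum


class Kind(Enum):
    MAIN = "main component"
    COPY = "component copy"
    PLAIN = "plain shape"


def plan_change(kind: Kind, wants_detach: bool) -> str:
    """Hypothetical pre-flight advice for an agent about to modify a shape."""
    if wants_detach and kind is Kind.COPY:
        return "detach the copy, then edit it"
    if kind is Kind.MAIN:
        # Detaching a main component fails; deleting it throws away the design.
        return "edit the main component in place (do not delete it)"
    return "edit the shape directly"
```

Returning advice as text keeps the check usable as a tool result, in the same way as the other tools described in this post.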

Failed (but still the best) results of Qwen3:32b for creating a User Settings Card:

Work in Progress	Final Result
[Image: User Settings Card created by Qwen3:32b, 1 of 2]	[Image: User Settings Card created by Qwen3:32b, 2 of 2]

The updated prompt: penpot-mcp-experiments/qwen3-32b.md

Snippet of the updated prompt:

<system>
You are an AI assistant creating designs in Penpot using the penpot-mcp tools.
You MUST use task tracking and structured reasoning context to ensure reliable execution.
</system>

<instructions>
You are a highly sophisticated automated coding agent with expert-level
knowledge across many different programming languages and frameworks and
software engineering tasks - this encompasses debugging issues,
implementing new features, restructuring code, and providing code
explanations, among other engineering activities.

The user will ask a question, or ask you to perform a task, and it may require
lots of research to answer correctly. There is a selection of tools that
let you perform actions or retrieve helpful context to answer the user's
question.

By default, implement changes rather than only suggesting them. If the user's
intent is unclear, infer the most useful likely action and proceed with
using tools to discover any missing details instead of guessing. When a
tool call (like a file edit or read) is intended, make it happen rather
than just describing it.

You can call tools repeatedly to take actions or gather as much context as
needed until you have completed the task fully. Don't give up unless you
are sure the request cannot be fulfilled with the tools you have. It's YOUR
RESPONSIBILITY to make sure that you have done all you can to collect
necessary context.

Continue working until the user's request is completely resolved before ending
your turn and yielding back to the user. Only terminate your turn when you
are certain the task is complete. Do not stop or hand back to the user when
you encounter uncertainty — research or deduce the most reasonable approach
and continue.

</instructions>

<task>
Create a card container rectangle with title and subtitle and form fields (language dropdown) in my current penpot file, it's mandatory to use the penpot-mcp tools.
- Use 'Flex Layout'                      
- Convert the User Settings card into a component and logically group the layers
</task>

<workflowGuidance>
For complex projects that take multiple steps to complete, maintain careful tracking of what you're doing to ensure steady progress. Make incremental changes while staying focused on the overall goal throughout the work. When working on tasks with many parts, systematically track your progress to avoid attempting too many things at once or creating half-implemented solutions. Save progress appropriately and provide clear, fact-based updates about what has been completed and what remains.

When working on multi-step tasks, combine independent read-only operations in parallel batches when appropriate. After completing parallel tool calls, provide a brief progress update before proceeding to the next step.
For context gathering, parallelize discovery efficiently - launch varied queries together, read results, and deduplicate paths. Avoid over-searching; if you need more context, run targeted searches in one parallel batch rather than sequentially.
Get enough context quickly to act, then proceed with implementation. Balance thorough understanding with forward momentum.
</workflowGuidance>

Note: I used Gemini to proofread, correct and revise the initial text for this post, to make things easier for you and me.

Hey everyone,

I was happy to hear about the release of an official Penpot-MCP, and I tried to recreate the examples shown on Penpot’s official YouTube channel or in Penpot’s AI Paper. The latter referred to demo videos on Google Drive.

However, I could not recreate a single example as shown in these demos, so I am looking for guidance. It’s not that the results were “a bit off”; they did not resemble the desired output at all. So I was wondering whether there are more details on the Penpot MCP demos, e.g., prompt templates, the LLMs and the files used (e.g., Penpot or source files).

Thanks in advance.

The demos I am referring to:
@juan.delacruz | MCP demos - Google Drive

YouTube - Quick demo: Penpot MCP server in action

juan.delacruz · January 19, 2026, 8:32am

Hi @ucan

I understand perfectly what you are saying. I feel your frustration, because using LLMs and prompts can sometimes be very hard and results vary a lot. It is definitely not magic.

To give you more context, almost all the use cases you see in our videos were made using Opus 4.5 model and the Cursor agent. This is very important because the model, the agent, and the prompt change everything.

I have attached here the exact Penpot file we used in the demos so you can test with the same source.

Also, after fighting a lot with agents, here is a brief summary of “Good Prompting” that works for us. If the input is weak, the output will be bad:

Define the Role: Don’t just say “You are a designer.” Be specific. “You are a Senior Product Designer expert in Accessibility and Design Systems.”
Structure the Prompt: Think of it like a User Story ticket. Give Context, Objective, Restrictions (e.g., “only use existing components”), and Quality Criteria.
Images are key: The AI “sees” but needs guidance. Tell it exactly where to look in the screenshot (e.g., “focus on the negative space in the header”).
Give specific Rules: Don’t expect it to read the full documentation. Give explicit rules: “Use only colors from /core/colors” or “Do not invent new font sizes.”
Iterate: Do not expect a perfect result in “one-shot.” It is a conversation: Analysis → Proposal → Feedback → Adjustments.
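The “User Story ticket” structure above can be made repeatable with a small prompt builder. This is a sketch under assumptions: `DesignPrompt` and its field names are hypothetical, and the example values are taken from the tips above rather than from the actual demo prompts:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DesignPrompt:
    role: str
    context: str
    objective: str
    restrictions: list[str] = field(default_factory=list)
    quality_criteria: list[str] = field(default_factory=list)
    image_focus: str = ""  # where to look in an attached screenshot

    def render(self) -> str:
        parts = [
            f"Role: {self.role}",
            f"Context: {self.context}",
            f"Objective: {self.objective}",
        ]
        if self.image_focus:
            parts.append(f"Reference image: focus on {self.image_focus}")
        if self.restrictions:
            parts.append("Restrictions:\n" + "\n".join(f"- {r}" for r in self.restrictions))
        if self.quality_criteria:
            parts.append("Quality criteria:\n" + "\n".join(f"- {q}" for q in self.quality_criteria))
        # Blank lines between sections keep each part visually separate.
        return "\n\n".join(parts)


prompt = DesignPrompt(
    role="Senior Product Designer expert in Accessibility and Design Systems",
    context="Penpot file with an existing component library",
    objective="Create a login card",
    restrictions=["Use only colors from /core/colors", "Do not invent new font sizes"],
)
print(prompt.render())
```

Empty sections are omitted rather than rendered blank, so the model is not invited to fill them in; iteration still happens in the conversation itself.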

I hope this helps you get better results. It is a trial and error process!

ucan · January 19, 2026, 4:45pm

Thank you for the helpful tips and resources, I’ll try them out right away! As for the prompts, I tried to follow the examples in the demo videos, which wasn’t easy because most of them were cut off.

I also realised later that Cursor IDE and Opus 4.5 had been used. As I don’t have a premium Cursor subscription, I could only set the model to ‘auto’. However, I didn’t get a single result. Although the Penpot MCP server was running and connected, and I could see that the MCP server and tools were recognised correctly, the LLM still couldn’t access them. As a workaround, the model generated complete JavaScript code for me to enter manually via the Penpot API REPL. I tried this, but it didn’t work.

That’s why I’m currently working with VSCode, where I’ve tried various models such as Codex, GPT-5.2 and Opus 4.5. Unfortunately, I haven’t obtained any usable results yet. I also realised that I was being far too general. My goals are to recreate a finished HTML/CSS page in Penpot and to generate design variations exclusively using components from the library.

So far, I have had the following results: (1) Designs are not created at all, or are only partially created and not in a usable state, (2) components from the library are not used or (3) elements are cut off.

I also made attempts without being too specific; for example, I prompted it to create ‘just’ a button. The BookNook website has been the most successful so far, but I’m no longer sure which model or prompt I used; the model may have been GPT-5.2.

In any case, I’ll continue to try out your suggestions. Unfortunately, I can’t use Cursor here. I’ll let you know if I find a setup that works well.

Thank you all for your great work!

Here are examples of my first attempts. I mainly tried to create designs in Penpot using components from the library or to generate Penpot designs and components based on finished source code + reference image.

Cards

Card using components from the library

ucan · January 19, 2026, 4:46pm

Recreating a card from HTML/CSS with a reference image

ucan · January 19, 2026, 4:46pm

New wireframe based on an existing design

ucan · January 19, 2026, 4:47pm

Web page

Web page using components

ucan · January 19, 2026, 4:48pm

This example is the best so far, but I can’t get consistent results using the same workflow. I provided HTML/CSS and a reference image. I believe the model was GPT-5.2.

ucan · January 19, 2026, 4:48pm

Here is the failed attempt using Opus 4.5

and the prompt

Take this HTML/CSS and recreate it in my current Penpot file. Do not just draw boxes: use 'Flex Layout' for the containers, convert the cards, buttons, list items into components and logically group the layers.

@included files: 
- index.html and style.css generated by Gemini 3 Pro 
- Reference Image

Mat · March 16, 2026, 10:14am

ucan:

Take this HTML/CSS and recreate it in my current Penpot file. Do not just draw boxes: use 'Flex Layout' for the containers, convert the cards, buttons, list items into components and logically group the layers.

@included files: 
- index.html and style.css generated by Gemini 3 Pro 
- Reference Image

These are some of the prompts I got from one of the Penpot videos:


@mcp-penpot create a basic and meaningful prototyping interaction between the screens currently selected in Penpot. Look for possible interaction elements, such as buttons with text or icons, or cards that can act as interaction triggers to navigate between boards.

@mcp-penpot take the HTML / CSS generated from <> and recreate it in my current Penpot file. Do not just draw boxes, use ‘Flex layout’ for the containers, convert the buttons and other action triggers into components, and logically group the layers.

Generate a Design System documentation file from a Penpot file:

As a Code Generation AI specialised in Design System documentation, your task is to fully extract the design system data from the provided Penpot file / API response and generate a single, comprehensive markdown file (~/AI/mcp-servers/design-system-documentation.md).

The output must be structured with the following mandatory sections, including all available detail:

Introduction & Overview - A brief, professional introduction to the Design System (DS) and its purpose (e.g., consistency, scalability). Specify the source Penpot file, project name and date of extraction.
Source Penpot File: [Insert Name of Penpot Project/File]
Date of Extraction: [YYYY-MM-DD]
Design System Colors - A comprehensive colour palette. Document semantic colors with usage notes, and all colors with RGB, Hex & HSL values.
Design Tokens: Typography - A complete typography system featuring font-family, text styles (font-size, weight, line-height & letter spacing) and a typography scale based on the Minor Third ratio.
Include detailed usage guidelines.
Design Tokens: Spacing & Layout - Full spacing & layout specifications:
6-level spacing scale (XS to 2XL) in px and rem.
12-column grid system with gutter and margins.
Component dimensions table.
5 responsive breakpoints.
Design Tokens: Depth & Effects - Shadows & radius tokens:
Shadow elevation with full CSS box-shadow value
7 border radius values from 4px to 80px.
Opacity values for various shapes.
Component Inventory: Details of major components:
Button (3 variants: Primary, Secondary, Icon:Text).
Input field (with search variant).
Card (Main card, Related card with dimensions and shadows).
Modal/Dialog (Slide in).
Navigation item (4 navigation types).
Category Pill (with 4 different states).
Tabs (with 4 different states).
Banner Carousel (with fade transition and position indicator).
Additional components (e.g. Headers).
Additional guidelines - Responsive behaviour, Accessibility, Performance, Usability and design principles.
Implementation guidelines - CSS Variable example and HTML component usage.

Critical Requirements

Use Markdown Tables extensively for structured data (Colors, Typography, Spacing, Breakpoints).

All extracted values must be presented with their numeric value and unit (e.g., 16px, 1rem, 0.75).

Ensure all sections are present and populated with the available data from the Penpot extraction process.

Use descriptive markdown headings (#, ##, ###) for clear hierarchy.

DO NOT GENERATE ANOTHER FILE TO DOCUMENT THE PROCESS, JUST GENERATE design-system-documentation.md file.

Context: The Design System foundations (Color and Typography tokens) have already been synchronised between Penpot and the codebase (CSS variables, utility classes, etc.). The LLM has access to the Penpot Model Context Protocol (MCP) and the design-system-documentation.md file. Do not generate extra documentation of the process. Once you finish, show me the result by opening the HTML in VSCode’s internal browser.
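The typography section of this prompt asks for a scale based on the Minor Third ratio, which is 6:5 = 1.2: each step up multiplies the previous font size by 1.2. As a worked illustration, assuming a 16px base size and a 16px root font size for the rem conversion (neither value is stated in the prompt), the scale can be computed like this:

```python
MINOR_THIRD = 1.2  # 6:5 ratio
ROOT_FONT_PX = 16  # assumption: browser default root font size, used for rem


def type_scale(base_px: float = 16, steps_down: int = 2, steps_up: int = 5) -> list[float]:
    """Font sizes base_px * 1.2**k for k from -steps_down to steps_up, rounded to 2 dp."""
    return [round(base_px * MINOR_THIRD ** k, 2) for k in range(-steps_down, steps_up + 1)]


def to_rem(px: float) -> str:
    """Convert a pixel value to a rem string relative to ROOT_FONT_PX."""
    return f"{px / ROOT_FONT_PX:g}rem"


for size in type_scale():
    print(f"{size}px = {to_rem(size)}")
```

Under these assumptions the sizes run from 11.11px up to 39.81px; the same `to_rem` helper also covers the px/rem columns requested for the spacing scale.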

——

Generate a complete, single-page solution for a standard User Login Screen. Add the files to my project.

The output must consist of two separate, non-documentary code blocks only:

HTML (semantic and structured).

CSS (modular and token-based)

The size of the login screen must be mobile.

Design & Consistency constraints:

Layout/Aesthetics: It must adhere to the overall aesthetic, visual hierarchy and layout style present in the Penpot file’s main screens.

Typography: Use the defined Text Style tokens.

Spacing: Apply spacing tokens for padding, margins and gaps within the form and container.

Components: Leverage properties for input fields, buttons & card containers (like border-radius and box-shadow tokens) as defined by the system.

Output constraint: DO NOT generate introductory text, explanations, notes or markdown tables.

============

1. Introduction & Overview

This Design System (DS) serves as the foundational, single source of truth for the [Project Name] user interface and experience. Its primary purpose is to ensure visual consistency, improve collaboration between design and development teams, and enable scalability across digital products.

By utilizing pre-defined components, typography, and color tokens, this system reduces redundant work and accelerates development cycles.

Source Penpot File: [Insert Name of Penpot Project/File]

Date of Extraction: [YYYY-MM-DD]

ucan · March 16, 2026, 11:59am

Thank you, that’s very helpful. I’ll try it out as soon as I find the time.
