Introduction
This article explains how the asset storage system in Penpot works, along with its internal processes. Before diving into implementation details, it’s important to understand why Penpot needs a dedicated asset storage system in the first place.
Unlike applications that simply store user-uploaded assets and delete them when the corresponding object is deleted, Penpot is an editor. This means users can create elements that contain images, then undo and redo those changes. These scenarios require decoupling the asset lifecycle from the editing process, making it asynchronous and eventually consistent.
Additionally, Penpot uses logical deletion for most objects, allowing users a window of a few days to undo destructive actions. This further supports the need for asset storage to be decoupled from the objects we manage.
Key Features
Key features of the asset storage system include:
- Logical and/or deferred deletion
- Asset categorization via buckets
- Asset deduplication within buckets
- Reference counting by bucket
- Support for multiple storage backends
The Data Model
Before diving into the features, we’ll provide an overview of the data model, focusing on the two most relevant objects.
storage_object
Every uploaded asset creates an entry in the `storage_object` table (within Penpot’s PostgreSQL database). This table stores basic metadata such as size, type, and the backend where the asset is stored.
Schema:

```
                       Table "public.storage_object"
   Column   |           Type           | Collation | Nullable |      Default
------------+--------------------------+-----------+----------+--------------------
 id         | uuid                     |           | not null | uuid_generate_v4()
 created_at | timestamp with time zone |           | not null | now()
 deleted_at | timestamp with time zone |           |          |
 size       | bigint                   |           | not null | 0
 backend    | text                     |           | not null |
 metadata   | jsonb                    |           |          |
 touched_at | timestamp with time zone |           |          |
```
Notable aspects:
- `backend`: Indicates where the physical asset is stored. Current options are `fs` and `s3`. A now-deprecated backend stored files in the database itself, which helped with backups but didn’t scale well.
- `bucket` (in `metadata`): Categorizes the asset semantically. Main buckets include `team-font-variant`, `file-media-object`, and `profile`.
- `hash` (in `metadata`): A BLAKE2b hash computed from the asset’s content, used for deduplication (within buckets).
- `touched_at`: Marks whether the object is pending reference analysis. A `NULL` value indicates no recent changes requiring reanalysis.
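As an illustration, such a content hash could be computed in a streaming fashion (a minimal Python sketch; the digest size and hex encoding are assumptions, not necessarily Penpot’s exact parameters):

```python
import hashlib

def compute_asset_hash(path: str) -> str:
    # Stream the file in chunks so large assets never need to fit in memory.
    # digest_size=32 and hex encoding are illustrative choices.
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()
```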
file_media_object
When a user uploads an image in the workspace, an entry is also created in the `file_media_object` table. This object is what Penpot files internally reference.
Schema:

```
                     Table "public.file_media_object"
    Column    |           Type           | Collation | Nullable |      Default
--------------+--------------------------+-----------+----------+--------------------
 id           | uuid                     |           | not null | uuid_generate_v4()
 created_at   | timestamp with time zone |           | not null | clock_timestamp()
 deleted_at   | timestamp with time zone |           |          |
 name         | text                     |           | not null |
 width        | integer                  |           | not null |
 height       | integer                  |           | not null |
 mtype        | text                     |           | not null |
 file_id      | uuid                     |           | not null |
 is_local     | boolean                  |           | not null | false
 media_id     | uuid                     |           | not null |
 thumbnail_id | uuid                     |           |          |
```
This table links a file to a `storage_object` through the `media_id` and `thumbnail_id` fields.

For instance, multiple `file_media_object` entries (e.g., from different templates) can reference the same `storage_object` thanks to deduplication. This also makes garbage collection (GC) more efficient, as it can query references directly without scanning file blobs.
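For instance, the reference check for the `file-media-object` bucket can be expressed as a direct query against this table (a hypothetical sketch using psycopg2-style placeholders; `is_referenced` is illustrative, not Penpot’s actual code):

```python
# An asset in the file-media-object bucket is referenced if any
# file_media_object row points at it as the media or the thumbnail.
REFERENCED_SQL = """
SELECT count(*) FROM file_media_object
 WHERE media_id = %(id)s OR thumbnail_id = %(id)s
"""

def is_referenced(conn, storage_object_id) -> bool:
    with conn.cursor() as cur:
        cur.execute(REFERENCED_SQL, {"id": storage_object_id})
        return cur.fetchone()[0] > 0
```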
Other objects, like fonts and profile photos, reference `storage_object` directly, along with their corresponding bucket names.
Processes
Image Uploads
When a user uploads an image (as-is or as a background), a multipart request is made to the API. Internally, the upload process performs the following operations:
- The BLAKE2b hash of the content is calculated.
- The system checks for an existing `storage_object` with the same hash and bucket (for example, `file-media-object`).
- A new entry is created in `file_media_object`, linking the asset and the file and storing metadata like size and MIME type.
- The API returns the `file_media_object` ID.
- The frontend updates the file with this ID.
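Put together, the server-side flow might look roughly like this (a simplified sketch; the function, the SQL, and the `fs` backend choice are illustrative, not Penpot’s actual implementation):

```python
import hashlib
import json
import uuid

def upload_image(conn, file_id, name, content, mtype, width, height):
    # Step 1: hash the content for deduplication.
    content_hash = hashlib.blake2b(content, digest_size=32).hexdigest()
    with conn.cursor() as cur:
        # Step 2: reuse an existing storage_object with the same hash + bucket.
        cur.execute(
            """SELECT id FROM storage_object
                WHERE metadata->>'bucket' = %s
                  AND metadata->>'hash' = %s
                  AND deleted_at IS NULL""",
            ("file-media-object", content_hash))
        row = cur.fetchone()
        if row:
            media_id = row[0]
        else:
            media_id = str(uuid.uuid4())
            cur.execute(
                """INSERT INTO storage_object (id, size, backend, metadata)
                   VALUES (%s, %s, %s, %s)""",
                (media_id, len(content), "fs",
                 json.dumps({"bucket": "file-media-object",
                             "hash": content_hash})))
            # ...and the bytes themselves are written to the fs/s3 backend here.
        # Step 3: create the file_media_object row linking asset and file.
        fmo_id = str(uuid.uuid4())
        cur.execute(
            """INSERT INTO file_media_object
                   (id, name, width, height, mtype, file_id, media_id)
               VALUES (%s, %s, %s, %s, %s, %s, %s)""",
            (fmo_id, name, width, height, mtype, file_id, media_id))
    conn.commit()
    return fmo_id  # Step 4: the API returns this ID to the frontend.
```

Note that deduplication happens before anything new is written: if the hash already exists in the bucket, only the lightweight `file_media_object` link is created.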
For fonts or other object types, there’s no intermediate relation; they reference the `storage_object` directly.
If the image is no longer used (e.g., after an undo), it remains referenced until the file becomes inactive (i.e., no recent modifications) and is processed by FileGC (see below).
Logical Deletes
Although not exclusive to asset storage, logical deletion is tightly related.
When a major object (file, project, team, profile) is deleted, it’s marked as deleted, and an asynchronous cascade process begins. This marks related objects as deleted with the same timestamp.
Actual removal is deferred to the Garbage Collection process, offering an approximately 7-day undo window.
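As an illustration, the cascade step for a file could propagate the file’s own timestamp to its media rows (a hypothetical sketch; the `file` table name and the shape of the cascade are assumptions):

```python
def cascade_delete_file(conn, file_id):
    with conn.cursor() as cur:
        # Mark the file's media rows as deleted, reusing the file's own
        # deleted_at so the whole cascade shares one timestamp.
        cur.execute(
            """UPDATE file_media_object fmo
                  SET deleted_at = f.deleted_at
                 FROM file f
                WHERE f.id = %s
                  AND fmo.file_id = f.id
                  AND fmo.deleted_at IS NULL""",
            (file_id,))
    conn.commit()
```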
Garbage Collection
We often joke that Penpot’s GC is like a Mark and Sweep garbage collector, with a marking phase (analysis) and a sweep phase (deletion).
There are four main GC processes:
FileGC
Cleans up old or unused image references in a file after a period of inactivity. It also performs many other file-cleanup operations, and consists of the following steps:
- Analyzes the file content.
- Deletes unused entries from `file_media_object`.
- Marks the corresponding `storage_object` entries via the `touched_at` field.
Runs periodically, only processing inactive files.
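In sketch form, the deletion and marking steps might look like this (hypothetical names; the set of still-used IDs comes from the content analysis in the first step):

```python
def clean_file_media(conn, file_id, used_ids):
    with conn.cursor() as cur:
        # Drop file_media_object rows the file content no longer references.
        cur.execute(
            """DELETE FROM file_media_object
                WHERE file_id = %s AND NOT (id = ANY(%s))
            RETURNING media_id, thumbnail_id""",
            (file_id, list(used_ids)))
        # Touch the underlying storage objects so StorageTouchedGC
        # re-analyzes their references later.
        ids = [i for row in cur.fetchall() for i in row if i is not None]
        if ids:
            cur.execute(
                "UPDATE storage_object SET touched_at = now() WHERE id = ANY(%s)",
                (ids,))
    conn.commit()
```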
ObjectsGC
Deletes all non-storage objects marked for deletion after ~7 days. If an object is linked to a `storage_object`, it sets the `touched_at` field for later analysis.
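The row deletion and the `touched_at` marking can even be combined into a single statement (a hypothetical sketch using a data-modifying CTE, shown here for `file_media_object`):

```python
# Delete expired rows and touch their storage objects in one round trip.
OBJECTS_GC_SQL = """
WITH expired AS (
    DELETE FROM file_media_object
     WHERE deleted_at IS NOT NULL
       AND deleted_at < now() - interval '7 days'
 RETURNING media_id, thumbnail_id
)
UPDATE storage_object so
   SET touched_at = now()
  FROM expired e
 WHERE so.id IN (e.media_id, e.thumbnail_id)
"""

def objects_gc(conn):
    with conn.cursor() as cur:
        cur.execute(OBJECTS_GC_SQL)
    conn.commit()
```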
StorageTouchedGC
Periodically scans `storage_object` entries with a non-null `touched_at`:
- Determines the reference strategy based on the object’s bucket.
- If no references exist, marks the object as deleted for final removal.
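One pass might look roughly like this (a hedged sketch; the per-bucket `check_references` dispatch is the essential idea, and for the `file-media-object` bucket it could be the `is_referenced` query shown earlier):

```python
def storage_touched_gc(conn, check_references, batch_size=100):
    # check_references(conn, id, bucket) -> bool is the bucket-specific
    # reference strategy, passed in for illustration.
    with conn.cursor() as cur:
        cur.execute(
            """SELECT id, metadata->>'bucket' FROM storage_object
                WHERE touched_at IS NOT NULL AND deleted_at IS NULL
                LIMIT %s""",
            (batch_size,))
        for oid, bucket in cur.fetchall():
            if check_references(conn, oid, bucket):
                # Still referenced: clear the mark and keep the asset.
                cur.execute(
                    "UPDATE storage_object SET touched_at = NULL WHERE id = %s",
                    (oid,))
            else:
                # Unreferenced: mark as deleted for StorageDeletedGC.
                cur.execute(
                    """UPDATE storage_object
                          SET deleted_at = now(), touched_at = NULL
                        WHERE id = %s""",
                    (oid,))
    conn.commit()
```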
StorageDeletedGC
Permanently deletes `storage_object` entries marked as deleted. It uses batch deletion to optimize throughput, which is particularly important for S3-type backends that may charge per API call.
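A batched pass could look like this (the `delete_many` backend call is hypothetical; with S3, a single DeleteObjects request can remove up to 1,000 keys, so batching directly reduces the number of billable calls):

```python
def storage_deleted_gc(conn, backend, batch_size=500):
    with conn.cursor() as cur:
        # Remove a bounded batch of rows and collect their IDs.
        cur.execute(
            """DELETE FROM storage_object
                WHERE id IN (SELECT id FROM storage_object
                              WHERE deleted_at IS NOT NULL
                              LIMIT %s)
            RETURNING id""",
            (batch_size,))
        ids = [r[0] for r in cur.fetchall()]
    if ids:
        # One bulk backend call per batch (hypothetical API).
        backend.delete_many(ids)
    conn.commit()
```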
Performance Considerations
These processes must run efficiently and incrementally, avoiding long-held locks. A slow GC process can block user actions like uploading images or templates, leading to performance issues or timeouts.
This is especially important with deduplication. If a GC process is analyzing a `storage_object` already used in a new template being uploaded, the upload may be blocked until GC completes.
To mitigate this, each GC process uses mini-transactions, reducing locking time and making operations virtually invisible to users.
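The pattern, reduced to a sketch (not Penpot’s actual scheduler):

```python
def run_gc_in_chunks(conn, gc_step, chunk_size=50):
    # Each call to gc_step processes at most chunk_size items and returns
    # how many it handled; committing between calls releases row locks
    # quickly, so user-facing writes are blocked only briefly.
    while True:
        handled = gc_step(conn, chunk_size)
        conn.commit()
        if handled == 0:
            break
```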