PostgreSQL database schema
The yworkflow-instance-store-jdbc-postgresql yWorkflow module uses a relational database schema designed to support workflow state management, history tracking, and long-running operations.
The schema follows a normalized design that separates workflow instances, state instances, and their relationships, enabling efficient querying and scalability.
Schema Overview
The schema consists of five main tables that work together to provide complete workflow persistence:
-
workflow_instance: Represents a single execution of a workflow definition
-
state_instance: Tracks individual state executions within a workflow instance
-
state_instance_history: Maintains the relationship chain between state instances
-
state_instance_owners: Manages ownership and assignment of states
-
persistent_attribute: Stores workflow-level attributes and data
Table Descriptions and Design Decisions
workflow_instance
The workflow_instance table is the root entity that represents a single execution of a workflow definition.
Each row corresponds to one workflow run, from initialization to completion.
| Column | Design Decision |
|---|---|
|
SERIAL primary key provides a stable, auto-incrementing identifier. This design allows efficient foreign key relationships and avoids UUID overhead for internal references. |
|
Optional VARCHAR(255) enables multi-tenancy support. The nullable design allows single-tenant deployments to omit this field, while multi-tenant systems can partition data by tenant. Indexed for efficient tenant-based queries. |
|
VARCHAR(1024) stores the identifier of the workflow definition that this instance executes.
The large size accommodates hierarchical or namespaced definition IDs (e.g., |
|
VARCHAR(20) represents the overall workflow status (e.g., |
|
TIMESTAMP fields track workflow lifecycle.
|
Indexing Strategy:
The three indexes (idx_workflow_instance_tenant, idx_workflow_instance_definition, idx_workflow_instance_status) support common query patterns:
-
Finding all workflows for a tenant
-
Listing instances of a specific workflow definition
-
Monitoring workflows by status (e.g., finding all running workflows)
state_instance
The state_instance table tracks each state execution within a workflow.
This design enables the workflow engine to maintain a complete history of state transitions and support long-running workflows that may persist for extended periods.
| Column | Design Decision |
|---|---|
|
SERIAL primary key for unique identification of each state execution. Multiple state instances can exist for a single workflow instance (one per state transition). |
|
Foreign key to |
|
VARCHAR(64) identifies which state from the workflow definition this instance represents. The size accommodates descriptive state names while maintaining reasonable storage. |
|
VARCHAR(64) stores the transition that triggered this state. Nullable because initial states may not have a triggering transition. This enables tracing the path a workflow took through its state machine. |
|
TIMESTAMP (NOT NULL) records when the state execution began. Essential for calculating state duration and identifying long-running states. |
|
TIMESTAMP (nullable) indicates when the state finished. NULL values represent active or pending states, enabling efficient queries for incomplete states. |
|
TIMESTAMP (nullable) supports deadline tracking and time-based workflows. Allows the system to identify states that are overdue or approaching deadlines. |
|
VARCHAR(64) represents the state execution status (e.g., |
|
BOOLEAN flag for quick identification of the current active state. This denormalization optimizes the common query pattern of finding the current state without scanning history. |
|
VARCHAR(64) stores information about who or what triggered the transition to this state. Supports audit trails and debugging by tracking the source of state changes. |
Indexing Strategy:
-
idx_state_instance_active: Composite index on(workflow_instance_id, started_at)optimizes queries for the active state of a workflow, which typically requires finding the most recent state instance. -
idx_state_instance_inactive: Index onworkflow_instance_idalone supports queries for all states of a workflow, regardless of status.
state_instance_history
The state_instance_history table maintains the relationship chain between state instances, creating a directed graph of state transitions.
This design enables reconstructing the complete execution path of a workflow and supports features like state rollback or replay.
| Column | Design Decision |
|---|---|
|
References the current state instance in the chain.
Part of the composite primary key with |
|
References the state instance that preceded this one. Enables traversing the state history backwards. |
|
INTEGER stores the order of this previous state in the chain. Supports workflows with parallel states or complex branching where a state may have multiple predecessors. Indexed for efficient forward traversal. |
Design Rationale:
-
Composite Primary Key: The combination of
state_instance_idandprevious_state_instance_idensures uniqueness while allowing multiple history entries per state (for parallel execution scenarios). -
CASCADE Delete: Both foreign key relationships use CASCADE delete, ensuring history is automatically cleaned up when state instances are removed.
-
Order Tracking: The
previous_state_instance_orderfield supports workflows with fork-join patterns where states may have multiple predecessors.
state_instance_owners
The state_instance_owners table implements a many-to-many relationship between state instances and owners.
This design supports workflows where states can be assigned to multiple users or systems, enabling collaborative workflows and delegation.
| Column | Design Decision |
|---|---|
|
Foreign key to |
|
VARCHAR(64) identifies the owner (user ID, role, system identifier, etc.). The composite primary key ensures a state can have multiple owners while preventing duplicates. Indexed separately for efficient owner-based queries (e.g., "find all states assigned to user X"). |
Design Rationale:
-
Many-to-Many Relationship: Separate table design allows multiple owners per state and multiple state assignments per owner.
-
CASCADE Delete: Ensures owner assignments are cleaned up when state instances are removed.
-
Owner Index: The
idx_state_instance_owners_owner_idindex optimizes queries for finding all states assigned to a specific owner, which is common in user-facing workflows.
persistent_attribute
The persistent_attribute table stores workflow-level data using an Entity-Attribute-Value (EAV) pattern.
This flexible design allows workflows to store arbitrary data without schema changes, supporting dynamic workflows and varying data requirements.
| Column | Design Decision |
|---|---|
|
Foreign key to |
|
VARCHAR(255) identifies the attribute. Part of the composite primary key, ensuring one value per attribute per workflow instance. |
|
VARCHAR(16) discriminator field indicating which value column contains the actual data. Enables type-safe storage and retrieval while maintaining a single table structure. |
|
Type-specific columns store the actual attribute values.
Only one column is populated per row based on |
Design Rationale:
-
EAV Pattern: Provides schema flexibility without requiring migrations for new data types.
-
Type Safety: The discriminator field (
attribute_type) ensures type consistency and enables efficient queries. -
Storage Efficiency: Separate columns for different types allow appropriate indexing and storage optimization (e.g., BYTEA for binary data, NUMERIC for precise decimals).
-
No CASCADE Delete: Attributes can optionally outlive workflow instances for audit or reporting purposes.
Relationship Design
The schema uses foreign key relationships with strategic CASCADE delete policies:
-
workflow_instance → state_instance: CASCADE ensures all state instances are removed when a workflow is deleted, maintaining referential integrity.
-
state_instance → state_instance_history: CASCADE on both relationships ensures history is cleaned up with state instances.
-
state_instance → state_instance_owners: CASCADE maintains consistency when states are removed.
-
workflow_instance → persistent_attribute: No CASCADE allows attributes to persist independently if needed for auditing or analytics.
Support for Long-Running Workflows
The schema design specifically supports long-running workflows through several mechanisms:
-
State Persistence: The
state_instancetable withis_activeflag allows workflows to persist in waiting states indefinitely, surviving system restarts. -
Timestamp Tracking:
started_at,completed_at, anddue_atenable monitoring and timeout detection for long-running states. -
Status Tracking: Granular status fields at both workflow and state levels enable detailed monitoring and recovery.
-
History Preservation: The
state_instance_historytable maintains complete execution trails, essential for debugging and auditing long-running processes. -
Attribute Persistence: The
persistent_attributetable ensures workflow data persists across restarts, critical for workflows that may run for hours or days.
Performance Considerations
The indexing strategy balances query performance with write overhead:
-
Workflow-level indexes support common administrative queries (tenant, definition, status).
-
State-level composite indexes optimize the critical path of finding active states.
-
Owner index enables efficient user-facing queries.
-
History indexes support both forward and backward traversal of state chains.
The schema avoids over-indexing while ensuring common query patterns remain efficient, even with large numbers of workflow instances and state transitions.