PostgreSQL database schema

The yworkflow-instance-store-jdbc-postgresql yWorkflow module uses a relational database schema designed to support workflow state management, history tracking, and long-running operations. The schema follows a normalized design that separates workflow instances, state instances, and their relationships, enabling efficient querying and scalability.

Schema Overview

The schema consists of five main tables that work together to provide complete workflow persistence:

workflow_instance: Represents a single execution of a workflow definition
state_instance: Tracks individual state executions within a workflow instance
state_instance_history: Maintains the relationship chain between state instances
state_instance_owners: Manages ownership and assignment of states
persistent_attribute: Stores workflow-level attributes and data

Table Descriptions and Design Decisions

workflow_instance

The workflow_instance table is the root entity that represents a single execution of a workflow definition. Each row corresponds to one workflow run, from initialization to completion.

Column Design Decision

Column	Design Decision
`id`	SERIAL primary key provides a stable, auto-incrementing identifier. This design allows efficient foreign key relationships and avoids UUID overhead for internal references.
`tenant_id`	Optional VARCHAR(255) enables multi-tenancy support. The nullable design allows single-tenant deployments to omit this field, while multi-tenant systems can partition data by tenant. Indexed for efficient tenant-based queries.
`workflow_definition_id`	VARCHAR(1024) stores the identifier of the workflow definition that this instance executes. The large size accommodates hierarchical or namespaced definition IDs (e.g., `com.example.workflows.order-processing`). Indexed for querying all instances of a specific workflow type.
`status`	VARCHAR(20) represents the overall workflow status (e.g., `RUNNING`, `COMPLETED`, `FAILED`). The fixed size balances storage efficiency with flexibility for custom status values. Indexed for status-based queries and monitoring.
`created_at`, `updated_at`	TIMESTAMP fields track workflow lifecycle. `updated_at` enables detecting stale workflows and supports audit trails. Both are NOT NULL to ensure data integrity.

id

SERIAL primary key provides a stable, auto-incrementing identifier. This design allows efficient foreign key relationships and avoids UUID overhead for internal references.

tenant_id

Optional VARCHAR(255) enables multi-tenancy support. The nullable design allows single-tenant deployments to omit this field, while multi-tenant systems can partition data by tenant. Indexed for efficient tenant-based queries.

workflow_definition_id

VARCHAR(1024) stores the identifier of the workflow definition that this instance executes. The large size accommodates hierarchical or namespaced definition IDs (e.g., com.example.workflows.order-processing). Indexed for querying all instances of a specific workflow type.

status

VARCHAR(20) represents the overall workflow status (e.g., RUNNING, COMPLETED, FAILED). The fixed size balances storage efficiency with flexibility for custom status values. Indexed for status-based queries and monitoring.

created_at, updated_at

TIMESTAMP fields track workflow lifecycle. updated_at enables detecting stale workflows and supports audit trails. Both are NOT NULL to ensure data integrity.

Indexing Strategy:

The three indexes (idx_workflow_instance_tenant, idx_workflow_instance_definition, idx_workflow_instance_status) support common query patterns:

Finding all workflows for a tenant
Listing instances of a specific workflow definition
Monitoring workflows by status (e.g., finding all running workflows)

state_instance

The state_instance table tracks each state execution within a workflow. This design enables the workflow engine to maintain a complete history of state transitions and support long-running workflows that may persist for extended periods.

Column Design Decision

Column	Design Decision
`id`	SERIAL primary key for unique identification of each state execution. Multiple state instances can exist for a single workflow instance (one per state transition).
`workflow_instance_id`	Foreign key to `workflow_instance`, establishing the parent-child relationship. CASCADE delete ensures cleanup when a workflow instance is removed.
`state_id`	VARCHAR(64) identifies which state from the workflow definition this instance represents. The size accommodates descriptive state names while maintaining reasonable storage.
`transition_id`	VARCHAR(64) stores the transition that triggered this state. Nullable because initial states may not have a triggering transition. This enables tracing the path a workflow took through its state machine.
`started_at`	TIMESTAMP (NOT NULL) records when the state execution began. Essential for calculating state duration and identifying long-running states.
`completed_at`	TIMESTAMP (nullable) indicates when the state finished. NULL values represent active or pending states, enabling efficient queries for incomplete states.
`due_at`	TIMESTAMP (nullable) supports deadline tracking and time-based workflows. Allows the system to identify states that are overdue or approaching deadlines.
`status`	VARCHAR(64) represents the state execution status (e.g., `ACTIVE`, `COMPLETED`, `FAILED`). More granular than workflow-level status, enabling detailed state tracking.
`is_active`	BOOLEAN flag for quick identification of the current active state. This denormalization optimizes the common query pattern of finding the current state without scanning history.
`caller`	VARCHAR(64) stores information about who or what triggered the transition to this state. Supports audit trails and debugging by tracking the source of state changes.

id

SERIAL primary key for unique identification of each state execution. Multiple state instances can exist for a single workflow instance (one per state transition).

workflow_instance_id

Foreign key to workflow_instance, establishing the parent-child relationship. CASCADE delete ensures cleanup when a workflow instance is removed.

state_id

VARCHAR(64) identifies which state from the workflow definition this instance represents. The size accommodates descriptive state names while maintaining reasonable storage.

transition_id

VARCHAR(64) stores the transition that triggered this state. Nullable because initial states may not have a triggering transition. This enables tracing the path a workflow took through its state machine.

started_at

TIMESTAMP (NOT NULL) records when the state execution began. Essential for calculating state duration and identifying long-running states.

completed_at

TIMESTAMP (nullable) indicates when the state finished. NULL values represent active or pending states, enabling efficient queries for incomplete states.

due_at

TIMESTAMP (nullable) supports deadline tracking and time-based workflows. Allows the system to identify states that are overdue or approaching deadlines.

status

VARCHAR(64) represents the state execution status (e.g., ACTIVE, COMPLETED, FAILED). More granular than workflow-level status, enabling detailed state tracking.

is_active

BOOLEAN flag for quick identification of the current active state. This denormalization optimizes the common query pattern of finding the current state without scanning history.

caller

VARCHAR(64) stores information about who or what triggered the transition to this state. Supports audit trails and debugging by tracking the source of state changes.

Indexing Strategy:

idx_state_instance_active: Composite index on (workflow_instance_id, started_at) optimizes queries for the active state of a workflow, which typically requires finding the most recent state instance.
idx_state_instance_inactive: Index on workflow_instance_id alone supports queries for all states of a workflow, regardless of status.

state_instance_history

The state_instance_history table maintains the relationship chain between state instances, creating a directed graph of state transitions. This design enables reconstructing the complete execution path of a workflow and supports features like state rollback or replay.

Column Design Decision

Column	Design Decision
`state_instance_id`	References the current state instance in the chain. Part of the composite primary key with `previous_state_instance_id`.
`previous_state_instance_id`	References the state instance that preceded this one. Enables traversing the state history backwards.
`previous_state_instance_order`	INTEGER stores the order of this previous state in the chain. Supports workflows with parallel states or complex branching where a state may have multiple predecessors. Indexed for efficient forward traversal.

state_instance_id

References the current state instance in the chain. Part of the composite primary key with previous_state_instance_id.

previous_state_instance_id

References the state instance that preceded this one. Enables traversing the state history backwards.

previous_state_instance_order

INTEGER stores the order of this previous state in the chain. Supports workflows with parallel states or complex branching where a state may have multiple predecessors. Indexed for efficient forward traversal.

Design Rationale:

Composite Primary Key: The combination of state_instance_id and previous_state_instance_id ensures uniqueness while allowing multiple history entries per state (for parallel execution scenarios).
CASCADE Delete: Both foreign key relationships use CASCADE delete, ensuring history is automatically cleaned up when state instances are removed.
Order Tracking: The previous_state_instance_order field supports workflows with fork-join patterns where states may have multiple predecessors.

state_instance_owners

The state_instance_owners table implements a many-to-many relationship between state instances and owners. This design supports workflows where states can be assigned to multiple users or systems, enabling collaborative workflows and delegation.

Column Design Decision

Column	Design Decision
`state_instance_id`	Foreign key to `state_instance`. Part of the composite primary key.
`owner_id`	VARCHAR(64) identifies the owner (user ID, role, system identifier, etc.). The composite primary key ensures a state can have multiple owners while preventing duplicates. Indexed separately for efficient owner-based queries (e.g., "find all states assigned to user X").

state_instance_id

Foreign key to state_instance. Part of the composite primary key.

owner_id

VARCHAR(64) identifies the owner (user ID, role, system identifier, etc.). The composite primary key ensures a state can have multiple owners while preventing duplicates. Indexed separately for efficient owner-based queries (e.g., "find all states assigned to user X").

Design Rationale:

Many-to-Many Relationship: Separate table design allows multiple owners per state and multiple state assignments per owner.
CASCADE Delete: Ensures owner assignments are cleaned up when state instances are removed.
Owner Index: The idx_state_instance_owners_owner_id index optimizes queries for finding all states assigned to a specific owner, which is common in user-facing workflows.

persistent_attribute

The persistent_attribute table stores workflow-level data using an Entity-Attribute-Value (EAV) pattern. This flexible design allows workflows to store arbitrary data without schema changes, supporting dynamic workflows and varying data requirements.

Column Design Decision

Column	Design Decision
`workflow_instance_id`	Foreign key to `workflow_instance`. Part of the composite primary key with `attribute_name`. Note: No CASCADE delete, allowing attributes to persist independently if needed.
`attribute_name`	VARCHAR(255) identifies the attribute. Part of the composite primary key, ensuring one value per attribute per workflow instance.
`attribute_type`	VARCHAR(16) discriminator field indicating which value column contains the actual data. Enables type-safe storage and retrieval while maintaining a single table structure.
`string_value`, `date_value`, `data_value`, `float_value`, `number_value`	Type-specific columns store the actual attribute values. Only one column is populated per row based on `attribute_type`. This design avoids type coercion issues and enables efficient indexing on specific types.

workflow_instance_id

Foreign key to workflow_instance. Part of the composite primary key with attribute_name. Note: No CASCADE delete, allowing attributes to persist independently if needed.

attribute_name

VARCHAR(255) identifies the attribute. Part of the composite primary key, ensuring one value per attribute per workflow instance.

attribute_type

VARCHAR(16) discriminator field indicating which value column contains the actual data. Enables type-safe storage and retrieval while maintaining a single table structure.

string_value, date_value, data_value, float_value, number_value

Type-specific columns store the actual attribute values. Only one column is populated per row based on attribute_type. This design avoids type coercion issues and enables efficient indexing on specific types.

Design Rationale:

EAV Pattern: Provides schema flexibility without requiring migrations for new data types.
Type Safety: The discriminator field (attribute_type) ensures type consistency and enables efficient queries.
Storage Efficiency: Separate columns for different types allow appropriate indexing and storage optimization (e.g., BYTEA for binary data, NUMERIC for precise decimals).
No CASCADE Delete: Attributes can optionally outlive workflow instances for audit or reporting purposes.

Relationship Design

The schema uses foreign key relationships with strategic CASCADE delete policies:

workflow_instance → state_instance: CASCADE ensures all state instances are removed when a workflow is deleted, maintaining referential integrity.
state_instance → state_instance_history: CASCADE on both relationships ensures history is cleaned up with state instances.
state_instance → state_instance_owners: CASCADE maintains consistency when states are removed.
workflow_instance → persistent_attribute: No CASCADE allows attributes to persist independently if needed for auditing or analytics.

Support for Long-Running Workflows

The schema design specifically supports long-running workflows through several mechanisms:

State Persistence: The state_instance table with is_active flag allows workflows to persist in waiting states indefinitely, surviving system restarts.
Timestamp Tracking: started_at, completed_at, and due_at enable monitoring and timeout detection for long-running states.
Status Tracking: Granular status fields at both workflow and state levels enable detailed monitoring and recovery.
History Preservation: The state_instance_history table maintains complete execution trails, essential for debugging and auditing long-running processes.
Attribute Persistence: The persistent_attribute table ensures workflow data persists across restarts, critical for workflows that may run for hours or days.

Performance Considerations

The indexing strategy balances query performance with write overhead:

Workflow-level indexes support common administrative queries (tenant, definition, status).
State-level composite indexes optimize the critical path of finding active states.
Owner index enables efficient user-facing queries.
History indexes support both forward and backward traversal of state chains.

The schema avoids over-indexing while ensuring common query patterns remain efficient, even with large numbers of workflow instances and state transitions.