Using Leases for Safe Resource Coordination in Multi-Instance Applications

Consider a deployment in which a single process runs both a web service and a background job processor. The background processor works with shared resources, such as tasks stored in a database table that functions as a work queue. When several replicas of the application run at the same time, more than one replica may try to handle the same task, which can lead to race conditions or duplicate processing. The problem is harder to solve when the application is a monolith whose worker logic cannot be separated from the web service, or when the number of replicas cannot be reduced.

There are several ways to address this, but leases provide a straightforward and scalable method for managing shared resources. This article explains how leases can be used in multi-instance environments to support high availability and scalability in modern applications.

Available Approaches

Centralized Locks

A central service can be used to coordinate access to shared resources. Although this method can work well, it may also introduce a bottleneck or become a single point of failure.

Distributed Transactions

Another option is to use distributed transactions so updates across multiple instances happen atomically. However, this approach can be difficult to implement and hard to scale.

Leases

Leases provide temporary, time-limited access to resources. They are a practical and scalable answer to many shared resource management problems.

A lease grants access to a resource for a fixed period and removes that access automatically unless it is renewed before expiration. This helps avoid race conditions and ensures that resources are eventually released even if a process fails.

Core Characteristics of Leases

Time-Limited Access

A lease gives exclusive access for a defined amount of time.

Renewal Support

The current holder can extend access by renewing the lease before it expires.

Automatic Expiration

Resources are released automatically without requiring manual cleanup.

Failure Tolerance

If the lease holder crashes or stops responding, the resource becomes available again for another instance.
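
The four characteristics above can be illustrated with a minimal in-memory sketch. The LeaseTable class and its method names are hypothetical and exist only for illustration; the clock is passed in as a number so the behavior is easy to follow, and a real deployment would keep this state in shared storage rather than in process memory.

```typescript
// Hypothetical in-memory lease table, for illustration only.
interface HeldLease {
  holder: string;
  expiresAt: number; // milliseconds on the caller-supplied clock
}

class LeaseTable {
  private readonly leases = new Map<string, HeldLease>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grants the lease if the resource is free or the previous lease has expired.
  acquire(resource: string, holder: string, now: number): boolean {
    const current = this.leases.get(resource);
    if (current !== undefined && current.expiresAt > now) {
      return false; // still held
    }
    this.leases.set(resource, { holder, expiresAt: now + this.ttlMs });
    return true;
  }

  // Extends the lease, but only for the current holder and only before expiry.
  renew(resource: string, holder: string, now: number): boolean {
    const current = this.leases.get(resource);
    if (current === undefined || current.holder !== holder || current.expiresAt <= now) {
      return false;
    }
    current.expiresAt = now + this.ttlMs;
    return true;
  }
}

const leaseTable = new LeaseTable(30000);
leaseTable.acquire("task-1", "worker-a", 0);     // true: the resource was free
leaseTable.acquire("task-1", "worker-b", 10000); // false: worker-a still holds it
leaseTable.renew("task-1", "worker-a", 20000);   // true: now valid until 50000
leaseTable.acquire("task-1", "worker-b", 50000); // true: worker-a stopped renewing
```

The last call shows failure tolerance: nothing had to clean up after worker-a, because the lease simply ran out.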

Tools That Can Support Leases

Redis

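Redis can use time-to-live (TTL) values to manage leases. A key representing the resource is written with a TTL, typically with a set-if-absent option so that only one client succeeds (for example, SET with the NX and PX options), and automatic expiration makes the resource available again once the lease ends.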

Consul

Consul offers a distributed key-value store that can be used for lease-based coordination. Session-based locking can help manage shared resources and maintain a consistent state across multiple instances.

ZooKeeper

ZooKeeper provides primitives for lease handling in distributed systems, such as ephemeral nodes that disappear automatically when the client session ends. This enables resource coordination and leader election patterns for distributed workloads.

Benefits of Leases

  • Leases simplify resource handling by automating resource release.
  • They reduce the risk of deadlocks and split-brain conditions through time-based constraints, provided that holders stop working once their lease has expired.
  • They improve fault tolerance by allowing resources to be reclaimed after failures.

Example Application: Task Processor with Managed PostgreSQL

The sample application shows a complete implementation of a centralized lease service, a task management service, and task processors. It also includes a utility that creates synthetic work by generating random sleep intervals for processors to consume.

The following sections describe the different parts of the sample application and show how leases can be used inside a task processor service deployed on a managed application platform.

Database

PostgreSQL is used because it is a proven, fully ACID-compliant transactional database that provides strong guarantees. Rather than implementing ACID behavior manually, the complexity of the locking behavior is delegated to the database.

The application uses Prisma to manage database schema changes through migrations and to generate models for entities.

Working with the Database

Because Prisma abstractions do not cover every PostgreSQL-specific feature used by the application, raw queries are used where necessary through queryRaw.
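
The ${resource} and ${holder} placeholders in the SQL statements later in this article are template-literal interpolations. When queryRaw is used as a tagged template, the interpolated values are sent as bound parameters rather than spliced into the SQL text. The sketch below imitates that idea with a hypothetical sql tag; it is not Prisma's implementation, only a small model of how a tagged template separates text from values.

```typescript
// Hypothetical stand-in for a raw-query tag: it collects interpolated values
// as bound parameters instead of concatenating them into the SQL string.
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

function sql(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = strings[0] ?? "";
  for (let i = 0; i < values.length; i++) {
    // Each interpolation becomes a numbered PostgreSQL-style placeholder.
    text += `$${i + 1}` + (strings[i + 1] ?? "");
  }
  return { text, values };
}

const resourceName = "task-42'; DROP TABLE leases; --";
const query = sql`SELECT * FROM leases WHERE resource = ${resourceName}`;
// query.text   is "SELECT * FROM leases WHERE resource = $1"
// query.values is ["task-42'; DROP TABLE leases; --"]
```

Because the value travels separately from the statement text, a hostile resource name cannot change the structure of the query.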

Lease Service

This service is responsible for managing the lifecycle of the leases held by worker processes. It ensures that only one worker instance can hold a lease for a task or other resource at any given moment. This prevents race conditions and supports safe state changes.

The implementation is designed to be generic and does not contain knowledge about any specific task processing logic. The service handles lease creation, renewal, release, and retrieval through its API routes.

Leases Client

The leases client is an important part of the lease management layer. It communicates with the lease API to acquire, renew, and release leases. It provides a structured way to handle leases so resources are correctly reserved and extended when necessary. As an additional convenience, the client can optionally renew leases automatically.
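
As a rough sketch of what such a client sends, the helper below maps the three lease operations to the HTTP method, path, and body used by the lease routes documented in this article. The function name buildLeaseRequest and the LeaseRequest type are hypothetical; the actual client in the sample application may be structured differently.

```typescript
// Request shapes follow the lease routes described in this article; the
// helper name and the descriptor type are assumptions made for illustration.
type LeaseAction = "acquire" | "renew" | "release";

interface LeaseRequest {
  method: "POST" | "PUT" | "DELETE";
  path: string;
  body: { resource: string; holder: string };
}

function buildLeaseRequest(action: LeaseAction, resource: string, holder: string): LeaseRequest {
  const body = { resource, holder };
  switch (action) {
    case "acquire":
      return { method: "POST", path: "/api/leases", body };
    case "renew":
      return { method: "PUT", path: "/api/leases/renew", body };
    case "release":
      return { method: "DELETE", path: "/api/leases/release", body };
  }
}
```

A caller could pass the result to any HTTP library, sending the body as JSON. Keeping request construction separate from the network call also makes the client easy to test without a running lease service.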

Sample Application Database Schema

The following schema defines the leases table:

CREATE TABLE IF NOT EXISTS public.leases
(
    id integer NOT NULL DEFAULT nextval('leases_id_seq'::regclass),
    resource text COLLATE pg_catalog."default" NOT NULL,
    holder text COLLATE pg_catalog."default",
    created_at timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    renewed_at timestamp(3) without time zone,
    released_at timestamp(3) without time zone,
    expires_at timestamp(3) without time zone,
    CONSTRAINT leases_pkey PRIMARY KEY (id)
)

Sample Application API Routes

/api/leases

GET: Returns the status of the worker along with the full list of leases.

POST: Creates a new lease or updates an existing lease if that lease has already expired.

Request body:

{
    "resource": "resource_name",
    "holder": "holder_name"
}

How Lease Creation Works

The POST method in the lease service is designed to either insert a new lease or update an existing expired lease. The following SQL statement shows how that logic works:

INSERT INTO leases (resource, holder, expires_at)
                VALUES (${resource}, ${holder}, NOW() + INTERVAL '30 seconds')
            ON CONFLICT (resource) 
            DO UPDATE 
                SET 
                    holder = ${holder},
                    created_at = NOW(),
                    renewed_at = null,
                    released_at = null,
                    expires_at = NOW() + INTERVAL '30 seconds'
            WHERE leases.expires_at <= NOW()
            RETURNING *;

Insert a New Lease: The INSERT INTO leases statement tries to create a new row with the given resource, holder, and an expiration time set to 30 seconds after the current timestamp.

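Handle Conflicts: The ON CONFLICT (resource) clause defines what happens if a lease already exists for the same resource. PostgreSQL requires a unique constraint or unique index on the resource column for this clause to work. The table definition shown above lists only the primary key on id, so a unique index on resource must exist in addition to what is shown.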

Update an Existing Lease: The DO UPDATE block replaces the holder, resets created_at to the current time, clears renewed_at and released_at, and sets a new expiration time 30 seconds into the future.

Apply the Update Conditionally: The condition WHERE leases.expires_at <= NOW() ensures that only expired leases are replaced. Active leases remain untouched.

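Return the Result: The RETURNING * clause gives back the inserted or updated lease record. If an unexpired lease already exists, the WHERE condition prevents the update and the statement returns no rows, which tells the caller that the lease was not acquired.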

Why Use This Lease Creation Pattern?

Atomic Behavior: A single SQL statement combined with ON CONFLICT ensures that the insert-or-update behavior happens atomically, reducing the chance of race conditions.

Conflict Handling: If multiple requests try to create a lease for the same resource at the same time, the conflict logic ensures that only one active lease remains for that resource.

Expiration Control: The condition WHERE leases.expires_at <= NOW() makes sure only expired records are replaced, preserving active leases.

Efficiency: Combining insertion and update logic into one SQL statement reduces the total number of database operations.

This design allows the lease service to create and refresh leases reliably while preserving system integrity and avoiding race conditions.
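
To make the outcome of the statement concrete, here is a small in-memory model of its behavior. It is a sketch for reasoning about the three cases (no row yet, expired row, active row) and does not replace the SQL; the function and type names are hypothetical, and timestamps are plain millisecond numbers.

```typescript
// In-memory model of the INSERT ... ON CONFLICT statement (illustrative only).
interface LeaseRow {
  resource: string;
  holder: string;
  createdAt: number;
  renewedAt: number | null;
  releasedAt: number | null;
  expiresAt: number;
}

const LEASE_TTL_MS = 30000; // mirrors INTERVAL '30 seconds'

// Returns the new row, or null when an unexpired lease blocks the request
// (the SQL statement returns zero rows in that case).
function acquireLease(
  table: Map<string, LeaseRow>,
  resource: string,
  holder: string,
  now: number
): LeaseRow | null {
  const existing = table.get(resource);
  if (existing !== undefined && existing.expiresAt > now) {
    return null; // WHERE leases.expires_at <= NOW() did not match
  }
  // Either a fresh insert or a takeover that resets every lifecycle field.
  const row: LeaseRow = {
    resource,
    holder,
    createdAt: now,
    renewedAt: null,
    releasedAt: null,
    expiresAt: now + LEASE_TTL_MS,
  };
  table.set(resource, row);
  return row;
}
```

If two holders call acquireLease for the same resource, the first receives a row and the second receives null until the lease expires, which matches the single-winner behavior described above.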

/api/leases/active

GET: Returns all active leases.

/api/leases/expired

GET: Returns all expired leases.

/api/leases/renew

PUT: Extends the expiration time of an existing lease.

Request body:

{
    "resource": "resource_name",
    "holder": "holder_name"
}

How Lease Renewal Works

The PUT method in the lease service renews an existing lease by pushing its expiration further into the future. The logic is implemented like this:

UPDATE leases
    SET 
        renewed_at = NOW(),
        expires_at = NOW() + INTERVAL '30 seconds'
    WHERE
        holder = ${holder} 
        AND resource = ${resource}
        AND expires_at > NOW()
        AND released_at is null
    RETURNING *;

Update Statement: The UPDATE leases statement modifies an existing row in the leases table.

Set Clause: renewed_at = NOW() records the current renewal timestamp, while expires_at = NOW() + INTERVAL '30 seconds' extends the lease by another 30 seconds.

Where Clause:

holder = ${holder} ensures the correct holder is renewing the lease.

resource = ${resource} ensures the correct resource is matched.

expires_at > NOW() ensures that only unexpired leases are renewed.

released_at is null ensures that only unreleased leases are updated.

Returning Clause: RETURNING * returns the updated lease record so the caller can confirm the result.
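
The four WHERE conditions can be read as a single predicate. The sketch below restates them over a plain object so each condition lines up with its SQL counterpart; the names are hypothetical and timestamps are millisecond numbers rather than database timestamps.

```typescript
// Illustrative model of the renewal UPDATE; not the service's actual code.
interface LeaseRecord {
  resource: string;
  holder: string;
  renewedAt: number | null;
  releasedAt: number | null;
  expiresAt: number;
}

const RENEW_TTL_MS = 30000; // mirrors INTERVAL '30 seconds'

// Returns the updated record, or null when no row would match the WHERE clause.
function renewLease(
  lease: LeaseRecord,
  resource: string,
  holder: string,
  now: number
): LeaseRecord | null {
  const matches =
    lease.holder === holder &&     // holder = ${holder}
    lease.resource === resource && // resource = ${resource}
    lease.expiresAt > now &&       // expires_at > NOW()
    lease.releasedAt === null;     // released_at is null
  if (!matches) {
    return null; // zero rows updated
  }
  return { ...lease, renewedAt: now, expiresAt: now + RENEW_TTL_MS };
}
```

A null result means the caller no longer holds a valid lease and should stop working on the resource instead of retrying the renewal.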

/api/leases/release

DELETE: Releases a lease. The release is performed as a single UPDATE statement that sets released_at and expires_at on the lease row.

Why Lease Releases Matter

Atomic Update: The release query performs an atomic update, so the release happens in a single step, which reduces the chance of race conditions.

Conditional Update: The WHERE clause ensures that only leases not already marked as released are updated.

Timestamp Accuracy: Updating released_at and expires_at makes it possible to track both release time and expiration time precisely.

Immediate Feedback: The RETURNING * clause provides the updated record immediately so the application can respond with current lease information.
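
The release statement itself is not reproduced in this article, so the following is only a sketch under stated assumptions: that a release requires a matching holder and sets both released_at and expires_at to the release time. Under those assumptions, a released lease immediately satisfies the expires_at <= NOW() condition used by the creation statement, so another holder can take over without waiting for the original expiration.

```typescript
// Assumed release semantics, modeled in memory; names are hypothetical.
interface ReleasableLease {
  resource: string;
  holder: string;
  releasedAt: number | null;
  expiresAt: number;
}

// Returns the released record, or null if the lease does not match the
// caller or has already been released.
function releaseLease(
  lease: ReleasableLease,
  resource: string,
  holder: string,
  now: number
): ReleasableLease | null {
  if (lease.resource !== resource || lease.holder !== holder || lease.releasedAt !== null) {
    return null;
  }
  return { ...lease, releasedAt: now, expiresAt: now };
}

// The same test the creation statement applies: expires_at <= NOW().
function canBeTakenOver(lease: ReleasableLease, now: number): boolean {
  return lease.expiresAt <= now;
}
```

Releasing explicitly is therefore an optimization: expiration alone would eventually free the resource, but a release frees it as soon as the holder is done.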

/api/leases/renewed

GET: Returns all renewed leases.

/api/leases/released

GET: Returns all released leases.

/api/leases/[id]

GET: Returns a single lease identified by its ID.

DELETE: Releases a lease identified by its ID. The behavior matches the DELETE /api/leases/release route, except that the lease ID is also included in the WHERE clause of the update statement.

Request body:

{
    "resource": "resource_name",
    "holder": "holder_name"
}

Task Service

The task service manages tasks inside the application. It exposes API endpoints for creating, updating, retrieving, and managing the lifecycle of tasks. It also encapsulates the leases API so that a single task can only be processed by one worker at a time. Naturally, this depends on workers respecting the API rules and stopping work once a lease has expired.

Task Service Database Schema

The following schema defines the tasks table:

CREATE TABLE IF NOT EXISTS public.tasks
(
    id integer NOT NULL DEFAULT nextval('tasks_id_seq'::regclass),
    task_data jsonb NOT NULL,
    scheduled_at timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processor text COLLATE pg_catalog."default",
    last_heartbeat_at timestamp(3) without time zone,
    must_heartbeat_before timestamp(3) without time zone,
    started_at timestamp(3) without time zone,
    processed_at timestamp(3) without time zone,
    task_output jsonb,
    CONSTRAINT tasks_pkey PRIMARY KEY (id)
)

Task Service API Routes

Get the Next Task

POST /api/tasks/next

The POST method on this route retrieves the next available task for processing while ensuring that a lease is acquired for that task.

The query below is used to fetch the next task that is ready to be processed:

SELECT * 
FROM tasks 
WHERE 
    processed_at is null
ORDER BY scheduled_at ASC
LIMIT 1
OFFSET ${tasksToSkip}
FOR UPDATE;

SELECT * FROM tasks

This part of the query selects all columns from the tasks table.

WHERE processed_at is null

This condition limits the result set to tasks that have not yet been processed. If processed_at is null, the task is still pending.

ORDER BY scheduled_at ASC

This sorts tasks by their scheduled_at timestamp in ascending order so that earlier tasks are handled first.

LIMIT 1

This ensures that only one matching task is returned.

OFFSET ${tasksToSkip}

This skips a configurable number of tasks before returning a result. The ${tasksToSkip} value can be adjusted dynamically, which is useful when the first tasks inspected are already leased or otherwise unavailable.

FOR UPDATE

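This clause locks the selected row for update. Until the current transaction finishes, other transactions cannot update or delete the row, and their own SELECT ... FOR UPDATE on it will wait. Plain SELECT statements without a locking clause can still read the row. The lock ensures that another processor cannot claim the task through the same query while it is being assigned.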

Why Use the Get-Next-Task Query?

Prioritization: Sorting by scheduled_at ensures that tasks scheduled earlier are processed before newer ones.

Concurrency Control: The FOR UPDATE clause is essential for preventing race conditions. Once a task is selected, it is locked for the transaction, preventing other processors from taking it at the same time and allowing a lease to be acquired safely.

Dynamic Skipping: The OFFSET ${tasksToSkip} clause makes it possible to skip tasks that are already leased or unavailable, helping the system locate the next usable task efficiently.
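
The interplay between the query and the tasksToSkip offset can be sketched as a loop: select the oldest pending task at the current offset, try to lease it, and move the offset forward if the lease is refused. The model below works on an in-memory array and takes the lease attempt as a callback; both function names are hypothetical, and the real route may organize these steps differently.

```typescript
interface PendingTask {
  id: number;
  scheduledAt: number;
  processedAt: number | null;
}

// Mirrors: WHERE processed_at IS NULL ORDER BY scheduled_at ASC LIMIT 1 OFFSET n
function selectNextTask(tasks: PendingTask[], tasksToSkip: number): PendingTask | undefined {
  return tasks
    .filter((t) => t.processedAt === null) // filter() copies, so the input is not reordered
    .sort((a, b) => a.scheduledAt - b.scheduledAt)[tasksToSkip];
}

// Hypothetical claim loop: advance the offset whenever the lease is refused.
function claimNextTask(
  tasks: PendingTask[],
  tryAcquire: (taskId: number) => boolean
): PendingTask | undefined {
  for (let skip = 0; ; skip++) {
    const candidate = selectNextTask(tasks, skip);
    if (candidate === undefined) {
      return undefined; // no pending task left to try
    }
    if (tryAcquire(candidate.id)) {
      return candidate; // lease granted, this worker owns the task
    }
  }
}
```

The loop ends either with a leased task or with undefined, which corresponds to the worker waiting briefly before asking again.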

Heartbeat Task

PUT /api/tasks/[id]/heartbeat

The PUT route updates a task heartbeat. This confirms that the correct processor is still handling the task and renews the lease so another processor does not pick it up. If a task processor fails to send heartbeats in time, the underlying lease expires, and the processor must abandon its current work and obtain a new task.

Transaction

The code runs inside a transaction using Prisma’s $transaction method so atomicity and consistency are preserved.

Task Retrieval

Within that transaction, the task with the specified ID is retrieved using a raw SQL query that includes FOR UPDATE so the row is locked.

Processor Validation

The service checks whether the task is assigned to the given processor. If not, it returns a 200 OK response with a message stating that the task is not assigned to that processor.

Processed Check

The service checks whether the task has already been processed. If it has, it returns a 409 Conflict response with a message stating that the task has already been processed.

Lease Renewal

The service sends a PUT request to the lease service in order to renew the lease associated with the task.

If lease renewal fails with a status other than 404, the route returns a 500 Internal Server Error response with a suitable error message.

If the lease has already expired and the response status is 404, the route returns a 409 Conflict response explaining that the task lease has expired.

Task Update

If the lease renewal succeeds, the task’s lastHeartBeatAt and mustHeartBeatBefore fields are updated using the renewed lease timestamps.

The route then returns the updated task with a 202 Accepted status.
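
The sequence of checks can be summarized as a small decision function that returns the HTTP status described in each step. This is a sketch of the control flow only: the function and type names are hypothetical, the lease renewal is represented by a callback that returns the lease service's status code, and any 2xx status is assumed to mean success.

```typescript
interface HeartbeatCheck {
  assignedProcessor: string | null; // the task's processor column
  processedAt: string | null;       // the task's processed_at column
}

// Returns the status code the heartbeat route responds with.
function heartbeatResponse(
  task: HeartbeatCheck,
  processor: string,
  renewLease: () => number
): number {
  if (task.assignedProcessor !== processor) {
    return 200; // task is not assigned to this processor
  }
  if (task.processedAt !== null) {
    return 409; // task has already been processed
  }
  const renewStatus = renewLease(); // the lease service is contacted only now
  if (renewStatus === 404) {
    return 409; // lease has expired
  }
  if (renewStatus < 200 || renewStatus >= 300) {
    return 500; // renewal failed for another reason
  }
  return 202; // heartbeat accepted, task timestamps are updated
}
```

Ordering the cheap local checks before the renewal call means the lease service is not contacted for requests that would be rejected anyway.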

Why Use the Heartbeat Route?

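Atomic Operations: The transaction ensures that the task checks and the heartbeat update are applied atomically. The lease renewal itself is a separate request to the lease service, made while the task row is locked, so it is not rolled back if the transaction later fails.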

Concurrency Control: The FOR UPDATE clause locks the task row so other transactions cannot modify it at the same time.

Lease Management: Renewing the lease ensures that the task remains assigned to the same processor and prevents another processor from claiming it.

Complete Task

PUT /api/tasks/[id]/complete

The PUT route marks a task as completed. It verifies that the task belongs to the correct processor, renews the lease to avoid reassignment during completion, and updates task state accordingly.

Transaction

The logic runs inside a transaction by using Prisma’s $transaction method to preserve consistency and atomicity.

Task Retrieval

Inside the transaction, the task with the given ID is fetched through a raw SQL query that uses FOR UPDATE to lock the row.

If the task cannot be found, the route returns a 200 OK response stating that the task does not exist.

Processor Validation

The service checks whether the task belongs to the processor provided in the request. If not, it returns a 200 OK response indicating that the task is not assigned to that processor.

Processed Check

If the task has already been processed, the route returns a 409 Conflict response stating that the task is already complete.

Lease Renewal

The service sends a PUT request to the lease service to renew the task lease.

If renewal fails with a status other than 404, the route returns a 500 Internal Server Error response with a corresponding message.

If the lease has expired and the response status is 404, the route returns a 409 Conflict response indicating that the task lease has expired.

Task Update

If lease renewal is successful, the task’s lastHeartBeatAt, mustHeartBeatBefore, processedAt, and taskOutput fields are updated using the renewed lease timestamps and the provided task output.

The route then returns the updated task with a 202 Accepted status.
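
The final update can be pictured as a pure function from the current task, the renewed lease, and the worker's output to the new task state. The sketch assumes that processedAt is taken from the lease's renewal timestamp; the article only states that the renewed lease timestamps are used, so that mapping is an assumption, as are the type and function names.

```typescript
interface RenewedLeaseTimes {
  renewedAt: string; // ISO timestamp returned by the lease service
  expiresAt: string;
}

interface TaskFields {
  lastHeartBeatAt: string | null;
  mustHeartBeatBefore: string | null;
  processedAt: string | null;
  taskOutput: unknown;
}

// Builds the task state written at completion (assumed field mapping).
function applyCompletion(
  task: TaskFields,
  lease: RenewedLeaseTimes,
  output: unknown
): TaskFields {
  return {
    ...task,
    lastHeartBeatAt: lease.renewedAt,
    mustHeartBeatBefore: lease.expiresAt,
    processedAt: lease.renewedAt, // assumption: completion time equals renewal time
    taskOutput: output,
  };
}
```

Once processedAt is set, the task no longer matches the processed_at is null filter of the get-next-task query, so no other worker will be offered it.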

Why Use the Complete-Task Route?

Atomic Operations: The transaction ensures that the task checks and the completion update are applied atomically. The lease renewal is a separate request to the lease service and is not part of the database transaction.

Concurrency Control: The FOR UPDATE clause locks the task row so it cannot be modified simultaneously by another transaction.

Lease Management: Renewing the lease ensures that the task stays assigned to the processor while completion is being finalized. This is especially useful when the SQL update runs into temporary issues and the operation must be retried after the row lock is released, without risking lease expiration during the retry window.

The remaining task routes use simple queries to read from the task list:

  • Get Task by ID (/api/tasks/[id])
  • Get Processed Tasks (/api/tasks/processed)
  • Get Started Tasks (/api/tasks/started)
  • Get All Tasks (/api/tasks)

Task Worker

The task worker continuously retrieves, processes, and completes tasks through the task service. It keeps its lease alive by sending regular heartbeats and abandons a task when the lease can no longer be renewed.

How the Worker Executes Tasks and Manages Leases

The worker follows a structured loop in which it repeatedly fetches and processes tasks while using a lease mechanism to ensure safe execution. This prevents several workers from processing the same task at the same time and helps ensure reliable completion.

Fetching a Task

The worker begins by querying the task queue to retrieve the next available task. If no task is available, it waits briefly before trying again. This reduces unnecessary load on the queue while still allowing new tasks to be picked up quickly.

Lease Management with Heartbeats

After a task is assigned, the worker starts a heartbeat loop to keep its lease active. The lease prevents other workers from claiming the same task while it is being processed. The worker must send heartbeat signals at intervals to renew the lease. If it does not, the lease expires and the task becomes available for reassignment.

  • If the lease expires, the worker stops processing and abandons the task.
  • If the lease is renewed, the worker continues and may occasionally simulate high latency to test resilience.

Task Execution and Failure Handling

During execution, the worker processes the task in steps. It introduces random failures with a 5% probability to simulate unexpected crashes.

It also introduces latency spikes with a 10% probability to simulate network delays or performance bottlenecks that may cause heartbeats to be missed.

If the worker hits a simulated failure, it exits immediately so the task can be retried later. Otherwise, it keeps running until the task is complete.
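
The execution loop with injected faults can be sketched as follows. This is a simplified, synchronous model: the real worker sends heartbeats on a timer, whereas here one heartbeat is attempted per processing step. The random source and the heartbeat are passed in as callbacks so the behavior is reproducible; all names are hypothetical.

```typescript
type TaskOutcome = "completed" | "crashed" | "lease-lost";

const FAILURE_PROBABILITY = 0.05;      // simulated crash, per step
const LATENCY_SPIKE_PROBABILITY = 0.1; // simulated delay, per step

// Processes a task in `steps` steps. `random` returns values in [0, 1);
// `heartbeat` receives whether this step was delayed and returns false once
// the lease can no longer be renewed.
function runTask(
  steps: number,
  random: () => number,
  heartbeat: (delayed: boolean) => boolean
): TaskOutcome {
  for (let step = 0; step < steps; step++) {
    if (random() < FAILURE_PROBABILITY) {
      return "crashed"; // exit immediately; the lease will expire on its own
    }
    const delayed = random() < LATENCY_SPIKE_PROBABILITY;
    if (!heartbeat(delayed)) {
      return "lease-lost"; // abandon the task, another worker may claim it
    }
  }
  return "completed";
}
```

In both failure outcomes the worker does nothing to clean up. Recovery relies entirely on the lease expiring so that the task becomes available again.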

Completing the Task

Once processing finishes, the worker marks the task as complete so the system can remove it from the queue.

  • It stops the heartbeat timer.
  • It then immediately looks for another task and restarts the loop.

This design has several useful properties:

  • Only one worker handles a task at any given time.
  • Lease expiration provides automatic recovery.
  • If the application crashes, the hosting platform restarts the process.
  • Periodic heartbeats help prevent abandoned tasks from remaining stuck or unprocessed.
  • Simulated failures and latency make it possible to test system resilience. Latency is specifically included to simulate missed heartbeats.

By following this pattern, the system achieves fault-tolerant distributed task execution that can scale efficiently.

Task Generator

Testing how workers behave under realistic conditions is important. The task generator is intended to simulate workload by continuously creating tasks with random execution times. These tasks represent real types of work a worker might handle, such as image resizing, video transcoding, email delivery, or any other activity that must be completed reliably.

By using a lease-based design, only one instance of the generator can actively produce tasks at a given time. This avoids duplication while still allowing multiple instances to be deployed. The setup helps observe system behavior, validate worker performance, and reveal possible issues in task execution and lease handling.

Handling Task Generation in a Load-Balanced Environment

In a distributed system where several instances of a service run behind a load balancer, maintaining a singleton process is difficult. A singleton process is one that should be active in only one instance at any given time. The task generator demonstrates this challenge: only one instance can actively create tasks, but every instance still answers status requests, so the reported status depends on which instance receives the request.

How the Task Generator Works

Ensuring Only One Instance Runs at a Time

  • The generator acquires a lease that acts like a lock, allowing only one instance to actively create tasks.
  • Other instances stay idle but still respond to API requests.

Continuously Creating Tasks

  • The generator inserts new tasks into a PostgreSQL table. Each task contains a JSON object such as {"sleep_duration_seconds": 5} that defines how long processing should take.
  • Those tasks are then consumed by worker processes.
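
A task payload of this shape could be produced as shown below. The upper bound of 10 seconds and the function name are assumptions chosen for illustration; the article does not state the range the generator actually uses. The random source is injected so the result is reproducible.

```typescript
interface NewTask {
  task_data: { sleep_duration_seconds: number };
}

// Creates a task whose sleep duration is a whole number of seconds
// between 1 and maxSeconds (inclusive). `random` returns values in [0, 1).
function makeSyntheticTask(random: () => number, maxSeconds: number = 10): NewTask {
  const seconds = 1 + Math.floor(random() * maxSeconds);
  return { task_data: { sleep_duration_seconds: seconds } };
}

// In normal use the random source would simply be Math.random.
const exampleTask = makeSyntheticTask(Math.random);
```

The task_data object corresponds to the jsonb column of the same name in the tasks table.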

Handling Load-Balanced Requests

  • Because API requests may be handled by different instances, responses can differ depending on which instance receives the request.
  • Only the instance that currently holds the lease reports that it is generating tasks. All other instances remain passive.

Why Status Can Fluctuate

When checking generator status, inconsistent responses may appear.

  • One request may return STARTED, meaning the request reached the instance holding the lease.
  • Another request may return STOPPED, because it reached an instance that does not hold the lease.
  • Pressing Stop may not work right away because the request may be handled by an instance that does not currently hold the lease. Several attempts may be required before the lease holder receives the request.
  • This behavior demonstrates a fundamental challenge of running singleton-style services in load-balanced environments.

How to Stop the Generator

To stop the generator, the stop request must reach the instance currently holding the lease. Because the load balancer may route each request to any instance:

  • Stopping may take several attempts until the lease-holding instance receives the request.
  • A more dependable method would be to use the centrally managed lease information, identify which instance holds the lease through the lease service, and then direct the stop request to that specific instance.
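
The first half of that more dependable method, identifying the current holder, can be sketched as a lookup over lease records such as those returned by the lease service. The resource name "task-generator" and the function name are hypothetical, and the sketch assumes that an active lease is one that is unreleased and not yet expired.

```typescript
interface LeaseInfo {
  resource: string;
  holder: string | null;
  expiresAt: number; // milliseconds since epoch
  releasedAt: number | null;
}

// Returns the holder of the active lease for a resource, or null if none.
function findActiveHolder(leases: LeaseInfo[], resource: string, now: number): string | null {
  const active = leases.find(
    (l) => l.resource === resource && l.releasedAt === null && l.expiresAt > now
  );
  return active?.holder ?? null;
}

const knownLeases: LeaseInfo[] = [
  { resource: "task-generator", holder: "instance-2", expiresAt: 60000, releasedAt: null },
];
const generatorHolder = findActiveHolder(knownLeases, "task-generator", 30000);
// generatorHolder is "instance-2"
```

Directing the stop request to that instance still requires a way to address individual instances, which depends on what the hosting platform exposes.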

Deploying the Task Processor

The full application is available in the GitHub repository. To deploy it on a managed application platform:

  • Open the platform dashboard or use the repository’s deployment button described in the README.
  • Provision two managed PostgreSQL development databases to store tasks and leases.
  • The task generator service creates the required database tables automatically and fills the task queue.

Note: Make sure to remove the application after testing is complete. Otherwise, ongoing usage may generate actual charges on your account.

Conclusion

Leases offer an effective way to manage shared resources in multi-instance environments. By implementing leases with PostgreSQL, it is possible to provide safe, time-limited resource access, automatic expiration, and easier recovery from failures. Managed database services make this approach straightforward and dependable to adopt. Testing the sample application is a practical way to experience the advantages of lease-based resource coordination directly.

Source: digitalocean.com
