Object Lifecycle Management
Use MinIO Object Lifecycle Management to create rules for time- or date-based automatic transition or expiry of objects. For object transition, MinIO automatically moves the object to a configured remote storage tier. For object expiry, MinIO automatically deletes the object.
MinIO derives its behavior and syntax from S3 lifecycle for compatibility in migrating workloads and lifecycle rules from S3 to MinIO. For example, you can export S3 lifecycle management rules and import them into MinIO or vice versa. MinIO uses JSON to describe lifecycle management rules and may require conversion to or from XML as part of importing S3 lifecycle rules.
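The XML-to-JSON conversion mentioned above can be sketched with the Python standard library. This is a minimal illustration, not a complete converter: the sample rule, the `INTEGER_FIELDS` set, and the function names are assumptions made for the example, and it handles only simple rules where no element repeats inside a single rule. Verify the resulting JSON against what your MinIO version accepts before importing it.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical S3 lifecycle configuration used only for illustration.
S3_XML = """<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>expire-logs</ID>
    <Status>Enabled</Status>
    <Filter><Prefix>logs/</Prefix></Filter>
    <Expiration><Days>365</Days></Expiration>
  </Rule>
</LifecycleConfiguration>"""

# Leaf elements that should become JSON numbers rather than strings (assumed set).
INTEGER_FIELDS = {"Days", "NoncurrentDays"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def element_to_value(elem):
    """Convert an element to a string, an int, or a nested dict.

    Limitation: repeated child elements with the same name overwrite each other.
    """
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        return int(text) if local_name(elem.tag) in INTEGER_FIELDS else text
    return {local_name(child.tag): element_to_value(child) for child in children}


def lifecycle_xml_to_json(xml_text):
    """Return a JSON document with one entry in "Rules" per <Rule> element."""
    root = ET.fromstring(xml_text)
    rules = [element_to_value(rule) for rule in root if local_name(rule.tag) == "Rule"]
    return json.dumps({"Rules": rules}, indent=2)


print(lifecycle_xml_to_json(S3_XML))
```

Boolean fields and rules with several transitions would need extra handling that this sketch leaves out.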
Object Transition (“Tiering”)
MinIO supports creating object transition lifecycle management rules, where MinIO can automatically move an object to a remote storage “tier”. MinIO supports any of the following remote tier targets:
MinIO object transition supports use cases like moving aged data from MinIO clusters in private or public cloud infrastructure to low-cost private or public cloud storage solutions. MinIO manages retrieving tiered objects on-the-fly without any additional application-side logic.
Use the mc ilm tier add command to create a remote target for tiering data. You can then use the mc ilm rule add --transition-days command to transition objects to that tier after a specified number of calendar days.
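For orientation, a transition rule expressed in S3-style lifecycle JSON might look like the output of the sketch below. The rule ID, the `logs/` prefix, the 90-day period, and the tier name `WARM` are hypothetical values, and the assumption that the S3 `StorageClass` field carries the MinIO remote tier name should be checked against a rule exported from your own deployment.

```python
import json


def transition_rule(rule_id, tier_name, days, prefix=""):
    """Build one S3-style lifecycle rule that transitions current object versions."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        # The S3 schema calls this field StorageClass; it is assumed here
        # to hold the name of the remote tier configured on MinIO.
        "Transition": {"Days": days, "StorageClass": tier_name},
    }


# Hypothetical example: move objects under logs/ to the WARM tier after 90 days.
config = {"Rules": [transition_rule("tier-after-90d", "WARM", 90, prefix="logs/")]}
print(json.dumps(config, indent=2))
```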
New in version RELEASE.2022-11-10T18-20-21Z.
You can verify the tiering status of an object using mc ls against the bucket or bucket prefix. The output includes the storage tier of each object:
$ mc ls play/mybucket
[2022-11-08 11:30:24 PST]  52MB STANDARD log-data.csv
[2022-11-09 12:20:18 PST] 120MB WARM event-2022-11-09.mp4
STANDARD marks objects stored on the MinIO deployment.
WARM marks objects stored on the remote tier with matching name.
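If you want to summarize which objects sit on which tier, the listing can be post-processed. The sketch below assumes the line layout shown in the sample output above (timestamp in brackets, size, tier, object name); the regular expression and function name are illustrative, and lines that do not match this layout, such as prefix entries without a tier column, are skipped.

```python
import re
from collections import defaultdict

# Assumed layout: [timestamp] size tier object-name
LS_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<size>\S+)\s+(?P<tier>\S+)\s+(?P<name>.+)$"
)


def objects_by_tier(ls_output):
    """Group object names by the storage tier column of an mc ls listing."""
    tiers = defaultdict(list)
    for line in ls_output.splitlines():
        match = LS_LINE.match(line.strip())
        if match:  # ignore lines that do not follow the assumed layout
            tiers[match["tier"]].append(match["name"])
    return dict(tiers)


# Sample listing copied from the output shown above.
SAMPLE = """\
[2022-11-08 11:30:24 PST]  52MB STANDARD log-data.csv
[2022-11-09 12:20:18 PST] 120MB WARM event-2022-11-09.mp4
"""

print(objects_by_tier(SAMPLE))
```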
Exclusive Access to Remote Data
MinIO requires exclusive access to the transitioned data on the remote storage tier. MinIO ignores any objects in the remote bucket or bucket prefix not explicitly managed by the MinIO deployment. Automatic transition and transparent object retrieval depend on the following assumptions:
No external mutation, migration, or deletion of objects on the remote storage.
No lifecycle management rules (e.g. transition or expiration) on the remote storage bucket.
MinIO stores all transitioned objects in the remote storage bucket or resource under a unique per-deployment prefix value. This value is not intended to support identifying the source deployment from the backend. MinIO supports an additional optional human-readable prefix when configuring the remote target, which may facilitate operations related to diagnostics, maintenance, or disaster recovery.
MinIO recommends specifying this optional prefix for remote storage tiers which contain other data, including transitioned objects from other MinIO deployments. This tutorial includes the necessary syntax for setting this prefix.
Availability of Remote Data
MinIO tiering behavior depends on the remote storage returning objects immediately (milliseconds to seconds) upon request. MinIO therefore cannot support remote storage which requires rehydration, wait periods, or manual intervention.
MinIO creates metadata for each transitioned object that identifies its location on the remote storage. Applications cannot trivially identify and access a transitioned object independent of MinIO. Availability of the transitioned data therefore depends on the same core protections that erasure coding and distributed deployment topologies provide for all objects on the MinIO deployment. Using object transition does not provide any additional business continuity or disaster recovery benefits.
Workloads that require BC/DR protections should implement MinIO Server-Side replication. Replication ensures objects remain preserved on the remote replication site, such that you can resynchronize from the remote in the event of partial or total data loss. See Resynchronization (Disaster Recovery) for more complete documentation on using replication to recover after partial or total data loss.
MinIO adopts S3 behavior for transition rules on versioned buckets. Specifically, MinIO by default applies the transition operation to the current object version.
To transition noncurrent object versions, specify the --noncurrent-transition-days and --noncurrent-transition-tier options when creating the transition rule.
MinIO lifecycle management supports expiring objects on a bucket.
Object “expiration” involves performing a DELETE operation on the object.
For example, you can create a lifecycle management rule to expire any object older than 365 days.
Use mc ilm rule add --expire-days to expire objects after a specified number of calendar days.
For buckets with replication configured, MinIO does not replicate objects deleted by a lifecycle management expiration rule. See Replication of Delete Operations for more information.
MinIO adopts S3 behavior for expiration rules on versioned buckets. MinIO has two specific default behaviors for versioned buckets:
MinIO applies the expiration option to only the current object version by creating a DeleteMarker, as is normal with versioned delete. To expire noncurrent object versions, specify the --noncurrent-expire-days option when creating the expiration rule.
MinIO does not expire DeleteMarkers even if no other versions of that object exist. To expire delete markers when there are no remaining versions for that object, specify the --expire-delete-marker option when creating the expiration rule.
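The three expiration behaviors described above can be pictured as S3-style lifecycle JSON. The sketch below is an assumption-laden illustration: the rule IDs, periods, and the mapping of the mc options to the S3 fields `Expiration.Days`, `NoncurrentVersionExpiration.NoncurrentDays`, and `Expiration.ExpiredObjectDeleteMarker` are not taken from this page and should be confirmed against a rule exported from your deployment. Delete marker cleanup sits in its own rule because the S3 schema does not allow `ExpiredObjectDeleteMarker` together with `Days` in one `Expiration` element.

```python
import json


def expiration_rules(expire_days, noncurrent_days, prefix=""):
    """Build an S3-style configuration covering current versions,
    noncurrent versions, and delete markers with no remaining versions."""
    return {
        "Rules": [
            {
                "ID": "expire-current-and-noncurrent",  # hypothetical ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Current version: expiring it creates a delete marker.
                "Expiration": {"Days": expire_days},
                # Noncurrent versions: removed after the given number of days.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            },
            {
                "ID": "expire-orphaned-delete-markers",  # hypothetical ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Delete markers that have no other versions left.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            },
        ]
    }


# Hypothetical periods: 365 days for current versions, 30 for noncurrent ones.
print(json.dumps(expiration_rules(365, 30), indent=2))
```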
Lifecycle Management Object Scanner
MinIO uses a built-in scanner to actively check objects against all configured lifecycle management rules. The scanner is a low-priority process that yields to high I/O workloads to prevent performance spikes triggered by rule timing. The scanner may therefore not detect an object as eligible for a configured transition or expiration lifecycle rule until after the lifecycle rule period has passed.
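The gap between when a rule period ends and when the scanner acts can be illustrated with a toy model. Everything in this sketch is hypothetical: the scanner pass times are invented, the function names are illustrative, and the model ignores any rounding MinIO may apply when computing eligibility. It only shows that the action happens on the first scanner pass at or after the eligibility time, not at the eligibility time itself.

```python
from datetime import datetime, timedelta, timezone


def earliest_eligible(last_modified, rule_days):
    """Time at which the object has aged past the rule period (no rounding)."""
    return last_modified + timedelta(days=rule_days)


def first_scan_at_or_after(eligible_at, scan_times):
    """First scanner pass that could observe the object as eligible, or None."""
    candidates = [t for t in sorted(scan_times) if t >= eligible_at]
    return candidates[0] if candidates else None


# Hypothetical object and rule: expire 365 days after last modification.
modified = datetime(2022, 11, 8, 11, 30, tzinfo=timezone.utc)
eligible = earliest_eligible(modified, 365)

# Hypothetical times at which the scanner reaches this object.
scans = [
    datetime(2023, 11, 8, 6, 0, tzinfo=timezone.utc),
    datetime(2023, 11, 9, 18, 0, tzinfo=timezone.utc),
]

acted = first_scan_at_or_after(eligible, scans)
print("eligible at:", eligible.isoformat())
print("acted on at:", acted.isoformat())
print("lag:", acted - eligible)
```

In this invented scenario the first pass arrives a few hours too early, so the object is only handled on the following pass, more than a day after the rule period ended.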
Scanner performance typically depends on the available node resources, the size of the cluster, and the complexity of bucket hierarchy (objects and prefixes). For example, a cluster that starts with 100TB of data that then grows to 200TB of data may require more time to scan the entire namespace of buckets and objects given the same hardware and workload. As the cluster or workload increases, scanner performance decreases as it yields more frequently to ensure priority of normal S3 operations.
Consider regularly checking cluster metrics, capacity, and resource usage to ensure the cluster hardware is scaling alongside cluster and workload growth: