Module Values_0.BatchReplaceClusterNodesResponse

Replaces specific nodes within a SageMaker HyperPod cluster with new hardware. BatchReplaceClusterNodes terminates the specified instances and provisions new replacement instances with the same configuration but fresh hardware. The Amazon Machine Image (AMI) and instance configuration remain the same. This operation is useful for recovering from hardware failures or persistent issues that cannot be resolved through a reboot.

Data Loss Warning: Replacing nodes destroys all instance volumes, including both root and secondary volumes. All data stored on these volumes will be permanently lost and cannot be recovered. To safeguard your work, back up your data to Amazon S3 or an FSx for Lustre file system before invoking the API on a worker node group. This will help prevent any potential data loss from the instance root volume. For more information about backup, see Use the backup script provided by SageMaker HyperPod.

If you want to invoke this API on an existing cluster, you'll first need to patch the cluster by running the UpdateClusterSoftware API. For more information about patching a cluster, see Update the SageMaker HyperPod platform software of a cluster.

You can replace up to 25 nodes in a single request.

type nonrec t = {
  successful : ClusterNodeIds.t option;
    (* A list of EC2 instance IDs for which the replacement operation was successfully initiated. *)
  failed : BatchReplaceClusterNodesErrors.t option;
    (* A list of errors encountered for EC2 instance IDs that could not be replaced. Each error includes the instance ID, an error code, and a descriptive message. *)
  failedNodeLogicalIds : BatchReplaceClusterNodeLogicalIdsErrors.t option;
    (* A list of errors encountered for logical node IDs that could not be replaced. Each error includes the logical node ID, an error code, and a descriptive message. This field is only present when NodeLogicalIds were provided in the request. *)
  successfulNodeLogicalIds : ClusterNodeLogicalIdList.t option;
    (* A list of logical node IDs for which the replacement operation was successfully initiated. This field is only present when NodeLogicalIds were provided in the request. *)
}
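Because every field of the record is optional, callers usually need to treat an absent list as empty before counting or reporting results. The following is a minimal sketch of that pattern. It uses stand-in types rather than the library's own: `node_error`, its field names, and the `response` record are assumptions made for illustration, and the real `ClusterNodeIds.t` and `BatchReplaceClusterNodesErrors.t` may be shaped differently.

```ocaml
(* Stand-in types for illustration only; NOT the library's definitions.
   The field names of [node_error] are hypothetical. *)
type node_error = { node_id : string; error_code : string; message : string }

type response = {
  successful : string list option;   (* assumed: a list of instance IDs *)
  failed : node_error list option;   (* assumed: one entry per failed ID *)
}

(* Treat an absent optional list as the empty list. *)
let list_of_option = function None -> [] | Some xs -> xs

(* Return (number of replacements initiated, number of failures). *)
let summarize (r : response) : int * int =
  ( List.length (list_of_option r.successful),
    List.length (list_of_option r.failed) )

(* Render each failure as one human-readable line. *)
let describe_failures (r : response) : string list =
  list_of_option r.failed
  |> List.map (fun e ->
         Printf.sprintf "%s: %s (%s)" e.node_id e.error_code e.message)
```

The same approach would extend to `failedNodeLogicalIds` and `successfulNodeLogicalIds`, which the documentation says are only present when NodeLogicalIds were provided in the request.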
type nonrec error = [
  | `ResourceNotFound of ResourceNotFound.t
  | `Unknown_operation_error of string * string option
]
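The `error` type is a closed polymorphic variant, so a handler can match on both cases exhaustively. The sketch below shows one way to turn an error into a message. The `resource_not_found` record is a stand-in for `ResourceNotFound.t`, whose actual fields are not shown on this page, and reading the `string * string option` payload as a code plus optional detail is an assumption.

```ocaml
(* Stand-in for ResourceNotFound.t; its real fields are not documented here. *)
type resource_not_found = { message : string option }

(* Mirrors the shape of the [error] type above, using the stand-in payload. *)
type error = [
  | `ResourceNotFound of resource_not_found
  | `Unknown_operation_error of string * string option
]

(* Convert an error to a printable string. Interpreting the pair as
   (code, optional detail) is an assumption, not documented behavior. *)
let string_of_error : error -> string = function
  | `ResourceNotFound { message = Some m } -> "resource not found: " ^ m
  | `ResourceNotFound { message = None } -> "resource not found"
  | `Unknown_operation_error (code, Some detail) ->
      Printf.sprintf "unknown error %s: %s" code detail
  | `Unknown_operation_error (code, None) -> "unknown error " ^ code
```

Because the variant is closed, the compiler flags a non-exhaustive match if a case is left out of the handler.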
val make : ?successful:??? -> ?failed:??? -> ?failedNodeLogicalIds:??? -> ?successfulNodeLogicalIds:??? -> unit -> t
val error_of_json : string -> Yojson.Safe.t -> [> `ResourceNotFound of ResourceNotFound.t | `Unknown_operation_error of string * string option ]
val error_of_xml : string -> Awso.Xml.t -> [> `ResourceNotFound of ResourceNotFound.t | `Unknown_operation_error of string * string option ]
val error_to_json : error -> Yojson.Safe.t
val to_value : t -> [> `Structure of (string * [> `List of [> `String of ClusterNodeId.t | `Structure of (string * [> `Enum of string | `String of ClusterNodeId.t ]) list ] list ]) list ]
val to_query : t -> Awso.Client.Query.t
val of_xml : Awso.Xml.t -> t
val of_string : string -> t
val of_json : Yojson.Safe.t -> t
val to_json : t -> Yojson.Safe.t
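The `make` signature follows the usual OCaml convention for constructors with optional labelled arguments: each omitted argument leaves the corresponding field as `None`, and the trailing `unit` argument marks the end of the application. The following sketch illustrates that convention with a stand-in record and a stand-in `make`. The argument types are elided on this page, so `string list` is an assumption, and only two of the four fields are modelled.

```ocaml
(* Stand-in record with two of the four fields; types are assumed. *)
type t = {
  successful : string list option;
  failed : string list option;
}

(* An omitted optional argument arrives as [None], so each field can be
   filled directly from the argument of the same name. *)
let make ?successful ?failed () = { successful; failed }

(* No arguments supplied: every field is [None]. *)
let empty = make ()

(* Only [successful] supplied: [failed] stays [None]. *)
let partial = make ~successful:[ "i-0123" ] ()
```

A value built this way is mostly useful in tests or mocks, since in normal use the response is produced by decoding functions such as `of_json` or `of_xml`.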