Values_0.CreateAIRecommendationJobRequest

Creates a recommendation job that generates intelligent optimization recommendations for generative AI inference deployments. The job analyzes your model, workload configuration, and performance targets to recommend optimal instance types, model optimization techniques (such as quantization and speculative decoding), and deployment configurations.
type nonrec t = {
  aIRecommendationJobName : AIEntityName.t;
      (** The name of the AI recommendation job. The name must be unique within your Amazon Web Services account in the current Amazon Web Services Region. *)
  modelSource : AIModelSource.t;
      (** The source of the model to optimize. Specify the Amazon S3 location of the model artifacts. *)
  outputConfig : AIRecommendationOutputConfig.t;
      (** The output configuration for the recommendation job, including the Amazon S3 location for results and an optional model package group where the optimized model is registered. *)
  aIWorkloadConfigIdentifier : AIResourceIdentifier.t;
      (** The name or Amazon Resource Name (ARN) of the AI workload configuration to use for this recommendation job. *)
  performanceTarget : AIRecommendationPerformanceTarget.t;
      (** The performance targets for the recommendation job. Specify constraints on metrics such as time to first token (ttft-ms), throughput, or cost. *)
  roleArn : RoleArn.t;
      (** The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker AI to perform tasks on your behalf. *)
  inferenceSpecification : AIRecommendationInferenceSpecification.t option;
      (** The inference framework configuration. Specify the framework (such as LMI or vLLM) for the recommendation job. *)
  optimizeModel : AIRecommendationAllowOptimization.t option;
      (** Whether to allow model optimization techniques such as quantization, speculative decoding, and kernel tuning. The default is true. *)
  computeSpec : AIRecommendationComputeSpec.t option;
      (** The compute resource specification for the recommendation job. You can specify up to 3 instance types to consider, and optionally provide capacity reservation configuration. *)
}

val make :
  ?inferenceSpecification:??? ->
  ?optimizeModel:??? ->
  ?computeSpec:??? ->
  ?tags:??? ->
  aIRecommendationJobName:AIEntityName.t ->
  modelSource:AIModelSource.t ->
  outputConfig:AIRecommendationOutputConfig.t ->
  aIWorkloadConfigIdentifier:AIResourceIdentifier.t ->
  performanceTarget:AIRecommendationPerformanceTarget.t ->
  roleArn:RoleArn.t ->
  unit ->
  t

val to_value :
  t ->
  [> `Structure of
       (string
       * [> `Boolean of AIRecommendationAllowOptimization.t
         | `List of
             [> `Structure of (string * [> `String of TagKey.t ]) list ] list
         | `String of AIEntityName.t
         | `Structure of
             (string
             * [> `Enum of string
               | `List of
                   [> `Enum of string
                   | `Structure of (string * [> `Enum of string ]) list ]
                   list
               | `String of S3Uri.t
               | `Structure of
                   (string
                   * [> `Enum of string
                     | `List of [> `String of AIMlReservationArn.t ] list
                     | `String of S3Uri.t ])
                   list ])
             list ])
       list ]
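
The two signatures above follow a common pattern: `make` takes optional arguments first, then required labelled arguments, then a final `unit`, and `to_value` returns a tree of polymorphic variants. The sketch below illustrates that pattern with stand-in types only. It does not use the real library: the `request` record, the simplified `string`/`bool` field types, the key names inside `to_value`, and the `render` helper are all assumptions made for illustration, as is the choice to leave absent optional fields out of the encoded structure.

```ocaml
(* Stand-in request type. The documented record uses AIEntityName.t,
   AIModelSource.t, RoleArn.t and so on; plain strings are an assumption. *)
type request = {
  job_name : string;
  model_s3_uri : string;
  role_arn : string;
  optimize_model : bool option;
}

(* Same calling convention as the documented [make]: optional arguments,
   then required labelled arguments, then [unit]. The trailing [unit] lets
   OCaml decide that an omitted optional argument really was omitted. *)
let make ?optimize_model ~job_name ~model_s3_uri ~role_arn () =
  { job_name; model_s3_uri; role_arn; optimize_model }

(* A closed version of the open variant type that [to_value] returns. *)
type value =
  [ `Boolean of bool
  | `Enum of string
  | `String of string
  | `List of value list
  | `Structure of (string * value) list ]

(* Hypothetical encoder: the key names are illustrative, and optional
   fields that are [None] are left out (an assumption, not library fact). *)
let to_value (r : request) : value =
  let optional key f = function
    | None -> []
    | Some v -> [ (key, f v) ]
  in
  `Structure
    ([ ("AIRecommendationJobName", `String r.job_name);
       ("ModelSource", `Structure [ ("S3Uri", `String r.model_s3_uri) ]);
       ("RoleArn", `String r.role_arn) ]
    @ optional "OptimizeModel" (fun b -> `Boolean b) r.optimize_model)

(* Walk the variant tree and print it in a compact, record-like form. *)
let rec render (v : value) : string =
  match v with
  | `Boolean b -> string_of_bool b
  | `Enum s -> s
  | `String s -> Printf.sprintf "%S" s
  | `List items -> "[" ^ String.concat ", " (List.map render items) ^ "]"
  | `Structure fields ->
      let field (k, x) = k ^ " = " ^ render x in
      "{" ^ String.concat "; " (List.map field fields) ^ "}"

let () =
  let req =
    make ~optimize_model:false ~job_name:"demo-job"
      ~model_s3_uri:"s3://example-bucket/model/"
      ~role_arn:"arn:aws:iam::123456789012:role/ExampleRole" ()
  in
  print_endline (render (to_value req))
```

Against the real module, the call would presumably have the same shape, with each labelled argument built from its own module (`AIModelSource`, `AIRecommendationOutputConfig`, and so on) rather than from plain strings.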