AutoML Training Job

Create Vertex AI AutoML training jobs for forecasting, image, tabular, text, and video data.

Components:

AutoMLForecastingTrainingJobRunOp(project, ...)

Runs the training job and returns a model.

AutoMLImageTrainingJobRunOp(project, ...[, ...])

Runs the AutoML Image training job and returns a model.

AutoMLTabularTrainingJobRunOp(project, ...)

Runs the training job and returns a model.

AutoMLTextTrainingJobRunOp(project, ...[, ...])

Runs the training job and returns a model.

AutoMLVideoTrainingJobRunOp(project, ...[, ...])

Runs the AutoML Video training job and returns a model.

v1.automl.training_job.AutoMLForecastingTrainingJobRunOp(project: str, display_name: str, target_column: str, time_column: str, time_series_identifier_column: str, unavailable_at_forecast_columns: list, available_at_forecast_columns: list, forecast_horizon: int, data_granularity_unit: str, data_granularity_count: int, dataset: dsl.Input[google.VertexDataset], model: dsl.Output[google.VertexModel], location: str | None = 'us-central1', optimization_objective: str | None = None, time_series_attribute_columns: list | None = None, context_window: int | None = None, quantiles: list | None = None, validation_options: str | None = None, labels: dict | None = {}, training_encryption_spec_key_name: str | None = None, model_encryption_spec_key_name: str | None = None, budget_milli_node_hours: int | None = None, model_display_name: str | None = None, model_labels: dict | None = None, model_id: str | None = None, parent_model: str | None = None, is_default_version: bool | None = None, model_version_aliases: list | None = None, model_version_description: str | None = None, hierarchy_group_columns: list | None = None, hierarchy_group_total_weight: float | None = None, hierarchy_temporal_total_weight: float | None = None, hierarchy_group_temporal_total_weight: float | None = None, window_column: str | None = None, window_stride_length: int | None = None, window_max_count: int | None = None, holiday_regions: list | None = None, column_specs: dict | None = None, column_transformations: list | None = None, training_fraction_split: float | None = None, validation_fraction_split: float | None = None, test_fraction_split: float | None = None, predefined_split_column_name: str | None = None, timestamp_split_column_name: str | None = None, weight_column: str | None = None, export_evaluated_data_items: bool | None = False, export_evaluated_data_items_bigquery_destination_uri: str | None = None, export_evaluated_data_items_override_destination: bool | None = None, additional_experiments: list | None = None)

Runs the training job and returns a model.

If training on a Vertex AI dataset, you can use one of the following split configurations:

Data fraction splits: Any of training_fraction_split, validation_fraction_split, and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided fractions sum to less than 1, Vertex AI decides how to assign the remainder. If none of the fractions are set, roughly 80% of the data is used for training, 10% for validation, and 10% for test.

Predefined splits: Assigns input data to the training, validation, and test sets based on the value of a provided key. If using predefined splits, predefined_split_column_name must be provided. Supported only for tabular Datasets.

Timestamp splits: Assigns input data to the training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to the training set, the next to the validation set, and the oldest to the test set. Supported only for tabular Datasets.
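The data fraction rule above can be checked locally before a job is submitted. The following is a minimal sketch, not part of the component: the helper name check_fraction_splits is hypothetical, and it only encodes the "sum to at most 1" constraint stated above.

```python
import math
from typing import Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> float:
    """Validate the provided fractions and return the unassigned remainder."""
    provided = [f for f in (training, validation, test) if f is not None]
    for fraction in provided:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction {fraction} is outside [0, 1]")
    total = sum(provided)
    # Tolerate floating-point rounding when the fractions sum to exactly 1.
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    # Whatever is left over is assigned by Vertex AI, per the text above.
    return max(0.0, 1.0 - total)
```

A remainder greater than 0 means some of the data is left for Vertex AI to assign.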

Parameters:
dataset: dsl.Input[google.VertexDataset]

The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For time series Datasets, all their data is exported to training, to pick and choose from.

target_column: str

Name of the column that the Model is to predict values for. This column must be unavailable at forecast.

time_column: str

Name of the column that identifies time order in the time series. This column must be available at forecast.

time_series_identifier_column: str

Name of the column that identifies the time series.

unavailable_at_forecast_columns: list

Column names of columns that are unavailable at forecast. Each column contains information for the given entity (identified by the [time_series_identifier_column]) that is unknown before the forecast (e.g. population of a city in a given year, or weather on a given day).

available_at_forecast_columns: list

Column names of columns that are available at forecast. Each column contains information for the given entity (identified by the [time_series_identifier_column]) that is known at forecast.

forecast_horizon: int

The amount of time into the future for which forecasted values for the target are returned. Expressed in the number of units defined by the [data_granularity_unit] and [data_granularity_count] fields. Inclusive.

data_granularity_unit: str

The data granularity unit. Accepted values are minute, hour, day, week, month, year.

data_granularity_count: int

The number of data granularity units between data points in the training data. If [data_granularity_unit] is minute, can be 1, 5, 10, 15, or 30. For all other values of [data_granularity_unit], must be 1.
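The constraints on data_granularity_unit and data_granularity_count can be expressed as a small check. This is a sketch under the rules stated above; the function name check_granularity is hypothetical and the check runs locally, independent of the component.

```python
ALLOWED_UNITS = {"minute", "hour", "day", "week", "month", "year"}
ALLOWED_MINUTE_COUNTS = {1, 5, 10, 15, 30}


def check_granularity(unit: str, count: int) -> None:
    """Raise ValueError if the unit/count pair violates the documented rules."""
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported data_granularity_unit: {unit!r}")
    if unit == "minute":
        # Only minute granularity accepts counts other than 1.
        if count not in ALLOWED_MINUTE_COUNTS:
            raise ValueError(f"minute granularity does not accept count {count}")
    elif count != 1:
        raise ValueError(f"{unit} granularity requires data_granularity_count == 1")
```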

training_fraction_split: float | None = None

The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.

validation_fraction_split: float | None = None

The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.

test_fraction_split: float | None = None

The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.

predefined_split_column_name: str | None = None

The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {TRAIN, VALIDATE, TEST}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.
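To illustrate the assignment rule, the sketch below partitions in-memory rows by the value of a split column and skips rows whose value is missing or invalid. It mimics the documented behavior locally and is not the service's implementation; the function name and the column name ml_split in the test data are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping

VALID_SPLITS = ("TRAIN", "VALIDATE", "TEST")


def partition_by_split(
    rows: Iterable[Mapping[str, str]], split_column: str
) -> Dict[str, List[Mapping[str, str]]]:
    """Group rows by their split value, ignoring missing or invalid values."""
    assigned: Dict[str, List[Mapping[str, str]]] = {name: [] for name in VALID_SPLITS}
    for row in rows:
        value = row.get(split_column)
        if value in assigned:
            assigned[value].append(row)
        # Any other value (including a missing key) means the row is ignored.
    return assigned
```

Note that the match is case-sensitive in this sketch, so a value such as train would be treated as invalid.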

timestamp_split_column_name: str | None = None

The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = "Z" (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets. This parameter must be used with training_fraction_split, validation_fraction_split, and test_fraction_split.
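Timestamps in other offsets need to be converted to UTC and written with a trailing "Z" before they go into the split column. A minimal sketch using only the Python standard library follows; the helper name to_rfc3339_utc is hypothetical, and it always emits six fractional digits, which RFC 3339 permits.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339_utc(moment: datetime) -> str:
    """Format an aware datetime as RFC 3339 with a "Z" time-offset."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


# A timestamp recorded at UTC+02:00 is shifted back to UTC.
local = datetime(1985, 4, 13, 1, 20, 50, 520000, tzinfo=timezone(timedelta(hours=2)))
print(to_rfc3339_utc(local))  # 1985-04-12T23:20:50.520000Z
```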

weight_column: str | None = None

Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusively, and 0 value means that the row is ignored. If the weight column field is not set, then all rows are assumed to have equal weight of 1.

time_series_attribute_columns: list | None = None

Column names that should be used as attribute columns. Each column is constant within a time series.

context_window: int | None = None

The amount of time into the past from which data is used, for training data during model training and for prediction data during prediction. Expressed in the number of units defined by the [data_granularity_unit] and [data_granularity_count] fields. When not provided, the default value of 0 is used, which means the model sets each series' context window to 0 (also known as “cold start”). Inclusive.

export_evaluated_data_items: bool | None = False

Whether to export the test set predictions to a BigQuery table. If False, then the export is not performed.

export_evaluated_data_items_bigquery_destination_uri: str | None = None

URI of the desired destination BigQuery table for exported test set predictions. The expected format is bq://<project_id>:<dataset_id>:<table>. If not specified, results are exported to an auto-created BigQuery table named <project_id>:export_evaluated_examples_<model_name>_<yyyy_MM_dd'T'HH_mm_ss_SSS'Z'>.evaluated_examples. Applies only if [export_evaluated_data_items] is True.
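As a small illustration of the expected format, the sketch below assembles a destination URI from its three parts exactly as the format string above lays them out. The helper name and the example identifiers are hypothetical.

```python
def bigquery_destination_uri(project_id: str, dataset_id: str, table: str) -> str:
    """Build a URI in the documented bq://<project_id>:<dataset_id>:<table> form."""
    for name, part in (("project_id", project_id), ("dataset_id", dataset_id), ("table", table)):
        if not part:
            raise ValueError(f"{name} must not be empty")
    return f"bq://{project_id}:{dataset_id}:{table}"


# Hypothetical identifiers, for illustration only.
print(bigquery_destination_uri("my-project", "forecasts", "evaluated_items"))
# bq://my-project:forecasts:evaluated_items
```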

export_evaluated_data_items_override_destination: bool | None = None

Whether to override the contents of [export_evaluated_data_items_bigquery_destination_uri], if the table exists, for exported test set predictions. If False, and the table exists, then the training job will fail. Applies only if [export_evaluated_data_items] is True and [export_evaluated_data_items_bigquery_destination_uri] is specified.

quantiles: list | None = None

Quantiles to use for the minimize-quantile-loss [AutoMLForecastingTrainingJob.optimization_objective]. This argument is required in this case. Accepts up to 5 quantiles in the form of a double from 0 to 1, exclusive. Each quantile must be unique.
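The quantile constraints (at most 5 values, each strictly between 0 and 1, all unique) can be verified up front. This is a sketch of those stated rules only; the function name check_quantiles is hypothetical.

```python
from typing import Sequence

MAX_QUANTILES = 5


def check_quantiles(quantiles: Sequence[float]) -> None:
    """Raise ValueError if the quantiles violate the documented constraints."""
    if not quantiles:
        raise ValueError("at least one quantile is required for minimize-quantile-loss")
    if len(quantiles) > MAX_QUANTILES:
        raise ValueError(f"at most {MAX_QUANTILES} quantiles are accepted")
    if len(set(quantiles)) != len(quantiles):
        raise ValueError("each quantile must be unique")
    for q in quantiles:
        # The bounds are exclusive: 0 and 1 themselves are rejected.
        if not 0.0 < q < 1.0:
            raise ValueError(f"quantile {q} is not strictly between 0 and 1")
```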

validation_options: str | None = None

Validation options for the data validation component. The available options are: “fail-pipeline” (default) - validate the data and fail the pipeline if validation fails. “ignore-validation” - ignore the results of the validation and continue the pipeline.

budget_milli_node_hours: int | None = None

The training budget for creating this Model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be close to the budget where possible, but may end up noticeably smaller at the backend’s discretion, especially when further model training ceases to provide any improvement. If the budget is set to a value known to be insufficient to train a Model for the given training set, training is not attempted and the job fails with an error. The minimum value is 1000 and the maximum is 72000.
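Since budgets are usually thought of in node hours, a conversion helper can make the unit explicit. The following is a minimal sketch that applies the 1000 to 72000 range given above; the function name is hypothetical.

```python
MIN_BUDGET_MILLI_NODE_HOURS = 1000
MAX_BUDGET_MILLI_NODE_HOURS = 72000


def node_hours_to_budget(node_hours: float) -> int:
    """Convert node hours to milli node hours and enforce the documented range."""
    budget = round(node_hours * 1000)
    if not MIN_BUDGET_MILLI_NODE_HOURS <= budget <= MAX_BUDGET_MILLI_NODE_HOURS:
        raise ValueError(
            f"{budget} milli node hours is outside "
            f"[{MIN_BUDGET_MILLI_NODE_HOURS}, {MAX_BUDGET_MILLI_NODE_HOURS}]"
        )
    return budget
```

For example, node_hours_to_budget(2.5) yields 2500, which could then be passed as budget_milli_node_hours.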

model_display_name: str | None = None

The display name of the managed Vertex AI Model, if the job produces one. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.

model_labels: dict | None = None

The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

model_id: str | None = None

The ID to use for the Model produced by this job, which will become the final component of the model resource name. This value may be up to 63 characters, and valid characters are [a-z0-9_-]. The first character cannot be a number or hyphen.

parent_model: str | None = None

The resource name or model ID of an existing model. The new model uploaded by this job will be a version of parent_model. Only set this field when training a new version of an existing model.

is_default_version: bool | None = None

When set to True, the newly uploaded model version will automatically have alias “default” included. Subsequent uses of the model produced by this job without a version specified will use this “default” version. When set to False, the “default” alias will not be moved. Actions targeting the model version produced by this job will need to specifically reference this version by ID or alias. New model uploads, i.e. version 1, will always be “default” aliased.

model_version_aliases: list | None = None

User provided version aliases so that the model version uploaded by this job can be referenced via alias instead of auto-generated version ID. A default version alias will be created for the first version of the model. The format is [a-z][a-zA-Z0-9-]{0,126}[a-z0-9]
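Both model_id and model_version_aliases have character rules that translate directly into regular expressions. The sketch below encodes them with Python's re module; the helper names are hypothetical, and the model_id pattern is one reading of the prose rule (up to 63 characters from [a-z0-9_-], not starting with a digit or hyphen).

```python
import re

# model_id: first character is a lowercase letter or underscore, 63 chars max.
MODEL_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_-]{0,62}")
# Alias format quoted in the parameter description above.
VERSION_ALIAS_PATTERN = re.compile(r"[a-z][a-zA-Z0-9-]{0,126}[a-z0-9]")


def is_valid_model_id(model_id: str) -> bool:
    """Return True if the whole string satisfies the model_id rules."""
    return MODEL_ID_PATTERN.fullmatch(model_id) is not None


def is_valid_version_alias(alias: str) -> bool:
    """Return True if the whole string matches the alias format."""
    return VERSION_ALIAS_PATTERN.fullmatch(alias) is not None
```

Using fullmatch rather than match matters here: it rejects strings that merely start with a valid prefix. Note that the alias format requires at least two characters.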

model_version_description: str | None = None

The description of the model version being uploaded by this job.

hierarchy_group_columns: list | None = None

A list of time series attribute column names that define the time series hierarchy. Only one level of hierarchy is supported, e.g. region for a hierarchy of stores or department for a hierarchy of products. If multiple columns are specified, time series are grouped by their combined values, e.g. (blue, large) for color and size. Up to 5 columns are accepted. If no group columns are specified, all time series are considered to be part of the same group.

hierarchy_group_total_weight: float | None = None

The weight of the loss for predictions aggregated over time series in the same hierarchy group.

hierarchy_temporal_total_weight: float | None = None

The weight of the loss for predictions aggregated over the horizon for a single time series.

hierarchy_group_temporal_total_weight: float | None = None

The weight of the loss for predictions aggregated over both the horizon and time series in the same hierarchy group.

window_column: str | None = None

Name of the column that should be used to filter input rows. The column should contain either booleans or string booleans; if the value of the row is True, generate a sliding window from that row.

window_stride_length: int | None = None

Step length used to generate input examples. Every window_stride_length rows will be used to generate a sliding window.

window_max_count: int | None = None

Number of rows that should be used to generate input examples. If the total row count is larger than this number, the input data will be randomly sampled to hit the count.

holiday_regions: list | None = None

The geographical regions to use when creating holiday features. This option is only allowed when data_granularity_unit is day. Acceptable values can come from any of the following levels:

Top level: GLOBAL

Second level: continental regions
NA: North America
JAPAC: Japan and Asia Pacific
EMEA: Europe, the Middle East and Africa
LAC: Latin America and the Caribbean

Third level: countries from ISO 3166-1 Country codes.

display_name: str

The user-defined name of this TrainingPipeline.

optimization_objective: str | None = None

Objective function the model is to be optimized towards. The training process creates a Model that optimizes the value of the objective function over the validation set. The supported optimization objectives:
“minimize-rmse” (default) - Minimize root-mean-squared error (RMSE).
“minimize-mae” - Minimize mean-absolute error (MAE).
“minimize-rmsle” - Minimize root-mean-squared log error (RMSLE).
“minimize-rmspe” - Minimize root-mean-squared percentage error (RMSPE).
“minimize-wape-mae” - Minimize the combination of weighted absolute percentage error (WAPE) and mean-absolute-error (MAE).
“minimize-quantile-loss” - Minimize the quantile loss at the defined quantiles. (Set this objective to build quantile forecasts.)

column_specs: dict | None = None

Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed.

column_transformations: list | None = None

Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column’s value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually.

project: str

Project to retrieve dataset from.

location: str | None = 'us-central1'

Optional location to retrieve dataset from.

labels: dict | None = {}

The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

training_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.

model_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

additional_experiments: list | None = None

Additional experiment flags for the time series forecasting training.

Returns:

model: dsl.Output[google.VertexModel]

The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.
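Several forecasting parameters constrain one another, and a local pre-flight check can surface conflicts before the pipeline runs. The sketch below works on a plain dict of keyword arguments and does not import the component. The function name, column names, and values are hypothetical. It also assumes that "must be unavailable at forecast" means the target column is listed in unavailable_at_forecast_columns, and likewise that the time column is listed in available_at_forecast_columns.

```python
from typing import Any, Dict, List


def find_argument_conflicts(args: Dict[str, Any]) -> List[str]:
    """Return human-readable conflicts; an empty list means none were found."""
    conflicts = []
    # Assumption: the target column must be listed as unavailable at forecast.
    if args["target_column"] not in args["unavailable_at_forecast_columns"]:
        conflicts.append("target_column is not in unavailable_at_forecast_columns")
    # Assumption: the time column must be listed as available at forecast.
    if args["time_column"] not in args["available_at_forecast_columns"]:
        conflicts.append("time_column is not in available_at_forecast_columns")
    if args.get("holiday_regions") and args["data_granularity_unit"] != "day":
        conflicts.append("holiday_regions requires data_granularity_unit 'day'")
    if (args.get("optimization_objective") == "minimize-quantile-loss"
            and not args.get("quantiles")):
        conflicts.append("minimize-quantile-loss requires quantiles")
    if args.get("column_specs") and args.get("column_transformations"):
        conflicts.append("pass only one of column_specs or column_transformations")
    return conflicts


# Hypothetical column names and values, for illustration only.
example_args = {
    "target_column": "sales",
    "time_column": "date",
    "time_series_identifier_column": "store_id",
    "unavailable_at_forecast_columns": ["sales"],
    "available_at_forecast_columns": ["date"],
    "forecast_horizon": 30,
    "data_granularity_unit": "day",
    "data_granularity_count": 1,
    "holiday_regions": ["GLOBAL"],
}
print(find_argument_conflicts(example_args))  # []
```

Returning a list rather than raising on the first problem lets a caller report every conflict at once.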

v1.automl.training_job.AutoMLImageTrainingJobRunOp(project: str, display_name: str, dataset: dsl.Input[google.VertexDataset], model: dsl.Output[google.VertexModel], gcp_resources: dsl.OutputPath(str), location: str | None = 'us-central1', prediction_type: str | None = 'classification', multi_label: bool | None = False, model_type: str | None = 'CLOUD', base_model: dsl.Input[google.VertexModel] | None = None, incremental_train_base_model: dsl.Input[google.VertexModel] | None = None, parent_model: dsl.Input[google.VertexModel] | None = None, is_default_version: bool | None = True, model_version_aliases: list[str] | None = None, model_version_description: str | None = None, labels: dict[str, str] | None = {}, training_encryption_spec_key_name: str | None = None, model_encryption_spec_key_name: str | None = None, training_fraction_split: float | None = None, validation_fraction_split: float | None = None, test_fraction_split: float | None = None, training_filter_split: str | None = None, validation_filter_split: str | None = None, test_filter_split: str | None = None, budget_milli_node_hours: int | None = None, model_display_name: str | None = None, model_labels: dict[str, str] | None = None, disable_early_stopping: bool | None = False)

Runs the AutoML Image training job and returns a model.

If training on a Vertex AI dataset, you can use one of the following split configurations:

Data fraction splits: Any of training_fraction_split, validation_fraction_split, and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided fractions sum to less than 1, Vertex AI decides how to assign the remainder. If none of the fractions are set, roughly 80% of the data is used for training, 10% for validation, and 10% for test.

Data filter splits: Assigns input data to the training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). If using filter splits, all of training_filter_split, validation_filter_split, and test_filter_split must be provided. Supported only for unstructured Datasets.

Parameters:
dataset: dsl.Input[google.VertexDataset]

The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.

training_fraction_split: float | None = None

The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.

validation_fraction_split: float | None = None

The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.

test_fraction_split: float | None = None

The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.

training_filter_split: str | None = None

A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with the same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided. Example usage: training_filter_split="labels.aiplatform.googleapis.com/ml_use=training".

validation_filter_split: str | None = None

A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with the same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided. Example usage: validation_filter_split="labels.aiplatform.googleapis.com/ml_use=validation".

test_filter_split: str | None = None

A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with the same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided. Example usage: test_filter_split="labels.aiplatform.googleapis.com/ml_use=test".
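Because all three filter arguments must be provided together, it can help to build them in one place. The sketch below produces the three ml_use filters from the example usages above and substitutes '-' for any set that should match nothing. The function name and its boolean flags are hypothetical.

```python
from typing import Dict

ML_USE_LABEL = "labels.aiplatform.googleapis.com/ml_use"
MATCH_NOTHING = "-"  # the minus sign, as described for filter splits


def ml_use_filter_splits(
    use_training: bool = True, use_validation: bool = True, use_test: bool = True
) -> Dict[str, str]:
    """Return all three filter arguments, using '-' for disabled sets."""

    def expression(enabled: bool, ml_use: str) -> str:
        return f"{ML_USE_LABEL}={ml_use}" if enabled else MATCH_NOTHING

    return {
        "training_filter_split": expression(use_training, "training"),
        "validation_filter_split": expression(use_validation, "validation"),
        "test_filter_split": expression(use_test, "test"),
    }
```

The returned dict has exactly the three keyword names of the filter parameters, so it could be unpacked into a component call alongside the other arguments.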

budget_milli_node_hours: int | None = None

The training budget for creating this Model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour. Limits and defaults depend on prediction_type:

classification - For Cloud models the budget must be 8,000 - 800,000 milli node hours (inclusive). The default value is 192,000, which represents one day in wall time, assuming 8 nodes are used.

object_detection - For Cloud models the budget must be 20,000 - 900,000 milli node hours (inclusive). The default value is 216,000, which represents one day in wall time, assuming 9 nodes are used.

The training cost of the model will not exceed this budget. The final cost will be close to the budget where possible, but may end up noticeably smaller at the backend’s discretion, especially when further model training ceases to provide any improvement. If the budget is set to a value known to be insufficient to train a Model for the given training set, training is not attempted and the job fails with an error.
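The per-type limits and the "one day in wall time" arithmetic can be made concrete. The following sketch encodes the ranges and defaults quoted above for Cloud models; the names BUDGET_RULES, resolve_budget, and wall_clock_hours are hypothetical.

```python
from typing import Optional

# prediction_type -> (minimum, maximum, default), all in milli node hours.
BUDGET_RULES = {
    "classification": (8_000, 800_000, 192_000),
    "object_detection": (20_000, 900_000, 216_000),
}


def resolve_budget(prediction_type: str, budget: Optional[int] = None) -> int:
    """Return the budget to use, falling back to the documented default."""
    minimum, maximum, default = BUDGET_RULES[prediction_type]
    if budget is None:
        return default
    if not minimum <= budget <= maximum:
        raise ValueError(f"{budget} is outside [{minimum}, {maximum}] for {prediction_type}")
    return budget


def wall_clock_hours(budget_milli_node_hours: int, nodes: int) -> float:
    """Convert a budget to wall-clock hours for a given number of nodes."""
    return budget_milli_node_hours / 1000 / nodes


print(wall_clock_hours(resolve_budget("classification"), nodes=8))    # 24.0
print(wall_clock_hours(resolve_budget("object_detection"), nodes=9))  # 24.0
```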

model_display_name: str | None = None

The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.

model_labels: dict[str, str] | None = None

The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

disable_early_stopping: bool | None = False

If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used if further training no longer brings significant improvement to the model.

display_name: str

The user-defined name of this TrainingPipeline.

prediction_type: str | None = 'classification'

The type of prediction the Model is to produce, one of: “classification” - Predicts one out of multiple target values for each image. “object_detection” - Predicts bounding boxes and labels for the objects found in each image.

multi_label: bool | None = False

Default is False. If false, a single-label (multi-class) Model will be trained (i.e. assuming that for each image just up to one annotation may be applicable). If true, a multi-label Model will be trained (i.e. assuming that for each image multiple annotations may be applicable). This is only applicable for the “classification” prediction_type and will be ignored otherwise.

model_type: str | None = 'CLOUD'

One of the following:

“CLOUD” - Default for Image Classification. A model best tailored to be used within Google Cloud, and which cannot be exported.

“CLOUD_HIGH_ACCURACY_1” - Default for Image Object Detection. A model best tailored to be used within Google Cloud, and which cannot be exported. Expected to have a higher latency, but should also have a higher prediction quality than other cloud models.

“CLOUD_LOW_LATENCY_1” - A model best tailored to be used within Google Cloud, and which cannot be exported. Expected to have a low latency, but may have lower prediction quality than other cloud models.

“MOBILE_TF_LOW_LATENCY_1” - A model that, in addition to being available within Google Cloud, can also be exported as a TensorFlow or Core ML model and used on a mobile or edge device afterwards. Expected to have low latency, but may have lower prediction quality than other mobile models.

“MOBILE_TF_VERSATILE_1” - A model that, in addition to being available within Google Cloud, can also be exported as a TensorFlow or Core ML model and used on a mobile or edge device afterwards.

“MOBILE_TF_HIGH_ACCURACY_1” - A model that, in addition to being available within Google Cloud, can also be exported as a TensorFlow or Core ML model and used on a mobile or edge device afterwards. Expected to have a higher latency, but should also have a higher prediction quality than other mobile models.
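The key practical distinction among these values is whether the trained model can be exported. A minimal sketch that captures the distinction as described above follows; the set names and the helper is_exportable are hypothetical.

```python
CLOUD_MODEL_TYPES = {"CLOUD", "CLOUD_HIGH_ACCURACY_1", "CLOUD_LOW_LATENCY_1"}
MOBILE_MODEL_TYPES = {
    "MOBILE_TF_LOW_LATENCY_1",
    "MOBILE_TF_VERSATILE_1",
    "MOBILE_TF_HIGH_ACCURACY_1",
}


def is_exportable(model_type: str) -> bool:
    """Return True for model types described as exportable to mobile or edge."""
    if model_type in MOBILE_MODEL_TYPES:
        return True
    if model_type in CLOUD_MODEL_TYPES:
        return False
    raise ValueError(f"unknown model_type: {model_type!r}")
```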

base_model: dsl.Input[google.VertexModel] | None = None

Only permitted for Image Classification models. If it is specified, the new model will be trained based on the base model. Otherwise, the new model will be trained from scratch. The base model must be in the same Project and Location as the new Model to train, and have the same model_type.

incremental_train_base_model: dsl.Input[google.VertexModel] | None = None

Optional for both Image Classification and Object detection models, to incrementally train a new model using an existing model as the starting point, with a reduced training time. If not specified, the new model will be trained from scratch. The base model must be in the same Project and Location as the new Model to train, and have the same prediction_type and model_type.

parent_model: dsl.Input[google.VertexModel] | None = None

The resource name or model ID of an existing model. The new model uploaded by this job will be a version of parent_model. Only set this field when training a new version of an existing model.

is_default_version: bool | None = True

When set to True, the newly uploaded model version will automatically have alias “default” included. Subsequent uses of the model produced by this job without a version specified will use this “default” version. When set to False, the “default” alias will not be moved. Actions targeting the model version produced by this job will need to specifically reference this version by ID or alias. New model uploads, i.e. version 1, will always be “default” aliased.

model_version_aliases: list[str] | None = None

User provided version aliases so that the model version uploaded by this job can be referenced via alias instead of auto-generated version ID. A default version alias will be created for the first version of the model. The format is [a-z][a-zA-Z0-9-]{0,126}[a-z0-9]

model_version_description: str | None = None

The description of the model version being uploaded by this job.

project: str

Project to retrieve dataset from.

location: str | None = 'us-central1'

Optional location to retrieve dataset from.

labels: dict[str, str] | None = {}

The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

training_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.

model_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:

model: dsl.Output[google.VertexModel]

The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.

gcp_resources: dsl.OutputPath(str)

Serialized gcp_resources proto tracking the training job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.

v1.automl.training_job.AutoMLTabularTrainingJobRunOp(project: str, display_name: str, optimization_prediction_type: str, dataset: dsl.Input[google.VertexDataset], target_column: str, model: dsl.Output[google.VertexModel], location: str | None = 'us-central1', optimization_objective: str | None = None, column_specs: dict | None = None, column_transformations: list | None = None, optimization_objective_recall_value: float | None = None, optimization_objective_precision_value: float | None = None, labels: dict | None = {}, training_encryption_spec_key_name: str | None = None, model_encryption_spec_key_name: str | None = None, training_fraction_split: float | None = None, test_fraction_split: float | None = None, validation_fraction_split: float | None = None, predefined_split_column_name: str | None = None, timestamp_split_column_name: str | None = None, weight_column: str | None = None, budget_milli_node_hours: int | None = None, model_display_name: str | None = None, model_labels: dict | None = None, model_id: str | None = None, parent_model: str | None = None, is_default_version: bool | None = None, model_version_aliases: list | None = None, model_version_description: str | None = None, disable_early_stopping: bool | None = False, export_evaluated_data_items: bool | None = False, export_evaluated_data_items_bigquery_destination_uri: str | None = None, export_evaluated_data_items_override_destination: bool | None = None)

Runs the training job and returns a model.

If training on a Vertex AI dataset, you can use one of the following split configurations:

Data fraction splits: Any of training_fraction_split, validation_fraction_split, and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.

Predefined splits: Assigns input data to training, validation, and test sets based on the value of a provided key. If using predefined splits, predefined_split_column_name must be provided. Supported only for tabular Datasets.

Timestamp splits: Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to the training set, the next to the validation set, and the oldest to the test set. Supported only for tabular Datasets.
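The data fraction rule above can be checked locally before a pipeline is submitted. The following is a minimal sketch, not part of the component: check_fraction_splits is a hypothetical helper, and the rounding tolerance is an assumption.

```python
import math
from typing import Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> float:
    """Return the unassigned remainder, or raise if the fractions are invalid.

    Hypothetical helper for illustration; it is not part of the component.
    """
    provided = [f for f in (training, validation, test) if f is not None]
    for fraction in provided:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction {fraction} is outside [0, 1]")
    total = math.fsum(provided)
    # Allow a small tolerance for floating point rounding (an assumption).
    if total > 1.0 + 1e-9:
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    return max(0.0, 1.0 - total)
```

A non-zero return value is the share of data that, per the description above, Vertex AI assigns to the sets on its own.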

Parameters:
dataset: dsl.Input[google.VertexDataset]

The dataset within the same Project from which data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible is described in the TrainingPipeline’s training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.

target_column: str

The name of the column whose values the Model is to predict.

training_fraction_split: float | None = None

The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.

validation_fraction_split: float | None = None

The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.

test_fraction_split: float | None = None

The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.

predefined_split_column_name: str | None = None

The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.

timestamp_split_column_name: str | None = None

The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = "Z" (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets. This parameter must be used with training_fraction_split, validation_fraction_split and test_fraction_split.
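Values for the timestamp column can be produced with the Python standard library. The sketch below is one way to emit RFC 3339 strings with a "Z" offset; to_rfc3339_utc is a hypothetical helper name, and millisecond precision is an arbitrary choice for the example.

```python
from datetime import datetime, timezone


def to_rfc3339_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 with a "Z" offset.

    Hypothetical helper for illustration; it is not part of the component.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    # isoformat() writes "+00:00" for UTC; the documented format expects "Z".
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


stamp = to_rfc3339_utc(datetime(1985, 4, 12, 23, 20, 50, 520000, tzinfo=timezone.utc))
```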

weight_column: str | None = None

Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusive; a value of 0 means that the row is ignored. If the weight column field is not set, then all rows are assumed to have an equal weight of 1.

budget_milli_node_hours: int | None = None

The training budget for creating this Model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be kept close to the budget where possible, though it may end up noticeably smaller, at the backend’s discretion. This is especially likely when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, training won’t be attempted and will error. The minimum value is 1000 and the maximum is 72000.
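Because the budget is given in milli node hours, a small conversion helper can prevent off-by-1000 mistakes. This is a sketch only: node_hours_to_budget is a hypothetical name, and the limits are the minimum and maximum quoted above.

```python
MIN_BUDGET_MILLI_NODE_HOURS = 1000
MAX_BUDGET_MILLI_NODE_HOURS = 72000


def node_hours_to_budget(node_hours: float) -> int:
    """Convert node hours to a budget_milli_node_hours value.

    Hypothetical helper for illustration; it is not part of the component.
    """
    budget = round(node_hours * 1000)
    if not MIN_BUDGET_MILLI_NODE_HOURS <= budget <= MAX_BUDGET_MILLI_NODE_HOURS:
        raise ValueError(
            f"{budget} milli node hours is outside "
            f"[{MIN_BUDGET_MILLI_NODE_HOURS}, {MAX_BUDGET_MILLI_NODE_HOURS}]"
        )
    return budget
```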

model_display_name: str | None = None

The display name of the managed Vertex AI Model, if the job produces one. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.

model_labels: dict | None = None

The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

model_id: str | None = None

The ID to use for the Model produced by this job, which will become the final component of the model resource name. This value may be up to 63 characters, and valid characters are [a-z0-9_-]. The first character cannot be a number or hyphen.
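The model_id constraints can be expressed as a regular expression. The sketch below assumes that "cannot be a number or hyphen" leaves lowercase letters and the underscore as valid first characters; is_valid_model_id is a hypothetical helper, and the service remains the authority on what it accepts.

```python
import re

# First character: lowercase letter or underscore (an interpretation of the
# rule above); then up to 62 more characters from [a-z0-9_-], 63 in total.
_MODEL_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_-]{0,62}")


def is_valid_model_id(model_id: str) -> bool:
    """Return True if model_id satisfies the documented constraints."""
    return _MODEL_ID_PATTERN.fullmatch(model_id) is not None
```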

parent_model: str | None = None

The resource name or model ID of an existing model. The new model uploaded by this job will be a version of parent_model. Only set this field when training a new version of an existing model.

is_default_version: bool | None = None

When set to True, the newly uploaded model version will automatically have alias “default” included. Subsequent uses of the model produced by this job without a version specified will use this “default” version. When set to False, the “default” alias will not be moved. Actions targeting the model version produced by this job will need to specifically reference this version by ID or alias. New model uploads, i.e. version 1, will always be “default” aliased.

model_version_aliases: list | None = None

User provided version aliases so that the model version uploaded by this job can be referenced via alias instead of auto-generated version ID. A default version alias will be created for the first version of the model. The format is [a-z][a-zA-Z0-9-]{0,126}[a-z0-9]
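The alias format quoted above can be applied directly with the re module. A minimal sketch, with is_valid_version_alias as a hypothetical helper name:

```python
import re

# The pattern is copied from the parameter description above.
_ALIAS_PATTERN = re.compile(r"[a-z][a-zA-Z0-9-]{0,126}[a-z0-9]")


def is_valid_version_alias(alias: str) -> bool:
    """Return True if the whole alias matches the documented format."""
    return _ALIAS_PATTERN.fullmatch(alias) is not None
```

Note that the pattern requires at least two characters and does not allow an alias to end in an uppercase letter or a hyphen.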

model_version_description: str | None = None

The description of the model version being uploaded by this job.

disable_early_stopping: bool | None = False

If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training no longer brings significant improvement to the model.

export_evaluated_data_items: bool | None = False

Whether to export the test set predictions to a BigQuery table. If False, then the export is not performed.

export_evaluated_data_items_bigquery_destination_uri: str | None = None

URI of the desired destination BigQuery table for exported test set predictions. Expected format: bq://<project_id>:<dataset_id>:<table>. If not specified, then results are exported to the following auto-created BigQuery table: <project_id>:export_evaluated_examples_<model_name>_<yyyy_MM_dd'T'HH_mm_ss_SSS'Z'>.evaluated_examples. Applies only if export_evaluated_data_items is True.
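The expected destination format can be assembled from its three parts. This is a sketch under stated assumptions: bigquery_destination_uri is a hypothetical helper, the example identifiers are made up, and only emptiness is checked because the full naming rules for BigQuery resources are not covered here.

```python
def bigquery_destination_uri(project_id: str, dataset_id: str, table: str) -> str:
    """Build a bq://<project_id>:<dataset_id>:<table> URI.

    Hypothetical helper for illustration; it is not part of the component.
    """
    for name, value in (
        ("project_id", project_id),
        ("dataset_id", dataset_id),
        ("table", table),
    ):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"bq://{project_id}:{dataset_id}:{table}"


uri = bigquery_destination_uri("my-project", "my_dataset", "evaluated_examples")
```

The resulting string would be passed as export_evaluated_data_items_bigquery_destination_uri, together with export_evaluated_data_items=True.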

export_evaluated_data_items_override_destination: bool | None = None

Whether to overwrite the contents of the table at export_evaluated_data_items_bigquery_destination_uri, if the table exists, for exported test set predictions. If False, and the table exists, then the training job will fail. Applies only if export_evaluated_data_items is True and export_evaluated_data_items_bigquery_destination_uri is specified.

display_name: str

The user-defined name of this TrainingPipeline.

optimization_prediction_type: str

The type of prediction the Model is to produce. “classification” - Predict one out of multiple target values for each row. “regression” - Predict a value based on its relation to other values. This type is available only to columns that contain semantically numeric values, i.e. integers or floating point numbers, even if stored as e.g. strings.

optimization_objective: str | None = None

Objective function the Model is to be optimized towards. The training task creates a Model that maximizes/minimizes the value of the objective function over the validation set. The supported optimization objectives depend on the prediction type and, in the case of classification, also on the number of distinct values in the target column (two distinct values -> binary, 3 or more distinct values -> multi class). If the field is not set, the default objective function is used.

Classification (binary): “maximize-au-roc” (default) - Maximize the area under the receiver operating characteristic (ROC) curve. “minimize-log-loss” - Minimize log loss. “maximize-au-prc” - Maximize the area under the precision-recall curve. “maximize-precision-at-recall” - Maximize precision for a specified recall value. “maximize-recall-at-precision” - Maximize recall for a specified precision value.

Classification (multi class): “minimize-log-loss” (default) - Minimize log loss.

Regression: “minimize-rmse” (default) - Minimize root-mean-squared error (RMSE). “minimize-mae” - Minimize mean-absolute error (MAE). “minimize-rmsle” - Minimize root-mean-squared log error (RMSLE).
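The objectives listed above can be kept in a small lookup table so that an unsupported combination is caught before the job is submitted. The sketch below only restates the list from this description; the function names are hypothetical, and the binary/multi class distinction follows the "two distinct values" rule stated above.

```python
from typing import Dict, Tuple

# The first entry of each tuple is the documented default.
OBJECTIVES: Dict[str, Tuple[str, ...]] = {
    "binary": (
        "maximize-au-roc",
        "minimize-log-loss",
        "maximize-au-prc",
        "maximize-precision-at-recall",
        "maximize-recall-at-precision",
    ),
    "multiclass": ("minimize-log-loss",),
    "regression": ("minimize-rmse", "minimize-mae", "minimize-rmsle"),
}


def objective_family(optimization_prediction_type: str, distinct_target_values: int = 2) -> str:
    """Map a prediction type and target cardinality to a key of OBJECTIVES."""
    if optimization_prediction_type == "regression":
        return "regression"
    if optimization_prediction_type != "classification":
        raise ValueError(f"unknown prediction type: {optimization_prediction_type!r}")
    if distinct_target_values < 2:
        raise ValueError("classification needs at least two distinct target values")
    return "binary" if distinct_target_values == 2 else "multiclass"


def default_objective(optimization_prediction_type: str, distinct_target_values: int = 2) -> str:
    """Return the default objective for the given setup."""
    return OBJECTIVES[objective_family(optimization_prediction_type, distinct_target_values)][0]


def is_supported_objective(
    objective: str, optimization_prediction_type: str, distinct_target_values: int = 2
) -> bool:
    """Return True if the objective is listed for the given setup."""
    family = objective_family(optimization_prediction_type, distinct_target_values)
    return objective in OBJECTIVES[family]
```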

column_specs: dict | None = None

Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed.

column_transformations: list | None = None

Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column’s value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs, as column_transformations will be deprecated eventually.

optimization_objective_recall_value: float | None = None

Required when the maximize-precision-at-recall optimization objective is selected. Represents the recall value at which the optimization is done. The minimum value is 0 and the maximum is 1.0.

optimization_objective_precision_value: float | None = None

Required when the maximize-recall-at-precision optimization objective is selected. Represents the precision value at which the optimization is done. The minimum value is 0 and the maximum is 1.0.
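The pairing between the two "at" objectives and their companion values can be verified up front. A minimal sketch, assuming a hypothetical helper check_objective_value whose arguments mirror the parameters described above:

```python
from typing import Optional


def check_objective_value(
    optimization_objective: Optional[str],
    recall_value: Optional[float] = None,
    precision_value: Optional[float] = None,
) -> None:
    """Raise ValueError if a required companion value is missing or out of range.

    Hypothetical helper for illustration; it is not part of the component.
    """
    required = {
        "maximize-precision-at-recall": (
            "optimization_objective_recall_value",
            recall_value,
        ),
        "maximize-recall-at-precision": (
            "optimization_objective_precision_value",
            precision_value,
        ),
    }
    if optimization_objective not in required:
        return  # Other objectives need no companion value.
    name, value = required[optimization_objective]
    if value is None:
        raise ValueError(f"{name} is required for {optimization_objective}")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1.0, got {value}")
```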

project: str

Project to retrieve dataset from.

location: str | None = 'us-central1'

Optional location to retrieve dataset from.

labels: dict | None = {}

The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
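The label constraints can be approximated with a local check. The sketch below is a loose reading of the rule above, not the service's own validation: it treats any non-uppercase alphabetic character as allowed so that international characters pass, and the helper names are hypothetical.

```python
from typing import Dict, List

MAX_LABEL_LENGTH = 64  # Unicode code points; len() on a Python str counts these.


def is_valid_label_text(text: str) -> bool:
    """Check one label key or value against the documented constraints."""
    if len(text) > MAX_LABEL_LENGTH:
        return False
    return all(
        ch in "_-" or ch.isdigit() or (ch.isalpha() and not ch.isupper())
        for ch in text
    )


def invalid_label_keys(labels: Dict[str, str]) -> List[str]:
    """Return the keys of entries whose key or value fails the check."""
    return [
        key
        for key, value in labels.items()
        if not (is_valid_label_text(key) and is_valid_label_text(value))
    ]
```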

training_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.

model_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
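The key name form shown above is a fixed path template, so it can be built and checked against the job location with plain string handling. This is a sketch: both helper names are hypothetical, and comparing the key's location segment to the location parameter is an assumption about how the "same region" requirement would be checked.

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build a Cloud KMS key resource name in the documented form."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def key_matches_location(key_name: str, location: str) -> bool:
    """Return True if key_name has the documented shape and the given location."""
    parts = key_name.split("/")
    return (
        len(parts) == 8
        and parts[0] == "projects"
        and parts[2] == "locations"
        and parts[3] == location
        and parts[4] == "keyRings"
        and parts[6] == "cryptoKeys"
    )


key_name = kms_key_name("my-project", "us-central1", "my-kr", "my-key")
```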

Returns:

model: dsl.Output[google.VertexModel]

The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.

v1.automl.training_job.AutoMLTextTrainingJobRunOp(project: str, display_name: str, dataset: dsl.Input[google.VertexDataset], model: dsl.Output[google.VertexModel], location: str | None = 'us-central1', prediction_type: str | None = 'classification', multi_label: bool | None = False, labels: dict | None = {}, training_encryption_spec_key_name: str | None = None, model_encryption_spec_key_name: str | None = None, training_fraction_split: float | None = None, validation_fraction_split: float | None = None, test_fraction_split: float | None = None, sentiment_max: int | None = 10, model_display_name: str | None = None, model_labels: dict | None = None)

Runs the training job and returns a model.

If training on a Vertex AI dataset, you can use one of the following split configurations:

Data fraction splits: Any of training_fraction_split, validation_fraction_split, and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.

Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). Supported only for unstructured Datasets.

Parameters:
dataset: dsl.Input[google.VertexDataset]

The dataset within the same Project from which data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible is described in the TrainingPipeline’s training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition).

training_fraction_split: float | None = None

The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.

validation_fraction_split: float | None = None

The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.

test_fraction_split: float | None = None

The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.

model_display_name: str | None = None

The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.

model_labels: dict | None = None

The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

display_name: str

The user-defined name of this TrainingPipeline.

prediction_type: str | None = 'classification'

The type of prediction the Model is to produce, one of: “classification” - A classification model analyzes text data and returns a list of categories that apply to the text found in the data. Vertex AI offers both single-label and multi-label text classification models. “extraction” - An entity extraction model inspects text data for known entities referenced in the data and labels those entities in the text. “sentiment” - A sentiment analysis model inspects text data and identifies the prevailing emotional opinion within it, especially to determine a writer’s attitude as positive, negative, or neutral.

multi_label: bool | None = False

Required and only applicable for the text classification task. If false, a single-label (multi-class) Model will be trained (i.e. assuming that for each text snippet at most one annotation may be applicable). If true, a multi-label Model will be trained (i.e. assuming that for each text snippet multiple annotations may be applicable).

sentiment_max: int | None = 10

Required and only applicable for sentiment task. A sentiment is expressed as an integer ordinal, where higher value means a more positive sentiment. The range of sentiments that will be used is between 0 and sentimentMax (inclusive on both ends), and all the values in the range must be represented in the dataset before a model can be created. Only the Annotations with this sentimentMax will be used for training. sentimentMax value must be between 1 and 10 (inclusive).
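The requirement that every value from 0 to sentimentMax appears in the dataset can be checked on the annotation values before training. A minimal sketch, with missing_sentiments as a hypothetical helper operating on a plain iterable of integer annotations:

```python
from typing import Iterable, Set


def missing_sentiments(annotations: Iterable[int], sentiment_max: int = 10) -> Set[int]:
    """Return the sentiment values in [0, sentiment_max] that never occur.

    Hypothetical helper for illustration; it is not part of the component.
    """
    if not 1 <= sentiment_max <= 10:
        raise ValueError("sentiment_max must be between 1 and 10 (inclusive)")
    return set(range(sentiment_max + 1)) - set(annotations)
```

An empty result means all values in the range are represented, which the description above names as a precondition for creating a model.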

project: str

Project to retrieve dataset from.

location: str | None = 'us-central1'

Optional location to retrieve dataset from.

labels: dict | None = {}

The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

training_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.

model_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:

model: dsl.Output[google.VertexModel]

The trained Vertex AI Model resource.

v1.automl.training_job.AutoMLVideoTrainingJobRunOp(project: str, display_name: str, dataset: dsl.Input[google.VertexDataset], model: dsl.Output[google.VertexModel], location: str | None = 'us-central1', prediction_type: str | None = 'classification', model_type: str | None = 'CLOUD', labels: dict | None = {}, training_encryption_spec_key_name: str | None = None, model_encryption_spec_key_name: str | None = None, training_fraction_split: float | None = None, test_fraction_split: float | None = None, model_display_name: str | None = None, model_labels: dict | None = None)

Runs the AutoML Video training job and returns a model.

If training on a Vertex AI dataset, you can use one of the following split configurations:

Data fraction splits: training_fraction_split and test_fraction_split may optionally be provided; together they must sum to at most 1. If none of the fractions are set, by default roughly 80% of data will be used for training, and 20% for test.

Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). Supported only for unstructured Datasets.

Parameters:
dataset: dsl.Input[google.VertexDataset]

The dataset within the same Project from which data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible is described in the TrainingPipeline’s training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.

training_fraction_split: float | None = None

The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.

test_fraction_split: float | None = None

The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.

model_display_name: str | None = None

The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.

model_labels: dict | None = None

The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

display_name: str

The user-defined name of this TrainingPipeline.

prediction_type: str | None = 'classification'

The type of prediction the Model is to produce, one of: “classification” - A video classification model classifies shots and segments in your videos according to your own defined labels. “object_tracking” - A video object tracking model detects and tracks multiple objects in shots and segments. You can use these models to track objects in your videos according to your own pre-defined, custom labels. “action_recognition” - A video action recognition model pinpoints the location of actions with short temporal durations (~1 second).

model_type: str | None = 'CLOUD'

One of the following:

“CLOUD” - available for “classification”, “object_tracking” and “action_recognition”. A Model best tailored to be used within Google Cloud, and which cannot be exported.

“MOBILE_VERSATILE_1” - available for “classification”, “object_tracking” and “action_recognition”. A model that, in addition to being available within Google Cloud, can also be exported (see ModelService.ExportModel) as a TensorFlow or TensorFlow Lite model and used on a mobile or edge device afterwards.

“MOBILE_CORAL_VERSATILE_1” - available only for “object_tracking”. A versatile model that is meant to be exported (see ModelService.ExportModel) and used on a Google Coral device.

“MOBILE_CORAL_LOW_LATENCY_1” - available only for “object_tracking”. A model that trades off quality for low latency, to be exported (see ModelService.ExportModel) and used on a Google Coral device.

“MOBILE_JETSON_VERSATILE_1” - available only for “object_tracking”. A versatile model that is meant to be exported (see ModelService.ExportModel) and used on an NVIDIA Jetson device.

“MOBILE_JETSON_LOW_LATENCY_1” - available only for “object_tracking”. A model that trades off quality for low latency, to be exported (see ModelService.ExportModel) and used on an NVIDIA Jetson device.
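The availability rules above amount to a lookup from model_type to the prediction types it supports. The sketch below restates them as a table; the names MODEL_TYPE_SUPPORT and is_supported_combination are hypothetical and exist only for this example.

```python
from typing import Dict, FrozenSet

_ALL_TYPES: FrozenSet[str] = frozenset(
    {"classification", "object_tracking", "action_recognition"}
)
_TRACKING_ONLY: FrozenSet[str] = frozenset({"object_tracking"})

# Transcribed from the model_type description above.
MODEL_TYPE_SUPPORT: Dict[str, FrozenSet[str]] = {
    "CLOUD": _ALL_TYPES,
    "MOBILE_VERSATILE_1": _ALL_TYPES,
    "MOBILE_CORAL_VERSATILE_1": _TRACKING_ONLY,
    "MOBILE_CORAL_LOW_LATENCY_1": _TRACKING_ONLY,
    "MOBILE_JETSON_VERSATILE_1": _TRACKING_ONLY,
    "MOBILE_JETSON_LOW_LATENCY_1": _TRACKING_ONLY,
}


def is_supported_combination(prediction_type: str, model_type: str = "CLOUD") -> bool:
    """Return True if model_type is listed as available for prediction_type."""
    return prediction_type in MODEL_TYPE_SUPPORT.get(model_type, frozenset())
```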

project: str

Project to retrieve dataset from.

location: str | None = 'us-central1'

Optional location to retrieve dataset from.

labels: dict | None = {}

The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

training_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.

model_encryption_spec_key_name: str | None = None

The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:

model: dsl.Output[google.VertexModel]

The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.