google_cloud_pipeline_components.aiplatform components
Core modules for AI Platform Pipeline Components.
- google_cloud_pipeline_components.aiplatform.AutoMLForecastingTrainingJobRunOp(project: str, display_name: str, target_column: str, time_column: str, time_series_identifier_column: str, unavailable_at_forecast_columns: list, available_at_forecast_columns: list, forecast_horizon: int, data_granularity_unit: str, data_granularity_count: int, dataset: google.VertexDataset, location: str = 'us-central1', optimization_objective: str = None, time_series_attribute_columns: list = None, context_window: int = None, quantiles: list = None, validation_options: str = None, labels: dict = '{}', training_encryption_spec_key_name: str = None, model_encryption_spec_key_name: str = None, training_fraction_split: float = None, budget_milli_node_hours: int = None, model_display_name: str = None, model_labels: dict = None, column_specs: dict = None, column_transformations: list = None, predefined_split_column_name: str = None, weight_column: str = None, export_evaluated_data_items: bool = False, export_evaluated_data_items_bigquery_destination_uri: str = None, export_evaluated_data_items_override_destination: bool = None, test_fraction_split: float = None, validation_fraction_split: float = None)
automl_forecasting_training_job Runs the training job and returns a model. If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.
Predefined splits: Assigns input data to training, validation, and test sets based on the value of a provided key. If using predefined splits, predefined_split_column_name must be provided. Supported only for tabular Datasets.
Timestamp splits: Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to training set, next to validation set, and the oldest to the test set. Supported only for tabular Datasets.
- Args:
- dataset (google.VertexDataset):
Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For time series Datasets, all their data is exported to training, to pick and choose from.
- target_column (String):
Required. Name of the column that the Model is to predict values for.
- time_column (String):
Required. Name of the column that identifies time order in the time series.
- time_series_identifier_column (String):
Required. Name of the column that identifies the time series.
- unavailable_at_forecast_columns (JsonArray):
Required. Column names of columns that are unavailable at forecast. Each column contains information for the given entity (identified by the [time_series_identifier_column]) that is unknown before the forecast (e.g. population of a city in a given year, or weather on a given day).
- available_at_forecast_columns (JsonArray):
Required. Column names of columns that are available at forecast. Each column contains information for the given entity (identified by the [time_series_identifier_column]) that is known at forecast.
- forecast_horizon (Integer):
Required. The amount of time into the future for which forecasted values for the target are returned. Expressed in number of units defined by the [data_granularity_unit] and [data_granularity_count] field. Inclusive.
- data_granularity_unit (String):
Required. The data granularity unit. Accepted values are minute, hour, day, week, month, year.
- data_granularity_count (Integer):
Required. The number of data granularity units between data points in the training data. If [data_granularity_unit] is minute, can be 1, 5, 10, 15, or 30. For all other values of [data_granularity_unit], must be 1.
- predefined_split_column_name (String):
Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {TRAIN, VALIDATE, TEST}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.
- weight_column (String):
Optional. Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusively, and 0 value means that the row is ignored. If the weight column field is not set, then all rows are assumed to have equal weight of 1.
- time_series_attribute_columns (JsonArray):
Optional. Column names that should be used as attribute columns. Each column is constant within a time series.
- context_window (Integer):
Optional. The amount of time into the past training and prediction data is used for model training and prediction respectively. Expressed in number of units defined by the [data_granularity_unit] and [data_granularity_count] fields. When not provided uses the default value of 0 which means the model sets each series context window to be 0 (also known as “cold start”). Inclusive.
- export_evaluated_data_items (Boolean):
Whether to export the test set predictions to a BigQuery table. If False, then the export is not performed.
- export_evaluated_data_items_bigquery_destination_uri (String):
Optional. URI of desired destination BigQuery table for exported test set predictions. Expected format:
bq://<project_id>:<dataset_id>:<table>
If not specified, then results are exported to the following auto-created BigQuery table:
<project_id>:export_evaluated_examples_<model_name>_<yyyy_MM_dd'T'HH_mm_ss_SSS'Z'>.evaluated_examples
Applies only if [export_evaluated_data_items] is True.
- export_evaluated_data_items_override_destination (Boolean):
Whether to override the contents of [export_evaluated_data_items_bigquery_destination_uri], if the table exists, for exported test set predictions. If False, and the table exists, then the training job will fail. Applies only if [export_evaluated_data_items] is True and [export_evaluated_data_items_bigquery_destination_uri] is specified.
- quantiles (JsonArray):
Quantiles to use for the minimize-quantile-loss [AutoMLForecastingTrainingJob.optimization_objective]. Required when that objective is selected. Accepts up to 5 quantiles, each a double between 0 and 1, exclusive. Each quantile must be unique.
- validation_options (String):
Validation options for the data validation component. The available options are:
- “fail-pipeline” (default) - Validate the data and fail the pipeline if validation fails.
- “ignore-validation” - Ignore the results of the validation and continue the pipeline.
- budget_milli_node_hours (Integer):
Optional. The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be attempted to be close to the budget, though may end up being (even) noticeably smaller - at the backend’s discretion. This especially may happen when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won’t be attempted and will error. The minimum value is 1000 and the maximum is 72000.
- model_display_name (String):
Optional. The display name of the managed Vertex AI Model, if the job produces one. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
- model_labels (JsonObject):
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- display_name (String):
Required. The user-defined name of this TrainingPipeline.
- optimization_objective (String):
Optional. Objective function the model is to be optimized towards. The training process creates a Model that optimizes the value of the objective function over the validation set. The supported optimization objectives:
- “minimize-rmse” (default) - Minimize root-mean-squared error (RMSE).
- “minimize-mae” - Minimize mean-absolute error (MAE).
- “minimize-rmsle” - Minimize root-mean-squared log error (RMSLE).
- “minimize-rmspe” - Minimize root-mean-squared percentage error (RMSPE).
- “minimize-wape-mae” - Minimize the combination of weighted absolute percentage error (WAPE) and mean-absolute-error (MAE).
- “minimize-quantile-loss” - Minimize the quantile loss at the defined quantiles. (Set this objective to build quantile forecasts.)
- column_specs (JsonObject):
Optional. Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating transformation for BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed.
- column_transformations (List[Dict[str, Dict[str, str]]]):
Optional. Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column’s value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- labels (JsonObject):
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- model: The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
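Several of the argument constraints described above for AutoMLForecastingTrainingJobRunOp can be checked before a pipeline is compiled. The following is a minimal sketch, not part of google_cloud_pipeline_components: the helper name `check_forecasting_args` is hypothetical, and it encodes only the rules stated in this section (accepted granularity units and counts, the quantile requirements, and the budget range).

```python
from typing import List, Optional

# Accepted values for data_granularity_unit, as listed in the Args above.
GRANULARITY_UNITS = {"minute", "hour", "day", "week", "month", "year"}
# Allowed data_granularity_count values when the unit is "minute".
MINUTE_COUNTS = {1, 5, 10, 15, 30}


def check_forecasting_args(
    data_granularity_unit: str,
    data_granularity_count: int,
    optimization_objective: Optional[str] = None,
    quantiles: Optional[List[float]] = None,
    budget_milli_node_hours: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: return the violated rules (empty list if none)."""
    problems = []

    # Granularity: any count from MINUTE_COUNTS for minutes, otherwise exactly 1.
    if data_granularity_unit not in GRANULARITY_UNITS:
        problems.append(f"unknown data_granularity_unit: {data_granularity_unit!r}")
    elif data_granularity_unit == "minute":
        if data_granularity_count not in MINUTE_COUNTS:
            problems.append("data_granularity_count must be 1, 5, 10, 15, or 30 for minute")
    elif data_granularity_count != 1:
        problems.append("data_granularity_count must be 1 unless the unit is minute")

    # Quantiles: required for the quantile-loss objective, at most 5, unique,
    # each strictly between 0 and 1.
    if optimization_objective == "minimize-quantile-loss":
        if not quantiles:
            problems.append("quantiles is required for minimize-quantile-loss")
        else:
            if len(quantiles) > 5:
                problems.append("at most 5 quantiles are accepted")
            if len(set(quantiles)) != len(quantiles):
                problems.append("quantiles must be unique")
            if any(not 0 < q < 1 for q in quantiles):
                problems.append("each quantile must be strictly between 0 and 1")

    # Budget: the documented range is 1000 to 72000 milli node hours.
    if budget_milli_node_hours is not None and not 1000 <= budget_milli_node_hours <= 72000:
        problems.append("budget_milli_node_hours must be between 1000 and 72000")

    return problems
```

A caller could run this on the intended arguments and refuse to submit the pipeline when the returned list is not empty. The service remains the authority on validation; this only catches the documented cases early.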
- google_cloud_pipeline_components.aiplatform.AutoMLImageTrainingJobRunOp(project: str, display_name: str, dataset: google.VertexDataset, location: str = 'us-central1', prediction_type: str = 'classification', multi_label: bool = False, model_type: str = 'CLOUD', base_model: google.VertexModel = None, labels: dict = '{}', training_encryption_spec_key_name: str = None, model_encryption_spec_key_name: str = None, training_fraction_split: float = None, validation_fraction_split: float = None, test_fraction_split: float = None, budget_milli_node_hours: int = None, model_display_name: str = None, model_labels: dict = None, disable_early_stopping: bool = False)
automl_image_training_job Runs the AutoML Image training job and returns a model. If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.
Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). Supported only for unstructured Datasets.
- Args:
- dataset (datasets.ImageDataset):
Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.
- training_fraction_split (Float):
Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
- validation_fraction_split (Float):
Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
- test_fraction_split (Float):
Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
- budget_milli_node_hours (Integer):
Optional. The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. Defaults by prediction_type:
- classification - For Cloud models the budget must be: 8,000 - 800,000 milli node hours (inclusive). The default value is 192,000 which represents one day in wall time, assuming 8 nodes are used.
- object_detection - For Cloud models the budget must be: 20,000 - 900,000 milli node hours (inclusive). The default value is 216,000 which represents one day in wall time, assuming 9 nodes are used.
The training cost of the model will not exceed this budget. The final cost will be attempted to be close to the budget, though may end up being (even) noticeably smaller - at the backend’s discretion. This especially may happen when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won’t be attempted and will error.
- model_display_name (String):
Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
- model_labels (JsonObject):
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- disable_early_stopping (Boolean):
Optional. Defaults to False. If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training no longer brings significant improvement to the model.
- display_name (String):
Required. The user-defined name of this TrainingPipeline.
- prediction_type (String):
The type of prediction the Model is to produce, one of:
- “classification” - Predict one out of multiple target values for each image.
- “object_detection” - Predict the bounding box and label of each object found in an image.
- multi_label (Boolean):
Optional. Default is False. If false, a single-label (multi-class) Model will be trained (i.e. assuming that for each image just up to one annotation may be applicable). If true, a multi-label Model will be trained (i.e. assuming that for each image multiple annotations may be applicable). This is only applicable for the “classification” prediction_type and will be ignored otherwise.
- model_type (String):
Optional. Defaults to “CLOUD”. One of the following:
- “CLOUD” - Default for Image Classification. A model best tailored to be used within Google Cloud, and which cannot be exported.
- “CLOUD_HIGH_ACCURACY_1” - Default for Image Object Detection. A model best tailored to be used within Google Cloud, and which cannot be exported. Expected to have a higher latency, but should also have a higher prediction quality than other cloud models.
- “CLOUD_LOW_LATENCY_1” - A model best tailored to be used within Google Cloud, and which cannot be exported. Expected to have a low latency, but may have lower prediction quality than other cloud models.
- “MOBILE_TF_LOW_LATENCY_1” - A model that, in addition to being available within Google Cloud, can also be exported as TensorFlow or Core ML model and used on a mobile or edge device afterwards. Expected to have low latency, but may have lower prediction quality than other mobile models.
- “MOBILE_TF_VERSATILE_1” - A model that, in addition to being available within Google Cloud, can also be exported as TensorFlow or Core ML model and used on a mobile or edge device afterwards.
- “MOBILE_TF_HIGH_ACCURACY_1” - A model that, in addition to being available within Google Cloud, can also be exported as TensorFlow or Core ML model and used on a mobile or edge device afterwards. Expected to have a higher latency, but should also have a higher prediction quality than other mobile models.
- base_model (Optional[models.Model]):
Optional. Only permitted for Image Classification models. If it is specified, the new model will be trained based on the base model. Otherwise, the new model will be trained from scratch. The base model must be in the same Project and Location as the new Model to train, and have the same model_type.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- labels (JsonObject):
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- model: The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
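The budget and split rules for AutoMLImageTrainingJobRunOp lend themselves to a small pre-flight check. The sketch below is illustrative only: the names `IMAGE_BUDGETS`, `resolve_image_budget`, `wall_clock_hours` and `unassigned_fraction` are hypothetical and not part of the library, and the numbers are copied from the budget_milli_node_hours and fraction split descriptions above.

```python
from typing import Optional

# Budget limits and defaults in milli node hours for Cloud models, copied from
# the budget_milli_node_hours description above.
IMAGE_BUDGETS = {
    "classification": {"min": 8_000, "max": 800_000, "default": 192_000},
    "object_detection": {"min": 20_000, "max": 900_000, "default": 216_000},
}


def resolve_image_budget(prediction_type: str, budget: Optional[int] = None) -> int:
    """Return the budget to pass on, falling back to the documented default."""
    limits = IMAGE_BUDGETS[prediction_type]  # KeyError for an unknown prediction_type
    if budget is None:
        return limits["default"]
    if not limits["min"] <= budget <= limits["max"]:
        raise ValueError(
            f"{prediction_type} budget must be within "
            f"{limits['min']}-{limits['max']} milli node hours, got {budget}"
        )
    return budget


def wall_clock_hours(budget_milli_node_hours: int, nodes: int) -> float:
    """Convert a budget to wall-clock hours: 1,000 milli node hours is 1 node hour."""
    return budget_milli_node_hours / 1000 / nodes


def unassigned_fraction(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> float:
    """Check the provided fractions and return the remainder left for Vertex AI to assign."""
    provided = [f for f in (training, validation, test) if f is not None]
    if any(not 0 <= f <= 1 for f in provided):
        raise ValueError("each fraction must be between 0 and 1")
    total = sum(provided)
    if total > 1 + 1e-9:  # small tolerance for floating-point rounding
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    return max(0.0, 1 - total)
```

The conversion reproduces the "one day in wall time" figures quoted above: 192,000 milli node hours on 8 nodes and 216,000 on 9 nodes both come to 24 hours.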
- google_cloud_pipeline_components.aiplatform.AutoMLTabularTrainingJobRunOp(project: str, display_name: str, optimization_prediction_type: str, dataset: google.VertexDataset, target_column: str, location: str = 'us-central1', optimization_objective: str = None, column_specs: dict = None, column_transformations: list = None, optimization_objective_recall_value: float = None, optimization_objective_precision_value: float = None, labels: dict = '{}', training_encryption_spec_key_name: str = None, model_encryption_spec_key_name: str = None, training_fraction_split: float = None, test_fraction_split: float = None, validation_fraction_split: float = None, predefined_split_column_name: str = None, timestamp_split_column_name: str = None, weight_column: str = None, budget_milli_node_hours: int = None, model_display_name: str = None, model_labels: dict = None, disable_early_stopping: bool = False, export_evaluated_data_items: bool = False, export_evaluated_data_items_bigquery_destination_uri: str = None, export_evaluated_data_items_override_destination: bool = None)
automl_tabular_training_job Runs the training job and returns a model. If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.
Predefined splits: Assigns input data to training, validation, and test sets based on the value of a provided key. If using predefined splits, predefined_split_column_name must be provided. Supported only for tabular Datasets.
Timestamp splits: Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to training set, next to validation set, and the oldest to the test set. Supported only for tabular Datasets.
- Args:
- dataset (datasets.TabularDataset):
Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.
- target_column (String):
Required. Name of the column whose values the Model is to predict.
- training_fraction_split (Float):
Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
- validation_fraction_split (Float):
Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
- test_fraction_split (Float):
Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
- predefined_split_column_name (String):
Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.
- timestamp_split_column_name (String):
Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets. This parameter must be used with training_fraction_split, validation_fraction_split and test_fraction_split.
- weight_column (String):
Optional. Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusively, and 0 value means that the row is ignored. If the weight column field is not set, then all rows are assumed to have equal weight of 1.
- budget_milli_node_hours (Integer):
Optional. The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model will not exceed this budget. The final cost will be attempted to be close to the budget, though may end up being (even) noticeably smaller - at the backend’s discretion. This especially may happen when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won’t be attempted and will error. The minimum value is 1000 and the maximum is 72000.
- model_display_name (String):
Optional. The display name of the managed Vertex AI Model, if the job produces one. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
- model_labels (JsonObject):
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- disable_early_stopping (Boolean):
Optional. Defaults to False. If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training no longer brings significant improvement to the model.
- export_evaluated_data_items (Boolean):
Whether to export the test set predictions to a BigQuery table. If False, then the export is not performed.
- export_evaluated_data_items_bigquery_destination_uri (String):
Optional. URI of desired destination BigQuery table for exported test set predictions. Expected format:
bq://<project_id>:<dataset_id>:<table>
If not specified, then results are exported to the following auto-created BigQuery table:
<project_id>:export_evaluated_examples_<model_name>_<yyyy_MM_dd'T'HH_mm_ss_SSS'Z'>.evaluated_examples
Applies only if [export_evaluated_data_items] is True.
- export_evaluated_data_items_override_destination (Boolean):
Whether to override the contents of [export_evaluated_data_items_bigquery_destination_uri], if the table exists, for exported test set predictions. If False, and the table exists, then the training job will fail. Applies only if [export_evaluated_data_items] is True and [export_evaluated_data_items_bigquery_destination_uri] is specified.
- display_name (String):
Required. The user-defined name of this TrainingPipeline.
- optimization_prediction_type (String):
The type of prediction the Model is to produce.
- “classification” - Predict one out of multiple target values for each row.
- “regression” - Predict a value based on its relation to other values. This type is available only to columns that contain semantically numeric values, i.e. integers or floating point numbers, even if stored as e.g. strings.
- optimization_objective (String):
Optional. Objective function the Model is to be optimized towards. The training task creates a Model that maximizes/minimizes the value of the objective function over the validation set. The supported optimization objectives depend on the prediction type, and in the case of classification also the number of distinct values in the target column (two distinct values -> binary, 3 or more distinct values -> multi class). If the field is not set, the default objective function is used.
Classification (binary):
- “maximize-au-roc” (default) - Maximize the area under the receiver operating characteristic (ROC) curve.
- “minimize-log-loss” - Minimize log loss.
- “maximize-au-prc” - Maximize the area under the precision-recall curve.
- “maximize-precision-at-recall” - Maximize precision for a specified recall value.
- “maximize-recall-at-precision” - Maximize recall for a specified precision value.
Classification (multi class):
- “minimize-log-loss” (default) - Minimize log loss.
Regression:
- “minimize-rmse” (default) - Minimize root-mean-squared error (RMSE).
- “minimize-mae” - Minimize mean-absolute error (MAE).
- “minimize-rmsle” - Minimize root-mean-squared log error (RMSLE).
- column_specs (JsonObject):
Optional. Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating transformation for BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed.
- column_transformations (List[Dict[str, Dict[str, str]]]):
Optional. Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column’s value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using “.” as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually.
- optimization_objective_recall_value (Float):
Optional. Required when maximize-precision-at-recall optimizationObjective was picked, represents the recall value at which the optimization is done. The minimum value is 0 and the maximum is 1.0.
- optimization_objective_precision_value (Float):
Optional. Required when maximize-recall-at-precision optimizationObjective was picked, represents the precision value at which the optimization is done. The minimum value is 0 and the maximum is 1.0.
- project (String):
Required. Project to retrieve dataset from.
- location (String):
Optional. Location to retrieve dataset from.
- labels (JsonObject):
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- model: The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
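The flattening rule described for column_specs (struct columns joined with "." and only leaf columns receiving a type) can be sketched as follows. This is a minimal illustration, not part of the component: the helper name flatten_column_specs, the nested schema, and the data-type strings "numeric" and "categorical" are assumptions chosen for the example.

```python
from typing import Any, Dict


def flatten_column_specs(schema: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten a nested {column: type-or-children} mapping.

    String values are treated as data types of leaf columns. Dict values are
    treated as struct columns whose children are joined to the parent name
    with "." so that only columns with no child end up with an entry.
    """
    specs: Dict[str, str] = {}
    for name, value in schema.items():
        full_name = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Struct column: recurse, the struct itself gets no entry.
            specs.update(flatten_column_specs(value, full_name))
        else:
            specs[full_name] = value
    return specs


# Hypothetical schema; "address" stands in for a BigQuery STRUCT column.
schema = {
    "age": "numeric",
    "address": {
        "city": "categorical",
        "geo": {"lat": "numeric", "lon": "numeric"},
    },
}
column_specs = flatten_column_specs(schema)
print(column_specs)
```

The resulting dict has keys such as address.city and address.geo.lat, and no key for address itself, which matches the requirement that only childless columns carry a transformation.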
- google_cloud_pipeline_components.aiplatform.AutoMLTextTrainingJobRunOp(project: str, display_name: str, dataset: google.VertexDataset, location: str = 'us-central1', prediction_type: str = 'classification', multi_label: bool = False, labels: dict = '{}', training_encryption_spec_key_name: str = None, model_encryption_spec_key_name: str = None, training_fraction_split: float = None, validation_fraction_split: float = None, test_fraction_split: float = None, sentiment_max: int = 10, model_display_name: str = None, model_labels: dict = None)
automl_text_training_job Runs the training job and returns a model. If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.
Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). Supported only for unstructured Datasets.
- Args:
- dataset (datasets.TextDataset):
Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition].
- training_fraction_split (Float):
Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
- validation_fraction_split (Float):
Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
- test_fraction_split (Float):
Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
- model_display_name (String):
Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
- model_labels (JsonObject):
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- display_name (String):
Required. The user-defined name of this TrainingPipeline.
- prediction_type (String):
- The type of prediction the Model is to produce, one of:
- “classification” - A classification model analyzes text data and
returns a list of categories that apply to the text found in the data. Vertex AI offers both single-label and multi-label text classification models.
- “extraction” - An entity extraction model inspects text data
for known entities referenced in the data and labels those entities in the text.
- “sentiment” - A sentiment analysis model inspects text data and identifies the
prevailing emotional opinion within it, especially to determine a writer’s attitude as positive, negative, or neutral.
- multi_label (Boolean):
Required and only applicable for text classification task. If false, a single-label (multi-class) Model will be trained (i.e. assuming that for each text snippet just up to one annotation may be applicable). If true, a multi-label Model will be trained (i.e. assuming that for each text snippet multiple annotations may be applicable).
- sentiment_max (Integer):
Required and only applicable for sentiment task. A sentiment is expressed as an integer ordinal, where higher value means a more positive sentiment. The range of sentiments that will be used is between 0 and sentimentMax (inclusive on both ends), and all the values in the range must be represented in the dataset before a model can be created. Only the Annotations with this sentimentMax will be used for training. sentimentMax value must be between 1 and 10 (inclusive).
- project (String):
Required. Project to retrieve dataset from.
- location (String):
Optional. Location to retrieve dataset from.
- labels (JsonObject):
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
model: The trained Vertex AI Model resource.
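The fraction split rule stated for this component (each fraction optional, the provided ones summing to at most 1, and a default of roughly 80/10/10 when none is given) can be checked before a pipeline is submitted. The sketch below is an illustration only: check_fraction_splits is a hypothetical helper, and the "unassigned" key is an assumption used to report the remainder that Vertex AI would distribute itself.

```python
import math
from typing import Dict, Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> Dict[str, float]:
    """Validate fraction splits and report what was provided."""
    candidates = (("training", training), ("validation", validation), ("test", test))
    provided = {name: value for name, value in candidates if value is not None}
    if not provided:
        # Documented default when no fraction is set: roughly 80/10/10.
        return {"training": 0.8, "validation": 0.1, "test": 0.1}
    for name, value in provided.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}_fraction_split must be in [0, 1], got {value}")
    total = math.fsum(provided.values())
    if total > 1.0 + 1e-9:
        raise ValueError(f"fraction splits sum to {total}, which exceeds 1")
    # The remainder is not assigned here; Vertex AI decides where it goes.
    provided["unassigned"] = max(0.0, 1.0 - total)
    return provided


print(check_fraction_splits(training=0.7, test=0.2))
```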
- google_cloud_pipeline_components.aiplatform.AutoMLVideoTrainingJobRunOp(project: str, display_name: str, dataset: google.VertexDataset, location: str = 'us-central1', prediction_type: str = 'classification', model_type: str = 'CLOUD', labels: dict = '{}', training_encryption_spec_key_name: str = None, model_encryption_spec_key_name: str = None, training_fraction_split: float = None, test_fraction_split: float = None, model_display_name: str = None, model_labels: dict = None)
automl_video_training_job Runs the AutoML Video training job and returns a model. If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: training_fraction_split and test_fraction_split may optionally be provided; they must sum to at most 1. If none of the fractions are set, by default roughly 80% of data will be used for training, and 20% for test.
Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). Supported only for unstructured Datasets.
- Args:
- dataset (datasets.VideoDataset):
Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use schema compatible with Model being trained, and what is compatible should be described in the used TrainingPipeline’s [training_task_definition] [google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition]. For tabular Datasets, all their data is exported to training, to pick and choose from.
- training_fraction_split (Float):
Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
- test_fraction_split (Float):
Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
- model_display_name (String):
Optional. The display name of the managed Vertex AI Model. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job’s display_name is used.
- model_labels (JsonObject):
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- display_name (String):
Required. The user-defined name of this TrainingPipeline.
- prediction_type (String):
- The type of prediction the Model is to produce, one of:
- “classification” - A video classification model classifies shots
and segments in your videos according to your own defined labels.
- “object_tracking” - A video object tracking model detects and tracks
multiple objects in shots and segments. You can use these models to track objects in your videos according to your own pre-defined, custom labels.
- “action_recognition” - A video action recognition model pinpoints
the location of actions with short temporal durations (~1 second).
- model_type (String):
- Required. One of the following:
- “CLOUD” - available for “classification”, “object_tracking” and “action_recognition”
A Model best tailored to be used within Google Cloud, and which cannot be exported.
- “MOBILE_VERSATILE_1” - available for “classification”, “object_tracking” and “action_recognition”
A model that, in addition to being available within Google Cloud, can also be exported (see ModelService.ExportModel) as a TensorFlow or TensorFlow Lite model and used on a mobile or edge device afterwards.
- “MOBILE_CORAL_VERSATILE_1” - available only for “object_tracking”
A versatile model that is meant to be exported (see ModelService.ExportModel) and used on a Google Coral device.
- “MOBILE_CORAL_LOW_LATENCY_1” - available only for “object_tracking”
A model that trades off quality for low latency, to be exported (see ModelService.ExportModel) and used on a Google Coral device.
- “MOBILE_JETSON_VERSATILE_1” - available only for “object_tracking”
A versatile model that is meant to be exported (see ModelService.ExportModel) and used on an NVIDIA Jetson device.
- “MOBILE_JETSON_LOW_LATENCY_1” - available only for “object_tracking”
A model that trades off quality for low latency, to be exported (see ModelService.ExportModel) and used on an NVIDIA Jetson device.
- project (String):
Required. Project to retrieve dataset from.
- location (String):
Optional. Location to retrieve dataset from.
- labels (JsonObject):
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key. Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- model: The trained Vertex AI Model resource or None if training did not
produce a Vertex AI Model.
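The model_type entries for this component each list the prediction types they are available for. A small lookup table makes that compatibility easy to check before a job is launched. This is a sketch built only from the availability notes in this section; check_video_model_type and the constant names are hypothetical and not part of the component.

```python
ALL_TASKS = frozenset({"classification", "object_tracking", "action_recognition"})
TRACKING_ONLY = frozenset({"object_tracking"})

# Availability as listed in the model_type description.
SUPPORTED_TASKS = {
    "CLOUD": ALL_TASKS,
    "MOBILE_VERSATILE_1": ALL_TASKS,
    "MOBILE_CORAL_VERSATILE_1": TRACKING_ONLY,
    "MOBILE_CORAL_LOW_LATENCY_1": TRACKING_ONLY,
    "MOBILE_JETSON_VERSATILE_1": TRACKING_ONLY,
    "MOBILE_JETSON_LOW_LATENCY_1": TRACKING_ONLY,
}


def check_video_model_type(prediction_type: str, model_type: str = "CLOUD") -> None:
    """Raise ValueError if model_type is unknown or not listed for prediction_type."""
    if model_type not in SUPPORTED_TASKS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    if prediction_type not in SUPPORTED_TASKS[model_type]:
        allowed = ", ".join(sorted(SUPPORTED_TASKS[model_type]))
        raise ValueError(
            f"{model_type} is listed only for: {allowed}; got {prediction_type!r}"
        )


check_video_model_type("object_tracking", "MOBILE_CORAL_VERSATILE_1")  # passes silently
```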
- google_cloud_pipeline_components.aiplatform.CustomContainerTrainingJobRunOp(display_name: str, container_uri: str, command: Sequence[str] = None, model_serving_container_image_uri: Optional[str] = None, model_serving_container_predict_route: Optional[str] = None, model_serving_container_health_route: Optional[str] = None, model_serving_container_command: Optional[Sequence[str]] = None, model_serving_container_args: Optional[Sequence[str]] = None, model_serving_container_environment_variables: Optional[Dict[str, str]] = None, model_serving_container_ports: Optional[Sequence[int]] = None, model_description: Optional[str] = None, model_instance_schema_uri: Optional[str] = None, model_parameters_schema_uri: Optional[str] = None, model_prediction_schema_uri: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None, dataset: Optional[Union[google.cloud.aiplatform.datasets.image_dataset.ImageDataset, google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, google.cloud.aiplatform.datasets.text_dataset.TextDataset, google.cloud.aiplatform.datasets.video_dataset.VideoDataset]] = None, annotation_schema_uri: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, base_output_dir: Optional[str] = None, service_account: Optional[str] = None, network: Optional[str] = None, bigquery_destination: Optional[str] = None, args: Optional[List[Union[float, int, str]]] = None, environment_variables: Optional[Dict[str, str]] = None, replica_count: int = 1, machine_type: str = 'n1-standard-4', accelerator_type: str = 'ACCELERATOR_TYPE_UNSPECIFIED', accelerator_count: int = 0, boot_disk_type: str = 'pd-ssd', boot_disk_size_gb: int = 100, reduction_server_replica_count: int = 0, reduction_server_machine_type: 
Optional[str] = None, reduction_server_container_uri: Optional[str] = None, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, predefined_split_column_name: Optional[str] = None, timestamp_split_column_name: Optional[str] = None, enable_web_access: bool = False, tensorboard: Optional[str] = None) -> Optional[google.cloud.aiplatform.models.Model]
Runs the custom training job. Distributed Training Support: If replica_count = 1 then one chief replica will be provisioned. If replica_count > 1 the remainder will be provisioned as a worker replica pool, i.e. replica_count = 10 will result in 1 chief and 9 workers. All replicas have the same machine_type, accelerator_type, and accelerator_count.
- If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.
Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). If using filter splits, all of training_filter_split, validation_filter_split and test_filter_split must be provided. Supported only for unstructured Datasets.
Predefined splits: Assigns input data to training, validation, and test sets based on the value of a provided key. If using predefined splits, predefined_split_column_name must be provided. Supported only for tabular Datasets.
Timestamp splits: Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to training set, next to validation set, and the oldest to the test set. Supported only for tabular Datasets.
- Args:
- dataset:
Vertex AI Dataset to fit this training against. The custom training script should retrieve datasets through the URIs passed in as environment variables:
os.environ["AIP_TRAINING_DATA_URI"]
os.environ["AIP_VALIDATION_DATA_URI"]
os.environ["AIP_TEST_DATA_URI"]
Additionally the dataset format is passed in as:
os.environ["AIP_DATA_FORMAT"]
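Inside a training script, the four variables named for the dataset argument can be collected in one place. The following is a minimal sketch; read_dataset_env is a hypothetical helper, and the sample values passed to it are placeholders rather than real output from Vertex AI.

```python
import os
from typing import Dict, Mapping, Optional

DATA_ENV_VARS = (
    "AIP_DATA_FORMAT",
    "AIP_TRAINING_DATA_URI",
    "AIP_VALIDATION_DATA_URI",
    "AIP_TEST_DATA_URI",
)


def read_dataset_env(environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Return the dataset-related variables, with None for any that are unset."""
    return {name: environ.get(name) for name in DATA_ENV_VARS}


# Placeholder values for a local dry run; a real job gets them from Vertex AI.
sample = {
    "AIP_DATA_FORMAT": "jsonl",
    "AIP_TRAINING_DATA_URI": "gs://example-bucket/training-*",
}
settings = read_dataset_env(sample)
print(settings)
```

Using environ.get rather than indexing keeps the script importable outside a training job, where none of these variables exist.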
- annotation_schema_uri:
Google Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 [Schema Object](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#schema-object). The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.
Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in the training, validation or test role respectively, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
- model_display_name:
If the script produces a managed Vertex AI Model, the display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
- model_labels:
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- base_output_dir:
GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
Vertex AI sets the following environment variables when it runs your training code:
AIP_MODEL_DIR: a Cloud Storage URI of a directory intended for saving model artifacts, i.e. <base_output_dir>/model/
AIP_CHECKPOINT_DIR: a Cloud Storage URI of a directory intended for saving checkpoints, i.e. <base_output_dir>/checkpoints/
AIP_TENSORBOARD_LOG_DIR: a Cloud Storage URI of a directory intended for saving TensorBoard logs, i.e. <base_output_dir>/logs/
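The mapping from base_output_dir to the three environment variables listed above can be written out directly. This sketch only mirrors the directory layout stated in this entry; expected_output_env is a hypothetical helper and the bucket path is a placeholder.

```python
from typing import Dict


def expected_output_env(base_output_dir: str) -> Dict[str, str]:
    """Return the output variables as described for a given base_output_dir."""
    root = base_output_dir.rstrip("/")
    return {
        "AIP_MODEL_DIR": f"{root}/model/",
        "AIP_CHECKPOINT_DIR": f"{root}/checkpoints/",
        "AIP_TENSORBOARD_LOG_DIR": f"{root}/logs/",
    }


env = expected_output_env("gs://example-bucket/run-1/")
print(env["AIP_MODEL_DIR"])
```

A training script would normally read these values from os.environ instead of computing them; the helper is mainly useful for tests or for locating artifacts after a run.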
- service_account:
Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
- network:
The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
- bigquery_destination:
Provide this field if dataset is a BigQuery dataset. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data will be written into that dataset. In the dataset three tables will be created: training, validation and test.
AIP_DATA_FORMAT = "bigquery"
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_*.training"
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_*.validation"
AIP_TEST_DATA_URI = "bigquery_destination.dataset_*.test"
- args:
Command line arguments to be passed to the Python script.
- environment_variables:
Environment variables to be passed to the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names. At most 10 environment variables can be specified. The Name of the environment variable must be unique.
environment_variables = {
    'MY_KEY': 'MY_VALUE'
}
- replica_count:
The number of worker replicas. If replica count = 1 then one chief replica will be provisioned. If replica_count > 1 the remainder will be provisioned as a worker replica pool.
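The chief and worker arithmetic described for replica_count is small enough to state as code. This is an illustrative sketch only; split_replicas is a hypothetical helper and does not reflect how the component builds its worker pool specification internally.

```python
from typing import Dict


def split_replicas(replica_count: int) -> Dict[str, int]:
    """Split replica_count into one chief plus the remaining workers."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return {"chief": 1, "workers": replica_count - 1}


print(split_replicas(10))
```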
- machine_type:
The type of machine to use for training.
- accelerator_type:
Hardware accelerator type. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
- accelerator_count:
The number of accelerators to attach to a worker replica.
- boot_disk_type:
Type of the boot disk, default is pd-ssd. Valid values: pd-ssd (Persistent Disk Solid State Drive) or pd-standard (Persistent Disk Hard Disk Drive).
- boot_disk_size_gb:
Size in GB of the boot disk, default is 100GB. boot disk size must be within the range of [100, 64000].
- reduction_server_replica_count:
The number of reduction server replicas, default is 0.
- reduction_server_machine_type:
Optional. The type of machine to use for reduction server.
- reduction_server_container_uri:
Optional. The Uri of the reduction server container image. See details: https://cloud.google.com/vertex-ai/docs/training/distributed-training#reduce_training_time_with_reduction_server
- training_fraction_split:
Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
- validation_fraction_split:
Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
- test_fraction_split:
Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
- training_filter_split:
Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
- validation_filter_split:
Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
- test_filter_split:
Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
- predefined_split_column_name:
Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
- timestamp_split_column_name:
Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
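Values for a timestamp split column need the RFC 3339 form with a “Z” offset, so timestamps held in other zones have to be converted to UTC first. The sketch below uses only the standard library; to_rfc3339_utc is a hypothetical helper, and the choice of millisecond precision is an assumption rather than a documented requirement.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339_utc(moment: datetime) -> str:
    """Format an aware datetime as RFC 3339 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime; attach a timezone before formatting")
    utc = moment.astimezone(timezone.utc)
    # isoformat() renders UTC as +00:00; swap it for the Z designator.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


pacific = timezone(timedelta(hours=-7))
stamp = to_rfc3339_utc(datetime(1985, 4, 12, 16, 20, 50, 520000, tzinfo=pacific))
print(stamp)  # 1985-04-12T23:20:50.520Z
```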
- enable_web_access:
Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
- tensorboard:
Optional. The name of a Vertex AI [Tensorboard][google.cloud.aiplatform.v1beta1.Tensorboard] resource to which this CustomJob will upload Tensorboard logs. Format:
projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write TensorBoard logs to the directory given by the following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required with provided tensorboard. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
- display_name:
Required. The user-defined name of this TrainingPipeline.
- container_uri:
Required. URI of the training container image in the GCR.
- command:
The command to be invoked when the container is started. It overrides the entrypoint instruction in the Dockerfile when provided.
- model_serving_container_image_uri:
If the training produces a managed Vertex AI Model, the URI of the Model serving container suitable for serving the model produced by the training script.
- model_serving_container_predict_route:
If the training produces a managed Vertex AI Model, an HTTP path to send prediction requests to the container, and which must be supported by it. If not specified a default HTTP path will be used by Vertex AI.
- model_serving_container_health_route:
If the training produces a managed Vertex AI Model, an HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by AI Platform.
- model_serving_container_command:
The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
- model_serving_container_args:
The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
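The $(VAR_NAME) expansion rules described for the serving container command and args can be modelled locally to preview what a given string would become. This is a sketch of the stated rules, not the service's implementation. One detail is an assumption: an escaped $$(VAR_NAME) is rendered here as the literal $(VAR_NAME), following the Kubernetes convention, since the text above only says that escaped references are never expanded. expand_references is a hypothetical helper.

```python
import re
from typing import Mapping

# Matches $(NAME) or the escaped form $$(NAME).
_REFERENCE = re.compile(r"(\$\$|\$)\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_references(text: str, env: Mapping[str, str]) -> str:
    """Expand $(NAME) from env, leaving unknown and escaped references alone."""

    def replace(match: "re.Match[str]") -> str:
        prefix, name = match.group(1), match.group(2)
        if prefix == "$$":
            # Escaped reference: never expanded (assumed to yield a literal).
            return f"$({name})"
        # Unresolvable references are returned unchanged.
        return env.get(name, match.group(0))

    return _REFERENCE.sub(replace, text)


env = {"PORT": "8080"}
print(expand_references("--port=$(PORT) --tag=$(MISSING) --raw=$$(PORT)", env))
```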
- model_serving_container_environment_variables:
The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
- model_serving_container_ports:
Declaration of ports that are exposed by the container. This field is primarily informational, it gives Vertex AI information about the network connections the container uses. Listing or not a port here has no impact on whether the port is actually exposed, any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
- model_description:
The description of the Model.
- model_instance_schema_uri:
Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which are used in PredictRequest.instances, ExplainRequest.instances and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
- model_parameters_schema_uri:
Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
- model_prediction_schema_uri:
Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which are returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
- project:
Project to run training in. Overrides project set in aiplatform.init.
- location:
Location to run training in. Overrides location set in aiplatform.init.
- labels:
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name:
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name:
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
- staging_bucket:
Bucket used to stage source and training artifacts. Overrides staging_bucket set in aiplatform.init.
- Returns:
The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.
- Raises:
- RuntimeError:
If Training job has already been run, staging_bucket has not been set, or model_display_name was provided but required arguments were not provided in constructor.
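The encryption key arguments above expect a Cloud KMS resource name of the form projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key, with the key in the same region as the compute resource. A minimal sketch of a check that could run before submitting a job follows. The helpers parse_kms_key_name and key_matches_location are hypothetical and are not part of google_cloud_pipeline_components; the service itself remains the authority on what it accepts.

```python
import re

# Pattern for the documented key form:
# projects/<project>/locations/<region>/keyRings/<ring>/cryptoKeys/<key>
_KMS_KEY_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_kms_key_name(key_name: str) -> dict:
    """Split a KMS key resource name into its parts (hypothetical helper)."""
    match = _KMS_KEY_RE.match(key_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key resource name: {key_name!r}")
    return match.groupdict()


def key_matches_location(key_name: str, location: str) -> bool:
    """Return True if the key's region equals the given job location."""
    return parse_kms_key_name(key_name)["region"] == location
```

For example, key_matches_location(key, "us-central1") is true only when the locations segment of the key name is us-central1.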
- google_cloud_pipeline_components.aiplatform.CustomPythonPackageTrainingJobRunOp(display_name: str, python_package_gcs_uri: str, python_module_name: str, container_uri: str, model_serving_container_image_uri: Optional[str] = None, model_serving_container_predict_route: Optional[str] = None, model_serving_container_health_route: Optional[str] = None, model_serving_container_command: Optional[Sequence[str]] = None, model_serving_container_args: Optional[Sequence[str]] = None, model_serving_container_environment_variables: Optional[Dict[str, str]] = None, model_serving_container_ports: Optional[Sequence[int]] = None, model_description: Optional[str] = None, model_instance_schema_uri: Optional[str] = None, model_parameters_schema_uri: Optional[str] = None, model_prediction_schema_uri: Optional[str] = None, project: Optional[str] = None, location: Optional[str] = None, labels: Optional[Dict[str, str]] = None, training_encryption_spec_key_name: Optional[str] = None, model_encryption_spec_key_name: Optional[str] = None, staging_bucket: Optional[str] = None, dataset: Optional[Union[google.cloud.aiplatform.datasets.image_dataset.ImageDataset, google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset, google.cloud.aiplatform.datasets.text_dataset.TextDataset, google.cloud.aiplatform.datasets.video_dataset.VideoDataset]] = None, annotation_schema_uri: Optional[str] = None, model_display_name: Optional[str] = None, model_labels: Optional[Dict[str, str]] = None, base_output_dir: Optional[str] = None, service_account: Optional[str] = None, network: Optional[str] = None, bigquery_destination: Optional[str] = None, args: Optional[List[Union[float, int, str]]] = None, environment_variables: Optional[Dict[str, str]] = None, replica_count: int = 1, machine_type: str = 'n1-standard-4', accelerator_type: str = 'ACCELERATOR_TYPE_UNSPECIFIED', accelerator_count: int = 0, boot_disk_type: str = 'pd-ssd', boot_disk_size_gb: int = 100, reduction_server_replica_count: int = 0, 
reduction_server_machine_type: Optional[str] = None, reduction_server_container_uri: Optional[str] = None, training_fraction_split: Optional[float] = None, validation_fraction_split: Optional[float] = None, test_fraction_split: Optional[float] = None, training_filter_split: Optional[str] = None, validation_filter_split: Optional[str] = None, test_filter_split: Optional[str] = None, predefined_split_column_name: Optional[str] = None, timestamp_split_column_name: Optional[str] = None, enable_web_access: bool = False, tensorboard: Optional[str] = None) -> Optional[google.cloud.aiplatform.models.Model]
Runs the custom training job. Distributed Training Support: If replica_count = 1, one chief replica is provisioned. If replica_count > 1, the remainder is provisioned as a worker replica pool; for example, replica_count = 10 results in 1 chief and 9 workers. All replicas have the same machine_type, accelerator_type, and accelerator_count.
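The chief and worker arithmetic described above can be sketched as a small function. The name split_replicas is hypothetical and only illustrates the rule; rejecting values below 1 is an assumption, since the text does not say how such values are handled.

```python
from typing import Tuple


def split_replicas(replica_count: int) -> Tuple[int, int]:
    """Return (chief_replicas, worker_replicas) for a given replica_count."""
    # Assumption: values below 1 are treated as invalid in this sketch.
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # One replica is always the chief; any remainder forms the worker pool.
    return 1, replica_count - 1
```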
- If training on a Vertex AI dataset, you can use one of the following split configurations:
Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of data will be used for training, 10% for validation, and 10% for test.
Data filter splits: Assigns input data to training, validation, and test sets based on the given filters; data pieces not matched by any filter are ignored. Currently only supported for Datasets containing DataItems. If any of the filters in this message are to match nothing, then they can be set as ‘-’ (the minus sign). If using filter splits, all of training_filter_split, validation_filter_split and test_filter_split must be provided. Supported only for unstructured Datasets.
Predefined splits: Assigns input data to training, validation, and test sets based on the value of a provided key. If using predefined splits, predefined_split_column_name must be provided. Supported only for tabular Datasets.
Timestamp splits: Assigns input data to training, validation, and test sets based on provided timestamps. The youngest data pieces are assigned to training set, next to validation set, and the oldest to the test set. Supported only for tabular Datasets.
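The fraction-split rule above (any subset of the three fractions may be given, and together they must not exceed 1) can be checked locally before a job is submitted. The following is a minimal sketch, assuming a hypothetical helper named check_fraction_splits and a small floating-point tolerance that is not taken from the documentation.

```python
import math
from typing import Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> float:
    """Validate the provided fractions and return the unassigned remainder."""
    provided = [f for f in (training, validation, test) if f is not None]
    for fraction in provided:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction out of range [0, 1]: {fraction}")
    total = math.fsum(provided)
    # Assumption: allow a tiny tolerance for floating-point rounding.
    if total > 1.0 + 1e-9:
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    # Clamp at zero so rounding noise never yields a negative remainder.
    return max(0.0, 1.0 - total)
```

The returned remainder is the share of the data whose assignment is left to Vertex AI; when no fraction is provided, the whole dataset falls under the default split described above.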
- Args:
- dataset:
Vertex AI dataset to fit this training against. The custom training script should retrieve datasets through the URIs passed in these environment variables:
os.environ["AIP_TRAINING_DATA_URI"]
os.environ["AIP_VALIDATION_DATA_URI"]
os.environ["AIP_TEST_DATA_URI"]
Additionally, the dataset format is passed in as:
os.environ["AIP_DATA_FORMAT"]
- annotation_schema_uri:
Google Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 [Schema Object](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#schema-object). The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.
Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in the training, validation, or test role, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
- model_display_name:
If the script produces a managed Vertex AI Model, the display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
If not provided upon creation, the job’s display_name is used.
- model_labels:
Optional. The labels with user-defined metadata to organize your Models. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- base_output_dir:
GCS output directory of job. If not provided a timestamped directory in the staging directory will be used.
Vertex AI sets the following environment variables when it runs your training code:
AIP_MODEL_DIR: a Cloud Storage URI of a directory intended for saving model artifacts, i.e. <base_output_dir>/model/
AIP_CHECKPOINT_DIR: a Cloud Storage URI of a directory intended for saving checkpoints, i.e. <base_output_dir>/checkpoints/
AIP_TENSORBOARD_LOG_DIR: a Cloud Storage URI of a directory intended for saving TensorBoard logs, i.e. <base_output_dir>/logs/
- service_account:
Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
- network:
The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
- bigquery_destination:
Provide this field if dataset is a BigQuery dataset. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data will be written into that dataset. In the dataset three tables will be created: training, validation and test.
AIP_DATA_FORMAT = "bigquery"
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_*.training"
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_*.validation"
AIP_TEST_DATA_URI = "bigquery_destination.dataset_*.test"
- args:
Command line arguments to be passed to the Python script.
- environment_variables:
Environment variables to be passed to the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names. At most 10 environment variables can be specified. Each environment variable name must be unique.
environment_variables = {
    'MY_KEY': 'MY_VALUE'
}
- replica_count:
The number of worker replicas. If replica_count = 1, one chief replica is provisioned. If replica_count > 1, the remainder is provisioned as a worker replica pool.
- machine_type:
The type of machine to use for training.
- accelerator_type:
Hardware accelerator type. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4
- accelerator_count:
The number of accelerators to attach to a worker replica.
- boot_disk_type:
Type of the boot disk, default is pd-ssd. Valid values: pd-ssd (Persistent Disk Solid State Drive) or pd-standard (Persistent Disk Hard Disk Drive).
- boot_disk_size_gb:
Size in GB of the boot disk, default is 100GB. boot disk size must be within the range of [100, 64000].
- reduction_server_replica_count:
The number of reduction server replicas, default is 0.
- reduction_server_machine_type:
Optional. The type of machine to use for reduction server.
- reduction_server_container_uri:
Optional. The URI of the reduction server container image. See details: https://cloud.google.com/vertex-ai/docs/training/distributed-training#reduce_training_time_with_reduction_server
- training_fraction_split:
Optional. The fraction of the input data that is to be used to train the Model. This is ignored if Dataset is not provided.
- validation_fraction_split:
Optional. The fraction of the input data that is to be used to validate the Model. This is ignored if Dataset is not provided.
- test_fraction_split:
Optional. The fraction of the input data that is to be used to evaluate the Model. This is ignored if Dataset is not provided.
- training_filter_split:
Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to train the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
- validation_filter_split:
Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to validate the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
- test_filter_split:
Optional. A filter on DataItems of the Dataset. DataItems that match this filter are used to test the Model. A filter with same syntax as the one used in DatasetService.ListDataItems may be used. If a single DataItem is matched by more than one of the FilterSplit filters, then it is assigned to the first set that applies to it in the training, validation, test order. This is ignored if Dataset is not provided.
- predefined_split_column_name:
Optional. The key is a name of one of the Dataset’s data columns. The value of the key (either the label’s value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
- timestamp_split_column_name:
Optional. The key is a name of one of the Dataset’s data columns. The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z). If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline.
Supported only for tabular and time series Datasets.
- enable_web_access:
Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
- tensorboard:
Optional. The name of a Vertex AI Tensorboard (google.cloud.aiplatform.v1beta1.Tensorboard) resource to which this CustomJob will upload Tensorboard logs. Format: projects/{project}/locations/{location}/tensorboards/{tensorboard}
The training script should write Tensorboard logs to the directory given by the following Vertex AI environment variable:
AIP_TENSORBOARD_LOG_DIR
service_account is required when tensorboard is provided. For more information on configuring your service account, please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
- display_name:
Required. The user-defined name of this TrainingPipeline.
- python_package_gcs_uri:
Required. GCS location of the training Python package.
- python_module_name:
Required. The module name of the training Python package.
- container_uri:
Required. URI of the training container image in GCR.
- model_serving_container_image_uri:
If the training produces a managed Vertex AI Model, the URI of the Model serving container suitable for serving the model produced by the training script.
- model_serving_container_predict_route:
If the training produces a managed Vertex AI Model, an HTTP path to send prediction requests to the container, and which must be supported by it. If not specified, a default HTTP path will be used by Vertex AI.
- model_serving_container_health_route:
If the training produces a managed Vertex AI Model, an HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by AI Platform.
- model_serving_container_command:
The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
- model_serving_container_args:
The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
- model_serving_container_environment_variables:
The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
- model_serving_container_ports:
Declaration of ports that are exposed by the container. This field is primarily informational; it gives Vertex AI information about the network connections the container uses. Whether or not a port is listed here has no impact on whether the port is actually exposed; any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
- model_description:
The description of the Model.
- model_instance_schema_uri:
Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
- model_parameters_schema_uri:
Optional. Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
- model_prediction_schema_uri:
Optional. Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
- project:
Project to run training in. Overrides project set in aiplatform.init.
- location:
Location to run training in. Overrides location set in aiplatform.init.
- labels:
Optional. The labels with user-defined metadata to organize TrainingPipelines. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- training_encryption_spec_key_name:
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this TrainingPipeline will be secured by this key.
Note: Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.
- model_encryption_spec_key_name:
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, the trained Model will be secured by this key.
Overrides encryption_spec_key_name set in aiplatform.init.
- staging_bucket:
Bucket used to stage source and training artifacts. Overrides staging_bucket set in aiplatform.init.
- Returns:
The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.
- google_cloud_pipeline_components.aiplatform.EndpointCreateOp(project: str, display_name: str, location: str = 'us-central1', description: str = '', labels: dict = '{}', encryption_spec_key_name: str = '', network: str = '')
endpoint_create Creates a Google Cloud Vertex Endpoint and waits for it to be ready. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/create.
- Args:
- project (str):
Required. Project to create the endpoint.
- location (Optional[str]):
Location to create the endpoint. If not set, default to us-central1.
- display_name (str):
Required. The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- description (Optional[str]):
The description of the Endpoint.
- labels (Optional[dict]):
The labels with user-defined metadata to organize your Endpoints.
Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed.
See https://goo.gl/xmQnxf for more information and examples of labels.
- encryption_spec_key_name (Optional[str]):
Customer-managed encryption key spec for an Endpoint. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.
- network (Optional[str]):
The full name of the Google Compute Engine network to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.
[Format](https://cloud.google.com/compute/docs/reference/rest/v1/networks/insert): projects/{project}/global/networks/{network}, where {project} is a project number, as in ‘12345’, and {network} is the network name.
- Returns:
- endpoint (google.VertexEndpoint):
Artifact tracking the created endpoint.
- gcp_resources (str):
Serialized gcp_resources proto tracking the create endpoint’s long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
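The label rules stated for the labels argument above (keys and values of at most 64 Unicode code points, made up of lowercase letters, numeric characters, underscores and dashes, with international characters allowed) can be approximated locally. The sketch below uses a hypothetical helper, find_label_problems; it is an approximation built on Python's str methods, and the service remains the authority on which labels it accepts.

```python
from typing import Dict, List

_MAX_LABEL_LENGTH = 64  # limit in Unicode code points, per the text above


def _is_allowed_char(ch: str) -> bool:
    """Accept lowercase letters (any script), digits, underscores, dashes."""
    if ch in "_-":
        return True
    if ch.isdigit():
        return True
    # isalpha() accepts international letters; reject only uppercase ones.
    return ch.isalpha() and not ch.isupper()


def find_label_problems(labels: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            # len() on a Python str counts Unicode code points.
            if len(text) > _MAX_LABEL_LENGTH:
                problems.append(f"{kind} {text!r} is longer than 64 characters")
            if not all(_is_allowed_char(ch) for ch in text):
                problems.append(f"{kind} {text!r} contains a disallowed character")
    return problems
```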
- google_cloud_pipeline_components.aiplatform.EndpointDeleteOp(endpoint: google.VertexEndpoint)
endpoint_delete Deletes a Google Cloud Vertex Endpoint. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/delete.
- Args:
- endpoint (google.VertexEndpoint):
Required. The endpoint to be deleted.
- Returns:
- gcp_resources (str):
Serialized gcp_resources proto tracking the delete endpoint’s long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
- google_cloud_pipeline_components.aiplatform.ImageDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)
image_dataset_create Creates a new image dataset and optionally imports data into the dataset when gcs_source and import_schema_uri are passed.
- Args:
- display_name (String):
Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- gcs_source (Union[str, Sequence[str]]):
Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- import_schema_uri (String):
Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object <https://tinyurl.com/y538mdwt>.
- data_item_labels (JsonObject):
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing one, and if a label with an identical key was imported before, the old label value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and if a key collision occurs, one of the values is picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a jsonl file.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- labels (JsonObject):
Optional. Labels with user-defined metadata to organize your Tensorboards. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Tensorboard (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
- encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- dataset (google.VertexDataset):
Instantiated representation of the managed image dataset resource.
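The gcs_source description above allows either a single URI string or a sequence of URI strings. A minimal sketch of normalizing both shapes into a list before further processing follows; normalize_gcs_source is a hypothetical helper, and the gs:// prefix check is an assumption based on the examples shown rather than a documented validation step.

```python
from typing import List, Sequence, Union


def normalize_gcs_source(gcs_source: Union[str, Sequence[str]]) -> List[str]:
    """Return gcs_source as a list of URIs, checking the gs:// scheme."""
    # A bare string is a single URI, not a sequence of characters.
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return uris
```

Checking for str first matters because a Python string is itself a sequence and would otherwise be split into single characters.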
- google_cloud_pipeline_components.aiplatform.ImageDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')
image_dataset_export Exports data to the output directory in GCS.
- Args:
- output_dir (String):
Required. The Google Cloud Storage location where the output is to be written to. In the given directory a new directory will be created with name: export-data-<dataset-display-name>-<timestamp-of-export-call>, where timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into sub directories which are named with the corresponding annotations’ schema title. Inside these sub directories, a schema.yaml will be created to describe the output format. If the URI doesn’t end with ‘/’, a ‘/’ will be automatically appended. The directory is created if it doesn’t exist.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- Returns:
- exported_files (Sequence[str]):
All of the files that are exported in this export operation.
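The directory naming convention described for output_dir above can be illustrated with a short function. This is only a sketch of the documented pattern: expected_export_dir is a hypothetical helper, the export time is supplied by the caller because the text does not state which clock or time zone the service uses, and the actual directory is always the one the service creates.

```python
from datetime import datetime


def expected_export_dir(
    output_dir: str, dataset_display_name: str, export_time: datetime
) -> str:
    """Build the export directory path following the documented pattern."""
    # A trailing '/' is appended when missing, as described above.
    base = output_dir if output_dir.endswith("/") else output_dir + "/"
    # Timestamp in YYYYMMDDHHMMSS format.
    stamp = export_time.strftime("%Y%m%d%H%M%S")
    return f"{base}export-data-{dataset_display_name}-{stamp}"
```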
- google_cloud_pipeline_components.aiplatform.ImageDatasetImportDataOp(project: str, dataset: google.VertexDataset, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None)
image_dataset_import Uploads data to an existing managed dataset.
- Args:
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- dataset (Dataset):
Required. The dataset to be updated.
- gcs_source (Union[str, Sequence[str]]):
Required. Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- import_schema_uri (String):
Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
- data_item_labels (JsonObject):
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing one, and if a label with an identical key was imported before, the old label value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and if a key collision occurs, one of the values is picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a jsonl file.
- Returns:
- dataset (Dataset):
Instantiated representation of the managed dataset resource.
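The data_item_labels behaviour described above for a DataItem that already exists (new labels are added to the existing ones, and a label with the same key replaces the old value) corresponds to an ordinary dictionary update. The sketch below illustrates only that case, using a hypothetical helper named merge_data_item_labels; it does not model the random choice on collisions within a single import, nor the override by Annotation labels.

```python
from typing import Dict


def merge_data_item_labels(
    existing: Dict[str, str], imported: Dict[str, str]
) -> Dict[str, str]:
    """Combine labels; on identical keys the imported value wins."""
    merged = dict(existing)  # copy so the caller's dict is not modified
    merged.update(imported)
    return merged
```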
- google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp(project: str, job_display_name: str, location: str = 'us-central1', model: google.VertexModel = None, unmanaged_container_model: google.UnmanagedContainerModel = None, instances_format: str = 'jsonl', gcs_source_uris: list = '[]', bigquery_source_input_uri: str = None, model_parameters: dict = '{}', predictions_format: str = 'jsonl', gcs_destination_output_uri_prefix: str = '', bigquery_destination_output_uri: str = '', machine_type: str = '', accelerator_type: str = '', accelerator_count: int = 0, starting_replica_count: int = 0, max_replica_count: int = 0, manual_batch_tuning_parameters_batch_size: int = 0, generate_explanation: bool = False, explanation_metadata: dict = '{}', explanation_parameters: dict = '{}', labels: dict = '{}', encryption_spec_key_name: str = '')
model_batch_predict Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs/create.
- Args:
- project (str):
Required. Project to create the BatchPredictionJob.
- location (Optional[str]):
Location for creating the BatchPredictionJob. If not set, default to us-central1.
- job_display_name (str):
Required. The user-defined name of this BatchPredictionJob.
- model (Optional[google.VertexModel]):
The Model used to get predictions via this job. Must share the same ancestor Location. Starting this job has no impact on any existing deployments of the Model and their resources. Either this or unmanaged_container_model must be specified.
- unmanaged_container_model (Optional[google.UnmanagedContainerModel]):
The unmanaged container model used to get predictions via this job. This should be used for models that are not uploaded to Vertex. Either this or model must be specified.
- gcs_source_uris (Optional[Sequence[str]]):
Google Cloud Storage URI(-s) to your instances to run batch prediction on. They must match instances_format. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames.
For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
- bigquery_source_input_uri (Optional[str]):
BigQuery URI to a table, up to 2000 characters long. For example: projectId.bqDatasetId.bqTableId
For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
- instances_format (Optional[str]):
The format in which instances are given. Must be one of the Model’s supportedInputStorageFormats. If not set, defaults to “jsonl”.
For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
- gcs_destination_output_uri_prefix (Optional[str]):
The Google Cloud Storage location of the directory where the output is to be written. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it, files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created, where <extension> depends on the chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined, then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> files are created (N depends on the total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field whose value is a google.rpc.Status containing only code and message fields.
For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
- bigquery_destination_output_uri (Optional[str]):
The BigQuery project location where the output is to be written. In the given project a new dataset is created with name prediction_<model-display-name>_<job-create-time>, where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores), and timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables will be created, predictions and errors. If the Model has both instance and prediction schemata defined, then the tables have columns as follows: The predictions table contains instances for which the prediction succeeded; it has columns as per a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction has failed; it has instance columns, as per the instance schema, followed by a single “errors” column, whose values are google.rpc.Status represented as a STRUCT and containing only code and message.
For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
- predictions_format (Optional[str]):
The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. If not set, defaults to “jsonl”.
For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
- model_parameters (Optional[dict]):
The parameters that govern the predictions. The schema of the parameters may be specified via the Model’s parameters_schema_uri.
- machine_type (Optional[str]):
The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided.
For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources.
For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
- accelerator_type (Optional[str]):
The type of accelerator(s) that may be attached to the machine as per accelerator_count. Only used if machine_type is set.
For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
- accelerator_count (Optional[int]):
The number of accelerators to attach to the machine_type. Only used if machine_type is set.
For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
- starting_replica_count (Optional[int]):
The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides the starting number, which is not greater than max_replica_count. Only used if machine_type is set.
- max_replica_count (Optional[int]):
The maximum number of machine replicas the batch operation may be scaled to. Only used if machine_type is set. Default is 10.
- manual_batch_tuning_parameters_batch_size (Optional[int]):
The number of records (e.g. instances) of the operation given in each batch to a machine replica. Consider the machine type and the size of a single record when setting this parameter. A higher value speeds up the batch operation’s execution, but a value that is too high will result in a whole batch not fitting in a machine’s memory, and the whole operation will fail. The default value is 4.
- generate_explanation (Optional[bool]):
Generate explanation along with the batch prediction results. This will cause the batch prediction output to include explanations based on the predictions_format:
- bigquery: output includes a column named explanation. The value is a struct that conforms to the [aiplatform.gapic.Explanation] object.
- jsonl: The JSON objects on each line include an additional entry keyed explanation. The value of the entry is a JSON object that conforms to the [aiplatform.gapic.Explanation] object.
- csv: Generating explanations for CSV format is not supported.
If this field is set to true, either the Model.explanation_spec or explanation_metadata and explanation_parameters must be populated.
- explanation_metadata (Optional[dict]):
Explanation metadata configuration for this BatchPredictionJob. Can be specified only if generate_explanation is set to True.
This value overrides the value of Model.explanation_metadata. All fields of explanation_metadata are optional in the request. If a field of the explanation_metadata object is not populated, the corresponding field of the Model.explanation_metadata object is inherited.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
- explanation_parameters (Optional[dict]):
Parameters to configure explaining for Model’s predictions. Can be specified only if generate_explanation is set to True.
This value overrides the value of Model.explanation_parameters. All fields of explanation_parameters are optional in the request. If a field of the explanation_parameters object is not populated, the corresponding field of the Model.explanation_parameters object is inherited.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#ExplanationParameters.
- labels (Optional[dict]):
The labels with user-defined metadata to organize your BatchPredictionJobs.
Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed.
See https://goo.gl/xmQnxf for more information and examples of labels.
- encryption_spec_key_name (Optional[str]):
Customer-managed encryption key options for a BatchPredictionJob. If this is set, then all resources created by the BatchPredictionJob will be encrypted with the provided encryption key.
Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
- Returns:
- batchpredictionjob (google.VertexBatchPredictionJob):
Artifact representation of the created batch prediction job.
- gcp_resources (str):
Serialized gcp_resources proto tracking the batch prediction job.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
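Several of the argument rules above can be checked before a pipeline is compiled. The following is a minimal sketch, not part of google_cloud_pipeline_components: check_batch_predict_args is a hypothetical helper that takes a plain dict of ModelBatchPredictOp keyword arguments and reports which of the documented constraints it violates. The project, job name, and model values are placeholders.

```python
def check_batch_predict_args(args: dict) -> list:
    """Return the documented constraints that `args` violates.

    `args` is a plain dict of keyword arguments intended for
    ModelBatchPredictOp. This helper is hypothetical and only mirrors
    the rules written in the argument descriptions.
    """
    problems = []

    # Either model or unmanaged_container_model must be specified.
    if args.get("model") is None and args.get("unmanaged_container_model") is None:
        problems.append("either model or unmanaged_container_model must be specified")

    # These arguments are only used if machine_type is set.
    if not args.get("machine_type"):
        for name in ("accelerator_type", "accelerator_count",
                     "starting_replica_count", "max_replica_count"):
            if args.get(name):
                problems.append(f"{name} is only used if machine_type is set")

    explain = bool(args.get("generate_explanation", False))

    # Generating explanations for CSV output is not supported.
    if explain and args.get("predictions_format", "jsonl") == "csv":
        problems.append("generate_explanation is not supported for csv output")

    # Explanation overrides can be specified only with generate_explanation.
    if not explain:
        for name in ("explanation_metadata", "explanation_parameters"):
            if args.get(name):
                problems.append(f"{name} requires generate_explanation=True")

    return problems


example_args = {
    "project": "my-project",               # placeholder value
    "job_display_name": "nightly-batch",   # placeholder value
    "model": "upstream-model-artifact",    # stands in for a google.VertexModel
    "predictions_format": "csv",
    "generate_explanation": True,
    "accelerator_count": 1,
}
for problem in check_batch_predict_args(example_args):
    print(problem)
```

The check is deliberately conservative: it flags arguments that would be ignored (accelerators without machine_type) as well as combinations the descriptions rule out, so it can run as a quick pre-flight step before the arguments are passed to the component.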
- google_cloud_pipeline_components.aiplatform.ModelDeleteOp(model: google.VertexModel)
model_delete Deletes a Google Cloud Vertex Model. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models/delete.
- Args:
- model (google.VertexModel):
Required. The model to be deleted.
- Returns:
- gcp_resources (str):
Serialized gcp_resources proto tracking the delete model’s long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
- google_cloud_pipeline_components.aiplatform.ModelDeployOp(model: google.VertexModel, endpoint: google.VertexEndpoint = None, deployed_model_display_name: str = '', traffic_split: dict = '{}', dedicated_resources_machine_type: str = '', dedicated_resources_min_replica_count: int = 0, dedicated_resources_max_replica_count: int = 0, dedicated_resources_accelerator_type: str = '', dedicated_resources_accelerator_count: int = 0, automatic_resources_min_replica_count: int = 0, automatic_resources_max_replica_count: int = 0, service_account: str = '', disable_container_logging: bool = False, enable_access_logging: bool = False, explanation_metadata: dict = '{}', explanation_parameters: dict = '{}')
model_deploy Deploys a Google Cloud Vertex Model to the Endpoint, creating a DeployedModel within it. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/deployModel.
- Args:
- model (google.VertexModel):
Required. The model to be deployed.
- endpoint (google.VertexEndpoint):
Required. The endpoint to be deployed to.
- deployed_model_display_name (Optional[str]):
The display name of the DeployedModel. If not provided upon creation, the Model’s display_name is used.
- traffic_split (Optional[Dict[str, int]]):
A map from a DeployedModel’s ID to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel.
If this field is non-empty, then the Endpoint’s trafficSplit will be overwritten with it. To refer to the ID of the Model being deployed, use “0”; the actual ID of the new DeployedModel will be filled in its place by this method. The traffic percentage values must add up to 100.
If this field is empty, then the Endpoint’s trafficSplit is not updated.
- dedicated_resources_machine_type (Optional[str]):
The specification of a single machine used by the prediction.
This field is required if automatic_resources_min_replica_count is not specified.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints#dedicatedresources.
- dedicated_resources_accelerator_type (Optional[str]):
Hardware accelerator type. Must also set accelerator_count if used. See https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec#AcceleratorType for available options.
This field is required if dedicated_resources_machine_type is specified.
- dedicated_resources_accelerator_count (Optional[int]):
The number of accelerators to attach to a worker replica.
- dedicated_resources_min_replica_count (Optional[int]):
The minimum number of machine replicas this DeployedModel will be always deployed on. This value must be greater than or equal to 1. If traffic against the DeployedModel increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
- dedicated_resources_max_replica_count (Optional[int]):
The maximum number of replicas this DeployedModel may be deployed on when the traffic against it increases. If the provided value is smaller than dedicated_resources_min_replica_count, it will automatically be increased to dedicated_resources_min_replica_count. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, the larger of dedicated_resources_min_replica_count or 1 is used as the default value.
- automatic_resources_min_replica_count (Optional[int]):
The minimum number of replicas this DeployedModel will be always deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas up to automatic_resources_max_replica_count, and as traffic decreases, some of these extra replicas may be freed. If the requested value is too large, the deployment will error.
This field is required if dedicated_resources_machine_type is not specified.
- automatic_resources_max_replica_count (Optional[int]):
The maximum number of replicas this DeployedModel may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, no upper bound for scaling under heavy traffic is assumed, though Vertex AI may be unable to scale beyond a certain replica number.
- service_account (Optional[str]):
The service account that the DeployedModel’s container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn’t have access to the resource project.
Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.
- disable_container_logging (Optional[bool]):
For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances will send stderr and stdout streams to Stackdriver Logging by default. Note that these logs incur costs, which are subject to Cloud Logging pricing.
Users can disable container logging by setting this flag to true.
- enable_access_logging (Optional[bool]):
If true, online prediction access logs are sent to Stackdriver Logging. These logs are like standard server access logs, containing information like timestamp and latency for each prediction request.
Note that Stackdriver logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option.
- explanation_metadata (Optional[dict]):
Metadata describing the Model’s input and output for explanation.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
- explanation_parameters (Optional[dict]):
Parameters that configure explaining information of the Model’s predictions.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
- Returns:
- gcp_resources (str):
Serialized gcp_resources proto tracking the deploy model’s long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
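The traffic_split rules described for this component (keys are DeployedModel IDs, “0” refers to the model being deployed, values add up to 100, an empty map leaves the Endpoint unchanged) can be expressed as a small check. This is a minimal sketch under those stated rules; validate_traffic_split and the example IDs are hypothetical and not part of the library.

```python
def validate_traffic_split(traffic_split: dict, existing_ids: set) -> None:
    """Raise ValueError if `traffic_split` breaks the documented rules.

    Hypothetical helper: keys are DeployedModel IDs, "0" stands for the
    model being deployed, and the percentages must add up to 100.
    An empty dict is accepted because it leaves the Endpoint unchanged.
    """
    if not traffic_split:
        return

    # Every key must be a known DeployedModel ID or the "0" placeholder.
    unknown = set(traffic_split) - set(existing_ids) - {"0"}
    if unknown:
        raise ValueError(f"unknown DeployedModel IDs: {sorted(unknown)}")

    # Each value is an integer percentage.
    for model_id, percent in traffic_split.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for {model_id!r}: {percent!r}")

    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic percentages add up to {total}, expected 100")


# Canary rollout: 10% to the new model, 90% stays on an existing one.
canary_split = {"0": 10, "1234567890": 90}   # "1234567890" is a made-up ID
validate_traffic_split(canary_split, existing_ids={"1234567890"})
```

A split that passes this check could then be supplied as the traffic_split argument; the service replaces “0” with the real DeployedModel ID.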
- google_cloud_pipeline_components.aiplatform.ModelExportOp(model: google.VertexModel, export_format_id: str, artifact_destination: str = '', image_destination: str = '')
model_export Exports a trained, exportable, Model to a location specified by the user. A Model is considered to be exportable if it has at least one supported export format. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models/export.
- Args:
- model (google.VertexModel):
Required. The model to be exported.
- export_format_id (str):
The ID of the format in which the Model must be exported. Each Model lists the export formats it supports. If no value is provided here, then the first from the list of the Model’s supported formats is used by default.
- artifact_destination (Optional[str]):
The Cloud Storage location where the Model artifact is to be written. Under the directory given as the destination, a new one with name “model-export-<model-display-name>-<timestamp-of-export-call>”, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format, will be created. Inside, the Model and any of its supporting files will be written.
This field should only be set when, in [Model.supported_export_formats], the value for the key given in export_format_id contains ARTIFACT.
- image_destination (Optional[str]):
The Google Container Registry or Artifact Registry URI where the Model container image will be copied to. Accepted forms:
- Google Container Registry path. For example: gcr.io/projectId/imageName:tag.
- Artifact Registry path. For example: us-central1-docker.pkg.dev/projectId/repoName/imageName:tag.
This field should only be set when, in [Model.supported_export_formats], the value for the key given in export_format_id contains IMAGE.
- Returns:
- output_info (str):
Details of the completed export with output destination paths to the artifacts or container image.
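The rule that artifact_destination applies only to formats containing ARTIFACT, and image_destination only to formats containing IMAGE, can be sketched as follows. export_destinations is a hypothetical helper, and the shape of supported_export_formats (a dict from format ID to a list of content types) as well as the format IDs and bucket name in the usage lines are assumptions made for illustration.

```python
def export_destinations(supported_export_formats: dict, export_format_id: str,
                        artifact_destination: str = "",
                        image_destination: str = "") -> dict:
    """Return the ModelExportOp arguments that apply to `export_format_id`.

    `supported_export_formats` is assumed to map a format ID to the list
    of content types it can export ("ARTIFACT", "IMAGE"); that shape is
    an assumption made for this sketch.
    """
    if export_format_id not in supported_export_formats:
        raise ValueError(f"unsupported export format: {export_format_id!r}")
    contents = supported_export_formats[export_format_id]

    # A destination may only be set if the format exports that content type.
    if artifact_destination and "ARTIFACT" not in contents:
        raise ValueError(f"{export_format_id!r} does not export an ARTIFACT")
    if image_destination and "IMAGE" not in contents:
        raise ValueError(f"{export_format_id!r} does not export an IMAGE")

    kwargs = {"export_format_id": export_format_id}
    if artifact_destination:
        kwargs["artifact_destination"] = artifact_destination
    if image_destination:
        kwargs["image_destination"] = image_destination
    return kwargs


# Illustrative format table; real values come from the Model resource.
formats = {"tf-saved-model": ["ARTIFACT"], "custom-trained": ["ARTIFACT", "IMAGE"]}
print(export_destinations(formats, "tf-saved-model",
                          artifact_destination="gs://my-bucket/exports"))
```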
- google_cloud_pipeline_components.aiplatform.ModelUndeployOp(model: google.VertexModel, endpoint: google.VertexEndpoint, traffic_split: dict = '{}')
model_undeploy Undeploys a Google Cloud Vertex DeployedModel within the Endpoint. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/undeployModel.
- Args:
- model (google.VertexModel):
Required. The model that was deployed to the Endpoint.
- endpoint (google.VertexEndpoint):
Required. The endpoint for the DeployedModel to be undeployed from.
- traffic_split (Optional[Dict[str, int]]):
If this field is provided, then the Endpoint’s trafficSplit will be overwritten with it. If the last DeployedModel is being undeployed from the Endpoint, the [Endpoint.traffic_split] will always end up empty when this call returns. A DeployedModel will be successfully undeployed only if it doesn’t have any traffic assigned to it when this method executes, or if this field unassigns any traffic from it.
- Returns:
- gcp_resources (str):
Serialized gcp_resources proto tracking the undeploy model’s long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
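Because a DeployedModel can only be undeployed once no traffic is assigned to it, the traffic_split passed here usually moves that model's share to the remaining ones. The sketch below shows one possible way to compute such a split; split_without is a hypothetical helper, and proportional redistribution is a design choice of this example rather than something the service requires.

```python
def split_without(current: dict, removed_id: str) -> dict:
    """Return a traffic split that assigns no traffic to `removed_id`.

    The removed model's share is spread over the remaining DeployedModels
    in proportion to their current share (evenly if they have none).
    Returns an empty dict when the last DeployedModel is removed.
    """
    others = {k: v for k, v in current.items() if k != removed_id}
    if not others:
        return {}

    ids = sorted(others)
    weights = [others[i] for i in ids]
    if sum(weights) == 0:
        weights = [1] * len(ids)          # no existing traffic: split evenly
    total = sum(weights)

    # Integer shares via the largest-remainder method so they sum to 100.
    shares = [w * 100 // total for w in weights]
    remainders = [w * 100 % total for w in weights]
    leftover = 100 - sum(shares)
    order = sorted(range(len(ids)), key=lambda i: (-remainders[i], ids[i]))
    for i in order[:leftover]:
        shares[i] += 1
    return dict(zip(ids, shares))


# IDs "1", "2", "3" are made up for the example.
print(split_without({"1": 50, "2": 30, "3": 20}, "1"))
```

The largest-remainder step keeps every share an integer while guaranteeing the total is exactly 100, which matters because the percentages in a traffic split must add up to 100.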
- google_cloud_pipeline_components.aiplatform.ModelUploadOp(project: str, display_name: str, location: str = 'us-central1', description: str = '', unmanaged_container_model: google.UnmanagedContainerModel = None, serving_container_image_uri: str = '', serving_container_command: list = '[]', serving_container_args: list = '[]', serving_container_environment_variables: list = '[]', serving_container_ports: list = '[]', serving_container_predict_route: str = '', serving_container_health_route: str = '', instance_schema_uri: str = '', parameters_schema_uri: str = '', prediction_schema_uri: str = '', artifact_uri: str = '', explanation_metadata: dict = '{}', explanation_parameters: dict = '{}', encryption_spec_key_name: str = '', labels: dict = '{}')
model_upload Uploads a model and returns a Model representing the uploaded Model resource. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models/upload.
- Args:
- project (str):
Required. Project to upload this model to.
- location (Optional[str]):
Optional location to upload this model to. If not set, defaults to us-central1.
- display_name (str):
Required. The display name of the Model. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- description (Optional[str]):
The description of the model.
- unmanaged_container_model (Optional[google.UnmanagedContainerModel]):
Optional. The unmanaged container model to be uploaded.
The model can be passed from an upstream step, or imported via an importer node.
```
from kfp.v2.components import importer_node
from google_cloud_pipeline_components.types import artifact_types

importer_spec = importer_node.importer(
    artifact_uri='gs://managed-pipeline-gcpc-e2e-test/automl-tabular/model',
    artifact_class=artifact_types.UnmanagedContainerModel,
    metadata={
        'containerSpec': {
            'imageUri':
                'us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:prod'
        }
    })
```
- serving_container_image_uri (Optional[str]):
Deprecated. Please use unmanaged_container_model instead. Optional. The URI of the Model serving container. Either this parameter or unmanaged_container_model needs to be provided.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- serving_container_command (Optional[Sequence[str]]=None):
Deprecated. Please use unmanaged_container_model instead.
The command with which the container is run. Not executed within a shell. The Docker image’s ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- serving_container_args (Optional[Sequence[str]]=None):
Deprecated. Please use unmanaged_container_model instead.
The arguments to the command. The Docker image’s CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container’s environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- serving_container_environment_variables (Optional[dict[str, str]]=None):
Deprecated. Please use unmanaged_container_model instead.
The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- serving_container_ports (Optional[Sequence[int]]=None):
Deprecated. Please use unmanaged_container_model instead.
Declaration of ports that are exposed by the container. This field is primarily informational, it gives Vertex AI information about the network connections the container uses. Listing or not a port here has no impact on whether the port is actually exposed, any port listening on the default “0.0.0.0” address inside a container will be accessible from the network.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- serving_container_predict_route (Optional[str]):
Deprecated. Please use unmanaged_container_model instead.
An HTTP path to send prediction requests to the container, and which must be supported by it. If not specified a default HTTP path will be used by Vertex AI.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- serving_container_health_route (Optional[str]):
Deprecated. Please use unmanaged_container_model instead.
An HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by Vertex AI.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#Model.ModelContainerSpec.
- instance_schema_uri (Optional[str]):
Deprecated. Please use unmanaged_container_model instead.
Points to a YAML file stored on Google Cloud Storage describing the format of a single instance, which is used in PredictRequest.instances, ExplainRequest.instances and BatchPredictionJob.input_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
For more details on PredictionSchema, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#predictschemata.
- parameters_schema_uri (Optional[str]):
Deprecated. Please use unmanaged_container_model instead.
Points to a YAML file stored on Google Cloud Storage describing the parameters of prediction and explanation via PredictRequest.parameters, ExplainRequest.parameters and BatchPredictionJob.model_parameters. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform; if no parameters are supported, it is set to an empty string. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
For more details on PredictionSchema, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#predictschemata.
- prediction_schema_uri (Optional[str]):
Deprecated. Please use unmanaged_container_model instead.
Points to a YAML file stored on Google Cloud Storage describing the format of a single prediction produced by this Model, which is returned via PredictResponse.predictions, ExplainResponse.explanations, and BatchPredictionJob.output_config. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML Models always have this field populated by AI Platform. Note: The URI given on output will be immutable and probably different, including the URI scheme, than the one given on input. The output URI will point to a location where the user only has read access.
For more details on PredictionSchema, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models#predictschemata.
- artifact_uri (Optional[str]):
Deprecated. Please use unmanaged_container_model instead.
The path to the directory containing the Model artifact and any of its supporting files. Leave blank for custom container prediction. Not present for AutoML Models.
- explanation_metadata (Optional[dict]):
Metadata describing the Model’s input and output for explanation. Both explanation_metadata and explanation_parameters must be passed together when used.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
- explanation_parameters (Optional[dict]):
Parameters to configure explaining for Model’s predictions.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
- encryption_spec_key_name (Optional[str]):
Customer-managed encryption key spec for a Model. If set, this Model and all sub-resources of this Model will be secured by this key.
Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
- labels (Optional[dict]):
The labels with user-defined metadata to organize your model.
Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed.
See https://goo.gl/xmQnxf for more information and examples of labels.
- Returns:
- model (google.VertexModel):
Artifact tracking the created model.
- gcp_resources (str):
Serialized gcp_resources proto tracking the upload model’s long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
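The encryption_spec_key_name form given above lends itself to a quick format and region check before the component runs. This is a minimal sketch: parse_encryption_key_name and key_matches_location are hypothetical helpers, and treating "same region" as exact string equality between the key's location segment and the component's location argument is an assumption of this example.

```python
import re

# projects/<project>/locations/<location>/keyRings/<key ring>/cryptoKeys/<key>
_KEY_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_encryption_key_name(name: str) -> dict:
    """Split a key resource name into its parts, or raise ValueError."""
    match = _KEY_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid encryption_spec_key_name: {name!r}")
    return match.groupdict()


def key_matches_location(name: str, location: str) -> bool:
    """Return True if the key's location segment equals `location`."""
    return parse_encryption_key_name(name)["location"] == location


key_name = "projects/my-project/locations/us-central1/keyRings/my-kr/cryptoKeys/my-key"
print(parse_encryption_key_name(key_name))
print(key_matches_location(key_name, "us-central1"))
```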
- google_cloud_pipeline_components.aiplatform.TabularDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', gcs_source: str = None, bq_source: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)
tabular_dataset_create Creates a new tabular dataset.
- Args:
- display_name (String):
Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- gcs_source (Union[str, Sequence[str]]):
Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:
str: “gs://bucket/file.csv”
Sequence[str]: [“gs://bucket/file1.csv”, “gs://bucket/file2.csv”]
- bq_source (String):
BigQuery URI to the input table. Example: “bq://project.dataset.table_name”
- project (String):
Required. Project to create the dataset in.
- location (String):
Optional location to create the dataset in.
- labels (JsonObject):
Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
- encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- tabular_dataset (TabularDataset):
Instantiated representation of the managed tabular dataset resource.
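The label rules stated for the labels argument (at most 64 characters per key and value, lowercase letters, numeric characters, underscores and dashes, international characters allowed, at most 64 user labels, reserved system prefix) can be approximated in a few lines. label_problems is a hypothetical helper, and its character test is a rough approximation of the documented rule, not the service's exact validation.

```python
SYSTEM_PREFIX = "aiplatform.googleapis.com/"


def _allowed(ch: str) -> bool:
    # Lowercase (or caseless international) letters, digits, "_" and "-".
    return ch in "_-" or ch.isdigit() or (ch.isalpha() and not ch.isupper())


def label_problems(labels: dict) -> list:
    """Return a list of rule violations found in `labels` (empty if none)."""
    problems = []
    if len(labels) > 64:
        problems.append("no more than 64 user labels are allowed")
    for key, value in labels.items():
        if key.startswith(SYSTEM_PREFIX):
            problems.append(f"label key {key!r} uses the system reserved prefix")
            continue
        for kind, text in (("key", key), ("value", value)):
            # len() counts Unicode code points in Python 3.
            if len(text) > 64:
                problems.append(f"label {kind} {text!r} is longer than 64 characters")
            elif not all(_allowed(ch) for ch in text):
                problems.append(f"label {kind} {text!r} contains a disallowed character")
    return problems


print(label_problems({"env": "prod", "Team": "ml platform"}))
```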
- google_cloud_pipeline_components.aiplatform.TabularDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')
tabular_dataset_export Exports data to an output directory in GCS.
- Args:
- output_dir (String):
Required. The Google Cloud Storage location where the output is to be written. In the given directory a new directory will be created with name: export-data-<dataset-display-name>-<timestamp-of-export-call>, where timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into sub directories which are named with the corresponding annotations’ schema title. Inside these sub directories, a schema.yaml will be created to describe the output format. If the uri doesn’t end with ‘/’, a ‘/’ will be automatically appended. The directory is created if it doesn’t exist.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- Returns:
- exported_files (Sequence[str]):
All of the files that are exported in this export operation.
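Since the export directory name follows the pattern export-data-<dataset-display-name>-<timestamp-of-export-call>, a downstream step can recognize it by prefix. The helpers below are a hypothetical sketch that mirrors only the documented pattern; whether the service alters the display name before using it in the directory name is not stated here, so treat an exact match as an assumption.

```python
import re


def export_dir_prefix(output_dir: str, dataset_display_name: str) -> str:
    """Return the common prefix of export directories under `output_dir`."""
    # A trailing "/" is appended automatically if it is missing.
    if not output_dir.endswith("/"):
        output_dir += "/"
    return f"{output_dir}export-data-{dataset_display_name}-"


def is_export_dir(path: str, output_dir: str, dataset_display_name: str) -> bool:
    """Return True if `path` looks like an export directory for the dataset."""
    prefix = export_dir_prefix(output_dir, dataset_display_name)
    if not path.startswith(prefix):
        return False
    suffix = path[len(prefix):].rstrip("/")
    # The timestamp is in YYYYMMDDHHMMSS format, i.e. exactly 14 digits.
    return re.fullmatch(r"\d{14}", suffix) is not None


# Bucket and dataset names are placeholders.
print(export_dir_prefix("gs://my-bucket/out", "sales"))
print(is_export_dir("gs://my-bucket/out/export-data-sales-20220102030405/",
                    "gs://my-bucket/out", "sales"))
```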
- google_cloud_pipeline_components.aiplatform.TextDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)
text_dataset_create Creates a new text dataset and optionally imports data into the dataset when a source and import_schema_uri are passed. Args:
- display_name (String):
Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- gcs_source (Union[str, Sequence[str]]):
Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- import_schema_uri (String):
Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
- data_item_labels (JsonObject):
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem; if a label with the same key was imported earlier, its old value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and on a key collision one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by
import_schema_uri
, e.g. a JSONL file.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- labels (JsonObject):
Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (system labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
- encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form:
projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- text_dataset (TextDataset):
Instantiated representation of the managed text dataset resource.
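The constraints listed for the labels argument can be checked on the client before the component runs. The sketch below is an assumption-laden illustration, not the service's validation logic: validate_user_labels is a hypothetical helper, "lowercase letters" is interpreted with Python's Unicode-aware string methods, and keys carrying the system prefix are simply reported as reserved.

```python
SYSTEM_LABEL_PREFIX = "aiplatform.googleapis.com/"
MAX_LABEL_LENGTH = 64   # limit for keys and values, in Unicode codepoints
MAX_USER_LABELS = 64    # limit on user labels per resource


def _is_allowed_char(ch: str) -> bool:
    # Underscores, dashes and digits are allowed; letters must be lowercase.
    # str.isalpha() is Unicode-aware, so international letters are accepted.
    if ch in "_-" or ch.isdigit():
        return True
    return ch.isalpha() and ch == ch.lower()


def validate_user_labels(labels: dict) -> list:
    """Return a list of problems found; an empty list means no problem was found."""
    problems = []
    user_labels = {}
    for key, value in labels.items():
        if key.startswith(SYSTEM_LABEL_PREFIX):
            problems.append(f"key {key!r} uses the reserved system prefix")
        else:
            user_labels[key] = value
    if len(user_labels) > MAX_USER_LABELS:
        problems.append(
            f"{len(user_labels)} user labels exceed the limit of {MAX_USER_LABELS}"
        )
    for key, value in user_labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LABEL_LENGTH:
                problems.append(f"{kind} {text!r} is longer than {MAX_LABEL_LENGTH} characters")
            if not all(_is_allowed_char(ch) for ch in text):
                problems.append(f"{kind} {text!r} contains a disallowed character")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a labels dict at once.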
- google_cloud_pipeline_components.aiplatform.TextDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')
text_dataset_export Exports data to an output directory in GCS. Args:
- output_dir (String):
Required. The Google Cloud Storage location where the output is to be written to. In the given directory a new directory will be created with name:
export-data-<dataset-display-name>-<timestamp-of-export-call>
where timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into sub directories which are named with the corresponding annotations’ schema title. Inside these sub directories, a schema.yaml will be created to describe the output format. If the uri doesn’t end with ‘/’, a ‘/’ will be automatically appended. The directory is created if it doesn’t exist.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- Returns:
- exported_files (Sequence[str]):
All of the files that are exported in this export operation.
- google_cloud_pipeline_components.aiplatform.TextDatasetImportDataOp(project: str, dataset: google.VertexDataset, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None)
text_dataset_import Uploads data to an existing managed dataset. Args:
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- dataset (Dataset):
Required. The dataset to be updated.
- gcs_source (Union[str, Sequence[str]]):
Required. Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- import_schema_uri (String):
Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
- data_item_labels (JsonObject):
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem; if a label with the same key was imported earlier, its old value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and on a key collision one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by
import_schema_uri
, e.g. jsonl file.
- Returns:
- dataset (Dataset):
Instantiated representation of the managed dataset resource.
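The merge rules described for data_item_labels can be modeled with plain dictionaries. This is only a sketch of the documented behavior, not the service implementation; merge_with_existing and combine_within_import are hypothetical names introduced for illustration.

```python
import random


def merge_with_existing(existing: dict, imported: dict) -> dict:
    """Labels of an existing DataItem after an identical item is imported."""
    merged = dict(existing)
    # Imported labels are appended; on a key clash the old value is overwritten.
    merged.update(imported)
    return merged


def combine_within_import(first: dict, second: dict, rng=random) -> dict:
    """Labels of two identical DataItems that appear in the same import."""
    combined = dict(first)
    for key, value in second.items():
        if key in combined and combined[key] != value:
            # Key collision: one of the two values is picked at random.
            combined[key] = rng.choice([combined[key], value])
        else:
            combined[key] = value
    return combined
```

Because the collision case is random, pipelines that depend on a specific label value are safer when duplicate items within one import carry consistent labels.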
- google_cloud_pipeline_components.aiplatform.TimeSeriesDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', gcs_source: str = None, bq_source: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)
time_series_dataset_create Creates a new time series dataset. Args:
- display_name (String):
Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- gcs_source (Union[str, Sequence[str]]):
Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- bq_source (String):
BigQuery URI to the input table. example:
“bq://project.dataset.table_name”
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- labels (JsonObject):
Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (system labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
- encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form:
projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- time_series_dataset (TimeSeriesDataset):
Instantiated representation of the managed time series dataset resource.
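The two source formats accepted here, gcs_source and bq_source, can be normalized and sanity-checked before they are passed to the component. The helpers below are hypothetical and cover only the URI shapes shown in the examples above; they do not expand wildcards or contact either service.

```python
from typing import List, Sequence, Tuple, Union


def normalize_gcs_source(gcs_source: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single URI or a sequence of URIs and always return a list."""
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"Not a Google Cloud Storage URI: {uri!r}")
    return uris


def parse_bq_source(bq_source: str) -> Tuple[str, str, str]:
    """Split 'bq://project.dataset.table_name' into (project, dataset, table)."""
    prefix = "bq://"
    if not bq_source.startswith(prefix):
        raise ValueError(f"Not a BigQuery URI: {bq_source!r}")
    # Split from the right so that a project ID containing dots stays intact.
    parts = bq_source[len(prefix):].rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected bq://project.dataset.table_name, got {bq_source!r}")
    return parts[0], parts[1], parts[2]
```

Normalizing to a list first means later code can treat the single-file and multi-file cases uniformly.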
- google_cloud_pipeline_components.aiplatform.TimeSeriesDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')
time_series_dataset_export Exports data to an output directory in GCS. Args:
- output_dir (String):
Required. The Google Cloud Storage location where the output is to be written to. In the given directory a new directory will be created with name:
export-data-<dataset-display-name>-<timestamp-of-export-call>
where timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into sub directories which are named with the corresponding annotations’ schema title. Inside these sub directories, a schema.yaml will be created to describe the output format. If the uri doesn’t end with ‘/’, a ‘/’ will be automatically appended. The directory is created if it doesn’t exist.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- Returns:
- exported_files (Sequence[str]):
All of the files that are exported in this export operation.
- google_cloud_pipeline_components.aiplatform.VideoDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)
video_dataset_create Creates a new video dataset and optionally imports data into the dataset when a source and import_schema_uri are passed. Args:
- display_name (String):
Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- gcs_source (Union[str, Sequence[str]]):
Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- import_schema_uri (String):
Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
- data_item_labels (JsonObject):
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem; if a label with the same key was imported earlier, its old value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and on a key collision one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by
import_schema_uri
, e.g. a JSONL file.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- labels (JsonObject):
Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (system labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with “aiplatform.googleapis.com/” and are immutable.
- encryption_spec_key_name (Optional[String]):
Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form:
projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
. The key needs to be in the same region as where the compute resource is created. If set, this Dataset and all sub-resources of this Dataset will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- Returns:
- video_dataset (VideoDataset):
Instantiated representation of the managed video dataset resource.
- google_cloud_pipeline_components.aiplatform.VideoDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')
video_dataset_export Exports data to an output directory in GCS. Args:
- output_dir (String):
Required. The Google Cloud Storage location where the output is to be written to. In the given directory a new directory will be created with name:
export-data-<dataset-display-name>-<timestamp-of-export-call>
where timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into sub directories which are named with the corresponding annotations’ schema title. Inside these sub directories, a schema.yaml will be created to describe the output format. If the uri doesn’t end with ‘/’, a ‘/’ will be automatically appended. The directory is created if it doesn’t exist.
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- Returns:
- exported_files (Sequence[str]):
All of the files that are exported in this export operation.
- google_cloud_pipeline_components.aiplatform.VideoDatasetImportDataOp(project: str, dataset: google.VertexDataset, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None)
video_dataset_import Uploads data to an existing managed dataset. Args:
- project (String):
Required. project to retrieve dataset from.
- location (String):
Optional location to retrieve dataset from.
- dataset (Dataset):
Required. The dataset to be updated.
- gcs_source (Union[str, Sequence[str]]):
Required. Google Cloud Storage URI(-s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. examples:
str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
- import_schema_uri (String):
Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
- data_item_labels (JsonObject):
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem; if a label with the same key was imported earlier, its old value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and on a key collision one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by
import_schema_uri
, e.g. jsonl file.
- Returns:
- dataset (Dataset):
Instantiated representation of the managed dataset resource.