Model Evaluation

Model evaluation pipelines.

Components:

`ModelEvaluationClassificationOp`(...[, ...])	Computes a `google.ClassificationMetrics` Artifact, containing evaluation metrics given a model's prediction results.
`ModelEvaluationForecastingOp`(gcp_resources, ...)	Computes a `google.ForecastingMetrics` Artifact, containing evaluation metrics given a model's prediction results.
`ModelEvaluationRegressionOp`(gcp_resources, ...)	Computes a `google.RegressionMetrics` Artifact, containing evaluation metrics given a model's prediction results.

Pipelines:

`evaluated_annotation_pipeline`(location, ...)	The evaluation evaluated annotation pipeline.
`evaluation_automl_tabular_feature_attribution_pipeline`(...)	The evaluation AutoML tabular pipeline with feature attribution.
`evaluation_automl_tabular_pipeline`(project, ...)	The evaluation AutoML tabular pipeline with no feature attribution.
`evaluation_automl_unstructure_data_pipeline`(...)	The evaluation pipeline with ground truth and no feature attribution.
`evaluation_feature_attribution_pipeline`(...)	The evaluation custom tabular pipeline with feature attribution.
`vision_model_error_analysis_pipeline`(...[, ...])	The evaluation vision error analysis pipeline.

v1.model_evaluation.ModelEvaluationClassificationOp(gcp_resources: dsl.OutputPath(str), evaluation_metrics: dsl.Output[google.ClassificationMetrics], target_field_name: str, model: dsl.Input[google.VertexModel] = None, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: dsl.Input[system.Artifact] = None, predictions_bigquery_source: dsl.Input[google.BQTable] = None, ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list[str] = [], ground_truth_bigquery_source: str = '', classification_type: str = 'multiclass', class_labels: list[str] = [], prediction_score_column: str = 'prediction.scores', prediction_label_column: str = 'prediction.classes', slicing_specs: list[Any] = [], positive_classes: list[str] = [], dataflow_service_account: str = '', dataflow_disk_size_gb: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Computes a google.ClassificationMetrics Artifact, containing evaluation metrics given a model’s prediction results.

Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data.

Parameters

location: str = 'us-central1': Location for running the evaluation.
predictions_format: str = 'jsonl': The file format for the batch prediction results. jsonl, csv, and bigquery are the allowed formats, from Vertex Batch Prediction.
predictions_gcs_source: dsl.Input[system.Artifact] = None: An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-” or “predictions_”. For explanation results, the files should be named “explanation.results-”.
predictions_bigquery_source: dsl.Input[google.BQTable] = None: BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.
ground_truth_format: str = 'jsonl': Required for custom tabular and non-tabular data. The file format for the ground truth files. jsonl, csv, and bigquery are the allowed formats.
ground_truth_gcs_source: list[str] = []: Required for custom tabular and non-tabular data. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when it is not part of the batch prediction job’s prediction instance.
ground_truth_bigquery_source: str = '': Required for custom tabular. The BigQuery table URI representing where the ground truth is located. Used to provide ground truth for each prediction instance when it is not part of the batch prediction job’s prediction instance.
classification_type: str = 'multiclass': The type of classification problem, either multiclass or multilabel.
class_labels: list[str] = []: The list of class names for the target_field_name, in the same order they appear in the batch prediction job’s predictions output file. For instance, if the values of target_field_name could be either 1 or 0, and the predictions output contains [“1”, “0”] for the prediction_label_column, then the class_labels input will be [“1”, “0”]. If not set, defaults to the classes found in the prediction_label_column in the batch prediction job’s predictions file.
target_field_name: str: The full name path of the features target field in the predictions file. Formatted to be able to find nested columns, delimited by `.`. Alternatively referred to as the ground truth (or ground_truth_column) field.
model: dsl.Input[google.VertexModel] = None: The Vertex model used for evaluation. Must be located in the same region as the location argument. It is used to set the default configurations for AutoML and custom-trained models.
prediction_score_column: str = 'prediction.scores': The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by `.`.
prediction_label_column: str = 'prediction.classes': The column name of the field containing classes the model is scoring. Formatted to be able to find nested columns, delimited by `.`.
slicing_specs: list[Any] = []: List of google.cloud.aiplatform_v1.types.ModelEvaluationSlice.Slice.SliceSpec. When provided, compute metrics for each defined slice. See sample code in https://cloud.google.com/vertex-ai/docs/pipelines/model-evaluation-component. Below is an example of how to format this input.

1: First, create a SlicingSpec.
  `from google.cloud.aiplatform_v1.types import ModelEvaluationSlice`

  `SliceSpec = ModelEvaluationSlice.Slice.SliceSpec`

  `slicing_spec = SliceSpec(configs={'feature_a': SliceSpec.SliceConfig(value=SliceSpec.Value(string_value='label_a'))})`
2: Create a list to store the slicing specs into.
  `slicing_specs = []`
3: Format each SlicingSpec into a JSON or Dict.
  `from google.protobuf import json_format`
  `slicing_spec_json = json_format.MessageToJson(SliceSpec.pb(slicing_spec))`
  or
  `slicing_spec_dict = json_format.MessageToDict(SliceSpec.pb(slicing_spec))`
4: Combine each slicing_spec JSON into a list.
  `slicing_specs.append(slicing_spec_json)`
5: Finally, pass slicing_specs as a parameter for this component.
  `ModelEvaluationClassificationOp(slicing_specs=slicing_specs)`
For more details on configuring slices, see
https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelEvaluationSlice
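
For readers who want to see the shape of the serialized value, the steps above can be approximated without the client library. The sketch below is only an illustration. It assumes the default proto3 JSON mapping that `json_format.MessageToDict` applies (lowerCamelCase field names, so `string_value` becomes `stringValue`), and the helper name `make_value_slicing_spec` is hypothetical rather than part of any library.

```python
import json


def make_value_slicing_spec(feature: str, label: str) -> dict:
    """Build a dict shaped like a serialized SliceSpec (assumed layout).

    Hypothetical helper: it mirrors what MessageToDict is expected to
    return for a SliceSpec holding one string-valued SliceConfig.
    """
    return {"configs": {feature: {"value": {"stringValue": label}}}}


# Step 2: a list to hold every slicing spec.
slicing_specs = []

# Steps 3 and 4: format each spec and collect it in the list.
slicing_specs.append(make_value_slicing_spec("feature_a", "label_a"))

# The JSON string form of the same spec.
slicing_spec_json = json.dumps(slicing_specs[0], sort_keys=True)
print(slicing_spec_json)
```

If the real component rejects this shape, prefer the output of the client library's own serializer over a hand-built dict.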

Parameters

positive_classes: list[str] = []: The list of class names to create binary classification metrics based on one-vs-rest for each value of positive_classes provided.
dataflow_service_account: str = '': Service account to run the Dataflow job. If not set, Dataflow will use the default worker service account. For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account
dataflow_disk_size_gb: int = 50: The disk size (in GB) of the machine executing the evaluation run.
dataflow_machine_type: str = 'n1-standard-4': The machine type executing the evaluation run.
dataflow_workers_num: int = 1: The number of workers executing the evaluation run.
dataflow_max_workers_num: int = 5: The max number of workers executing the evaluation run.
dataflow_subnetwork: str = '': Dataflow’s fully qualified subnetwork name. When empty, the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool = True: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str = '': Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str = '': Flag to choose Beam runner. Valid options are DirectRunner and Dataflow.
project: str = '{{$.pipeline_google_cloud_project_id}}': Project to run evaluation container. Defaults to the project in which the PipelineJob is run.
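
To make the one-vs-rest idea behind `positive_classes` concrete, here is a minimal sketch in plain Python. It is not the component's implementation (which runs on Dataflow with TFMA). The function name `one_vs_rest_counts` and the sample labels are assumptions chosen for illustration.

```python
from collections import Counter


def one_vs_rest_counts(truth, predicted, positive_class):
    """Count binary outcomes with one class as positive and the rest as negative."""
    counts = Counter()
    for t, p in zip(truth, predicted):
        truth_is_positive = t == positive_class
        pred_is_positive = p == positive_class
        if truth_is_positive and pred_is_positive:
            counts["tp"] += 1
        elif pred_is_positive:
            counts["fp"] += 1
        elif truth_is_positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return dict(counts)


# Made-up multiclass labels.
truth = ["cat", "dog", "bird", "cat"]
predicted = ["cat", "cat", "bird", "dog"]

# One binary summary per entry of positive_classes.
positive_classes = ["cat", "bird"]
per_class = {c: one_vs_rest_counts(truth, predicted, c) for c in positive_classes}
```

Each entry of `per_class` treats one class as positive and every other class as negative, which is the sense in which binary metrics are produced per value of `positive_classes`.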

Returns

evaluation_metrics: dsl.Output[google.ClassificationMetrics]: `google.ClassificationMetrics` representing the classification evaluation metrics in GCS.
gcp_resources: dsl.OutputPath(str): Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
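
The dot-delimited column names (`target_field_name`, `prediction_score_column`, `prediction_label_column`) address nested fields in each prediction record. The sketch below shows one way such a lookup could work on a single JSONL line. The helper `get_nested` and the sample record are hypothetical, and the component's own parsing may differ.

```python
import json
from functools import reduce


def get_nested(record: dict, dotted_path: str):
    """Follow a '.'-delimited path through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), record)


# A made-up JSONL line in the spirit of a batch prediction result.
line = (
    '{"instance": {"species": "cat"}, '
    '"prediction": {"classes": ["cat", "dog"], "scores": [0.9, 0.1]}}'
)
record = json.loads(line)

scores = get_nested(record, "prediction.scores")    # default prediction_score_column
classes = get_nested(record, "prediction.classes")  # default prediction_label_column
target = get_nested(record, "instance.species")     # an example target_field_name
```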

v1.model_evaluation.ModelEvaluationForecastingOp(gcp_resources: dsl.OutputPath(str), evaluation_metrics: dsl.Output[google.ForecastingMetrics], target_field_name: str, model: dsl.Input[google.VertexModel] = None, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: dsl.Input[system.Artifact] = None, predictions_bigquery_source: dsl.Input[google.BQTable] = None, ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list[str] = [], ground_truth_bigquery_source: str = '', forecasting_type: str = 'point', forecasting_quantiles: list[float] = [], point_evaluation_quantile: float = 0.5, prediction_score_column: str = 'prediction.value', dataflow_service_account: str = '', dataflow_disk_size_gb: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Computes a google.ForecastingMetrics Artifact, containing evaluation metrics given a model’s prediction results.

Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports point forecasting and quantile forecasting for tabular data.

Parameters

location: str = 'us-central1': Location for running the evaluation.
predictions_format: str = 'jsonl': The file format for the batch prediction results. jsonl, csv, and bigquery are the allowed formats, from Vertex Batch Prediction.
predictions_gcs_source: dsl.Input[system.Artifact] = None: An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-”.
predictions_bigquery_source: dsl.Input[google.BQTable] = None: BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.
ground_truth_format: str = 'jsonl': Required for custom tabular and non-tabular data. The file format for the ground truth files. jsonl, csv, and bigquery are the allowed formats.
ground_truth_gcs_source: list[str] = []: Required for custom tabular and non-tabular data. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when it is not part of the batch prediction job’s prediction instance.
ground_truth_bigquery_source: str = '': Required for custom tabular. The BigQuery table URI representing where the ground truth is located. Used to provide ground truth for each prediction instance when it is not part of the batch prediction job’s prediction instance.
forecasting_type: str = 'point': The forecasting type being addressed by this evaluation run. point and quantile are the supported types.
forecasting_quantiles: list[float] = []: Required for a quantile forecasting_type. The list of quantiles, in the same order as they appear in the quantile prediction score column.
point_evaluation_quantile: float = 0.5: Required for a quantile forecasting_type. A quantile in the list of forecasting_quantiles that will be used for point evaluation metrics.
target_field_name: str: The full name path of the features target field in the predictions file. Formatted to be able to find nested columns, delimited by `.`. Alternatively referred to as the ground truth (or ground_truth_column) field.
model: dsl.Input[google.VertexModel] = None: The Vertex model used for evaluation. Must be located in the same region as the location argument. It is used to set the default configurations for AutoML and custom-trained models.
prediction_score_column: str = 'prediction.value': The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by `.`.
dataflow_service_account: str = '': Service account to run the Dataflow job. If not set, Dataflow will use the default worker service account. For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account
dataflow_disk_size_gb: int = 50: The disk size (in GB) of the machine executing the evaluation run.
dataflow_machine_type: str = 'n1-standard-4': The machine type executing the evaluation run.
dataflow_workers_num: int = 1: The number of workers executing the evaluation run.
dataflow_max_workers_num: int = 5: The max number of workers executing the evaluation run.
dataflow_subnetwork: str = '': Dataflow’s fully qualified subnetwork name. When empty, the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool = True: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str = '': Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str = '': Flag to choose Beam runner. Valid options are DirectRunner and Dataflow.
project: str = '{{$.pipeline_google_cloud_project_id}}': Project to run evaluation container. Defaults to the project in which the PipelineJob is run.

Returns

evaluation_metrics: dsl.Output[google.ForecastingMetrics]: `google.ForecastingMetrics` representing the forecasting evaluation metrics in GCS.
gcp_resources: dsl.OutputPath(str): Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
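
The three forecasting arguments depend on one another: `forecasting_quantiles` and `point_evaluation_quantile` are required for quantile forecasting, and the latter must be one of the former. A pre-flight check along these lines may catch mistakes before a Dataflow job starts. This is a sketch under those stated rules only. The function `check_forecasting_args` is hypothetical and the component may validate differently.

```python
def check_forecasting_args(forecasting_type="point",
                           forecasting_quantiles=(),
                           point_evaluation_quantile=0.5):
    """Return a list of problems found in the forecasting arguments."""
    problems = []
    if forecasting_type not in ("point", "quantile"):
        problems.append(f"unsupported forecasting_type: {forecasting_type!r}")
    if forecasting_type == "quantile":
        if not forecasting_quantiles:
            problems.append(
                "forecasting_quantiles is required for quantile forecasting")
        elif point_evaluation_quantile not in forecasting_quantiles:
            problems.append(
                "point_evaluation_quantile must be one of forecasting_quantiles")
    return problems


# Defaults describe a point forecast, which needs no quantile settings.
ok_point = check_forecasting_args()

# A quantile forecast whose evaluation quantile is in the list.
ok_quantile = check_forecasting_args("quantile", [0.1, 0.5, 0.9], 0.5)

# A quantile forecast whose evaluation quantile is missing from the list.
bad_quantile = check_forecasting_args("quantile", [0.1, 0.9], 0.5)
```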

v1.model_evaluation.ModelEvaluationRegressionOp(gcp_resources: dsl.OutputPath(str), evaluation_metrics: dsl.Output[google.RegressionMetrics], target_field_name: str, model: dsl.Input[google.VertexModel] = None, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: dsl.Input[system.Artifact] = None, predictions_bigquery_source: dsl.Input[google.BQTable] = None, ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list[str] = [], ground_truth_bigquery_source: str = '', prediction_score_column: str = 'prediction.value', dataflow_service_account: str = '', dataflow_disk_size_gb: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Computes a google.RegressionMetrics Artifact, containing evaluation metrics given a model’s prediction results.

Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.

Parameters

location: str = 'us-central1': Location for running the evaluation.
predictions_format: str = 'jsonl': The file format for the batch prediction results. jsonl, csv, and bigquery are the allowed formats, from Vertex Batch Prediction.
predictions_gcs_source: dsl.Input[system.Artifact] = None: An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-”.
predictions_bigquery_source: dsl.Input[google.BQTable] = None: BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.
ground_truth_format: str = 'jsonl': Required for custom tabular and non-tabular data. The file format for the ground truth files. jsonl, csv, and bigquery are the allowed formats.
ground_truth_gcs_source: list[str] = []: Required for custom tabular and non-tabular data. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when it is not part of the batch prediction job’s prediction instance.
ground_truth_bigquery_source: str = '': Required for custom tabular. The BigQuery table URI representing where the ground truth is located. Used to provide ground truth for each prediction instance when it is not part of the batch prediction job’s prediction instance.
target_field_name: str: The target field’s name. Formatted to be able to find nested columns, delimited by `.`. Prefixed with ‘instance.’ on the component for Vertex Batch Prediction.
model: dsl.Input[google.VertexModel] = None: The Vertex model used for evaluation. Must be located in the same region as the location argument. It is used to set the default configurations for AutoML and custom-trained models.
prediction_score_column: str = 'prediction.value': The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by `.`.
dataflow_service_account: str = '': Service account to run the Dataflow job. If not set, Dataflow will use the default worker service account. For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account
dataflow_disk_size_gb: int = 50: The disk size (in GB) of the machine executing the evaluation run.
dataflow_machine_type: str = 'n1-standard-4': The machine type executing the evaluation run.
dataflow_workers_num: int = 1: The number of workers executing the evaluation run.
dataflow_max_workers_num: int = 5: The max number of workers executing the evaluation run.
dataflow_subnetwork: str = '': Dataflow’s fully qualified subnetwork name. When empty, the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool = True: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str = '': Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str = '': Flag to choose Beam runner. Valid options are DirectRunner and Dataflow.
project: str = '{{$.pipeline_google_cloud_project_id}}': Project to run evaluation container. Defaults to the project in which the PipelineJob is run.

Returns

evaluation_metrics: dsl.Output[google.RegressionMetrics]: `google.RegressionMetrics` representing the regression evaluation metrics in GCS.
gcp_resources: dsl.OutputPath(str): Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
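
Two of the shared arguments have a fixed shape that is easy to get wrong: `encryption_spec_key_name` follows a resource-name pattern and should sit in the same region as the compute resource, and `force_runner_mode` accepts only `DirectRunner` or `Dataflow` (or the empty default). The sketch below checks both before submission. The helper `check_settings` is hypothetical, and it assumes the compute region is the value passed as `location`.

```python
import re

# Pattern taken from the documented form of encryption_spec_key_name.
KEY_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def check_settings(location, encryption_spec_key_name="", force_runner_mode=""):
    """Return a list of problems found in the key name and runner mode."""
    problems = []
    if encryption_spec_key_name:
        match = KEY_PATTERN.match(encryption_spec_key_name)
        if match is None:
            problems.append("encryption_spec_key_name has an unexpected form")
        elif match["location"] != location:
            # Assumption: the compute resource is created in `location`.
            problems.append("encryption key region differs from location")
    if force_runner_mode not in ("", "DirectRunner", "Dataflow"):
        problems.append(f"unknown force_runner_mode: {force_runner_mode!r}")
    return problems


good = check_settings(
    "us-central1",
    "projects/my-project/locations/us-central1/keyRings/my-kr/cryptoKeys/my-key",
    "Dataflow",
)
wrong_region = check_settings(
    "us-central1",
    "projects/my-project/locations/europe-west4/keyRings/my-kr/cryptoKeys/my-key",
)
```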

v1.model_evaluation.evaluated_annotation_pipeline(location: str, model_name: str, batch_predict_gcs_destination_output_uri: str, test_dataset_resource_name: str = '', test_dataset_annotation_set_name: str = '', test_dataset_storage_source_uris: list[str] = [], batch_predict_instances_format: str = 'jsonl', batch_predict_predictions_format: str = 'jsonl', batch_predict_machine_type: str = 'n1-standard-32', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, dataflow_machine_type: str = 'n1-standard-8', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

The evaluation evaluated annotation pipeline.

Parameters

location: str: The GCP region that runs the pipeline components.
model_name: str: The Vertex model resource name to be imported and used for batch prediction, in the format of projects/{project}/locations/{location}/models/{model} or projects/{project}/locations/{location}/models/{model}@{model_version_id or model_version_alias}.
batch_predict_gcs_destination_output_uri: str: The Google Cloud Storage location of the directory where the output is to be written to. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created where <extension> depends on chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> files are created (N depends on total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field whose value is a google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
test_dataset_resource_name: str = '': A Vertex dataset resource name of the test dataset. If test_dataset_storage_source_uris is also provided, this argument will override the GCS source.
test_dataset_annotation_set_name: str = '': A string of the annotation_set name containing the ground truth of the test dataset used for evaluation.
test_dataset_storage_source_uris: list[str] = []: Google Cloud Storage URI(-s) to unmanaged test datasets. jsonl is currently the only allowed format. If test_dataset is also provided, this field will be overridden by the provided Vertex Dataset.
batch_predict_instances_format: str = 'jsonl': The format in which instances are given, must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_predictions_format: str = 'jsonl': The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_machine_type: str = 'n1-standard-32': The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
batch_predict_starting_replica_count: int = 5: The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides starting number, not greater than max_replica_count. Only used if machine_type is set.
batch_predict_max_replica_count: int = 10: The maximum number of machine replicas the batch operation may be scaled to. Only used if machine_type is set.
batch_predict_accelerator_type: str = '': The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
batch_predict_accelerator_count: int = 0: The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.
dataflow_machine_type: str = 'n1-standard-8': The Dataflow machine type for evaluation components.
dataflow_max_num_workers: int = 5: The max number of Dataflow workers for evaluation components.
dataflow_disk_size_gb: int = 50: The disk size (in GB) of the machine executing the evaluation run.
dataflow_service_account: str = '': Custom service account to run Dataflow jobs.
dataflow_subnetwork: str = '': Dataflow’s fully qualified subnetwork name. When empty, the default subnetwork will be used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool = True: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str = '': Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str = '': Forces the runner mode to use. Valid options are Dataflow and DirectRunner.
project: str = '{{$.pipeline_google_cloud_project_id}}': The GCP project that runs the pipeline components. Defaults to the project in which the PipelineJob is run.
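
Because `model_name` must follow one of two resource-name formats (with or without a trailing `@` version), it can help to parse it before launching the pipeline, for example to confirm that its region matches `location`. The sketch below does this with a regular expression derived from the formats quoted above. The function `parse_model_name` is hypothetical and is not part of the pipeline components library.

```python
import re

# projects/{project}/locations/{location}/models/{model}[@{version}]
MODEL_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/models/(?P<model>[^/@]+)(?:@(?P<version>[^/@]+))?$"
)


def parse_model_name(model_name: str) -> dict:
    """Split a Vertex model resource name into its parts."""
    match = MODEL_NAME.match(model_name)
    if match is None:
        raise ValueError(f"not a Vertex model resource name: {model_name!r}")
    return match.groupdict()


plain = parse_model_name("projects/my-project/locations/us-central1/models/123")
versioned = parse_model_name(
    "projects/my-project/locations/us-central1/models/123@default")

# Example check: the model's region should agree with the pipeline location.
location = "us-central1"
same_region = plain["location"] == location
```

When no version is given, the `version` entry is `None`.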

v1.model_evaluation.evaluation_automl_tabular_feature_attribution_pipeline(project: str, location: str, prediction_type: str, model_name: str, target_field_name: str, batch_predict_instances_format: str, batch_predict_gcs_destination_output_uri: str, batch_predict_gcs_source_uris: list[str] = [], batch_predict_bigquery_source_uri: str = '', batch_predict_predictions_format: str = 'jsonl', batch_predict_bigquery_destination_output_uri: str = '', batch_predict_machine_type: str = 'n1-standard-16', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_explanation_metadata: dict[str, Any] = {}, batch_predict_explanation_parameters: dict[str, Any] = {}, batch_predict_explanation_data_sample_size: int = 10000, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, slicing_specs: list[Any] = [], dataflow_machine_type: str = 'n1-standard-4', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '')

The evaluation AutoML tabular pipeline with feature attribution.

This pipeline guarantees support for AutoML Tabular classification and regression models that contain a valid explanation_spec. This pipeline does not include the target_field_data_remover component, which is needed for many tabular custom models.

Parameters¶

project: str¶: The GCP project that runs the pipeline components.
location: str¶: The GCP region that runs the pipeline components.
prediction_type: str¶: The type of prediction the model is to produce.

“classification” or “regression”. :param model_name: The Vertex model resource name to be imported and used for batch prediction. :param target_field_name: The target field’s name. Formatted to be able to find nested columns, delimited by .. Prefixed with ‘instance.’ on the component for Vertex Batch Prediction. :param batch_predict_instances_format: The format in which instances are given, must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig. :param batch_predict_gcs_destination_output_uri: The Google Cloud Storage location of the directory where the output is to be written to. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created where <extension> depends on chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then an additional errors_0001.<extension>, errors_0002.<extension>,…, errors_N.<extension> files are created (N depends on total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field which as value has google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig. :param batch_predict_gcs_source_uris: Google Cloud Storage URI(-s) to your instances to run batch prediction on. 
May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_bigquery_source_uri: str¶: Google BigQuery URI to your instances to run batch prediction on. May contain wildcards. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_predictions_format: str¶: The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_bigquery_destination_output_uri: str¶: The BigQuery project location where the output is to be written. In the given project a new dataset is created with the name prediction_<model-display-name>_<job-create-time>, where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores) and the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables are created: predictions and errors. If the Model has both instance and prediction schemata defined, then the tables have columns as follows. The predictions table contains instances for which the prediction succeeded; its columns are a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction failed; it has the instance columns, as per the instance schema, followed by a single “errors” column whose values are google.rpc.Status represented as a STRUCT containing only code and message. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_machine_type: str¶: The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES, this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_starting_replica_count: int¶: The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides the starting number, not greater than batch_predict_max_replica_count. Only used if batch_predict_machine_type is set.
batch_predict_max_replica_count: int¶: The maximum number of machine replicas the batch operation may be scaled to. Only used if batch_predict_machine_type is set.
batch_predict_explanation_metadata: dict[str, Any]¶: Explanation metadata configuration for this BatchPredictionJob. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_metadata. All fields of explanation_metadata are optional in the request. If a field of the explanation_metadata object is not populated, the corresponding field of the Model.explanation_metadata object is inherited. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
batch_predict_explanation_parameters: dict[str, Any]¶: Parameters that configure how the Model’s predictions are explained. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_parameters. All fields of explanation_parameters are optional in the request. If a field of the explanation_parameters object is not populated, the corresponding field of the Model.explanation_parameters object is inherited. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#ExplanationParameters.
batch_predict_explanation_data_sample_size: int¶: Desired size to which the input dataset is downsampled before it is used for batch explanation.
batch_predict_accelerator_type: str¶: The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_accelerator_count: int¶: The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.
slicing_specs: list[Any]¶: List of google.cloud.aiplatform_v1.types.ModelEvaluationSlice.SlicingSpec. When provided, compute metrics for each defined slice. See sample code in https://cloud.google.com/vertex-ai/docs/pipelines/model-evaluation-component. For more details on configuring slices, see https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelEvaluationSlice.
dataflow_machine_type: str¶: The Dataflow machine type for evaluation components.
dataflow_max_num_workers: int¶: The maximum number of Dataflow workers for evaluation components.
dataflow_disk_size_gb: int¶: Dataflow worker’s disk size in GB for evaluation components.
dataflow_service_account: str¶: Custom service account to run Dataflow jobs.
dataflow_subnetwork: str¶: Dataflow’s fully qualified subnetwork name; when empty, the default subnetwork is used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool¶: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str¶: Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str¶: Forces the runner mode to use. Valid options are Dataflow and DirectRunner.
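
The BigQuery dataset naming described for batch_predict_bigquery_destination_output_uri can be unpacked with a small helper. This is a minimal sketch, not part of the library: `parse_prediction_dataset_name` is a hypothetical function, and it assumes that the trailing Z denotes UTC and that the timestamp always has the fixed width of YYYY_MM_DDThh_mm_ss_sssZ.

```python
from datetime import datetime, timezone

# Assumption: dataset names follow prediction_<model-display-name>_<job-create-time>.
_PREFIX = "prediction_"
_TIMESTAMP_FORMAT = "%Y_%m_%dT%H_%M_%S_%fZ"
_TIMESTAMP_LENGTH = len("YYYY_MM_DDThh_mm_ss_sssZ")  # 24 characters


def parse_prediction_dataset_name(dataset_name):
    """Split a dataset name into (sanitized model name, creation time).

    Hypothetical helper for illustration only.
    """
    if not dataset_name.startswith(_PREFIX):
        raise ValueError(f"not a prediction dataset name: {dataset_name!r}")
    body = dataset_name[len(_PREFIX):]
    # The body must hold a model name, one underscore, and the timestamp.
    if len(body) <= _TIMESTAMP_LENGTH + 1:
        raise ValueError(f"name too short: {dataset_name!r}")
    model_part = body[: -(_TIMESTAMP_LENGTH + 1)]
    timestamp_text = body[-_TIMESTAMP_LENGTH:]
    # strptime raises ValueError if the timestamp does not match the format.
    created = datetime.strptime(timestamp_text, _TIMESTAMP_FORMAT)
    # Assumption: the trailing "Z" means UTC.
    return model_part, created.replace(tzinfo=timezone.utc)


model_part, created = parse_prediction_dataset_name(
    "prediction_my_model_2024_01_15T10_30_00_123Z"
)
```

Because the timestamp has a fixed width, the split works even when the sanitized model name itself contains underscores.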

v1.model_evaluation.evaluation_automl_tabular_pipeline(project: str, location: str, prediction_type: str, model_name: str, target_field_name: str, batch_predict_instances_format: str, batch_predict_gcs_destination_output_uri: str, batch_predict_gcs_source_uris: list[str] = [], batch_predict_bigquery_source_uri: str = '', batch_predict_predictions_format: str = 'jsonl', batch_predict_bigquery_destination_output_uri: str = '', batch_predict_machine_type: str = 'n1-standard-16', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, slicing_specs: list[Any] = [], dataflow_machine_type: str = 'n1-standard-4', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '')[source]¶

The evaluation AutoML tabular pipeline with no feature attribution.

This pipeline guarantees support for AutoML Tabular classification and regression models. This pipeline does not include the target_field_data_remover component, which is needed for many tabular custom models and AutoML Tabular Forecasting.

Parameters¶

project: str¶: The GCP project that runs the pipeline components.
location: str¶: The GCP region that runs the pipeline components.
prediction_type: str¶: The type of prediction the model is to produce: “classification” or “regression”.
model_name: str¶: The Vertex model resource name to be imported and used for batch prediction.
target_field_name: str¶: The target field’s name. Formatted to be able to find nested columns, delimited by a period (.). Prefixed with ‘instance.’ on the component for Vertex Batch Prediction.
batch_predict_instances_format: str¶: The format in which instances are given; must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_gcs_destination_output_uri: str¶: The Google Cloud Storage location of the directory where the output is to be written. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside it, files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created, where <extension> depends on the chosen predictions_format and N depends on the total number of successfully predicted instances (N may equal 0001). If the Model has both instance and prediction schemata defined, then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional files errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> are created (N depends on the total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field whose value is a google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_gcs_source_uris: list[str]¶: Google Cloud Storage URI(-s) to your instances to run batch prediction on. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_bigquery_source_uri: str¶: Google BigQuery URI to your instances to run batch prediction on. May contain wildcards. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_predictions_format: str¶: The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_bigquery_destination_output_uri: str¶: The BigQuery project location where the output is to be written. In the given project a new dataset is created with the name prediction_<model-display-name>_<job-create-time>, where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores) and the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables are created: predictions and errors. If the Model has both instance and prediction schemata defined, then the tables have columns as follows. The predictions table contains instances for which the prediction succeeded; its columns are a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction failed; it has the instance columns, as per the instance schema, followed by a single “errors” column whose values are google.rpc.Status represented as a STRUCT containing only code and message. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_machine_type: str¶: The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES, this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_starting_replica_count: int¶: The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides the starting number, not greater than batch_predict_max_replica_count. Only used if batch_predict_machine_type is set.
batch_predict_max_replica_count: int¶: The maximum number of machine replicas the batch operation may be scaled to. Only used if batch_predict_machine_type is set.
batch_predict_accelerator_type: str¶: The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_accelerator_count: int¶: The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.
slicing_specs: list[Any]¶: List of google.cloud.aiplatform_v1.types.ModelEvaluationSlice.SlicingSpec. When provided, compute metrics for each defined slice. See sample code in https://cloud.google.com/vertex-ai/docs/pipelines/model-evaluation-component. For more details on configuring slices, see https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelEvaluationSlice.
dataflow_machine_type: str¶: The Dataflow machine type for evaluation components.
dataflow_max_num_workers: int¶: The maximum number of Dataflow workers for evaluation components.
dataflow_disk_size_gb: int¶: Dataflow worker’s disk size in GB for evaluation components.
dataflow_service_account: str¶: Custom service account to run Dataflow jobs.
dataflow_subnetwork: str¶: Dataflow’s fully qualified subnetwork name; when empty, the default subnetwork is used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool¶: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str¶: Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str¶: Forces the runner mode to use. Valid options are Dataflow and DirectRunner.
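
As an illustration of how the required parameters of evaluation_automl_tabular_pipeline fit together, the sketch below assembles them into a plain dictionary and checks prediction_type against the two documented values. `build_tabular_eval_parameters` is a hypothetical helper, not a library function, and every value in the example (project, bucket, model ID) is a placeholder.

```python
# The two values documented for prediction_type.
VALID_PREDICTION_TYPES = ("classification", "regression")


def build_tabular_eval_parameters(
    *,
    project,
    location,
    prediction_type,
    model_name,
    target_field_name,
    batch_predict_instances_format,
    batch_predict_gcs_destination_output_uri,
    **optional_parameters,
):
    """Collect pipeline parameters into a dict (hypothetical helper)."""
    if prediction_type not in VALID_PREDICTION_TYPES:
        raise ValueError(
            f"prediction_type must be one of {VALID_PREDICTION_TYPES}, "
            f"got {prediction_type!r}"
        )
    parameters = {
        "project": project,
        "location": location,
        "prediction_type": prediction_type,
        "model_name": model_name,
        "target_field_name": target_field_name,
        "batch_predict_instances_format": batch_predict_instances_format,
        "batch_predict_gcs_destination_output_uri": (
            batch_predict_gcs_destination_output_uri
        ),
    }
    # Optional parameters keep the pipeline defaults unless overridden here.
    parameters.update(optional_parameters)
    return parameters


# All values below are placeholders.
example = build_tabular_eval_parameters(
    project="my-project",
    location="us-central1",
    prediction_type="classification",
    model_name="projects/my-project/locations/us-central1/models/1234567890",
    target_field_name="species",
    batch_predict_instances_format="jsonl",
    batch_predict_gcs_destination_output_uri="gs://my-bucket/eval-output",
    batch_predict_gcs_source_uris=["gs://my-bucket/test-data/*.jsonl"],
    dataflow_max_num_workers=5,
)
```

A dictionary of this shape is the kind of value usually supplied as pipeline parameters when a compiled pipeline is submitted; the submission step itself is omitted here.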

v1.model_evaluation.evaluation_automl_unstructure_data_pipeline(project: str, location: str, prediction_type: str, model_name: str, target_field_name: str, batch_predict_instances_format: str, batch_predict_gcs_destination_output_uri: str, batch_predict_gcs_source_uris: list[str] = [], batch_predict_bigquery_source_uri: str = '', batch_predict_predictions_format: str = 'jsonl', batch_predict_bigquery_destination_output_uri: str = '', batch_predict_machine_type: str = 'n1-standard-16', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, evaluation_prediction_label_column: str = '', evaluation_prediction_score_column: str = '', evaluation_class_labels: list[str] = [], dataflow_machine_type: str = 'n1-standard-4', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '')[source]¶

The evaluation pipeline with ground truth and no feature attribution.

This pipeline is used for all unstructured AutoML models, including Text, Video, Image and Custom models.

Parameters¶

project: str¶: The GCP project that runs the pipeline components.
location: str¶: The GCP region that runs the pipeline components.
prediction_type: str¶: The type of prediction the model is to produce: “classification” or “regression”.
model_name: str¶: The Vertex model resource name to be imported and used for batch prediction. Formatted like projects/{project}/locations/{location}/models/{model} or projects/{project}/locations/{location}/models/{model}@{model_version_id_or_model_version_alias}.
target_field_name: str¶: The target field’s name. Formatted to be able to find nested columns, delimited by a period (.). Prefixed with ‘instance.’ on the component for Vertex Batch Prediction.
batch_predict_instances_format: str¶: The format in which instances are given; must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_gcs_destination_output_uri: str¶: The Google Cloud Storage location of the directory where the output is to be written. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside it, files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created, where <extension> depends on the chosen predictions_format and N depends on the total number of successfully predicted instances (N may equal 0001). If the Model has both instance and prediction schemata defined, then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional files errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> are created (N depends on the total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field whose value is a google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_gcs_source_uris: list[str]¶: Google Cloud Storage URI(-s) to your instances data to run batch prediction on. The instances data should also contain the ground truth (target) data, used for evaluation. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_bigquery_source_uri: str¶: Google BigQuery URI to your instances to run batch prediction on. May contain wildcards. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_predictions_format: str¶: The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_bigquery_destination_output_uri: str¶: The BigQuery project location where the output is to be written. In the given project a new dataset is created with the name prediction_<model-display-name>_<job-create-time>, where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores) and the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables are created: predictions and errors. If the Model has both instance and prediction schemata defined, then the tables have columns as follows. The predictions table contains instances for which the prediction succeeded; its columns are a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction failed; it has the instance columns, as per the instance schema, followed by a single “errors” column whose values are google.rpc.Status represented as a STRUCT containing only code and message. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_machine_type: str¶: The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES, this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_starting_replica_count: int¶: The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides the starting number, not greater than batch_predict_max_replica_count. Only used if batch_predict_machine_type is set.
batch_predict_max_replica_count: int¶: The maximum number of machine replicas the batch operation may be scaled to. Only used if batch_predict_machine_type is set.
batch_predict_accelerator_type: str¶: The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_accelerator_count: int¶: The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.
evaluation_prediction_label_column: str¶: The column name of the field containing classes the model is scoring. Formatted to be able to find nested columns, delimited by a period (.).
evaluation_prediction_score_column: str¶: The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by a period (.).
evaluation_class_labels: list[str]¶: Required for classification prediction type. The list of class names for the target_field_name, in the same order they appear in a file in batch_predict_gcs_source_uris. For instance, if the target_field_name could be either 1 or 0, then the class_labels input will be [“1”, “0”].
dataflow_machine_type: str¶: The Dataflow machine type for evaluation components.
dataflow_max_num_workers: int¶: The maximum number of Dataflow workers for evaluation components.
dataflow_disk_size_gb: int¶: Dataflow worker’s disk size in GB for evaluation components.
dataflow_service_account: str¶: Custom service account to run Dataflow jobs.
dataflow_subnetwork: str¶: Dataflow’s fully qualified subnetwork name; when empty, the default subnetwork is used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool¶: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str¶: Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str¶: Forces the runner mode to use. Valid options are Dataflow and DirectRunner.
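
The two accepted shapes of model_name (with and without a version ID or alias after @) can be checked before a pipeline run. The following is a minimal sketch; `parse_model_name` is a hypothetical helper, and the character classes in the pattern are an assumption rather than the service's actual validation rules.

```python
import re

# Assumption: path segments contain no "/" and the version part contains no "@".
_MODEL_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/models/(?P<model>[^/@]+)"
    r"(?:@(?P<version>[^/@]+))?$"
)


def parse_model_name(model_name):
    """Split a Vertex model resource name into its parts.

    Returns a dict with keys project, location, model, and version;
    version is None when the name has no "@..." suffix.
    """
    match = _MODEL_NAME_PATTERN.match(model_name)
    if match is None:
        raise ValueError(f"unexpected model resource name: {model_name!r}")
    return match.groupdict()


# Placeholder resource names.
unversioned = parse_model_name(
    "projects/my-project/locations/us-central1/models/1234567890"
)
versioned = parse_model_name(
    "projects/my-project/locations/us-central1/models/1234567890@2"
)
```

Checking the name locally surfaces a malformed value before any batch prediction resources are requested.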

v1.model_evaluation.evaluation_feature_attribution_pipeline(project: str, location: str, prediction_type: str, model_name: str, target_field_name: str, batch_predict_instances_format: str, batch_predict_gcs_destination_output_uri: str, batch_predict_gcs_source_uris: list[str] = [], batch_predict_bigquery_source_uri: str = '', batch_predict_predictions_format: str = 'jsonl', batch_predict_bigquery_destination_output_uri: str = '', batch_predict_machine_type: str = 'n1-standard-16', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_explanation_metadata: dict[str, Any] = {}, batch_predict_explanation_parameters: dict[str, Any] = {}, batch_predict_explanation_data_sample_size: int = 10000, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, evaluation_prediction_label_column: str = '', evaluation_prediction_score_column: str = '', evaluation_class_labels: list[str] = [], dataflow_machine_type: str = 'n1-standard-4', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '')[source]¶

The evaluation custom tabular pipeline with feature attribution.

This pipeline gives support for custom models that contain a valid explanation_spec. This pipeline includes the target_field_data_remover component, which is needed for many tabular custom models.

Parameters¶

project: str¶: The GCP project that runs the pipeline components.
location: str¶: The GCP region that runs the pipeline components.
prediction_type: str¶: The type of prediction the model is to produce: “classification” or “regression”.
model_name: str¶: The Vertex model resource name to be imported and used for batch prediction.
target_field_name: str¶: The target field’s name. Formatted to be able to find nested columns, delimited by a period (.). Prefixed with ‘instance.’ on the component for Vertex Batch Prediction.
batch_predict_instances_format: str¶: The format in which instances are given; must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_gcs_destination_output_uri: str¶: The Google Cloud Storage location of the directory where the output is to be written. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside it, files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created, where <extension> depends on the chosen predictions_format and N depends on the total number of successfully predicted instances (N may equal 0001). If the Model has both instance and prediction schemata defined, then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then additional files errors_0001.<extension>, errors_0002.<extension>, …, errors_N.<extension> are created (N depends on the total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field whose value is a google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_gcs_source_uris: list[str]¶: Google Cloud Storage URI(-s) to your instances data to run batch prediction on. The instances data should also contain the ground truth (target) data, used for evaluation. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_bigquery_source_uri: str¶: Google BigQuery URI to your instances to run batch prediction on. May contain wildcards. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
batch_predict_predictions_format: str¶: The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_bigquery_destination_output_uri: str¶: The BigQuery project location where the output is to be written. In the given project a new dataset is created with the name prediction_<model-display-name>_<job-create-time>, where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores) and the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables are created: predictions and errors. If the Model has both instance and prediction schemata defined, then the tables have columns as follows. The predictions table contains instances for which the prediction succeeded; its columns are a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction failed; it has the instance columns, as per the instance schema, followed by a single “errors” column whose values are google.rpc.Status represented as a STRUCT containing only code and message. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
batch_predict_machine_type: str¶: The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES, this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_starting_replica_count: int¶: The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides the starting number, not greater than batch_predict_max_replica_count. Only used if batch_predict_machine_type is set.
batch_predict_max_replica_count: int¶: The maximum number of machine replicas the batch operation may be scaled to. Only used if batch_predict_machine_type is set.
batch_predict_explanation_metadata: dict[str, Any]¶: Explanation metadata configuration for this BatchPredictionJob. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_metadata. All fields of explanation_metadata are optional in the request. If a field of the explanation_metadata object is not populated, the corresponding field of the Model.explanation_metadata object is inherited. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
batch_predict_explanation_parameters: dict[str, Any]¶: Parameters that configure how the Model’s predictions are explained. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_parameters. All fields of explanation_parameters are optional in the request. If a field of the explanation_parameters object is not populated, the corresponding field of the Model.explanation_parameters object is inherited. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#ExplanationParameters.
batch_predict_explanation_data_sample_size: int¶: Desired size to which the input dataset is downsampled before it is used for batch explanation.
batch_predict_accelerator_type: str¶: The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec.
batch_predict_accelerator_count: int¶: The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.
evaluation_prediction_label_column: str¶: The column name of the field containing classes the model is scoring. Formatted to be able to find nested columns, delimited by a period (.).
evaluation_prediction_score_column: str¶: The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by a period (.).
evaluation_class_labels: list[str]¶: Required for classification prediction type. The list of class names for the target_field_name, in the same order they appear in a file in batch_predict_gcs_source_uris. For instance, if the target_field_name could be either 1 or 0, then the class_labels input will be [“1”, “0”].
dataflow_machine_type: str¶: The Dataflow machine type for evaluation components.
dataflow_max_num_workers: int¶: The maximum number of Dataflow workers for evaluation components.
dataflow_disk_size_gb: int¶: Dataflow worker’s disk size in GB for evaluation components.
dataflow_service_account: str¶: Custom service account to run Dataflow jobs.
dataflow_subnetwork: str¶: Dataflow’s fully qualified subnetwork name; when empty, the default subnetwork is used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
dataflow_use_public_ips: bool¶: Specifies whether Dataflow workers use public IP addresses.
encryption_spec_key_name: str¶: Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
force_runner_mode: str¶: Forces the runner mode to use. Valid options are Dataflow and DirectRunner.
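
Several parameters above (target_field_name, evaluation_prediction_label_column, evaluation_prediction_score_column) name nested columns with a period as the delimiter. The sketch below shows one way such a dotted path could be resolved against a single JSON-like record. `get_nested_field` is a hypothetical helper, and the record layout is an assumed example, not the guaranteed shape of batch prediction output.

```python
def get_nested_field(record, dotted_path):
    """Follow a period-delimited path through nested dicts.

    Hypothetical helper for illustration; raises KeyError with the
    full path if any segment is missing.
    """
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(dotted_path)
        current = current[key]
    return current


# Assumed example record: the instance (with its ground truth) plus a prediction.
row = {
    "instance": {"sepal_length": 5.1, "species": "setosa"},
    "prediction": {
        "classes": ["setosa", "virginica"],
        "scores": [0.9, 0.1],
    },
}

ground_truth = get_nested_field(row, "instance.species")
labels = get_nested_field(row, "prediction.classes")
scores = get_nested_field(row, "prediction.scores")
```

With this layout, a target field named species is reached through the instance. prefix, which matches the note that the prefix is added on the component for Vertex Batch Prediction.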

v1.model_evaluation.vision_model_error_analysis_pipeline(location: str, model_name: str, batch_predict_gcs_destination_output_uri: str, test_dataset_resource_name: str = '', test_dataset_annotation_set_name: str = '', training_dataset_resource_name: str = '', training_dataset_annotation_set_name: str = '', test_dataset_storage_source_uris: list[str] = [], training_dataset_storage_source_uris: list[str] = [], batch_predict_instances_format: str = 'jsonl', batch_predict_predictions_format: str = 'jsonl', batch_predict_machine_type: str = 'n1-standard-32', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, dataflow_machine_type: str = 'n1-standard-8', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')[source]¶

The evaluation vision error analysis pipeline.

This pipeline helps you continuously discover dataset example errors using nearest neighbor distances and outlier flags, and provides actionable steps to improve model performance. It uses GCP services including Dataflow and BatchPrediction.

Parameters¶

location: str¶: The GCP region that runs the pipeline components.
model_name: str¶: The Vertex model resource name to be imported and used for batch

prediction, in the format of projects/{project}/locations/{location}/models/{model} or projects/{project}/locations/{location}/models/{model}@{model_version_id or model_version_alias} :param batch_predict_gcs_destination_output_uri: The Google Cloud Storage location of the directory where the output is to be written to. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created where <extension> depends on chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then an additional errors_0001.<extension>, errors_0002.<extension>,…, errors_N.<extension> files are created (N depends on total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field which as value has google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig. :param test_dataset_resource_name: A Vertex dataset resource name of the test dataset. If test_dataset_storage_source_uris is also provided, this argument will override the GCS source. :param test_dataset_annotation_set_name: A string of the annotation_set resource name containing the ground truth of the test datset used for evaluation. :param training_dataset_resource_name: A Vertex dataset resource name of the training dataset. If training_dataset_storage_source_uris is also provided, this argument will override the GCS source. 
:param training_dataset_annotation_set_name: A string of the annotation_set resource name containing the ground truth of the training dataset used for feature extraction.
:param test_dataset_storage_source_uris: Google Cloud Storage URI(-s) to unmanaged test datasets. ``jsonl`` is currently the only allowed format. If test_dataset_resource_name is also provided, this field will be overridden by the provided Vertex Dataset.
:param training_dataset_storage_source_uris: Google Cloud Storage URI(-s) to unmanaged training datasets. ``jsonl`` is currently the only allowed format. If training_dataset_resource_name is also provided, this field will be overridden by the provided Vertex Dataset.
:param batch_predict_instances_format: The format in which instances are given. Must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.
:param batch_predict_predictions_format: The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.
:param batch_predict_machine_type: The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES, this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
:param batch_predict_starting_replica_count: The number of machine replicas used at the start of the batch operation.
If not set, Vertex AI decides the starting number, which is not greater than max_replica_count. Only used if machine_type is set.
:param batch_predict_max_replica_count: The maximum number of machine replicas the batch operation may be scaled to. Only used if machine_type is set.
:param batch_predict_accelerator_type: The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec
:param batch_predict_accelerator_count: The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.
:param dataflow_machine_type: The Dataflow machine type for evaluation components.
:param dataflow_max_num_workers: The maximum number of Dataflow workers for evaluation components.
:param dataflow_disk_size_gb: The disk size (in GB) of the machine executing the evaluation run.
:param dataflow_service_account: Custom service account to run Dataflow jobs.
:param dataflow_subnetwork: Dataflow’s fully qualified subnetwork name. When empty, the default subnetwork is used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
:param dataflow_use_public_ips: Specifies whether Dataflow workers use public IP addresses.
:param encryption_spec_key_name: Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
:param force_runner_mode: Forces the runner mode to use. Valid options are Dataflow and DirectRunner.
:param project: The GCP project that runs the pipeline components. Defaults to the project in which the PipelineJob is run.
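
Several of the parameters above expect strings in a fixed resource-name format, and a malformed value is typically only rejected once the pipeline is running. The following is a minimal sketch of a local pre-flight check, not part of the library: the helper names (`is_model_resource_name`, `is_kms_key_name`, `replica_counts_are_consistent`) are hypothetical, and the assumption that each path segment may contain any character except `/` and `@` is a simplification of this sketch rather than a documented rule.

```python
import re

# One path segment. Assumption of this sketch: any run of characters
# other than "/" and "@" is accepted; the service may be stricter.
_SEGMENT = r"[^/@]+"

# projects/{project}/locations/{location}/models/{model}[@{version id or alias}]
_MODEL_NAME_RE = re.compile(
    rf"projects/{_SEGMENT}/locations/{_SEGMENT}/models/{_SEGMENT}(@{_SEGMENT})?"
)

# projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key
_KMS_KEY_RE = re.compile(
    rf"projects/{_SEGMENT}/locations/{_SEGMENT}"
    rf"/keyRings/{_SEGMENT}/cryptoKeys/{_SEGMENT}"
)


def is_model_resource_name(value: str) -> bool:
    """Return True if `value` has the documented model resource-name shape."""
    return _MODEL_NAME_RE.fullmatch(value) is not None


def is_kms_key_name(value: str) -> bool:
    """Return True if `value` has the documented encryption key shape."""
    return _KMS_KEY_RE.fullmatch(value) is not None


def replica_counts_are_consistent(starting: int, maximum: int) -> bool:
    """Return True if the starting replica count does not exceed the maximum."""
    return 0 < starting <= maximum
```

A caller might run these checks on the values it intends to pass before submitting the pipeline job, and fail fast with a clear message if any check returns False.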