google_cloud_pipeline_components.experimental.evaluation module

Google Cloud Pipeline Model Evaluation components.

google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp(project: str, root_dir: str, location: str = 'us-central1', gcs_source_uris: list = '[]', bigquery_source_uri: str = '', instances_format: str = 'jsonl', sample_size: int = 10000, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = 'true', encryption_spec_key_name: str = '')

evaluation_data_sampler: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset.

Args:
project (str):

Required. Project to retrieve dataset from.

location (Optional[str]):

Location to retrieve dataset from. If not set, defaulted to us-central1.

root_dir (str):

Required. The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

gcs_source_uris (Sequence[str]):

Google Cloud Storage URI(-s) to your instances to run data sampler on. They must match instances_format. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames.

bigquery_source_uri (Optional[str]):

Google BigQuery Table URI to your instances to run data sampler on.

instances_format (Optional[str]):

The format in which instances are given, must be one of the model’s supported input storage formats. If not set, default to “jsonl”.

sample_size (Optional[int]):

Sample size of the randomly sampled dataset. If not set, defaulted to 10000.

dataflow_service_account (Optional[str]):

Optional. Service account to run the dataflow job. If not set, dataflow will use the default worker service account.

For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key for the Dataflow job. If this is set, then all resources created by the Dataflow job will be encrypted with the provided encryption key.

Returns:
gcs_output_directory (JsonArray):

JsonArray of the downsampled dataset GCS output.

bigquery_output_table (str):

String of the downsampled dataset BigQuery output.

gcp_resources (str):

Serialized gcp_resources proto tracking the data sampler.
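
The call below is a minimal usage sketch, not taken from the component documentation: it assumes a KFP v2-style pipeline definition (kfp.dsl here; kfp.v2.dsl in older SDKs), and the project ID, bucket paths, and source URIs are placeholders to substitute with your own values.

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        EvaluationDataSamplerOp,
    )


    @dsl.pipeline(name='data-sampler-sketch')
    def data_sampler_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Downsample the prediction instances before running Vertex XAI.
        sampler_task = EvaluationDataSamplerOp(
            project=project,
            location='us-central1',
            root_dir=root_dir,
            gcs_source_uris=['gs://my-bucket/instances/*.jsonl'],  # placeholder
            instances_format='jsonl',
            sample_size=10000,
        )
        # Downstream steps can consume sampler_task.outputs['gcs_output_directory'].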

google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp(model_resource_name: str)

get_vertex_model: TO BE REMOVED. Gets a Vertex Model Artifact.

google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp(project: str, root_dir: str, target_field_name: str, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: Artifact = None, predictions_bigquery_source: google.BQTable = None, ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list = '{}', ground_truth_bigquery_source: str = '', classification_type: str = '', class_labels: list = '{}', model: google.VertexModel = None, prediction_score_column: str = '', prediction_label_column: str = '', prediction_id_column: str = '', example_weight_column: str = '', positive_classes: list = '{}', dataflow_service_account: str = '', dataflow_disk_size: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = 'true', encryption_spec_key_name: str = '')

model_evaluation_classification: Computes a google.ClassificationMetrics Artifact, containing evaluation metrics given a model’s prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data.

Args:
project (str):

Project to run evaluation container.

location (Optional[str]):

Location for running the evaluation. If not set, defaulted to us-central1.

root_dir (str):

The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

predictions_format (Optional[str]):

The file format for the batch prediction results. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

predictions_gcs_source (Optional[system.Artifact]):

An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-”.

predictions_bigquery_source (Optional[google.BQTable]):

BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.

ground_truth_format (Optional[str]):

Required for custom tabular and non-tabular data. The file format for the ground truth files. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

ground_truth_gcs_source (Optional[Sequence[str]]):

Required for custom tabular and non-tabular data. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

ground_truth_bigquery_source (Optional[str]):

Required for custom tabular data. The BigQuery table URI representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

classification_type (Optional[str]):

The type of classification problem, either multiclass or multilabel. If not set, defaulted to multiclass.

class_labels (Optional[Sequence[str]]):

The list of class names for the target_field_name, in the same order they appear in the batch prediction job’s predictions output file. For instance, if the values of target_field_name could be either 1 or 0, and the predictions output contains [“1”, “0”] for the prediction_label_column, then the class_labels input will be [“1”, “0”]. If not set, defaulted to the classes found in the prediction_label_column in the batch prediction job’s predictions file.

target_field_name (str):

The full name path of the features target field in the predictions file. Formatted to be able to find nested columns, delimited by a period (.). Alternatively referred to as the ground truth (or ground_truth_column) field.

model (Optional[google.VertexModel]):

The Model used for predictions job. Must share the same ancestor Location.

prediction_score_column (Optional[str]):

Optional. The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by a period (.). If not set, defaulted to prediction.scores for classification.

prediction_label_column (Optional[str]):

Optional. The column name of the field containing classes the model is scoring. Formatted to be able to find nested columns, delimited by a period (.). If not set, defaulted to prediction.classes for classification.

prediction_id_column (Optional[str]):

Optional. The column name of the field containing ids for classes the model is scoring. Formatted to be able to find nested columns, delimited by a period (.).

example_weight_column (Optional[str]):

Optional. The column name of the field containing example weights. Formatted to be able to find nested columns, delimited by a period (.).

positive_classes (Optional[Sequence[str]]):

Optional. The list of class names to create binary classification metrics based on one-vs-rest for each value of positive_classes provided.

dataflow_service_account (Optional[str]):

Optional. Service account to run the dataflow job. If not set, dataflow will use the default worker service account.

For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_disk_size (Optional[int]):

Optional. The disk size (in GB) of the machine executing the evaluation run. If not set, defaulted to 50.

dataflow_machine_type (Optional[str]):

Optional. The machine type executing the evaluation run. If not set, defaulted to n1-standard-4.

dataflow_workers_num (Optional[int]):

Optional. The number of workers executing the evaluation run. If not set, defaulted to 1.

dataflow_max_workers_num (Optional[int]):

Optional. The max number of workers executing the evaluation run. If not set, defaulted to 5.

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key.

Returns:
evaluation_metrics (google.ClassificationMetrics):

google.ClassificationMetrics artifact representing the classification evaluation metrics in GCS.
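
The snippet below is a minimal, assumed usage sketch rather than part of the component documentation: it uses the KFP v2 DSL, brings in pre-existing batch prediction results with dsl.importer, and every project ID, path, and column name is a placeholder.

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        ModelEvaluationClassificationOp,
    )


    @dsl.pipeline(name='classification-eval-sketch')
    def classification_eval_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Bring existing batch prediction output files in as a generic artifact;
        # in a full pipeline this would come from a batch prediction step.
        predictions = dsl.importer(
            artifact_uri='gs://my-bucket/batch_prediction_output',  # placeholder
            artifact_class=dsl.Artifact,
        )
        eval_task = ModelEvaluationClassificationOp(
            project=project,
            location='us-central1',
            root_dir=root_dir,
            target_field_name='species',           # placeholder ground-truth field
            classification_type='multiclass',
            predictions_format='jsonl',
            predictions_gcs_source=predictions.output,
            ground_truth_gcs_source=['gs://my-bucket/ground_truth/*.jsonl'],  # placeholder
        )
        # eval_task.outputs['evaluation_metrics'] is the google.ClassificationMetrics
        # artifact described above.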

google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp(project: str, root_dir: str, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: Artifact = None, predictions_bigquery_source: google.BQTable = None, dataflow_service_account: str = '', dataflow_disk_size: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = 'true', encryption_spec_key_name: str = '')

feature_attribution: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions.

Args:
project (str):

Project to run evaluation container.

location (Optional[str]):

Location for running the evaluation. If not set, defaulted to us-central1.

root_dir (str):

The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

predictions_format (Optional[str]):

The file format for the batch prediction results. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

predictions_gcs_source (Optional[system.Artifact]):

An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-”.

predictions_bigquery_source (Optional[google.BQTable]):

BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.

dataflow_service_account (Optional[str]):

Optional. Service account to run the dataflow job. If not set, dataflow will use the default worker service account.

For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_disk_size (Optional[int]):

Optional. The disk size (in GB) of the machine executing the evaluation run. If not set, defaulted to 50.

dataflow_machine_type (Optional[str]):

Optional. The machine type executing the evaluation run. If not set, defaulted to n1-standard-4.

dataflow_workers_num (Optional[int]):

Optional. The number of workers executing the evaluation run. If not set, defaulted to 1.

dataflow_max_workers_num (Optional[int]):

Optional. The max number of workers executing the evaluation run. If not set, defaulted to 5.

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key.

Returns:
evaluation_metrics (system.Metrics):

System metrics artifact representing the evaluation metrics in GCS. WIP to update to a google.VertexMetrics type with additional functionality.
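
A minimal sketch (assumed, not from the component documentation) of how batch explanation output might be wired into this component; the URIs are placeholders and dsl.importer stands in for an upstream batch prediction step run with explanations enabled.

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        ModelEvaluationFeatureAttributionOp,
    )


    @dsl.pipeline(name='feature-attribution-sketch')
    def feature_attribution_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Import an existing batch explanation output directory as a generic
        # artifact ("explanation.results-*" files).
        explanations = dsl.importer(
            artifact_uri='gs://my-bucket/batch_explanation_output',  # placeholder
            artifact_class=dsl.Artifact,
        )
        attribution_task = ModelEvaluationFeatureAttributionOp(
            project=project,
            location='us-central1',
            root_dir=root_dir,
            predictions_format='jsonl',
            predictions_gcs_source=explanations.output,
        )
        # attribution_task.outputs['evaluation_metrics'] is the system.Metrics
        # artifact described above.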

google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationForecastingOp(project: str, root_dir: str, target_field_name: str, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: Artifact = None, predictions_bigquery_source: google.BQTable = None, ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list = '{}', ground_truth_bigquery_source: str = '', model: google.VertexModel = None, prediction_score_column: str = '', forecasting_type: str = 'point', forecasting_quantiles: list = '[0.5]', example_weight_column: str = '', dataflow_service_account: str = '', dataflow_disk_size: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = 'true', encryption_spec_key_name: str = '')

model_evaluation_forecasting: Computes a google.ForecastingMetrics Artifact, containing evaluation metrics given a model’s prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports point forecasting and quantile forecasting for tabular data.

Args:
project (str):

Project to run evaluation container.

location (Optional[str]):

Location for running the evaluation. If not set, defaulted to us-central1.

root_dir (str):

The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

predictions_format (Optional[str]):

The file format for the batch prediction results. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

predictions_gcs_source (Optional[system.Artifact]):

An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-”.

predictions_bigquery_source (Optional[google.BQTable]):

BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.

ground_truth_format (Optional[str]):

Required for custom tabular and non-tabular data. The file format for the ground truth files. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

ground_truth_gcs_source (Optional[Sequence[str]]):

Required for custom tabular and non-tabular data. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

ground_truth_bigquery_source (Optional[str]):

Required for custom tabular data. The BigQuery table URI representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

target_field_name (str):

The full name path of the features target field in the predictions file. Formatted to be able to find nested columns, delimited by a period (.). Alternatively referred to as the ground truth (or ground_truth_column) field.

model (Optional[google.VertexModel]):

The Model used for predictions job. Must share the same ancestor Location.

prediction_score_column (Optional[str]):

Optional. The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by a period (.). If not set, defaulted to prediction.value for a point forecasting_type and prediction.quantile_predictions for a quantile forecasting_type.

forecasting_type (Optional[str]):

Optional. The forecasting type being addressed by this evaluation run; point and quantile are the supported types. If not set, defaulted to point.

forecasting_quantiles (Optional[Sequence[Float]]):

Required for a quantile forecasting_type. The list of quantiles, in the same order in which they appear in the quantile prediction score column. If one of the quantiles is set to 0.5f, point evaluation will be set on that index.

example_weight_column (Optional[str]):

Optional. The column name of the field containing example weights.

dataflow_service_account (Optional[str]):

Optional. Service account to run the dataflow job. If not set, dataflow will use the default worker service account. For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_disk_size (Optional[int]):

Optional. The disk size (in GB) of the machine executing the evaluation run. If not set, defaulted to 50.

dataflow_machine_type (Optional[str]):

Optional. The machine type executing the evaluation run. If not set, defaulted to n1-standard-4.

dataflow_workers_num (Optional[int]):

Optional. The number of workers executing the evaluation run. If not set, defaulted to 1.

dataflow_max_workers_num (Optional[int]):

Optional. The max number of workers executing the evaluation run. If not set, defaulted to 5.

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key.

Returns:
evaluation_metrics (google.ForecastingMetrics):

google.ForecastingMetrics artifact representing the forecasting evaluation metrics in GCS.
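
A minimal sketch of a quantile forecasting evaluation, under the same assumptions as the earlier examples (KFP v2 DSL; placeholder project, paths, target field, and quantiles):

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        ModelEvaluationForecastingOp,
    )


    @dsl.pipeline(name='forecasting-eval-sketch')
    def forecasting_eval_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Import existing batch prediction results; in a full pipeline this
        # comes from a batch prediction step.
        predictions = dsl.importer(
            artifact_uri='gs://my-bucket/batch_prediction_output',  # placeholder
            artifact_class=dsl.Artifact,
        )
        forecasting_eval_task = ModelEvaluationForecastingOp(
            project=project,
            location='us-central1',
            root_dir=root_dir,
            target_field_name='sales',             # placeholder ground-truth field
            forecasting_type='quantile',
            forecasting_quantiles=[0.1, 0.5, 0.9],  # placeholder quantiles
            predictions_format='jsonl',
            predictions_gcs_source=predictions.output,
            ground_truth_gcs_source=['gs://my-bucket/ground_truth/*.jsonl'],  # placeholder
        )
        # forecasting_eval_task.outputs['evaluation_metrics'] is the
        # google.ForecastingMetrics artifact described above.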

google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationOp(project: str, root_dir: str, problem_type: str, batch_prediction_job: google.VertexBatchPredictionJob, ground_truth_column: str, location: str = 'us-central1', predictions_format: str = 'jsonl', ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list = '{}', key_columns: list = '{}', classification_type: str = '', class_names: list = '{}', prediction_score_column: str = '', prediction_label_column: str = '', prediction_id_column: str = '', example_weight_column: str = '', positive_classes: list = '{}', generate_feature_attribution: bool = False, dataflow_service_account: str = '', dataflow_disk_size: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = 'true', encryption_spec_key_name: str = '')

model_evaluation: TO BE REMOVED. Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics.

Args:
project (str):

Project to run evaluation container.

location (Optional[str]):

Location for running the evaluation. If not set, defaulted to us-central1.

root_dir (str):

The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

problem_type (str):

The problem type being addressed by this evaluation run. classification and regression are the currently supported problem types.

predictions_format (Optional[str]):

The file format for the batch prediction results. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

batch_prediction_job (google.VertexBatchPredictionJob):

The VertexBatchPredictionJob with prediction or explanation results for this evaluation run. For prediction results, the files should be in format “prediction.results-”. For explanation results, the files should be in format “explanation.results-”.

ground_truth_format (Optional[str]):

For unstructured data classification. The file format for the ground truth files. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

ground_truth_gcs_source (Optional[Sequence[str]]):

For unstructured data classification. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

key_columns (Optional[Sequence[str]]):

For unstructured data classification. The list of fields in the ground truth GCS source to format the joining key. Used to merge prediction instances with ground truth data.

classification_type (Optional[str]):

Required only for a classification problem_type. The type of classification problem. Defined as multiclass or multilabel. If not set, defaulted to multiclass internally.

class_names (Optional[Sequence[str]]):

The list of class names for the ground_truth_column, in the same order they appear in the batch prediction job’s predictions output file. For instance, if the ground_truth_column could be either 1 or 0, and the batch prediction job’s predictions output contains [“1”, “0”] for the prediction_label_column, then the class_names input will be [“1”, “0”]. If not set, defaulted to the classes found in the prediction_label_column in the batch prediction job’s predictions file.

ground_truth_column (str):

The column name of the feature containing ground truth. Formatted to be able to find nested columns, delimited by a period (.).

prediction_score_column (Optional[str]):

The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by a period (.). If not set, defaulted to prediction.scores for a classification problem_type and prediction.value for a regression problem_type.

prediction_label_column (Optional[str]):

Optional. The column name of the field containing classes the model is scoring. Formatted to be able to find nested columns, delimited by a period (.). If not set, defaulted to prediction.classes for classification.

prediction_id_column (Optional[str]):

Optional. The column name of the field containing ids for classes the model is scoring. Formatted to be able to find nested columns, delimited by a period (.).

example_weight_column (Optional[str]):

Optional. The column name of the field containing example weights. Formatted to be able to find nested columns, delimited by a period (.).

positive_classes (Optional[Sequence[str]]):

Optional for a classification problem_type. The list of class names to create binary classification metrics based on one-vs-rest for each value of positive_classes provided.

generate_feature_attribution (Optional[bool]):

Optional. If set to True, then the explanations generated by the VertexBatchPredictionJob will be used to generate feature attributions. This will only pass if the input VertexBatchPredictionJob generated explanations. If not set, defaulted to False.

dataflow_service_account (Optional[str]):

Optional. Service account to run the dataflow job. If not set, dataflow will use the default worker service account.

For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_disk_size (Optional[int]):

Optional. The disk size (in GB) of the machine executing the evaluation run. If not set, defaulted to 50.

dataflow_machine_type (Optional[str]):

Optional. The machine type executing the evaluation run. If not set, defaulted to n1-standard-4.

dataflow_workers_num (Optional[int]):

Optional. The number of workers executing the evaluation run. If not set, defaulted to 1.

dataflow_max_workers_num (Optional[int]):

Optional. The max number of workers executing the evaluation run. If not set, defaulted to 5.

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key.

Returns:
evaluation_metrics (system.Metrics):

System metrics artifact representing the evaluation metrics in GCS. WIP to update to a google.VertexMetrics type with additional functionality.

google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp(project: str, root_dir: str, target_field_name: str, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: Artifact = None, predictions_bigquery_source: google.BQTable = None, ground_truth_format: str = 'jsonl', ground_truth_gcs_source: list = '{}', ground_truth_bigquery_source: str = '', model: google.VertexModel = None, prediction_score_column: str = '', example_weight_column: str = '', dataflow_service_account: str = '', dataflow_disk_size: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = 'true', encryption_spec_key_name: str = '')

model_evaluation_regression: Computes a google.RegressionMetrics Artifact, containing evaluation metrics given a model’s prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.

Args:
project (str):

Project to run evaluation container.

location (Optional[str]):

Location for running the evaluation. If not set, defaulted to us-central1.

root_dir (str):

The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

predictions_format (Optional[str]):

The file format for the batch prediction results. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

predictions_gcs_source (Optional[system.Artifact]):

An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-”.

predictions_bigquery_source (Optional[google.BQTable]):

BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.

ground_truth_format (Optional[str]):

Required for custom tabular and non-tabular data. The file format for the ground truth files. jsonl is currently the only allowed format. If not set, defaulted to jsonl.

ground_truth_gcs_source (Optional[Sequence[str]]):

Required for custom tabular and non-tabular data. The GCS URIs representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

ground_truth_bigquery_source (Optional[str]):

Required for custom tabular data. The BigQuery table URI representing where the ground truth is located. Used to provide ground truth for each prediction instance when they are not part of the batch prediction job’s prediction instances.

target_field_name (str):

The full name path of the features target field in the predictions file. Formatted to be able to find nested columns, delimited by a period (.). Alternatively referred to as the ground truth (or ground_truth_column) field.

model (Optional[google.VertexModel]):

The Model used for predictions job. Must share the same ancestor Location.

prediction_score_column (Optional[str]):

Optional. The column name of the field containing batch prediction scores. Formatted to be able to find nested columns, delimited by a period (.). If not set, defaulted to prediction.value for regression.

example_weight_column (Optional[str]):

Optional. The column name of the field containing example weights. Formatted to be able to find nested columns, delimited by a period (.).

dataflow_disk_size (Optional[int]):

Optional. The disk size (in GB) of the machine executing the evaluation run. If not set, defaulted to 50.

dataflow_machine_type (Optional[str]):

Optional. The machine type executing the evaluation run. If not set, defaulted to n1-standard-4.

dataflow_workers_num (Optional[int]):

Optional. The number of workers executing the evaluation run. If not set, defaulted to 1.

dataflow_max_workers_num (Optional[int]):

Optional. The max number of workers executing the evaluation run. If not set, defaulted to 5.

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key.

Returns:
evaluation_metrics (google.RegressionMetrics):

google.RegressionMetrics artifact representing the regression evaluation metrics in GCS.
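
A minimal sketch of a regression evaluation on tabular batch prediction results, under the same assumptions as the earlier examples (KFP v2 DSL; placeholder project, paths, and target field):

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        ModelEvaluationRegressionOp,
    )


    @dsl.pipeline(name='regression-eval-sketch')
    def regression_eval_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Import existing batch prediction results; in a full pipeline this
        # comes from a batch prediction step.
        predictions = dsl.importer(
            artifact_uri='gs://my-bucket/batch_prediction_output',  # placeholder
            artifact_class=dsl.Artifact,
        )
        regression_eval_task = ModelEvaluationRegressionOp(
            project=project,
            location='us-central1',
            root_dir=root_dir,
            target_field_name='price',              # placeholder ground-truth field
            predictions_format='jsonl',
            predictions_gcs_source=predictions.output,
            ground_truth_gcs_source=['gs://my-bucket/ground_truth/*.jsonl'],  # placeholder
        )
        # regression_eval_task.outputs['evaluation_metrics'] is the
        # google.RegressionMetrics artifact described above.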

google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp(model: google.VertexModel, metrics: Metrics = None, problem_type: str = None, classification_metrics: google.ClassificationMetrics = None, forecasting_metrics: google.ForecastingMetrics = None, regression_metrics: google.RegressionMetrics = None, explanation: Metrics = None, feature_attributions: Metrics = None, display_name: str = '', dataset_path: str = '', dataset_paths: list = '[]', dataset_type: str = '')

model_evaluation_import: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.models.evaluations. One of the four metrics inputs must be provided: metrics and problem_type, classification_metrics, regression_metrics, or forecasting_metrics.

Args:
model (google.VertexModel):

Vertex model resource that will be the parent resource of the uploaded evaluation.

metrics (system.Metrics):

Path of metrics generated from an evaluation component.

problem_type (Optional[str]):

The problem type of the metrics being imported to the VertexModel. classification, regression, and forecasting are the currently supported problem types. Must be provided when metrics is provided.

classification_metrics (Optional[google.ClassificationMetrics]):

Path of classification metrics generated from the classification evaluation component.

forecasting_metrics (Optional[google.ForecastingMetrics]):

Path of forecasting metrics generated from the forecasting evaluation component.

regression_metrics (Optional[google.RegressionMetrics]):

Path of regression metrics generated from the regression evaluation component.

explanation (Optional[system.Metrics]):

Path for model explanation metrics generated from an evaluation component.

feature_attributions (Optional[system.Metrics]):

The feature attributions metrics artifact generated from the feature attribution component.

display_name (str):

The display name for the uploaded model evaluation resource.
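
The sketch below chains a classification evaluation into the import step. It is an assumed example, not part of the component documentation: the google_cloud_pipeline_components.types.artifact_types import path and the importer metadata key are assumptions about the surrounding SDK, and the model resource name, project, and paths are placeholders.

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        ModelEvaluationClassificationOp,
        ModelImportEvaluationOp,
    )
    from google_cloud_pipeline_components.types import artifact_types  # assumed import path

    MODEL_NAME = 'projects/123/locations/us-central1/models/456'  # placeholder


    @dsl.pipeline(name='import-evaluation-sketch')
    def import_evaluation_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Import the target Vertex model as a google.VertexModel artifact.
        vertex_model = dsl.importer(
            artifact_uri=f'https://us-central1-aiplatform.googleapis.com/v1/{MODEL_NAME}',
            artifact_class=artifact_types.VertexModel,
            metadata={'resourceName': MODEL_NAME},  # assumed metadata key
        )
        # Import existing batch prediction results.
        predictions = dsl.importer(
            artifact_uri='gs://my-bucket/batch_prediction_output',  # placeholder
            artifact_class=dsl.Artifact,
        )
        eval_task = ModelEvaluationClassificationOp(
            project=project,
            root_dir=root_dir,
            target_field_name='species',            # placeholder
            predictions_gcs_source=predictions.output,
            ground_truth_gcs_source=['gs://my-bucket/ground_truth/*.jsonl'],  # placeholder
        )
        # Attach the metrics to the model as a model evaluation resource.
        ModelImportEvaluationOp(
            model=vertex_model.output,
            classification_metrics=eval_task.outputs['evaluation_metrics'],
            display_name='classification-eval',      # placeholder
        )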

google_cloud_pipeline_components.experimental.evaluation.TargetFieldDataRemoverOp()

target_field_data_remover: Removes the target field from the input dataset for supporting unstructured AutoML models and custom models for Vertex Batch Prediction. Creates a Dataflow job with Apache Beam to remove the target field.

Args:
project (str):

Required. Project to retrieve dataset from.

location (Optional[str]):

Location to retrieve dataset from. If not set, defaulted to us-central1.

root_dir (str):

Required. The GCS directory for keeping staging files. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.

gcs_source_uris (Sequence[str]):

Google Cloud Storage URI(-s) to your instances to run the target field data remover on. They must match instances_format. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames.

bigquery_source_uri (Optional[str]):

Google BigQuery Table URI to your instances to run the target field data remover on.

instances_format (Optional[str]):

The format in which instances are given, must be one of the model’s supported input storage formats. If not set, default to “jsonl”.

target_field_name (str):

The name of the features target field in the predictions file. Formatted to be able to find nested columns, delimited by a period (.). Alternatively referred to as the ground_truth_column field. If not set, defaulted to ground_truth.

dataflow_service_account (Optional[str]):

Optional. Service account to run the dataflow job. If not set, dataflow will use the default worker service account.

For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_subnetwork (Optional[str]):

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips (Optional[bool]):

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name (Optional[str]):

Customer-managed encryption key for the Dataflow job. If this is set, then all resources created by the Dataflow job will be encrypted with the provided encryption key.

Returns:
gcs_output_directory (JsonArray):

Output storage location with the dataset that has the target field removed.

bigquery_output_table (str):

The BigQuery Table with the dataset that has the target field removed.

gcp_resources (str):

Serialized gcp_resources proto tracking the target_field_data_remover.
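
A minimal usage sketch (assumed, not from the component documentation). The rendered signature above shows no parameters, so this sketch passes the arguments documented in the Args list by keyword; all values are placeholders.

    from kfp import dsl  # assumption: KFP v2 DSL; older SDKs use `from kfp.v2 import dsl`
    from google_cloud_pipeline_components.experimental.evaluation import (
        TargetFieldDataRemoverOp,
    )


    @dsl.pipeline(name='target-field-remover-sketch')
    def target_field_remover_pipeline(
        project: str = 'my-project',               # placeholder
        root_dir: str = 'gs://my-bucket/staging',  # placeholder
    ):
        # Strip the ground-truth column so the files can be sent to Vertex
        # Batch Prediction; arguments follow the Args list documented above.
        remover_task = TargetFieldDataRemoverOp(
            project=project,
            location='us-central1',
            root_dir=root_dir,
            gcs_source_uris=['gs://my-bucket/dataset/*.jsonl'],  # placeholder
            instances_format='jsonl',
            target_field_name='species',            # placeholder
        )
        # remover_task.outputs['gcs_output_directory'] feeds the batch prediction
        # step, while the original files (still containing the target field) are
        # later passed as ground_truth_gcs_source to an evaluation component.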