Model Evaluation

Model evaluation preview components.

Components:

DetectDataBiasOp(gcp_resources, ...[, ...])

Detects data bias metrics in a dataset.

DetectModelBiasOp(gcp_resources, ...[, ...])

Detects bias metrics from a model's predictions.

ModelEvaluationFeatureAttributionOp(...[, ...])

Compute feature attribution on a trained model's batch explanation results.

Pipelines:

FeatureAttributionGraphComponentOp(location, ...)

A pipeline to compute feature attributions by sampling data for batch explanations.

autosxs_pipeline(evaluation_dataset, task, ...)

Evaluates two models side-by-side using an arbiter model.

evaluation_llm_classification_pipeline(...)

The LLM Text Classification Evaluation pipeline.

evaluation_llm_text_generation_pipeline(...)

LLM Text Generation Evaluation pipeline.

preview.model_evaluation.DetectDataBiasOp(gcp_resources: dsl.OutputPath(str), data_bias_metrics: dsl.Output[system.Artifact], target_field_name: str, bias_configs: list[Any], location: str = 'us-central1', dataset_format: str = 'jsonl', dataset_storage_source_uris: list[str] = [], dataset: dsl.Input[google.VertexDataset] = None, columns: list[str] = [], encryption_spec_key_name: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Detects data bias metrics in a dataset.

Creates a Dataflow job with Apache Beam that assigns each data point in the dataset to the corresponding bucket based on the bias configs, then computes data bias metrics for the dataset.

Parameters
location: str = 'us-central1'

Location for running data bias detection.

target_field_name: str

The full name path of the target field in the predictions file. Formatted to be able to find nested columns, delimited by `.`. Alternatively referred to as the ground truth (or ground_truth_column) field.

bias_configs: list[Any]

A list of google.cloud.aiplatform_v1beta1.types.ModelEvaluation.BiasConfig. When provided, compute data bias metrics for each defined slice. Below is an example of how to format this input.

1: First, create a BiasConfig. `from google.cloud.aiplatform_v1beta1.types.ModelEvaluation import BiasConfig` `from google.cloud.aiplatform_v1.types.ModelEvaluationSlice.Slice import SliceSpec` `from google.cloud.aiplatform_v1.types.ModelEvaluationSlice.Slice.SliceSpec import SliceConfig` `bias_config = BiasConfig(bias_slices=SliceSpec(configs={'feature_a': SliceConfig(SliceSpec.Value(string_value='label_a'))}))`
2: Create a list to store the bias configs in. `bias_configs = []`
3: Format each BiasConfig into a JSON or dict. `bias_config_json = json_format.MessageToJson(bias_config)` or `bias_config_dict = json_format.MessageToDict(bias_config)`
4: Append each bias_config JSON to the list. `bias_configs.append(bias_config_json)`
5: Finally, pass bias_configs as a parameter to this component, as shown in the consolidated sketch below. `DetectDataBiasOp(bias_configs=bias_configs)`
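
Putting these steps together, a minimal sketch might look like the following. This is illustrative only: the exact proto import paths, the `value=` keyword, and the serialization call can differ across versions of the google-cloud-aiplatform library.

```python
# Illustrative sketch of steps 1-5; proto import paths and serialization
# details may differ across google-cloud-aiplatform versions.
from google.cloud import aiplatform_v1, aiplatform_v1beta1
from google.protobuf import json_format

BiasConfig = aiplatform_v1beta1.types.ModelEvaluation.BiasConfig
SliceSpec = aiplatform_v1.types.ModelEvaluationSlice.Slice.SliceSpec

# 1. Slice the dataset on feature_a == 'label_a'.
bias_config = BiasConfig(
    bias_slices=SliceSpec(
        configs={
            'feature_a': SliceSpec.SliceConfig(
                value=SliceSpec.Value(string_value='label_a')
            )
        }
    )
)

# 2-4. Serialize each BiasConfig to JSON and collect the results.
bias_configs = []
bias_configs.append(json_format.MessageToJson(bias_config._pb))  # ._pb unwraps the proto-plus message

# 5. Pass the list to the component inside a pipeline definition, e.g.
# DetectDataBiasOp(..., bias_configs=bias_configs)
```
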
Parameters
dataset_format: str = 'jsonl'

The file format for the dataset. jsonl and csv are the currently allowed formats.

dataset_storage_source_uris: list[str] = []

Google Cloud Storage URI(-s) to unmanaged test datasets. jsonl and csv are the currently allowed formats. If dataset is also provided, this field will be overridden by the provided Vertex Dataset.

dataset: dsl.Input[google.VertexDataset] = None

A google.VertexDataset artifact of the dataset. If dataset_gcs_source is also provided, this Vertex Dataset argument will override the GCS source.

encryption_spec_key_name: str = ''

Customer-managed encryption key options for the Dataflow. If this is set, then all resources created by the Dataflow will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.

project: str = '{{$.pipeline_google_cloud_project_id}}'

Project to run data bias detection. Defaults to the project in which the PipelineJob is run.

Returns

data_bias_metrics: dsl.Output[system.Artifact]

Artifact tracking the data bias detection output.

gcp_resources: dsl.OutputPath(str)

Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
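
For orientation, here is a minimal, hypothetical pipeline that runs the component against an unmanaged JSONL dataset in Cloud Storage; the project, bucket paths, and target field name are placeholders.

```python
from kfp import dsl
from google_cloud_pipeline_components.preview.model_evaluation import DetectDataBiasOp


@dsl.pipeline(name='detect-data-bias-demo')
def data_bias_pipeline(bias_configs: list):
    # Detect data bias directly on an unmanaged JSONL dataset in GCS.
    DetectDataBiasOp(
        project='my-project',                                          # placeholder
        location='us-central1',
        target_field_name='income_bracket',                            # hypothetical ground-truth column
        bias_configs=bias_configs,
        dataset_format='jsonl',
        dataset_storage_source_uris=['gs://my-bucket/eval/*.jsonl'],   # placeholder
    )
```

The pipeline can then be compiled with the KFP compiler and submitted as a Vertex AI PipelineJob in the usual way.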

preview.model_evaluation.DetectModelBiasOp(gcp_resources: dsl.OutputPath(str), bias_model_metrics: dsl.Output[system.Artifact], target_field_name: str, bias_configs: list, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: dsl.Input[system.Artifact] = None, predictions_bigquery_source: dsl.Input[google.BQTable] = None, thresholds: list = [0.5], encryption_spec_key_name: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Detects bias metrics from a model’s predictions.

Creates a Dataflow job with Apache Beam that assigns each data point to the corresponding bucket based on the bias configs and predictions, then computes model bias metrics for classification problems.

Parameters
location: str = 'us-central1'

Location for running model bias detection.

target_field_name: str

The full name path of the target field in the predictions file. Formatted to be able to find nested columns, delimited by `.`. Alternatively referred to as the ground truth (or ground_truth_column) field.

predictions_format: str = 'jsonl'

The file format for the batch prediction results. jsonl is the only currently allowed format.

predictions_gcs_source: dsl.Input[system.Artifact] = None

An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-”. For explanation results, the files should be named “explanation.results-“.

predictions_bigquery_source: dsl.Input[google.BQTable] = None

BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.

bias_configs: list

A list of google.cloud.aiplatform_v1beta1.types.ModelEvaluation.BiasConfig. When provided, compute model bias metrics for each defined slice. Below is an example of how to format this input.

1: First, create a BiasConfig. `from google.cloud.aiplatform_v1beta1.types.ModelEvaluation import BiasConfig` `from google.cloud.aiplatform_v1.types.ModelEvaluationSlice.Slice import SliceSpec` `from google.cloud.aiplatform_v1.types.ModelEvaluationSlice.Slice.SliceSpec import SliceConfig` `bias_config = BiasConfig(bias_slices=SliceSpec(configs={'feature_a': SliceConfig(SliceSpec.Value(string_value='label_a'))}))`
2: Create a list to store the bias configs in. `bias_configs = []`
3: Format each BiasConfig into a JSON or dict. `bias_config_json = json_format.MessageToJson(bias_config)` or `bias_config_dict = json_format.MessageToDict(bias_config)`
4: Append each bias_config JSON to the list. `bias_configs.append(bias_config_json)`
5: Finally, pass bias_configs as a parameter to this component. `DetectModelBiasOp(bias_configs=bias_configs)`
Parameters
thresholds: list = [0.5]

A list of float values to be used as prediction decision thresholds.

encryption_spec_key_name: str = ''

Customer-managed encryption key options for the Dataflow. If this is set, then all resources created by the Dataflow will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.

project: str = '{{$.pipeline_google_cloud_project_id}}'

Project to run model bias detection. Defaults to the project in which the PipelineJob is run.

Returns

bias_model_metrics: dsl.Output[system.Artifact]

Artifact tracking the model bias detection output.

gcp_resources: dsl.OutputPath(str)

Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
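
As a sketch of typical wiring, the predictions_gcs_source input can be an imported artifact (or the GCS output of a batch prediction step); the project, URIs, and column names below are placeholders.

```python
from kfp import dsl
from google_cloud_pipeline_components.preview.model_evaluation import DetectModelBiasOp


@dsl.pipeline(name='detect-model-bias-demo')
def model_bias_pipeline(bias_configs: list):
    # Import existing batch prediction results from GCS; in practice this is
    # often the GCS output directory produced by a batch prediction component.
    predictions = dsl.importer(
        artifact_uri='gs://my-bucket/batch-prediction-output',  # placeholder
        artifact_class=dsl.Artifact,
    )

    DetectModelBiasOp(
        project='my-project',                      # placeholder
        location='us-central1',
        target_field_name='income_bracket',        # hypothetical ground-truth column
        bias_configs=bias_configs,
        predictions_format='jsonl',
        predictions_gcs_source=predictions.output,
        thresholds=[0.5],
    )
```
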

preview.model_evaluation.FeatureAttributionGraphComponentOp(location: str, prediction_type: str, vertex_model: VertexModel, batch_predict_instances_format: str, batch_predict_gcs_destination_output_uri: str, batch_predict_gcs_source_uris: list[str] = [], batch_predict_bigquery_source_uri: str = '', batch_predict_predictions_format: str = 'jsonl', batch_predict_bigquery_destination_output_uri: str = '', batch_predict_machine_type: str = 'n1-standard-16', batch_predict_starting_replica_count: int = 5, batch_predict_max_replica_count: int = 10, batch_predict_explanation_metadata: dict = {}, batch_predict_explanation_parameters: dict = {}, batch_predict_explanation_data_sample_size: int = 10000, batch_predict_accelerator_type: str = '', batch_predict_accelerator_count: int = 0, dataflow_machine_type: str = 'n1-standard-4', dataflow_max_num_workers: int = 5, dataflow_disk_size_gb: int = 50, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}') outputs

A pipeline to compute feature attributions by sampling data for batch explanations.

This pipeline guarantees support for AutoML Tabular models that contain a valid explanation_spec.

Parameters
location: str

The GCP region that runs the pipeline components.

prediction_type: str

The type of prediction the model is to produce. “classification”, “regression”, or “forecasting”.

vertex_model: VertexModel

The Vertex model artifact used for batch explanation.

batch_predict_instances_format: str

The format in which instances are given, must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_gcs_destination_output_uri: str

The Google Cloud Storage location of the directory where the output is to be written to. In the given directory a new directory is created. Its name is prediction-<model-display-name>-<job-create-time>, where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Inside of it files predictions_0001.<extension>, predictions_0002.<extension>, …, predictions_N.<extension> are created where <extension> depends on chosen predictions_format, and N may equal 0001 and depends on the total number of successfully predicted instances. If the Model has both instance and prediction schemata defined then each such file contains predictions as per the predictions_format. If prediction for any instance failed (partially or completely), then an additional errors_0001.<extension>, errors_0002.<extension>,…, errors_N.<extension> files are created (N depends on total number of failed predictions). These files contain the failed instances, as per their schema, followed by an additional error field which as value has google.rpc.Status containing only code and message fields. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.

batch_predict_gcs_source_uris: list[str] = []

Google Cloud Storage URI(-s) to your instances to run batch prediction on. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_bigquery_source_uri: str = ''

Google BigQuery URI to your instances to run batch prediction on. May contain wildcards. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_predictions_format: str = 'jsonl'

The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.

batch_predict_bigquery_destination_output_uri: str = ''

The BigQuery project location where the output is to be written to. In the given project a new dataset is created with name prediction_<model-display-name>_<job-create-time>, where <model-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores), and timestamp is in YYYY_MM_DDThh_mm_ss_sssZ “based on ISO-8601” format. In the dataset two tables will be created, predictions, and errors. If the Model has both instance and prediction schemata defined then the tables have columns as follows: The predictions table contains instances for which the prediction succeeded, it has columns as per a concatenation of the Model’s instance and prediction schemata. The errors table contains rows for which the prediction has failed, it has instance columns, as per the instance schema, followed by a single “errors” column, which as values has google.rpc.Status represented as a STRUCT, and containing only code and message. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.

batch_predict_machine_type: str = 'n1-standard-16'

The type of machine for running batch prediction on dedicated resources. If the Model supports DEDICATED_RESOURCES this config may be provided (and the job will use these resources). If the Model doesn’t support AUTOMATIC_RESOURCES, this config must be provided. For more details about the BatchDedicatedResources, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#BatchDedicatedResources. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec

batch_predict_starting_replica_count: int = 5

The number of machine replicas used at the start of the batch operation. If not set, Vertex AI decides starting number, not greater than max_replica_count. Only used if machine_type is set.

batch_predict_max_replica_count: int = 10

The maximum number of machine replicas the batch operation may be scaled to. Only used if machine_type is set.

batch_predict_explanation_metadata: dict = {}

Explanation metadata configuration for this BatchPredictionJob. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_metadata. All fields of explanation_metadata are optional in the request. If a field of the explanation_metadata object is not populated, the corresponding field of the Model.explanation_metadata object is inherited. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.

batch_predict_explanation_parameters: dict = {}

Parameters to configure explaining for Model’s predictions. Can be specified only if generate_explanation is set to True. This value overrides the value of Model.explanation_parameters. All fields of explanation_parameters are optional in the request. If a field of the explanation_parameters object is not populated, the corresponding field of the Model.explanation_parameters object is inherited. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#ExplanationParameters.

batch_predict_explanation_data_sample_size: int = 10000

Desired size to downsample the input dataset that will then be used for batch explanation.

batch_predict_accelerator_type: str = ''

The type of accelerator(s) that may be attached to the machine as per batch_predict_accelerator_count. Only used if batch_predict_machine_type is set. For more details about the machine spec, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec

batch_predict_accelerator_count: int = 0

The number of accelerators to attach to the batch_predict_machine_type. Only used if batch_predict_machine_type is set.

dataflow_machine_type: str = 'n1-standard-4'

The Dataflow machine type for evaluation components.

dataflow_max_num_workers: int = 5

The max number of Dataflow workers for evaluation components.

dataflow_disk_size_gb: int = 50

Dataflow worker’s disk size in GB for evaluation components.

dataflow_service_account: str = ''

Custom service account to run Dataflow jobs.

dataflow_subnetwork: str = ''

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips: bool = True

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name: str = ''

Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.

force_runner_mode: str = ''

Indicates which runner mode to force. Valid options are Dataflow and DirectRunner.

project: str = '{{$.pipeline_google_cloud_project_id}}'

The GCP project that runs the pipeline components. Defaults to the project in which the PipelineJob is run.

Returns

A system.Metrics artifact with feature attributions.
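
Because vertex_model is a VertexModel artifact, one way to supply it is via dsl.importer. The sketch below assumes google_cloud_pipeline_components.types.artifact_types.VertexModel and uses placeholder resource names and bucket paths.

```python
from kfp import dsl
from google_cloud_pipeline_components.preview.model_evaluation import (
    FeatureAttributionGraphComponentOp,
)
from google_cloud_pipeline_components.types import artifact_types

MODEL_NAME = 'projects/my-project/locations/us-central1/models/123'  # placeholder


@dsl.pipeline(name='feature-attribution-demo')
def feature_attribution_pipeline():
    # Import an existing model (it must have a valid explanation_spec).
    model_importer = dsl.importer(
        artifact_uri=f'https://us-central1-aiplatform.googleapis.com/v1/{MODEL_NAME}',
        artifact_class=artifact_types.VertexModel,
        metadata={'resourceName': MODEL_NAME},
    )

    FeatureAttributionGraphComponentOp(
        project='my-project',                                                 # placeholder
        location='us-central1',
        prediction_type='classification',
        vertex_model=model_importer.output,
        batch_predict_instances_format='jsonl',
        batch_predict_gcs_source_uris=['gs://my-bucket/eval/*.jsonl'],        # placeholder
        batch_predict_gcs_destination_output_uri='gs://my-bucket/fa-output',  # placeholder
        batch_predict_explanation_data_sample_size=10000,
    )
```
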

preview.model_evaluation.ModelEvaluationFeatureAttributionOp(gcp_resources: dsl.OutputPath(str), feature_attributions: dsl.Output[system.Metrics], problem_type: str, location: str = 'us-central1', predictions_format: str = 'jsonl', predictions_gcs_source: dsl.Input[system.Artifact] = None, predictions_bigquery_source: dsl.Input[google.BQTable] = None, dataflow_service_account: str = '', dataflow_disk_size_gb: int = 50, dataflow_machine_type: str = 'n1-standard-4', dataflow_workers_num: int = 1, dataflow_max_workers_num: int = 5, dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', force_runner_mode: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Compute feature attribution on a trained model’s batch explanation results.

Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. Will compute feature attribution for every target label if possible (typically possible for AutoML Classification models).

Parameters
location: str = 'us-central1'

Location for running feature attribution. If not set, defaults to us-central1.

problem_type: str

Problem type of the pipeline: one of classification, regression and forecasting.

predictions_format: str = 'jsonl'

The file format for the batch prediction results. jsonl, csv, and bigquery are the allowed formats, from Vertex Batch Prediction. If not set, defaults to jsonl.

predictions_gcs_source: dsl.Input[system.Artifact] = None

An artifact with its URI pointing toward a GCS directory with prediction or explanation files to be used for this evaluation. For prediction results, the files should be named “prediction.results-” or “predictions_”. For explanation results, the files should be named “explanation.results-“.

predictions_bigquery_source: dsl.Input[google.BQTable] = None

BigQuery table with prediction or explanation data to be used for this evaluation. For prediction results, the table column should be named “predicted_*”.

dataflow_service_account: str = ''

Service account to run the dataflow job. If not set, dataflow will use the default worker service account. For more details, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#default_worker_service_account

dataflow_disk_size_gb: int = 50

The disk size (in GB) of the machine executing the evaluation run. If not set, defaults to 50.

dataflow_machine_type: str = 'n1-standard-4'

The machine type executing the evaluation run. If not set, defaults to n1-standard-4.

dataflow_workers_num: int = 1

The number of workers executing the evaluation run. If not set, defaults to 1.

dataflow_max_workers_num: int = 5

The max number of workers executing the evaluation run. If not set, defaults to 5.

dataflow_subnetwork: str = ''

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. More details: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips: bool = True

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name: str = ''

Customer-managed encryption key for the Dataflow job. If this is set, then all resources created by the Dataflow job will be encrypted with the provided encryption key.

force_runner_mode: str = ''

Flag to choose Beam runner. Valid options are DirectRunner and Dataflow.

project: str = '{{$.pipeline_google_cloud_project_id}}'

Project to run feature attribution container. Defaults to the project in which the PipelineJob is run.

Returns

feature_attributions: dsl.Output[system.Metrics]

Metrics artifact with the computed feature attributions.

gcp_resources: dsl.OutputPath(str)

Serialized gcp_resources proto tracking the dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.

preview.model_evaluation.autosxs_pipeline(evaluation_dataset: str, task: str, id_columns: list[str], model_a: str = '', model_b: str = '', autorater_prompt_parameters: dict[str, dict[str, str]] = {}, model_a_prompt_parameters: dict[str, dict[str, str]] = {}, model_b_prompt_parameters: dict[str, dict[str, str]] = {}, response_column_a: str = '', response_column_b: str = '', model_a_parameters: dict[str, str] = {}, model_b_parameters: dict[str, str] = {}, human_preference_column: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}', location: str = '{{$.pipeline_google_cloud_location}}', judgments_format: str = 'jsonl', bigquery_destination_prefix: str = '', experimental_args: dict[str, Any] = {})

Evaluates two models side-by-side using an arbiter model.

Parameters
evaluation_dataset: str

A list of GCS paths to a JSONL dataset containing evaluation examples.

task: str

Evaluation task in the form {task}@{version}. task can be one of “summarization”, “question_answer”. Version is an integer with 3 digits or “latest”. Ex: summarization@001 or question_answer@latest.

id_columns: list[str]

The columns which distinguish unique evaluation examples.

model_a: str = ''

A fully-qualified model resource name. This parameter is optional if Model A responses are specified.

model_b: str = ''

A fully-qualified model resource name. This parameter is optional if Model B responses are specified.

autorater_prompt_parameters: dict[str, dict[str, str]] = {}

Map of autorater prompt parameters to columns or templates. The expected parameters are: inference_instruction - Details on how to perform a task. inference_context - Content to reference to perform the task.

model_a_prompt_parameters: dict[str, dict[str, str]] = {}

Map of Model A prompt template parameters to columns or templates.

model_b_prompt_parameters: dict[str, dict[str, str]] = {}

Map of Model B prompt template parameters to columns or templates.

response_column_a: str = ''

The column containing responses for model A. Required if any response tables are provided for model A.

response_column_b: str = ''

The column containing responses for model B. Required if any response tables are provided for model B.

model_a_parameters: dict[str, str] = {}

The parameters that govern the predictions from model A.

model_b_parameters: dict[str, str] = {}

The parameters that govern the predictions from model B.

human_preference_column: str = ''

The column containing ground truths. Only required when users want to check the autorater alignment against human preference.

project: str = '{{$.pipeline_google_cloud_project_id}}'

Project used to run custom jobs. Default is the same project used to run the pipeline.

location: str = '{{$.pipeline_google_cloud_location}}'

Location used to run custom jobs. Default is the same location used to run the pipeline.

judgments_format: str = 'jsonl'

The format to write judgments to. Can be either ‘json’ or ‘bigquery’.

bigquery_destination_prefix: str = ''

BigQuery table to write judgments to if the specified format is ‘bigquery’.

experimental_args: dict[str, Any] = {}

Experimentally released arguments. Subject to change.
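
As a minimal sketch (project, bucket paths, dataset URI, and column names are placeholders, and the autorater prompt parameters follow the column-mapping convention described above), the pipeline can be compiled with the KFP SDK and submitted as a regular Vertex AI PipelineJob:

```python
from google.cloud import aiplatform
from google_cloud_pipeline_components.preview.model_evaluation import autosxs_pipeline
from kfp import compiler

# Compile the AutoSxS pipeline definition to a local pipeline spec.
compiler.Compiler().compile(
    pipeline_func=autosxs_pipeline,
    package_path='autosxs_pipeline.yaml',
)

# Submit it, comparing pre-generated responses from two models.
aiplatform.init(project='my-project', location='us-central1')       # placeholders
job = aiplatform.PipelineJob(
    display_name='autosxs-demo',
    template_path='autosxs_pipeline.yaml',
    pipeline_root='gs://my-bucket/pipeline-root',                    # placeholder
    parameter_values={
        'evaluation_dataset': 'gs://my-bucket/eval/examples.jsonl',  # placeholder
        'task': 'summarization@001',
        'id_columns': ['example_id'],                                # hypothetical column
        'autorater_prompt_parameters': {
            'inference_instruction': {'column': 'instruction'},      # hypothetical columns
            'inference_context': {'column': 'document'},
        },
        'response_column_a': 'response_a',                           # hypothetical columns
        'response_column_b': 'response_b',
    },
)
job.run()
```
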

preview.model_evaluation.evaluation_llm_classification_pipeline(project: str, location: str, target_field_name: str, batch_predict_gcs_source_uris: list[str], batch_predict_gcs_destination_output_uri: str, model_name: str = 'publishers/google/models/text-bison@001', evaluation_task: str = 'text-classification', evaluation_class_labels: list[str] = [], input_field_name: str = 'input_text', batch_predict_instances_format: str = 'jsonl', batch_predict_predictions_format: str = 'jsonl', batch_predict_model_parameters: dict[str, str] = {}, machine_type: str = 'e2-highmem-16', service_account: str = '', network: str = '', dataflow_machine_type: str = 'n1-standard-4', dataflow_disk_size_gb: int = 50, dataflow_max_num_workers: int = 5, dataflow_service_account: str = '', dataflow_subnetwork: str = '', dataflow_use_public_ips: bool = True, encryption_spec_key_name: str = '', evaluation_display_name: str = 'evaluation-llm-classification-pipeline-{{$.pipeline_job_uuid}}') outputs

The LLM Text Classification Evaluation pipeline.

Parameters
project: str

The GCP project that runs the pipeline components.

location: str

The GCP region that runs the pipeline components.

target_field_name: str

The target field’s name. Formatted to be able to find nested columns, delimited by `.`. Prefixed with ‘instance.’ on the component for Vertex Batch Prediction.

batch_predict_gcs_source_uris: list[str]

Google Cloud Storage URI(-s) to your instances data to run batch prediction on. The instances data should also contain the ground truth (target) data, used for evaluation. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_gcs_destination_output_uri: str

The Google Cloud Storage location of the directory where the output is to be written to.

model_name: str = 'publishers/google/models/text-bison@001'

The Model name used to run evaluation. Must be a publisher Model or a managed Model sharing the same ancestor location. Starting this job has no impact on any existing deployments of the Model and their resources.

evaluation_task: str = 'text-classification'

The task that the large language model will be evaluated on. The evaluation component computes a set of metrics relevant to that specific task. The only currently supported classification task is text-classification.

evaluation_class_labels: list[str] = []

The JSON array of class names for the target_field, in the same order they appear in the batch predictions input file.

input_field_name: str = 'input_text'

The field name of the input eval dataset instances that contains the input prompts to the LLM.

batch_predict_instances_format: str = 'jsonl'

The format in which instances are given, must be one of the Model’s supportedInputStorageFormats. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_predictions_format: str = 'jsonl'

The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.

batch_predict_model_parameters: dict[str, str] = {}

A map of parameters that govern the predictions. Some acceptable parameters include: maxOutputTokens, topK, topP, and temperature.

machine_type: str = 'e2-highmem-16'

The machine type of the custom jobs in this pipeline. If not set, defaults to e2-highmem-16. More details: https://cloud.google.com/compute/docs/machine-resource

service_account: str = ''

Sets the default service account for the workload run-as account. The service account running the pipeline (https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#service-account) submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob’s project will be used.

network: str = ''

The full name of the Compute Engine network to which the job should be peered, for example projects/12345/global/networks/myVPC. The format is projects/{project}/global/networks/{network}, where {project} is a project number (as in 12345) and {network} is a network name (as in myVPC). To specify this field, you must have already configured VPC Network Peering for Vertex AI (https://cloud.google.com/vertex-ai/docs/general/vpc-peering). If left unspecified, the job is not peered with any network.

dataflow_machine_type: str = 'n1-standard-4'

The Dataflow machine type for evaluation components.

dataflow_disk_size_gb: int = 50

The disk size (in GB) of the machine executing the evaluation run. If not set, defaults to 50.

dataflow_max_num_workers: int = 5

The max number of workers executing the evaluation run. If not set, defaults to 5.

dataflow_service_account: str = ''

Custom service account to run Dataflow jobs.

dataflow_subnetwork: str = ''

Dataflow’s fully qualified subnetwork name, when empty the default subnetwork will be used. Example: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications

dataflow_use_public_ips: bool = True

Specifies whether Dataflow workers use public IP addresses.

encryption_spec_key_name: str = ''

Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.

evaluation_display_name: str = 'evaluation-llm-classification-pipeline-{{$.pipeline_job_uuid}}'

The display name of the uploaded evaluation resource to the Vertex AI model.

Returns

evaluation_metrics: ClassificationMetrics Artifact for LLM Text Classification.

evaluation_resource_name: If run on a user’s managed VertexModel, the imported evaluation resource name. Empty if run on a publisher model.
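
For orientation, a hypothetical JSONL evaluation instance and the corresponding pipeline parameter values might look like the following; the field names, class labels, and bucket paths are placeholders.

```python
# One line of the JSONL eval dataset: the prompt plus its ground-truth label.
#   {"input_text": "The battery died after two days.", "sentiment": "negative"}

parameter_values = {
    'project': 'my-project',                       # placeholder
    'location': 'us-central1',
    'target_field_name': 'sentiment',              # matches the ground-truth field above
    'evaluation_class_labels': ['positive', 'negative'],
    'input_field_name': 'input_text',
    'batch_predict_gcs_source_uris': ['gs://my-bucket/eval/*.jsonl'],        # placeholder
    'batch_predict_gcs_destination_output_uri': 'gs://my-bucket/eval-output',
    'model_name': 'publishers/google/models/text-bison@001',
}
# Pass parameter_values to a Vertex AI PipelineJob that runs
# evaluation_llm_classification_pipeline, as with the other pipelines above.
```
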

preview.model_evaluation.evaluation_llm_text_generation_pipeline(project: str, location: str, batch_predict_gcs_source_uris: list[str], batch_predict_gcs_destination_output_uri: str, model_name: str = 'publishers/google/models/text-bison@001', evaluation_task: str = 'text-generation', input_field_name: str = 'input_text', target_field_name: str = 'output_text', batch_predict_instances_format: str = 'jsonl', batch_predict_predictions_format: str = 'jsonl', batch_predict_model_parameters: dict[str, str] = {}, enable_row_based_metrics: bool = False, machine_type: str = 'e2-highmem-16', service_account: str = '', network: str = '', encryption_spec_key_name: str = '', evaluation_display_name: str = 'evaluation-llm-text-generation-pipeline-{{$.pipeline_job_uuid}}') outputs

LLM Text Generation Evaluation pipeline.

This pipeline supports evaluating large language models, publisher or managed models, performing the following generative tasks: summarization, question-answering, and text-generation.

Parameters
project: str

The GCP project that runs the pipeline components.

location: str

The GCP region that runs the pipeline components.

batch_predict_gcs_source_uris: list[str]

Google Cloud Storage URI(-s) to your eval dataset instances data to run batch prediction on. The instances data should also contain the ground truth (target) data, used for evaluation. May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_gcs_destination_output_uri: str

The Google Cloud Storage location of the directory where the eval pipeline output is to be written to.

model_name: str = 'publishers/google/models/text-bison@001'

The Model name used to run evaluation. Must be a publisher Model or a managed Model sharing the same ancestor location. Starting this job has no impact on any existing deployments of the Model and their resources.

evaluation_task: str = 'text-generation'

The task that the large language model will be evaluated on. The evaluation component computes a set of metrics relevant to that specific task. Currently supported tasks are: summarization, question-answering, text-generation.

input_field_name: str = 'input_text'

The field name of the input eval dataset instances that contains the input prompts to the LLM.

target_field_name: str = 'output_text'

The field name of the eval dataset instance that contains an example reference text response. Alternatively referred to as the ground truth (or ground_truth_column) field. If not set, defaults to output_text. (An example instance is shown after the Returns section below.)

batch_predict_instances_format: str = 'jsonl'

The format in which instances are given, must be one of the Model’s supportedInputStorageFormats. Only “jsonl” is currently supported. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#InputConfig.

batch_predict_predictions_format: str = 'jsonl'

The format in which Vertex AI gives the predictions. Must be one of the Model’s supportedOutputStorageFormats. Only “jsonl” is currently supported. For more details about this output config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs#OutputConfig.

batch_predict_model_parameters: dict[str, str] = {}

A map of parameters that govern the predictions. Some acceptable parameters include: maxOutputTokens, topK, topP, and temperature.

machine_type: str = 'e2-highmem-16'

The machine type of this custom job. If not set, defaults to e2-highmem-16. More details: https://cloud.google.com/compute/docs/machine-resource

service_account: str = ''

Sets the default service account for the workload run-as account. The service account running the pipeline (https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#service-account) submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob’s project will be used.

network: str = ''

The full name of the Compute Engine network to which the job should be peered, for example projects/12345/global/networks/myVPC. The format is projects/{project}/global/networks/{network}, where {project} is a project number (as in 12345) and {network} is a network name (as in myVPC). To specify this field, you must have already configured VPC Network Peering for Vertex AI (https://cloud.google.com/vertex-ai/docs/general/vpc-peering). If left unspecified, the job is not peered with any network.

encryption_spec_key_name: str = ''

Customer-managed encryption key options. If set, resources created by this pipeline will be encrypted with the provided encryption key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.

evaluation_display_name: str = 'evaluation-llm-text-generation-pipeline-{{$.pipeline_job_uuid}}'

The display name of the uploaded evaluation resource to the Vertex AI model.

Returns

evaluation_metrics: Metrics Artifact for LLM Text Generation.

evaluation_resource_name: If run on a user’s managed VertexModel, the imported evaluation resource name. Empty if run on a publisher model.
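
Given the input_field_name and target_field_name defaults above, a hypothetical eval dataset instance for a summarization task could look like this (content is illustrative only); each such JSON object occupies one line of the JSONL file:

```python
import json

# One eval dataset instance: the prompt and a reference (ground-truth) response.
instance = {
    'input_text': 'Summarize: The quarterly report shows revenue grew 12 percent ...',
    'output_text': 'Revenue grew 12 percent in the quarter.',
}
print(json.dumps(instance))
```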