Endpoint

Manage model serving endpoints via Vertex AI Endpoints.

Components:

EndpointCreateOp(display_name, ...[, ...])

Creates a Google Cloud Vertex Endpoint and waits for it to be ready.

EndpointDeleteOp(endpoint, gcp_resources)

Deletes a Google Cloud Vertex Endpoint.

ModelDeployOp(model, gcp_resources[, ...])

Deploys a Google Cloud Vertex Model to an Endpoint, creating a DeployedModel within it.

ModelUndeployOp(model, endpoint, gcp_resources[, ...])

Undeploys a Google Cloud Vertex DeployedModel within an Endpoint.

v1.endpoint.EndpointCreateOp(display_name: str, gcp_resources: dsl.OutputPath(str), endpoint: dsl.Output[google.VertexEndpoint], location: str = 'us-central1', description: str = '', labels: dict[str, str] = {}, encryption_spec_key_name: str = '', network: str = '', project: str = '{{$.pipeline_google_cloud_project_id}}')

Creates a Google Cloud Vertex Endpoint and waits for it to be ready. See the Endpoint create method for more information.

Parameters:
location: str = 'us-central1'

Location in which to create the Endpoint. If not set, defaults to us-central1.

display_name: str

The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

description: str = ''

The description of the Endpoint.

labels: dict[str, str] = {}

The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
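The label rules above can be checked before a pipeline is submitted. The following is a minimal sketch, not part of the library: the helper names `is_valid_label_text` and `invalid_labels` are hypothetical, and the check only approximates the stated rules (at most 64 code points; lowercase letters from any script, digits, underscores, and dashes).

```python
def is_valid_label_text(text: str) -> bool:
    """Approximate check of one label key or value against the stated rules."""
    if len(text) > 64:  # len() counts Unicode code points in Python 3
        return False
    for ch in text:
        if ch in "_-" or ch.isdecimal():
            continue
        # Accept letters from any script, but reject uppercase ones.
        if ch.isalpha() and not ch.isupper():
            continue
        return False
    return True


def invalid_labels(labels: dict[str, str]) -> list[str]:
    """Return the keys whose key or value fails the check."""
    return [
        key
        for key, value in labels.items()
        if not (is_valid_label_text(key) and is_valid_label_text(value))
    ]


# "Env" is rejected because it contains an uppercase letter.
print(invalid_labels({"team": "ml-platform", "Env": "prod", "owner": "équipe_1"}))
```

A dictionary that passes this check can then be supplied as the `labels` argument; the service remains the authority on what it accepts.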

encryption_spec_key_name: str = ''

Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all of its sub-resources will be secured by this key. Has the form: projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
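To illustrate the key name format and the same-region requirement, here is a small sketch that parses a key name and compares its location with the Endpoint's `location`. The helpers `parse_kms_key_name` and `key_matches_location` are hypothetical names introduced for this example, and the pattern is an assumption derived only from the form shown above.

```python
import re

# Pattern derived from the documented form:
# projects/my-project/locations/my-location/keyRings/my-kr/cryptoKeys/my-key
_KMS_KEY_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)"
)


def parse_kms_key_name(name: str) -> dict[str, str]:
    """Split a key resource name into its four components."""
    match = _KMS_KEY_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected key name format: {name!r}")
    return match.groupdict()


def key_matches_location(key_name: str, endpoint_location: str) -> bool:
    """True when the key lives in the same region as the Endpoint."""
    return parse_kms_key_name(key_name)["location"] == endpoint_location


key = "projects/my-project/locations/us-central1/keyRings/my-kr/cryptoKeys/my-key"
print(key_matches_location(key, "us-central1"))
```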

network: str = ''

The full name of the Google Compute Engine network to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. Format: projects/{project}/global/networks/{network}, where {project} is a project number, as in '12345', and {network} is a network name.
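Because the network name expects a project number rather than a project ID, it can help to build the string with a small guard. This is a sketch only; `build_network_name` is a hypothetical helper, and the values are placeholders.

```python
def build_network_name(project_number: str, network: str) -> str:
    """Format a network name as projects/{project}/global/networks/{network}."""
    # The documented format uses a project number such as '12345'.
    if not (project_number.isascii() and project_number.isdigit()):
        raise ValueError(f"expected a numeric project number, got {project_number!r}")
    if not network or "/" in network:
        raise ValueError(f"expected a bare network name, got {network!r}")
    return f"projects/{project_number}/global/networks/{network}"


print(build_network_name("12345", "my-vpc"))
```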

project: str = '{{$.pipeline_google_cloud_project_id}}'

Project to create the Endpoint. Defaults to the project in which the PipelineJob is run.

Returns:

endpoint: dsl.Output[google.VertexEndpoint]

Artifact tracking the created Endpoint.

gcp_resources: dsl.OutputPath(str)

Serialized JSON of gcp_resources [proto](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud/google_cloud_pipeline_components/proto) which tracks the create Endpoint’s long-running operation.
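A downstream step that receives the `gcp_resources` string can decode it with the standard `json` module. The sketch below assumes the JSON form of the linked proto uses a top-level `resources` list whose entries carry `resourceType` and `resourceUri` fields; verify this against the proto definition. The helper name `resource_uris` and the sample payload are invented for illustration.

```python
import json


def resource_uris(serialized: str, resource_type: str = "") -> list[str]:
    """Return the resource URIs in a serialized gcp_resources payload.

    Assumes the field names `resources`, `resourceType`, and `resourceUri`.
    """
    payload = json.loads(serialized)
    uris = []
    for resource in payload.get("resources", []):
        if resource_type and resource.get("resourceType") != resource_type:
            continue
        uris.append(resource.get("resourceUri", ""))
    return uris


# Made-up payload for illustration; real values come from the component.
sample = json.dumps(
    {
        "resources": [
            {
                "resourceType": "ExampleOperation",
                "resourceUri": "https://example.invalid/v1/operations/123",
            }
        ]
    }
)
print(resource_uris(sample))
```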

v1.endpoint.EndpointDeleteOp(endpoint: dsl.Input[google.VertexEndpoint], gcp_resources: dsl.OutputPath(str))

Deletes a Google Cloud Vertex Endpoint. See the Endpoint delete method for more information.

Parameters:
endpoint: dsl.Input[google.VertexEndpoint]

The Endpoint to be deleted.

Returns:

gcp_resources: dsl.OutputPath(str)

Serialized JSON of gcp_resources [proto](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud/google_cloud_pipeline_components/proto) which tracks the delete Endpoint’s long-running operation.

v1.endpoint.ModelDeployOp(model: dsl.Input[google.VertexModel], gcp_resources: dsl.OutputPath(str), endpoint: dsl.Input[google.VertexEndpoint] = None, deployed_model_display_name: str = '', traffic_split: dict[str, str] = {}, dedicated_resources_machine_type: str = '', dedicated_resources_min_replica_count: int = 0, dedicated_resources_max_replica_count: int = 0, dedicated_resources_accelerator_type: str = '', dedicated_resources_accelerator_count: int = 0, automatic_resources_min_replica_count: int = 0, automatic_resources_max_replica_count: int = 0, service_account: str = '', disable_container_logging: bool = False, enable_access_logging: bool = False, explanation_metadata: dict[str, str] = {}, explanation_parameters: dict[str, str] = {})

Deploys a Google Cloud Vertex Model to an Endpoint, creating a DeployedModel within it. See the deploy Model method for more information.

Parameters:
model: dsl.Input[google.VertexModel]

The model to be deployed.

endpoint: dsl.Input[google.VertexEndpoint] = None

The Endpoint to be deployed to.

deployed_model_display_name: str = ''

The display name of the DeployedModel. If not provided upon creation, the Model’s display_name is used.

traffic_split: dict[str, str] = {}

A map from a DeployedModel’s ID to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel. If this field is non-empty, then the Endpoint’s trafficSplit will be overwritten with it. To refer to the ID of the Model being deployed by this call, use “0”; this method replaces it with the actual ID of the new DeployedModel. The traffic percentage values must add up to 100. If this field is empty, then the Endpoint’s trafficSplit is not updated.
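The traffic_split rules can be checked locally before deployment. The sketch below is a hypothetical helper (`check_traffic_split` is not part of the library) that converts the percentages to integers and verifies that a non-empty map adds up to 100; the DeployedModel ID "1234567890" is a placeholder.

```python
def check_traffic_split(traffic_split: dict[str, str]) -> dict[str, int]:
    """Validate a traffic split and return it with integer percentages."""
    percentages = {model_id: int(value) for model_id, value in traffic_split.items()}
    if not percentages:
        # An empty map leaves the Endpoint's trafficSplit unchanged.
        return percentages
    for model_id, value in percentages.items():
        if not 0 <= value <= 100:
            raise ValueError(f"percentage for {model_id!r} is out of range: {value}")
    total = sum(percentages.values())
    if total != 100:
        raise ValueError(f"traffic percentages must add up to 100, got {total}")
    return percentages


# "0" stands for the model being deployed by this call.
print(check_traffic_split({"0": "80", "1234567890": "20"}))
```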

dedicated_resources_machine_type: str = ''

The specification of a single machine used by the prediction. This field is required if automatic_resources_min_replica_count is not specified. See more information.

dedicated_resources_accelerator_type: str = ''

Hardware accelerator type. If used, dedicated_resources_accelerator_count must also be set. See available options. This field is required if dedicated_resources_machine_type is specified.

dedicated_resources_accelerator_count: int = 0

The number of accelerators to attach to a worker replica.

dedicated_resources_min_replica_count: int = 0

The minimum number of machine replicas this DeployedModel will always be deployed on. This value must be greater than or equal to 1. If traffic against the DeployedModel increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.

dedicated_resources_max_replica_count: int = 0

The maximum number of replicas this DeployedModel may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, dedicated_resources_min_replica_count is used as the default value (the larger value of min_replica_count or 1). If the value provided is smaller than min_replica_count, it is automatically increased to min_replica_count.
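The defaulting behaviour described for dedicated_resources_max_replica_count can be summarized in a few lines. This sketch assumes that the signature default of 0 means "not provided"; `resolve_max_replicas` is a hypothetical helper that mirrors the description, not the service's implementation.

```python
def resolve_max_replicas(min_replica_count: int, max_replica_count: int = 0) -> int:
    """Return the effective maximum replica count for dedicated resources."""
    if min_replica_count < 1:
        raise ValueError("dedicated_resources_min_replica_count must be >= 1")
    if max_replica_count == 0:
        # Assumption: 0 means the value was not provided, so use the minimum.
        return min_replica_count
    # A maximum below the minimum is raised to the minimum.
    return max(min_replica_count, max_replica_count)


print(resolve_max_replicas(2))      # falls back to the minimum
print(resolve_max_replicas(2, 1))   # raised to the minimum
print(resolve_max_replicas(2, 5))   # used as given
```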

automatic_resources_min_replica_count: int = 0

The minimum number of replicas this DeployedModel will always be deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas up to automatic_resources_max_replica_count, and as traffic decreases, some of these extra replicas may be freed. If the requested value is too large, the deployment will error. This field is required if dedicated_resources_machine_type is not specified.

automatic_resources_max_replica_count: int = 0

The maximum number of replicas this DeployedModel may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, no upper bound for scaling under heavy traffic is assumed, though Vertex AI may be unable to scale beyond a certain replica number.
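The dependencies between the dedicated and automatic resource parameters are easy to get wrong, so a pre-flight check may be useful. The sketch below encodes only the two requirements stated in the parameter descriptions, treating the signature defaults ('' and 0) as "not specified"; `resource_config_problems` is a hypothetical helper, and the machine and accelerator type strings in the usage line are placeholders.

```python
def resource_config_problems(
    machine_type: str = "",
    accelerator_type: str = "",
    accelerator_count: int = 0,
    automatic_min_replica_count: int = 0,
) -> list[str]:
    """List violations of the documented resource parameter requirements."""
    problems = []
    # One of the two resource modes has to be selected.
    if not machine_type and automatic_min_replica_count == 0:
        problems.append(
            "set dedicated_resources_machine_type or "
            "automatic_resources_min_replica_count"
        )
    # An accelerator type needs an accompanying count.
    if accelerator_type and accelerator_count < 1:
        problems.append(
            "dedicated_resources_accelerator_type requires "
            "dedicated_resources_accelerator_count"
        )
    return problems


print(resource_config_problems(machine_type="example-machine", accelerator_type="EXAMPLE_GPU"))
```

An empty list means neither documented requirement is violated; it does not guarantee that the service accepts the combination.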

service_account: str = ''

The service account that the DeployedModel’s container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn’t have access to the resource project. Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.

disable_container_logging: bool = False

For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances sends stderr and stdout streams to Stackdriver Logging by default. Note that these logs incur costs, which are subject to Cloud Logging pricing. Set this flag to True to disable container logging.

enable_access_logging: bool = False

These logs are like standard server access logs, containing information like timestamp and latency for each prediction request. Note that Stackdriver logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option.

explanation_metadata: dict[str, str] = {}

Metadata describing the Model’s input and output for explanation. See more information.

explanation_parameters: dict[str, str] = {}

Parameters that configure explaining information of the Model’s predictions. See more information.

Returns:

gcp_resources: dsl.OutputPath(str)

Serialized JSON of gcp_resources [proto](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud/google_cloud_pipeline_components/proto) which tracks the deploy Model’s long-running operation.

v1.endpoint.ModelUndeployOp(model: dsl.Input[google.VertexModel], endpoint: dsl.Input[google.VertexEndpoint], gcp_resources: dsl.OutputPath(str), traffic_split: dict[str, str] = {})

Undeploys a Google Cloud Vertex DeployedModel within an Endpoint. See the undeploy Model method for more information.

Parameters:
model: dsl.Input[google.VertexModel]

The model that was deployed to the Endpoint.

endpoint: dsl.Input[google.VertexEndpoint]

The Endpoint for the DeployedModel to be undeployed from.

traffic_split: dict[str, str] = {}

If this field is provided, then the Endpoint’s trafficSplit will be overwritten with it. If the last DeployedModel is being undeployed from the Endpoint, Endpoint.traffic_split will always end up empty when this call returns. A DeployedModel will be successfully undeployed only if it doesn’t have any traffic assigned to it when this method executes, or if this field removes all traffic assigned to it.
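The undeploy condition can be expressed as a simple predicate over the current and requested traffic splits. This is a sketch under the assumption that a DeployedModel missing from a split receives no traffic; `can_undeploy` and the IDs "111" and "222" are hypothetical.

```python
from typing import Optional


def can_undeploy(
    deployed_model_id: str,
    current_split: dict[str, int],
    new_split: Optional[dict[str, int]] = None,
) -> bool:
    """True if the DeployedModel would carry no traffic at undeploy time."""
    # A provided split overwrites the Endpoint's current one.
    effective = new_split if new_split else current_split
    return int(effective.get(deployed_model_id, 0)) == 0


# "111" still serves 30% of traffic, so it cannot be undeployed as-is...
print(can_undeploy("111", {"111": 30, "222": 70}))
# ...unless the call also reassigns all traffic to "222".
print(can_undeploy("111", {"111": 30, "222": 70}, {"222": 100}))
```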

Returns:

gcp_resources: dsl.OutputPath(str)

Serialized JSON of gcp_resources [proto](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud/google_cloud_pipeline_components/proto) which tracks the undeploy Model’s long-running operation.