google_cloud_pipeline_components.experimental.custom_job.utils module

Module for supporting Google Vertex AI Custom Training Job Op.

google_cloud_pipeline_components.experimental.custom_job.utils.create_custom_training_job_op_from_component(component_spec: Callable, display_name: Optional[str] = '', replica_count: Optional[int] = 1, machine_type: Optional[str] = 'n1-standard-4', accelerator_type: Optional[str] = '', accelerator_count: Optional[int] = 1, boot_disk_type: Optional[str] = 'pd-ssd', boot_disk_size_gb: Optional[int] = 100, timeout: Optional[str] = '', restart_job_on_worker_restart: Optional[bool] = False, service_account: Optional[str] = '', network: Optional[str] = '', encryption_spec_key_name: Optional[str] = '', tensorboard: Optional[str] = '', base_output_directory: Optional[str] = '', labels: Optional[Dict[str, str]] = None) → Callable

Create a component spec that runs a custom training job in Vertex AI.

This utility converts a given component into a CustomTrainingJobOp that runs a custom training job in Vertex AI, which simplifies the creation of custom training jobs. All Inputs and Outputs of the supplied component are copied over to the constructed training job.

Note that this utility constructs a ClusterSpec in which the master and all the workers use the same spec, so every disk- and machine-related parameter applies to all replicas. This is suitable for use cases such as training with MultiWorkerMirroredStrategy or MirroredStrategy.
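The layout described above can be pictured with a short sketch. It only illustrates the documented behavior (one master replica in the first pool, the remaining replicas in the second, identical machine and disk settings everywhere) and is not the library's implementation. The function name sketch_worker_pool_specs and the dictionary keys are assumptions made for this example.

```python
from typing import Any, Dict, List


def sketch_worker_pool_specs(
    replica_count: int = 1,
    machine_type: str = "n1-standard-4",
    boot_disk_type: str = "pd-ssd",
    boot_disk_size_gb: int = 100,
) -> List[Dict[str, Any]]:
    """Illustrative only: mirror the documented master/worker split."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")

    def pool(count: int) -> Dict[str, Any]:
        # Every pool receives the same machine and disk settings.
        # The key names here are hypothetical, chosen for readability.
        return {
            "replica_count": count,
            "machine_spec": {"machine_type": machine_type},
            "disk_spec": {
                "boot_disk_type": boot_disk_type,
                "boot_disk_size_gb": boot_disk_size_gb,
            },
        }

    # One replica always counts towards the master (pool 0).
    specs = [pool(1)]
    # Any remaining replicas become workers (pool 1).
    if replica_count > 1:
        specs.append(pool(replica_count - 1))
    return specs


specs = sketch_worker_pool_specs(replica_count=4)
print([s["replica_count"] for s in specs])  # [1, 3]
```

With replica_count=1 the sketch yields a single pool, which corresponds to a non-distributed job.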

This component does not support Vertex AI Python training applications.

For more details on Vertex AI Training service, please refer to https://cloud.google.com/vertex-ai/docs/training/create-custom-job

Args:

component_spec: The task (ContainerOp) object to run as a Vertex AI custom job.

display_name (Optional[str]): The name of the custom job. If not provided, component_spec.name is used instead.

replica_count (Optional[int]): The count of instances in the cluster. One replica always counts towards the master in worker_pool_spec[0], and the remaining replicas are allocated in worker_pool_spec[1]. For more details, see https://cloud.google.com/vertex-ai/docs/training/distributed-training#configure_a_distributed_training_job.

machine_type (Optional[str]): The type of machine on which to run the custom job. The default value is “n1-standard-4”. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types.

accelerator_type (Optional[str]): The type of accelerator(s) that may be attached to the machine as per accelerator_count. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec#acceleratortype.

accelerator_count (Optional[int]): The number of accelerators to attach to the machine. Defaults to 1 if accelerator_type is set.

boot_disk_type (Optional[str]): Type of the boot disk (default is “pd-ssd”). Valid values: “pd-ssd” (Persistent Disk Solid State Drive) or “pd-standard” (Persistent Disk Hard Disk Drive).

boot_disk_size_gb (Optional[int]): Size in GB of the boot disk (default is 100GB).

timeout (Optional[str]): The maximum job running time, given as a duration in seconds with up to nine fractional digits and terminated by ‘s’, for example: “3.5s”. The default is 7 days.

restart_job_on_worker_restart (Optional[bool]): Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.

service_account (Optional[str]): Sets the default service account to use as the workload run-as account. The service account running the pipeline (https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#service-account) that submits jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob’s project is used.

network (Optional[str]): The full name of the Compute Engine network to which the job should be peered, for example projects/12345/global/networks/myVPC. The format is projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.

encryption_spec_key_name (Optional[str]): Customer-managed encryption key options for the CustomJob. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key.

tensorboard (Optional[str]): The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs.

base_output_directory (Optional[str]): The Cloud Storage location to store the output of this CustomJob or HyperparameterTuningJob. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/GcsDestination.

labels (Optional[Dict[str, str]]): The labels with user-defined metadata to organize CustomJobs. See https://goo.gl/xmQnxf for more information.
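Two of the string arguments above, timeout and network, have fixed formats that are easy to get wrong by hand. The following is a minimal sketch of how such values might be built; format_timeout and network_name are hypothetical helpers written for this example and are not part of google_cloud_pipeline_components.

```python
import re
from datetime import timedelta

# Shape described for the network argument:
# projects/{project number}/global/networks/{network name}
_NETWORK_PATTERN = re.compile(r"projects/\d+/global/networks/[^/]+")


def format_timeout(limit: timedelta) -> str:
    """Render a timedelta as a seconds string terminated by 's'."""
    seconds = limit.total_seconds()
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    # timedelta has microsecond resolution, so six decimals suffice;
    # trailing zeros and a dangling decimal point are trimmed.
    text = f"{seconds:.6f}".rstrip("0").rstrip(".")
    return f"{text}s"


def network_name(project_number: int, network: str) -> str:
    """Build the full network name and check it against the format."""
    name = f"projects/{project_number}/global/networks/{network}"
    if not _NETWORK_PATTERN.fullmatch(name):
        raise ValueError(f"not a valid network name: {name!r}")
    return name


print(format_timeout(timedelta(seconds=3.5)))  # 3.5s
print(format_timeout(timedelta(days=7)))       # 604800s
print(network_name(12345, "myVPC"))  # projects/12345/global/networks/myVPC
```

The check is purely syntactic. It does not confirm that the network exists or that private services access has been configured for it.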

Returns:

A Custom Job component operator corresponding to the input component operator.
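As a rough usage sketch, the call might look like the following. The keyword names come from the signature documented above; the values, the CUSTOM_JOB_KWARGS name, and the to_custom_job_op wrapper are placeholders invented for illustration. The library import is deferred into the function so the snippet can be loaded even where the package is not installed.

```python
from typing import Any, Callable, Dict

# Keyword arguments taken from the documented signature.
# All values are illustrative placeholders.
CUSTOM_JOB_KWARGS: Dict[str, Any] = {
    "display_name": "example-training-job",
    "replica_count": 2,
    "machine_type": "n1-standard-4",
    "boot_disk_type": "pd-ssd",
    "boot_disk_size_gb": 100,
    "timeout": "7200s",
    "labels": {"team": "example"},
}


def to_custom_job_op(component: Callable) -> Callable:
    """Wrap an existing component so it runs as a Vertex AI custom job."""
    # Imported lazily: the package is only needed when this is called.
    from google_cloud_pipeline_components.experimental.custom_job import utils

    return utils.create_custom_training_job_op_from_component(
        component, **CUSTOM_JOB_KWARGS
    )
```

Here component stands for whatever component the pipeline already defines. Per the description above, the returned operator carries over that component's Inputs and Outputs, so it is expected to be used inside a pipeline in place of the original.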