google_cloud_pipeline_components.experimental.custom_job.custom_job module

Module for supporting Google Vertex AI Custom Job.

google_cloud_pipeline_components.experimental.custom_job.custom_job.run_as_vertex_ai_custom_job(component_spec: Callable, display_name: Optional[str] = None, replica_count: Optional[int] = None, machine_type: Optional[str] = None, accelerator_type: Optional[str] = None, accelerator_count: Optional[int] = None, boot_disk_type: Optional[str] = None, boot_disk_size_gb: Optional[int] = None, timeout: Optional[str] = None, restart_job_on_worker_restart: Optional[bool] = None, service_account: Optional[str] = None, network: Optional[str] = None, worker_pool_specs: Optional[List[Mapping[str, Any]]] = None) -> Callable

Run a pipeline task as a Vertex AI (formerly AI Platform (Unified)) custom training job.

For detailed documentation of the service, see https://cloud.google.com/ai-platform-unified/docs/training/create-custom-job

Args:

component_spec: The task (ContainerOp) object to run as an aiplatform custom job.

display_name: Optional. The name of the custom job. If not provided, component_spec.name is used instead.

replica_count: Optional. The total number of replicas, split between the master workerPoolSpec and the worker workerPoolSpec. The master always has 1 replica; the remaining replicas go to the worker pool.

machine_type: Optional. The type of machine on which to run the custom job. The default value is "n1-standard-4".

accelerator_type: Optional. The type of accelerator(s) that may be attached to the machine, in the quantity given by accelerator_count.

accelerator_count: Optional. The number of accelerators to attach to the machine.

boot_disk_type: Optional. Type of the boot disk (default is "pd-ssd"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive).

boot_disk_size_gb: Optional. Size of the boot disk in GB (default is 100 GB).

timeout: Optional. The maximum job running time, given as a duration in seconds with up to nine fractional digits, terminated by "s". Example: "3.5s". The default is 7 days.

restart_job_on_worker_restart: Optional. Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.

service_account: Optional. The service account that the workload runs as.

network: Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC.

worker_pool_specs: Optional. The worker pool specs for distributed training. If provided, this overrides all other cluster configuration arguments. For details, see https://cloud.google.com/ai-platform-unified/docs/training/distributed-training
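
The way replica_count is described above (one master replica, the rest in a worker pool) can be illustrated with a small sketch. This is not the library's implementation: sketch_worker_pool_specs is a hypothetical helper, the image URI is a placeholder, and the camelCase keys follow the Vertex AI REST WorkerPoolSpec resource. Whether run_as_vertex_ai_custom_job expects exactly this shape for worker_pool_specs is an assumption to check against the linked documentation.

```python
from typing import Any, Dict, List, Optional


def sketch_worker_pool_specs(
    image_uri: str,
    replica_count: int = 1,
    machine_type: str = "n1-standard-4",
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    boot_disk_type: str = "pd-ssd",
    boot_disk_size_gb: int = 100,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build a master pool plus an optional worker pool."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")

    # Machine settings shared by both pools; defaults mirror the documented ones.
    machine_spec: Dict[str, Any] = {"machineType": machine_type}
    if accelerator_type is not None:
        machine_spec["acceleratorType"] = accelerator_type
        # Assumption for this sketch: one accelerator if no count is given.
        machine_spec["acceleratorCount"] = accelerator_count or 1

    def pool(replicas: int) -> Dict[str, Any]:
        # Key names follow the REST WorkerPoolSpec resource (assumed shape).
        return {
            "machineSpec": dict(machine_spec),
            "replicaCount": replicas,
            "containerSpec": {"imageUri": image_uri},
            "diskSpec": {
                "bootDiskType": boot_disk_type,
                "bootDiskSizeGb": boot_disk_size_gb,
            },
        }

    # The master pool always has exactly one replica.
    specs = [pool(1)]
    # Any remaining replicas are placed in a second (worker) pool.
    if replica_count > 1:
        specs.append(pool(replica_count - 1))
    return specs


# Placeholder image URI, for illustration only.
specs = sketch_worker_pool_specs("gcr.io/example/trainer:latest", replica_count=4)
```

With replica_count=4, the sketch yields two pools: a master pool with 1 replica and a worker pool with 3.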

Returns:

A Custom Job component OP corresponding to the input component OP.
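
The timeout argument described under Args takes a duration string such as "3.5s". As a minimal sketch, assuming the value is built from a Python timedelta, the string could be produced as follows. to_duration_string is a hypothetical helper and is not part of google_cloud_pipeline_components.

```python
from datetime import timedelta


def to_duration_string(delta: timedelta) -> str:
    """Hypothetical helper: format a timedelta as seconds terminated by 's'."""
    total_seconds = delta.total_seconds()
    if total_seconds < 0:
        raise ValueError("timeout must not be negative")
    # Keep at most nine fractional digits, then drop trailing zeros
    # and a dangling decimal point.
    text = f"{total_seconds:.9f}".rstrip("0").rstrip(".")
    return text + "s"


short_timeout = to_duration_string(timedelta(seconds=3.5))
default_like_timeout = to_duration_string(timedelta(days=7))
```

The resulting string would then be passed as the timeout keyword argument, for example timeout=to_duration_string(timedelta(hours=2)).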