google_cloud_pipeline_components.experimental.custom_job.utils module
Module for supporting Google Vertex AI Custom Training Job Op.
- google_cloud_pipeline_components.experimental.custom_job.utils.create_custom_training_job_op_from_component(component_spec: Callable, display_name: Optional[str] = '', replica_count: Optional[int] = 1, machine_type: Optional[str] = 'n1-standard-4', accelerator_type: Optional[str] = '', accelerator_count: Optional[int] = 1, boot_disk_type: Optional[str] = 'pd-ssd', boot_disk_size_gb: Optional[int] = 100, timeout: Optional[str] = '', restart_job_on_worker_restart: Optional[bool] = False, service_account: Optional[str] = '', network: Optional[str] = '', encryption_spec_key_name: Optional[str] = '', tensorboard: Optional[str] = '', base_output_directory: Optional[str] = '', labels: Optional[Dict[str, str]] = None) → Callable
Create a component spec that runs a custom training job in Vertex AI.
This utility converts a given component into a CustomTrainingJobOp that runs a custom training job in Vertex AI, which simplifies the creation of custom training jobs. All inputs and outputs of the supplied component are copied over to the constructed training job.
Note that this utility constructs a ClusterSpec in which the master and all the workers use the same spec, so all disk and machine spec parameters apply to every replica. This is suitable for use cases such as training with MultiWorkerMirroredStrategy or MirroredStrategy.
This utility does not support Vertex AI Python training applications.
For more details on Vertex AI Training service, please refer to https://cloud.google.com/vertex-ai/docs/training/create-custom-job
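As an illustration of the shared-spec cluster layout described above, the following sketch shows how a replica count could be split into a one-replica master pool and a worker pool that reuse the same machine and disk settings. It is not the library's implementation: `build_worker_pool_specs` is a hypothetical helper, the plain dictionaries are a simplified stand-in for the real job spec, and the accelerator value used in the usage note is only an example.

```python
from typing import Any, Dict, List


def build_worker_pool_specs(
    replica_count: int = 1,
    machine_type: str = "n1-standard-4",
    accelerator_type: str = "",
    accelerator_count: int = 1,
    boot_disk_type: str = "pd-ssd",
    boot_disk_size_gb: int = 100,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: split replicas into a master pool and a worker pool."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")

    machine_spec: Dict[str, Any] = {"machine_type": machine_type}
    # Accelerator fields are attached only when an accelerator type is given.
    if accelerator_type:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count

    def pool(count: int) -> Dict[str, Any]:
        # Every pool gets its own copy of the same machine and disk settings.
        return {
            "replica_count": count,
            "machine_spec": dict(machine_spec),
            "disk_spec": {
                "boot_disk_type": boot_disk_type,
                "boot_disk_size_gb": boot_disk_size_gb,
            },
        }

    specs = [pool(1)]  # First pool: exactly one replica, the master.
    if replica_count > 1:
        specs.append(pool(replica_count - 1))  # Second pool: remaining replicas.
    return specs
```

With `replica_count=4`, this sketch yields two pools holding 1 and 3 replicas, both with identical machine and disk specs. With the default `replica_count=1`, only the master pool is produced.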
- Args:
- component_spec: The task (ContainerOp) object to run as a Vertex AI custom job.
- display_name (Optional[str]): The name of the custom job. If not provided, component_spec.name is used instead.
- replica_count (Optional[int]): The number of instances in the cluster. One replica always counts towards the master in worker_pool_spec[0], and the remaining replicas are allocated in worker_pool_spec[1]. For more details, see https://cloud.google.com/vertex-ai/docs/training/distributed-training#configure_a_distributed_training_job.
- machine_type (Optional[str]): The type of machine on which to run the custom job. The default value is “n1-standard-4”. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types.
- accelerator_type (Optional[str]): The type of accelerator(s) that may be attached to the machine, in the quantity given by accelerator_count. For more details about this input config, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec#acceleratortype.
- accelerator_count (Optional[int]): The number of accelerators to attach to the machine. Defaults to 1 if accelerator_type is set.
- boot_disk_type (Optional[str]): Type of the boot disk (default is “pd-ssd”). Valid values: “pd-ssd” (Persistent Disk Solid State Drive) or “pd-standard” (Persistent Disk Hard Disk Drive).
- boot_disk_size_gb (Optional[int]): Size of the boot disk in GB (default is 100).
- timeout (Optional[str]): The maximum job running time, given as a duration in seconds with up to nine fractional digits, terminated by ‘s’, for example: “3.5s”. The default is 7 days.
- restart_job_on_worker_restart (Optional[bool]): Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
- service_account (Optional[str]): Sets the default service account to use as the workload run-as account. The service account running the pipeline (https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#service-account) that submits jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob’s project is used.
- network (Optional[str]): The full name of the Compute Engine network to which the job should be peered, for example projects/12345/global/networks/myVPC. The format is projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
- encryption_spec_key_name (Optional[str]): Customer-managed encryption key options for the CustomJob. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key.
- tensorboard (Optional[str]): The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs.
- base_output_directory (Optional[str]): The Cloud Storage location to store the output of this CustomJob or HyperparameterTuningJob. For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/GcsDestination
- labels (Optional[Dict[str, str]]): The labels with user-defined metadata to organize CustomJobs. See https://goo.gl/xmQnxf for more information.
- Returns:
A Custom Job component operator corresponding to the input component operator.
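The timeout and network arguments are plain strings with specific formats, so it can help to build them programmatically. The sketch below is a minimal illustration using only the standard library. `format_timeout`, `format_network`, and `is_valid_network` are hypothetical helpers, not part of this module, and the digit-only check on the project number is an assumption drawn from the format described above.

```python
import re
from datetime import timedelta

# Assumed pattern: a numeric project number followed by a network name.
_NETWORK_PATTERN = re.compile(r"projects/(\d+)/global/networks/([^/]+)")


def format_timeout(duration: timedelta) -> str:
    """Render a timedelta as seconds terminated by 's', e.g. '3.5s'."""
    total_microseconds = duration // timedelta(microseconds=1)
    if total_microseconds < 0:
        raise ValueError("timeout must not be negative")
    seconds, micros = divmod(total_microseconds, 1_000_000)
    if micros == 0:
        return f"{seconds}s"
    # Drop trailing zeros so that 3.500000 is rendered as 3.5.
    fraction = f"{micros:06d}".rstrip("0")
    return f"{seconds}.{fraction}s"


def format_network(project_number: int, network: str) -> str:
    """Build a full network name from a project number and a network name."""
    return f"projects/{project_number}/global/networks/{network}"


def is_valid_network(name: str) -> bool:
    """Check that a string matches projects/{project}/global/networks/{network}."""
    return _NETWORK_PATTERN.fullmatch(name) is not None
```

For instance, `format_timeout(timedelta(days=7))` spells out the documented 7-day default as an explicit string, and `format_network(12345, "myVPC")` reproduces the example network name given for the network argument.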