Dataflow

Create Google Cloud Dataflow jobs from within Vertex AI Pipelines.

Components:

DataflowPythonJobOp(python_module_path, ...)

Launch a self-executing Beam Python file on Google Cloud using the Dataflow Runner.

v1.dataflow.DataflowPythonJobOp(python_module_path: str, temp_location: str, gcp_resources: dsl.OutputPath(str), location: str = 'us-central1', requirements_file_path: str = '', args: list[str] = [], project: str = '{{$.pipeline_google_cloud_project_id}}')

Launch a self-executing Beam Python file on Google Cloud using the Dataflow Runner.

Parameters
location: str = 'us-central1'
    Location of the Dataflow job. If not set, defaults to 'us-central1'.
python_module_path: str
    The GCS path to the Python file to run.
temp_location: str
    A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
requirements_file_path: str = ''
    The GCS path to the pip requirements file.
args: list[str] = []
    The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
project: str = '{{$.pipeline_google_cloud_project_id}}'
    Project to create the Dataflow job. Defaults to the project in which the PipelineJob is run.

Returns

gcp_resources: dsl.OutputPath(str)
    Serialized gcp_resources proto tracking the Dataflow job. For more details, see
    https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
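Example

A minimal sketch of launching the component from a KFP pipeline. The project ID, GCS bucket, and Beam file paths (gs://my-bucket/...) are placeholders for your own resources, and the WaitGcpResourcesOp step assumes your installed google-cloud-pipeline-components version exposes it under v1.wait_gcp_resources.

from kfp import compiler, dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp


@dsl.pipeline(name="dataflow-python-job-example")
def dataflow_pipeline(
    project: str = "my-project",  # placeholder project ID
    location: str = "us-central1",
):
    # Launch the self-executing Beam Python file on the Dataflow Runner.
    # The component returns once the job is created; the gcp_resources
    # output tracks the launched Dataflow job.
    dataflow_task = DataflowPythonJobOp(
        project=project,
        location=location,
        python_module_path="gs://my-bucket/beam/wordcount.py",         # placeholder GCS paths
        temp_location="gs://my-bucket/dataflow/tmp",
        requirements_file_path="gs://my-bucket/beam/requirements.txt",
        args=["--output", "gs://my-bucket/beam/output"],
    )

    # Optionally block downstream steps until the Dataflow job finishes.
    WaitGcpResourcesOp(gcp_resources=dataflow_task.outputs["gcp_resources"])


if __name__ == "__main__":
    compiler.Compiler().compile(
        pipeline_func=dataflow_pipeline,
        package_path="dataflow_pipeline.yaml",
    )

Because DataflowPythonJobOp only launches the job, chaining a wait step on the gcp_resources output is how a pipeline blocks until the Dataflow job completes before running dependent tasks.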