Dataflow¶
Create Google Cloud Dataflow jobs from within Vertex AI Pipelines.
Components:
| DataflowPythonJobOp | Launch a self-executing Beam Python file on Google Cloud using the Dataflow Runner. |
- v1.dataflow.DataflowPythonJobOp(project: str, python_module_path: str, temp_location: str, gcp_resources: dsl.OutputPath(str), location: str = 'us-central1', requirements_file_path: str = '', args: list[str] = [])¶

  Launch a self-executing Beam Python file on Google Cloud using the Dataflow Runner.
- Parameters¶
- project: str¶
  Project in which to create the Dataflow job.
- location: str = 'us-central1'¶
  Location of the Dataflow job. If not set, defaults to 'us-central1'.
- python_module_path: str¶
  The GCS path to the Python file to run.
- temp_location: str¶
  A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
- requirements_file_path: str = ''¶
  The GCS path to the pip requirements file.
- args: list[str] = []¶
  The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
- Returns¶
gcp_resources: dsl.OutputPath(str)
Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
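A minimal usage sketch is shown below, assuming the component is invoked from a KFP pipeline; the project, bucket paths, pipeline name, and Beam arguments are placeholders. The launch op returns once the Dataflow job is created, so pipelines that need to block on job completion typically pass the gcp_resources output to the companion WaitGcpResourcesOp component.

```python
# Sketch only: project, bucket paths, and Beam args below are placeholders.
from kfp import dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp


@dsl.pipeline(name="dataflow-python-job-example")
def beam_pipeline(project: str = "my-project"):
    # Launch the self-executing Beam Python file as a Dataflow job.
    dataflow_task = DataflowPythonJobOp(
        project=project,
        location="us-central1",
        python_module_path="gs://my-bucket/wordcount.py",
        temp_location="gs://my-bucket/temp",
        requirements_file_path="gs://my-bucket/requirements.txt",
        args=["--output", "gs://my-bucket/output/results"],
    )
    # Block until the launched Dataflow job finishes by waiting on the
    # serialized gcp_resources proto emitted by the launch op.
    WaitGcpResourcesOp(gcp_resources=dataflow_task.outputs["gcp_resources"])
```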