google_cloud_pipeline_components.v1.dataflow module

Google Cloud Pipeline Dataflow component.

google_cloud_pipeline_components.v1.dataflow.DataflowPythonJobOp(project: str, python_module_path: str, temp_location: str, gcp_resources: OutputPath(str), location: str = 'us-central1', requirements_file_path: str = '', args: List[str] = [])

Launch a self-executing Apache Beam Python file on Google Cloud using the DataflowRunner. A minimal pipeline sketch is shown below.
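A minimal sketch of wiring this component into a KFP pipeline. The project ID and GCS paths below are placeholders, not values taken from this page; the Beam file referenced by python_module_path must itself call pipeline.run() (it is self-executing).

```python
from kfp import dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp


@dsl.pipeline(name="dataflow-python-job-example")
def pipeline(project: str = "my-project"):
    # Launches the Beam file on Dataflow; the task's gcp_resources output
    # tracks the resulting Dataflow job.
    dataflow_task = DataflowPythonJobOp(
        project=project,
        location="us-central1",
        python_module_path="gs://my-bucket/wordcount.py",
        temp_location="gs://my-bucket/temp",
        requirements_file_path="gs://my-bucket/requirements.txt",
        # Additional flags are forwarded to the Beam pipeline / DataflowRunner.
        args=["--output", "gs://my-bucket/wordcount/output"],
    )
```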

Args:
project (str):

Required. Project to create the Dataflow job in.

location (Optional[str]):

Location for creating the Dataflow job. If not set, defaults to us-central1.

python_module_path (str):

The GCS path to the Python file to run.

temp_location (str):

A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.

requirements_file_path (Optional[str]):

The GCS path to the pip requirements file.

args (Optional[List[str]]):

The list of args to pass to the Python file. Can include additional parameters for the Beam runner.

Returns:
gcp_resources (str):

Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
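A hedged example of consuming the gcp_resources output downstream: since this component launches the Dataflow job and returns, the WaitGcpResourcesOp component (google_cloud_pipeline_components.v1.wait_gcp_resources) can take the serialized proto and block the pipeline until the job reaches a terminal state. Verify the import path against your installed version of google-cloud-pipeline-components.

```python
from kfp import dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp


@dsl.pipeline(name="dataflow-wait-example")
def pipeline(project: str = "my-project"):
    dataflow_task = DataflowPythonJobOp(
        project=project,
        python_module_path="gs://my-bucket/main.py",
        temp_location="gs://my-bucket/temp",
    )
    # Waits on the Dataflow job referenced by the serialized gcp_resources proto.
    WaitGcpResourcesOp(gcp_resources=dataflow_task.outputs["gcp_resources"])
```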