AutoML Forecasting

GA AutoML forecasting components.

Components:

ProphetTrainerOp(project, location, ...[, ...])

Trains and tunes one Prophet model per time series using Dataflow.

v1.automl.forecasting.ProphetTrainerOp(
    project: str,
    location: str,
    root_dir: str,
    target_column: str,
    time_column: str,
    time_series_identifier_column: str,
    forecast_horizon: int,
    window_column: str,
    data_granularity_unit: str,
    predefined_split_column: str,
    source_bigquery_uri: str,
    gcp_resources: dsl.OutputPath(str),
    unmanaged_container_model: dsl.Output[google.UnmanagedContainerModel],
    evaluated_examples_directory: dsl.Output[system.Artifact],
    optimization_objective: str | None = 'rmse',
    max_num_trials: int | None = 6,
    encryption_spec_key_name: str | None = '',
    dataflow_max_num_workers: int | None = 10,
    dataflow_machine_type: str | None = 'n1-standard-1',
    dataflow_disk_size_gb: int | None = 40,
    dataflow_service_account: str | None = '',
    dataflow_subnetwork: str | None = '',
    dataflow_use_public_ips: bool | None = True
)

Trains and tunes one Prophet model per time series using Dataflow.

Parameters
project: str

The GCP project that runs the pipeline components.

location: str

The GCP region for Vertex AI.

root_dir: str

The Cloud Storage location to store the output.

time_column: str

Name of the column that identifies time order in the time series.

time_series_identifier_column: str

Name of the column that identifies the time series.

target_column: str

Name of the column that the model is to predict values for.

forecast_horizon: int

The number of time periods into the future for which forecasts will be created. Future periods start after the latest timestamp for each time series.

optimization_objective: str | None = 'rmse'

Optimization objective for tuning. Supported metrics come from Prophet's performance_metrics function: mse, rmse, mae, mape, mdape, smape, and coverage (see the sketch after the parameter list).

data_granularity_unit: str

String representing the units of time for the time column.

predefined_split_column: str

Name of the column that assigns each row to a predefined data split (for example, training, validation, or test).

source_bigquery_uri: str

The BigQuery table path, in the format bq://bq_project.bq_dataset.bq_table.

window_column: str

Name of the column used to filter input rows. The column should contain booleans or boolean strings; when a row's value is True, a sliding window is generated starting from that row.

max_num_trials: int | None = 6

Maximum number of tuning trials to perform per time series. There are up to 100 possible combinations to explore for each time series. Recommended values to try are 3, 6, and 24.

encryption_spec_key_name: str | None = ''

Customer-managed encryption key.

dataflow_machine_type: str | None = 'n1-standard-1'

The Dataflow machine type used for training.

dataflow_max_num_workers: int | None = 10

The maximum number of Dataflow workers used for training.

dataflow_disk_size_gb: int | None = 40

Dataflow worker’s disk size in GB during training.

dataflow_service_account: str | None = ''

Custom service account used to run Dataflow jobs.

dataflow_subnetwork: str | None = ''

Dataflow's fully qualified subnetwork name; when empty, the default subnetwork is used.

dataflow_use_public_ips: bool | None = True

Specifies whether Dataflow workers use public IP addresses.
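
The objective metrics above correspond to the columns returned by Prophet's own diagnostics utilities. As a minimal sketch of where they come from, assuming the open-source prophet package (the synthetic data frame and cross-validation window sizes below are illustrative choices, not part of this component):

    import numpy as np
    import pandas as pd
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics

    # Synthetic daily series in Prophet's expected schema: ds (timestamp), y (value).
    df = pd.DataFrame({
        "ds": pd.date_range("2022-01-01", periods=730, freq="D"),
        "y": np.random.default_rng(0).normal(loc=100.0, scale=10.0, size=730),
    })

    model = Prophet()
    model.fit(df)

    # Rolling-origin evaluation; the window sizes here are arbitrary.
    df_cv = cross_validation(model, initial="365 days", period="90 days", horizon="30 days")

    # performance_metrics yields the columns the trainer can tune against,
    # e.g. mse, rmse, mae, mape, mdape, smape, coverage.
    print(performance_metrics(df_cv, metrics=["rmse", "mae", "mape"]))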

Returns

gcp_resources: dsl.OutputPath(str)

Serialized gcp_resources proto tracking the custom training job.

unmanaged_container_model: dsl.Output[google.UnmanagedContainerModel]

The UnmanagedContainerModel artifact.

evaluated_examples_directory: dsl.Output[system.Artifact]

The directory containing evaluated examples.
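
As a minimal sketch of wiring the component into a KFP pipeline: the import path below follows the google-cloud-pipeline-components package layout, and every argument value (project, bucket, table, and column names) is a hypothetical placeholder.

    from kfp import compiler, dsl
    from google_cloud_pipeline_components.v1.automl.forecasting import ProphetTrainerOp

    @dsl.pipeline(name="prophet-trainer-example")
    def prophet_pipeline():
        # All argument values are placeholders for illustration.
        ProphetTrainerOp(
            project="my-project",                              # hypothetical project ID
            location="us-central1",
            root_dir="gs://my-bucket/prophet",                 # hypothetical GCS path
            target_column="sales",
            time_column="date",
            time_series_identifier_column="store_id",
            forecast_horizon=30,
            window_column="window",
            data_granularity_unit="day",
            predefined_split_column="split",
            source_bigquery_uri="bq://my-project.my_dataset.sales",
            optimization_objective="rmse",
            max_num_trials=6,
        )

    compiler.Compiler().compile(prophet_pipeline, "prophet_pipeline.json")

The component's outputs (gcp_resources, unmanaged_container_model, evaluated_examples_directory) are produced automatically and can be consumed by downstream pipeline steps.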