AutoML Forecasting¶
GA AutoML forecasting components.
Components:
- v1.automl.forecasting.ProphetTrainerOp: Trains and tunes one Prophet model per time series using Dataflow.
- v1.automl.forecasting.ProphetTrainerOp(project: str, location: str, root_dir: str, target_column: str, time_column: str, time_series_identifier_column: str, forecast_horizon: int, window_column: str, data_granularity_unit: str, predefined_split_column: str, source_bigquery_uri: str, gcp_resources: dsl.OutputPath(str), unmanaged_container_model: dsl.Output[google.UnmanagedContainerModel], evaluated_examples_directory: dsl.Output[system.Artifact], optimization_objective: str | None = 'rmse', max_num_trials: int | None = 6, encryption_spec_key_name: str | None = '', dataflow_max_num_workers: int | None = 10, dataflow_machine_type: str | None = 'n1-standard-1', dataflow_disk_size_gb: int | None = 40, dataflow_service_account: str | None = '', dataflow_subnetwork: str | None = '', dataflow_use_public_ips: bool | None = True)¶ Trains and tunes one Prophet model per time series using Dataflow.
- Parameters¶
- project: str¶
The GCP project that runs the pipeline components.
- location: str¶
The GCP region for Vertex AI.
- root_dir: str¶
The Cloud Storage location to store the output.
- time_column: str¶
Name of the column that identifies time order in the time series.
- time_series_identifier_column: str¶
Name of the column that identifies the time series.
- target_column: str¶
Name of the column that the model is to predict values for.
- forecast_horizon: int¶
The number of time periods into the future for which forecasts will be created. Future periods start after the latest timestamp for each time series.
- optimization_objective: str | None = 'rmse'¶
Optimization objective for tuning. Supported metrics come from Prophet’s performance_metrics function. These are mse, rmse, mae, mape, mdape, smape, and coverage.
- data_granularity_unit: str¶
String representing the units of time for the time column.
- predefined_split_column: str¶
The name of the predefined_split column: a string that represents a list of comma-separated CSV filenames.
- source_bigquery_uri: str¶
The BigQuery table path, in the format bq://bq_project.bq_dataset.bq_table.
- window_column: str¶
Name of the column that should be used to filter input rows. The column should contain either booleans or string booleans; if the value of the row is True, generate a sliding window from that row.
- max_num_trials: int | None = 6¶
Maximum number of tuning trials to perform per time series. There are up to 100 possible combinations to explore for each time series. Recommended values to try are 3, 6, and 24.
- encryption_spec_key_name: str | None = ''¶
Customer-managed encryption key.
- dataflow_machine_type: str | None = 'n1-standard-1'¶
The Dataflow machine type used for training.
- dataflow_max_num_workers: int | None = 10¶
The maximum number of Dataflow workers used for training.
- dataflow_disk_size_gb: int | None = 40¶
Dataflow worker’s disk size in GB during training.
- dataflow_service_account: str | None = ''¶
Custom service account to run Dataflow jobs.
- dataflow_subnetwork: str | None = ''¶
Dataflow’s fully qualified subnetwork name; when empty, the default subnetwork will be used.
- dataflow_use_public_ips: bool | None = True¶
Specifies whether Dataflow workers use public IP addresses.
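Since source_bigquery_uri must follow the bq://bq_project.bq_dataset.bq_table shape described above, a small stdlib-only sketch can illustrate building and sanity-checking such a URI (the helper name and the validation regex are assumptions for illustration, not part of this component's API):

```python
import re

# Hypothetical helper: builds a source_bigquery_uri in the
# bq://bq_project.bq_dataset.bq_table format this component expects.
def make_source_bigquery_uri(project: str, dataset: str, table: str) -> str:
    uri = f"bq://{project}.{dataset}.{table}"
    # Loose sanity check on the bq://project.dataset.table shape.
    if not re.fullmatch(r"bq://[\w-]+\.\w+\.\w+", uri):
        raise ValueError(f"malformed BigQuery URI: {uri}")
    return uri

print(make_source_bigquery_uri("my-project", "sales", "daily"))
# → bq://my-project.sales.daily
```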
- Returns¶
gcp_resources: dsl.OutputPath(str)
Serialized gcp_resources proto tracking the custom training job.
unmanaged_container_model: dsl.Output[google.UnmanagedContainerModel]
The UnmanagedContainerModel artifact.
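The gcp_resources output is a JSON-serialized proto tracking the launched jobs; a minimal sketch of reading it with only the standard library, assuming the usual proto3 JSON field names (the sample payload, including its resource URI, is an illustrative assumption, not real component output):

```python
import json

# Illustrative sample of a serialized gcp_resources payload; in a real
# pipeline this JSON is written by the component to the gcp_resources
# output path.
serialized = json.dumps({
    "resources": [
        {
            "resourceType": "DataflowJob",
            "resourceUri": "https://dataflow.googleapis.com/v1b3/projects/p/locations/us-central1/jobs/j",
        }
    ]
})

# Pick out the Dataflow job URI(s) so the training job can be monitored.
resources = json.loads(serialized)["resources"]
dataflow_uris = [
    r["resourceUri"] for r in resources if r.get("resourceType") == "DataflowJob"
]
print(dataflow_uris[0])
```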