google_cloud_pipeline_components.experimental.automl.forecasting package
Submodules
google_cloud_pipeline_components.experimental.automl.forecasting.utils module
Util functions for Vertex Forecasting pipelines.
- google_cloud_pipeline_components.experimental.automl.forecasting.utils.get_bqml_arima_predict_pipeline_and_parameters(project: str, location: str, model_name: str, data_source: Dict[str, Dict[str, Union[List[str], str]]], bigquery_destination_uri: str = '', generate_explanation: bool = False) → Tuple[str, Dict[str, Any]]
Get the BQML ARIMA_PLUS prediction pipeline.
- Args:
- project: The GCP project that runs the pipeline components.
- location: The GCP region that runs the pipeline components.
- model_name: ARIMA_PLUS BQML model URI.
- data_source: Serialized JSON with the URI of the BigQuery table or GCS CSV files containing the input data, provided as a JSON object in one of the following forms:
  {"big_query_data_source": {"big_query_table_path": "bq://[PROJECT].[DATASET].[TABLE]"}}
  or
  {"csv_data_source": {"csv_filenames": [[GCS_PATHS]]}}
- bigquery_destination_uri: URI of the desired destination dataset. If not specified, a resource will be created under a new dataset in the project.
- generate_explanation: Whether to generate explanations along with the batch prediction results; if True, the batch prediction output will include explanations.
- Returns:
  Tuple of pipeline_definition_path and parameter_values.
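As a sketch, the data_source argument described above can be assembled as a plain Python dict before being passed to the function. The project, dataset, table, and model names below are hypothetical examples, and the call itself is shown commented out because it requires the google-cloud-pipeline-components package to be installed:

```python
# Sketch of assembling arguments for the BQML ARIMA_PLUS prediction pipeline.
# All resource names here are hypothetical examples.
data_source = {
    "big_query_data_source": {
        "big_query_table_path": "bq://my-project.my_dataset.input_table",
    }
}

# With google-cloud-pipeline-components installed, the call would look like:
# from google_cloud_pipeline_components.experimental.automl.forecasting import utils
# pipeline_definition_path, parameter_values = (
#     utils.get_bqml_arima_predict_pipeline_and_parameters(
#         project="my-project",
#         location="us-central1",
#         model_name="bq://my-project.my_dataset.arima_model",
#         data_source=data_source,
#         generate_explanation=True,
#     )
# )
print(data_source["big_query_data_source"]["big_query_table_path"])
```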
- google_cloud_pipeline_components.experimental.automl.forecasting.utils.get_bqml_arima_train_pipeline_and_parameters(project: str, location: str, time_column: str, time_series_identifier_column: str, target_column_name: str, forecast_horizon: int, data_granularity_unit: str, data_source: Dict[str, Dict[str, Union[List[str], str]]], split_spec: Optional[Dict[str, Dict[str, Union[str, float]]]] = None, window_config: Optional[Dict[str, Union[int, str]]] = None, bigquery_destination_uri: str = '', override_destination: bool = False, max_order: int = 5) → Tuple[str, Dict[str, Any]]
Get the BQML ARIMA_PLUS training pipeline.
- Args:
- project: The GCP project that runs the pipeline components.
- location: The GCP region that runs the pipeline components.
- time_column: Name of the column that identifies time order in the time series.
- time_series_identifier_column: Name of the column that identifies the time series.
- target_column_name: Name of the column that the model is to predict values for.
- forecast_horizon: The number of time periods into the future for which forecasts will be created. Future periods start after the latest timestamp for each time series.
- data_granularity_unit: The data granularity unit. Accepted values are: minute, hour, day, week, month, year.
- data_source: Serialized JSON with the URI of the BigQuery table or GCS CSV files containing the training data, provided as a JSON object in one of the following forms:
  {"big_query_data_source": {"big_query_table_path": "bq://[PROJECT].[DATASET].[TABLE]"}}
  or
  {"csv_data_source": {"csv_filenames": [[GCS_PATHS]]}}
- split_spec: Serialized JSON with the name of the column indicating which dataset split each row belongs to. Valid values in this column are TRAIN, VALIDATE, and TEST. Provided as a JSON object in one of the following forms:
  {"predefined_split": {"key": "[SPLIT_COLUMN]"}}
  or
  {"fraction_split": {"training_fraction": 0.8, "validation_fraction": 0.1, "test_fraction": 0.1}}
- window_config: Serialized JSON that configures how many evaluation windows will be created from the test set. Valid inputs are:
  {"column": "[COLUMN]"} or {"stride_length": [STRIDE]} or {"max_count": [COUNT]}
- bigquery_destination_uri: URI of the desired destination dataset. If not
specified, resources will be created under a new dataset in the project. Unlike in Vertex Forecasting, all resources will be given hardcoded names under this dataset, and the model artifact will also be exported here.
- override_destination: Whether to override a
model or table if it already exists. If False and the resource exists, the training job will fail.
- max_order: Integer between 1 and 5 representing the size of the parameter
search space for ARIMA_PLUS. 5 would result in the highest accuracy model, but also the longest training runtime.
- Returns:
  Tuple of pipeline_definition_path and parameter_values.
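The serialized-JSON arguments above (data_source, split_spec, window_config) can likewise be sketched as plain Python dicts. The bucket, column, and project names below are hypothetical examples, and the training call is shown commented out because it requires the google-cloud-pipeline-components package to be installed:

```python
# Sketch of the serialized-JSON arguments for the BQML ARIMA_PLUS training
# pipeline; all names here are hypothetical examples.
data_source = {
    "csv_data_source": {
        "csv_filenames": ["gs://my-bucket/sales_train.csv"],
    }
}
split_spec = {
    "fraction_split": {
        "training_fraction": 0.8,
        "validation_fraction": 0.1,
        "test_fraction": 0.1,
    }
}
window_config = {"max_count": 10}  # create at most 10 evaluation windows

# Sanity-check that the split fractions cover the whole dataset.
assert abs(sum(split_spec["fraction_split"].values()) - 1.0) < 1e-9

# With google-cloud-pipeline-components installed, the call would look like:
# from google_cloud_pipeline_components.experimental.automl.forecasting import utils
# pipeline_definition_path, parameter_values = (
#     utils.get_bqml_arima_train_pipeline_and_parameters(
#         project="my-project",
#         location="us-central1",
#         time_column="date",
#         time_series_identifier_column="store_id",
#         target_column_name="sales",
#         forecast_horizon=30,
#         data_granularity_unit="day",
#         data_source=data_source,
#         split_spec=split_spec,
#         window_config=window_config,
#         max_order=5,
#     )
# )
```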