AutoML Forecasting¶
Experimental AutoML forecasting components.
Components:
- ForecastingEnsembleOp: Ensembles AutoML Forecasting models.
- ForecastingStage1TunerOp: Searches AutoML Forecasting architectures and selects the top trials.
- ForecastingStage2TunerOp: Tunes AutoML Forecasting models and selects top trials.
Functions:
- get_learn_to_learn_forecasting_pipeline_and_parameters: Returns l2l_forecasting pipeline and formatted parameters.
- get_sequence_to_sequence_forecasting_pipeline_and_parameters: Returns seq2seq forecasting pipeline and formatted parameters.
- get_temporal_fusion_transformer_forecasting_pipeline_and_parameters: Returns tft_forecasting pipeline and formatted parameters.
- get_time_series_dense_encoder_forecasting_pipeline_and_parameters: Returns timeseries_dense_encoder_forecasting pipeline and parameters.
-
preview.automl.forecasting.ForecastingEnsembleOp(project: str, location: str, root_dir: str, transform_output: dsl.Input[system.Artifact], metadata: dsl.Input[system.Artifact], tuning_result_input: dsl.Input[system.Artifact], instance_baseline: dsl.Input[system.Artifact], instance_schema_path: dsl.Input[system.Artifact], prediction_image_uri: str, gcp_resources: dsl.OutputPath(str), model_architecture: dsl.Output[system.Artifact], example_instance: dsl.Output[system.Artifact], unmanaged_container_model: dsl.Output[google.UnmanagedContainerModel], explanation_metadata: dsl.OutputPath(dict), explanation_metadata_artifact: dsl.Output[system.Artifact], explanation_parameters: dsl.OutputPath(dict), encryption_spec_key_name: str | None =
''
)¶ Ensembles AutoML Forecasting models.
- Parameters¶:
- project: str¶
Project to run the job in.
- location: str¶
Region to run the job in.
- root_dir: str¶
The Cloud Storage path to store the output.
- transform_output: dsl.Input[system.Artifact]¶
The transform output artifact.
- metadata: dsl.Input[system.Artifact]¶
The tabular example gen metadata.
- tuning_result_input: dsl.Input[system.Artifact]¶
AutoML Tabular tuning result.
- instance_baseline: dsl.Input[system.Artifact]¶
The instance baseline used to calculate explanations.
- instance_schema_path: dsl.Input[system.Artifact]¶
The path to the instance schema, describing the input data for the tf_model at serving time.
- encryption_spec_key_name: str | None =
''
¶ Customer-managed encryption key.
- prediction_image_uri: str¶
URI of the Docker image to be used as the container for serving predictions. This URI must identify an image in Artifact Registry or Container Registry.
- Returns¶:
gcp_resources: dsl.OutputPath(str)
GCP resources created by this component. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
model_architecture: dsl.Output[system.Artifact]
The architecture of the output model.
unmanaged_container_model: dsl.Output[google.UnmanagedContainerModel]
Model information needed to perform batch prediction.
explanation_metadata: dsl.OutputPath(dict)
The explanation metadata used by Vertex online and batch explanations.
explanation_metadata_artifact: dsl.Output[system.Artifact]
The explanation metadata used by Vertex online and batch explanations in the format of a KFP Artifact.
explanation_parameters: dsl.OutputPath(dict)
The explanation parameters used by Vertex online and batch explanations.
example_instance: dsl.Output[system.Artifact]
An example instance which may be used as an input for predictions.
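Example (illustrative): a minimal sketch of calling ForecastingEnsembleOp inside a KFP pipeline, assuming the standard google_cloud_pipeline_components import path. In a real pipeline the artifact inputs come from upstream components (feature transform engine, example gen, stage tuners); here they are imported from placeholder GCS URIs, and the prediction image URI is a placeholder as well.

from kfp import dsl
from google_cloud_pipeline_components.preview.automl import forecasting


@dsl.pipeline(name="forecasting-ensemble-sketch")
def ensemble_pipeline(project: str, location: str, root_dir: str):
    # Placeholder artifacts; real pipelines wire these from upstream tasks.
    def load(uri: str):
        return dsl.importer(artifact_uri=uri, artifact_class=dsl.Artifact).output

    forecasting.ForecastingEnsembleOp(
        project=project,
        location=location,
        root_dir=root_dir,
        transform_output=load("gs://my-bucket/transform_output"),
        metadata=load("gs://my-bucket/metadata"),
        tuning_result_input=load("gs://my-bucket/tuning_result"),
        instance_baseline=load("gs://my-bucket/instance_baseline"),
        instance_schema_path=load("gs://my-bucket/instance_schema"),
        # Placeholder; the AutoML training pipelines pass the appropriate
        # forecasting prediction server image here.
        prediction_image_uri="us-docker.pkg.dev/my-project/my-repo/forecasting-server:latest",
    )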
-
preview.automl.forecasting.ForecastingStage1TunerOp(project: str, location: str, root_dir: str, num_selected_trials: int, deadline_hours: float, num_parallel_trials: int, single_run_max_secs: int, metadata: dsl.Input[system.Artifact], transform_output: dsl.Input[system.Artifact], materialized_train_split: dsl.Input[system.Artifact], materialized_eval_split: dsl.Input[system.Artifact], gcp_resources: dsl.OutputPath(str), tuning_result_output: dsl.Output[system.Artifact], study_spec_parameters_override: list | None =
[]
, worker_pool_specs_override_json: list | None =[]
, reduce_search_space_mode: str | None ='regular'
, encryption_spec_key_name: str | None =''
)¶ Searches AutoML Forecasting architectures and selects the top trials.
- Parameters¶:
- project: str¶
Project to run hyperparameter tuning.
- location: str¶
Location for running the hyperparameter tuning.
- root_dir: str¶
The Cloud Storage location to store the output.
- study_spec_parameters_override: list | None =
[]
¶ JSON study spec. E.g., [{"parameter_id": "activation", "categorical_value_spec": {"values": ["tanh"]}}]
- worker_pool_specs_override_json: list | None =
[]
¶ JSON worker pool specs. E.g., [{"machine_spec": {"machine_type": "n1-standard-16"}}, {}, {}, {"machine_spec": {"machine_type": "n1-standard-16"}}]
- reduce_search_space_mode: str | None =
'regular'
¶ The reduce search space mode. Possible values: “regular” (default), “minimal”, “full”.
- num_selected_trials: int¶
Number of selected trials. The number of weak learners in the final model is 5 * num_selected_trials.
- deadline_hours: float¶
Number of hours the hyperparameter tuning should run.
- num_parallel_trials: int¶
Number of parallel training trials.
- single_run_max_secs: int¶
Max number of seconds each training trial runs.
- metadata: dsl.Input[system.Artifact]¶
The tabular example gen metadata.
- transform_output: dsl.Input[system.Artifact]¶
The transform output artifact.
- materialized_train_split: dsl.Input[system.Artifact]¶
The materialized train split.
- materialized_eval_split: dsl.Input[system.Artifact]¶
The materialized eval split.
- encryption_spec_key_name: str | None =
''
¶ Customer-managed encryption key.
- Returns¶:
gcp_resources: dsl.OutputPath(str)
GCP resources created by this component. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
tuning_result_output: dsl.Output[system.Artifact]
The trained model and architectures.
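Example (illustrative): the override parameters above are plain JSON-style lists; below is a sketch of well-formed values matching the examples in the parameter descriptions. The machine types and search-space values are placeholders.

# Override the architecture search space: restrict the activation function.
study_spec_parameters_override = [
    {"parameter_id": "activation", "categorical_value_spec": {"values": ["tanh"]}},
]

# Override worker pools: one entry per pool; empty dicts leave a pool unchanged.
worker_pool_specs_override_json = [
    {"machine_spec": {"machine_type": "n1-standard-16"}},
    {},
    {},
    {"machine_spec": {"machine_type": "n1-standard-16"}},
]

# These values would be passed directly to the component, e.g.
#   forecasting.ForecastingStage1TunerOp(
#       ...,
#       study_spec_parameters_override=study_spec_parameters_override,
#       worker_pool_specs_override_json=worker_pool_specs_override_json,
#   )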
-
preview.automl.forecasting.ForecastingStage2TunerOp(project: str, location: str, root_dir: str, num_selected_trials: int, deadline_hours: float, num_parallel_trials: int, single_run_max_secs: int, metadata: dsl.Input[system.Artifact], transform_output: dsl.Input[system.Artifact], materialized_train_split: dsl.Input[system.Artifact], materialized_eval_split: dsl.Input[system.Artifact], tuning_result_input_path: dsl.Input[system.Artifact], gcp_resources: dsl.OutputPath(str), tuning_result_output: dsl.Output[system.Artifact], worker_pool_specs_override_json: list | None =
[]
, encryption_spec_key_name: str | None =''
)¶ Tunes AutoML Forecasting models and selects top trials.
- Parameters¶:
- project: str¶
Project to run stage 2 tuner.
- location: str¶
Cloud region for running the component (e.g., us-central1).
- root_dir: str¶
The Cloud Storage location to store the output.
- worker_pool_specs_override_json: list | None =
[]
¶ JSON worker pool specs. E.g., [{"machine_spec": {"machine_type": "n1-standard-16"}}, {}, {}, {"machine_spec": {"machine_type": "n1-standard-16"}}]
- num_selected_trials: int¶
Number of selected trials; this is the number of weak learners in the final model.
- deadline_hours: float¶
Number of hours the cross-validation trainer should run.
- num_parallel_trials: int¶
Number of parallel training trials.
- single_run_max_secs: int¶
Max number of seconds each training trial runs.
- metadata: dsl.Input[system.Artifact]¶
The forecasting example gen metadata.
- transform_output: dsl.Input[system.Artifact]¶
The transform output artifact.
- materialized_train_split: dsl.Input[system.Artifact]¶
The materialized train split.
- materialized_eval_split: dsl.Input[system.Artifact]¶
The materialized eval split.
- encryption_spec_key_name: str | None =
''
¶ Customer-managed encryption key.
- tuning_result_input_path: dsl.Input[system.Artifact]¶
Path to the json of hyperparameter tuning results to use when evaluating models.
- Returns¶:
gcp_resources: dsl.OutputPath(str)
GCP resources created by this component. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
tuning_result_output: dsl.Output[system.Artifact]
The trained (private) model artifact paths and their hyperparameters.
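Example (illustrative): a sketch of chaining the two tuners, showing the stage 1 tuner's tuning_result_output feeding the stage 2 tuner's tuning_result_input_path. The import path is the standard google_cloud_pipeline_components one; the GCS URIs, trial counts, and deadlines are placeholders.

from kfp import dsl
from google_cloud_pipeline_components.preview.automl import forecasting


@dsl.pipeline(name="forecasting-two-stage-tuning-sketch")
def tuning_pipeline(project: str, location: str, root_dir: str):
    # Placeholder artifacts; normally produced by the feature transform
    # engine / example gen components earlier in the pipeline.
    def load(uri: str):
        return dsl.importer(artifact_uri=uri, artifact_class=dsl.Artifact).output

    # Both tuners take the same core inputs in this sketch.
    common = dict(
        project=project,
        location=location,
        root_dir=root_dir,
        num_selected_trials=5,
        deadline_hours=1.0,
        num_parallel_trials=5,
        single_run_max_secs=3600,
        metadata=load("gs://my-bucket/metadata"),
        transform_output=load("gs://my-bucket/transform_output"),
        materialized_train_split=load("gs://my-bucket/train_split"),
        materialized_eval_split=load("gs://my-bucket/eval_split"),
    )

    stage_1 = forecasting.ForecastingStage1TunerOp(**common)

    forecasting.ForecastingStage2TunerOp(
        **common,
        tuning_result_input_path=stage_1.outputs["tuning_result_output"],
    )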
-
preview.automl.forecasting.get_learn_to_learn_forecasting_pipeline_and_parameters(*, project: str, location: str, root_dir: str, target_column: str, optimization_objective: str, transformations: dict[str, list[str]], train_budget_milli_node_hours: float, time_column: str, time_series_identifier_columns: list[str], time_series_identifier_column: str | None =
None
, time_series_attribute_columns: list[str] | None =None
, available_at_forecast_columns: list[str] | None =None
, unavailable_at_forecast_columns: list[str] | None =None
, forecast_horizon: int | None =None
, context_window: int | None =None
, evaluated_examples_bigquery_path: str | None =None
, window_predefined_column: str | None =None
, window_stride_length: int | None =None
, window_max_count: int | None =None
, holiday_regions: list[str] | None =None
, stage_1_num_parallel_trials: int | None =None
, stage_1_tuning_result_artifact_uri: str | None =None
, stage_2_num_parallel_trials: int | None =None
, num_selected_trials: int | None =None
, data_source_csv_filenames: str | None =None
, data_source_bigquery_table_path: str | None =None
, predefined_split_key: str | None =None
, training_fraction: float | None =None
, validation_fraction: float | None =None
, test_fraction: float | None =None
, weight_column: str | None =None
, dataflow_service_account: str | None =None
, dataflow_subnetwork: str | None =None
, dataflow_use_public_ips: bool =True
, feature_transform_engine_bigquery_staging_full_dataset_id: str =''
, feature_transform_engine_dataflow_machine_type: str ='n1-standard-16'
, feature_transform_engine_dataflow_max_num_workers: int =10
, feature_transform_engine_dataflow_disk_size_gb: int =40
, evaluation_batch_predict_machine_type: str ='n1-standard-16'
, evaluation_batch_predict_starting_replica_count: int =25
, evaluation_batch_predict_max_replica_count: int =25
, evaluation_dataflow_machine_type: str ='n1-standard-16'
, evaluation_dataflow_max_num_workers: int =25
, evaluation_dataflow_starting_num_workers: int =22
, evaluation_dataflow_disk_size_gb: int =50
, study_spec_parameters_override: list[dict[str, Any]] | None =None
, stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =None
, stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =None
, enable_probabilistic_inference: bool =False
, quantiles: list[float] | None =None
, encryption_spec_key_name: str | None =None
, model_display_name: str | None =None
, model_description: str | None =None
, run_evaluation: bool =True
, group_columns: list[str] | None =None
, group_total_weight: float =0.0
, temporal_total_weight: float =0.0
, group_temporal_total_weight: float =0.0
) -> tuple[str, dict[str, Any]] [source]¶ Returns l2l_forecasting pipeline and formatted parameters.
- Parameters¶:
- project: str¶
The GCP project that runs the pipeline components.
- location: str¶
The GCP region that runs the pipeline components.
- root_dir: str¶
The root GCS directory for the pipeline components.
- target_column: str¶
The target column name.
- optimization_objective: str¶
“minimize-rmse”, “minimize-mae”, “minimize-rmsle”, “minimize-rmspe”, “minimize-wape-mae”, “minimize-mape”, or “minimize-quantile-loss”.
- transformations: dict[str, list[str]]¶
Dict mapping auto and/or type-resolutions to feature columns. The supported types are: auto, categorical, numeric, text, and timestamp.
- train_budget_milli_node_hours: float¶
The train budget for creating this model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour.
- time_column: str¶
The column that indicates the time.
- time_series_identifier_columns: list[str]¶
The columns which distinguish different time series.
- time_series_identifier_column: str | None =
None
¶ [Deprecated] The column which distinguishes different time series.
- time_series_attribute_columns: list[str] | None =
None
¶ The columns that are invariant across the same time series.
- available_at_forecast_columns: list[str] | None =
None
¶ The columns that are available at the forecast time.
- unavailable_at_forecast_columns: list[str] | None =
None
¶ The columns that are unavailable at the forecast time.
- forecast_horizon: int | None =
None
¶ The length of the horizon.
- context_window: int | None =
None
¶ The length of the context window.
- evaluated_examples_bigquery_path: str | None =
None
¶ The bigquery dataset to write the predicted examples into for evaluation, in the format bq://project.dataset.
- window_predefined_column: str | None =
None
¶ The column that indicates the start of each window.
- window_stride_length: int | None =
None
¶ The stride length to generate the window.
- window_max_count: int | None =
None
¶ The maximum number of windows that will be generated.
- holiday_regions: list[str] | None =
None
¶ The geographical regions where the holiday effect is applied in modeling.
- stage_1_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 1.
- stage_1_tuning_result_artifact_uri: str | None =
None
¶ The stage 1 tuning result artifact GCS URI.
- stage_2_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 2.
- num_selected_trials: int | None =
None
¶ Number of selected trials.
- data_source_csv_filenames: str | None =
None
¶ A string that represents a list of comma separated CSV filenames.
- data_source_bigquery_table_path: str | None =
None
¶ The BigQuery table path of format bq://bq_project.bq_dataset.bq_table
- predefined_split_key: str | None =
None
¶ The predefined_split column name.
- training_fraction: float | None =
None
¶ The training fraction.
- validation_fraction: float | None =
None
¶ The validation fraction.
- test_fraction: float | None =
None
¶ The test fraction.
- weight_column: str | None =
None
¶ The weight column name.
- dataflow_service_account: str | None =
None
¶ The full service account name.
- dataflow_subnetwork: str | None =
None
¶ The dataflow subnetwork.
- dataflow_use_public_ips: bool =
True
¶ True to enable dataflow public IPs.
- feature_transform_engine_bigquery_staging_full_dataset_id: str =
''
¶ The full id of the feature transform engine staging dataset.
- feature_transform_engine_dataflow_machine_type: str =
'n1-standard-16'
¶ The dataflow machine type of the feature transform engine.
- feature_transform_engine_dataflow_max_num_workers: int =
10
¶ The max number of dataflow workers of the feature transform engine.
- feature_transform_engine_dataflow_disk_size_gb: int =
40
¶ The disk size of the dataflow workers of the feature transform engine.
- evaluation_batch_predict_machine_type: str =
'n1-standard-16'
¶ Machine type for the batch prediction job in evaluation, such as ‘n1-standard-16’.
- evaluation_batch_predict_starting_replica_count: int =
25
¶ Number of replicas to use in the batch prediction cluster at startup time.
- evaluation_batch_predict_max_replica_count: int =
25
¶ The maximum count of replicas the batch prediction job can scale to.
- evaluation_dataflow_machine_type: str =
'n1-standard-16'
¶ Machine type for the dataflow job in evaluation, such as ‘n1-standard-16’.
- evaluation_dataflow_max_num_workers: int =
25
¶ Maximum number of dataflow workers.
- evaluation_dataflow_starting_num_workers: int =
22
¶ Starting number of dataflow workers.
- evaluation_dataflow_disk_size_gb: int =
50
¶ The disk space in GB for dataflow.
- study_spec_parameters_override: list[dict[str, Any]] | None =
None
¶ The list for overriding study spec.
- stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 1 tuner worker pool spec.
- stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 2 trainer worker pool spec.
- enable_probabilistic_inference: bool =
False
¶ If probabilistic inference is enabled, the model will fit a distribution that captures the uncertainty of a prediction. If quantiles are specified, then the quantiles of the distribution are also returned.
- quantiles: list[float] | None =
None
¶ Quantiles to use for probabilistic inference. Up to 5 quantiles are allowed, with values between 0 and 1, exclusive. Quantiles must be unique.
- encryption_spec_key_name: str | None =
None
¶ The KMS key name.
- model_display_name: str | None =
None
¶ Optional display name for model.
- model_description: str | None =
None
¶ Optional description.
- run_evaluation: bool =
True
¶ True to evaluate the ensembled model on the test split.
- group_columns: list[str] | None =
None
¶ A list of time series attribute column names that define the time series hierarchy.
- group_total_weight: float =
0.0
¶ The weight of the loss for predictions aggregated over time series in the same group.
- temporal_total_weight: float =
0.0
¶ The weight of the loss for predictions aggregated over the horizon for a single time series.
- group_temporal_total_weight: float =
0.0
¶ The weight of the loss for predictions aggregated over both the horizon and time series in the same hierarchy group.
- Returns¶:
Tuple of pipeline_definition_path and parameter_values.
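Example (illustrative): building the pipeline spec and parameters, then submitting with google.cloud.aiplatform.PipelineJob, which is one common way to run the returned template. The project, bucket, BigQuery table, and column names are placeholders; the transformations dict maps each transformation type to the columns it applies to.

from google.cloud import aiplatform
from google_cloud_pipeline_components.preview.automl import forecasting

template_path, parameter_values = (
    forecasting.get_learn_to_learn_forecasting_pipeline_and_parameters(
        project="my-project",
        location="us-central1",
        root_dir="gs://my-bucket/pipeline_root",
        target_column="sales",
        optimization_objective="minimize-rmse",
        transformations={
            "auto": ["sales"],
            "numeric": ["price"],
            "categorical": ["store_id"],
            "timestamp": ["date"],
        },
        train_budget_milli_node_hours=1000,  # 1 node hour
        time_column="date",
        time_series_identifier_columns=["store_id"],
        forecast_horizon=30,
        context_window=30,
        data_source_bigquery_table_path="bq://my-project.my_dataset.sales",
    )
)

job = aiplatform.PipelineJob(
    display_name="l2l-forecasting",
    template_path=template_path,
    pipeline_root="gs://my-bucket/pipeline_root",
    parameter_values=parameter_values,
)
job.run()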
-
preview.automl.forecasting.get_sequence_to_sequence_forecasting_pipeline_and_parameters(*, project: str, location: str, root_dir: str, target_column: str, optimization_objective: str, transformations: dict[str, list[str]], train_budget_milli_node_hours: float, time_column: str, time_series_identifier_columns: list[str], time_series_identifier_column: str | None =
None
, time_series_attribute_columns: list[str] | None =None
, available_at_forecast_columns: list[str] | None =None
, unavailable_at_forecast_columns: list[str] | None =None
, forecast_horizon: int | None =None
, context_window: int | None =None
, evaluated_examples_bigquery_path: str | None =None
, window_predefined_column: str | None =None
, window_stride_length: int | None =None
, window_max_count: int | None =None
, holiday_regions: list[str] | None =None
, stage_1_num_parallel_trials: int | None =None
, stage_1_tuning_result_artifact_uri: str | None =None
, stage_2_num_parallel_trials: int | None =None
, num_selected_trials: int | None =None
, data_source_csv_filenames: str | None =None
, data_source_bigquery_table_path: str | None =None
, predefined_split_key: str | None =None
, training_fraction: float | None =None
, validation_fraction: float | None =None
, test_fraction: float | None =None
, weight_column: str | None =None
, dataflow_service_account: str | None =None
, dataflow_subnetwork: str | None =None
, dataflow_use_public_ips: bool =True
, feature_transform_engine_bigquery_staging_full_dataset_id: str =''
, feature_transform_engine_dataflow_machine_type: str ='n1-standard-16'
, feature_transform_engine_dataflow_max_num_workers: int =10
, feature_transform_engine_dataflow_disk_size_gb: int =40
, evaluation_batch_predict_machine_type: str ='n1-standard-16'
, evaluation_batch_predict_starting_replica_count: int =25
, evaluation_batch_predict_max_replica_count: int =25
, evaluation_dataflow_machine_type: str ='n1-standard-16'
, evaluation_dataflow_max_num_workers: int =25
, evaluation_dataflow_starting_num_workers: int =22
, evaluation_dataflow_disk_size_gb: int =50
, study_spec_parameters_override: list[dict[str, Any]] | None =None
, stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =None
, stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =None
, encryption_spec_key_name: str | None =None
, model_display_name: str | None =None
, model_description: str | None =None
, run_evaluation: bool =True
)[source]¶ Returns seq2seq forecasting pipeline and formatted parameters.
- Parameters¶:
- project: str¶
The GCP project that runs the pipeline components.
- location: str¶
The GCP region that runs the pipeline components.
- root_dir: str¶
The root GCS directory for the pipeline components.
- target_column: str¶
The target column name.
- optimization_objective: str¶
“minimize-rmse”, “minimize-mae”, “minimize-rmsle”, “minimize-rmspe”, “minimize-wape-mae”, “minimize-mape”, or “minimize-quantile-loss”.
- transformations: dict[str, list[str]]¶
Dict mapping auto and/or type-resolutions to feature columns. The supported types are: auto, categorical, numeric, text, and timestamp.
- train_budget_milli_node_hours: float¶
The train budget for creating this model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour.
- time_column: str¶
The column that indicates the time.
- time_series_identifier_columns: list[str]¶
The columns which distinguish different time series.
- time_series_identifier_column: str | None =
None
¶ [Deprecated] The column which distinguishes different time series.
- time_series_attribute_columns: list[str] | None =
None
¶ The columns that are invariant across the same time series.
- available_at_forecast_columns: list[str] | None =
None
¶ The columns that are available at the forecast time.
- unavailable_at_forecast_columns: list[str] | None =
None
¶ The columns that are unavailable at the forecast time.
- forecast_horizon: int | None =
None
¶ The length of the horizon.
- context_window: int | None =
None
¶ The length of the context window.
- evaluated_examples_bigquery_path: str | None =
None
¶ The bigquery dataset to write the predicted examples into for evaluation, in the format bq://project.dataset.
- window_predefined_column: str | None =
None
¶ The column that indicates the start of each window.
- window_stride_length: int | None =
None
¶ The stride length to generate the window.
- window_max_count: int | None =
None
¶ The maximum number of windows that will be generated.
- holiday_regions: list[str] | None =
None
¶ The geographical regions where the holiday effect is applied in modeling.
- stage_1_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 1.
- stage_1_tuning_result_artifact_uri: str | None =
None
¶ The stage 1 tuning result artifact GCS URI.
- stage_2_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 2.
- num_selected_trials: int | None =
None
¶ Number of selected trials.
- data_source_csv_filenames: str | None =
None
¶ A string that represents a list of comma separated CSV filenames.
- data_source_bigquery_table_path: str | None =
None
¶ The BigQuery table path of format bq://bq_project.bq_dataset.bq_table
- predefined_split_key: str | None =
None
¶ The predefined_split column name.
- training_fraction: float | None =
None
¶ The training fraction.
- validation_fraction: float | None =
None
¶ The validation fraction.
- test_fraction: float | None =
None
¶ The test fraction.
- weight_column: str | None =
None
¶ The weight column name.
- dataflow_service_account: str | None =
None
¶ The full service account name.
- dataflow_subnetwork: str | None =
None
¶ The dataflow subnetwork.
- dataflow_use_public_ips: bool =
True
¶ True to enable dataflow public IPs.
- feature_transform_engine_bigquery_staging_full_dataset_id: str =
''
¶ The full id of the feature transform engine staging dataset.
- feature_transform_engine_dataflow_machine_type: str =
'n1-standard-16'
¶ The dataflow machine type of the feature transform engine.
- feature_transform_engine_dataflow_max_num_workers: int =
10
¶ The max number of dataflow workers of the feature transform engine.
- feature_transform_engine_dataflow_disk_size_gb: int =
40
¶ The disk size of the dataflow workers of the feature transform engine.
- evaluation_batch_predict_machine_type: str =
'n1-standard-16'
¶ Machine type for the batch prediction job in evaluation, such as ‘n1-standard-16’.
- evaluation_batch_predict_starting_replica_count: int =
25
¶ Number of replicas to use in the batch prediction cluster at startup time.
- evaluation_batch_predict_max_replica_count: int =
25
¶ The maximum count of replicas the batch prediction job can scale to.
- evaluation_dataflow_machine_type: str =
'n1-standard-16'
¶ Machine type for the dataflow job in evaluation, such as ‘n1-standard-16’.
- evaluation_dataflow_max_num_workers: int =
25
¶ Maximum number of dataflow workers.
- evaluation_dataflow_starting_num_workers: int =
22
¶ Starting number of dataflow workers.
- evaluation_dataflow_disk_size_gb: int =
50
¶ The disk space in GB for dataflow.
- study_spec_parameters_override: list[dict[str, Any]] | None =
None
¶ The list for overriding study spec.
- stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 1 tuner worker pool spec.
- stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 2 trainer worker pool spec.
- encryption_spec_key_name: str | None =
None
¶ The KMS key name.
- model_display_name: str | None =
None
¶ Optional display name for model.
- model_description: str | None =
None
¶ Optional description.
- run_evaluation: bool =
True
¶ True to evaluate the ensembled model on the test split.
- Returns¶:
Tuple of pipeline_definition_path and parameter_values.
-
preview.automl.forecasting.get_temporal_fusion_transformer_forecasting_pipeline_and_parameters(*, project: str, location: str, root_dir: str, target_column: str, optimization_objective: str, transformations: dict[str, list[str]], train_budget_milli_node_hours: float, time_column: str, time_series_identifier_columns: list[str], time_series_identifier_column: str | None =
None
, time_series_attribute_columns: list[str] | None =None
, available_at_forecast_columns: list[str] | None =None
, unavailable_at_forecast_columns: list[str] | None =None
, forecast_horizon: int | None =None
, context_window: int | None =None
, evaluated_examples_bigquery_path: str | None =None
, window_predefined_column: str | None =None
, window_stride_length: int | None =None
, window_max_count: int | None =None
, holiday_regions: list[str] | None =None
, stage_1_num_parallel_trials: int | None =None
, stage_1_tuning_result_artifact_uri: str | None =None
, stage_2_num_parallel_trials: int | None =None
, data_source_csv_filenames: str | None =None
, data_source_bigquery_table_path: str | None =None
, predefined_split_key: str | None =None
, training_fraction: float | None =None
, validation_fraction: float | None =None
, test_fraction: float | None =None
, weight_column: str | None =None
, dataflow_service_account: str | None =None
, dataflow_subnetwork: str | None =None
, dataflow_use_public_ips: bool =True
, feature_transform_engine_bigquery_staging_full_dataset_id: str =''
, feature_transform_engine_dataflow_machine_type: str ='n1-standard-16'
, feature_transform_engine_dataflow_max_num_workers: int =10
, feature_transform_engine_dataflow_disk_size_gb: int =40
, evaluation_batch_predict_machine_type: str ='n1-standard-16'
, evaluation_batch_predict_starting_replica_count: int =25
, evaluation_batch_predict_max_replica_count: int =25
, evaluation_dataflow_machine_type: str ='n1-standard-16'
, evaluation_dataflow_max_num_workers: int =25
, evaluation_dataflow_starting_num_workers: int =22
, evaluation_dataflow_disk_size_gb: int =50
, study_spec_parameters_override: list[dict[str, Any]] | None =None
, stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =None
, stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =None
, encryption_spec_key_name: str | None =None
, model_display_name: str | None =None
, model_description: str | None =None
, run_evaluation: bool =True
)[source]¶ Returns tft_forecasting pipeline and formatted parameters.
- Parameters¶:
- project: str¶
The GCP project that runs the pipeline components.
- location: str¶
The GCP region that runs the pipeline components.
- root_dir: str¶
The root GCS directory for the pipeline components.
- target_column: str¶
The target column name.
- optimization_objective: str¶
“minimize-rmse”, “minimize-mae”, “minimize-rmsle”, “minimize-rmspe”, “minimize-wape-mae”, “minimize-mape”, or “minimize-quantile-loss”.
- transformations: dict[str, list[str]]¶
Dict mapping auto and/or type-resolutions to feature columns. The supported types are: auto, categorical, numeric, text, and timestamp.
- train_budget_milli_node_hours: float¶
The train budget for creating this model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour.
- time_column: str¶
The column that indicates the time.
- time_series_identifier_columns: list[str]¶
The columns which distinguish different time series.
- time_series_identifier_column: str | None =
None
¶ [Deprecated] The column which distinguishes different time series.
- time_series_attribute_columns: list[str] | None =
None
¶ The columns that are invariant across the same time series.
- available_at_forecast_columns: list[str] | None =
None
¶ The columns that are available at the forecast time.
- unavailable_at_forecast_columns: list[str] | None =
None
¶ The columns that are unavailable at the forecast time.
- forecast_horizon: int | None =
None
¶ The length of the horizon.
- context_window: int | None =
None
¶ The length of the context window.
- evaluated_examples_bigquery_path: str | None =
None
¶ The bigquery dataset to write the predicted examples into for evaluation, in the format bq://project.dataset.
- window_predefined_column: str | None =
None
¶ The column that indicates the start of each window.
- window_stride_length: int | None =
None
¶ The stride length to generate the window.
- window_max_count: int | None =
None
¶ The maximum number of windows that will be generated.
- holiday_regions: list[str] | None =
None
¶ The geographical regions where the holiday effect is applied in modeling.
- stage_1_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 1.
- stage_1_tuning_result_artifact_uri: str | None =
None
¶ The stage 1 tuning result artifact GCS URI.
- stage_2_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 2.
- data_source_csv_filenames: str | None =
None
¶ A string that represents a list of comma separated CSV filenames.
- data_source_bigquery_table_path: str | None =
None
¶ The BigQuery table path of format bq://bq_project.bq_dataset.bq_table
- predefined_split_key: str | None =
None
¶ The predefined_split column name.
- training_fraction: float | None =
None
¶ The training fraction.
- validation_fraction: float | None =
None
¶ The validation fraction.
- test_fraction: float | None =
None
¶ The test fraction.
- weight_column: str | None =
None
¶ The weight column name.
- dataflow_service_account: str | None =
None
¶ The full service account name.
- dataflow_subnetwork: str | None =
None
¶ The dataflow subnetwork.
- dataflow_use_public_ips: bool =
True
¶ True to enable dataflow public IPs.
- feature_transform_engine_bigquery_staging_full_dataset_id: str =
''
¶ The full id of the feature transform engine staging dataset.
- feature_transform_engine_dataflow_machine_type: str =
'n1-standard-16'
¶ The dataflow machine type of the feature transform engine.
- feature_transform_engine_dataflow_max_num_workers: int =
10
¶ The max number of dataflow workers of the feature transform engine.
- feature_transform_engine_dataflow_disk_size_gb: int =
40
¶ The disk size of the dataflow workers of the feature transform engine.
- evaluation_batch_predict_machine_type: str =
'n1-standard-16'
¶ Machine type for the batch prediction job in evaluation, such as ‘n1-standard-16’.
- evaluation_batch_predict_starting_replica_count: int =
25
¶ Number of replicas to use in the batch prediction cluster at startup time.
- evaluation_batch_predict_max_replica_count: int =
25
¶ The maximum count of replicas the batch prediction job can scale to.
- evaluation_dataflow_machine_type: str =
'n1-standard-16'
¶ Machine type for the dataflow job in evaluation, such as ‘n1-standard-16’.
- evaluation_dataflow_max_num_workers: int =
25
¶ Maximum number of dataflow workers.
- evaluation_dataflow_starting_num_workers: int =
22
¶ Starting number of dataflow workers.
- evaluation_dataflow_disk_size_gb: int =
50
¶ The disk space in GB for dataflow.
- study_spec_parameters_override: list[dict[str, Any]] | None =
None
¶ The list for overriding study spec.
- stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 1 tuner worker pool spec.
- stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 2 trainer worker pool spec.
- encryption_spec_key_name: str | None =
None
¶ The KMS key name.
- model_display_name: str | None =
None
¶ Optional display name for model.
- model_description: str | None =
None
¶ Optional description.
- run_evaluation: bool =
True
¶ True to evaluate the ensembled model on the test split.
- Returns¶:
Tuple of pipeline_definition_path and parameter_values.
-
preview.automl.forecasting.get_time_series_dense_encoder_forecasting_pipeline_and_parameters(*, project: str, location: str, root_dir: str, target_column: str, optimization_objective: str, transformations: dict[str, list[str]], train_budget_milli_node_hours: float, time_column: str, time_series_identifier_columns: list[str], time_series_identifier_column: str | None =
None
, time_series_attribute_columns: list[str] | None =None
, available_at_forecast_columns: list[str] | None =None
, unavailable_at_forecast_columns: list[str] | None =None
, forecast_horizon: int | None =None
, context_window: int | None =None
, evaluated_examples_bigquery_path: str | None =None
, window_predefined_column: str | None =None
, window_stride_length: int | None =None
, window_max_count: int | None =None
, holiday_regions: list[str] | None =None
, stage_1_num_parallel_trials: int | None =None
, stage_1_tuning_result_artifact_uri: str | None =None
, stage_2_num_parallel_trials: int | None =None
, num_selected_trials: int | None =None
, data_source_csv_filenames: str | None =None
, data_source_bigquery_table_path: str | None =None
, predefined_split_key: str | None =None
, training_fraction: float | None =None
, validation_fraction: float | None =None
, test_fraction: float | None =None
, weight_column: str | None =None
, dataflow_service_account: str | None =None
, dataflow_subnetwork: str | None =None
, dataflow_use_public_ips: bool =True
, feature_transform_engine_bigquery_staging_full_dataset_id: str =''
, feature_transform_engine_dataflow_machine_type: str ='n1-standard-16'
, feature_transform_engine_dataflow_max_num_workers: int =10
, feature_transform_engine_dataflow_disk_size_gb: int =40
, evaluation_batch_predict_machine_type: str ='n1-standard-16'
, evaluation_batch_predict_starting_replica_count: int =25
, evaluation_batch_predict_max_replica_count: int =25
, evaluation_dataflow_machine_type: str ='n1-standard-16'
, evaluation_dataflow_max_num_workers: int =25
, evaluation_dataflow_starting_num_workers: int =22
, evaluation_dataflow_disk_size_gb: int =50
, study_spec_parameters_override: list[dict[str, Any]] | None =None
, stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =None
, stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =None
, enable_probabilistic_inference: bool =False
, quantiles: list[float] | None =None
, encryption_spec_key_name: str | None =None
, model_display_name: str | None =None
, model_description: str | None =None
, run_evaluation: bool =True
, group_columns: list[str] | None =None
, group_total_weight: float =0.0
, temporal_total_weight: float =0.0
, group_temporal_total_weight: float =0.0
) -> tuple[str, dict[str, Any]] [source]¶ Returns timeseries_dense_encoder_forecasting pipeline and parameters.
- Parameters¶:
- project: str¶
The GCP project that runs the pipeline components.
- location: str¶
The GCP region that runs the pipeline components.
- root_dir: str¶
The root GCS directory for the pipeline components.
- target_column: str¶
The target column name.
- optimization_objective: str¶
“minimize-rmse”, “minimize-mae”, “minimize-rmsle”, “minimize-rmspe”, “minimize-wape-mae”, “minimize-mape”, or “minimize-quantile-loss”.
- transformations: dict[str, list[str]]¶
Dict mapping auto and/or type-resolutions to feature columns. The supported types are: auto, categorical, numeric, text, and timestamp.
- train_budget_milli_node_hours: float¶
The train budget for creating this model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour.
- time_column: str¶
The column that indicates the time.
- time_series_identifier_columns: list[str]¶
The columns which distinguish different time series.
- time_series_identifier_column: str | None =
None
¶ [Deprecated] The column which distinguishes different time series.
- time_series_attribute_columns: list[str] | None =
None
¶ The columns that are invariant across the same time series.
- available_at_forecast_columns: list[str] | None =
None
¶ The columns that are available at the forecast time.
- unavailable_at_forecast_columns: list[str] | None =
None
¶ The columns that are unavailable at the forecast time.
- forecast_horizon: int | None =
None
¶ The length of the horizon.
- context_window: int | None =
None
¶ The length of the context window.
- evaluated_examples_bigquery_path: str | None =
None
¶ The bigquery dataset to write the predicted examples into for evaluation, in the format bq://project.dataset.
- window_predefined_column: str | None =
None
¶ The column that indicates the start of each window.
- window_stride_length: int | None =
None
¶ The stride length to generate the window.
- window_max_count: int | None =
None
¶ The maximum number of windows that will be generated.
- holiday_regions: list[str] | None =
None
¶ The geographical regions where the holiday effect is applied in modeling.
- stage_1_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 1.
- stage_1_tuning_result_artifact_uri: str | None =
None
¶ The stage 1 tuning result artifact GCS URI.
- stage_2_num_parallel_trials: int | None =
None
¶ Number of parallel trials for stage 2.
- num_selected_trials: int | None =
None
¶ Number of selected trials.
- data_source_csv_filenames: str | None =
None
¶ A string that represents a list of comma separated CSV filenames.
- data_source_bigquery_table_path: str | None =
None
¶ The BigQuery table path of format bq://bq_project.bq_dataset.bq_table
- predefined_split_key: str | None =
None
¶ The predefined_split column name.
- training_fraction: float | None =
None
¶ The training fraction.
- validation_fraction: float | None =
None
¶ The validation fraction.
- test_fraction: float | None =
None
¶ The test fraction.
- weight_column: str | None =
None
¶ The weight column name.
- dataflow_service_account: str | None =
None
¶ The full service account name.
- dataflow_subnetwork: str | None =
None
¶ The dataflow subnetwork.
- dataflow_use_public_ips: bool =
True
¶ True to enable dataflow public IPs.
- feature_transform_engine_bigquery_staging_full_dataset_id: str =
''
¶ The full id of the feature transform engine staging dataset.
- feature_transform_engine_dataflow_machine_type: str =
'n1-standard-16'
¶ The dataflow machine type of the feature transform engine.
- feature_transform_engine_dataflow_max_num_workers: int =
10
¶ The max number of dataflow workers of the feature transform engine.
- feature_transform_engine_dataflow_disk_size_gb: int =
40
¶ The disk size of the dataflow workers of the feature transform engine.
- evaluation_batch_predict_machine_type: str =
'n1-standard-16'
¶ Machine type for the batch prediction job in evaluation, such as ‘n1-standard-16’.
- evaluation_batch_predict_starting_replica_count: int =
25
¶ Number of replicas to use in the batch prediction cluster at startup time.
- evaluation_batch_predict_max_replica_count: int =
25
¶ The maximum count of replicas the batch prediction job can scale to.
- evaluation_dataflow_machine_type: str =
'n1-standard-16'
¶ Machine type for the dataflow job in evaluation, such as ‘n1-standard-16’.
- evaluation_dataflow_max_num_workers: int =
25
¶ Maximum number of dataflow workers.
- evaluation_dataflow_starting_num_workers: int =
22
¶ Starting number of dataflow workers.
- evaluation_dataflow_disk_size_gb: int =
50
¶ The disk space in GB for dataflow.
- study_spec_parameters_override: list[dict[str, Any]] | None =
None
¶ The list for overriding study spec.
- stage_1_tuner_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 1 tuner worker pool spec.
- stage_2_trainer_worker_pool_specs_override: dict[str, Any] | None =
None
¶ The dictionary for overriding stage 2 trainer worker pool spec.
- enable_probabilistic_inference: bool =
False
¶ If probabilistic inference is enabled, the model will fit a distribution that captures the uncertainty of a prediction. If quantiles are specified, then the quantiles of the distribution are also returned.
- quantiles: list[float] | None =
None
¶ Quantiles to use for probabilistic inference. Up to 5 quantiles are allowed, with values between 0 and 1, exclusive. Quantiles must be unique.
- encryption_spec_key_name: str | None =
None
¶ The KMS key name.
- model_display_name: str | None =
None
¶ Optional display name for model.
- model_description: str | None =
None
¶ Optional description.
- run_evaluation: bool =
True
¶ True to evaluate the ensembled model on the test split.
- group_columns: list[str] | None =
None
¶ A list of time series attribute column names that define the time series hierarchy.
- group_total_weight: float =
0.0
¶ The weight of the loss for predictions aggregated over time series in the same group.
- temporal_total_weight: float =
0.0
¶ The weight of the loss for predictions aggregated over the horizon for a single time series.
- group_temporal_total_weight: float =
0.0
¶ The weight of the loss for predictions aggregated over both the horizon and time series in the same hierarchy group.
- Returns¶:
Tuple of pipeline_definition_path and parameter_values.
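Example (illustrative): the TiDE variant with probabilistic inference enabled so that quantile predictions are returned. The project, bucket, table, and column names are placeholders; the resulting template and parameters can be submitted the same way as in the learn-to-learn example above.

from google_cloud_pipeline_components.preview.automl import forecasting

template_path, parameter_values = (
    forecasting.get_time_series_dense_encoder_forecasting_pipeline_and_parameters(
        project="my-project",
        location="us-central1",
        root_dir="gs://my-bucket/pipeline_root",
        target_column="sales",
        optimization_objective="minimize-rmse",
        transformations={"auto": ["sales", "price", "store_id", "date"]},
        train_budget_milli_node_hours=1000,
        time_column="date",
        time_series_identifier_columns=["store_id"],
        forecast_horizon=30,
        context_window=30,
        data_source_bigquery_table_path="bq://my-project.my_dataset.sales",
        # Fit a predictive distribution and also return these quantiles.
        enable_probabilistic_inference=True,
        quantiles=[0.1, 0.5, 0.9],  # up to 5 unique values in (0, 1)
    )
)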