google_cloud_pipeline_components.v1.dataset module

Core modules for AI Platform Pipeline Components.

google_cloud_pipeline_components.v1.dataset.ImageDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)

Creates a new image dataset and optionally imports data into the dataset when gcs_source and import_schema_uri are passed. Args:

display_name (String):

Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.

gcs_source (Union[str, Sequence[str]]):

Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

import_schema_uri (String):

Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object (https://tinyurl.com/y538mdwt).

data_item_labels (JsonObject):

Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and, if a key collision happens, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.

project (String):

Required. Project to create the Dataset in.

location (String):

Optional. Location to create the Dataset in.

labels (JsonObject):

Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode code points), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.

encryption_spec_key_name (Optional[String]):

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, this Dataset and all of its sub-resources will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:
dataset (google.VertexDataset):

Instantiated representation of the managed image dataset resource.
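
A minimal sketch of wiring this component into a KFP v2 pipeline. The project ID, bucket paths, and display name are placeholders, and the import_schema_uri constant is taken from the google.cloud.aiplatform SDK's schema helpers (pick the one matching your annotation format):

from kfp.v2 import dsl
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import ImageDatasetCreateOp


@dsl.pipeline(name="image-dataset-create", pipeline_root="gs://my-bucket/pipeline-root")
def image_dataset_pipeline(project: str = "my-project"):
    # Create the dataset and import the annotated images in one step.
    ImageDatasetCreateOp(
        project=project,
        display_name="flowers",
        gcs_source="gs://my-bucket/flowers/annotations.csv",
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
    )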

google_cloud_pipeline_components.v1.dataset.ImageDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')

Exports Dataset data to a GCS output directory. Args:

output_dir (String):

Required. The Google Cloud Storage location where the output is to be written. In the given directory a new directory will be created with the name export-data-<dataset-display-name>-<timestamp-of-export-call>, where the timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into subdirectories named with the corresponding annotations' schema title. Inside these subdirectories, a schema.yaml will be created to describe the output format. If the URI doesn't end with '/', a '/' will be automatically appended. The directory is created if it doesn't exist.

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

Returns:
exported_files (Sequence[str]):

All of the files that are exported in this export operation.
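
A sketch of chaining export after create in the same pipeline; the output keys ("dataset", "exported_files") follow the Returns sections on this page, and all paths are placeholders:

from kfp.v2 import dsl
from google_cloud_pipeline_components.v1.dataset import (
    ImageDatasetCreateOp,
    ImageDatasetExportDataOp,
)


@dsl.pipeline(name="image-dataset-export")
def export_pipeline(project: str = "my-project"):
    create_op = ImageDatasetCreateOp(project=project, display_name="flowers")
    ImageDatasetExportDataOp(
        project=project,
        dataset=create_op.outputs["dataset"],  # VertexDataset artifact from the create step
        output_dir="gs://my-bucket/exports/",  # a trailing "/" is appended if missing
    )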

google_cloud_pipeline_components.v1.dataset.ImageDatasetImportDataOp(project: str, dataset: google.VertexDataset, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None)

Uploads data to an existing managed Dataset. Args:

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

dataset (Dataset):

Required. The dataset to be updated.

gcs_source (Union[str, Sequence[str]]):

Required. Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

import_schema_uri (String):

Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.

data_item_labels (JsonObject):

Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and, if a key collision happens, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.

Returns:
dataset (Dataset):

Instantiated representation of the managed dataset resource.
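
A sketch of importing a second batch of files into a dataset produced upstream; the data_item_labels value is illustrative and passed as a plain dict:

from kfp.v2 import dsl
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import (
    ImageDatasetCreateOp,
    ImageDatasetImportDataOp,
)


@dsl.pipeline(name="image-dataset-import")
def import_pipeline(project: str = "my-project"):
    create_op = ImageDatasetCreateOp(project=project, display_name="flowers")
    ImageDatasetImportDataOp(
        project=project,
        dataset=create_op.outputs["dataset"],
        gcs_source="gs://my-bucket/flowers/batch2.csv",
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
        data_item_labels={"source": "batch2"},  # appended to labels of identical DataItems
    )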

google_cloud_pipeline_components.v1.dataset.TabularDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', gcs_source: str = None, bq_source: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)

Creates a new tabular dataset. Args:

display_name (String):

Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.

gcs_source (Union[str, Sequence[str]]):

Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

bq_source (String):

BigQuery URI to the input table. Example:

"bq://project.dataset.table_name"

project (String):

Required. Project to create the Dataset in.

location (String):

Optional. Location to create the Dataset in.

labels (JsonObject):

Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode code points), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.

encryption_spec_key_name (Optional[String]):

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, this Dataset and all of its sub-resources will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:
tabular_dataset (TabularDataset):

Instantiated representation of the managed tabular dataset resource.
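
A sketch using the BigQuery path instead of GCS; the bq:// URI, labels, and CMEK key name are placeholders:

from kfp.v2 import dsl
from google_cloud_pipeline_components.v1.dataset import TabularDatasetCreateOp


@dsl.pipeline(name="tabular-dataset-create")
def tabular_pipeline(project: str = "my-project"):
    TabularDatasetCreateOp(
        project=project,
        display_name="sales",
        bq_source="bq://my-project.my_dataset.sales_2023",
        labels={"team": "analytics"},
        # Optional CMEK; the key must be in the same region as the dataset.
        encryption_spec_key_name=(
            "projects/my-project/locations/us-central1/"
            "keyRings/my-kr/cryptoKeys/my-key"
        ),
    )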

google_cloud_pipeline_components.v1.dataset.TabularDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')

Exports Dataset data to a GCS output directory. Args:

output_dir (String):

Required. The Google Cloud Storage location where the output is to be written. In the given directory a new directory will be created with the name export-data-<dataset-display-name>-<timestamp-of-export-call>, where the timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into subdirectories named with the corresponding annotations' schema title. Inside these subdirectories, a schema.yaml will be created to describe the output format. If the URI doesn't end with '/', a '/' will be automatically appended. The directory is created if it doesn't exist.

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

Returns:
exported_files (Sequence[str]):

All of the files that are exported in this export operation.
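
Compiling and submitting one of these pipelines, reusing the tabular_pipeline sketch above; the package path, bucket, and region are placeholders:

from kfp.v2 import compiler
from google.cloud import aiplatform

compiler.Compiler().compile(
    pipeline_func=tabular_pipeline,  # any @dsl.pipeline function from the sketches above
    package_path="tabular_pipeline.json",
)

aiplatform.init(project="my-project", location="us-central1")
aiplatform.PipelineJob(
    display_name="tabular-dataset-create",
    template_path="tabular_pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",
).run()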

google_cloud_pipeline_components.v1.dataset.TextDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)

Creates a new text dataset and optionally imports data into the dataset when gcs_source and import_schema_uri are passed. Args:

display_name (String):

Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.

gcs_source (Union[str, Sequence[str]]):

Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

import_schema_uri (String):

Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.

data_item_labels (JsonObject):

Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and, if a key collision happens, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.

project (String):

Required. Project to create the Dataset in.

location (String):

Optional. Location to create the Dataset in.

labels (JsonObject):

Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode code points), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.

encryption_spec_key_name (Optional[String]):

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, this Dataset and all of its sub-resources will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:
text_dataset (TextDataset):

Instantiated representation of the managed text dataset resource.
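
A sketch for the text flavor; the schema constant assumes a single-label classification corpus, and paths are placeholders:

from kfp.v2 import dsl
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import TextDatasetCreateOp


@dsl.pipeline(name="text-dataset-create")
def text_pipeline(project: str = "my-project"):
    TextDatasetCreateOp(
        project=project,
        display_name="support-tickets",
        gcs_source="gs://my-bucket/tickets/items.csv",
        import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
    )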

google_cloud_pipeline_components.v1.dataset.TextDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')

Exports Dataset data to a GCS output directory. Args:

output_dir (String):

Required. The Google Cloud Storage location where the output is to be written. In the given directory a new directory will be created with the name export-data-<dataset-display-name>-<timestamp-of-export-call>, where the timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into subdirectories named with the corresponding annotations' schema title. Inside these subdirectories, a schema.yaml will be created to describe the output format. If the URI doesn't end with '/', a '/' will be automatically appended. The directory is created if it doesn't exist.

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

Returns:
exported_files (Sequence[str]):

All of the files that are exported in this export operation.

google_cloud_pipeline_components.v1.dataset.TextDatasetImportDataOp(project: str, dataset: google.VertexDataset, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None)

Uploads data to an existing managed Dataset. Args:

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

dataset (Dataset):

Required. The dataset to be updated.

gcs_source (Union[str, Sequence[str]]):

Required. Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

import_schema_uri (String):

Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.

data_item_labels (JsonObject):

Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and, if a key collision happens, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.

Returns:
dataset (Dataset):

Instantiated representation of the managed dataset resource.
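
A sketch passing several input files at once. Note that the component signature annotates gcs_source as str while the docstring documents Union[str, Sequence[str]]; the list form below assumes the docstring is authoritative:

from kfp.v2 import dsl
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import (
    TextDatasetCreateOp,
    TextDatasetImportDataOp,
)


@dsl.pipeline(name="text-dataset-import")
def text_import_pipeline(project: str = "my-project"):
    create_op = TextDatasetCreateOp(project=project, display_name="support-tickets")
    TextDatasetImportDataOp(
        project=project,
        dataset=create_op.outputs["dataset"],
        gcs_source=[
            "gs://my-bucket/tickets/part1.csv",
            "gs://my-bucket/tickets/part2.csv",
        ],
        import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
    )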

google_cloud_pipeline_components.v1.dataset.TimeSeriesDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', gcs_source: str = None, bq_source: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)

Creates a new time series dataset. Args:

display_name (String):

Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.

gcs_source (Union[str, Sequence[str]]):

Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

bq_source (String):

BigQuery URI to the input table. Example:

"bq://project.dataset.table_name"

project (String):

Required. Project to create the Dataset in.

location (String):

Optional. Location to create the Dataset in.

labels (JsonObject):

Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode code points), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.

encryption_spec_key_name (Optional[String]):

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, this Dataset and all of its sub-resources will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:
time_series_dataset (TimeSeriesDataset):

Instantiated representation of the managed time series dataset resource.
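
A sketch creating a time series dataset from a CSV of historical values; the path and display name are placeholders:

from kfp.v2 import dsl
from google_cloud_pipeline_components.v1.dataset import TimeSeriesDatasetCreateOp


@dsl.pipeline(name="timeseries-dataset-create")
def timeseries_pipeline(project: str = "my-project"):
    TimeSeriesDatasetCreateOp(
        project=project,
        display_name="daily-demand",
        gcs_source="gs://my-bucket/demand/history.csv",
    )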

google_cloud_pipeline_components.v1.dataset.TimeSeriesDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')

Exports Dataset data to a GCS output directory. Args:

output_dir (String):

Required. The Google Cloud Storage location where the output is to be written. In the given directory a new directory will be created with the name export-data-<dataset-display-name>-<timestamp-of-export-call>, where the timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into subdirectories named with the corresponding annotations' schema title. Inside these subdirectories, a schema.yaml will be created to describe the output format. If the URI doesn't end with '/', a '/' will be automatically appended. The directory is created if it doesn't exist.

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

Returns:
exported_files (Sequence[str]):

All of the files that are exported in this export operation.

google_cloud_pipeline_components.v1.dataset.VideoDatasetCreateOp(project: str, display_name: str, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None, labels: dict = '{}', encryption_spec_key_name: str = None)

Creates a new video dataset and optionally imports data into the dataset when gcs_source and import_schema_uri are passed. Args:

display_name (String):

Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters.

gcs_source (Union[str, Sequence[str]]):

Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

import_schema_uri (String):

Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.

data_item_labels (JsonObject):

Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and, if a key collision happens, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.

project (String):

Required. Project to create the Dataset in.

location (String):

Optional. Location to create the Dataset in.

labels (JsonObject):

Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode code points), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (System labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.

encryption_spec_key_name (Optional[String]):

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, this Dataset and all of its sub-resources will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

Returns:
video_dataset (VideoDataset):

Instantiated representation of the managed video dataset resource.
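
A sketch for the video flavor; the schema constant assumes a classification task, and the JSONL path is a placeholder:

from kfp.v2 import dsl
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import VideoDatasetCreateOp


@dsl.pipeline(name="video-dataset-create")
def video_pipeline(project: str = "my-project"):
    VideoDatasetCreateOp(
        project=project,
        display_name="action-clips",
        gcs_source="gs://my-bucket/videos/annotations.jsonl",
        import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification,
    )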

google_cloud_pipeline_components.v1.dataset.VideoDatasetExportDataOp(project: str, dataset: google.VertexDataset, output_dir: str, location: str = 'us-central1')

Exports Dataset data to a GCS output directory. Args:

output_dir (String):

Required. The Google Cloud Storage location where the output is to be written. In the given directory a new directory will be created with the name export-data-<dataset-display-name>-<timestamp-of-export-call>, where the timestamp is in YYYYMMDDHHMMSS format. All export output will be written into that directory. Inside that directory, annotations with the same schema will be grouped into subdirectories named with the corresponding annotations' schema title. Inside these subdirectories, a schema.yaml will be created to describe the output format. If the URI doesn't end with '/', a '/' will be automatically appended. The directory is created if it doesn't exist.

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

Returns:
exported_files (Sequence[str]):

All of the files that are exported in this export operation.

google_cloud_pipeline_components.v1.dataset.VideoDatasetImportDataOp(project: str, dataset: google.VertexDataset, location: str = 'us-central1', data_item_labels: dict = '{}', gcs_source: str = None, import_schema_uri: str = None)

Uploads data to an existing managed Dataset. Args:

project (String):

Required. Project to retrieve the Dataset from.

location (String):

Optional. Location to retrieve the Dataset from.

dataset (Dataset):

Required. The dataset to be updated.

gcs_source (Union[str, Sequence[str]]):

Required. Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples:

str: "gs://bucket/file.csv"
Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]

import_schema_uri (String):

Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.

data_item_labels (JsonObject):

Labels that will be applied to newly imported DataItems. If an identical DataItem as one being imported already exists in the Dataset, then these labels will be appended to those of the already existing one, and if a label with an identical key was imported before, the old label value will be overwritten. If two DataItems are identical in the same import data operation, the labels will be combined and, if a key collision happens, one of the values will be picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by import_schema_uri, e.g. a JSONL file.

Returns:
dataset (Dataset):

Instantiated representation of the managed dataset resource.
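
Finally, a sketch chaining create, import, and export for video; output key names follow the Returns sections above, and all paths are placeholders:

from kfp.v2 import dsl
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import (
    VideoDatasetCreateOp,
    VideoDatasetExportDataOp,
    VideoDatasetImportDataOp,
)


@dsl.pipeline(name="video-dataset-roundtrip")
def video_roundtrip(project: str = "my-project"):
    create_op = VideoDatasetCreateOp(project=project, display_name="action-clips")
    import_op = VideoDatasetImportDataOp(
        project=project,
        dataset=create_op.outputs["dataset"],
        gcs_source="gs://my-bucket/videos/annotations.jsonl",
        import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification,
    )
    VideoDatasetExportDataOp(
        project=project,
        dataset=import_op.outputs["dataset"],  # updated dataset from the import step
        output_dir="gs://my-bucket/videos/export/",
    )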