LLM
Large-language model preview components.
Pipelines:
- preview.llm.infer_pipeline: Uses a large-language model to perform bulk inference on a prompt dataset.
- preview.llm.rlaif_pipeline: Performs reinforcement learning from AI feedback.
- preview.llm.rlhf_pipeline: Performs reinforcement learning from human feedback.
- preview.llm.infer_pipeline(large_model_reference: str, prompt_dataset: str, model_checkpoint: str | None = None, prompt_sequence_length: int = 512, target_sequence_length: int = 64, sampling_strategy: str = 'greedy', instruction: str | None = None, project: str = '{{$.pipeline_google_cloud_project_id}}', accelerator_type: str = 'GPU', location: str = '{{$.pipeline_google_cloud_location}}', encryption_spec_key_name: str = '') -> Outputs [source]
  Uses a large-language model to perform bulk inference on a prompt dataset.
- Parameters:
  - large_model_reference: str
    Name of the base model. Supported values are text-bison@001, t5-small, t5-large, t5-xl and t5-xxl. text-bison@001 and t5-small are supported in us-central1 and europe-west4. t5-large, t5-xl and t5-xxl are only supported in europe-west4.
  - model_checkpoint: str | None = None
    Optional Cloud storage path to the model checkpoint. If not provided, the default checkpoint for the large_model_reference will be used.
  - prompt_dataset: str
    Cloud storage path to an unlabeled JSONL dataset that contains prompts. Text datasets must contain an input_text field that contains the prompt. Chat datasets must contain at least 1 message in a messages field. Each message must be valid JSON that contains author and content fields, where valid author values are user and assistant and content must be non-empty. Each row may contain multiple messages, but the first and last author must be the user. An optional context field may be provided for each example in a chat dataset. If provided, the context will be prepended to the message content. The instruction serves as the default context. (Useful if most messages use the same system-level context.) Any context provided in the example will override the default value. A dataset sketch follows this entry.
  - prompt_sequence_length: int = 512
    Maximum tokenized sequence length for input text. Higher values increase memory overhead. This value should be at most 8192. Default value is 512.
  - target_sequence_length: int = 64
    Maximum tokenized sequence length for target text. Higher values increase memory overhead. This value should be at most 1024. Default value is 64.
  - sampling_strategy: str = 'greedy'
    This field specifies the sampling strategy. The valid options are 'greedy' and 'temperature_sampling'.
  - instruction: str | None = None
    This field lets the model know what task it needs to perform. Base models have been trained over a large set of varied instructions. You can give a simple and intuitive description of the task and the model will follow it, e.g. "Classify this movie review as positive or negative" or "Translate this sentence to Danish". Do not specify this if your dataset already prepends the instruction to the inputs field.
  - project: str = '{{$.pipeline_google_cloud_project_id}}'
    Project used to run custom jobs. If not specified, the project used to run the pipeline will be used.
  - accelerator_type: str = 'GPU'
    One of 'TPU' or 'GPU'. If 'TPU' is specified, tuning components run in europe-west4. Otherwise tuning components run in us-central1 on GPUs. Default is 'GPU'.
  - location: str = '{{$.pipeline_google_cloud_location}}'
    Location used to run non-tuning components, i.e. components that do not require accelerators. If not specified, the location used to run the pipeline will be used.
  - encryption_spec_key_name: str = ''
    Customer-managed encryption key. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key. Note that this is not supported for TPU at the moment.
- Returns:
  Cloud storage path to output predictions.
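To make the prompt_dataset schema concrete, here is a minimal sketch that writes one text-style row and one chat-style row to separate JSONL files (a given dataset is either text or chat, not both). The prompts, file names, and bucket path are invented for illustration and are not part of the component's API.

    # make_prompt_datasets.py -- illustrative only; prompts and paths are invented.
    import json

    # Text dataset: each row carries the prompt in a single `input_text` field.
    text_rows = [{"input_text": "Summarize: The quarterly report shows ..."}]

    # Chat dataset: each row carries a `messages` list of {author, content} turns.
    # The first and last author must be `user`, and the optional `context` field
    # overrides the default context supplied via the `instruction` parameter.
    chat_rows = [
        {
            "context": "You are a concise assistant.",
            "messages": [
                {"author": "user", "content": "What is the capital of Denmark?"},
                {"author": "assistant", "content": "Copenhagen."},
                {"author": "user", "content": "And of Norway?"},
            ],
        }
    ]

    # JSONL: one JSON object per line. Upload the file to Cloud Storage and pass
    # its gs:// URI (e.g. gs://my-bucket/prompts.jsonl) as `prompt_dataset`.
    for path, rows in [("text_prompts.jsonl", text_rows), ("chat_prompts.jsonl", chat_rows)]:
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")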
- preview.llm.rlaif_pipeline(prompt_dataset: str, preference_prompt_dataset: str, large_model_reference: str, model_display_name: str | None = None, prompt_sequence_length: int = 512, target_sequence_length: int = 64, large_model_a_reference: str = 'text-bison@001', large_model_b_reference: str = 't5-small', reward_model_learning_rate_multiplier: float = 1.0, reinforcement_learning_rate_multiplier: float = 1.0, reward_model_train_steps: int = 1000, reinforcement_learning_train_steps: int = 1000, kl_coeff: float = 0.1, sampling_strategy: str = 'temperature_sampling', instruction: str | None = None, eval_dataset: str | None = None, project: str = '{{$.pipeline_google_cloud_project_id}}', accelerator_type: str = 'GPU', location: str = '{{$.pipeline_google_cloud_location}}', tensorboard_resource_id: str | None = None) -> PipelineOutput [source]
  Performs reinforcement learning from AI feedback.
  At the moment, it only supports the summarization task type.
- Parameters:
  - prompt_dataset: str
    Cloud storage path to an unlabeled JSONL dataset that contains prompts. Text datasets must contain an input_text field that contains the prompt. Chat datasets must contain at least 1 message in a messages field. Each message must be valid JSON that contains author and content fields, where valid author values are user and assistant and content must be non-empty. Each row may contain multiple messages, but the first and last author must be the user. An optional context field may be provided for each example in a chat dataset. If provided, the context will be prepended to the message content. The instruction serves as the default context. (Useful if most messages use the same system-level context.) Any context provided in the example will override the default value.
  - preference_prompt_dataset: str
    The prompt dataset used for two models' inferences to build the side-by-side comparison AI feedback.
  - large_model_reference: str
    Name of the base model. Supported values are text-bison@001, t5-small, t5-large, t5-xl and t5-xxl. text-bison@001 and t5-small are supported in us-central1 and europe-west4. t5-large, t5-xl and t5-xxl are only supported in europe-west4.
  - model_display_name: str | None = None
    Name of the fine-tuned model shown in the Model Registry. If not provided, a default name will be created.
  - prompt_sequence_length: int = 512
    Maximum tokenized sequence length for input text. Higher values increase memory overhead. This value should be at most 8192. Default value is 512.
  - target_sequence_length: int = 64
    Maximum tokenized sequence length for target text. Higher values increase memory overhead. This value should be at most 1024. Default value is 64.
  - large_model_a_reference: str = 'text-bison@001'
    Name of a predefined model A for side-by-side comparison to build the AI feedback dataset. By default, it uses text-bison@001. The valid values are t5-small, t5-large, t5-xl, t5-xxl, text-bison@001, llama-2-7b, llama-2-13b.
  - large_model_b_reference: str = 't5-small'
    Name of a predefined model B for side-by-side comparison to build the AI feedback dataset. By default, it uses t5-small. The valid values are t5-small, t5-large, t5-xl, t5-xxl, text-bison@001, llama-2-7b, llama-2-13b.
  - reward_model_learning_rate_multiplier: float = 1.0
    Constant used to adjust the base learning rate used when training a reward model. Multiply by a number > 1 to increase the magnitude of updates applied at each training step or multiply by a number < 1 to decrease the magnitude of updates. Default value is 1.0.
  - reinforcement_learning_rate_multiplier: float = 1.0
    Constant used to adjust the base learning rate used during reinforcement learning. Multiply by a number > 1 to increase the magnitude of updates applied at each training step or multiply by a number < 1 to decrease the magnitude of updates. Default value is 1.0.
  - reward_model_train_steps: int = 1000
    Number of steps to use when training a reward model. Default value is 1000.
  - reinforcement_learning_train_steps: int = 1000
    Number of reinforcement learning steps to perform when tuning a base model. Default value is 1000.
  - kl_coeff: float = 0.1
    Coefficient for the KL penalty. This regularizes the policy model and penalizes it if it diverges from its initial distribution. If set to 0, the reference language model is not loaded into memory. Default value is 0.1.
  - sampling_strategy: str = 'temperature_sampling'
    The strategy used to generate candidates for AI feedback. Default is temperature_sampling. Valid values are greedy and temperature_sampling.
  - instruction: str | None = None
    This field lets the model know what task it needs to perform. Base models have been trained over a large set of varied instructions. You can give a simple and intuitive description of the task and the model will follow it, e.g. "Classify this movie review as positive or negative" or "Translate this sentence to Danish". Do not specify this if your dataset already prepends the instruction to the inputs field.
  - eval_dataset: str | None = None
    Optional Cloud storage path to an evaluation dataset. If provided, inference will be performed on this dataset after training. The dataset format is JSONL. Each example in the dataset must contain an input_text field that contains the prompt.
  - project: str = '{{$.pipeline_google_cloud_project_id}}'
    Project used to run custom jobs. If not specified, the project used to run the pipeline will be used.
  - accelerator_type: str = 'GPU'
    One of 'TPU' or 'GPU'. If 'TPU' is specified, tuning components run in europe-west4. Otherwise tuning components run in us-central1 on GPUs. Default is 'GPU'.
  - location: str = '{{$.pipeline_google_cloud_location}}'
    Location used to run custom jobs. If not specified, the location used to run the pipeline will be used.
  - tensorboard_resource_id: str | None = None
    Optional TensorBoard resource id in the format projects/{project_number}/locations/{location}/tensorboards/{tensorboard_id}. If provided, TensorBoard metrics will be uploaded to this location.
- Returns:
  - model_resource_name: Path to the model uploaded to the Model Registry. This will be an empty string if the model was not deployed.
  - endpoint_resource_name: Path to the Online Prediction Endpoint. This will be an empty string if the model was not deployed.
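Below is a minimal sketch of compiling and submitting this pipeline as a Vertex AI pipeline run. The import path for the pipeline function, the project id, region, and gs:// URIs are assumptions for illustration; the kfp compiler and aiplatform.PipelineJob calls are standard, but check your installed package versions.

    # submit_rlaif.py -- illustrative sketch, not a definitive recipe.
    from google.cloud import aiplatform
    from google_cloud_pipeline_components.preview.llm import rlaif_pipeline  # assumed import path
    from kfp import compiler

    # Compile the pipeline function into a local pipeline spec.
    compiler.Compiler().compile(
        pipeline_func=rlaif_pipeline,
        package_path="rlaif_pipeline.yaml",
    )

    aiplatform.init(project="my-project", location="us-central1")  # hypothetical values

    # Submit the compiled spec as a pipeline run; only required parameters shown.
    job = aiplatform.PipelineJob(
        display_name="rlaif-demo",
        template_path="rlaif_pipeline.yaml",
        parameter_values={
            "prompt_dataset": "gs://my-bucket/prompts.jsonl",
            "preference_prompt_dataset": "gs://my-bucket/preference_prompts.jsonl",
            "large_model_reference": "t5-small",
        },
    )
    job.run()

The same pattern applies to infer_pipeline and rlhf_pipeline, with their respective parameter names.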
- preview.llm.rlhf_pipeline(prompt_dataset: str, preference_dataset: str, large_model_reference: str, model_display_name: str | None = None, prompt_sequence_length: int = 512, target_sequence_length: int = 64, reward_model_learning_rate_multiplier: float = 1.0, reinforcement_learning_rate_multiplier: float = 1.0, reward_model_train_steps: int = 1000, reinforcement_learning_train_steps: int = 1000, kl_coeff: float = 0.1, instruction: str | None = None, deploy_model: bool = True, eval_dataset: str | None = None, project: str = '{{$.pipeline_google_cloud_project_id}}', accelerator_type: str = 'GPU', location: str = '{{$.pipeline_google_cloud_location}}', encryption_spec_key_name: str = '', tensorboard_resource_id: str | None = None) -> Outputs [source]
  Performs reinforcement learning from human feedback.
- Parameters:
  - prompt_dataset: str
    Cloud storage path to an unlabeled JSONL dataset that contains prompts. Text datasets must contain an input_text field that contains the prompt. Chat datasets must contain at least 1 message in a messages field. Each message must be valid JSON that contains author and content fields, where valid author values are user and assistant and content must be non-empty. Each row may contain multiple messages, but the first and last author must be the user. An optional context field may be provided for each example in a chat dataset. If provided, the context will be prepended to the message content. The instruction serves as the default context. (Useful if most messages use the same system-level context.) Any context provided in the example will override the default value.
  - preference_dataset: str
    Cloud storage path to a human preference JSONL dataset used to train a reward model. Each example in a preference dataset must contain candidate_0 and candidate_1 fields that contain candidate responses, choice that specifies the preferred candidate and either input_text (if tuning a text model) or messages (if tuning a chat model). Chat datasets must contain at least 1 message in a messages field. Each message must be valid JSON that contains author and content fields, where valid author values are user and assistant and content must be non-empty. Each row may contain multiple messages, but the first and last author must be the user. An optional context field may be provided for each example in a chat dataset. If provided, the context will be prepended to the message content. The instruction serves as the default context. (Useful if most messages use the same system-level context.) Any context provided in the example will override the default value. A dataset sketch follows this entry.
  - large_model_reference: str
    Name of the base model. Supported values are text-bison@001, t5-small, t5-large, t5-xl and t5-xxl. text-bison@001 and t5-small are supported in us-central1 and europe-west4. t5-large, t5-xl and t5-xxl are only supported in europe-west4.
  - model_display_name: str | None = None
    Name of the fine-tuned model shown in the Model Registry. If not provided, a default name will be created.
  - prompt_sequence_length: int = 512
    Maximum tokenized sequence length for input text. Higher values increase memory overhead. This value should be at most 8192. Default value is 512.
  - target_sequence_length: int = 64
    Maximum tokenized sequence length for target text. Higher values increase memory overhead. This value should be at most 1024. Default value is 64.
  - reward_model_learning_rate_multiplier: float = 1.0
    Constant used to adjust the base learning rate used when training a reward model. Multiply by a number > 1 to increase the magnitude of updates applied at each training step or multiply by a number < 1 to decrease the magnitude of updates. Default value is 1.0.
  - reinforcement_learning_rate_multiplier: float = 1.0
    Constant used to adjust the base learning rate used during reinforcement learning. Multiply by a number > 1 to increase the magnitude of updates applied at each training step or multiply by a number < 1 to decrease the magnitude of updates. Default value is 1.0.
  - reward_model_train_steps: int = 1000
    Number of steps to use when training a reward model. Default value is 1000.
  - reinforcement_learning_train_steps: int = 1000
    Number of reinforcement learning steps to perform when tuning a base model. Default value is 1000.
  - kl_coeff: float = 0.1
    Coefficient for the KL penalty. This regularizes the policy model and penalizes it if it diverges from its initial distribution. If set to 0, the reference language model is not loaded into memory. Default value is 0.1.
  - instruction: str | None = None
    This field lets the model know what task it needs to perform. Base models have been trained over a large set of varied instructions. You can give a simple and intuitive description of the task and the model will follow it, e.g. "Classify this movie review as positive or negative" or "Translate this sentence to Danish". Do not specify this if your dataset already prepends the instruction to the inputs field.
  - deploy_model: bool = True
    Whether to deploy the model to an endpoint in us-central1. Default is True.
  - eval_dataset: str | None = None
    Optional Cloud storage path to an evaluation dataset. The dataset format is JSONL. The evaluation dataset can be used to compute train-time metrics (when training a reward model) or perform bulk inference for third-party models. To compute train-time metrics, this dataset must contain the same fields as the preference dataset. For bulk inference with third-party models, only input_text is needed. Note that train-time metrics are only computed for the first 5000 samples in the dataset for efficient evaluation during training.
  - project: str = '{{$.pipeline_google_cloud_project_id}}'
    Project used to run custom jobs. If not specified, the project used to run the pipeline will be used.
  - accelerator_type: str = 'GPU'
    One of 'TPU' or 'GPU'. If 'TPU' is specified, tuning components run in europe-west4. Otherwise tuning components run in us-central1 on GPUs. Default is 'GPU'.
  - location: str = '{{$.pipeline_google_cloud_location}}'
    Location used to run non-tuning components, i.e. components that do not require accelerators. If not specified, the location used to run the pipeline will be used.
  - encryption_spec_key_name: str = ''
    Customer-managed encryption key. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key. Note that this is not supported for TPU at the moment.
  - tensorboard_resource_id: str | None = None
    Optional TensorBoard resource id in the format projects/{project_number}/locations/{location}/tensorboards/{tensorboard_id}. If provided, TensorBoard metrics will be uploaded to this location.
- Returns:
  - model_resource_name: Path to the model uploaded to the Model Registry. This will be an empty string if the model was not deployed.
  - endpoint_resource_name: Path to the Online Prediction Endpoint. This will be an empty string if the model was not deployed.
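To make the preference_dataset schema concrete, here is a minimal sketch of one JSONL row for tuning a text model. The prompt and candidate responses are invented, and treating choice as a 0/1 index into the candidates is an illustrative reading of the field description above, not guaranteed by the component.

    # write_preference_dataset.py -- illustrative only; values are invented.
    import json

    rows = [
        {
            # Prompt for a text model; a chat model would use `messages` instead.
            "input_text": "Summarize: The meeting covered budget and hiring.",
            # Two candidate responses produced for the same prompt.
            "candidate_0": "The meeting covered the budget and hiring plans.",
            "candidate_1": "Budget stuff happened.",
            # The human-preferred candidate (assumed here to be a 0/1 index).
            "choice": 0,
        },
    ]

    # One JSON object per line (JSONL); upload to Cloud Storage and pass the
    # gs:// URI as `preference_dataset`.
    with open("preference.jsonl", "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")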