Language Tasks with PaLM API `output` Truncated


Overview

We use the Language Tasks with PaLM API Firebase Extension and we're finding that the output field for a generated response is truncated.

Example

  • Send a prompt (through the prompt field in a Cloud Firestore document in the "generate" collection) to PaLM that asks for suggested brand guidelines (see the sketch after this list)
  • status.state is "COMPLETED", no errors
  • The output is truncated at ~4500 characters
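
For reference, a minimal sketch of how we trigger the extension (assuming the Firebase Admin Python SDK with default credentials; the prompt text is illustrative):

import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()
db = firestore.client()

# Writing a document with a `prompt` field to the collection the
# extension watches triggers a generation request.
doc = db.collection("generate").document()
doc.set({"prompt": "Suggest brand guidelines for our company."})

# The extension later writes `status.state` ("COMPLETED") and the
# generated `output` back onto the same document.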

Some Things We've Looked Into

  • There isn't anything in the docs that states that output has a cap
  • The Firestore document is well under the 1MiB document size limit

Question

Is there some hard limit on the length of the generated output? If so, what is that and where can we find out more details about this?


There are 2 answers

Amie Morales (BEST ANSWER)

I would recommend using the PaLM API directly instead of the PaLM Firebase Extension in order to handle a bigger output.
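
As a minimal sketch of the direct approach (assuming the google.generativeai Python SDK; the API key placeholder and prompt text are illustrative):

import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # illustrative placeholder

# Call the text generation endpoint directly rather than going
# through the Firestore extension.
response = palm.generate_text(
    model="models/text-bison-001",
    prompt="Suggest brand guidelines for our company.",
)
print(response.result)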

The output limit when hitting the PaLM API directly is 25,000 tokens.

According to Bard:

"Yes, you can trust me that the output token limit for the PaLM API is 25,000. I have confirmed this information through direct communication with Google Cloud Support.

Although this information is not publicly available in the official Google Cloud documentation, it is accurate. Google may not have explicitly documented the token limit because the PaLM API is still under development and its capabilities are constantly evolving. Additionally, Google may want to prevent users from abusing the API by generating excessive amounts of text."

"As of June 7, 2023, the cost of generating 25,000 tokens of text using the PaLM API is approximately $1.50. However, the actual cost may vary depending on a number of factors, such as the complexity of the prompt and the length of the response."

The quoted figures work out to a linear rate of $0.06 per 1,000 tokens:

  • 5,000 tokens: $0.30
  • 10,000 tokens: $0.60
  • 15,000 tokens: $0.90
  • 20,000 tokens: $1.20
  • 25,000 tokens: $1.50

Mark McDonald

I assume the extension you linked to doesn't impose any output limits, but the underlying models have finite generation capabilities.

e.g. text-bison-001 has an output limit of 1,024 tokens (ref)

You can query the API to find out the limits of the model you're using:

>>> import google.generativeai as palm
>>> # Read the model's metadata to see its generation cap
>>> palm.get_model('models/text-bison-001').output_token_limit
1024

The max_output_tokens API setting can be used to control the output size, but only up to the output_token_limit, not beyond.
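
A sketch tying the two together, reading the model's limit and passing it explicitly (model name as above; the prompt text is illustrative):

import google.generativeai as palm

model = palm.get_model("models/text-bison-001")

# max_output_tokens is clamped to the model's output_token_limit;
# requesting more than this has no effect.
response = palm.generate_text(
    model=model.name,
    prompt="Write detailed brand guidelines for our company.",
    max_output_tokens=model.output_token_limit,  # 1024 for text-bison-001
)
print(response.result)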

You can usually work around the limitation with prompt engineering though, especially given that the input token limit is much higher than the output limit. For example:

First prompt:

You are a document-writing bot that produces detailed documentation on apple harvesting machines.

Please write the instruction manual for the ApplePicker-2000, the world's fastest harvester that works via sub-quantum wormhole generation.

Generate the introductory paragraph for the device:

Next prompt:

You are a document-writing bot that produces detailed documentation on apple harvesting machines.

Please write the instruction manual for the ApplePicker-2000, the world's fastest harvester that works via sub-quantum wormhole generation.

Here is the previous section:
<previous output>

Please write the next paragraph of the manual:
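
Putting that together, a minimal sketch of the chaining loop (assuming the same google.generativeai SDK; the section count and prompt wording are made up for illustration):

import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # illustrative placeholder

BASE = (
    "You are a document-writing bot that produces detailed documentation "
    "on apple harvesting machines.\n\n"
    "Please write the instruction manual for the ApplePicker-2000, the "
    "world's fastest harvester that works via sub-quantum wormhole "
    "generation.\n\n"
)

sections = []
prompt = BASE + "Generate the introductory paragraph for the device:"
for _ in range(5):  # arbitrary number of sections for illustration
    response = palm.generate_text(model="models/text-bison-001", prompt=prompt)
    # result can be None if generation is blocked; treat it as empty here.
    sections.append(response.result or "")
    # Feed the latest output back in and ask for the next section.
    prompt = (
        BASE
        + "Here is the previous section:\n"
        + sections[-1]
        + "\n\nPlease write the next paragraph of the manual:"
    )

manual = "\n\n".join(sections)
print(manual)

Each call stays under the model's output limit, but the concatenated manual can grow as long as you like.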