Temperature
Temperature controls how random the model's output is: lower values concentrate probability on the most likely tokens, while higher values spread it out, producing more varied output. In terms of application, you might want to use a lower temperature value for tasks like fact-based QA to encourage more factual and concise responses. For poem generation or other creative tasks, it might be beneficial to increase the temperature value.
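Conceptually, temperature divides the logits before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. A minimal sketch (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Divide logits by temperature, softmax, then sample one token id.
    As temperature -> 0 this approaches greedy (argmax) decoding."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```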
Top P (Nucleus Sampling)
- Instead of selecting the next word based on the highest probability alone, nucleus sampling considers a subset of the vocabulary.
- This subset is the smallest set of words whose cumulative probability is greater than or equal to a threshold p.
- Once this subset (or “nucleus”) is formed, the next word is sampled from this subset, introducing a level of randomness.
With a vocabulary of roughly 50k tokens, a randomly initialized model assigns each next token a probability of about 1/50k. When p is set low, the cumulative threshold is met after only a few high-probability tokens, so the nucleus is small and the responses are more factual and deterministic.
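A minimal sketch of nucleus sampling over a probability vector (illustrative code, not any library's implementation): sort the probabilities, keep the smallest prefix whose cumulative mass reaches p, renormalize, and sample.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability is >= p."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]               # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return rng.choice(nucleus, p=nucleus_probs)
```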
The general recommendation is to alter temperature or Top P, but not both.
Frequency Penalty
Applies a penalty on the next token proportional to how many times that token already appeared in the response and prompt.
This reduces word repetition in the model's response by giving tokens that appear more often a higher penalty.
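As a sketch of one way this can be implemented (names are illustrative; assumes all token ids are smaller than the vocabulary size), the penalty is subtracted from each token's logit in proportion to its count before sampling:

```python
import numpy as np

def apply_frequency_penalty(logits, seen_ids, penalty=0.5):
    """Lower each token's logit in proportion to how many times it
    already appeared in the prompt and response (seen_ids)."""
    counts = np.bincount(seen_ids, minlength=len(logits))
    return np.asarray(logits, dtype=np.float64) - penalty * counts
```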
Presence Penalty
Applies a penalty on repeated tokens but, unlike the frequency penalty, the penalty is the same for all repeated tokens. A token that appears twice and a token that appears 10 times are penalized the same. This setting discourages the model from repeating phrases too often in its response.
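Under the same assumptions as the frequency-penalty sketch above, a presence penalty subtracts a flat amount from any token that has appeared at least once:

```python
import numpy as np

def apply_presence_penalty(logits, seen_ids, penalty=0.5):
    """Lower the logit of every token that has appeared at least once,
    by the same flat amount regardless of its count."""
    counts = np.bincount(seen_ids, minlength=len(logits))
    return np.asarray(logits, dtype=np.float64) - penalty * (counts > 0)
```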
- Creative text: use a high penalty.
- Stay focused: use a low penalty.
Similar to `temperature` and `top_p`, the general recommendation is to alter the frequency or presence penalty, but not both.
Other Settings
- Max Length: caps the number of tokens the model can generate in its response.
- Stop Sequences: strings that, when generated, cause the model to stop producing further tokens.
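For reference, here is how all of these settings map onto a single request with the OpenAI Python client (the parameter names below are OpenAI-specific and differ across providers; the model name is only an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",    # example model name
    messages=[{"role": "user", "content": "When was the Eiffel Tower built?"}],
    temperature=0.2,        # low temperature: factual, deterministic
    top_p=1.0,              # left at default while tuning temperature
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_tokens=256,         # max length of the generated response
    stop=["\n\n"],          # stop sequence: halt at the first blank line
)
print(response.choices[0].message.content)
```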