num_predict

num_predict is the maximum number of tokens the model will predict when generating text. It defaults to 128. Setting it to -2 means fill the context, and -1 means there is no limit. No limit doesn’t mean the model will just keep going, though. If the model decides the right answer takes 10 tokens, raising this value won’t make the answer longer. It only sets the max. If the answer would have been longer than the limit, it just gets cut off.
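Here is a minimal sketch of setting num_predict per request through the Ollama REST API, rather than in a Modelfile. It assumes a local Ollama server on the default port and a model called `llama3`; both are placeholders, and the helper names `build_payload` and `generate` are made up for this note. Running the script only prints the request body, so it works without a server.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server; change it if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, num_predict):
    """Build the JSON body for /api/generate with a cap on generated tokens."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_predict": num_predict},
    }


def generate(payload, url=OLLAMA_URL):
    """POST the payload to Ollama and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # "llama3" is a placeholder; use any model you have pulled locally.
    payload = build_payload("llama3", "Why is the sky blue?", 20)
    print(json.dumps(payload, indent=2))
    # With a server running, send it like this:
    # result = generate(payload)
    # print(result["response"])
```

With a cap of 20, a long answer should stop partway through a sentence, while a short answer comes back unchanged. The response should also carry an `eval_count` field with the number of tokens generated, which is a quick way to see whether the cap was hit.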

#ollama/parameters