Requesting the full optimized code may exceed the LLM's output length limit.
The message I saw:
<incomplete triton code>
Expected to find output fields in the LM response: [optimized_code, next_thought]
Actual output fields parsed from the LM response: [optimized_code]
How about asking the LLM to return only a patch instead of the full code?
A straightforward idea is to modify the prompt so that the LLM returns a patch instead of the full optimized code. However, this change may also affect the parser and any other steps that run after the LLM returns. Can anyone help with this? Thanks.
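
To make the idea concrete, here is a rough sketch of how a patch could be applied after the LLM responds. It assumes a SEARCH/REPLACE block format that I made up for illustration; the marker strings and the `apply_patch` helper are hypothetical and not part of the existing code.

```python
import re

# Hypothetical patch format (an assumption, not something the project defines):
#   <<<<<<< SEARCH
#   lines to find in the current code
#   =======
#   lines to put in their place
#   >>>>>>> REPLACE
PATCH_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_patch(original: str, patch_text: str) -> str:
    """Apply every SEARCH/REPLACE block in patch_text to original."""
    blocks = PATCH_BLOCK.findall(patch_text)
    if not blocks:
        raise ValueError("no SEARCH/REPLACE blocks found in the LLM response")

    patched = original
    for search, replace in blocks:
        # Require exactly one match so an ambiguous or stale patch fails loudly
        # instead of silently editing the wrong place.
        if patched.count(search) != 1:
            raise ValueError(f"SEARCH text must match exactly once: {search!r}")
        patched = patched.replace(search, replace, 1)
    return patched
```

With something like this, the parser could rebuild the full `optimized_code` from the previous code plus the patch, so the steps that run after the LLM returns would still receive complete code. A failed match could be treated the same way as a failed parse today.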