Researchers from Meta FAIR, the University of California, Berkeley, and New York University have introduced Thought Preference Optimization (TPO), a new method aimed at improving the response quality of instruction fine-tuned LLMs.
The scaling issue is more about having a product companies can sell, with fast turnaround on requests at a reasonable price. There's a brick wall on that front.
The work in this write-up is about accuracy over speed. Maybe accuracy is the wrong word, but the point is to tune the NN to give more predictable and repeatable results at the expense of speed.
Two different problems.