Abacus.ai:

We recently released Smaug-72B-v0.1, which has taken first place on the Open LLM Leaderboard by Hugging Face. It is the first open-source model to achieve an average score of more than 80.

  • Toes♀ · 5 months ago

4GB is practically nothing in this space. Ideally you want at least 10GB of dedicated VRAM, and more if you can get it. Keep in mind you're probably also sharing that VRAM with your operating system, so it's more like ~3GB free before you've even started.

KoboldCpp is capable of using both your GPU and CPU together (via a feature called GPU layer offloading), so you might want to consider that. There's a trade-off between the memory you have available, the quality of the output, and the speed of generation.
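    If it helps, here's a minimal sketch of the same layer-splitting idea using the llama-cpp-python bindings (KoboldCpp is built on the same llama.cpp backend). The model path is a placeholder, and the layer count is something you'd tune to your VRAM:

    ```python
    from llama_cpp import Llama

    # Placeholder path: any local GGUF-format model file
    llm = Llama(
        model_path="./models/some-model.Q4_K_M.gguf",
        n_gpu_layers=20,  # layers offloaded to the GPU; 0 = CPU only, -1 = offload everything
        n_ctx=2048,       # context window; bigger contexts use more memory
    )

    out = llm("Q: What does GPU layer offloading do? A:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```

    The fewer layers you offload, the less VRAM you need, but the slower each token comes out.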

The model mentioned in this post can be run on the CPU if you have enough system RAM or swap.
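    As a rough back-of-the-envelope for what counts as "enough": the weights alone take roughly (parameter count) × (bytes per parameter), before you account for context and runtime overhead. A quick sketch:

    ```python
    # Rough weight-only memory estimate; ignores the KV cache and runtime overhead
    def weight_gb(params_billions: float, bits_per_param: float) -> float:
        return params_billions * 1e9 * (bits_per_param / 8) / 1024**3

    for bits in (16, 8, 4):
        print(f"72B model at {bits}-bit: ~{weight_gb(72, bits):.0f} GB of weights")
    ```

    That works out to roughly 134 GB at 16-bit, 67 GB at 8-bit, and 34 GB at 4-bit, which is why swap comes up at all for a 72B model.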

If you wanna keep it all on the GPU, check out 4-bit quantized models. Also, there's been a lot of work on getting this running on the Raspberry Pi, and I suspect that work could help you out here as well.
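    For the all-on-GPU route, one common way to load a model in 4-bit is transformers plus bitsandbytes. A sketch, assuming the model's Hugging Face id is abacusai/Smaug-72B-v0.1 (and note that a 72B model still needs a big card even at 4-bit; substitute something smaller for a 4GB GPU):

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Assumed HF id for the model in this post; swap in a smaller model on low-VRAM cards
    model_id = "abacusai/Smaug-72B-v0.1"

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",  # lets accelerate spill layers to CPU RAM when the GPU fills up
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
    ```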