• mstrk@lemmy.world
    11 days ago

    I use the 1.5B version of whatever is latest on Ollama, with Open WebUI as the frontend, for my personal use. I can upload files and search the web with it, but it’s too slow on my machine.

    • If you’ve got a decent Nvidia GPU and are hopping on Linux, look into the KoboldCpp Vulkan backend. In my experience it works far better than the CUDA backend and is astronomically faster than the CPU-only backend.

        • When/if you do, an RTX 3070 LHR (about $300 new) is just about the BARE MINIMUM for GPU inference. It’s what I use and it gets the job done, but with larger models I often find the context limit I can fit is too small to be usable.

          If you wanna go team red, Vulkan should still work for inference, and you get access to cards with significantly more VRAM, which lets you run larger models more comfortably. I’m not sure about speed though, since I haven’t personally used AMD GPUs since around 2015.
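
To put rough numbers on the VRAM and context-limit point above, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical 7B-parameter model with 32 layers, 32 KV heads, and a head dimension of 128, quantized to 4 bits per weight with a 16-bit KV cache. Those figures are illustrative assumptions, not taken from the thread, and the estimate ignores activations, scratch buffers, and driver overhead, so real usage will be higher.

```python
GIB = 1024 ** 3

def weights_bytes(n_params, bits_per_weight):
    """Approximate memory for the model weights alone."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def estimate(vram_gib, n_params, bits_per_weight,
             n_layers, n_kv_heads, head_dim, context_len):
    """Return (estimated GiB needed, whether it fits in vram_gib)."""
    total = (weights_bytes(n_params, bits_per_weight)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len))
    return total / GIB, total <= vram_gib * GIB

if __name__ == "__main__":
    # Hypothetical 7B model shape (assumption, for illustration only).
    shape = dict(n_params=7e9, bits_per_weight=4,
                 n_layers=32, n_kv_heads=32, head_dim=128)
    for ctx in (4096, 8192, 16384):
        need, ok = estimate(vram_gib=8, context_len=ctx, **shape)
        print(f"context {ctx:>6}: ~{need:.1f} GiB needed, fits in 8 GiB: {ok}")
```

Under these assumptions the KV cache grows linearly with context length, so on an 8 GB card the weights fit easily but a long context can push the total past the available VRAM. A card with more VRAM moves that ceiling up.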