A note that this setup runs a 671B model in Q4 quantization at 3-4 TPS, running a Q8 would need something beefier. To run a 671B model in the original Q8 at 6-8 TPS you’d need a dual socket EPYC server motherboard with 768GB of RAM.

  • Sims@lemmy.ml
    link
    fedilink
    arrow-up
    2
    ·
    20 hours ago

    512GB 2400 ECC RAM $400

    Interesting. I just bled a Ryzen ai with 64gb 8000 Ram because I thought/learned that memory bandwidth was important, but here - with this size model, he recommends a better cpu before more ram, before ram speed. However, the price of fast ram is ridiculous, so maybe the tradeoff is okay.