If you need or want to run an LLM on limited hardware, you may want to look into so-called bitnets, models with ternary weights. These should be efficient enough to run a decent LLM on a CPU with 16 GB of RAM, possibly less. Unfortunately they're barely out of the experimental stage, so you'll probably have to compile BitNet.cpp yourself or wait a few months until full support lands in Ollama.
I haven’t run a bitnet myself yet, so I can’t personally vouch for their effectiveness or usefulness.
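That said, if you want to try the compile-it-yourself route, the steps look roughly like the following. This is a sketch based on my reading of the microsoft/BitNet repository's README, not something I've run myself; the model name, script names, and flags are assumptions, so check the repo's current instructions before running anything.

```shell
# Sketch only — adapted from the microsoft/BitNet README; repo layout,
# model names, and flags may have changed since this was written.
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download a ternary (1.58-bit) model in GGUF format
# (exact repo/model name is an assumption)
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T

# Build the optimized ternary CPU kernels and prepare the model
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

# Run CPU inference against the quantized model
python run_inference.py \
    -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" -cnv
```

The `i2_s` quantization here refers to one of the ternary kernel formats the project ships; which one performs best will depend on your CPU.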