I’ve had pretty good luck running llamafile on my laptop. The speeds aren’t super fast, and I can only use models Mistral 7B-sized and smaller, but the results are good enough for casual use and for general R and Python code.
Edit: my laptop doesn’t have a dedicated GPU, and I don’t think llamafile supports Intel GPUs yet. Even so, CPU inference is still pretty quick.
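For anyone curious what the CPU-only setup looks like: a llamafile is a single self-contained executable bundling llama.cpp plus the model weights, so there's nothing to install. A minimal sketch, assuming a Mistral 7B llamafile from Mozilla's Hugging Face repo (the exact URL and filename are illustrative); `-ngl 0` is the llama.cpp flag that keeps all layers on the CPU:

```shell
# Download a quantized Mistral 7B llamafile (filename/URL illustrative)
curl -LO https://huggingface.co/Mozilla/Mistral-7B-Instruct-v0.2-llamafile/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.llamafile

# Make it executable — it's a normal binary
chmod +x mistral-7b-instruct-v0.2.Q4_K_M.llamafile

# Run a one-off prompt; -ngl 0 disables GPU offload entirely
./mistral-7b-instruct-v0.2.Q4_K_M.llamafile -ngl 0 \
  -p "Write an R function that computes a rolling mean."
```

Running it with no arguments instead starts a local web UI in the browser, which is handy for the casual-use case.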
Docker Swarm (the classic standalone orchestrator) is dead; Docker Swarm mode (built into the Docker Engine) is still very much alive.