cross-posted from: https://lemmy.world/post/708817

Visit TheBloke’s HuggingFace page to see all of the new models in their SuperHOT glory.

SuperHOT models are LLMs whose LoRAs have been adapted to support a context length of 8K (8,192) tokens!

For reference, this is 4x the default context length of many LLMs (i.e. 2048 tokens). Even some of the newer ones only reach a context length of 4096 tokens, half that of these SuperHOT models!
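To make the numbers concrete, here is a minimal sketch of the arithmetic, taking "8K" to mean 8,192 tokens. The `scaled_position` helper is hypothetical: it assumes the longer window is reached by linearly squeezing positions back into the original 2048 range, which is my reading of the approach rather than something confirmed in this post.

```python
# Context-length arithmetic from the post, plus a hypothetical linear
# position-scaling helper (an assumption, not a confirmed implementation).

DEFAULT_CTX = 2048    # default context length of many LLMs
NEWER_CTX = 4096      # context length of some newer models
SUPERHOT_CTX = 8192   # "8K" context of the SuperHOT variants (assumed 8,192)


def context_ratio(extended: int, base: int) -> float:
    """Return how many times longer the extended context is than the base."""
    return extended / base


def scaled_position(position: int,
                    extended: int = SUPERHOT_CTX,
                    base: int = DEFAULT_CTX) -> float:
    """Map a position in the extended window back into the base range."""
    return position * base / extended


if __name__ == "__main__":
    print(context_ratio(SUPERHOT_CTX, DEFAULT_CTX))  # 4.0
    print(context_ratio(SUPERHOT_CTX, NEWER_CTX))    # 2.0
    print(scaled_position(8191))                     # 2047.75
```

Under that assumption, the last token of an 8,192-token prompt lands at position 2047.75, still inside the range the base model was trained on.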

Here are a few that were released, in case you can’t view his HuggingFace page:

New GPTQ Models from TheBloke

  • airoboros (13B)
  • CAMEL (13B)
  • Chronos (13B)
  • Guanaco (13B & 33B)
  • Manticore (13B)
  • Minotaur (13B)
  • Nous Hermes (13B)
  • Pygmalion (13B)
  • Samantha (13B & 33B)
  • Snoozy (13B)
  • Tulu (13B & 33B)
  • Vicuna (13B & 33B)
  • WizardLM (13B)

We owe a tremendous thank you to TheBloke, who has enabled many of us in the community to interact with versions of Manticore, Nous Hermes, WizardLM and others running the remarkable 8k context length from SuperHOT.

Many of these are 13B models, which should be compatible with consumer-grade GPUs. Try ExLlama or Oobabooga to test out these new formats.
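As a rough sanity check on the consumer-GPU claim, here is a back-of-the-envelope estimate of weight storage. It assumes 4-bit GPTQ quantization (my assumption, the bit width isn't stated above) and counts only the weights; the KV cache for a long context and runtime overhead come on top, so treat the result as a lower bound.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores the KV cache, activations, and quantization metadata,
    so real VRAM usage will be higher.
    """
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for size in (13, 33):
        print(f"{size}B at 4-bit: ~{weight_memory_gb(size):.1f} GB of weights")
```

By this estimate a 13B model needs about 6.5 GB for weights alone, while a 33B model needs about 16.5 GB.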

Shoutout to kaiokendev for creating SuperHOT. You can learn more about their work here, or in Meta’s new research paper covering this method.

If you enjoyed reading this, please consider subscribing to /c/FOSAI, where I do my best to keep you in the know about the latest and greatest advancements in free, open-source artificial intelligence.