Local LLM
Fleeting
Models that I tested
hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_L
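Assuming Ollama is installed, a reference like this can be run directly; the :Q4_K_L suffix selects a specific quant file from the repo (see the Custom Quantization notes below):
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_L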
simonw/llm
- External reference: https://github.com/simonw/llm
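A minimal sketch of driving a local model from the llm CLI, assuming the llm-ollama plugin and a model already pulled into Ollama (the plugin choice and the model reference below are assumptions I have not verified here):
pip install llm
llm install llm-ollama
llm models
llm -m hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_L "Ten facts about llamas"
llm models should list whatever models the plugin exposes, and -m picks one of those names for the prompt.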
ollama
Use Ollama with any GGUF model on the Hugging Face Hub.
- External reference: https://huggingface.co/docs/hub/en/ollama
The run command has the following format:
ollama run hf.co/{username}/{repository}
Please note that you can use both hf.co and huggingface.co as the domain name.
Here are some models you can try:
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF
ollama run hf.co/bartowski/Humanish-LLama3-8B-Instruct-GGUF
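Assuming standard Ollama behavior, the same references should also work with ollama pull if you only want to download a model ahead of time without starting a chat session:
ollama pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF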
Custom Quantization
By default, the Q4_K_M quantization scheme is used when it is present inside the model repo. If not, a reasonable quant type present in the repo is picked instead.
To select a different scheme:
- From the Files and versions tab on a model page, open the GGUF viewer on a particular GGUF file.
- Choose ollama from the Use this model dropdown.
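There also appears to be a direct route from the command line: append the quant tag to the reference after a colon, which matches the tagged model noted at the top of this page (exact tag names depend on which GGUF files the repo contains):
ollama run hf.co/{username}/{repository}:{quantization}
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_L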