What's the best Local AI Model to run?

All models are AI models, but I really want to know which one is the best to use. Obviously it has to be good quality.

DeepSeek, Grok, Qwen, Gemini, ChatGPT, Phi, Cohere, Gemma, Granite

I would advise polishing your post if possible, please. I genuinely thought it was a new user with a low-effort post, but no, you joined in 2023, and I don’t want to be rude, but that’s quite embarrassing.

Anywho, that is my advice. However, I understood your question, and I’m going to get into it.
DeepSeek has been regarded as one of the best models to run locally: if the numbers are true, it’s on par with OpenAI’s o1, and if not, it does more or less the same things, namely the reasoning/thought process followed by the output. Keep in mind that no model is perfect, and to my understanding a lot depends on how many parameters you’re planning to run.

I don’t have the answer, but I’m just gonna say this: round up some cash to buy a few RTX 4090s. You’re gonna need them if you want to run “the best local AI model”.


You can look at some benchmarks. I would recommend aider.chat for coding performance, matharena.ai for math, and LiveBench for overall performance.

Except for Aider, the other two benchmarks have minimal or no contamination, as they regularly update their challenges.

Or a 5090, but yeah, then I guess I wasn’t wrong on the parameter part.


If there was “a best” model across the board for all situations then we wouldn’t have such a plethora of options.

It’s not really possible to make a good recommendation without knowing your hardware limitations, and your goals/how you plan to use AI. You should get in the habit of including relevant details and context in your posts/questions.

  • If you want “the best,” I suppose you could run DeepSeek R1 locally, but the upfront hardware cost would be something like $10K+.

  • If you want a good place to start without excessive hardware requirements, start with a 7B to 14B model (assuming you have a GPU with 8-16 GB of VRAM). Llama 3.1 8B or Mistral Nemo are two popular choices.

  • If you have a high-end system with a lot of high-bandwidth memory (e.g. an M1/2/3/4 with a lot of memory, or 1-3 3090s), you could try Mistral Small 3.1, Qwen 32B, or Llama 3.3 70B.

  • If you want to explore on your own (my recommendation, because there is no “best”), I’d suggest trying lots of different models on Hugging Face and/or locally; a minimal sketch of getting started follows below.

It’s not really a question anyone can answer for you: there are too many personal variables that you haven’t provided, and no model is best at all things across all areas of knowledge and use cases.
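To make that exploration concrete, here’s a minimal sketch of trying one model locally with the Hugging Face transformers library. It assumes transformers, torch, and accelerate are installed, and the model id is only a placeholder example; swap in whatever you want to test, provided it fits on your hardware:

```python
# Minimal sketch: load a model from the Hugging Face Hub and generate text.
# Assumes `pip install transformers torch accelerate` and enough VRAM/RAM for
# the chosen checkpoint; the model id below is only an example.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",  # swap for any model you want to try
    device_map="auto",   # place layers on the GPU if available, otherwise the CPU
    torch_dtype="auto",  # use the precision the checkpoint was saved in
)

prompt = "In two sentences, why is there no single 'best' local model?"
result = generator(prompt, max_new_tokens=150, do_sample=False)
print(result[0]["generated_text"])
```

Swapping the model id is all it takes to compare, say, an 8B against a 32B on your own prompts and hardware.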


I was running QwQ 32B locally on my laptop; it is slow, but it works. And I really don’t think I can run DeepSeek without a GPU.

I’m barely a beginner, much less an expert, but to my understanding, generally the larger the number of parameters a model has, the “smarter” the AI you get.

The best LLM/AI that you can run is limited by the amount of VRAM on the GPU that you have:

The largest consumer GPU is the 5090 with 32 GB of VRAM.

There is also the upcoming (or even already released?) Ryzen Strix Halo line for laptops and HTPCs, with fast shared RAM fixed in place, like on those fancy Apple M-series chips. The largest configuration has 128 GB of RAM, which should fit the larger models.
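For some rough numbers behind the VRAM point above, here’s a back-of-the-envelope sketch. It only counts the weights (no KV cache, activations, or framework overhead, so real usage is higher), and the bytes-per-parameter figures are typical values for common precisions rather than exact requirements:

```python
# Back-of-the-envelope size of model weights at common precisions/quantizations.
# Ignores KV cache, activations, and runtime overhead, so treat these numbers
# as lower bounds rather than exact requirements.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights in GB for a given parameter count."""
    # params_billion * 1e9 params * bytes-per-param / 1e9 bytes-per-GB
    return params_billion * BYTES_PER_PARAM[precision]

for name, size_b in [("8B", 8), ("32B", 32), ("70B", 70), ("DeepSeek R1 671B", 671)]:
    print(name, {p: f"~{weights_gb(size_b, p):.0f} GB" for p in BYTES_PER_PARAM})
```

Which is roughly why a 4-bit 32B model fits on a 24-32 GB card, while even a 4-bit DeepSeek R1 still needs hundreds of GB.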

What is this weird gatekeeping?

You can run <=13B models just fine on CPU only with 16 GB of RAM and still get decent responses in 20-40 seconds.
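A CPU-only run like that might look something like this with llama-cpp-python (a sketch; the GGUF path is a placeholder for whatever quantized model you’ve downloaded):

```python
# CPU-only inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF path below is a placeholder; point it at any quantized <=13B model.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-instruct.Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,    # context window
    n_threads=8,   # set to roughly your physical core count
)

out = llm("Q: What does 4-bit quantization do to a model? A:", max_tokens=128)
print(out["choices"][0]["text"])
```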

It’d require really hefty hardware to run DeepSeek. (IIRC DeepSeek R1 is around 671B parameters and requires many hundreds of GB of VRAM, or LPDDR5X/DDR5 at slower speeds.)

But if you can run 32B models at a speed you’re okay with, you’ve got lots of options as to the models you can run. Mistral Small 3.1 is a popular ~24B model right now. What sorts of things do you use AI for? The model you find “best” will probably depend on what you want to use it for (e.g. what is best for creative writing and what is best for coding will often differ).

Well, it’s not really gatekeeping; OP was looking for “the best” that could be run locally. Yes, you can run the smaller ones on CPU alone, but the token generation speed isn’t as nice as on a GPU.

Had OP asked what was good enough, I would have answered the same.


The Unsloth guys over on Hugging Face are good at reducing memory requirements.


That is not always the case: https://artificialanalysis.ai/
QwQ 32B is a 32B model you can run on almost anything, though it will be slower, while DeepSeek R1 is a 671B model, which is huge but only slightly better per the benchmarks. It depends on the design, the data, and many other parameters.

True.

BTW, AMD also has great GPUs; buy one with ROCm support and you will be fine.

Intel also now makes some GPUs, which could be a bargain depending on whether you can buy it at MSRP.