Guide · February 25, 2026 · 4 min read

How to Choose the Right Model for Your Device

A practical guide to selecting the best LLM model size based on your phone's specs, with benchmarks and recommendations for every budget.

Understanding Model Sizes

Language models come in different sizes, measured in parameters. More parameters generally mean better quality, but they also need more RAM and processing power.

Quick Reference

RAM      Recommended Model   Quality
4 GB     Qwen 3.5 0.6B       Basic conversations
6 GB     Qwen 3.5 1.5B       Good quality
8 GB     Qwen 3.5 4B         Great quality
12+ GB   Qwen 3.5 8B         Excellent quality
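A rough rule of thumb behind the table above: a model's memory footprint is its parameter count times the bytes per weight, plus headroom for the KV cache, activations, and the runtime itself. A minimal sketch (the 4.8 bits-per-weight average for Q4_K_M and the flat 1.5 GB overhead are assumptions, not measured values):

```python
def estimated_ram_gb(params_billions: float, bits_per_weight: float = 4.8,
                     overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate: quantized weights plus a flat allowance
    for KV cache, activations, and the inference runtime."""
    weight_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

for size in (0.6, 1.5, 4, 8):
    print(f"{size}B -> ~{estimated_ram_gb(size):.1f} GB")
```

This lines up with the table once you remember the OS and other apps need several gigabytes of their own.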

All Supported Model Families

MyLLM supports 6 model families, each with unique strengths:

  • Qwen 3.5 (0.6B to 8B) — Our top recommendation. Best balance of quality, speed, and multilingual support. Excels at code, math, and reasoning.
  • Llama 3.2 (1B, 3B) — Meta's latest. Great for general conversation and creative writing. Strong English performance.
  • Gemma 2 (2B) — Google's compact model. Excellent for factual Q&A and summarization. Surprisingly capable for its size.
  • Phi-3.5 Mini (3.8B) — Microsoft's small-but-mighty model. Outstanding at code generation and logical reasoning.
  • SmolLM2 (1.7B) — HuggingFace's efficient model. Fast and reliable for everyday tasks. Great on budget devices.
  • DeepSeek R1 (1.5B) — Specializes in reasoning and chain-of-thought. Good for complex problem solving.

What About Speed?

Speed depends on your processor and, above all, your memory bandwidth. Generating each token streams the entire model through memory, so LLM decoding is largely bandwidth-bound. Modern chips like Snapdragon 8 Gen 2/3 and Google Tensor G3/G4 pair fast CPU cores with high memory bandwidth, which is why they perform so well.

Expect roughly:

  • 0.6B model: 30-50 tokens/second
  • 1.5B model: 15-25 tokens/second
  • 4B model: 8-15 tokens/second
  • 8B model: 4-8 tokens/second
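These numbers follow directly from the bandwidth-bound nature of decoding: tokens per second is roughly effective bandwidth divided by model size in bytes. A back-of-the-envelope sketch (the 15 GB/s effective bandwidth is an assumed figure for a recent flagship phone, not a benchmark, and real throughput varies with thermals and runtime efficiency):

```python
def estimated_tokens_per_sec(params_billions: float,
                             bits_per_weight: float = 4.8,
                             bandwidth_gb_s: float = 15.0) -> float:
    """Decode speed ≈ effective memory bandwidth / model bytes:
    each generated token reads the full quantized weights once."""
    model_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / model_gb

for size in (0.6, 1.5, 4, 8):
    print(f"{size}B -> ~{estimated_tokens_per_sec(size):.0f} tok/s")
```

The estimates land close to the ranges above, which is why halving a model's size roughly doubles its speed.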

Quantization Explained

Models come in different quantization levels that affect quality and size:

  • Q4_K_M — Best balance of quality and size. Our default recommendation.
  • Q5_K_M — Slightly better quality, ~15-20% larger files.
  • Q8_0 — Near-original quality, ~2x the file size. For power users with plenty of storage.
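File size scales with the average bits per weight, which makes the trade-offs above easy to estimate. A sketch (the bits-per-weight averages are approximate figures for llama.cpp-style quantization, taken here as assumptions):

```python
# Approximate average bits per weight for common GGUF quantization levels.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def file_size_gb(params_billions: float, quant: str) -> float:
    """File size ≈ parameters × bits-per-weight / 8 bytes."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"4B at {quant}: ~{file_size_gb(4, quant):.1f} GB")
```

Running this for a 4B model shows why Q4_K_M is the default: the jump to Q8_0 nearly doubles the download for a quality gain most people won't notice in chat.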

Our Recommendation

For most users, we recommend Qwen 3.5 4B (Q4_K_M quantization). It offers the best balance of quality, speed, and resource usage. It fits comfortably in 4-5 GB of RAM and generates text fast enough for comfortable conversation.

If your phone has only 4-6 GB of RAM, start with the 0.6B or 1.5B model and work your way up.

Tips for Best Performance

  • Close other apps — Free up as much RAM as possible
  • Keep your phone cool — Thermal throttling reduces speed significantly
  • Use the right quantization — Q4_K_M is the sweet spot for most models
  • Enable GPU acceleration — If your device supports it, this can significantly boost speed
  • Try different models — Each model family has different strengths. Experiment to find your favorite.

MyLLM AI Team

Building the future of private, on-device AI. We believe AI should run on your phone, respect your privacy, and be free for everyone.
