How to Choose the Right Model for Your Device
A practical guide to selecting the best model size for your phone's specs, with benchmarks and recommendations for every budget.
Understanding Model Sizes
Language models come in different sizes, measured in parameters. More parameters generally mean better quality, but they also need more RAM and processing power.
Quick Reference
| RAM | Recommended Model | Quality |
|---|---|---|
| 4 GB | Qwen 3.5 0.6B | Basic conversations |
| 6 GB | Qwen 3.5 1.5B | Good quality |
| 8 GB | Qwen 3.5 4B | Great quality |
| 12+ GB | Qwen 3.5 8B | Excellent quality |
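The table's RAM tiers follow from simple arithmetic: a quantized model needs roughly its parameter count times the bits per weight, plus headroom for the KV cache and the rest of the system. Here is a minimal back-of-the-envelope sketch; the 4.5 bits/weight and 1 GB overhead figures are illustrative assumptions, not measured values:

```python
def estimate_ram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.0):
    """Rough RAM needed: quantized weights plus a fixed allowance
    for KV cache and runtime overhead. Illustrative numbers only."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for size in (0.6, 1.5, 4.0, 8.0):
    print(f"{size}B model -> ~{estimate_ram_gb(size):.1f} GB of free RAM")
```

Remember the OS and other apps claim several gigabytes too, which is why an 8B model realistically wants a 12 GB device.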
All Supported Model Families
MyLLM supports 6 model families, each with unique strengths:
- Qwen 3.5 (0.6B to 8B) — Our top recommendation. Best balance of quality, speed, and multilingual support. Excels at code, math, and reasoning.
- Llama 3.2 (1B, 3B) — Meta's latest. Great for general conversation and creative writing. Strong English performance.
- Gemma 2 (2B) — Google's compact model. Excellent for factual Q&A and summarization. Surprisingly capable for its size.
- Phi-3.5 Mini (3.8B) — Microsoft's small-but-mighty model. Outstanding at code generation and logical reasoning.
- SmolLM2 (1.7B) — HuggingFace's efficient model. Fast and reliable for everyday tasks. Great on budget devices.
- DeepSeek R1 (1.5B) — Specializes in reasoning and chain-of-thought. Good for complex problem solving.
What About Speed?
Speed depends on your processor and memory. Modern chips like Snapdragon 8 Gen 2/3 and Google Tensor G3/G4 pair strong CPU cores with fast memory, which is what matters for LLM inference: reading the prompt is compute-bound, while generating tokens is mostly limited by memory bandwidth.
Expect roughly:
- 0.6B model: 30-50 tokens/second
- 1.5B model: 15-25 tokens/second
- 4B model: 8-15 tokens/second
- 8B model: 4-8 tokens/second
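To turn those rates into something tangible, divide a typical reply length by the tokens-per-second figure. A quick sketch, using hypothetical midpoints of the ranges above:

```python
def seconds_for_reply(tokens, tokens_per_second):
    """Time to generate a reply of the given length at a steady rate."""
    return tokens / tokens_per_second

# Midpoint rates are assumptions taken from the ranges above.
for name, tps in [("0.6B", 40), ("1.5B", 20), ("4B", 11), ("8B", 6)]:
    secs = seconds_for_reply(200, tps)
    print(f"{name}: ~{secs:.0f} s for a 200-token reply")
```

Anything under about 15 seconds for a paragraph-length reply feels conversational; the 8B tier starts to feel slow on weaker chips.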
Quantization Explained
Models come in different quantization levels that affect quality and size:
- Q4_K_M — Best balance of quality and size. Our default recommendation.
- Q5_K_M — Slightly better quality, ~25% larger files.
- Q8_0 — Near-original quality, ~2x the file size. For power users with plenty of storage.
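The file-size differences follow directly from the effective bits per weight of each format. A rough calculation, using commonly cited approximate bit widths (assumed here, not exact for every model):

```python
# Approximate effective bits per weight for each quantization level.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def file_size_gb(params_billion, quant):
    """Estimated download size: parameters x bits per weight."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"4B at {quant}: ~{file_size_gb(4.0, quant):.1f} GB")
```

So a 4B model is roughly 2.4 GB at Q4_K_M, and the jump to Q8_0 nearly doubles that for a quality gain most users won't notice in chat.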
Our Recommendation
For most users, we recommend Qwen 3.5 4B (Q4_K_M quantization). It offers the best balance of quality, speed, and resource usage. It fits comfortably in 4-5 GB of RAM and generates text fast enough for comfortable conversation.
If your phone has only 4-6 GB of RAM, start with the 1.5B model and work your way up.
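The selection logic above boils down to a RAM lookup. A minimal sketch of that rule (the function name and thresholds are illustrative, not MyLLM's actual API):

```python
def recommend_model(ram_gb):
    """Map total device RAM to the quick-reference table's suggestion.
    Thresholds mirror the table; they are guidelines, not hard limits."""
    if ram_gb >= 12:
        return "Qwen 3.5 8B"
    if ram_gb >= 8:
        return "Qwen 3.5 4B"
    if ram_gb >= 6:
        return "Qwen 3.5 1.5B"
    return "Qwen 3.5 0.6B"

print(recommend_model(8))   # a typical mid-range phone
```

If a recommended size stutters or crashes on your device, drop one tier: the next model down is always a safe fallback.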
Tips for Best Performance
- Close other apps — Free up as much RAM as possible
- Keep your phone cool — Thermal throttling reduces speed significantly
- Use the right quantization — Q4_K_M is the sweet spot for most models
- Enable GPU acceleration — If your device supports it, this can significantly boost speed
- Try different models — Each model family has different strengths. Experiment to find your favorite.
MyLLM AI Team
Building the future of private, on-device AI. We believe AI should run on your phone, respect your privacy, and be free for everyone.