How to Choose the Right Model for Your Device
A practical guide to selecting the best model size for your phone's specs, with benchmarks and recommendations for every budget.
Understanding Model Sizes
Language models come in different sizes, measured in parameters. More parameters generally mean better quality, but they also need more RAM and processing power.
Quick Reference
| RAM | Recommended Model | Quality |
|---|---|---|
| 4 GB | Qwen 3.5 0.6B | Basic conversations |
| 6 GB | Qwen 3.5 1.5B | Good quality |
| 8 GB | Qwen 3.5 4B | Great quality |
| 12+ GB | Qwen 3.5 8B | Excellent quality |
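The table's RAM tiers follow from simple arithmetic: a quantized model needs roughly its parameter count times the bits per weight, plus headroom for the KV cache and the rest of the system. Here is a minimal back-of-the-envelope sketch; the 4.5 bits/weight and 1 GB overhead figures are illustrative assumptions, not measured values:

```python
def estimate_ram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.0):
    """Rough RAM needed: quantized weights plus a fixed allowance
    for KV cache and runtime overhead. Illustrative numbers only."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for size in (0.6, 1.5, 4.0, 8.0):
    print(f"{size}B model -> ~{estimate_ram_gb(size):.1f} GB of free RAM")
```

Remember the OS and other apps claim several gigabytes too, which is why an 8B model realistically wants a 12 GB device.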
All Supported Model Families
MyLLM supports 6 model families, each with unique strengths:
- Qwen 3.5 (0.6B to 8B) — Our top recommendation. Best balance of quality, speed, and multilingual support. Excels at code, math, and reasoning.
- Llama 3.2 (1B, 3B) — Meta's latest. Great for general conversation and creative writing. Strong English performance.
- Gemma 2 (2B) — Google's compact model. Excellent for factual Q&A and summarization. Surprisingly capable for its size.
- Phi-3.5 Mini (3.8B) — Microsoft's small-but-mighty model. Outstanding at code generation and logical reasoning.
- SmolLM2 (1.7B) — HuggingFace's efficient model. Fast and reliable for everyday tasks. Great on budget devices.
- DeepSeek R1 (1.5B) — Specializes in reasoning and chain-of-thought. Good for complex problem solving.
What About Speed?
Speed depends on your processor and memory. Modern chips like Snapdragon 8 Gen 2/3 and Google Tensor G3/G4 pair strong CPU cores with fast memory, which is what matters for LLM inference: reading the prompt is compute-bound, while generating tokens is mostly limited by memory bandwidth.
Expect roughly:
- 0.6B model: 30-50 tokens/second
- 1.5B model: 15-25 tokens/second
- 4B model: 8-15 tokens/second
- 8B model: 4-8 tokens/second
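To turn those rates into something tangible, divide a typical reply length by the tokens-per-second figure. A quick sketch, using hypothetical midpoints of the ranges above:

```python
def seconds_for_reply(tokens, tokens_per_second):
    """Time to generate a reply of the given length at a steady rate."""
    return tokens / tokens_per_second

# Midpoint rates are assumptions taken from the ranges above.
for name, tps in [("0.6B", 40), ("1.5B", 20), ("4B", 11), ("8B", 6)]:
    secs = seconds_for_reply(200, tps)
    print(f"{name}: ~{secs:.0f} s for a 200-token reply")
```

Anything under about 15 seconds for a paragraph-length reply feels conversational; the 8B tier starts to feel slow on weaker chips.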
Quantization Explained
Models come in different quantization levels that affect quality and size:
- Q4_K_M — Best balance of quality and size. Our default recommendation.
- Q5_K_M — Slightly better quality, ~25% larger files.
- Q8_0 — Near-original quality, ~2x the file size. For power users with plenty of storage.
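The file-size differences follow directly from the effective bits per weight of each format. A rough calculation, using commonly cited approximate bit widths (assumed here, not exact for every model):

```python
# Approximate effective bits per weight for each quantization level.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def file_size_gb(params_billion, quant):
    """Estimated download size: parameters x bits per weight."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"4B at {quant}: ~{file_size_gb(4.0, quant):.1f} GB")
```

So a 4B model is roughly 2.4 GB at Q4_K_M, and the jump to Q8_0 nearly doubles that for a quality gain most users won't notice in chat.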
Our Recommendation
For most users, we recommend Qwen 3.5 4B (Q4_K_M quantization). It offers the best balance of quality, speed, and resource usage. It fits comfortably in 4-5 GB of RAM and generates text fast enough for comfortable conversation.
If your phone has only 4-6 GB of RAM, start with the 1.5B model and work your way up.
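The selection logic above boils down to a RAM lookup. A minimal sketch of that rule (the function name and thresholds are illustrative, not MyLLM's actual API):

```python
def recommend_model(ram_gb):
    """Map total device RAM to the quick-reference table's suggestion.
    Thresholds mirror the table; they are guidelines, not hard limits."""
    if ram_gb >= 12:
        return "Qwen 3.5 8B"
    if ram_gb >= 8:
        return "Qwen 3.5 4B"
    if ram_gb >= 6:
        return "Qwen 3.5 1.5B"
    return "Qwen 3.5 0.6B"

print(recommend_model(8))   # a typical mid-range phone
```

If a recommended size stutters or crashes on your device, drop one tier: the next model down is always a safe fallback.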
Tips for Best Performance
- Close other apps — Free up as much RAM as possible
- Keep your phone cool — Thermal throttling reduces speed significantly
- Use the right quantization — Q4_K_M is the sweet spot for most models
- Enable GPU acceleration — If your device supports it, this can significantly boost speed
- Try different models — Each model family has different strengths. Experiment to find your favorite.
MyLLM AI Team
Building the future of private, on-device AI. We believe AI should run on your phone, respect your privacy, and be free for everyone.