Run DeepSeek-R1 Locally: A Quick Guide

Everybody is talking about DeepSeek. It has gained significant attention for its advanced capabilities as a large language model (LLM). In this blog, I’ll show you how to run DeepSeek-R1 on your local machine. Before diving into the setup, it's important to understand two key concepts: large language models (LLMs) and Ollama.

Understanding LLMs

1. What is an LLM?

At its core, an LLM (Large Language Model) is a statistical tool trained to predict sequences of text by learning patterns from vast amounts of data. Think of it as a "next-word predictor" scaled to superhuman levels.
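
To see the idea at its absolute simplest, here is a toy "next-word predictor" built from bigram counts in plain Python. Real LLMs replace these raw counts with a neural network, but the prediction objective is the same:

from collections import Counter, defaultdict

# Tiny training corpus; real LLMs train on hundreds of billions of tokens.
corpus = "the sky is blue . the sky is clear . the sea is blue .".split()

# Count how often each word follows each preceding word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent continuation seen during "training".
    return following[word].most_common(1)[0][0]

print(predict_next("sky"))  # -> "is"
print(predict_next("is"))   # -> "blue" (seen twice vs "clear" once)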


2. Foundational Components

a. Data

  • LLMs are trained on massive text datasets (books, websites, code, etc.).

  • Example: GPT-3 was trained on roughly 500 billion tokens of text.

  • Principle: Exposure to diverse language patterns allows the model to generalize.

b. Architecture (Neural Networks)

  • Built using transformers (a type of neural network architecture).

  • Key parts:

    • Attention mechanisms: Focus on relevant parts of input text (e.g., "cat" links to "purred"; see the sketch after this list).

    • Layers: Stacked processing steps that refine predictions (like a factory assembly line).

  • Parameters: Numerical values (weights) adjusted during training.

    • Example: GPT-3 has 175 billion parameters.
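
Here is a drastically simplified sketch of the attention idea above: each word scores the other words by vector similarity, and a softmax turns those scores into weights. The two-dimensional vectors are made up; real models learn high-dimensional query/key/value projections:

import math

# Toy 2-dimensional "embeddings"; real models learn these vectors.
vectors = {"the": [0.1, 0.2], "cat": [0.9, 0.1], "purred": [0.8, 0.3]}

def attention_weights(query_word, context_words):
    # Score each context word by dot product with the query, then softmax.
    q = vectors[query_word]
    scores = [sum(qi * ki for qi, ki in zip(q, vectors[w])) for w in context_words]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {w: e / total for w, e in zip(context_words, exps)}

# "purred" attends more strongly to "cat" than to "the".
print(attention_weights("purred", ["the", "cat"]))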

c. Training Process

  1. Pre-training:

    • Learn to predict the next word in a sequence.

    • Example: Given "The sky is...", predict "blue".

    • Uses self-supervised learning (no human labels; the text itself provides them; see the sketch after this list).

  2. Fine-tuning (optional):

    • Adapt the model for specific tasks (e.g., medical Q&A, coding).
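
The sketch promised above: "self-supervised" just means the labels come from the text itself. Sliding a window over raw text yields (context, next-word) training pairs with no human annotation:

# Turn raw text into self-supervised training pairs: each next word
# is the "label" for the words preceding it.
text = "the sky is blue".split()

pairs = [(text[:i], text[i]) for i in range(1, len(text))]
for context, target in pairs:
    print(f"context={context!r} -> target={target!r}")
# context=['the'] -> target='sky'
# context=['the', 'sky'] -> target='is'
# context=['the', 'sky', 'is'] -> target='blue'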

3. How It Works

Step 1: Tokenization

  • Break input text into tokens (words/subwords).

    • Example: "ChatGPT" → ["Chat", "G", "PT"].

Step 2: Context Embedding

  • Convert tokens into vectors (numerical representations) that encode meaning and context.
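
A toy version of the lookup step, with made-up 3-dimensional vectors; real models learn embeddings with thousands of dimensions and also add positional information:

# Toy embedding table mapping token ids to vectors; real models learn
# these weights rather than hard-coding them.
embedding_table = {
    0: [0.12, -0.40, 0.88],   # token "the"
    1: [0.95, 0.10, -0.22],   # token "sky"
}

token_ids = [0, 1]
vectors = [embedding_table[t] for t in token_ids]
print(vectors)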

Step 3: Prediction

  • Use attention and neural layers to compute probabilities for the next token.

  • Example:

    • Input: "The capital of France is..."

    • Model assigns high probability to "Paris".

Step 4: Generation

  • Sample from the probability distribution to generate text (not just regurgitating memorized data).
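
Steps 3 and 4 in miniature: given a (made-up) probability distribution over next tokens, generation samples from it rather than always picking the top token, which is why outputs vary between runs:

import random

# Made-up model output for the prompt "The capital of France is...".
next_token_probs = {"Paris": 0.92, "Lyon": 0.05, "blue": 0.03}

def sample(probs):
    # random.choices draws one token proportionally to its probability.
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample(next_token_probs))  # usually "Paris", occasionally not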

4. Key Features

  • Scale: Effectiveness grows with model size and training data.

  • Zero-shot learning: Perform tasks without explicit training (e.g., translate English to French "on the fly").

  • Emergent abilities: Unplanned skills, such as arithmetic or multi-step logic, that appear only at larger scales.


Understanding Ollama

1. What is Ollama?

Ollama is an open-source tool designed to run large language models (LLMs) such as Llama, Mistral, and DeepSeek locally on your machine. It simplifies the process of downloading, managing, and interacting with LLMs without relying on cloud services.


2. Core Purpose

First-principle goal: Enable anyone to use advanced AI models locally with minimal setup, bypassing dependencies on external servers or APIs. This addresses:

  • Privacy: Keep data on your device.

  • Cost: Avoid paying for cloud-based API calls.

  • Latency: Reduce delays caused by network communication.

  • Customization: Use models tailored to specific needs.


3. How Does Ollama Work?

a. Model Files

  • LLMs are large neural networks pre-trained on vast datasets.

  • These models are stored as binary files (e.g., llama2-7b.Q4_K_M.gguf).

  • Ollama provides a library of pre-converted, optimized models (via ollama pull <model>).

b. Hardware Acceleration

  • Ollama leverages your local hardware (CPU/GPU) to run models efficiently.

  • Uses frameworks like CUDA (for NVIDIA GPUs) or Metal (for Apple Silicon) to speed up computations.

c. API Layer

  • Ollama exposes a REST API or CLI to interact with models, mimicking cloud services like OpenAI’s API.

  • Example:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
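
The same request from Python using the requests library; setting "stream": False asks Ollama to return one JSON object instead of a stream of line-delimited chunks:

import requests  # pip install requests

# Ollama's local REST API listens on port 11434 by default.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,  # return a single JSON object instead of a stream
    },
)
print(resp.json()["response"])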

4. Key Features

  • Local Execution: No data leaves your machine.

  • Model Library: Pre-optimized models (e.g., Llama 2, Mistral, CodeLlama, DeepSeek).

  • Cross-Platform: Runs on macOS, Linux, Windows (WSL2).

  • Extensibility: Integrate with apps via its API (e.g., chatbots, coding assistants).

  • Offline Use: Works without internet.

  • Custom Models: Customize models (e.g., via Modelfiles with custom system prompts or parameters) for niche tasks (e.g., legal document analysis).


Understanding DeepSeek

1. What is DeepSeek?

  • DeepSeek is a Chinese AI company known for its family of open large language models.

  • Key areas include coding, mathematics, reasoning, and domain-specific applications (e.g., finance, science).

  • Known for models like DeepSeek-Coder (code-focused) and DeepSeek-Math (math-focused), optimized for technical tasks.


2. What is Model Distillation?

Distillation trains a smaller model ("student") to mimic a larger, more capable model ("teacher").

  • Goal: Retain most of the teacher’s performance with fewer parameters.

  • How: Transfer knowledge (output probabilities, intermediate features) from the teacher to the student (see the sketch after this list).

  • Result: Smaller, faster models that still perform well on specific tasks.
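
A minimal sketch of that knowledge transfer, with made-up numbers: the student is trained to match the teacher's output distribution, softened by a temperature, by minimizing the KL divergence between the two:

import math

def softmax(logits, temperature=1.0):
    # Higher temperature flattens the distribution, exposing the teacher's
    # "dark knowledge" about which wrong answers are almost right.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # Distillation loss: how far the student's distribution q is from the teacher's p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [4.0, 1.5, 0.2]   # made-up teacher outputs for 3 classes
student_logits = [3.0, 2.0, 0.5]   # made-up student outputs

T = 2.0  # softening temperature
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(f"distillation loss: {loss:.4f}")  # the student minimizes this during training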


3. DeepSeek-R1 Distilled Models

The DeepSeek-R1 series uses distillation to create compact models optimized for niche tasks.

  • Example: DeepSeek-R1-Distill-Qwen-1.5B

    • Base model: Qwen2.5-Math-1.5B, fine-tuned on reasoning data generated by the much larger DeepSeek-R1 (the teacher).

    • Focus: Math and reasoning tasks (e.g., AIME, MATH-500), plus algorithmic problem-solving (e.g., Codeforces challenges).

The distilled models released alongside DeepSeek-R1 are:

  • DeepSeek-R1-Distill-Qwen series: 1.5B, 7B, 14B, 32B.

  • DeepSeek-R1-Distill-Llama series: 8B, 70B.

According to DeepSeek's published benchmark comparisons (see References), DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude 3.5 on specific tasks such as:

  • AIME 2024 (Math Competition)

  • MATH-500 (Math Reasoning)

  • Codeforces (Coding Competition)

While DeepSeek-R1-Distill-Qwen-1.5B excels at math and reasoning, it does not beat GPT-4o and Claude 3.5 across the board: it underperforms on broader benchmarks (GPQA, LiveCodeBench), likely because it is optimized for math rather than general reasoning or coding.

Now that we have a basic understanding of LLMs, Ollama, and DeepSeek, we can install Ollama and DeepSeek-R1 locally.

Step 1: Installing Ollama

Installing Ollama on Linux is simple: just run the command below. Installers for macOS and Windows are available from the official Ollama website.

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama --version # verify the installation
ollama version is 0.5.7

Step 2: Run the DeepSeek Model

The command below automatically pulls the model from the registry and then runs it. I chose the lite variant, deepseek-r1:1.5b, because of my modest system configuration.

1.5B models simply don’t have enough parameters to store all the world's knowledge. They’re primarily useful as writing tools; for reliable fact lookup, a model this small really needs a web-search or Wikipedia tool. Hence, you might notice incorrect or inaccurate answers.

ollama run deepseek-r1:1.5b # pull and run the DeepSeek-R1 1.5B distilled model
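
Once the model is running, you can also query it from code. A sketch using the official ollama Python client (install it with pip install ollama; the REST API shown earlier works just as well):

import ollama  # pip install ollama

# Chat with the locally running model; no data leaves your machine.
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])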

Step 3: Installing Chatbox

Chatbox is a desktop client for interacting with multiple large language models (LLMs). It offers a single platform where you can access and manage several LLMs, including DeepSeek, so you can switch between models without opening separate applications. Chatbox also supports sending files (PDFs, DOCs, etc.) to the model API, which is particularly useful for document analysis and processing.

$ wget https://download.chatboxai.app/releases/Chatbox-1.9.5-x86_64.AppImage
$ chmod +x Chatbox-1.9.5-x86_64.AppImage
$ ./Chatbox-1.9.5-x86_64.AppImage --no-sandbox # running ChatBox

Step 4: Configuring the Model in Chatbox

After installation, you will be prompted with a settings window. Configure it as follows:

  • Select Model Provider: Ollama API

  • Select Model: deepseek-r1:1.5b

Now you are ready. Shoot your first query to the model and have fun. You can even run it without an internet connection (the best part).

References

https://medium.com/data-science-in-your-pocket/deepseek-r1-distill-qwen-1-5b-the-best-small-sized-llm-14eee304d94b