11. Large AI Model Courses

11.1 AI Model Deployment: Ollama

Large Language Models (LLMs) are advanced text generation systems powered by artificial intelligence. Their key feature is the ability to learn and understand human language through vast training datasets, enabling them to generate natural and fluent text.

Ollama is an open-source tool designed to simplify the deployment and operation of large language models. It allows users to run high-quality language models within a local network environment without relying on cloud services. The tool features a simple, user-friendly command-line interface, making it easy for users to deploy and manage various open-source LLMs.

Compatible Boards:

  • Raspberry Pi 5 (8GB)
  • Raspberry Pi 5 (4GB)
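
To confirm which memory variant a board is, the total RAM can be read from the kernel. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function name total_ram_gib is illustrative only.

```shell
# Print total system RAM in GiB, read from /proc/meminfo (Linux only).
total_ram_gib() {
    # MemTotal is reported in kB; 1 GiB = 1048576 kB.
    awk '/^MemTotal:/ { printf "%.1f GiB\n", $2 / 1048576 }' /proc/meminfo
}

total_ram_gib
```

The reported figure is typically a little below the nominal size, because part of the memory is reserved by the system.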

11.1.1 Ollama Installation

Follow this tutorial to install Ollama, which will run in CPU-only mode by default. To run with GPU support, you need to install from source or directly flash the image we provide.

Before proceeding with the deployment, ensure the board is connected to the internet.

  1. Install curl by entering the following command:

    sudo apt install curl
    
  2. Install Ollama by running:

    curl -fsSL https://ollama.com/install.sh | sh
    

Note: Installation time may vary depending on your network environment. The process may take a while, so please be patient! If the installation fails, run the command again, or reboot and retry.
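
Once the install script finishes, it can help to confirm that the ollama binary is actually available. The sketch below assumes only that the installer placed ollama on the PATH; check_ollama is a hypothetical helper name.

```shell
# Report whether the ollama binary is on PATH and, if so, its version.
check_ollama() {
    if command -v ollama >/dev/null 2>&1; then
        ollama --version
    else
        echo "ollama not found on PATH"
        return 1
    fi
}

check_ollama || echo "Installation incomplete; re-run the install script."
```

On systems where the installer registered the ollama systemd service, systemctl status ollama additionally shows whether the server is running.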

11.1.2 Using Ollama

After installation, type ollama to see the list of available commands:

| Command | Function |
| --- | --- |
| ollama serve | Start Ollama |
| ollama create | Create a model from a model file |
| ollama show | Display model information |
| ollama run | Run a model |
| ollama pull | Pull a model from the registry |
| ollama push | Push a model to the registry |
| ollama list | List models |
| ollama ps | List running models |
| ollama cp | Copy a model |
| ollama rm | Remove a model |
| ollama help | Get help information for any command |
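
The commands in the table are usually combined into a short workflow. The sketch below pulls a model, confirms that it is stored locally, and prints its details; try_model is a hypothetical helper name, and the model in the commented example is only an illustration.

```shell
# Pull a model and inspect it (try_model is a hypothetical helper name).
try_model() {
    model=$1
    if ! command -v ollama >/dev/null 2>&1; then
        echo "ollama is not installed"
        return 1
    fi
    # Download from the registry, list local models, then show the details.
    ollama pull "$model" || return 1
    ollama list
    ollama show "$model"
}

# Example (downloads the model, so it is left commented out):
# try_model qwen2.5:0.5b
# ollama rm qwen2.5:0.5b     # remove the model again to free disk space
```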

11.1.3 Uninstalling Ollama

To uninstall the Ollama tool, follow the steps below:

  • Remove the Service

    sudo rm /etc/systemd/system/ollama.service
    
  • Remove the Files

    sudo rm $(which ollama)
    
  • Remove Models and Service User and Group

    sudo rm -r /usr/share/ollama
    sudo userdel ollama
    sudo groupdel ollama
    

Note: If sudo userdel ollama fails because the user is currently in use, follow these steps:

  1. Find and stop the process that is still running as the ollama user. In this example the process ID is 12614; use the actual process ID reported on your system:

    sudo kill 12614
    
  2. Next, check the group members:

    getent group ollama
    
  3. Then remove each listed member from the ollama group, replacing username with the actual username (for example, hiwonder):

    sudo gpasswd -d username ollama
    
  4. After removing the user from the group, run the command again to delete the user:

    sudo userdel ollama
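
The process ID needed for the kill step can be looked up rather than guessed. The following is a minimal sketch, assuming the standard id and pgrep utilities are available; list_ollama_pids is a hypothetical helper name.

```shell
# Print the PIDs of processes still owned by the "ollama" user.
# list_ollama_pids is a hypothetical helper name.
list_ollama_pids() {
    if ! id ollama >/dev/null 2>&1; then
        echo "user ollama does not exist"
        return 0
    fi
    # pgrep -u selects processes by effective user and prints one PID per line.
    pgrep -u ollama || echo "no processes owned by ollama"
}

list_ollama_pids
```

Stopping the service before deleting the user (sudo systemctl stop ollama) usually leaves no such process behind.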
    

11.1.4 References

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

11.2 Installing the Large Model Chat Platform

11.2.1 Installing Models with Ollama

Before using a model, you need to install it. Start by visiting the Ollama website from the browser on your Raspberry Pi:

Ollama website: https://ollama.com/

  1. In the top-right corner, click Models to browse the available models.

  2. You can also use the search bar to find a specific model. This example uses the Qwen model.

  3. On the model’s page, you’ll find details about all available versions.

  4. Once you’ve chosen a suitable model, use the following command to pull it. For example, to pull qwen2.5:0.5b:

    ollama pull qwen2.5:0.5b
    

To download other models, you may repeat the same steps.
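
Model references follow the pattern name:tag, where the tag usually encodes the parameter size (0.5b in qwen2.5:0.5b). The sketch below splits a reference into its two parts; split_model_ref is a hypothetical helper, and a missing tag is shown as latest, which is the tag Ollama assumes when none is given.

```shell
# Split a model reference of the form name:tag into its two parts.
# split_model_ref is a hypothetical helper name.
split_model_ref() {
    ref=$1
    case $ref in
        *:*) name=${ref%%:*}; tag=${ref#*:} ;;
        *)   name=$ref;       tag=latest ;;
    esac
    echo "name=$name tag=$tag"
}

split_model_ref "qwen2.5:0.5b"   # prints: name=qwen2.5 tag=0.5b
split_model_ref "qwen2.5"        # prints: name=qwen2.5 tag=latest
```

A reference with the colon left out is therefore treated as a different model name with the latest tag, and the pull fails if no such model exists.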

11.2.2 Open WebUI

Open WebUI is an open-source, self-hosted web interface for chatting with large language models, including models served by Ollama.

Supported Boards:

  • Raspberry Pi 5 (4GB)
  • Raspberry Pi 5 (8GB)

When using Open WebUI, you may encounter issues like unresponsive dialogues or timeouts. In such cases, try restarting Open WebUI or use Ollama to run the models instead.

This tutorial demonstrates how to install Open WebUI using Docker.

  • Docker Installation

If you’re using the image we provide, Docker is already installed.

  1. Update the local package list:

    sudo apt update
    
  2. Upgrade the installed packages:

    sudo apt upgrade
    
  • Installing Open WebUI

For systems with Docker already installed, enter the following command in the terminal to download the Open WebUI image:

    sudo docker pull ghcr.io/open-webui/open-webui:main
    
Note: The download may take some time, so please be patient!

  • Running Open WebUI

  1. To start Open WebUI, run the following command using Docker:

    sudo docker run --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
    
  2. Once it has started successfully, open your system’s browser and visit the following URL:

    http://localhost:8080/

  3. For first-time use, you need to register an account. This first account becomes the administrator account; fill in the required information as needed. Taking hiwonder as an example:

    Username: hiwonder

    Email: hiwonder@qq.com

    Password: hiwonder

  4. Once logged in successfully, the Open WebUI main interface appears.

1. Demo

Using Open WebUI for dialogue may be slower than running directly with the Ollama tool, and you might even encounter service connection timeouts. This is related to the size of the board’s memory and cannot be avoided!

2. Switching Models

If you have downloaded multiple models, you can click on Select a model to choose a specific model for conversation. Models pulled using Ollama will automatically be added to the model options in Open WebUI.

11.2.3 Closing Open WebUI

  • To check running Docker containers:

    docker ps
    
  • To stop a running Docker container:

    docker stop [CONTAINER ID] # For example: docker stop 5f42ee9cf784
    

Be cautious when following the next steps, as they involve removing containers.

  • To view all stopped containers:

    docker ps -a
    
  • To remove a stopped container:

    docker rm [CONTAINER ID] # For example: docker rm 5f42ee9cf784
    
  • To remove all stopped containers:

    docker container prune
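
Because the container was started with --name open-webui, its ID does not have to be copied by hand. The following is a minimal sketch that looks the ID up by name; webui_container_id is a hypothetical helper name.

```shell
# Look up the ID of the running Open WebUI container by its name.
# Assumes the container was started with --name open-webui as in 11.2.2;
# webui_container_id is a hypothetical helper name.
webui_container_id() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH"
        return 1
    fi
    # -q prints only container IDs; --filter narrows the list by name.
    docker ps -q --filter "name=open-webui"
}

# Example: stop the container without copying its ID by hand.
# docker stop "$(webui_container_id)"
```

Docker also accepts the container name wherever an ID is expected, so docker stop open-webui is an equivalent shortcut.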
    

11.2.4 FAQ

  • Service Connection Timeout

Error Message: Open WebUI: Server Connection Error

Solution: Close Open WebUI and restart it. After that, try asking your question again, or alternatively, use the Ollama tool to run the model and ask your question.
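
Before restarting, it can be useful to check whether the Ollama server itself is answering. The sketch below queries the address that Open WebUI was given through OLLAMA_BASE_URL; ollama_reachable is a hypothetical helper name, and the 5-second limit is an arbitrary choice.

```shell
# Check whether the Ollama server answers on the address Open WebUI uses.
# ollama_reachable is a hypothetical helper name.
ollama_reachable() {
    url=${1:-http://127.0.0.1:11434}
    # /api/tags lists the locally available models; -f makes curl fail on
    # HTTP errors and --max-time bounds the wait.
    curl -fsS --max-time 5 "$url/api/tags" >/dev/null 2>&1
}

if ollama_reachable; then
    echo "Ollama is reachable"
else
    echo "Ollama is not reachable"
fi

# If Ollama responds but the web page still times out, restart the container:
# sudo docker restart open-webui
```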

11.3 Meta AI: Llama 3 Model

11.3.1 Introduction to Llama 3

Meta’s Llama 3 is a series of advanced open-source large language models (LLMs) developed by Meta AI. Llama 3 has demonstrated state-of-the-art performance across various industry benchmarks and introduces new capabilities, including improved reasoning.

On the architecture side, Llama 3 uses the standard decoder-only Transformer architecture and employs a tokenizer with a 128K token vocabulary. Llama 3 was pre-trained on Meta’s custom-built 24K GPU clusters, using over 15 trillion tokens of publicly available data. More than 5% of this data is non-English content, covering more than 30 languages. The training dataset is seven times larger than that of the previous Llama 2, with four times as much code.

  • Model Specifications

Llama 3 comes in multiple versions, allowing you to choose based on the board’s configuration.

| Llama 3.2 | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 1B | ✓ | ✓ |
| 3B | ✓ | ✓ |

11.3.2 Running Llama 3

  1. To run Llama 3.2:1B, enter the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run llama3.2:1b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Example questions:

    Write a 100-word piece on how technology changes life.
    
    A rectangular nursery has an area of 120m². The length is 2m more than the width. What are the length and width of the nursery?
    
    What is the capital of the United States?
    What is its area?
    What are some recommended tourist destinations?
    
  3. To end the conversation, simply enter the following command:

    /bye
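
For a single question, the interactive session can be skipped by passing the prompt as an argument to ollama run, which prints the answer and exits. The following is a minimal sketch; ask_once is a hypothetical helper name.

```shell
# Ask a single question without entering the interactive session.
# ask_once is a hypothetical helper name.
ask_once() {
    model=$1
    prompt=$2
    if ! command -v ollama >/dev/null 2>&1; then
        echo "ollama is not installed"
        return 1
    fi
    # With a prompt argument, "ollama run" answers once and returns.
    ollama run "$model" "$prompt"
}

# Example:
# ask_once llama3.2:1b "What is the capital of the United States?"
```

Follow-up questions such as "What is its area?" rely on the conversation history, so they still need the interactive session.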
    

11.3.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • Llama 3

GitHub: https://github.com/meta-llama/llama3

Corresponding Models for Ollama: https://ollama.com/library/llama3.2:3b

11.4 Alibaba Cloud: Qwen 2 Model

11.4.1 Introduction to Qwen 2 Model

The Qwen 2 model is an open-source large language model developed by Alibaba Cloud’s Tongyi Qianwen team. It includes multiple pre-trained and instruction-tuned models of varying sizes, such as Qwen 2-0.5B, Qwen 2-1.5B, Qwen 2-7B, Qwen 2-57B-A14B, and Qwen 2-72B. This series of models has shown excellent performance across several benchmark tests, particularly excelling in areas such as language comprehension, text generation, multilingual capabilities, programming, mathematics, and reasoning. It competes effectively with proprietary models in these fields.

  • Model Specifications

Qwen 2 comes in multiple versions, allowing you to choose based on the board’s configuration.

| Qwen2 | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 1.5B | ✓ | ✓ |
| 7B | × | ✓ |
| 72B | × | × |

11.4.2 Running Qwen 2

  1. To run Qwen 2:1.5B, enter the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run qwen2:1.5b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Example questions:

    Write a 100-word piece on how technology changes life.
    
    A rectangular nursery has an area of 120m². The length is 2m more than the width. What are the length and width of the nursery?
    
    What is the capital of the United States?
    What is its area?
    What are some recommended tourist destinations?
    
  3. To end the conversation, enter the following command:

    /bye
    

11.4.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • Qwen2

GitHub: https://github.com/QwenLM/Qwen2

Corresponding Models for Ollama: https://ollama.com/library/qwen2

11.5 Microsoft: Phi-3 Model

11.5.1 Introduction to Phi-3 Model

The Phi-3 model is a series of small language models (SLMs) developed by Microsoft Research, designed to offer language understanding and reasoning capabilities comparable to larger models while keeping the parameter count small. The Phi-3 series includes three sizes: phi-3-mini, phi-3-small, and phi-3-medium. Each version is tailored for specific use cases and requirements.

  • Model Specifications

Phi-3 comes in multiple versions, allowing you to choose based on the board’s configuration.

| Phi-3 | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 3.8B | ✓ | ✓ |
| 14B | × | × |

11.5.2 Running Phi-3

  1. To run Phi-3:3.8B, enter the following command. If the model has not been pulled yet, it will be downloaded first. If the download fails, try again, or reboot and retry.

    ollama run phi3:3.8b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Example questions:

    Write a 100-word piece on how technology changes life.
    
    A rectangular nursery has an area of 120m². The length is 2m more than the width. What are the length and width of the nursery?
    
    What is the capital of the United States?
    What is its area?
    What are some recommended tourist destinations?
    
  3. To end the conversation, enter the following command:

    /bye
    

11.5.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

Corresponding Models for Ollama: https://ollama.com/library/phi3

11.6 Google: Gemma Model

11.6.1 Introduction to Gemma Model

Gemma is a family of open large language models developed by Google DeepMind together with other Google teams, with the goal of advancing responsible AI development.

The Gemma model draws on the same research and technology as the Gemini models. Its architecture includes Rotary Position Embeddings (RoPE), the SentencePiece tokenizer, logit soft-capping, and the GeGLU activation function. Gemma 2 features a deeper network and alternates between local sliding-window attention and global attention to improve performance and efficiency.

  • Model Specifications

Gemma comes in multiple versions, allowing you to choose based on the board’s configuration.

| Gemma | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 2B | ✓ | ✓ |
| 7B | × | ✓ |

11.6.2 Running Gemma

  1. To run Gemma:2B, enter the following command. If the model has not been pulled yet, it will be downloaded first. If the download fails, try again, or reboot and retry.

    ollama run gemma:2b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Example questions:

    Write a 100-word piece on how technology changes life.
    
    What is the capital of the United States?
    
    What is its area?
    What are some recommended tourist destinations?
    A rectangular nursery has an area of 120m². The length is 2m more than the width. What are the length and width of the nursery?
    
  3. To end the conversation, simply enter the following command:

    /bye
    

11.6.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • Gemma

GitHub: https://github.com/google-deepmind/gemma

Corresponding Models for Ollama: https://ollama.com/library/gemma

11.7 DeepSeek Coder Model

11.7.1 Introduction to DeepSeek Coder Model

DeepSeek Coder is a series of open-source code language models developed by DeepSeek. Its successor, DeepSeek Coder V2, was later merged with DeepSeek V2 Chat and upgraded to DeepSeek V2.5, which significantly outperforms the previous versions in both general capabilities and coding proficiency. That newer model has been optimized across various tasks, including writing and instruction following, and aligns more closely with human preferences.

  • Model Specifications

DeepSeek Coder comes in multiple versions, allowing you to choose based on the board’s configuration.

| DeepSeek Coder | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 1.3B | ✓ | ✓ |
| 6.7B | × | ✓ |
| 33B | × | × |

11.7.2 Running DeepSeek Coder

  1. To run DeepSeek Coder:1.3B, enter the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run deepseek-coder:1.3b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Example questions:

    Write a 100-word piece on how technology changes life.
    
    Find the smallest even number in the list [12, 45, 7, 23, 56, 89, 34] using Python.
    
  3. To end the conversation, enter the following command:

    /bye
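
Coding questions can also be sent from a script through the HTTP API of the local Ollama server. The following is a minimal sketch, assuming the server listens on the default port 11434 and that the prompt contains no double quotes or backslashes, since it is inserted into the JSON body as-is. build_request and generate are hypothetical helper names.

```shell
# Build the JSON body for the /api/generate endpoint.
# "stream": false asks for one complete reply instead of a token stream.
build_request() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}

# Send the request to the local server (not called here; needs Ollama running).
generate() {
    curl -s http://127.0.0.1:11434/api/generate -d "$(build_request "$1" "$2")"
}

# Show the request body that would be sent for the example question.
build_request deepseek-coder:1.3b "Find the smallest even number in the list [12, 45, 7, 23, 56, 89, 34] using Python."
```

Calling generate with the same two arguments returns a JSON object whose response field holds the generated text.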
    

11.7.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • DeepSeek Coder

Corresponding Models for Ollama: https://ollama.com/library/deepseek-coder

GitHub: https://github.com/deepseek-ai/DeepSeek-Coder

11.8 Orca Mini Model

11.8.1 Introduction to Orca Mini Model

The Orca Mini model is an open-source large language model (LLM) that runs locally, so you can use advanced language model technology without relying on cloud services. It is built on Llama and Llama 2 and trained on Orca-style datasets, with the aim of providing an efficient and easy-to-use way to run large language models on your own hardware.

  • Model Specifications

Orca Mini comes in multiple versions, allowing you to choose based on the board’s configuration.

| Orca Mini | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 3B | ✓ | ✓ |
| 7B | × | ✓ |
| 13B | × | × |
| 70B | × | × |

11.8.2 Running Orca Mini

  1. To run orca-mini:3b, enter the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run orca-mini:3b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Example questions:

    Write a 100-word piece on how technology changes life.
    
    What is the capital of the United States?
    What is its area?
    What are some recommended tourist destinations?
    
    A rectangular nursery has an area of 120m². The length is 2m more than the width. What are the length and width of the nursery?
    
  3. To end the conversation, enter the following command:

    /bye
    

11.8.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • Orca Mini

Corresponding Models for Ollama: https://ollama.com/library/orca-mini

11.9 StarCoder2 Model

11.9.1 Introduction to StarCoder2 Model

The StarCoder2 model is a series of open-source large language models designed for code-related tasks. It is available in three sizes: 3 billion, 7 billion, and 15 billion parameters. These models are trained on The Stack v2 dataset, which includes over 600 programming languages, and have demonstrated excellent performance across various evaluations.

  • Model Specifications

StarCoder2 comes in multiple versions, allowing you to choose based on the board’s configuration.

| StarCoder2 | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 3B | ✓ | ✓ |
| 7B | × | ✓ |
| 15B | × | × |

11.9.2 Running StarCoder2

StarCoder2 is primarily designed for code generation, editing, and reasoning tasks.

  1. Run the StarCoder2:3b model using the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run starcoder2:3b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    Find the smallest even number in the list [12, 45, 7, 23, 56, 89, 34] using Python.
    
  3. To end the conversation, enter the following command:

    /bye
    

11.9.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • StarCoder2

GitHub: https://github.com/bigcode-project/starcoder2

Corresponding Models for Ollama: https://ollama.com/library/starcoder2

11.10 LLaVA-Phi3 Model

11.10.1 Introduction to LLaVA-Phi3 Model

LLaVA-Phi3 is a LLaVA model fine-tuned from Phi 3 Mini 4k. LLaVA (Large Language and Vision Assistant) is a multimodal model designed to achieve general-purpose vision and language understanding by combining a visual encoder with a large language model.

  • Model Specifications

| LLaVA-Phi3 | Jetson Nano (4GB) | Jetson Orin Nano (8GB) | Jetson Orin NX (8GB) | Jetson Orin NX (16GB) | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- | --- | --- | --- | --- |
| 3.8B | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

11.10.2 Running LLaVA-Phi3

  1. To run LLaVA-Phi3:3.8b, enter the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run llava-phi3:3.8b
    
  2. After entering a question, press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    You can drag an image directly into the terminal; any image of your own will work.

  3. To end the conversation, simply enter the following command:

    /bye

11.10.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • LLaVA-Phi3

GitHub: https://github.com/InternLM/xtuner/tree/main

Corresponding Models for Ollama: https://ollama.com/library/llava-phi3

11.11 Moondream Model

11.11.1 Moondream Model Overview

Moondream is a compact yet powerful vision-language model designed to deliver strong performance across a wide range of environments. It is initialized with SigLIP and Phi-1.5 weights and contains 1.86 billion parameters, enabling efficient operation and impressive adaptability.

  • Model Specifications

| moondream | Raspberry Pi 5 (4GB) | Raspberry Pi 5 (8GB) |
| --- | --- | --- |
| 1.8B | ✓ | ✓ |

11.11.2 Running moondream

The Moondream model is primarily designed for image-based question answering and image description tasks.

  1. Run the moondream:1.8b model using the following command. If the model has not been pulled yet, it will be downloaded first.

    ollama run moondream:1.8b
    
  2. Enter the image path and file name, then press Enter to send it. Response time depends on the hardware configuration, so please be patient!

    You can drag an image directly into the terminal.

    /home/hiwonder/Desktop/3.png
    
  3. To end the conversation, enter the following command:

    /bye
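
An image can also be described in one shot by including its path in the prompt passed to ollama run. The following is a minimal sketch; describe_image is a hypothetical helper name, and the prompt wording is only an example.

```shell
# Describe an image in one shot by putting its path in the prompt.
# describe_image is a hypothetical helper name.
describe_image() {
    image=$1
    # Fail early if the path is wrong, before the model is loaded.
    if [ ! -f "$image" ]; then
        echo "image not found: $image"
        return 1
    fi
    if ! command -v ollama >/dev/null 2>&1; then
        echo "ollama is not installed"
        return 1
    fi
    ollama run moondream:1.8b "Describe this image: $image"
}

# Example, using the path from the steps above:
# describe_image /home/hiwonder/Desktop/3.png
```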
    

11.11.3 References

  • Ollama

Official Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

  • moondream

Corresponding Models for Ollama: https://ollama.com/library/moondream