To test the card, we use Ollama as the runner. First, install it:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
During installation, the script automatically downloads the ROCm bundle; no extra steps are needed.
```
aliang@ubuntu:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for aliang:
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> Downloading Linux ROCm amd64 bundle
######################################################################## 100.0%
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
>>> AMD GPU ready.
```
If you see output like the above, the installation succeeded.
Run the 32B version directly, adding the `--verbose` flag so we can check the speed statistics once inference finishes:
```shell
ollama run qwen3:32b --verbose
```
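The interactive CLI is not the only way in: the installer log notes that the API is listening on 127.0.0.1:11434. A minimal sketch of calling the `/api/generate` endpoint over HTTP — the prompt text here is just an illustrative example:

```shell
# Build a non-streaming request payload for Ollama's /api/generate endpoint.
# The prompt is only an example; swap in your own.
payload='{"model": "qwen3:32b", "prompt": "Why is the sky blue?", "stream": false}'
printf '%s\n' "$payload"

# To actually send it (requires the ollama service to be running):
# curl -s http://127.0.0.1:11434/api/generate -d "$payload"
```

With `"stream": false`, the response arrives as a single JSON object instead of line-by-line chunks, which is easier to pipe into scripts.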
The model downloads quickly ✈️:
```
pulling 3291abe70f16:   8% ▕██                ▏ 1.5 GB/ 20 GB   48 MB/s   6m27s
```
Toward the end of the download, the speed can drop to a few hundred KB/s; if that happens, press Ctrl+C and re-run the command, and the speed picks back up. This is a Q4_K_M quantized build, so the model is only 20 GB.
Ask a classic reasoning question, and monitor the GPU in real time with `watch -n 1 rocm-smi`:
```
Every 1.0s: rocm-smi                                      ubuntu: Fri May  9 02:13:32 2025

============================ ROCm System Management Interface ============================
====================================== Concise Info ======================================
Device  Node  IDs            Temp    Power     Partitions          SCLK     MCLK     Fan     Perf  PwrCap  VRAM%  GPU%
              (DID, GUID)    (Edge)  (Socket)  (Mem, Compute, ID)
==========================================================================================
0       1     0x66a1, 58238  60.0°C  225.0W    N/A, N/A, 0         1606Mhz  1000Mhz  26.27%  auto  225.0W  68%   100%
==========================================================================================
================================ End of ROCm SMI Log =====================================
```
📢 Note: the MI50 is passively cooled only, so you need to fit an active cooler; otherwise it will throttle once it goes above 80 °C.
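Since throttling kicks in above 80 °C, it is worth keeping an eye on the edge temperature during long runs. A small helper sketch — the line format is an assumption based on typical `rocm-smi --showtemp` output, so adjust the parsing if your ROCm version prints differently:

```shell
# Classify one temperature line from `rocm-smi --showtemp`, which prints
# lines like "Temperature (Sensor edge) (C): 60.0".
# (Format assumed from the rocm-smi output shown above.)
check_temp() {
    printf '%s\n' "$1" | awk -F': ' '{
        if ($2 + 0 >= 80) print "THROTTLE RISK"; else print "OK"
    }'
}

check_temp "Temperature (Sensor edge) (C): 60.0"   # -> OK

# Live usage on the actual card (requires rocm-smi):
# watch -n 1 rocm-smi --showtemp
```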
qwen3:32b generation speed 🔥:
```
total duration:       46.602811531s
load duration:        53.024589ms
prompt eval count:    13 token(s)
prompt eval duration: 183.630185ms
prompt eval rate:     70.79 tokens/s
eval count:           691 token(s)
eval duration:        46.364125503s
eval rate:            14.90 tokens/s
```
deepseek-r1:32b generation speed 🔥:
```
total duration:       13.382107816s
load duration:        28.993585ms
prompt eval count:    8 token(s)
prompt eval duration: 13.85964ms
prompt eval rate:     577.22 tokens/s
eval count:           228 token(s)
eval duration:        13.33844511s
eval rate:            17.09 tokens/s
```
qwen3:8b speed:
```
total duration:       9.995454969s
load duration:        34.46557ms
prompt eval count:    13 token(s)
prompt eval duration: 13.933845ms
prompt eval rate:     932.98 tokens/s
eval count:           536 token(s)
eval duration:        9.945825851s
eval rate:            53.89 tokens/s
```
For comparison, here is qwen:32B on an Apple M2 Max:
```
total duration:       54.651088333s
load duration:        32.29725ms
prompt eval count:    13 token(s)
prompt eval duration: 375.358583ms
prompt eval rate:     34.63 tokens/s
eval count:           685 token(s)
eval duration:        54.242827667s
eval rate:            12.63 tokens/s
```
Apart from qwen3:8b, the models above are all 32B-parameter models. All runs were warm starts, and throughput varies with output length, temperature, and power draw, so treat these numbers as a rough reference only.
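As a sanity check, the `eval rate` Ollama reports is simply `eval count / eval duration`; recomputing it from the numbers above reproduces the reported figures:

```shell
# Recompute tokens/s from eval count and eval duration (in seconds),
# matching the "eval rate" lines Ollama prints with --verbose.
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

rate 691 46.364125503   # qwen3:32b        -> 14.90
rate 228 13.33844511    # deepseek-r1:32b  -> 17.09
rate 536 9.945825851    # qwen3:8b         -> 53.89
rate 685 54.242827667   # M2 Max qwen:32B  -> 12.63
```

Note that `prompt eval rate` and `eval rate` are separate numbers: prefill is massively parallel, while decoding is one token at a time, which is why the prompt rates above are 5–60× higher.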
This is an original article. When reposting, please credit: reposted from 贝壳博客.