To test the card, we use Ollama as the runner. First, install it:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
During installation, the script automatically downloads the ROCm bundle; no extra steps are needed.
```
aliang@ubuntu:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for aliang:
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> Downloading Linux ROCm amd64 bundle
######################################################################## 100.0%
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
>>> AMD GPU ready.
```
If you see output like the above, the installation succeeded.
Run the 32B version directly, adding the `--verbose` flag so we can check the speed statistics once inference finishes:
```shell
ollama run qwen3:32b --verbose
```
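The interactive CLI is not the only way in: the installer log notes that the API is listening on 127.0.0.1:11434. A minimal sketch of calling the `/api/generate` endpoint over HTTP — the prompt text here is just an illustrative example:

```shell
# Build a non-streaming request payload for Ollama's /api/generate endpoint.
# The prompt is only an example; swap in your own.
payload='{"model": "qwen3:32b", "prompt": "Why is the sky blue?", "stream": false}'
printf '%s\n' "$payload"

# To actually send it (requires the ollama service to be running):
# curl -s http://127.0.0.1:11434/api/generate -d "$payload"
```

With `"stream": false`, the response arrives as a single JSON object instead of line-by-line chunks, which is easier to pipe into scripts.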
The model downloads quickly ✈️:
```
pulling 3291abe70f16:   8% ▕██                ▏ 1.5 GB/ 20 GB   48 MB/s   6m27s
```
Toward the end of the download, the speed can drop to a few hundred KB/s; if that happens, press Ctrl+C and re-run the command, and the speed picks back up. This is a Q4_K_M quantized build, so the model is only 20 GB.
Ask a classic reasoning question, and monitor the GPU in real time with `watch -n 1 rocm-smi`:
```
Every 1.0s: rocm-smi                                      ubuntu: Fri May  9 02:13:32 2025

============================ ROCm System Management Interface ============================
====================================== Concise Info ======================================
Device  Node  IDs            Temp    Power     Partitions          SCLK     MCLK     Fan     Perf  PwrCap  VRAM%  GPU%
              (DID, GUID)    (Edge)  (Socket)  (Mem, Compute, ID)
==========================================================================================
0       1     0x66a1, 58238  60.0°C  225.0W    N/A, N/A, 0         1606Mhz  1000Mhz  26.27%  auto  225.0W  68%   100%
==========================================================================================
================================ End of ROCm SMI Log =====================================
```
📢 Note: the MI50 is passively cooled only, so you need to fit an active cooler; otherwise it will throttle once it goes above 80 °C.
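Since throttling kicks in above 80 °C, it is worth keeping an eye on the edge temperature during long runs. A small helper sketch — the line format is an assumption based on typical `rocm-smi --showtemp` output, so adjust the parsing if your ROCm version prints differently:

```shell
# Classify one temperature line from `rocm-smi --showtemp`, which prints
# lines like "Temperature (Sensor edge) (C): 60.0".
# (Format assumed from the rocm-smi output shown above.)
check_temp() {
    printf '%s\n' "$1" | awk -F': ' '{
        if ($2 + 0 >= 80) print "THROTTLE RISK"; else print "OK"
    }'
}

check_temp "Temperature (Sensor edge) (C): 60.0"   # -> OK

# Live usage on the actual card (requires rocm-smi):
# watch -n 1 rocm-smi --showtemp
```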
qwen3:32b generation speed 🔥:
```
total duration:       46.602811531s
load duration:        53.024589ms
prompt eval count:    13 token(s)
prompt eval duration: 183.630185ms
prompt eval rate:     70.79 tokens/s
eval count:           691 token(s)
eval duration:        46.364125503s
eval rate:            14.90 tokens/s
```
deepseek-r1:32b generation speed 🔥:
```
total duration:       13.382107816s
load duration:        28.993585ms
prompt eval count:    8 token(s)
prompt eval duration: 13.85964ms
prompt eval rate:     577.22 tokens/s
eval count:           228 token(s)
eval duration:        13.33844511s
eval rate:            17.09 tokens/s
```
qwen3:8b speed:
```
total duration:       9.995454969s
load duration:        34.46557ms
prompt eval count:    13 token(s)
prompt eval duration: 13.933845ms
prompt eval rate:     932.98 tokens/s
eval count:           536 token(s)
eval duration:        9.945825851s
eval rate:            53.89 tokens/s
```
For comparison, here is qwen:32B on an Apple M2 Max:
```
total duration:       54.651088333s
load duration:        32.29725ms
prompt eval count:    13 token(s)
prompt eval duration: 375.358583ms
prompt eval rate:     34.63 tokens/s
eval count:           685 token(s)
eval duration:        54.242827667s
eval rate:            12.63 tokens/s
```
Apart from qwen3:8b, the models above are all 32B-parameter models. All runs were warm starts, and throughput varies with output length, temperature, and power draw, so treat these numbers as a rough reference only.
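As a sanity check, the `eval rate` Ollama reports is simply `eval count / eval duration`; recomputing it from the numbers above reproduces the reported figures:

```shell
# Recompute tokens/s from eval count and eval duration (in seconds),
# matching the "eval rate" lines Ollama prints with --verbose.
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

rate 691 46.364125503   # qwen3:32b        -> 14.90
rate 228 13.33844511    # deepseek-r1:32b  -> 17.09
rate 536 9.945825851    # qwen3:8b         -> 53.89
rate 685 54.242827667   # M2 Max qwen:32B  -> 12.63
```

Note that `prompt eval rate` and `eval rate` are separate numbers: prefill is massively parallel, while decoding is one token at a time, which is why the prompt rates above are 5–60× higher.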
This is an original article. When reposting, please credit: reposted from 贝壳博客.