@KaiWuBOSS Sure enough, 4B doesn't work either...
PS C:\Windows\system32> irm https://raw.githubusercontent.com/val1813/kaiwu/main/install.ps1 | iex
Kaiwu Installer
===============
Detected: windows/amd64
Fetching latest release...
Latest version: v0.1.6
Downloading https://github.com/val1813/kaiwu/releases/download/v0.1.6/kaiwu-windows-amd64.zip...
Kaiwu installed successfully!
Kaiwu v0.1.6
Get started:
kaiwu run Qwen3-30B-A3B
Note: restart your terminal for PATH changes to take effect.
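For anyone who'd rather not pipe the remote script straight into iex, the release asset shown above can also be fetched and unpacked by hand. A minimal sketch using standard PowerShell cmdlets; the destination folder is my assumption, not necessarily where install.ps1 actually puts it:

# Manual fallback install; -DestinationPath is an assumed location
Invoke-WebRequest https://github.com/val1813/kaiwu/releases/download/v0.1.6/kaiwu-windows-amd64.zip -OutFile kaiwu.zip
Expand-Archive kaiwu.zip -DestinationPath "$env:LOCALAPPDATA\kaiwu"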
PS C:\Windows\system32> kaiwu run Qwen3-4B
Local LLM deployer vv0.1.6 llama.cpp b8864
by llmbbs.ai, a local AI tech community
[1/6] Probing hardware...
GPU: NVIDIA GeForce GTX 1660 Ti (SM75, 6144 MB VRAM, 288 GB/s)
RAM: 31 GB DDR4
OS: windows amd64
[2/6] Selecting configuration...
Model: Qwen3-4B (dense, 4B)
Quant: q5-k-m (2.8 GB)
Mode: full_gpu
Accel: Flash Attention
[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-4B-Q5_K_M.gguf [cached]
[4/6] Preflight check...
llama-server does not support iso3, falling back to q8_0/q4_0
VRAM sufficient
[5/6] Warmup benchmark...
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters
[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM, retrying with context reduced to 4K...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive launch failures; cannot run even at the minimum context (4K)
NVIDIA GeForce GTX 1660 Ti: 6144 MB VRAM
Model Qwen3-4B: ~2867 MB
KV cache (4K, q4_0): ~112 MB
Estimated total required: ~4003 MB
Suggestions:
1. Run kaiwu run qwen3-4b --reset to re-probe parameters
2. If a small model still OOMs, it may be a parameter configuration issue; please upgrade to the latest version
Usage:
kaiwu run <model> [flags]
Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe optimal parameters
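Before the retry, note the arithmetic in that error: model (~2867 MB) plus KV cache (~112 MB) is only ~2979 MB, so the ~4003 MB estimate implies Kaiwu budgets roughly 1 GB of runtime overhead (presumably CUDA context and compute buffers), and even that fits well within 6144 MB. A likely suspect is VRAM already held by the Windows desktop and other processes, which can be checked with a standard driver query:

# Standard nvidia-smi query (ships with the NVIDIA driver): how much VRAM is actually free?
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv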
PS C:\Windows\system32> kaiwu run qwen3-4b --reset
Local LLM deployer vv0.1.6 llama.cpp b8864
by llmbbs.ai, a local AI tech community
[1/6] Probing hardware...
GPU: NVIDIA GeForce GTX 1660 Ti (SM75, 6144 MB VRAM, 288 GB/s)
RAM: 31 GB DDR4
OS: windows amd64
[2/6] Selecting configuration...
Model: Qwen3-4B (dense, 4B)
Quant: q5-k-m (2.8 GB)
Mode: full_gpu
Accel: Flash Attention
[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-4B-Q5_K_M.gguf [cached]
[4/6] Preflight check...
llama-server does not support iso3, falling back to q8_0/q4_0
VRAM sufficient
[5/6] Warmup benchmark...
Cache cleared, re-probing
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters
[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM, retrying with context reduced to 4K...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive launch failures; cannot run even at the minimum context (4K)
NVIDIA GeForce GTX 1660 Ti: 6144 MB VRAM
Model Qwen3-4B: ~2867 MB
KV cache (4K, q4_0): ~112 MB
Estimated total required: ~4003 MB
Suggestions:
1. Run kaiwu run qwen3-4b --reset to re-probe parameters
2. If a small model still OOMs, it may be a parameter configuration issue; please upgrade to the latest version
Usage:
kaiwu run <model> [flags]
Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe optimal parameters
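Since --reset reproduces the failure exactly, a workaround worth trying before upgrading is to bypass the auto-probe and pin the context manually via the documented flags. A hedged sketch; the 2048 value and the custom binary path are guesses, not tested settings:

# Force a 2K context instead of the auto-probe (2048 is a guessed value)
kaiwu run Qwen3-4B --ctx-size 2048

# Or point Kaiwu at a separately built llama-server (path is hypothetical)
kaiwu run Qwen3-4B --llama-server "C:\llama.cpp\llama-server.exe"

Given that the 4K KV cache is only ~112 MB, if a 2K context still OOMs the problem is almost certainly not context size but how the binary or offload parameters are chosen, which matches suggestion 2 in the error output.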