Running LLaMA on Apple M1

    charslee013 · 2023-03-13 11:27:28 +08:00 · 2887 views
    This topic was created 948 days ago; the information in it may have changed since then.



    TL;DR

    #!/usr/bin/env bash
    # clone repo and install dependencies
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make
    python -m pip install torch numpy sentencepiece
    # download 7B model
    mkdir -p models/7B/
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/resolve/main/consolidated.00.pth
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/raw/main/params.json
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/raw/main/checklist.chk
    wget -P models/ https://huggingface.co/nyanko7/LLaMA-7B/resolve/main/tokenizer.model
    # convert the model to ggml FP16 format
    python convert-pth-to-ggml.py models/7B/ 1
    # quantize the model to 4 bits
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
    # enjoy
    ./main -m ./models/7B/ggml-model-q4_0.bin \
        -t 8 \
        -n 128 \
        -p 'I Have a Dream'

    Install dependencies


    Choosing a model

    The currently known models are:

    • 7B: 1 model file, 13GB on disk, ~30GB total after conversion
    • 13B: 2 model files, 25GB on disk, ~60GB total after conversion
    • 30B: 4 model files, 61GB on disk, ~120GB total after conversion
    • 65B: 8 model files, 122GB on disk, ~240GB total after conversion

    After 4-bit quantization each part file takes roughly 4GB of RAM (the run log below reports ~3880MB per 13B part), so pick a size that fits your machine's memory.
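
    As a rough sanity check (a minimal sketch; the per-part figure is taken from the quantized sizes in the run log below), you can compare your physical RAM against the quantized model size on macOS:

    # print physical RAM in GB (macOS)
    # quantized parts are ~3.9GB each: 7B = 1 part, 13B = 2, 30B = 4, 65B = 8
    sysctl -n hw.memsize | awk '{printf "Physical RAM: %.1f GB\n", $1/1024/1024/1024}'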

    Download links

    Meta has not published hashes for the model files, so judge for yourself whether you want to run them. The leaked sources known so far:

    Someone "accidentally on purpose" committed a magnet link to the model in a pull request on the official repository:

    magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA 

    A repository found via the new Bing; it appears to download through the author's own API endpoint:

    curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash 

    Or via this magnet link:

    magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce 

    So far only the 7B and 65B models have been found here:

    https://huggingface.co/nyanko7/LLaMA-7B/tree/main

    https://huggingface.co/datasets/nyanko7/LLaMA-65B/tree/main

    Software / hardware requirements

    My machine is an Apple M1 (8-core) with 16GB of RAM.

    The OS version is macOS 12.5.1.

    The clang version is as follows:

    c++ -v
    Apple clang version 14.0.0 (clang-1400.0.29.102)
    Target: arm64-apple-darwin21.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

    Python

    This setup uses Python 3.10.

    If you don't have a matching Python version, create a virtual environment with pipenv or conda:

    pipenv shell --python 3.10 

    or

    conda create -n llama python=3.10
    conda activate llama

    Install dependencies

    pip install torch numpy sentencepiece 
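
    To quickly confirm the three packages installed correctly (a sanity check, not part of the original steps):

    python -c 'import torch, numpy, sentencepiece; print("dependencies OK")'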

    Running the model

    Clone the project

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp

    Build the main and quantize binaries:

    make 

    Make sure the model files have been downloaded into the corresponding folder.

    The 7B model is used as an example below:

    ls ./models
    7B  tokenizer.model
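
    Given the wget commands in the TL;DR script, the 7B folder itself should contain the weights plus metadata:

    ls ./models/7B
    checklist.chk  consolidated.00.pth  params.json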

    Convert the model to ggml FP16 format:

    python convert-pth-to-ggml.py models/7B/ 1 

    This step produces a 13GB models/7B/ggml-model-f16.bin file.
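
    A quick way to verify the conversion succeeded before quantizing (the ~13GB figure is from the step above):

    ls -lh models/7B/ggml-model-f16.bin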

    Next, quantize the model to 4-bit:

    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2 

    If your model ships as multiple part files, each part must be quantized separately (see the loop sketch after the example below).

    For example, the two part files of the 13B model:

    ./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
    ./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
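
    A minimal loop sketch that generalizes this to any number of parts, assuming the f16 parts follow the naming above (ggml-model-f16.bin, ggml-model-f16.bin.1, ...):

    for f in ./models/13B/ggml-model-f16.bin*; do
        # suffix is "" for the first part, ".1", ".2", ... for the rest
        suffix="${f#./models/13B/ggml-model-f16.bin}"
        ./quantize "$f" "./models/13B/ggml-model-q4_0.bin${suffix}" 2
    done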

    Time to enjoy the AI

    I used the 13B model here. -t is the number of threads, -n is the number of tokens to generate, and -p is your prompt:

    ./main -m models/13B/ggml-model-q4_0.bin -t 8 -n 409600 -p 'I Have a Dream'
    main: seed = 1678677633
    llama_model_load: loading model from 'models/13B/ggml-model-q4_0.bin' - please wait ...
    llama_model_load: n_vocab = 32000
    llama_model_load: n_ctx   = 512
    llama_model_load: n_embd  = 5120
    llama_model_load: n_mult  = 256
    llama_model_load: n_head  = 40
    llama_model_load: n_layer = 40
    llama_model_load: n_rot   = 128
    llama_model_load: f16     = 2
    llama_model_load: n_ff    = 13824
    llama_model_load: n_parts = 2
    llama_model_load: ggml ctx size = 8559.49 MB
    llama_model_load: memory_size = 800.00 MB, n_mem = 20480
    llama_model_load: loading model part 1/2 from 'models/13B/ggml-model-q4_0.bin'
    llama_model_load: ............................................. done
    llama_model_load: model size = 3880.49 MB / num tensors = 363
    llama_model_load: loading model part 2/2 from 'models/13B/ggml-model-q4_0.bin.1'
    llama_model_load: ............................................. done
    llama_model_load: model size = 3880.49 MB / num tensors = 363

    main: prompt: 'I Have a Dream'
    main: number of tokens in prompt = 5
         1 -> ''
     29902 -> 'I'
      6975 -> ' Have'
       263 -> ' a'
     16814 -> ' Dream'

    sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

    I Have a Dream: A Handbook for Teachers and Students on Martin Luther King, Jr. Culture is always changing and being influenced by the people around us who we can observe. Ways of thinking about culture are more important than which one you believe in because it could be dangerous if your way off believing in something that isn’t true but also that means there will be changes over time so everyone should learn these things when they start school Added: Sun, April 29th 2018 [end of text]

    main: mem per token = 22439492 bytes
    main:     load time =  4974.55 ms
    main:   sample time =   300.81 ms
    main:  predict time = 90728.84 ms / 824.81 ms per token
    main:    total time = 98585.49 ms
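
    The sampling parameters shown in the log (temp, top_k, top_p, repeat_penalty) can also be set from the command line. The flag names below match what llama.cpp accepted at the time of writing, but they may change between versions:

    ./main -m models/13B/ggml-model-q4_0.bin \
        -t 8 -n 256 \
        --temp 0.7 --top_k 40 --top_p 0.9 --repeat_penalty 1.3 \
        -p 'I Have a Dream'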

    References

    Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp

    ggerganov/llama.cpp

    1 reply · 2023-03-21 14:09:30 +08:00

    NealLason · #1 · 2023-03-21 14:09:30 +08:00
    The 7B model's Chinese support is downright braindead...