
Help: calling the chatGLM2_6b model from Python 3.11

  •   WMutong · 2024-07-19 15:42:16 +08:00 · 1316 clicks
    This topic was created 514 days ago; the information in it may have changed since then.

    Help: I'm calling the chatGLM2_6b model from Python 3.11, just trying to run a local ChatGLM2-6B model, but it hangs as soon as I send a message. Could it be that my local hardware isn't enough? Any advice would be appreciated, thanks.

    Hardware: MacBook Pro, M2, 32 GB

    main.py

    import tkinter as tk
    from tkinter import scrolledtext
    from gpt.OpenAI.openAI import get_openai_response
    from gpt.ChatGLM.chatglm_client import get_chatGLM2_6b_response

    def send_message(event=None):
        user_input = input_text.get()
        chat_history.insert(tk.END, f"You: {user_input}\n")
        input_text.set("")
        response = get_chatGLM2_6b_response(user_input)
        chat_history.insert(tk.END, f"Bot: {response}\n")

    # Create the main window
    root = tk.Tk()
    root.title("Chat with OpenAI")

    # Create the chat-history text box
    chat_history = scrolledtext.ScrolledText(root, wrap=tk.WORD)
    chat_history.pack(padx=10, pady=10, fill=tk.BOTH, expand=True)

    # Create the input box and send button
    input_text = tk.StringVar()
    entry_box = tk.Entry(root, textvariable=input_text, width=50)
    entry_box.pack(padx=10, pady=5, side=tk.LEFT, expand=True)
    entry_box.bind('<Return>', send_message)

    send_button = tk.Button(root, text="Send", command=send_message)
    send_button.pack(padx=10, pady=5, side=tk.RIGHT)

    # Run the main loop
    root.mainloop()
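
    One issue independent of the model: `send_message` runs generation synchronously inside a Tkinter event handler, so the window itself freezes for as long as `generate` takes, which can look like a hang even when the model is merely slow. A minimal sketch of moving the call onto a background thread (`ask_model` is a hypothetical helper name; Tkinter widgets are not thread-safe, hence the `root.after` hand-off):

        import threading

        def send_message(event=None):
            user_input = input_text.get()
            chat_history.insert(tk.END, f"You: {user_input}\n")
            input_text.set("")
            # Run generation off the Tk main loop so the UI stays responsive
            threading.Thread(target=ask_model, args=(user_input,), daemon=True).start()

        def ask_model(user_input):
            response = get_chatGLM2_6b_response(user_input)
            # Hand the result back to the Tk thread instead of touching widgets here
            root.after(0, lambda: chat_history.insert(tk.END, f"Bot: {response}\n"))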

    chatglm_client.py

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Use MPS if the device supports it, otherwise fall back to CPU
    device = "mps" if torch.backends.mps.is_available() else "cpu"

    # Load the model up front
    print("Device:", device)
    print("Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained("./modles/chatglm-6b", trust_remote_code=True)
    print("Loading model...")
    model = AutoModel.from_pretrained("./modles/chatglm-6b", trust_remote_code=True).half().to(device)
    print("Model loaded.")

    def get_chatGLM2_6b_response(prompt):
        print(prompt)
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        attention_mask = inputs.attention_mask.to(device) if 'attention_mask' in inputs else None
        print("Generating response...")
        outputs = model.generate(inputs.input_ids, attention_mask=attention_mask, max_length=50)
        print("Decoding response...")
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(response)
        return response
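
    Two things worth noting in this client, separate from the hang itself. First, `max_length=50` in `generate` counts the prompt tokens as well, so even a successful call would return a very short or truncated reply. Second, the ChatGLM repos ship a `chat()` helper in their remote code (per the THUDM model card) that applies the model's dialogue template and decodes for you; a sketch of that variant, assuming the helper behaves as documented:

        def get_chatGLM2_6b_response(prompt):
            # chat() applies ChatGLM's dialogue template and decodes the reply;
            # history=[] starts a fresh conversation on every call.
            response, history = model.chat(tokenizer, prompt, history=[])
            return response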

    Output

    After startup, the output reaches the 'Model loaded.' line.
    After I type something, it prints 'Generating response...' and then nothing happens....
    python main.py
    Device: mps
    Loading tokenizer...
    Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
    Loading model...
    Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
    Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
    Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]/Users/mutong/Documents/project/AI_Try/AI_first_try/conda/lib/python3.11/site-packages/transformers/modeling_utils.py:415: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
      return torch.load(checkpoint_file, map_location="cpu")
    Loading checkpoint shards: 100%|██████████| 8/8 [00:10<00:00,  1.26s/it]
    Model loaded.
    Hello
    Generating response...
    The dtype of attention mask (torch.int64) is not bool

    So I'd like to ask whether this is caused by something like insufficient system memory.
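
    One way to narrow this down (a diagnostic sketch, not a fix): the last line of the log is only a warning, and with 32 GB of RAM the fp16 weights (roughly 12 GB) should fit, so a hang at `generate` points more at the MPS backend than at memory. Loading the model in float32 on the CPU and trying a single prompt separates the two cases; if this prints a reply, however slowly, memory is not the problem:

        from transformers import AutoTokenizer, AutoModel

        # Force CPU + float32 to take MPS out of the picture entirely
        tokenizer = AutoTokenizer.from_pretrained("./modles/chatglm-6b", trust_remote_code=True)
        model = AutoModel.from_pretrained("./modles/chatglm-6b", trust_remote_code=True).float().to("cpu")
        inputs = tokenizer("Hello", return_tensors="pt")
        outputs = model.generate(inputs.input_ids, max_length=50)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))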

    5 replies · last reply 2024-07-23 16:01:37 +08:00
t41372 · #1 · 2024-07-19 16:32:17 +08:00 via Android
Why not run a quantized version?
I don't know whether it's a hardware problem, but running a non-quantized LLM generally demands a lot of hardware. Mac users usually run quantized LLMs with llama.cpp or ollama or something similar, which also have Mac-specific optimizations.
Besides, what year is it and you're still running chatGLM2... glm4 is already out...
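
(For a rough sense of the numbers behind this: 6B parameters at fp16 is about 6 × 10⁹ × 2 bytes ≈ 12 GB for the weights alone, while a 4-bit quantization is about 6 × 10⁹ × 0.5 bytes ≈ 3 GB, before activations and KV cache in either case. That gap is why quantized builds are the usual choice on consumer Macs.)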
sunmacarenas · #2 · 2024-07-19 18:30:46 +08:00 via Android
Just run a quantized model with ollama and save yourself the hassle.
WMutong (OP) · #3 · 2024-07-23 14:35:53 +08:00
@t41372 I'm worried that if chatGLM-6b won't even run, glm4 would be even more hopeless....
WMutong (OP) · #4 · 2024-07-23 14:36:28 +08:00
@sunmacarenas I hadn't worked with language models before, so I wanted to start with the basics and get familiar with them.
t41372 · #5 · 2024-07-23 16:01:37 +08:00
@WMutong ...Whether it runs depends on the "6b" / "9b" part, i.e. the parameter count. Large models iterate very fast; it's common for a new small model to beat a big model from six months or a year ago. And your M2 with 32 GB should be able to run quite a lot of models. On my own M1 Pro with 16 GB, models around 12B mostly run smoothly, and glm4 is no problem at all. Go download ollama: one command downloads and runs the latest models.
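
For reference, the "one command" workflow mentioned above looks like this; the exact model tag is an assumption, so check the ollama model library for the current name:

    # Downloads the model on first use, then opens an interactive chat
    ollama run glm4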