Baffled by Python encodings

    DongDongXie 2017-12-07 17:08:04 +08:00 4003 views

    Errors like this one: UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 1: ordinal not in range(128)

    I can't keep gbk, utf-8, and ascii straight.
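
For reference, an error of this kind can be reproduced in a few lines. The sketch below is written for Python 3 and assumes the offending bytes are GBK-encoded Chinese text (the sample string is made up for illustration). Decoding those bytes as ASCII fails in the same way, while naming the actual encoding succeeds.

```python
# Hypothetical sample: GBK bytes, e.g. part of a path on a Chinese Windows system.
raw = "文档".encode("gbk")

try:
    # Python 2 performs this ascii decode implicitly when str and unicode are mixed.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Naming the encoding the bytes were really written in works.
text = raw.decode("gbk")
print(text == "文档")
```

The byte value and position in the message depend on the input, so they will differ from the ones in the post.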

    21 replies    2017-12-08 10:21:31 +08:00
    cls1991
        1
       2017-12-07 17:13:34 +08:00
    Please post your code.
    leavic
        2
       2017-12-07 17:17:01 +08:00
    Switch to python3.
    p2pCoder
        3
       2017-12-07 17:22:16 +08:00
    It's almost 2019, just go straight to python3.
    regicide
        4
       2017-12-07 17:22:47 +08:00
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')
    Have you tried this? If it doesn't work, you might as well hop over to python3.
    marcong95
        5
       2017-12-07 17:27:03 +08:00 via Android   1
    @p2pCoder It isn't even 2018 yet. Did you time-travel?
    p2pCoder
        6
       2017-12-07 17:27:40 +08:00
    @marcong95 ... I skipped my nap at noon, so I'm a bit spaced out this afternoon.
    livexia
        7
       2017-12-07 17:29:44 +08:00 via Android
    Is this a crawler? You have to detect the source encoding first.
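
One simple way to "detect" the source encoding is to try a short list of candidates in order. This is only a sketch: the function name `decode_with_fallback` and the candidate list are assumptions for illustration, and a charset declared by the server or the page itself should take priority when one is available.

```python
def decode_with_fallback(data, candidates=("utf-8", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode anyway, replacing undecodable bytes with U+FFFD.
    return data.decode(candidates[0], errors="replace"), None


# Stand-in for bytes fetched by a crawler from a GBK-encoded page.
page = "编码".encode("gbk")
text, used = decode_with_fallback(page)
print(used)
```

UTF-8 goes first because it is strict and rejects most non-UTF-8 input, whereas the GB family accepts a much wider range of byte sequences and would mask the mismatch.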
    Shura
        8
       2017-12-07 17:30:09 +08:00
    It's the year 7102 already, switch to Python3.
    lhx2008
        9
       2017-12-07 17:31:55 +08:00 via Android
    Declare the file encoding at the top of the source file.
    Use decode and encode, and put a u prefix in front of text literals.
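
The three pieces of advice above fit together roughly as follows. This is a minimal sketch with made-up sample data: the declaration on the first line tells the interpreter how the source file is encoded, the u prefix marks a text literal, and encode/decode convert explicitly at the boundaries.

```python
# -*- coding: utf-8 -*-
# The line above declares the encoding of this source file. Python 2 needs it
# when the file contains non-ASCII literals; Python 3 assumes UTF-8 by default.

greeting = u"你好"                        # text literal (the u prefix matters in Python 2)
gbk_bytes = greeting.encode("gbk")        # text -> bytes, e.g. for a GBK console or file
round_trip = gbk_bytes.decode("gbk")      # bytes -> text, right after reading
utf8_bytes = round_trip.encode("utf-8")   # re-encode for a UTF-8 destination

# Same text, different byte lengths depending on the encoding.
print(len(greeting), len(gbk_bytes), len(utf8_bytes))  # Python 3 prints: 2 4 6
```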
    johnsonqrr
        10
       2017-12-07 18:02:26 +08:00
    PY3, please.
    DongDongXie
        11
    OP
       2017-12-07 18:08:21 +08:00
    @cls1991 I installed anaconda2.7 and set up the environment variables. I only wanted to run pip list, and it errored out at line 87 of D:\Anaconda2\Lib\ntpath.py, right at result_path = result_path + p_path. After I added "reload(sys)
    sys.setdefaultencoding('gbk')" it worked.


    # Join two (or more) paths.
    def join(path, *paths):
        reload(sys)
        sys.setdefaultencoding('gbk')
        """Join two or more pathname components, inserting "\\" as needed."""
        result_drive, result_path = splitdrive(path)
        for p in paths:
            p_drive, p_path = splitdrive(p)
            if p_path and p_path[0] in '\\/':
                # Second path is absolute
                if p_drive or not result_drive:
                    result_drive = p_drive
                result_path = p_path
                continue
            elif p_drive and p_drive != result_drive:
                if p_drive.lower() != result_drive.lower():
                    # Different drives => ignore the first path entirely
                    result_drive = p_drive
                    result_path = p_path
                    continue
                # Same drive in different case
                result_drive = p_drive
            # Second path is relative to the first
            if result_path and result_path[-1] not in '\\/':
                result_path = result_path + '\\'
            result_path = result_path + p_path
        ## add separator between UNC and non-absolute path
        if (result_path and result_path[0] not in '\\/' and
            result_drive and result_drive[-1:] != ':'):
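
A less invasive alternative to editing ntpath.py is to decode byte paths explicitly before joining them. The sketch below is written for Python 3 and rests on an assumption: that the failing operand was a GBK byte string (for example a user directory name) being concatenated with text. The helpers `to_text` and `join_text` and the sample path are hypothetical.

```python
import ntpath


def to_text(value, encoding="gbk"):
    """Decode bytes to text; pass values that are already text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_text(path, *paths, encoding="gbk"):
    """Join path components after converting every component to text."""
    parts = [to_text(p, encoding) for p in paths]
    return ntpath.join(to_text(path, encoding), *parts)


# Hypothetical byte path, as an API might return it on a GBK Windows system.
home = "C:\\Users\\张三".encode("gbk")
joined = join_text(home, "pip", "pip.ini")
print(joined == "C:\\Users\\张三\\pip\\pip.ini")
```

This keeps the conversion at the one place where bytes enter the program, instead of changing the interpreter-wide default encoding.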
    DongDongXie
        12
    OP
       2017-12-07 18:08:52 +08:00
    It feels like beginners fall into the pitfalls of different encodings very easily.
    ltux
        13
       2017-12-07 18:38:57 +08:00
    If you're confused, go study.
    wolong
        14
       2017-12-07 19:57:08 +08:00
    I've hit this too when running a py file from the command line on windows.
    Double-clicking the file to run it instead worked fine.
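    maidou931019
        15
       2017-12-07 20:06:00 +08:00   2
    In python2, str holds bytes and unicode holds Unicode text (code points rather than encoded bytes);
    in python3, str holds Unicode text and bytes holds bytes.

    python2 blurs bytes and unicode: u'hello' + 'hi' does not raise and yields a unicode object, because the byte string is implicitly decoded as ascii (which is what fails with UnicodeDecodeError once the bytes are not ASCII).
    python3 strictly separates unicode from bytes, that is, characters from raw bytes. Mixing them raises immediately: 'hello' + b'hi' cannot be concatenated.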
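
The Python 3 half of this comparison is easy to check. A small sketch with illustrative variable names:

```python
text = "hello"
data = b"hi"

try:
    text + data                  # Python 3 refuses to mix str and bytes
except TypeError as exc:
    error = str(exc)
    print(error)

# The fix is to decode explicitly, then concatenate text with text.
combined = text + data.decode("ascii")
print(combined)
```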
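    justou
        16
       2017-12-07 20:30:28 +08:00
    Don't limit the encoding question to py2 versus py3. Learn systematically how strings are represented in a computer and how encodings work, then practice in a specific language and environment to deepen that understanding. Otherwise, even once you know how python handles encodings, you will be confused again as soon as the environment changes. Without the underlying principles, every fix only treats the symptoms.
    Some references on the principles:
    Computer Systems: A Programmer's Perspective, Chapter 2, Representing and Manipulating Information
    http://unicodebook.readthedocs.io/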
    Mjz127
        17
       2017-12-07 20:39:54 +08:00
    Please choose Python3 :doge
    summerwar
        18
       2017-12-07 20:49:25 +08:00
    I took one look at py2 back then and resolutely chose py3.
    conn4575
        19
       2017-12-07 22:59:06 +08:00 via Android
    Even on py3, though, many libraries still return bytes by default for the sake of py2 compatibility... all I can say is that this is a real pitfall.
    wellsc
        20
       2017-12-07 23:01:56 +08:00 via iPhone
    Even on Python 3.6 I've run into charset errors several times because of the locale.
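
Locale-related surprises on Python 3 usually come from relying on the default encoding, which depends on the operating system and locale settings. A sketch of how to inspect that default and sidestep it by passing `encoding` explicitly (the file name and contents are made up):

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when none is given; varies by OS and locale.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Passing encoding explicitly makes the result independent of the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write("编码")

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content == "编码")
```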
    ethusdt
        21
       2017-12-08 10:21:31 +08:00