Baffled by Python encodings

Errors like this one: UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 1: ordinal not in range(128)

I can't get my head around gbk, utf-8, and ascii.
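
The error in the question can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax, assuming the offending bytes were GBK-encoded Chinese text; the thread never shows the real data, so the sample string is hypothetical.

```python
# Hypothetical sample: the original post does not show the real data.
raw = "中文".encode("gbk")  # bytes as they might sit on disk, here GBK

try:
    raw.decode("ascii")  # ASCII only covers byte values 0..127
except UnicodeDecodeError as exc:
    bad_byte = exc.object[exc.start]
    print("cannot decode byte 0x%02x at position %d" % (bad_byte, exc.start))

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode("gbk")
print(text)
```

The message says which byte failed and where, but not which encoding the data really uses; that has to come from knowing the source of the bytes.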

21 replies 2017-12-08 10:21:31 +08:00

cls1991

2017-12-07 17:13:34 +08:00

Post your code.

leavic

2017-12-07 17:17:01 +08:00

Switch to Python 3.

p2pCoder

2017-12-07 17:22:16 +08:00

It's almost 2019, just go straight to Python 3.

regicide

2017-12-07 17:22:47 +08:00

import sys
reload(sys)
sys.setdefaultencoding('utf8')
Have you tried this? If it doesn't work, it's pretty much time to move to Python 3.

marcong95

2017-12-07 17:27:03 +08:00 via Android

@p2pCoder It isn't even 2018 yet. Did you time travel?

p2pCoder

2017-12-07 17:27:40 +08:00

@marcong95 ... I skipped my nap at noon, so I'm a bit out of it this afternoon.

livexia

2017-12-07 17:29:44 +08:00 via Android

Is this a crawler? You have to detect the original encoding first.
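
One simple way to "detect" an encoding without third-party libraries is to try a short list of candidates in order. The sketch below assumes the page is either UTF-8 or GBK; `decode_with_fallback` and the candidate list are hypothetical, not something the thread prescribes.

```python
def decode_with_fallback(data, candidates=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper. Strict encodings should come first: UTF-8 rejects
    most input that is not UTF-8, while GBK accepts many byte sequences
    that are really something else.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


page_gbk = "中文".encode("gbk")
page_utf8 = "中文".encode("utf-8")
print(decode_with_fallback(page_gbk))
print(decode_with_fallback(page_utf8))
```

This is a heuristic: a clean decode does not prove the guess is right, so a declared charset (HTTP header or meta tag) should take priority when one is available.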

Shura

2017-12-07 17:30:09 +08:00

It's the year 7102 already, switch to Python 3.

lhx2008

2017-12-07 17:31:55 +08:00 via Android

Declare the file encoding at the top of the file.
Use decode and encode, and put a u prefix on text literals.
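
A small sketch of those three habits together, using a hypothetical sample string. It is written so that it also runs on Python 3.3+, where the u prefix is accepted but optional.

```python
# -*- coding: utf-8 -*-
# The declaration above tells Python 2 how to read non-ASCII characters in
# this file; Python 3 already assumes UTF-8 source by default.

text = u"中文"  # the u prefix makes this a text string, not a byte string

utf8_bytes = text.encode("utf-8")  # text -> bytes for storage or transport
gbk_bytes = text.encode("gbk")

print(len(text), len(utf8_bytes), len(gbk_bytes))

# bytes -> text must use the same encoding that produced the bytes
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text
```

The same two characters take a different number of bytes in each encoding, which is why the encoding always has to be named when crossing between text and bytes.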

johnsonqrr

2017-12-07 18:02:26 +08:00

Python 3, please.

DongDongXie

2017-12-07 18:08:21 +08:00

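@cls1991 I installed Anaconda 2.7 and configured the environment variables. I just wanted to run pip list, but it threw an error at line 87 of D:\Anaconda2\Lib\ntpath.py, right at result_path = result_path + p_path. After I added "reload(sys)
sys.setdefaultencoding('gbk')" it worked.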

# Join two (or more) paths.
def join(path, *paths):
    reload(sys)
    sys.setdefaultencoding('gbk')
    """Join two or more pathname components, inserting "\\" as needed."""
    result_drive, result_path = splitdrive(path)
    for p in paths:
        p_drive, p_path = splitdrive(p)
        if p_path and p_path[0] in '\\/':
            # Second path is absolute
            if p_drive or not result_drive:
                result_drive = p_drive
            result_path = p_path
            continue
        elif p_drive and p_drive != result_drive:
            if p_drive.lower() != result_drive.lower():
                # Different drives => ignore the first path entirely
                result_drive = p_drive
                result_path = p_path
                continue
            # Same drive in different case
            result_drive = p_drive
        # Second path is relative to the first
        if result_path and result_path[-1] not in '\\/':
            result_path = result_path + '\\'
        result_path = result_path + p_path
    ## add separator between UNC and non-absolute path
    if (result_path and result_path[0] not in '\\/' and
        result_drive and result_drive[-1:] != ':'):
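
One plausible explanation for the failure at `result_path = result_path + p_path`, stated as an assumption because the post includes neither the traceback nor the paths involved: one path component was a unicode object and another was a byte string containing non-ASCII bytes. Python 2 then decodes the byte string with the default encoding, ASCII, before concatenating. The sketch below emulates that rule in Python 3; `py2_concat` and the folder name are hypothetical.

```python
def py2_concat(text, raw, default_encoding="ascii"):
    """Emulate Python 2's `unicode + str`: the byte string is decoded with
    the interpreter's default encoding before concatenation.
    Hypothetical helper, for illustration only."""
    return text + raw.decode(default_encoding)


prefix = u"D:\\Anaconda2\\"      # text
name = u"用户".encode("gbk")      # hypothetical GBK-encoded folder name

try:
    py2_concat(prefix, name)      # default 'ascii' -> UnicodeDecodeError
    failed = False
except UnicodeDecodeError:
    failed = True

# Roughly what sys.setdefaultencoding('gbk') changes:
joined = py2_concat(prefix, name, default_encoding="gbk")
print(failed, joined)
```

If that is what happened, switching the process-wide default to GBK only hides the mismatch: it keeps working while every byte string happens to be GBK and fails again on data in any other encoding. Decoding explicitly where the bytes enter the program is usually the safer fix.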

DongDongXie

2017-12-07 18:08:52 +08:00

I feel beginners fall into the trap of mismatched encodings very easily.

ltux

2017-12-07 18:38:57 +08:00

If you're confused, go study.

wolong

2017-12-07 19:57:08 +08:00

I've run into this too when running a .py file from the command line on Windows.
Running the file by double-clicking it instead worked fine.

maidou931019

2017-12-07 20:06:00 +08:00

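In Python 2, str holds bytes and unicode holds Unicode text (code points, not encoded bytes).
In Python 3, str holds Unicode text and bytes holds bytes.

Python 2 blurs the line between bytes and unicode: u'hello' + 'hi' raises no error and yields a unicode object, because the byte string is implicitly decoded as ASCII first. With non-ASCII bytes, that implicit decode is what raises UnicodeDecodeError.
Python 3 strictly separates unicode from bytes, characters from byte values. Mixing them fails immediately: 'hello' + b'hi' raises TypeError.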
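
The Python 3 half of that comparison can be checked directly. A minimal sketch:

```python
text = "hello"   # str: Unicode text in Python 3
raw = b"hi"      # bytes: raw byte values

try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False   # Python 3 refuses to guess an encoding

# Converting explicitly makes the intent visible.
as_text = text + raw.decode("ascii")
as_bytes = text.encode("ascii") + raw
print(mixed_ok, as_text, as_bytes)
```

The early TypeError is the point: the mismatch shows up at the line that mixes the types, not later when some non-ASCII input happens to arrive.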

justou

2017-12-07 20:30:28 +08:00

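Don't limit the encoding question to py2 versus py3. Learn systematically how strings are represented in a computer and how encodings work; once the principles are clear, apply them to a specific language in a specific environment to deepen your understanding. Otherwise, even if you get used to how Python handles encodings, you'll be confused again in a different environment. Without understanding the principles, any fix only treats the symptoms, not the cause.
Some references on the principles:
Computer Systems: A Programmer's Perspective, Chapter 2, Representing and Manipulating Information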
http://unicodebook.readthedocs.io/
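
As a small taste of the principle those references cover, the sketch below separates a character's abstract code point from two of its concrete byte representations. The sample character is arbitrary.

```python
ch = "中"

code_point = ord(ch)                 # the abstract Unicode number
utf8_hex = ch.encode("utf-8").hex()  # one concrete byte representation
gbk_hex = ch.encode("gbk").hex()     # another one, for the same character

print("U+%04X" % code_point, utf8_hex, gbk_hex)
```

One character, one code point, several possible byte sequences: an encoding error means the bytes were read with a different mapping than the one used to write them.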