Python character encoding problem: u'\xb2\xe2\xca\xd4' has type unicode, but the actual encoding is utf-8. How do I read it?

Supplement 1 Dec 7, 2013

The string in the title is wrong; it should be u'\xe5\xbe\xae\xe4\xbf\xae'

13 replies 2016-08-11 14:46:24 +08:00

ushuz

Dec 7, 2013 via iPhone

Convert it to str
str()

yingluck

Dec 7, 2013

@ushuz I converted it, but what I get by slicing with a[3:-1] still can't be read.

Hackathon

Dec 7, 2013

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> a = u'\xb2\xe2\xca\xd4'
>>> b = a.encode('raw_unicode_escape')
>>> print b
测试
>>> c = a.encode('latin1')
>>> print c
测试
>>>
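
For readers on Python 3, where `str` is already Unicode text, the same repair might be sketched as below. This is only a sketch under assumptions: `recover_text` is a hypothetical helper name, and GBK is assumed to be the real encoding of the title string (the console output above suggests it, but the thread never states it).

```python
def recover_text(mojibake, real_encoding):
    """Re-interpret a string whose code points are really raw byte values."""
    # Latin-1 maps code points 0-255 one-to-one onto byte values,
    # so this step gets the original bytes back unchanged.
    raw = mojibake.encode("latin-1")
    # Decode those bytes with the encoding they were actually written in.
    return raw.decode(real_encoding)


# Assumption: the string from the title holds GBK bytes.
title_text = recover_text("\xb2\xe2\xca\xd4", "gbk")

# The corrected string from the supplement holds UTF-8 bytes.
supplement_text = recover_text("\xe5\xbe\xae\xe4\xbf\xae", "utf-8")
```

The helper raises `UnicodeEncodeError` if the input contains any code point above 255, which is a useful signal that the string was not this kind of byte-in-text mix-up in the first place.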

yingluck

Dec 7, 2013

@Hackathon It works, hahaha, got it! Thanks a lot, I'm going to study this method of yours properly!

sandtears

Dec 7, 2013

@Hackathon Thanks, I've always wanted to know this too orz

lnehe

Dec 7, 2013

I've never been able to figure out Python's character encoding problems...

F0ur

Dec 7, 2013

Python's character encoding has always been something I need to study - -

9hills

Dec 7, 2013

This isn't a character encoding problem <_<

VYSE

Dec 7, 2013

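The '\xb2\xe2\xca\xd4' in the title is already encoded bytes. Prefixing it with u and then converting with encode is actually pretty weird, but the fact that latin1 can still encode it suggests Python made some trade-offs based on the OS environment; on a system whose default encoding is English it definitely wouldn't convert.

The one in the supplement is utf-8, so
print '\xe5\xbe\xae\xe4\xbf\xae'.decode('utf-8') is all you need.

When \x appears inside u'', it no longer denotes a byte but is equivalent to \u00XX.
For example, u'\xe5\xbe\xae\xe4\xbf\xae' is actually equal to u'\u00e5\u00be\u00ae\u00e4\u00bf\u00ae'. That denotes position XX in the Unicode character table rather than a byte, so the meaning changes completely.

In any case, bytes showing up inside a unicode string is really weird.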
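
The difference between `\x` in a text literal and `\x` in a byte string can be checked directly. A small sketch in Python 3 spelling, where a plain `"..."` literal plays the role of Python 2's `u'...'` and `b"..."` plays the role of Python 2's `'...'`:

```python
# In a text literal, \xNN names the code point U+00NN, not a byte.
text = "\xe5\xbe\xae\xe4\xbf\xae"
same_text = "\u00e5\u00be\u00ae\u00e4\u00bf\u00ae"
assert text == same_text
assert len(text) == 6  # six separate characters

# In a bytes literal the same escapes are six raw bytes,
# which UTF-8 decodes into two characters.
raw = b"\xe5\xbe\xae\xe4\xbf\xae"
decoded = raw.decode("utf-8")
assert len(decoded) == 2
```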

shenGun

Dec 11, 2013

http://docs.python.org/2/howto/unicode.html
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points 0-255 are identical to the Latin-1 values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can’t be encoded into Latin-1.

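The documentation points out that Unicode code points 0-255 are identical to the Latin-1 values 0-255. So after u'\xb2\xe2\xca\xd4'.encode('Latin-1') the result is '\xb2\xe2\xca\xd4', which seems to still be the original encoded bytes.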
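
The quoted rule can be checked with a few lines, written here in Python 3 spelling as a rough sketch. The Latin-1 codec is a fixed mapping and does not consult the operating system's locale; only what `print` displays afterwards depends on the console.

```python
text = "\xb2\xe2\xca\xd4"

# Each code point below 256 becomes the byte with the same value.
raw = text.encode("latin-1")
assert raw == b"\xb2\xe2\xca\xd4"
assert [ord(ch) for ch in text] == list(raw)

# A code point above 255 has no Latin-1 byte, so encoding fails.
try:
    "\u6d4b".encode("latin-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```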

borneo

Dec 15, 2013

hey man.

by the way, keep it compatible with Python 2+3. http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/

yingluck

Aug 22, 2014

http://hgoldfish.com/blogs/article/56/
Until I found this resource

lzjun

Aug 11, 2016

http://foofish.net/blog/16/understanding-python-charset