Does anyone know what the original Chinese encoding is here, and how to decode it?
String returned in the JSON
…è“èè’è”±¨…‘°#TODS2023§ ” #TodsTheItalianPortrait ‘Walter Chiapponi
Actual content
流畅轮廓、精致细节和自由律动,糅合呈现#TODS2023 秋冬 男士系列。 #TodsTheItalianPortrait 创意总监:Walter Chiapponi
I asked several AIs, and almost all of them suggested decoding it like this:
const iconv = require('iconv-lite');

const garbledText = "…è“èè’è”±¨…‘°#TODS2023§ ”\n#TodsTheItalianPortrait\n\n‘Walter Chiapponi";
// 'binary' is an alias of 'latin1' in Node
const buf = Buffer.from(garbledText, 'binary');
// const decodedText = iconv.decode(buf, 'windows-1252');
// const decodedText = iconv.decode(buf, 'latin1');
// const decodedText = iconv.decode(buf, 'gbk');
const decodedText = iconv.decode(buf, 'utf-8');
console.log(decodedText);
But the actual output looks like this; only a small part of the content is decoded:
流"&轮精! `R!9`R&}#TODS20239 士系 #TodsTheItalianPortrait ::aWalter Chiapponi
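One likely reason the suggested snippet only recovers a few characters: Node's 'binary' encoding is an alias of 'latin1' and keeps only the low 8 bits of each character, so any mojibake character above U+00FF (such as "…" or "•") turns into an unrelated ASCII byte instead of its windows-1252 byte. A minimal sketch using only Node built-ins; the helper name lowByte is made up for illustration:

```javascript
// Hypothetical helper: the single byte Node produces for a character
// when the string is converted with the 'binary' (latin1) encoding.
const lowByte = (ch) => Buffer.from(ch, 'binary')[0];

// "…" is U+2026. In windows-1252 it stands for byte 0x85,
// but 'binary' truncates the code point to its low byte, 0x26 ("&").
console.log(lowByte('\u2026').toString(16)); // 26

// "•" is U+2022 (windows-1252 byte 0x95); it is truncated to 0x22 ('"').
console.log(lowByte('\u2022').toString(16)); // 22
```

That would explain the stray `"&` in the output above: the bytes that were 0x95 0x85 in the original UTF-8 sequence come back as 0x22 0x26.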
{ "BaseResponse": { "Ret": 0, "ErrMsg": "" }, "AddMsgCount": 1, "AddMsgList": [ { "MsgId": "100930987469004064", "FromUserName": "@c4dcd4010dc50e5ee03f32ae786701de", "ToUserName": "filehelper", "MsgType": 1, "Content": "…è“èè’è”±¨…‘°#TODS2023§ ”<br/>#TodsTheItalianPortrait<br/><br/>‘Walter Chiapponi", "Status": 3, "ImgStatus": 1, "CreateTime": 1740397130, "VoiceLength": 0, "PlayLength": 0, "FileName": "", "FileSize": "", "MediaId": "", "Url": "", "AppMsgType": 0, "StatusNotifyCode": 0, "StatusNotifyUserName": "", "RecommendInfo": { "UserName": "", "NickName": "", "QQNum": 0, "Province": "", "City": "", "Content": "", "Signature": "", "Alias": "", "Scene": 0, "VerifyFlag": 0, "AttrStatus": 0, "Sex": 0, "Ticket": "", "OpCode": 0 }, "ForwardFlag": 0, "AppInfo": { "AppID": "", "Type": 0 }, "HasProductId": 0, "Ticket": "", "ImgHeight": 0, "ImgWidth": 0, "SubMsgType": 0, "NewMsgId": 100930987469004064, "OriContent": "", "EncryFileName": "" } ], "ModContactCount": 0, "ModContactList": [], "DelContactCount": 0, "DelContactList": [], "ModChatRoomMemberCount": 0, "ModChatRoomMemberList": [], "Profile": { "BitFlag": 0, "UserName": { "Buff": "" }, "NickName": { "Buff": "" }, "BindUin": 0, "BindEmail": { "Buff": "" }, "BindMobile": { "Buff": "" }, "Status": 0, "Sex": 0, "PersonalCard": 0, "Alias": "", "HeadImgUpdateFlag": 0, "HeadImgUrl": "", "Signature": "" }, "ContinueFlag": 0, "SyncKey": { "Count": 14, "List": [ { "Key": 1, "Val": 940546031 }, { "Key": 2, "Val": 897439235 }, { "Key": 3, "Val": 940546023 }, { "Key": 11, "Val": 940546048 }, { "Key": 19, "Val": 44482 }, { "Key": 23, "Val": 1740396794 }, { "Key": 24, "Val": 1740397130 }, { "Key": 25, "Val": 897439235 }, { "Key": 27, "Val": 308443 }, { "Key": 201, "Val": 1740397130 }, { "Key": 203, "Val": 1740396590 }, { "Key": 206, "Val": 101 }, { "Key": 1000, "Val": 1740395520 }, { "Key": 1001, "Val": 1740395522 } ] }, "SKey": "", "SyncCheckKey": { "Count": 14, "List": [ { "Key": 1, "Val": 940546031 }, { "Key": 2, "Val": 
897439235 }, { "Key": 3, "Val": 940546023 }, { "Key": 11, "Val": 940546048 }, { "Key": 19, "Val": 44482 }, { "Key": 23, "Val": 1740396794 }, { "Key": 24, "Val": 1740397130 }, { "Key": 25, "Val": 897439235 }, { "Key": 27, "Val": 308443 }, { "Key": 201, "Val": 1740397130 }, { "Key": 203, "Val": 1740396590 }, { "Key": 206, "Val": 101 }, { "Key": 1000, "Val": 1740395520 }, { "Key": 1001, "Val": 1740395522 } ] } }
windows-1252 test results
> iconv.decode(iconv.encode('…è“èè’è”±¨…‘°#TODS2023§ ”#TodsTheItalianPortrait‘Walter Chiapponi', 'windows-1252'), 'utf-8')
'畅轮廓精致细节和自由律动,糅呈现#TODS2023秋冬 男士系列。#TodsTheItalianPortrait创总监:Walter Chiapponi'
> iconv.decode(iconv.encode('流畅轮廓、精致细节和自由律动,糅合呈现#TODS2023秋冬 男士系列。#TodsTheItalianPortrait 创意总监:Walter Chiapponi', 'utf-8'), 'windows-1252')
'…è“èè’è”±¨…‘°#TODS2023§ ”#TodsTheItalianPortrait ‘Walter Chiapponi'
> iconv.decode(iconv.encode(iconv.decode(iconv.encode('流畅轮廓、精致细节和自由律动,糅合呈现#TODS2023秋冬 男士系列。#TodsTheItalianPortrait 创意总监:Walter Chiapponi', 'utf-8'), 'windows-1252'), 'windows-1252'), 'utf-8')
'畅轮廓精致细节和自由律动,糅呈现#TODS2023秋冬 男士系列。#TodsTheItalianPortrait 创总监:Walter Chiapponi'
Take the following fragment:
“‰‰”±¤è…è§è°‰
Encoding it as windows-1252 and then decoding it as utf-8 gives:
iconv.decode(iconv.encode('“‰‰”±¤è…è§è°‰', 'windows-1252'), 'utf-8')
当?微信版本?支?展示该内容,请?级至最新版本。
Searching the web shows that the correct text is:
当前微信版本不支持展示该内容,请升级至最新版本。
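Some characters cannot be decoded
A closer look at “当” and “前” shows:
当 => [0xe5, 0xbd, 0x93] => å½“
前 => [0xe5, 0x89, 0x8d] => å‰ (0x8d has no character assigned in windows-1252, so the byte is lost when the text is encoded back).
In windows-1252, bytes 81, 8D, 8F, 90 and 9D are unassigned ( https://zh.wikipedia.org/wiki/Windows-1252 ).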
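To see which characters of a given text are affected, one can check whether their UTF-8 bytes contain any of those five values. The following is a small sketch with Node built-ins only; the function name charsHittingUndefinedBytes is hypothetical:

```javascript
// The five byte values that windows-1252 leaves unassigned.
const UNDEFINED_IN_CP1252 = new Set([0x81, 0x8d, 0x8f, 0x90, 0x9d]);

// Hypothetical helper: list every character whose UTF-8 encoding
// contains at least one byte that windows-1252 does not define.
function charsHittingUndefinedBytes(text) {
  const hits = [];
  for (const ch of text) {
    const bytes = [...Buffer.from(ch, 'utf8')];
    if (bytes.some((b) => UNDEFINED_IN_CP1252.has(b))) {
      const hex = bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');
      hits.push(`${ch} => ${hex}`);
    }
  }
  return hits;
}

console.log(charsHittingUndefinedBytes('当前'));
// [ '前 => e5 89 8d' ]
```

Only 前 is reported, which matches the byte analysis above: 当 uses 0x93, which windows-1252 defines, while 前 needs 0x8d.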
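Inspecting the raw network packet shows that the string contains some invisible characters, for example: \x8D \x81.
iconv.encode('', 'windows-1252')
After this call, the bytes at the corresponding positions have to be replaced with 81, 8D, 8F, 90 or 9D.
1 ntedshen 228 days ago
现(utf8)=e78eb0=°(latin1)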
2 chenliang0571 OP
@ntedshen That doesn't seem right?
> iconv.encode('现', 'utf-8')
<Buffer e7 8e b0>
> iconv.encode('°', 'latin1')
<Buffer e7 3f b0>
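3 ntedshen 228 days ago
@chenliang0571 https://cs.stanford.edu/people/miles/iso8859.html
3f is the question mark.
You don't really need to worry about that, though. All you need to know right now is that the encoding is wrong; no API would ever hand you a Latin character set and leave you to sort out the Chinese yourself...
Check whether the content-type is missing utf8.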
4 chenliang0571 OP
@ntedshen
request: content-type: application/json;charset=
response: content-type: text/plain
---
I found the cause: in windows-1252, bytes 81, 8D, 8F, 90 and 9D are unassigned ( https://zh.wikipedia.org/wiki/Windows-1252 ), so when the Chinese text below passes through windows-1252 and is then decoded as utf-8 again, some of the characters come out wrong.
iconv.decode(iconv.encode(iconv.decode(iconv.encode('流畅轮廓、精致细节和自由律动,糅合呈现#TODS2023 秋冬 男士系列。#TodsTheItalianPortrait 创意总监:Walter Chiapponi', 'utf-8'), 'windows-1252'), 'windows-1252'), 'utf-8')
畅轮廓精致细节和自由律动,糅呈现#TODS2023 秋冬 男士系列。#TodsTheItalianPortrait 创总监:Walter Chiapponi
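Building on that conclusion, here is a minimal sketch of a byte-level repair that avoids the lossy step. It assumes the garbled string still carries the raw control characters (U+0081, U+008D, U+008F, U+0090, U+009D) in place of the five unassigned bytes, as the packet capture above suggests. The names CP1252_HIGH, fixMojibake and toMojibake are made up for illustration, and only Node built-ins are used rather than iconv-lite:

```javascript
// Code points that windows-1252 assigns to bytes 0x80..0x9F.
// null marks the five unassigned bytes (81, 8D, 8F, 90, 9D).
const CP1252_HIGH = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

// Reverse lookup: code point -> windows-1252 byte.
const CHAR_TO_BYTE = new Map();
CP1252_HIGH.forEach((cp, i) => {
  if (cp !== null) CHAR_TO_BYTE.set(cp, 0x80 + i);
});

// Turn mojibake back into bytes, then decode those bytes as UTF-8.
function fixMojibake(garbled) {
  const bytes = [];
  for (const ch of garbled) {
    const cp = ch.codePointAt(0);
    if (CHAR_TO_BYTE.has(cp)) {
      bytes.push(CHAR_TO_BYTE.get(cp)); // e.g. "…" -> 0x85
    } else if (cp <= 0xff) {
      bytes.push(cp); // ASCII, Latin-1, and raw controls such as U+008D
    } else {
      throw new Error(`not a windows-1252 character: U+${cp.toString(16)}`);
    }
  }
  return Buffer.from(bytes).toString('utf8');
}

// Simulate the damage for testing: read UTF-8 bytes as windows-1252,
// passing the unassigned bytes through as control characters.
function toMojibake(text) {
  return [...Buffer.from(text, 'utf8')]
    .map((b) => {
      const cp = b >= 0x80 && b <= 0x9f ? (CP1252_HIGH[b - 0x80] ?? b) : b;
      return String.fromCodePoint(cp);
    })
    .join('');
}

const original = '当前微信版本不支持展示该内容,请升级至最新版本。';
console.log(fixMojibake(toMojibake(original)) === original); // true
```

If the control characters were already stripped or replaced before the string reached your code, this sketch cannot recover them; in that case the lasting fix is on the request/response side (the charset issue raised in reply 3).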