Recently hit ValueError: Protocol not known while processing JSON data with Pandas
Switching to the json library fixed it, but I don't understand why.
The JSON is just a data object wrapping contestUpcomingContests, which in turn holds an array with two elements.

    import json
    import pandas as pd

    data = '{"data":{"contestUpcomingContests":[{"containsPremium":false,"title":"\u7b2c 99 \u573a\u53cc\u5468\u8d5b","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/biweekly-contest-99/contest_detail/pc_card.png","titleSlug":"biweekly-contest-99","startTime":1677940200,"duration":5400,"originStartTime":1677940200},{"containsPremium":false,"title":"\u7b2c 334 \u573a\u5468\u8d5b","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/weekly-contest-334/contest_detail/pc_card.png","titleSlug":"weekly-contest-334","startTime":1677378600,"duration":5400,"originStartTime":1677378600}]}}'

    df1 = json.loads(data)
    print(df1)

    df2 = pd.read_json(data)
    print(df2)

    {'data': {'contestUpcomingContests': [{'containsPremium': False, 'title': '第 99 场双周赛', 'cardImg': 'https://assets.leetcode.cn/aliyun-lc-upload/contest-config/biweekly-contest-99/contest_detail/pc_card.png', 'titleSlug': 'biweekly-contest-99', 'startTime': 1677940200, 'duration': 5400, 'originStartTime': 1677940200}, {'containsPremium': False, 'title': '第 334 场周赛', 'cardImg': 'https://assets.leetcode.cn/aliyun-lc-upload/contest-config/weekly-contest-334/contest_detail/pc_card.png', 'titleSlug': 'weekly-contest-334', 'startTime': 1677378600, 'duration': 5400, 'originStartTime': 1677378600}]}}
    Traceback (most recent call last):
      File "/Users/world/Developer/AlgorithmSharkSpider/test.py", line 8, in <module>
        df2 = pd.read_json(data)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/util/_decorators.py", line 199, in wrapper
        return func(*args, **kwargs)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
        return func(*args, **kwargs)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/json/_json.py", line 540, in read_json
        json_reader = JsonReader(
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/json/_json.py", line 622, in __init__
        data = self._get_data_from_filepath(filepath_or_buffer)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/json/_json.py", line 659, in _get_data_from_filepath
        self.handles = get_handle(
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/common.py", line 558, in get_handle
        ioargs = _get_filepath_or_buffer(
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/common.py", line 333, in _get_filepath_or_buffer
        file_obj = fsspec.open(
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 419, in open
        return open_files(
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 272, in open_files
        fs, fs_token, paths = get_fs_token_paths(
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 574, in get_fs_token_paths
        chain = _un_chain(urlpath0, storage_options or {})
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 315, in _un_chain
        cls = get_filesystem_class(protocol)
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/registry.py", line 208, in get_filesystem_class
        raise ValueError("Protocol not known: %s" % protocol)
    ValueError: Protocol not known: {"data":{"contestUpcomingContests":[{"containsPremium":false,"title":"第 99 场双周赛","cardImg":"https

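
One detail in the error message is worth a closer look: the "protocol" it reports is the JSON text up to the first `://`, cut off right after `https`. That suggests (the thread itself does not confirm it) that the string was handled as a URL or path rather than as JSON content. The sketch below only reproduces that split in plain Python; it is not pandas or fsspec code, and both the shortened `data` string and the helper name `text_before_scheme_separator` are hypothetical stand-ins.

```python
# Shortened, hypothetical stand-in for the JSON string in the post.
data = '{"data":{"items":[{"cardImg":"https://example.com/a.png"}]}}'


def text_before_scheme_separator(s: str) -> str:
    """Return everything before the first '://' in s, or '' if there is none."""
    head, sep, _ = s.partition("://")
    return head if sep else ""


# This is the part a URL-style parser would take for the scheme/protocol.
protocol_like = text_before_scheme_separator(data)
print(protocol_like)
```

A JSON string with no URL inside it contains no `://`, so the helper returns an empty string for it, which may be why the error only shows up with data like the `cardImg` links above.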
1 bomb77 2023-02-23 14:41:24 +08:00
No error for me on Python 3.8 with pandas 1.5.3. Try upgrading, or check whether it's an encoding issue or something like that?
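
Since the behaviour apparently differs between installations, it helps to compare versions before anything else. A minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and checking `fsspec` alongside `pandas` is an assumption based on the traceback in the post.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["pandas", "fsspec"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```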
2 dcopen 2023-02-23 15:00:19 +08:00
This problem occurs when calling Pandas' read_json(), which uses the fsspec library for file handling and reading. Starting with fsspec 0.9.0, a new URL parsing mechanism was introduced, and that leads to this error.

When working with JSON data, you can use the json library to convert it into a Python dict, then use Pandas' json_normalize() to flatten it into a DataFrame. Here is an example:

```
import json
import pandas as pd

data = '{"data":{"contestUpcomingContests":[{"containsPremium":false,"title":"第 99 场双周赛","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/biweekly-contest-99/contest_detail/pc_card.png","titleSlug":"biweekly-contest-99","startTime":1677940200,"duration":5400,"originStartTime":1677940200},{"containsPremium":false,"title":"第 334 场周赛","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/weekly-contest-334/contest_detail/pc_card.png","titleSlug":"weekly-contest-334","startTime":1677378600,"duration":5400,"originStartTime":1677378600}]}}'

# Parse the JSON text into a plain Python dict first.
data_dict = json.loads(data)

# Walk data -> contestUpcomingContests and turn each array element into a row.
df = pd.json_normalize(data_dict, record_path=['data', 'contestUpcomingContests'])
print(df)
```

Output:

```
   containsPremium       title                                            cardImg            titleSlug   startTime  duration  originStartTime
0            False  第 99 场双周赛  https://assets.leetcode.cn/aliyun-lc-upload/co...  biweekly-contest-99  1677940200      5400       1677940200
1            False   第 334 场周赛  https://assets.leetcode.cn/aliyun-lc-upload/co...   weekly-contest-334  1677378600      5400       1677378600
```
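
If you would rather stay with read_json(), another option is to hand it a file-like object instead of a bare string, so there is nothing for it to interpret as a path or URL. Recent pandas releases also deprecate passing literal JSON strings and recommend wrapping them in `io.StringIO`. This is a sketch under assumptions: the `data` string is a shortened, hypothetical stand-in with the same nesting as the post's data, and the `example.com` URLs are placeholders.

```python
import io

import pandas as pd

# Shortened, hypothetical stand-in with the same nesting as the post's data.
data = (
    '{"data":{"contestUpcomingContests":['
    '{"titleSlug":"biweekly-contest-99","cardImg":"https://example.com/a.png"},'
    '{"titleSlug":"weekly-contest-334","cardImg":"https://example.com/b.png"}'
    ']}}'
)

# A file-like object is read as content instead of being resolved as a path.
df = pd.read_json(io.StringIO(data))
print(df.shape)
```

With nested input like this, the result should be a frame with a single `data` column whose one cell holds the whole list, so flattening with json_normalize() is still the more useful route for this particular payload.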