I am making requests to an API:
import json
import requests
import yaml

def get_data(text, url='api.com'):
    r = requests.get(url,
                     params={'key': '<My KEY>',
                             'hj': text,
                             'oi': 'm'})
    json_data = json.dumps(r.json())
    data = yaml.load(json_data)
    return data
Then I apply the function as follows, since my data is in a pandas dataframe:
   data
0  The quick fox jumps over the lazy
1  The quick fox over the lazy dog
2  The quick brown fox jumps over the lazy dog
....
n  The brown fox jumps over the dog
Then:
df['col'] = df[['data']].apply(get_data, axis=1)
The data I send and receive through these requests is very large, so how can I make the above requests in blocks, for example 4 x 4?
for chunk in r.iter_content(chunk_size=5):
    json_data = json.dumps(r.json())
    data = yaml.load(json_data)
return data
However, it does not work. Can someone help me make the requests in blocks, or split the data into blocks and then concatenate everything?
Update
I also tried splitting the dataframe into blocks, but it just never finishes:
In:
df.groupby(np.arange(len(df))//10)
for k,g in df.groupby(np.arange(len(df))//10):
    [g.data.apply(get_data) for _, g in df.groupby(np.arange(len(df))//10)]
Out:
----> 7 df = pd.concat(g.data.apply(get_data) for _, g in df2.groupby(np.arange(len(df2))//4))
8 df
/usr/local/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2290 else:
2291 values = self.asobject
-> 2292 mapped = lib.map_infer(values, f, convert=convert_dtype)
2293
2294 if len(mapped) and isinstance(mapped[0], Series):
pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:63307)()
<ipython-input-28-329dbdbb7cdb> in get_data(data)
62
63 r = requests.get('http://api.example.com/api', params=payload, stream = True)
---> 64 json_data = json.dumps(r.json())
65 data = yaml.load(json_data)
66
/usr/local/lib/python3.5/site-packages/requests/models.py in json(self, **kwargs)
848 # used.
849 pass
--> 850 return complexjson.loads(self.text, **kwargs)
851
852 @property
/usr/local/lib/python3.5/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
514 parse_constant is None and object_pairs_hook is None
515 and not use_decimal and not kw):
--> 516 return _default_decoder.decode(s)
517 if cls is None:
518 cls = JSONDecoder
/usr/local/lib/python3.5/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3)
368 if _PY3 and isinstance(s, binary_type):
369 s = s.decode(self.encoding)
--> 370 obj, end = self.raw_decode(s)
371 end = _w(s, end).end()
372 if end != len(s):
/usr/local/lib/python3.5/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3)
398 elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
399 idx += 3
--> 400 return self.scan_once(s, idx=_w(s, idx).end())
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
But I don't understand how to join everything back together after splitting it into blocks.
Best answer
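You can use a list comprehension to store the received results. Here each g has the same structure as the original data frame, but with fewer rows: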
[g.data.apply(get_data) for _, g in df.groupby(np.arange(len(df))//10)]
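To join the per-block results back together, which is the part the question asks about, one option is pd.concat. The following is only a minimal sketch under stated assumptions: fake_get_data is a hypothetical stand-in for get_data that makes no network call, the five-row frame is made up, and the chunk size of 2 is chosen just to keep the example small.

```python
import numpy as np
import pandas as pd


def fake_get_data(text):
    # Hypothetical stand-in for get_data: no network call, it just returns
    # a small dict so the chunking logic can be run offline.
    return {"text": text, "n_words": len(text.split())}


# Made-up sample data shaped like the frame in the question.
df = pd.DataFrame({"data": [
    "The quick fox jumps over the lazy",
    "The quick fox over the lazy dog",
    "The quick brown fox jumps over the lazy dog",
    "The brown fox jumps over the dog",
    "The lazy dog",
]})

chunk_size = 2  # assumed value; the answer above uses 10

# Integer-divide the row positions to label each row with its chunk number.
chunk_ids = np.arange(len(df)) // chunk_size

# One Series of results per chunk; each keeps the index labels of its rows.
pieces = [g["data"].apply(fake_get_data) for _, g in df.groupby(chunk_ids)]

# pd.concat stitches the pieces into one Series, which is then aligned
# to the frame by index when assigned as a new column.
df["col"] = pd.concat(pieces)
```

Because each piece keeps its original index labels, the concatenated Series lines up with the rows of df without any extra bookkeeping.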
Or perhaps what you really want is the response for each text in the data series:
df.data.apply(get_data)
Note that df[["data"]] returns a data frame, so df[["data"]].apply(get_data, axis=1) passes each row to the get_data function as a Series rather than as the text string itself.
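The difference between the two selections can be checked offline. This is a small illustrative sketch, not part of the original answer: the two-row frame and the helper describe are assumptions made up for the demonstration, and describe only reports the type of the argument pandas hands to it.

```python
import pandas as pd

# Made-up two-row frame with the same column name as in the question.
df = pd.DataFrame({"data": ["The quick fox", "The lazy dog"]})


def describe(arg):
    # Hypothetical helper: report the type of whatever pandas passes in.
    return type(arg).__name__


# Double brackets select a one-column DataFrame; with axis=1 the function
# receives each row as a Series.
from_frame = df[["data"]].apply(describe, axis=1)

# Single brackets (or df.data) select a Series; the function receives
# each cell value, here a plain str.
from_series = df["data"].apply(describe)

print(from_frame.tolist())
print(from_series.tolist())
```

A function such as get_data that expects a string for its text parameter is therefore a better fit for the Series form.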
Regarding python - encoding problems when making requests in blocks?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41604246/