I have a SQL table with 10 million rows and many columns; the table is about 44 GB when queried.
But when I try to fetch only 3 columns from this table and save them to a CSV (or load them into a DataFrame), Python runs forever. That is,
pd.read_sql("select a,b,c from table") takes more than an hour and does not return any data.
How can I achieve this? 1. Can I load this entire dataset into a DataFrame at once, and is that a viable option? Afterwards I need to perform some data manipulations on these rows. 2. OR should I download it to a CSV and read the data into memory part by part?
If it is 2, how do I code for 2?
The code tried for 2 so far is:
def iter_row(cursor, size=10):
    # Yield rows in batches of `size` so the full result set never sits in memory.
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        for row in rows:
            yield row

def query_with_fetchmany(connection):
    cursor = connection.cursor()  # the original snippet referenced an undefined `cursor`
    cursor.execute("SELECT * FROM books")
    for row in iter_row(cursor, 10):
        print(row)
    cursor.close()
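The generator above can be completed into a streaming CSV export. A minimal sketch, assuming any DB-API 2.0 connection; the `export_to_csv` helper name, the batch size, and the column handling are illustrative, not from the original post:

```python
import csv

def iter_rows(cursor, size=1000):
    # Yield rows in batches of `size`; memory use is bounded by one batch.
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield from rows

def export_to_csv(connection, query, out_path, batch_size=1000):
    # Stream the query result straight into a CSV file, one batch at a time.
    cursor = connection.cursor()
    cursor.execute(query)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # cursor.description holds the column names for the header row
        writer.writerow(col[0] for col in cursor.description)
        for row in iter_rows(cursor, batch_size):
            writer.writerow(row)
    cursor.close()
```

This should work with any DB-API driver (pymysql, psycopg2, sqlite3, ...), since `fetchmany` and `description` are part of the DB-API cursor interface; only one batch of rows is held in memory at a time.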
Best answer
You can read the data in chunks:
# Write the header only with the first chunk; otherwise every appended chunk repeats it.
for i, c in enumerate(pd.read_sql("select a,b,c from table", con=connection, chunksize=10**5)):
    c.to_csv(r'/path/to/file.csv', index=False, mode='a', header=(i == 0))
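A runnable version of this chunked export can be sketched as follows; the `dump_query_to_csv` name is an illustrative stand-in, and any SQLAlchemy engine or DB-API connection that `pd.read_sql` accepts would do in place of the asker's unnamed connection:

```python
import pandas as pd

def dump_query_to_csv(connection, query, out_path, chunksize=100_000):
    # Stream the result set in chunks; write the header once, then append.
    for i, chunk in enumerate(pd.read_sql(query, con=connection, chunksize=chunksize)):
        chunk.to_csv(out_path, index=False,
                     mode="w" if i == 0 else "a",  # truncate on the first chunk
                     header=(i == 0))
```

With `chunksize` set, `pd.read_sql` returns an iterator of DataFrames instead of one giant frame, so peak memory stays at roughly one chunk regardless of how many rows the table has.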
Regarding "python - How to download large data from a SQL table and continuously save it to csv, fetching about 1000 records at a time", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44203644/