从 R 更改为 Python 我在使用 pandas 从多个 DataFrame 列表中编写多个 csv 时遇到一些困难:
import pandas
from dplython import (DplyFrame, X, diamonds, select, sift, sample_n,
sample_frac, head, arrange, mutate, group_by, summarize,
DelayFunction)
diamonds = [diamonds, diamonds, diamonds]
path = "/user/me/"
def extractDiomands(path, diamonds):
for each in diamonds:
df = DplyFrame(each) >> select(X.carat, X.cut, X.price) >> head(5)
df = pd.DataFrame(df) # not sure if that is required
df.to_csv(os.path.join('.csv', each))
extractDiomands(path,diamonds)
但是这会产生错误。感谢任何建议!
最佳答案
欢迎使用Python!首先,我将加载几个库并下载一个示例数据集。
import os
import pandas as pd
example_data = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
print(example_data.head(5))
示例数据的前几行:
admit gre gpa rank
0 0 380 3.61 3
1 1 660 3.67 3
2 1 800 4.00 1
3 1 640 3.19 4
4 0 520 2.93 4
现在这就是我认为你想要完成的事情:
# spawn a few datasets to loop through
df_1, df_2, df_3 = example_data.head(20), example_data.tail(20), example_data.head(10)
list_of_datasets = [df_1, df_2, df_3]
output_path = 'scratch'
# in Python you can loop through collections of items directly, its pretty cool.
# with enumerate(), you get the index and the item from the sequence, each step through
for index, dataset in enumerate(list_of_datasets):
# Filter to keep just a couple columns
keep_columns = ['gre', 'admit']
dataset = dataset[keep_columns]
# Export to CSV
filepath = os.path.join(output_path, 'dataset_'+str(index)+'.csv')
dataset.to_csv(filepath)
最后,我的文件夹 'scratch'
包含三个新的 csv,分别为 dataset_0.csv
、dataset_1.csv
和 dataset_2.csv
关于python - 如何循环多个 DataFrame 并生成多个 csv?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43754309/