I have a CSV, and I select a random sample of 500 rows from it with the following code:
import csv
import random

with open('Original.csv', "rb") as source:
    lines = [line for line in source]
random_choice = random.sample(lines, 500)
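What I want to do is update a column called [winner] for the rows that are in the sample, and then save the result back to a CSV file, but I have no idea how to achieve this...
There is a unique identifier in a column called [ID].
How would I go about doing this?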
Best answer
Starting with a CSV that looks like this:
ID something winner
1 a
2 b
3 c
4 a
5 d
6 a
7 b
8 e
9 f
10 g
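You can use the following approach. Read in the whole file, choose rows by randomly selected indices, update those rows, and write everything out to a new file.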
import csv
import random

# Read in the data
with open('example.csv', 'r', newline='') as infile:
    reader = csv.reader(infile)
    header = next(reader)  # We want the headers, but not as part of the sample
    data = []
    for row in reader:
        data.append(row)

# Find the column called winner
winner_column_index = header.index('winner')

# Pick some random indices which will be used to generate the sample
all_indices = list(range(len(data)))
sampled_indices = random.sample(all_indices, 5)

# Add the winner column to those rows selected
for index in sampled_indices:
    data[index][winner_column_index] = 'Winner'

# Write the data back
with open('example_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(header)  # Make sure we get the headers back in
    writer.writerows(data)  # Write the rest of the data
This gives output like the following (which rows are selected varies from run to run):
ID something winner
1 a
2 b Winner
3 c
4 a Winner
5 d
6 a Winner
7 b
8 e
9 f Winner
10 g Winner
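Since the question mentions a unique [ID] column, the same idea can also be expressed by sampling ID values instead of row positions. The following is only a sketch under some assumptions: the function name `mark_winners` and its `seed` parameter are hypothetical, the file is comma-separated, and every row has exactly the columns listed in the header.

```python
import csv
import random

def mark_winners(in_path, out_path, sample_size, seed=None):
    """Mark a random sample of rows as winners, matching rows by their ID value.

    Hypothetical helper for illustration; assumes 'ID' and 'winner' columns exist.
    """
    rng = random.Random(seed)  # a seed makes the sample reproducible

    # Read every row as a dict keyed by the header names
    with open(in_path, newline='') as infile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Sample the unique IDs rather than row positions
    sampled_ids = set(rng.sample([row['ID'] for row in rows], sample_size))

    # Update only the rows whose ID was sampled
    for row in rows:
        if row['ID'] in sampled_ids:
            row['winner'] = 'Winner'

    # Write the header and all rows, updated or not, to the output file
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return sampled_ids
```

Calling `mark_winners('example.csv', 'example_out.csv', 5)` should behave like the index-based version above, and the returned set of IDs can be kept if the sample is needed later.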
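Edit: It turns out that calling the first column of a CSV ID is not a good idea if you want to open the file with Excel. Excel then incorrectly assumes the file is in SYLK format.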
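One possible workaround is to rename the first header before writing the file. As far as I know, Excel's SYLK check is triggered by a file that begins with the uppercase letters ID, so lowercasing them is usually enough, but treat that as an assumption rather than documented behaviour. The helper name `excel_safe_header` is made up for this sketch.

```python
def excel_safe_header(header):
    """Return a copy of header whose first name does not start with uppercase 'ID'.

    Hypothetical helper; assumes header is a list of column-name strings.
    """
    safe = list(header)
    if safe and safe[0].startswith('ID'):
        # Lowercase only the leading 'ID'; the rest of the name is kept
        safe[0] = 'id' + safe[0][2:]
    return safe
```

With this in place, `writer.writerow(excel_safe_header(header))` would replace the plain `writer.writerow(header)` call, and the lookup of the winner column is unaffected because only the first name changes.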
For more on "python - Update a CSV row if it exists in a random sample", see the similar question on Stack Overflow: https://stackoverflow.com/questions/50173681/