python - How do I read a CSV file without a header using Pandas, capture only the data in the first column, and delete rows?

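I have a CSV file that contains information about people along with various data spanning more than 100 columns. There is no header, and my main goal is simply to get the people's names, not the other data associated with them. How can I do that?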

Here is my CSV file --- 'data.csv':

John   12 34 23 48 14 44 94 24  ...    #extends till 100
Becky  23 40 93 47 84 43 64 31  ...    #extends till 100
Lio    63 90 53 77 14 12 69 20  ...    #extends till 100

Next, suppose my code has a list populated with many names:

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome', 'Lachlan', 'John', 'Lio']

I opened the CSV file in Python, used a list comprehension to read all the names in the first column, and stored them in a list assigned to the variable "people_list".
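For comparison, pandas can read just that first column directly. This is a minimal sketch that assumes data.csv is really comma-separated (as the delimiter=',' in the code below suggests) and uses a short, truncated in-memory stand-in for the file via io.StringIO; header=None and usecols=[0] are standard read_csv parameters, while the variable names are illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: comma-separated, no header row,
# truncated to the first few numeric columns shown in the question.
sample = io.StringIO(
    "John,12,34,23,48\n"
    "Becky,23,40,93,47\n"
    "Lio,63,90,53,77\n"
)

# header=None: treat the first line as data, not as column names.
# usecols=[0]: parse only the first column (the names).
first_col = pd.read_csv(sample, header=None, usecols=[0])
people_list = first_col[0].tolist()
print(people_list)  # ['John', 'Becky', 'Lio']
```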

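Now, for every element in people_list that does not appear in the "names" list, I want to delete that person from the CSV file. In this example, I want to delete Becky, because she does not appear in the names list. Here is what I have tried so far...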

Demo -- data.py:

import csv

import pandas as pd

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome', 'Lachlan', 'John', 'Lio']
csv_filename = 'data.csv'

with open(csv_filename, 'r', newline='') as readfile:
    reader = csv.reader(readfile, delimiter=',')
    people_list = [row[0] for row in reader]  # first column holds the names

for person in people_list:
    if person not in names:
        # grab the index of the person in people_list who's not found in the names list
        row_id = people_list.index(person)

        # using pandas
        df = pd.read_csv(csv_filename)  # read data.csv file
        df.drop(df.index[row_id], inplace=True)  # delete the row for the person who does not exist in names list
        df.to_csv(csv_filename, index=False, sep=',')  # write the CSV back without the index column
    else:
        print("This person is found in the names list")

Instead of deleting only Becky, all the records in my CSV file get deleted (including Becky). Can someone explain how to do this correctly?
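One likely cause, sketched below under the same assumption (comma-separated file, truncated in-memory stand-in for data.csv): pd.read_csv without header=None promotes John's row to column names, so the DataFrame has one row fewer than people_list, and the position taken from people_list points at the wrong row. Because people_list is never updated after a row is dropped, any later positions drift further.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (comma-separated, no header, truncated).
sample = "John,12,34,23,48\nBecky,23,40,93,47\nLio,63,90,53,77\n"

# Without header=None, pandas uses the first line as column names.
df_default = pd.read_csv(io.StringIO(sample))
print(df_default.columns.tolist())  # ['John', '12', '34', '23', '48']
print(len(df_default))              # 2: John is no longer a data row

# people_list (built with the csv module) still has John at position 0,
# so Becky is at position 1 there, but at position 0 in df_default.
people_list = ['John', 'Becky', 'Lio']
idx = people_list.index('Becky')  # 1
print(df_default.iloc[idx, 0])    # Lio: the wrong row would be dropped
```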

Best answer

Add the parameter header=None to read_csv so the columns get the default labels 0, 1, 2, ...:

df = pd.read_csv(csv_filename,  header=None)

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome', 'Lachlan', 'John', 'Lio']

Then select the first column with df[0], test membership with Series.isin, and filter with boolean indexing:

df = df[df[0].isin(names)]
print(df)
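A self-contained sketch of the filter above, again using a truncated, comma-separated in-memory stand-in for data.csv (an assumption). It also uses ~mask to list the rows the filter removes, which is a quick way to confirm that only Becky goes. The kept rows retain their original index labels (0 and 2); that does not matter here because the file is written with index=False.

```python
import io

import pandas as pd

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome',
         'Lachlan', 'John', 'Lio']

# Hypothetical stand-in for data.csv (comma-separated, no header, truncated).
sample = "John,12,34,23,48\nBecky,23,40,93,47\nLio,63,90,53,77\n"
df = pd.read_csv(io.StringIO(sample), header=None)

mask = df[0].isin(names)             # True where the name is in the list
removed = df.loc[~mask, 0].tolist()  # names whose rows the filter drops
df = df[mask]                        # boolean indexing keeps only matches

print(removed)         # ['Becky']
print(df[0].tolist())  # ['John', 'Lio']
```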

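Finally, write to the file (csv_filename1 here; pass csv_filename instead to overwrite the original data.csv):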

df.to_csv(csv_filename1, index=False, header=False)
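To check what ends up on disk, the sketch below writes a small, hypothetical filtered frame to a temporary file with index=False and header=False, then reads the raw text back; the file name data_filtered.csv is illustrative. Neither a header row nor an index column should appear, so the output keeps the same headerless layout as the input.

```python
import os
import tempfile

import pandas as pd

# Hypothetical filtered frame (integer column labels, as with header=None).
df = pd.DataFrame([['John', 12, 34], ['Lio', 63, 90]])

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'data_filtered.csv')  # hypothetical name
    # index=False: no index column; header=False: no 0,1,2 header line.
    df.to_csv(out_path, index=False, header=False)
    with open(out_path) as fh:
        text = fh.read()

print(text)
```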

Regarding this question, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58850367/
