python - Filtering CSV data with Python/numpy

Tags: python csv numpy lambda pandas

I am working with a CSV file.

            id     gender       disease       read      write    science 
  1.        11       male      cancer, diabetes 34         46         39  
  2.        20       male      diabetes         60         52         61  
  3.        12       male      diabetes         37         44         39  
  4.        16       male      cancer           47         31         36  
  5.         7       male      diabetes         57         54         47  
  6.        21       male      diabetes         44         44         50  
  7.        15       male      diabetes         39         39         26  
  8.        22       male      diabetes         42         39         56  
  9.         9       male      cancer           48         49         44  
 10.        18       male      diabetes         50         33         44  
 11.         5       male      diabetes         47         40          .  
 12.        14       male      diabetes         47         41         42  
 13.         3       male      diabetes         63         65         63  
 14.        24       male         fever         52         62         47  
 15.         8     female      diabetes         39         44         44  
 16.         1     female      cancer           34         44         39  
 17.         4     female      diabetes         44         50         39  
 18.         2     female      diabetes         39         41         42  
 19.        19     female      cancer           28         46         44  
 20.        17     female      diabetes         47         57         44  
 21.         6     female      diabetes         47         41         40  
 22.        10     female      diabetes         47         54         53  
 23.        13     female      diabetes         47         46         47  
 24.        23     female      diabetes         65         65         58  
 25.        25     female    Breast cancer         47         44         42  

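I want to get all rows where the person has cancer. Some people have both diabetes and cancer, so the filter has to catch those rows too. The result should be: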

  1.        11       male      cancer, diabetes 34         46         39
  4.        16       male      cancer           47         31         36
  9.         9       male      cancer           48         49         44
 19.        19     female      cancer           28         46         44
 25.        25     female      Breast cancer    47         44         42


from os.path import dirname, join

import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so no further conversion is needed
delta = pd.read_csv(join(dirname(__file__), 'data.csv'))
disease = delta['disease']

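Now, how do I filter the disease column, and once it is filtered, how do I get the rest of the data in those rows (id, gender, read, write, science)?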

Best answer

Here is a more pandas-centric approach: read all the data into a DataFrame first, create a `has cancer` column, and then filter on it:

from io import StringIO  # Python 3; the Python 2 StringIO module no longer exists

import pandas

datastring = StringIO("""\
id,gender,disease,read,write,science
11,male,"cancer,diabetes",34,46,39
20,male,diabetes,60,52,61
12,male,diabetes,37,44,39
16,male,cancer,47,31,36
7,male,diabetes,57,54,47
21,male,diabetes,44,44,50
15,male,diabetes,39,39,26
22,male,diabetes,42,39,56
9,male,cancer,48,49,44
18,male,diabetes,50,33,44
5,male,diabetes,47,40,-999
14,male,diabetes,47,41,42
3,male,diabetes,63,65,63
24,male,fever,52,62,47
8,female,diabetes,39,44,44
1,female,cancer,34,44,39
4,female,diabetes,44,50,39
2,female,diabetes,39,41,42
19,female,cancer,28,46,44
17,female,diabetes,47,57,44
6,female,diabetes,47,41,40
10,female,diabetes,47,54,53
13,female,diabetes,47,46,47
23,female,diabetes,65,65,58
25,female,"Breast cancer",47,44,42
""")

df = pandas.read_csv(datastring, na_values=-999)

# create the `has cancer` column: True where the disease string contains 'cancer'
df['has cancer'] = df.disease.apply(lambda value: 'cancer' in value)

# print the filtered data
print(df[df['has cancer']].to_string())


    id  gender          disease  read  write  science has cancer
0   11    male  cancer,diabetes    34     46       39       True
3   16    male           cancer    47     31       36       True
8    9    male           cancer    48     49       44       True
15   1  female           cancer    34     44       39       True
18  19  female           cancer    28     46       44       True
24  25  female    Breast cancer    47     44       42       True
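
If you would rather not add a helper column, a boolean mask built with the vectorized `Series.str.contains` should give the same selection. The following is only a sketch under a few assumptions of mine: it uses a small subset of the rows from the question, it assumes the `.` in the question's table marks a missing value (hence `na_values="."`), and the names `csv_text`, `mask` and `cancer_rows` are made up for illustration.

```python
import io

import pandas as pd

# A small subset of the rows from the question, inlined so the sketch runs on its own.
csv_text = """\
id,gender,disease,read,write,science
11,male,"cancer,diabetes",34,46,39
20,male,diabetes,60,52,61
16,male,cancer,47,31,36
5,male,diabetes,47,40,.
24,male,fever,52,62,47
25,female,"Breast cancer",47,44,42
"""

# Assumption: the "." placeholder in the question's table means "missing".
df = pd.read_csv(io.StringIO(csv_text), na_values=".")

# Vectorized substring test; case=False also matches "Cancer",
# and na=False treats a missing disease entry as "no match".
mask = df["disease"].str.contains("cancer", case=False, na=False)

# Keep only the matching rows and the columns asked about in the question.
cancer_rows = df.loc[mask, ["id", "gender", "read", "write", "science"]]
print(cancer_rows.to_string(index=False))
```

Compared with the `apply`/`lambda` version, `na=False` means a row with an empty disease field is skipped instead of raising a `TypeError`, and `df.loc[mask, columns]` answers the second half of the question (getting id, gender, read, write and science for the matching rows) in one step.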

For more on python - Filtering CSV data with Python/numpy, see the similar question on Stack Overflow: https://stackoverflow.com/questions/21011571/
