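I have a dataframe that I am trying to process in Jupyter. The dataframe was originally filled with NaN wherever blanks were found, but I later decided to replace them with the string "Null" (because I was having trouble ignoring NaN).
The following is a sample of the original file, mydata.txt: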
##IGNORE THIS LINE
group2,"BLA","BLE","BLI","BLO","BLU","TAT","TET","TOT","TUT"
group0,"BLA","BLE","BLI","BLO","BLU"
group3,"BLA","BLE","BLI"
The idea is to build arrays containing only the elements that are not NaN (or, later, "Null"), so that they can be passed on for filtering elsewhere.
import rpy2.ipython
import rpy2.robjects as robjects
import pandas as pd
import numpy
import re #python for regex
%load_ext rpy2.ipython
%R
path='C:/MyPath/'
allgroups=pd.read_csv(path+'mydata.txt',sep=",",skiprows=1,header=None,index_col=0)
allgroups=allgroups.fillna("Null")
def groupdat(groupname):
    # Cleans group
    precleaned = numpy.array(allgroups.loc[[groupname]])
    # matching = [s for s in precleaned if s != "Null"]  # I tried this
    matching = filter(lambda elem: elem != "Null", precleaned)  # I also tried this.
    print(matching)
    return
groupdat('group0')
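Both versions of matching above (the commented-out list comprehension and the filter call) produce the error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()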
The output of precleaned is
[['BLA' 'BLE' 'BLI' 'BLO' 'BLU' 'Null' 'Null' 'Null' 'Null']]
Printing allgroups.loc[[groupname]] gives
          1    2    3    4    5     6     7     8     9
0
group0  BLA  BLE  BLI  BLO  BLU  Null  Null  Null  Null
[1 rows x 9 columns]
I appreciate any feedback.
Best answer
You are creating the array with one dimension too many:
numpy.array(allgroups.loc[["group0"]])
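so the list comprehension/filter iterates over the array's only element, which is itself an array (the whole row). Comparing that row to "Null" gives an array of booleans rather than a single True or False, hence the message you got.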
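To make the cause concrete, here is a minimal sketch that reproduces the error with NumPy alone. The array contents are a shortened, hypothetical stand-in for precleaned, and the names first, comparison, and error_message are introduced only for illustration.

```python
import numpy as np

# Hypothetical stand-in for precleaned: a 2-D array holding a single row.
precleaned = np.array([["BLA", "BLE", "Null", "Null"]], dtype=object)

# Iterating over a 2-D array yields whole rows, not individual strings.
first = next(iter(precleaned))

# Comparing a row to a string is elementwise: the result is a boolean array.
comparison = first != "Null"

# `if s != "Null"` implicitly calls bool() on that array, which is what fails.
try:
    bool(comparison)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(precleaned.shape)
print(comparison)
print(error_message)
```

The shape printed first shows the extra dimension: one row containing four elements, instead of four elements directly.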
Create it like this instead:
numpy.array(allgroups.loc[["group0"][0]])
Then [s for s in precleaned if s != "Null"]
yields:
['BLA', 'BLE', 'BLI', 'BLO', 'BLU']
as expected.
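For reference, here is a self-contained sketch of the fix, assuming the sample file from the question is supplied as an in-memory string rather than read from C:/MyPath/. Since ["group0"][0] simply evaluates to "group0", the sketch indexes with allgroups.loc["group0"] directly. The names sample, raw, two_d, one_d, masked, and without_fill are illustrative and do not appear in the original post; the boolean mask and dropna() variants are alternatives, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# The sample from the question, inlined so the sketch runs without the file.
sample = '''##IGNORE THIS LINE
group2,"BLA","BLE","BLI","BLO","BLU","TAT","TET","TOT","TUT"
group0,"BLA","BLE","BLI"," BLO","BLU"
group3,"BLA","BLE","BLI"
'''.replace('" BLO"', '"BLO"')

raw = pd.read_csv(io.StringIO(sample), sep=",", skiprows=1,
                  header=None, index_col=0)
allgroups = raw.fillna("Null")

# A list of labels selects a one-row DataFrame: the array is 2-D.
two_d = np.array(allgroups.loc[["group0"]])
# A single label selects a Series: the array is 1-D.
one_d = np.array(allgroups.loc["group0"])
print(two_d.shape, one_d.shape)

# With the 1-D array, each s is a single string, so the comparison is a bool.
matching = [s for s in one_d if s != "Null"]
print(matching)

# Alternative: filter with a boolean mask instead of a Python loop.
masked = one_d[one_d != "Null"].tolist()

# Alternative: skip the "Null" placeholder and drop the NaN values directly.
without_fill = raw.loc["group0"].dropna().tolist()
```

All three approaches give the same five strings for group0; the dropna() route avoids the fillna("Null") step entirely if the placeholder is not needed elsewhere.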
Regarding "python - Retrieving all elements in an array that match a condition in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41428632/