python - How to pass a list of queries to a pandas DataFrame and output a list of results?

Tags: python python-3.x pandas dataframe

To select rows whose value in column_name equals a scalar, some_value, we use ==:

df.loc[df['column_name'] == some_value]

or use .query():

df.query('column_name == @some_value')

A concrete example:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})

print(df)

        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48

The query could be

rocks_row = df.loc[df['Col1'] == "rocks"]

which outputs

print(rocks_row)
    Col1  Col2  Col3  Col4
4  rocks  lips     4    32
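The two scalar selections can be compared side by side on the example frame. This is a minimal sketch: the variable name `some_value` is only illustrative, and it assumes the scalar lives in a Python variable, which `.query()` reads through the `@` prefix (without the prefix, pandas looks for a column of that name).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})

some_value = 'rocks'  # illustrative scalar to match

# Boolean mask: compare the column to the scalar, then select with .loc
by_mask = df.loc[df['Col1'] == some_value]

# Same selection with .query(); '@' refers to the Python variable
by_query = df.query('Col1 == @some_value')

print(by_mask.equals(by_query))
```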

I would like to query a dataframe with a list of values and have it output a list of "correct queries".

The queries to execute would be in a list, e.g.

list_match = ['men', 'curves', 'history']

This would output all rows that satisfy this condition, i.e.

matches = pd.concat([df1, df2, df3]) 

where

df1 = df.loc[df['Col1'] == "men"]

df2 = df.loc[df['Col1'] == "curves"]

df3 = df.loc[df['Col1'] == "history"]

My idea is to create a function that takes a dataframe, a column, and a list of values:

output = []
def find_queries(dataframe, column, value, output):
    for scalar in value: 
        query = dataframe.loc[dataframe[column] == scalar]
        output.append(query)    # append all query results to a list
    return pd.concat(output)    # return concatenated list of dataframes

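However, this seems unusually slow and does not actually take advantage of the pandas data structures. What is the "standard" way to pass a list of queries through a pandas dataframe?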

Edit: How does this translate to "more complex" queries in pandas? For example, the where argument when reading an HDF5 file?

df.to_hdf('test.h5','df',mode='w',format='table',data_columns=['A','B'])

pd.read_hdf('test.h5','df')

pd.read_hdf('test.h5','df',where='A=["foo","bar"] & B=1')

Best answer

If I understood your question correctly, you can do it either with boolean indexing, as @uhjish has already shown in his answer, or with the query() method:

In [30]: search_list = ['rocks','mountains']

In [31]: df
Out[31]:
        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48

The .query() method:

In [32]: df.query('Col1 in @search_list and Col4 > 40')
Out[32]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48

In [33]: df.query('Col1 in @search_list')
Out[33]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
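As a runnable sketch of the list-membership query outside an IPython session, the snippet below rebuilds the example frame and also shows the complementary `not in` form. The names `inside` and `outside` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})

search_list = ['rocks', 'mountains']

# Rows whose Col1 value appears in the list
inside = df.query('Col1 in @search_list')

# Rows whose Col1 value does not appear in the list
outside = df.query('Col1 not in @search_list')

print(inside['Col1'].tolist())
print(outside['Col1'].tolist())
```

Together the two results partition the frame: every row lands in exactly one of them.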

Using boolean indexing:

In [34]: df.loc[df.Col1.isin(search_list) & (df.Col4 > 40)]
Out[34]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48

In [35]: df.loc[df.Col1.isin(search_list)]
Out[35]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
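To relate this back to the question, here is a small sketch comparing the loop-and-concat approach with a single `isin` mask. The list `values` is an illustrative choice of words that all occur in Col1. One difference to keep in mind: the loop returns rows in the order of the list, while the mask returns them in the order of the frame, so the sketch sorts by index before comparing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})

values = ['mountains', 'men', 'rocks']  # illustrative search values

# Loop approach from the question: one equality filter per value, then concat
looped = pd.concat([df.loc[df['Col1'] == v] for v in values])

# Vectorized approach: a single boolean mask over the whole column
masked = df.loc[df['Col1'].isin(values)]

# Same rows either way once the order is normalized
print(looped.sort_index().equals(masked))
```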

Update: using a function:

def find_queries(df, qry, debug=0, **parms):
    if debug:
        print('[DEBUG]: Query:\t' + qry.format(**parms))
    return df.query(qry.format(**parms))

In [31]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=40)
    ...:
Out[31]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48

In [32]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=10)
Out[32]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48

Including debug info (printing the query):

In [40]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=10, debug=1)
[DEBUG]: Query: Col1 in @search_list and Col4 > 10
Out[40]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
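The function above relies on `search_list` being visible from where `df.query` runs. A possible variant, sketched here under that observation, passes the list in as an argument so that `@values` resolves to a local variable of the function. The name `find_matches` and the optional `extra` parameter are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def find_matches(frame, column, values, extra=None):
    """Return rows of `frame` whose `column` value is in `values`.

    `extra` is an optional additional query condition, e.g. 'Col4 > 40'.
    """
    # '@values' is looked up among this function's local variables
    expr = f'{column} in @values'
    if extra:
        # Parenthesize the extra condition so it combines safely with 'and'
        expr = f'{expr} and ({extra})'
    return frame.query(expr)

df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})

all_hits = find_matches(df, 'Col1', ['rocks', 'mountains'])
big_hits = find_matches(df, 'Col1', ['rocks', 'mountains'], extra='Col4 > 40')
print(all_hits)
print(big_hits)
```

Because the column name and the extra condition are interpolated into the query string, this sketch assumes they come from trusted code rather than user input.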

Regarding "python - How to pass a list of queries to a pandas DataFrame and output a list of results?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39988589/
