I load some machine learning data from a CSV file. The first 2 columns are observations and the remaining columns are features.

Currently, I do the following:

data = pandas.read_csv('mydata.csv')

This gives something like:

data = pandas.DataFrame(np.random.rand(10,5), columns = list('abcde'))

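I'd like to slice this DataFrame into two DataFrames: one containing the columns a and b, and one containing the columns c, d and e.

It is not possible to write something like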

observations = data[:'c']
features = data['c':]

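I'm not sure what the best method is. Do I need a pd.Panel?

By the way, I find DataFrame indexing quite inconsistent: data['a'] is permitted, but data[0] is not. On the other hand, data['a':] is not permitted, but data[0:] is. Is there a practical reason for this? It is really confusing when columns are indexed by Int, given that data[0] != data[0:1]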

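Best answer

2017 Answer - pandas 0.20: .ix is deprecated. Use .loc

See the deprecation notice in the pandas docs.

.loc uses label-based indexing to select both rows and columns. The labels are the values of the index or of the columns. Slicing with .loc includes the last element.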

Let's assume we have a DataFrame with the following columns:
foo, bar, quz, ant, cat, sat, dat.

# selects all rows and all columns beginning at 'foo' up to and including 'sat'
df.loc[:, 'foo':'sat']
# foo bar quz ant cat sat
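As a minimal sketch of the inclusive end label described above, the following builds a small frame with the same seven column names and compares a .loc label slice with an ordinary Python list slice. The values in the frame and the variable names (cols, label_slice, stop) are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the column names come from the answer,
# the values are arbitrary placeholders.
cols = ['foo', 'bar', 'quz', 'ant', 'cat', 'sat', 'dat']
df = pd.DataFrame(np.arange(14).reshape(2, 7), columns=cols)

# Label-based slice: the end label 'sat' is part of the result.
label_slice = df.loc[:, 'foo':'sat']
print(list(label_slice.columns))

# Plain Python list slice: the stop position is excluded.
stop = cols.index('sat')
print(cols[:stop])
```

The first print includes 'sat'; the second stops at 'cat', which is the difference to keep in mind when moving between positional and label-based slicing.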

.loc accepts the same slice notation that Python lists do, for both rows and columns. Slice notation is start:stop:step

# slice from 'foo' to 'cat' by every 2nd column
df.loc[:, 'foo':'cat':2]
# foo quz cat

# slice from the beginning to 'bar'
df.loc[:, :'bar']
# foo bar

# slice from 'quz' to the end by 3
df.loc[:, 'quz'::3]
# quz sat

# attempt from 'sat' to 'bar'
df.loc[:, 'sat':'bar']
# no columns returned

# slice from 'sat' to 'bar'
df.loc[:, 'sat':'bar':-1]
# sat cat ant quz bar

# slice notation is syntactic sugar for the slice function
# slice from 'quz' to the end by 2 with slice function
df.loc[:, slice('quz',None, 2)]
# quz cat dat
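The stepped, reversed and slice-object forms above can be checked side by side. This is a sketch under the assumption of a one-row placeholder frame with the same column names; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame; only the column labels matter here.
cols = ['foo', 'bar', 'quz', 'ant', 'cat', 'sat', 'dat']
df = pd.DataFrame(np.zeros((1, 7)), columns=cols)

# Slice notation and an explicit slice object select the same columns.
with_notation = df.loc[:, 'quz'::2]
with_slice_obj = df.loc[:, slice('quz', None, 2)]

# Going from a later label to an earlier one needs a negative step;
# without it the selection is empty.
reversed_cols = df.loc[:, 'sat':'bar':-1]
empty = df.loc[:, 'sat':'bar']

print(list(with_notation.columns))
print(list(reversed_cols.columns))
print(empty.shape)
```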

# select specific columns with a list
# select columns foo, bar and dat
df.loc[:, ['foo','bar','dat']]
# foo bar dat

You can slice by both rows and columns. For instance, if you have 5 rows with labels v, w, x, y, z

# slice from 'w' to 'y' and 'foo' to 'ant' by 3
df.loc['w':'y', 'foo':'ant':3]
#    foo ant
# w
# x
# y
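Applied to the frame from the question, the same label slicing gives the two pieces directly. The sketch below reuses the question's random example data. The .iloc lines are an added positional alternative that the answer above does not mention, shown only for the case where the split is defined by column position rather than by name.

```python
import numpy as np
import pandas as pd

# Same shape and column names as the example in the question.
data = pd.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

# Label-based split: everything up to and including 'b', then 'c' onward.
observations = data.loc[:, :'b']
features = data.loc[:, 'c':]

# Positional alternative (not part of the original answer):
# first 2 columns, then the rest. Note that .iloc excludes the stop position.
observations_pos = data.iloc[:, :2]
features_pos = data.iloc[:, 2:]

print(list(observations.columns), list(features.columns))
```

Both variants return ordinary DataFrames, so nothing like pd.Panel is involved in this approach.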

For more on how to take column slices of a DataFrame in pandas, see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/10665889/
