python - "the label [5] is not in the [index]" when duplicating rows

Tags: python pandas

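I'm trying to reformat a CSV so that each month column becomes a separate row for each record (essentially pivoting it), i.e. going from one row per brand with one column per month:

[image not available]

to one row per brand and month:

[image not available]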

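To do this, I figured the best approach would be:

  • Loop over each row, loop over each month column (Jan-17, Feb-17, etc.), and duplicate the row.
  • Then insert the month and the total into the date and totals columns.
  • Then drop the duplicated record and continue from where the index left off (i.e. after a record has been duplicated 5 times, once per date, the next starting index would be 5).
  • Then, once all rows have been duplicated, drop the month columns (Jan-17, Feb-17, etc.).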
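The steps above can also be sketched without modifying the frame while looping over it: collect one record per (row, month) pair and build a new frame at the end. This is only an illustration under assumptions. The two-month sample data and the names `records` and `long_df` are hypothetical and not part of the original code.

```python
import pandas as pd

# Hypothetical sample data, shortened to two month columns.
df = pd.DataFrame({
    'brand': ['brand1', 'brand2', 'brand3'],
    'Jan-17': [222, 7777, 12121],
    'Feb-17': [333, 8888, 13131],
})
months = ['Jan-17', 'Feb-17']

# Build one output record per (row, month) pair instead of
# inserting and dropping rows in the frame being iterated.
records = []
for _, row in df.iterrows():
    for month in months:
        records.append({
            'brand': row['brand'],
            'date': month,
            'totals': row[month],
        })

# The month columns never make it into the new frame,
# so there is nothing to drop afterwards.
long_df = pd.DataFrame(records, columns=['brand', 'date', 'totals'])
print(long_df)
```

Because the source frame is never changed inside the loop, its index labels stay stable and no bookkeeping of start or drop indices is needed.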

It does this for the first data row (i.e. brand1), but once the first iteration of the outer loop completes, it breaks with:

the label [5] is not in the [index]

df['date'] = ''
df['totals'] = 0
months = ['Jan-17', 'Feb-17', 'Mar-17', 'Apr-17', 'May-17']

dropRowIndex = 0
nextDuplicateRowStartIndex = 0
totalRows = df.shape[0]

for i in range(0, totalRows):
    print('--------------')
    print(df)
    for col in df:
        if col in months:
            # Insert a row above 0th index with 0th row's values
            # Duplicate the row at this index for each month
            # Then move on to the next "row", which would be the latest index count
            df.loc[nextDuplicateRowStartIndex-1] = df.loc[nextDuplicateRowStartIndex].values
            df.loc[nextDuplicateRowStartIndex-1, 'date'] = col
            df.loc[nextDuplicateRowStartIndex-1, 'totals'] = df.loc[nextDuplicateRowStartIndex-1][col]

            df.index = df.index + 1
            df = df.sort_index()
            dropRowIndex += 1

    # Drop duplicated row by index
    df.drop(dropRowIndex, inplace=True)
    nextDuplicateRowStartIndex = dropRowIndex

# Remove months columns
for col in df:
    if col in months:
        df = df.drop(col, axis=1)

Terminal output:

-------------- INITIAL DATA FRAME:
    brand  Jan-17  Feb-17  Mar-17  Apr-17  May-17 date  totals
0  brand1     222     333     444     555     666            0
1  brand2    7777    8888    9999    1010    1111            0
2  brand3   12121   13131   14141   15151   16161            0
-------------- DATA FRAME AFTER FIRST OUTER LOOP (ROW) ITERATION:
    brand  Jan-17  Feb-17  Mar-17  Apr-17  May-17    date  totals
0  brand1     222     333     444     555     666  May-17     666
1  brand1     222     333     444     555     666  Apr-17     555
2  brand1     222     333     444     555     666  Mar-17     444
3  brand1     222     333     444     555     666  Feb-17     333
4  brand1     222     333     444     555     666  Jan-17     222
6  brand2    7777    8888    9999    1010    1111               0
7  brand3   12121   13131   14141   15151   16161               0
Traceback (most recent call last):
  File "/Users/danielturcotte/Sites/project/env/lib/python3.6/site-packages/pandas/core/indexing.py", line 1506, in _has_valid_type
    error()
  File "/Users/danielturcotte/Sites/project/env/lib/python3.6/site-packages/pandas/core/indexing.py", line 1501, in error
    axis=self.obj._get_axis_name(axis)))
KeyError: 'the label [5] is not in the [index]'

Error:

KeyError: 'the label [5] is not in the [index]'
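A minimal sketch of why this lookup fails, assuming the index state shown in the terminal output (labels 0 to 4, then 6 and 7). The frame below is a hypothetical stand-in with only the brand column, and the names `gapped` and `renumbered` are made up for illustration. Once the row labelled 5 has been dropped, `.loc[5]` has nothing to find, even though a row still sits at position 5.

```python
import pandas as pd

# Hypothetical frame reproducing the index after the first outer
# iteration: label 5 was dropped, so the labels jump from 4 to 6.
gapped = pd.DataFrame(
    {'brand': ['brand1'] * 5 + ['brand2', 'brand3']},
    index=[0, 1, 2, 3, 4, 6, 7],
)

print(5 in gapped.index)  # False

# .loc looks up labels, so the missing label raises KeyError.
try:
    gapped.loc[5]
except KeyError as exc:
    print('KeyError:', exc)

# .iloc looks up positions; position 5 is the row labelled 6.
print(gapped.iloc[5]['brand'])  # brand2

# Renumbering the index makes labels and positions agree again.
renumbered = gapped.reset_index(drop=True)
print(renumbered.loc[5, 'brand'])  # brand2
```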


One thought I had: because I'm using .loc[index] with an integer index, maybe .loc doesn't work with integers while .iloc[] does. If I do this instead:

df.iloc[nextDuplicateRowStartIndex-1] = df.iloc[nextDuplicateRowStartIndex].values

I get the error:

ValueError: labels [10] not contained in axis

and the terminal output is full of NaNs:

    brand  Jan-17  Feb-17  Mar-17  Apr-17  May-17    date  totals
0     NaN     NaN     NaN     NaN     NaN     NaN  May-17     NaN
1     NaN     NaN     NaN     NaN     NaN     NaN  Apr-17     NaN
2     NaN     NaN     NaN     NaN     NaN     NaN  Mar-17     NaN
3     NaN     NaN     NaN     NaN     NaN     NaN  Feb-17     NaN
4     NaN     NaN     NaN     NaN     NaN     NaN  Jan-17     NaN
6  brand2  7777.0  8888.0  9999.0  1010.0  1111.0             0.0
7     NaN     NaN     NaN     NaN     NaN     NaN  Apr-17     NaN

I don't believe that's the problem, though, because print(df.iloc[0]) and print(df.loc[0]) produce the same result (even though I'm accessing loc[0] with an integer).
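A small sketch of the difference between the two accessors, using a hypothetical three-row frame (the names `by_label` and `by_position` are made up for illustration). On a default integer index, `.loc[0]` and `.iloc[0]` happen to return the same row, which matches the observation above. They behave differently with `-1`: `.loc[-1] = ...` adds a new row labelled -1, while `.iloc[-1] = ...` overwrites the last existing row.

```python
import pandas as pd

# Hypothetical frame with the default 0..n-1 index.
df = pd.DataFrame({
    'brand': ['brand1', 'brand2', 'brand3'],
    'totals': [10, 20, 30],
})

# Label 0 and position 0 refer to the same row here.
print(df.loc[0].equals(df.iloc[0]))  # True

# .loc with a label that does not exist yet enlarges the frame.
by_label = df.copy()
by_label.loc[-1] = ['new', 0]
print(len(by_label))  # 4

# .iloc with -1 means "last position", so it overwrites brand3.
by_position = df.copy()
by_position.iloc[-1] = ['new', 0]
print(len(by_position))  # 3
print(by_position['brand'].tolist())  # ['brand1', 'brand2', 'new']
```

So swapping `.loc` for `.iloc` in the loop changes which rows are written to, rather than fixing the missing label.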


Trying melt:

[image not available]

Best answer

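You can use melt for this. It lets you choose multiple ID columns and value columns. In your case, the value columns are everything except 'brand', so that parameter can be omitted. That lets you do the whole thing in one line: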

import pandas as pd

df = pd.DataFrame({
    'brand': ['brand1', 'brand2', 'brand3'],
    'Jan-17': [22, 232, 324],
    'Feb-17': [333, 424, 999]
    # ...
})

rearranged = pd.melt(df, id_vars=['brand'], var_name='Date',
                     value_name='Total')

print(rearranged)

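This prints (the output below comes from an older pandas version that sorted dict keys, so Feb-17 is listed first; recent versions keep insertion order and list the Jan-17 rows first):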

    brand    Date  Total
0  brand1  Feb-17    333
1  brand2  Feb-17    424
2  brand3  Feb-17    999
3  brand1  Jan-17     22
4  brand2  Jan-17    232
5  brand3  Jan-17    324
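As a possible variation, the sketch below names the month columns explicitly through `value_vars`, uses the question's `date` and `totals` column names, and parses the month labels so the rows can be sorted chronologically per brand. The sample data, the `months` list, and the assumption that labels follow the `%b-%y` pattern (English month abbreviations) are illustrative and not taken from the answer.

```python
import pandas as pd

# Hypothetical sample data, shortened to two month columns.
df = pd.DataFrame({
    'brand': ['brand1', 'brand2', 'brand3'],
    'Jan-17': [222, 7777, 12121],
    'Feb-17': [333, 8888, 13131],
})
months = ['Jan-17', 'Feb-17']

# value_vars restricts the melt to the listed month columns,
# which helps if the frame has other columns that should stay out.
long_df = df.melt(id_vars=['brand'], value_vars=months,
                  var_name='date', value_name='totals')

# Assumes labels like 'Jan-17' (abbreviated month, two-digit year).
long_df['date'] = pd.to_datetime(long_df['date'], format='%b-%y')

# Group each brand's rows together in chronological order.
long_df = long_df.sort_values(['brand', 'date']).reset_index(drop=True)
print(long_df)
```

Parsing the labels into real dates means sorting follows the calendar rather than the alphabet, where Feb would come before Jan.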

Regarding python - "the label [5] is not in the [index]" when duplicating rows, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47354835/
