python - Categorize Excel data row by row across n-level columns

Tags: python python-3.x excel pandas dataset

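I'm having trouble categorizing the data in certain columns and rows of an Excel file. I need to turn each merged cell and the values in the next column into one row per value, with the next-column value placed beside it, as in this picture: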
Input:
[image: input Excel example]
Output for Dairy:
[image: output Excel example]
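Summary:
First we take the Dairy row, then we move to the second column next to Dairy and take the data beside Dairy. Then we move to the next column, and beside Milk to Mr. 1 we get Butter to Mrs. 1, Butter to Mrs. 2, and so on...
After that, we want to export the result as an Excel file, as shown in the output picture.
I wrote code that takes the first-column data and finds all the data next to it, but I need to change it so that it collects the data row by row, as shown in the output picture: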

import pandas
import openpyxl
import xlwt
from xlwt import Workbook

df = pandas.read_excel('excel.xlsx')

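# Collect first-level names; 'index' ends up as the last Excel row
# (the header is row 1) spanned by each merged first-column cell
result_first_level = []

for i, item in enumerate(df[df.columns[0]].values, 2):
    if pandas.isna(item):
        # Empty cell under a merged cell: extend the current group
        result_first_level[-1]['index'] = i
    else:
        result_first_level.append(dict(name=item, index=i, levels_name=[]))

# For every other column, collect the non-empty values that fall
# inside each first-level group's row range
for level in df.columns[1:]:
    move_index = 0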
    for i, obj in enumerate(result_first_level):
        if i == 0:
            for item in df[level].values[0:obj['index'] - 1]:
                if pandas.isna(item):
                    move_index += 1
                    continue
                else:
                    obj['levels_name'].append(item)
                move_index += 1
        else:
            for item in df[level].values[move_index:obj['index'] - 1]:
                if pandas.isna(item):
                    move_index += 1
                    continue
                else:
                    obj['levels_name'].append(item)
                move_index += 1

# Workbook is created
wb = Workbook()

# add_sheet is used to create sheet.
sheet1 = wb.add_sheet('Sheet 1')
style = xlwt.easyxf('font: bold 1')

move_index = 0
for item in result_first_level:
    for member in item['levels_name']:
        sheet1.write(move_index, 0, item['name'], style)
        sheet1.write(move_index, 1, member)
        move_index += 1

wb.save('test.xls')

Download the input Excel file from here.
Thanks for your help!

Best answer

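First, fill your data so that blank cells in every column except the last take the last valid value above them, then use pd.CategoricalDtype to create an ordered collection for sorting the product column. Next, iterate over the columns pairwise and rename them so the pieces can be concatenated. The last step is to sort the rows by product value.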

import pandas as pd

# Prepare your dataframe
df = pd.read_excel('input.xlsx').dropna(how='all')
df.update(df.iloc[:, :-1].ffill())
df = df.drop_duplicates()

# Get keys to sort data in the final output
cats = pd.CategoricalDtype(df.T.melt()['value'].dropna().unique(), ordered=True)

# Group pairwise values
data = []
for cols in zip(df.columns, df.columns[1:]):
    col_mapping = dict(zip(cols, ['product', 'subproduct']))
    data.append(df[list(cols)].rename(columns=col_mapping))

# Merge all data
out = pd.concat(data).drop_duplicates().dropna() \
        .astype(cats).sort_values('product').reset_index(drop=True)
Output:
>>> cats
CategoricalDtype(categories=['Dairy', 'Milk to Mr.1', 'Butter to Mrs.1',
                  'Butter to Mrs.2', 'Cheese to Miss 2 ', 'Cheese to Mr.2',
                  'Milk to Miss.1', 'Milk to Mr.5', 'yoghurt to Mr.3',
                  'Milk to Mr.6', 'Fruits', 'Apples to Mr.6',
                  'Limes to Miss 5', 'Oranges to Mr.7', 'Plumbs to Miss 5',
                  'apple for mr 2', 'Foods & Drinks', 'Chips to Mr1',
                  'Jam to Mr 2.', 'Coca to Mr 5', 'Cookies to Mr1.',
                  'Coca to Mr 7', 'Coca to Mr 6', 'Juice to Miss 1',
                  'Jam to Mr 3.', 'Ice cream to Miss 3.', 'Honey to Mr 5',
                  'Cake to Mrs. 2', 'Honey to Miss 2',
                  'Chewing gum to Miss 7.'], ordered=True)
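In the `cats` output above, the categories follow the order in which each label first appears when the sheet is scanned row by row, left to right: `df.T.melt()` stacks the columns of the transposed frame, which are the rows of the original. A minimal sketch on a small hypothetical frame (the column names `level_1` to `level_3` and the labels are made up), comparing that expression with a swapped-in NumPy alternative, `pd.unique` over a row-major `ravel()`:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-level sheet after reading; NaN marks an empty cell.
df = pd.DataFrame({
    "level_1": ["Dairy", "Dairy", "Fruits"],
    "level_2": ["Milk", "Cheese", "Apples"],
    "level_3": ["Butter", np.nan, "Limes"],
})

# Expression used in the answer: transpose, melt, keep the 'value' column.
order_melt = df.T.melt()["value"].dropna().unique()

# Alternative: ravel() in C order reads the values row by row.
values = df.to_numpy().ravel()
order_ravel = pd.unique(values[pd.notna(values)])

print(list(order_melt))
# ['Dairy', 'Milk', 'Butter', 'Cheese', 'Fruits', 'Apples', 'Limes']
print(list(order_ravel))
# ['Dairy', 'Milk', 'Butter', 'Cheese', 'Fruits', 'Apples', 'Limes']
```

Both keep only the first occurrence of each label, which is what makes `Dairy` come before its children in the sorted result.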

>>> out
             product              subproduct
0              Dairy            Milk to Mr.1
1              Dairy          Cheese to Mr.2
2       Milk to Mr.1         Butter to Mrs.1
3       Milk to Mr.1         Butter to Mrs.2
4    Butter to Mrs.2       Cheese to Miss 2 
5     Cheese to Mr.2          Milk to Miss.1
6     Cheese to Mr.2         yoghurt to Mr.3
7     Milk to Miss.1            Milk to Mr.5
8    yoghurt to Mr.3            Milk to Mr.6
9             Fruits          Apples to Mr.6
10            Fruits         Oranges to Mr.7
11    Apples to Mr.6         Limes to Miss 5
12   Oranges to Mr.7        Plumbs to Miss 5
13  Plumbs to Miss 5          apple for mr 2
14    Foods & Drinks            Chips to Mr1
15    Foods & Drinks         Juice to Miss 1
16    Foods & Drinks          Cake to Mrs. 2
17      Chips to Mr1            Jam to Mr 2.
18      Chips to Mr1         Cookies to Mr1.
19      Jam to Mr 2.            Coca to Mr 5
20   Cookies to Mr1.            Coca to Mr 6
21   Cookies to Mr1.            Coca to Mr 7
22   Juice to Miss 1           Honey to Mr 5
23   Juice to Miss 1            Jam to Mr 3.
24      Jam to Mr 3.    Ice cream to Miss 3.
25    Cake to Mrs. 2  Chewing gum to Miss 7.
26    Cake to Mrs. 2         Honey to Miss 2
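In the output above, rows that share a `product` do not always follow the order in `cats`: for example, `Coca to Mr 6` is listed before `Coca to Mr 7`, while `cats` has them the other way round. `sort_values` uses `kind='quicksort'` by default, which is not stable, so tied rows may come out in any order. A minimal sketch of the same pipeline on a small hypothetical in-memory frame (no Excel file; the column names and labels are invented), passing `kind='stable'` so rows with the same parent keep the order in which `pd.concat` stacked them:

```python
import numpy as np
import pandas as pd

# Hypothetical sheet as read_excel would return it: a merged cell keeps its
# value in the first row only, and the rows it spans come back as NaN.
df = pd.DataFrame({
    "level_1": ["Dairy", np.nan, np.nan, "Fruits"],
    "level_2": ["Milk", np.nan, "Cheese", "Apples"],
    "level_3": ["Butter A", "Butter B", "Yoghurt", "Limes"],
})

# Forward-fill every column except the last, as in the answer.
df.update(df.iloc[:, :-1].ffill())
df = df.drop_duplicates()

# Category order: first appearance when scanning row by row.
cats = pd.CategoricalDtype(df.T.melt()["value"].dropna().unique(), ordered=True)

# One (parent, child) frame per pair of adjacent columns.
data = []
for cols in zip(df.columns, df.columns[1:]):
    col_mapping = dict(zip(cols, ["product", "subproduct"]))
    data.append(df[list(cols)].rename(columns=col_mapping))

# kind="stable" keeps the concat order among rows with the same product.
out = (
    pd.concat(data)
    .drop_duplicates()
    .dropna()
    .astype(cats)
    .sort_values("product", kind="stable")
    .reset_index(drop=True)
)
print(out)
```

If the children should also follow the sheet order rather than the concat order, sorting on both category columns with `sort_values(["product", "subproduct"])` does that.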

For python - Categorize Excel data row by row across n-level columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69958563/
