python - How do I remove duplicate columns from multiple Excel worksheets in one workbook?

Tags: python excel python-3.x

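I have an Excel workbook with several worksheets, and each worksheet contains duplicated columns. I need to remove the duplicates and keep only the original columns.

I know how to remove the duplicates from a single worksheet: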

df_sheet_map['> Acute Hospital Bed SLM']
result2=df_sheet_map['> Acute Hospital Bed SLM'].T.drop_duplicates().T

import os
import pandas as pd

dfList = []
path = 'J:/TestDup' 
newpath = 'J:/TestDup/Test2'

for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file): 
        # Import the excel file and call it xlsx_file 
        xlsx_file = pd.ExcelFile(file) 
        # View the excel files sheet names 
        xlsx_file.sheet_names 
        # Load the xlsx files Data sheet as a dataframe 
        df = xlsx_file.parse('Sheet1',header= None) 
        df_NoHeader = df[2:] 
        data = df_NoHeader 
        # Save individual dataframe
        data.to_excel(os.path.join(newpath, fn))

        dfList.append(data) 

appended_data = pd.concat(dfList)
appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))

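The code above works. However, I need to loop over all the worksheets, not just 'Sheet1'. Also, it currently drops the first two rows; I need to change that so it removes the duplicates instead.

Best answer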

# Transpose all sheets in the workbook, delete the duplicates, then transpose back to the original shape and save all sheets.
# Step 1: transpose every sheet in the workbook file

    import pyexcel  # reading and writing .xlsx also requires the pyexcel-xlsx plugin to be installed

    book = pyexcel.get_book(file_name="H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
    for sheet in book:
        sheet.transpose()  # columns become rows, in place
    book.save_as("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
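
Transposing helps because duplicate-removal tools work on rows: once each column has become a row, repeated columns can be dropped as repeated rows and the sheet flipped back afterwards. The following is only a minimal plain-Python sketch of that idea, independent of pyexcel and Excel. The function name and the sample table are made up for illustration, and it compares entire columns rather than selected fields.

```python
def drop_duplicate_columns(rows):
    """Return a copy of `rows` (a list of equal-length rows) without repeated columns."""
    # Step 1: transpose, so every column becomes a tuple that can be compared as a whole.
    columns = list(zip(*rows))

    # Step 2: keep only the first occurrence of each column, preserving order.
    seen = set()
    unique_columns = []
    for column in columns:
        if column not in seen:
            seen.add(column)
            unique_columns.append(column)

    # Step 3: transpose back to the original row orientation.
    return [list(row) for row in zip(*unique_columns)]


# Hypothetical sample data: columns 3 and 4 repeat columns 1 and 2.
table = [
    ["id", "name", "id", "name"],
    [1, "a", 1, "a"],
    [2, "b", 2, "b"],
]
deduplicated = drop_duplicate_columns(table)
print(deduplicated)  # [['id', 'name'], [1, 'a'], [2, 'b']]
```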

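# Step 2: run the Excel VBA macro from Python

    import win32com.client as win32
    import time

    xl = win32.Dispatch('Excel.Application')
    xl.Visible = 0  # keep the Excel window hidden
    ss = xl.Workbooks.Open('H:/SLM_Final/DeleteDup.xlsm')  # workbook that holds the macro
    xl.Run("deleteDuplicate")
    time.sleep(30)  # give Excel time to finish saving
    xl.Quit()
    time.sleep(30)  # give Excel time to shut down before the next step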

# VBA macro to add to the Excel workbook (DeleteDup.xlsm)

    Sub deleteDuplicate()

        Dim wkbk1 As Workbook
        Dim w As Long

        Set wkbk1 = Workbooks.Open("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
        'Set wkbk1 = ThisWorkbook

        wkbk1.Activate

        With wkbk1
            For w = 1 To .Worksheets.Count
                With .Worksheets(w)
                    ' Rows whose 3rd and 4th fields both match an earlier row are removed
                    .UsedRange.RemoveDuplicates Columns:=Array(3, 4), Header:=xlYes
                End With
            Next w
        End With

        wkbk1.Save
        wkbk1.Close
    End Sub

# Step 3: transpose the sheets back to their original shape

    import pyexcel

    book = pyexcel.get_book(file_name="H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
    for sheet in book:
        sheet.transpose()  # rows become columns again
    book.save_as("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")

I hope this helps.
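
For comparison, the transpose trick from the question can probably be applied to every sheet with pandas alone, without Excel or a macro. This is a hedged sketch, not the method used above: the function names are hypothetical, `dedupe_workbook` assumes an Excel engine such as openpyxl is installed, and it treats two columns as duplicates only when every cell (including the header cell, since `header=None` reads it as data) is identical.

```python
import pandas as pd


def drop_duplicate_columns(df):
    """Keep the first of any columns whose values are identical."""
    # Transpose so columns become rows, drop repeated rows, transpose back.
    # Only cell values are compared; column labels are ignored by drop_duplicates.
    return df.T.drop_duplicates().T


def dedupe_workbook(src_path, dst_path):
    """Write a copy of the workbook with duplicate columns removed from every sheet."""
    # sheet_name=None makes read_excel return a dict of {sheet name: DataFrame}.
    sheets = pd.read_excel(src_path, sheet_name=None, header=None)
    with pd.ExcelWriter(dst_path) as writer:
        for name, df in sheets.items():
            drop_duplicate_columns(df).to_excel(
                writer, sheet_name=name, index=False, header=False
            )


# Small in-memory check with made-up data: "a_copy" repeats the values of "a".
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "a_copy": [1, 2]})
print(list(drop_duplicate_columns(demo).columns))  # ['a', 'b']
```

Calling `dedupe_workbook` with a source path and a different destination path would leave the original file untouched, which makes it easier to compare the result before overwriting anything.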

This post about "python - How do I remove duplicate columns from multiple Excel worksheets in one workbook?" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/56158355/
