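I have several worksheets in one Excel workbook, and each worksheet contains duplicated columns. I need to remove the duplicates and keep only the original columns.
I know how to remove the duplicate columns in a single worksheet: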
df_sheet_map['> Acute Hospital Bed SLM']
result2=df_sheet_map['> Acute Hospital Bed SLM'].T.drop_duplicates().T
import os
import pandas as pd

dfList = []
path = 'J:/TestDup'
newpath = 'J:/TestDup/Test2'
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        # Import the Excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(file)
        # View the Excel file's sheet names
        xlsx_file.sheet_names
        # Load the workbook's 'Sheet1' as a dataframe, without a header row
        df = xlsx_file.parse('Sheet1', header=None)
        # Drop the first two rows
        df_NoHeader = df[2:]
        data = df_NoHeader
        # Save the individual dataframe
        data.to_excel(os.path.join(newpath, fn))
        dfList.append(data)
# Combine the dataframes from every file and save them as one workbook
appended_data = pd.concat(dfList)
appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))
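The code above works. However, I need to loop over all the worksheets. Also, it currently drops the first two rows, and I need to change that to dropping the duplicates instead.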
Best answer
# Transpose all sheets in the workbook, delete the duplicates, then transpose back to the original shape and save all sheets.
# Transpose all sheets in the workbook file
import pyexcel
import pyexcel_xlsx  # pyexcel plugin that provides .xlsx support
book = pyexcel.get_book(file_name="H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
for sheet in book:
    # After transposing, each original column becomes a row
    sheet.transpose()
book.save_as("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
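To make the idea behind the transpose step concrete: once columns have become rows, a duplicate column is just a duplicate row, which is what row-based tools can delete. Below is a minimal plain-Python sketch of that round trip on a list of rows. The function name `dedupe_columns` is hypothetical, and the sketch assumes rectangular data with hashable cell values. It also treats two columns as duplicates only when every cell matches, which is stricter than the macro used in this answer.

```python
def dedupe_columns(rows):
    """Drop columns whose full contents repeat an earlier column.

    `rows` is a rectangular list of rows (hypothetical input format).
    """
    # Transpose: each tuple now holds one original column
    columns = list(zip(*rows))
    seen = set()
    kept = []
    for col in columns:
        # Keep only the first occurrence of each distinct column
        if col not in seen:
            seen.add(col)
            kept.append(col)
    # Transpose back to the original row-oriented shape
    return [list(row) for row in zip(*kept)]


sample = [
    ["id", "id", "name"],
    [1, 1, "x"],
    [2, 2, "y"],
]
cleaned = dedupe_columns(sample)
```

Here `cleaned` keeps the first `id` column and the `name` column, in their original order.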
# Run the Excel VBA macro from Python (requires Windows, Excel and pywin32)
import win32com.client as win32
import time
xl = win32.Dispatch('Excel.Application')
xl.Visible = 0
ss = xl.Workbooks.Open('H:/SLM_Final/DeleteDup.xlsm')
xl.Run("deleteDuplicate")
time.sleep(30)
xl.Quit()
time.sleep(30)
# VBA macro to add to the macro-enabled Excel workbook
'''Sub deleteDuplicate()
    Dim ws As Worksheet
    Dim wkbk1 As Workbook
    Dim w As Long
    Dim lRow As Long
    Dim iCntr As Long
    Set wkbk1 = Workbooks.Open("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
    'Set wkbk1 = ThisWorkbook
    wkbk1.Activate
    With wkbk1
        For w = 1 To .Worksheets.Count
            With .Worksheets(w)
                .UsedRange.RemoveDuplicates Columns:=Array(3, 4), Header:=xlYes
            End With
        Next w
    End With
    wkbk1.Save
    wkbk1.Close
End Sub'''
#
# Transpose the sheets back to the original shape
import pyexcel
import pyexcel_xlsx as pe
from pyexcel_xlsx import get_data
book = pyexcel.get_book(file_name="H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
for sheet in book:
    sheet.transpose()
    #sheet.delete_duplicates(keep=False, inplace=False)
book.save_as("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
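For comparison, the same result can probably be reached in pandas alone, without going through Excel or a macro. This is not the method used in this answer, only a sketch under some assumptions: every sheet has a header row, `openpyxl` is installed for reading and writing `.xlsx`, and two columns count as duplicates only when their values match in every row (header names are ignored). The helper names `drop_duplicate_columns`, `dedupe_sheets` and `dedupe_workbook` are made up for illustration.

```python
import pandas as pd


def drop_duplicate_columns(df):
    """Keep the first of any columns whose values are identical."""
    # Transposing turns columns into rows, so duplicated() flags repeat columns
    mask = df.T.duplicated().to_numpy()
    return df.loc[:, ~mask]


def dedupe_sheets(sheets):
    """Apply drop_duplicate_columns to a {sheet_name: DataFrame} dict."""
    return {name: drop_duplicate_columns(df) for name, df in sheets.items()}


def dedupe_workbook(src, dst):
    """Read every sheet of src, drop duplicate columns, write the result to dst."""
    # sheet_name=None makes read_excel return a dict of all sheets
    sheets = pd.read_excel(src, sheet_name=None)
    with pd.ExcelWriter(dst) as writer:
        for name, df in dedupe_sheets(sheets).items():
            df.to_excel(writer, sheet_name=name, index=False)


# Small in-memory demonstration with made-up data
demo = {
    "s1": pd.DataFrame({"a": [1, 2], "b": [1, 2], "c": [3, 4]}),
    "s2": pd.DataFrame({"x": ["p", "q"], "y": ["p", "q"]}),
}
demo_clean = dedupe_sheets(demo)
```

A call such as `dedupe_workbook("input.xlsx", "output.xlsx")` (placeholder file names) would then process a whole workbook. Selecting with a boolean mask, rather than transposing twice with `.T.drop_duplicates().T`, leaves the original column dtypes untouched.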
I hope this helps.
Regarding "python - How to delete duplicate columns across multiple Excel sheets of one workbook?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56158355/