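I only recently started coding in Python and still have a lot to learn. My code pulls a string from a cell, checks its character length, and replaces words with specific abbreviations. I then write the new string to a different Excel sheet and save once all the data has been reduced. I finally figured out how to get it working, but it takes a very long time. I am working with more than 10,000 string cells, and my loop iterations are probably far from optimized. Any helpful information would be great.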
import xlwt
import xlrd

book = xlrd.open_workbook() # opens excel file for data input
reduc = xlwt.Workbook() # creates the workbook that the reduced data will be saved in

# Calls the sheets I will be working with
Data = book.sheet_by_index(3)
Table = book.sheet_by_index(5)
sheet1 = reduc.add_sheet("sheet 1")

# the initial loop pulls the string from excel
for x in xrange(30): # I use a limited range for debugging
    From = str(Data.col(15)[x].value)
    To = str(Data.col(16)[x].value)
    print x # I just print this to let me know that I'm not stuck
    if len(From) <= 30 and len(To) <= 30:
        sheet1.write(x, 3, From)
        sheet1.write(x, 4, To)
    else:
        while len(From) > 30 or len(To) > 30:
            for y in xrange(Table.nrows):
                word = str(Table.col(0)[y].value)
                abbrv = str(Table.col(1)[y].value)
                if len(From) > 30:
                    From = From.replace(word, abbrv)
                if len(To) > 30:
                    To = To.replace(word, abbrv)
            sheet1.write(x, 3, From)
            sheet1.write(x, 4, To)
            break

reduc.save("newdoc.xls")
print " DONE! "
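Below is my updated code. It is almost instantaneous, which is what I expected. I preloaded all the columns I needed and then ran them through the same loop system. Instead of writing to the new Excel file right away, I stored the reduced data in lists. Once all the data had been reduced, I wrote each cell in a separate for loop. Thanks for the advice, guys.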
import xlwt
import xlrd

# Workbook must be located in the Python27 folder in the C:/ directory
book = xlrd.open_workbook() # opens excel file for data input

# Calls the sheets I will be working with
Data = book.sheet_by_index(0)
Table = book.sheet_by_index(1)

# Import column data from excel
From = Data.col_values(15)
To = Data.col_values(16)
word = Table.col_values(0)
abbrv = Table.col_values(1)

# Empty lists to be filled with the reduced strings
From_r = []
To_r = []

# Notes to be added
for x in xrange(Data.nrows):
    if len(From[x]) <= 28 and len(To[x]) <= 28:
        From_r.append(From[x])
        To_r.append(To[x])
    else:
        while len(From[x]) > 28 or len(To[x]) > 28:
            for y in xrange(Table.nrows):
                if len(From[x]) > 28:
                    From[x] = From[x].replace(word[y], abbrv[y])
                if len(To[x]) > 28:
                    To[x] = To[x].replace(word[y], abbrv[y])
            From_r.append(From[x])
            To_r.append(To[x])
            break

# Create new excel file to write reduced strings into
reduc = xlwt.Workbook()
sheet1 = reduc.add_sheet("sheet 1")

# Iterate through the lists to write each item into excel
for i in xrange(Data.nrows):
    sheet1.write(i, 3, From_r[i])
    sheet1.write(i, 4, To_r[i])

# Save reduced strings in new excel file
reduc.save("lucky.xls")
print " DONE! "
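The reduction step in the updated code can also be pulled out of the Excel I/O so that it can be checked on plain lists. The following is only a minimal sketch, not the code from the question: the function name `abbreviate`, its parameters, and the sample data are hypothetical, and the limit of 28 is taken from the updated code. It also skips blank table cells, since `str.replace` with an empty search string inserts the replacement between every character.

```python
def abbreviate(text, pairs, limit=28):
    """Apply (word, abbreviation) pairs in order until text fits in limit.

    Hypothetical helper: it mirrors the inner loop of the updated code,
    checking the length before each replacement.
    """
    for word, abbrv in pairs:
        if len(text) <= limit:
            break  # already short enough, stop replacing
        if word:  # skip blank table cells
            text = text.replace(word, abbrv)
    return text


# Made-up sample data standing in for the two columns of the table sheet.
pairs = [("Building", "Bldg"), ("Distribution", "Dist"), ("Panel", "Pnl")]

short = abbreviate("Main Panel", pairs)
reduced = abbreviate("Main Distribution Panel Building 4", pairs)
print(short)    # Main Panel
print(reduced)  # Main Dist Panel Bldg 4
```

In the second call the string is already within the limit after two replacements, so "Panel" is left alone.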
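Best answer
The slowness is probably caused by inefficient replacement code. You should try loading all the words and their corresponding abbreviations up front, unless the list is so large that you would run out of memory. Then, for even more speed, you can replace all the words in one go.
Do this once, and move it outside the loop: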
words = [str(cell.value) for cell in Table.col(0)] #list comprehension
abbr = [str(cell.value) for cell in Table.col(1)]
replacements = zip(words, abbr)
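This function (borrowed from another source) uses the regular expression module to replace every match from a given list in a single pass.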
import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]))
    return lambda string: pattern.sub(replacement_function, string)
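Two details of a regex alternation are worth keeping in mind with a function like this. Alternatives are tried left to right, so a short word listed before a longer word that starts with it wins the match, and an empty word (for example from a blank table cell) produces an empty alternative that matches at every position. The following is a hedged sketch of a variant that guards against both; the name `build_replacer` and the sample words are hypothetical and not part of the original answer.

```python
import re


def build_replacer(pairs):
    """Return a function that replaces every listed word in one regex pass.

    Hypothetical variant of multiple_replacer: pairs with an empty word are
    dropped, and longer words are placed first in the pattern.
    """
    table = {word: abbrv for word, abbrv in pairs if word}
    if not table:
        return lambda text: text  # nothing to replace
    # Longest alternatives first: the regex engine takes the first that matches.
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in ordered))
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


replace_all = build_replacer([("North", "N"), ("Northwest", "NW"), ("", "x")])
result = replace_all("Northwest Gate, North Gate")
print(result)  # NW Gate, N Gate
```

Without the sort, "North" would match first inside "Northwest" and the output would contain "Nwest" instead of "NW".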
To use it, do the following:
replaceFunc = multiple_replacer(*replacements) #constructs the function. Do this outside the loop, after the replacements have been gathered.
myString = replaceFunc(myString)
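Note that a one-pass replacer abbreviates every listed word, whereas the code in the question only abbreviates strings that exceed a length limit. One way to combine the two, sketched here under assumptions, is to build the pattern once and apply it only to strings that are too long. The names `LIMIT`, `shorten`, `from_col`, and the sample data are hypothetical; 28 is the limit from the updated question code.

```python
import re

LIMIT = 28  # length limit used in the updated question code

# Made-up stand-ins for the table sheet and one data column.
replacements = [("Distribution", "Dist"), ("Building", "Bldg")]
from_col = ["Main Panel", "Main Distribution Panel Building 4"]

# Built once, outside any loop over the rows.
lookup = dict(replacements)
pattern = re.compile("|".join(re.escape(word) for word, _ in replacements))


def shorten(text):
    """Abbreviate text only if it is longer than LIMIT."""
    if len(text) <= LIMIT:
        return text  # already fits, leave untouched
    return pattern.sub(lambda m: lookup[m.group(0)], text)


from_reduced = [shorten(s) for s in from_col]
print(from_reduced)  # ['Main Panel', 'Main Dist Panel Bldg 4']
```

Unlike the loop in the question, this applies all abbreviations to an over-long string at once rather than stopping as soon as it fits, which may or may not be the behavior you want.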
Regarding python - optimizing Excel data collection/reduction using xlrd/xlwt and loop iteration, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20461241/