python - Optimizing Excel data collection/reduction using xlrd/xlwt and loop iteration

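I have only recently started coding in Python and still have a lot to learn. The goal of my code is to pull a string from a cell, check its character length, and replace words with specific abbreviations. I then write the new string to another Excel sheet and save once all the data has been reduced. I finally figured out how to get it working, but it takes a really long time. I am working with 10,000+ string cells, and my loop iterations are probably far from optimized. Any helpful information would be great.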

import xlwt
import xlrd

book = xlrd.open_workbook() # opens excel file for data input
reduc = xlwt.Workbook()     # creates the workbook that the reduced data will be saved in

# Calls the sheets I will be working with
Data = book.sheet_by_index(3)
Table = book.sheet_by_index(5)
sheet1 = reduc.add_sheet("sheet 1")

# the initial loop pulls the string from excel

for x in xrange(30): # I use a limited range for debugging
    From = str(Data.col(15)[x].value)
    To = str(Data.col(16)[x].value)
    print x # I just print this to let me know that i'm not stuck

    if len(From) <= 30 and len(To) <= 30:
        sheet1.write(x, 3, From)
        sheet1.write(x, 4, To)
    else:
        while len(From) > 30 or len(To) > 30:
            for y in xrange(Table.nrows):
                word = str(Table.col(0)[y].value)
                abbrv = str(Table.col(1)[y].value)
                if len(From) > 30:
                    From = From.replace(word, abbrv)
                if len(To) > 30:
                    To = To.replace(word, abbrv)
            sheet1.write(x, 3, From)
            sheet1.write(x, 4, To)
            break

reduc.save("newdoc.xls")
print " DONE! "

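Below is my updated code. It is almost instantaneous, which is what I expected. I preload all the columns I want and then run them through the same loop system. I then store the data instead of writing it to the new Excel file right away. After all the data has been reduced, I write each cell in a separate for loop. Thanks for the suggestions, guys.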

import xlwt
import xlrd

# Workbook must be located in the Python27 folder in the C:/directory
book = xlrd.open_workbook() # opens excel file for data input

# Calls the sheets I will be working with
Data = book.sheet_by_index(0)
Table = book.sheet_by_index(1)

# Import column data from excel
From = Data.col_values(15)
To = Data.col_values(16)
word = Table.col_values(0)
abbrv = Table.col_values(1)

# Empty variables to be filled with reduced string
From_r = []
To_r = []

# Notes to be added 
for x in xrange(Data.nrows):
    if len(From[x]) <= 28 and len(To[x]) <= 28:
        From_r.append(From[x])
        To_r.append(To[x])
    else:
        while len(From[x]) > 28 or len(To[x]) > 28:
            for y in xrange(Table.nrows):
                if len(From[x]) > 28:
                    From[x] = From[x].replace(word[y], abbrv[y])
                if len(To[x]) > 28:
                    To[x] = To[x].replace(word[y], abbrv[y])
            From_r.append(From[x])
            To_r.append(To[x])
            break

# Create new excel file to write reduced strings into
reduc = xlwt.Workbook()
sheet1 = reduc.add_sheet("sheet 1")

# Iterate through the lists to write each item into excel
for i in xrange(Data.nrows):
    sheet1.write(i, 3, From_r[i])
    sheet1.write(i, 4, To_r[i])

# Save reduced string in new excel file
reduc.save("lucky.xls")
print " DONE! "
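
In the loop above, the `while ... break` construct runs its body at most once, so it behaves like a plain `if`. The same per-string logic could be sketched as a small helper, shown below. The name `reduce_text` and the sample strings are hypothetical and not part of the original code; the sketch uses only syntax that works in both Python 2 and Python 3.

```python
def reduce_text(text, pairs, limit=28):
    """Apply abbreviations one at a time until the text fits within limit."""
    for word, abbrv in pairs:
        if len(text) <= limit:
            break  # short enough already, leave the remaining words alone
        if word:  # skip blank table cells; replacing "" would corrupt the text
            text = text.replace(word, abbrv)
    return text

# Hypothetical abbreviation table and input string
pairs = [("Northwest", "NW"), ("Building", "Bldg"), ("Street", "St")]
sample = "Northwest Building Entrance on Street 12"

reduced_28 = reduce_text(sample, pairs)            # needs all three abbreviations
reduced_30 = reduce_text(sample, pairs, limit=30)  # stops after the second one
```

With the columns already loaded through `col_values`, the pairs would come from `zip(word, abbrv)`, and the main loop reduces to appending `reduce_text(From[x], pairs)` and `reduce_text(To[x], pairs)`.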

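Best answer

The slowness is probably caused by the inefficient replacement code. You should try loading all the words and their corresponding abbreviations up front, unless the list is so large that you would run out of memory. Then, to speed things up further, you can replace all the words in a single pass.

Do this, and move it out of the loop: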

words = [str(cell.value) for cell in Table.col(0)] #list comprehension
abbr = [str(cell.value) for cell in Table.col(1)]
replacements = zip(words, abbr)

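The following function, borrowed from another source, uses the regular expression module to replace all matches from a given list in a single pass.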

import re
def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]))
    return lambda string: pattern.sub(replacement_function, string)
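
One caveat worth noting: a regex alternation tries its alternatives from left to right, so if one word in the table is a prefix of another (say "North" and "Northwest"), the shorter one can win and leave a fragment behind. A possible variant, sketched below, sorts the keys longest-first and skips empty keys, which blank table cells would otherwise produce. The function name `multiple_replacer_longest_first` and the sample pairs are hypothetical.

```python
import re

def multiple_replacer_longest_first(*key_values):
    """Variant of multiple_replacer that prefers the longest matching key."""
    replace_dict = dict(key_values)
    # Longest keys first, so "Northwest" is tried before "North";
    # empty keys are dropped because they would match at every position.
    keys = sorted((k for k in replace_dict if k), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda string: pattern.sub(
        lambda match: replace_dict[match.group(0)], string)

# Hypothetical abbreviation pairs where one key is a prefix of another
pairs = [("North", "N"), ("Northwest", "NW")]
replace_func = multiple_replacer_longest_first(*pairs)
result = replace_func("Northwest Gate, North Wing")
```

With the keys in their original order, the pattern `North|Northwest` would turn "Northwest" into "Nwest" instead of "NW".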

To use multiple_replacer, do the following:

replaceFunc = multiple_replacer(*replacements) #constructs the function. Do this outside the loop, after the replacements have been gathered.
myString = replaceFunc(myString)
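
Putting the pieces together, the replacer can be combined with the length check from the question. The sketch below is self-contained and uses hypothetical sample data in place of the `col_values` lists; `MAX_LEN`, `shorten`, and the sample strings are assumptions for illustration. Note that it applies every abbreviation at once to any over-long string, which differs slightly from the question's loop, where replacement stops as soon as the string is short enough.

```python
import re

def multiple_replacer(*key_values):
    """Build a function that replaces every key with its value in one pass."""
    replace_dict = dict(key_values)
    pattern = re.compile("|".join(re.escape(k) for k, _ in key_values))
    return lambda string: pattern.sub(
        lambda match: replace_dict[match.group(0)], string)

MAX_LEN = 28  # length limit taken from the updated code in the question

# Hypothetical stand-ins for the word/abbreviation columns and the data column
replacements = [("Street", "St"), ("Building", "Bldg"), ("Northwest", "NW")]
from_values = ["Main Street", "Northwest Building Entrance on Street 12"]

# Construct the replacer once, outside the loop
replace_func = multiple_replacer(*replacements)

def shorten(text, limit=MAX_LEN):
    """Abbreviate only the strings that exceed the limit."""
    return replace_func(text) if len(text) > limit else text

from_reduced = [shorten(s) for s in from_values]
```

The resulting list can then be written out row by row with `sheet1.write`, as in the updated code above.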

Regarding python - Optimizing Excel data collection/reduction using xlrd/xlwt and loop iteration, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20461241/
