python - Extract words in parentheses and store the extracted words in a set

Tags: python python-3.x

This is my current code:

folder_path1 = os.chdir("C:/Users/xx/Documents/xxx/Test python dict")

words= set()
extracted = set()
for file in os.listdir(folder_path1):
   if file.endswith(".xlsx"):
      wb = load_workbook(file, data_only=True)
      ws = wb.active
      words.add(str(ws['A1'].value))

      wordsextract = re.match(r"(.*)\((.*)\)", str(words))

      extracted.add(str(wordsextract))
      print(extracted)

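I'm not sure how to extract only the words inside the parentheses. I thought I could use re.match on the parentheses to pull out the words inside them, but it doesn't work. Does anyone here know how to do this? Thanks in advance.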

Accepted answer

Read the whole column into a set, then extract the words from each cell value:

Excel source:

[image: Excel file content]

Program:

from openpyxl import load_workbook
import re
import os

# os.chdir() returns None, so keep the path in a variable and change into it separately
folder_path1 = "C:/temp/"
os.chdir(folder_path1)

extracted = set()
for file in os.listdir(folder_path1):
    if file.endswith("m1.xlsx"):
        wb = load_workbook(file, data_only=True)
        ws = wb.active
        # this reads A1 through A5 - yours is only one cell though; you can change the
        # min/max to include more columns or rows

        # a set makes no sense for your case - you read only one cell anyhow, so anything in
        # it is your single possible value string
        # ws.iter_cols(min_col, max_col, min_row, max_row, values_only)
        content = set(*ws.iter_cols(1, 1, 1, 5, True)) - {None}  # remove empty cells

        # non-greedy capturing of things in parentheses;
        # map(str, ...) guards against non-string cells such as numbers
        words = re.findall(r"\((.+?)\)", ' '.join(map(str, content)), re.DOTALL)
        extracted.update(words)  # collect the extracted words in the set
        print(words)

Output:

['problem', 'span \nlines', 'some'] # out of order due to set usage
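
To try the regex on its own and get the results into a set, as the question asks, here is a minimal sketch that does not need an Excel file. The sample cell values and the helper name `extract_parenthesized` are made up for illustration; they are not part of the original answer.

```python
import re

# Hypothetical cell values standing in for the spreadsheet column
# (None represents an empty cell, as openpyxl would return it).
cells = ["a (problem) here", "text (span \nlines) more", "(some) and (more)", "no parens", None]

# Non-greedy match of everything between "(" and the next ")".
# re.DOTALL lets "." also match newlines inside the parentheses.
PAREN_RE = re.compile(r"\((.+?)\)", re.DOTALL)


def extract_parenthesized(values):
    """Return the set of all substrings found inside (...) in the given values."""
    found = set()
    for value in values:
        if value is None:  # skip empty cells
            continue
        found.update(PAREN_RE.findall(str(value)))
    return found


extracted = extract_parenthesized(cells)
print(sorted(extracted))  # sorted only to get a stable print order
```

Processing each cell separately, instead of joining everything into one string, avoids accidentally matching an opening parenthesis in one cell with a closing one in another.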

Doing the same with split:

# same content as above
for cellvalue in content:
    if set("()").intersection(cellvalue) == {"(",")"}:
        print(cellvalue.split("(")[-1].split(")")[0])
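
As for why the attempt in the question stores something unexpected: `re.match` returns a Match object (or `None`), and calling `str()` on that object gives its printable representation rather than the captured text. The question also passes `str(words)`, the string form of the whole set, into the pattern. A small sketch, with a made-up sample string:

```python
import re

text = "some text (word)"  # hypothetical cell value

m = re.match(r"(.*)\((.*)\)", text)
as_string = str(m)   # representation of the Match object, not the captured text
inner = m.group(2)   # the text captured by the second group, i.e. inside ( )

# re.match returns None when the pattern does not match at the start of the string
no_match = re.match(r"(.*)\((.*)\)", "no parentheses")

print(as_string)
print(inner)
print(no_match)
```

Note that the greedy `(.*)` groups capture only the last parenthesized part when a string contains several, which is one reason the answer uses `re.findall` with a non-greedy pattern.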

HTH

Regarding python - Extract words in parentheses and store the extracted words in a set, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57379182/
