python - Need insight: Using Python, I am using a regular expression to capture sample restaurant sales data from a .pdf, categorize it, and convert it to JSON

The regular expression I am using is:

^\s*(\d+)\s*(([A-Za-z]+\s*)+)?(\d+)\s+(.+?)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)$

I use it to parse and categorize the following sample data string: "1 NA BEVERAGE 1100 ICED TEA 14.00 3.00 42.00 3.50 0.00 42.00 0.00 0.52 47.09"

The output is incorrect. When you look at the categorized data before it is converted to JSON, you see 'item_category': 'NA BEVERAGE ', 'item_number': 'BEVERAGE ', when it should be 'item_category': 'NA BEVERAGE ', 'item_number': '1100', and so on.
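A minimal sketch that reproduces the shift, using the same pattern with only a raw-string prefix added. Python numbers capturing groups by the position of their opening parenthesis, so the nested ([A-Za-z]+\s*) inside the optional category group becomes group 3 and every later field moves one index to the right: groups() returns 14 values for 13 dictionary keys. A capturing group repeated with + keeps only its last repetition, which is why that extra group holds 'BEVERAGE ' rather than 'NA BEVERAGE '.

```python
import re

# The question's pattern, unchanged except for the raw-string prefix.
sale_line_re = re.compile(r'^\s*(\d+)\s*(([A-Za-z]+\s*)+)?(\d+)\s+(.+?)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)$')

line = "   1 NA BEVERAGE 1100 ICED TEA 14.00 3.00 42.00 3.50 0.00 42.00 0.00 0.52 47.09"
groups = sale_line_re.match(line).groups()

# 13 fields are wanted, but the pattern defines 14 capturing groups.
print(sale_line_re.groups)   # 14
# groups[1] is the outer category group; groups[2] is the nested repeated
# group, which only remembers the text of its last repetition.
print(repr(groups[1]))       # 'NA BEVERAGE '
print(repr(groups[2]))       # 'BEVERAGE '
print(repr(groups[3]))       # '1100'
```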

The output I get is:

{'item_rank': '1', 'item_category': 'NA BEVERAGE ', 'item_number': 'BEVERAGE ', 'item_name': '1100', 'number_sold': 'ICED TEA', 'price_sold': '14.00', 'amount': '3.00', 'tax': '42.00', 'cost': '3.50', 'profit': '0.00', 'food_cost': '42.00', 'precent_sales': '0.00', 'cat_sales': '0.52'}

I have tried many times to fix the regular expression, without success. An explanation of the error would be appreciated.

Below is the logic of the Python script, which you can copy and run on your own machine:

import re
import json

page_text_str = "   1 NA BEVERAGE 1100 ICED TEA 14.00 3.00 42.00 3.50 0.00 42.00 0.00 0.52 47.09"

sale_line_re = re.compile(r'^\s*(\d+)\s*(([A-Za-z]+\s*)+)?(\d+)\s+(.+?)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)\s+(\d+.\d+)$')
grouped_data = []

for line in page_text_str.split('\n'):
    print(line)   
    match = sale_line_re.match(line)
    if match:
        groups = match.groups()
        item = {
            "item_rank": groups[0],
            "item_category": groups[1],
            "item_number": groups[2],
            "item_name": groups[3],
            "number_sold": groups[4],
            "price_sold": groups[5],
            "amount": groups[6],
            "tax": groups[7],
            "cost": groups[8],
            "profit": groups[9],
            "food_cost": groups[10],
            "precent_sales": groups[11],
            "cat_sales": groups[12]
        }
        grouped_data.append(item)


for sale in grouped_data:
    print(sale)
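One possible fix, sketched under the assumption that the line layout in the question is representative: keep the pattern's structure but make the inner word group non-capturing with (?:...), so the groups line up one-to-one with the 13 dictionary keys. The sketch also escapes the decimal points and builds the dictionary with zip instead of 13 hand-written indexes; those are stylistic choices, not requirements of the fix.

```python
import json
import re

# Sketch of a corrected pattern (same layout as the question's):
#   (?:...) makes the repeated word group non-capturing, so it no longer
#           takes a group number and shifts the later fields;
#   \.      matches a literal decimal point (a bare "." matches any character);
#   r'...'  raw strings pass backslashes to the regex engine unchanged.
sale_line_re = re.compile(
    r'^\s*(\d+)\s*((?:[A-Za-z]+\s*)+)?(\d+)\s+(.+?)'
    + r'\s+(\d+\.\d+)' * 9  # the nine numeric columns
    + r'$'
)

# Field names in the same order as the question's dictionary.
FIELDS = [
    "item_rank", "item_category", "item_number", "item_name",
    "number_sold", "price_sold", "amount", "tax", "cost",
    "profit", "food_cost", "precent_sales", "cat_sales",
]

page_text_str = "   1 NA BEVERAGE 1100 ICED TEA 14.00 3.00 42.00 3.50 0.00 42.00 0.00 0.52 47.09"

grouped_data = []
for line in page_text_str.split('\n'):
    match = sale_line_re.match(line)
    if match:
        # Pair each field name with its group. strip() removes the trailing
        # space the category group captures; a missing category stays None.
        item = {
            name: value.strip() if value is not None else None
            for name, value in zip(FIELDS, match.groups())
        }
        grouped_data.append(item)

print(json.dumps(grouped_data, indent=2))
```

If counting group positions still feels fragile, named groups such as (?P<item_number>\d+) together with match.groupdict() avoid the index bookkeeping altogether.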

Best answer

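Rather than building a regular expression that describes every number and so on, it is easier to use the re.split function: split on the whitespace next to numbers while ignoring the whitespace between words. The function returns a list, which you can then iterate over to build the JSON.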
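A minimal sketch of this approach, assuming the split pattern (?<=\d)\s|\s(?=\d) from this answer and the field names from the question's dictionary: each line is stripped, split, and zipped onto the field names, then the list of dictionaries is serialized with json.dumps.

```python
import json
import re

# Split on one whitespace character that directly follows or directly
# precedes a digit; a space between two letters ("NA BEVERAGE") is kept.
SPLIT_RE = re.compile(r'(?<=\d)\s|\s(?=\d)')

# Field names taken from the question's dictionary, in the same order.
FIELDS = [
    "item_rank", "item_category", "item_number", "item_name",
    "number_sold", "price_sold", "amount", "tax", "cost",
    "profit", "food_cost", "precent_sales", "cat_sales",
]

page_text_str = "   1 NA BEVERAGE 1100 ICED TEA 14.00 3.00 42.00 3.50 0.00 42.00 0.00 0.52 47.09"

records = []
for line in page_text_str.splitlines():
    # Strip first, otherwise the leading spaces turn into an extra field.
    tokens = SPLIT_RE.split(line.strip())
    # Keep only lines that split into exactly one value per field.
    if len(tokens) == len(FIELDS):
        records.append(dict(zip(FIELDS, tokens)))

print(json.dumps(records, indent=2))
```

The length check is only a guard: an item name that contains a separate number (a hypothetical "COKE 12OZ", say) is split at the digit, produces 14 pieces, and is skipped rather than silently mis-assigned.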

(?<=\d)\s|\s(?=\d)

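(?<=\d), lookbehind: the whitespace must come right after a digit
(?=\d), lookahead: the whitespace must come right before a digit
\s|\s - the alternation matches a single whitespace character that is either right after or right before a digit.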
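A short sketch of what the alternation selects on fragments of the sample line. It shows that the space between two words is never a split point, that leading indentation produces an extra first piece unless the line is stripped, and that two consecutive spaces are two separate matches.

```python
import re

SPLIT_RE = re.compile(r'(?<=\d)\s|\s(?=\d)')

# The space between two letters is not matched; spaces next to a digit are.
print(SPLIT_RE.split("NA BEVERAGE 1100 ICED TEA 14.00"))
# ['NA BEVERAGE', '1100', 'ICED TEA', '14.00']

line = "   1 NA BEVERAGE 1100 ICED TEA 14.00 3.00 42.00 3.50 0.00 42.00 0.00 0.52 47.09"

# Of the three leading spaces only the last one (right before "1") matches,
# so the first two are returned as a separate piece.
print(SPLIT_RE.split(line)[:3])          # ['  ', '1', 'NA BEVERAGE']
print(SPLIT_RE.split(line.strip())[:3])  # ['1', 'NA BEVERAGE', '1100']

# Two spaces between numbers are two separate matches, leaving an empty piece.
print(SPLIT_RE.split("14.00  3.00"))     # ['14.00', '', '3.00']
```

If the text extracted from the PDF can contain runs of spaces, widening both alternatives to (?<=\d)\s+|\s+(?=\d) consumes each run in a single match.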

regex101.com

Regarding "python - Need insight: Using Python, I am using a regular expression to capture sample restaurant sales data from a .pdf, categorize it, and convert it to JSON", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74905856/
