python - How to use positive and negative lookahead for multiple terms in Python?

I have a dataframe like the one shown below

import pandas as pd

df = pd.DataFrame({'person_id': [11,11,11,11,11,11,11,11,11,11],
                   'text':['inJECTable 1234 Eprex DOSE 4000 units on NONd',
                           'department 6789 DOSE 8000 units on DIALYSIS days  -  IV Interm',
                           'inJECTable 4321 Eprex DOSE - 3 times/wk on NONdialysis day',
                           'insulin MixTARD  30/70 - inJECTable 46 units',
                           'insulin ISOPHANE -- InsulaTARD  Vial -  inJECTable 56 units  SC SubCutaneous',
                           '1-alfacalcidol DOSE 1 mcg  - 3 times a week  -  IV Intermittent',
                           'jevity liquid - FEEDS PO  Jevity  -  237 mL  -  1 times per day',
                           '1-alfacalcidol DOSE 1 mcg  - 3 times per week  -  IV Intermittent',
                           '1-supported DOSE 1 mcg  - 1 time/day  -  IV Intermittent',
                           '1-testpackage DOSE 1 mcg  - 1 time a day  -  IV Intermittent']})

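I would like to remove words/strings that follow patterns such as: 46 units, 3 times a week, 3 times per week, 1 time/day, etc.

I was reading about positive and negative lookahead.

So, I was trying something like the below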

[^([0-9\s]*(?=units))]  #to remove terms like `46 units` from the string
[^[0-9\s]*(?=times)(times a day)] # don't know how to make this work for all time variants

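Time variants, for example: 3 times a day, 3 times a week, 3 times per day, 3 times per month, 3 times/month, etc.

Basically, I want the output to have terms such as xx units, xx time a day, xx times per week, xx time/day, xx time/week, xx times/week, etc. removed.

Best answer

You may consider a pattern like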

\s*\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))

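See the regex demo.

Note: \d+ matches one or more digits. If you need to match any number, consider using a different pattern for the number format you expect; see, for example, "regular expression for finding decimal/float numbers?".

Pattern details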

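\s* - zero or more whitespace characters
\d+ - one or more digits
\s* - zero or more whitespace characters
(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?)) - a non-capturing group matching:
- units? - unit or units
- | - or
- times? - time or times
- (?:\s+(?:a|per)\s+|\s*/\s*) - a or per enclosed with 1+ whitespace characters, or / enclosed with 0+ whitespace characters
- (?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?) - d or day, or wk or week, or month, or y/yr/year (as written, yea also matches)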
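
The breakdown above can be checked against a few of the sample rows. The following is only a sketch: it rewrites the same pattern with re.VERBOSE so that each part carries a comment, and the helper name find_terms is made up for illustration.

```python
import re

# The pattern from the answer, laid out with re.VERBOSE so each part
# can carry a comment. Whitespace inside the pattern is ignored in this mode.
PATTERN = re.compile(
    r"""
    \s* \d+ \s*                 # optional whitespace, digits, optional whitespace
    (?:
        units?                  # "unit" or "units"
      |
        times?                  # "time" or "times"
        (?: \s+(?:a|per)\s+     # " a " or " per "
          | \s*/\s*             # or a slash with optional whitespace
        )
        (?: d(?:ay)? | w(?:ee)?k | month | y(?:ea)?r? )   # the period
    )
    """,
    re.VERBOSE,
)


def find_terms(text):
    """Hypothetical helper: list every matched term, with padding stripped."""
    return [m.group().strip() for m in PATTERN.finditer(text)]


samples = [
    'insulin MixTARD  30/70 - inJECTable 46 units',
    'inJECTable 4321 Eprex DOSE - 3 times/wk on NONdialysis day',
    '1-alfacalcidol DOSE 1 mcg  - 3 times per week  -  IV Intermittent',
]
for sample in samples:
    print(find_terms(sample))
# ['46 units']
# ['3 times/wk']
# ['3 times per week']
```

Numbers that are not followed by a unit or frequency, such as 30/70, 4321, or 1 mcg, are left alone because the group after the digits is mandatory.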

If you need to match whole words only, use word boundaries, \b:

\s*\b\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))\b
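
To see what the word boundaries change, here is a small comparison of the two variants. The test strings are invented for illustration and do not come from the question's data.

```python
import re

# Shared body of both patterns (everything after the leading \s*).
BODY = r"\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))"

LOOSE = re.compile(r"\s*" + BODY)                # pattern without word boundaries
STRICT = re.compile(r"\s*\b" + BODY + r"\b")     # pattern with \b on both sides

# Made-up strings where a match would cut through a longer word or token.
for text in ["dose 3 times a decade", "abc123 units", "4 unitsize"]:
    loose = LOOSE.search(text)
    strict = STRICT.search(text)
    print(repr(text), "| loose:", loose.group() if loose else None,
          "| strict:", strict.group() if strict else None)

# An ordinary term is still matched by the strict variant.
print(STRICT.search("inJECTable 46 units").group().strip())  # 46 units
```

Without \b, the first string loses "3 times a d" because a bare d counts as a period, and the digits inside abc123 are treated as a quantity. With \b, none of the three strings match.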

In Pandas, use

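# regex=True is required in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['text'] = df['text'].str.replace(r'\s*\b\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))\b', '', regex=True)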
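
The substitution can also be tried without pandas. This sketch applies the same pattern with re.sub to three of the sample rows; the names rows and cleaned are placeholders, not part of the original answer.

```python
import re

PATTERN = re.compile(
    r'\s*\b\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)'
    r'(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))\b'
)

# Three rows copied from the question's 'text' column.
rows = [
    'insulin MixTARD  30/70 - inJECTable 46 units',
    'insulin ISOPHANE -- InsulaTARD  Vial -  inJECTable 56 units  SC SubCutaneous',
    '1-supported DOSE 1 mcg  - 1 time/day  -  IV Intermittent',
]

# Replace every matched term (and the whitespace before it) with nothing.
cleaned = [PATTERN.sub('', row) for row in rows]

for before, after in zip(rows, cleaned):
    print(repr(before))
    print(repr(after))
```

Only the matched term and the whitespace in front of it are removed. Any separators around the term, such as the dashes in the last row, stay in the string and would need a separate cleanup step if they are unwanted.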

Regarding "python - How to use positive and negative lookahead for multiple terms in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64157077/
