This question already has answers here:
Script to remove Python comments/docstrings
(6 answers)
Closed last year.
I want to remove all comments from a Python file.
A file like this:
--------------- comment.py ---------------
# this is comment line.
age = 18 # comment in line
msg1 = "I'm #1." # comment. there's a # in code.
msg2 = 'you are #2. ' + 'He is #3' # strange sign ' # ' in comment.
print('Waiting your answer')
I have written many regular expressions to extract all the comments. Some of them look like this:
(?(?<=['"])(?<=['"])\s*#.*$|\s*#.*$)
Matches: #1." # comment. there's a # in code. (wrong: the match starts inside the string literal)
(?<=('|")[^\1]*\1)\s*#.*$|\s*#.*$
Fails: the lookbehind (?<=...) is not fixed-width, which Python's re module rejects.
But none of them work. What is the correct regular expression?
Could you please help me?
Best answer
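You can try using tokenize instead of regex. As @OlvinRoght said, parsing code with regular expressions is probably a bad idea in this case. As you can see here, you can try something like this to detect the comments: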
import tokenize

# Raw string, so the backslash in the Windows path is not read as an escape
fileObj = open(r'yourpath\comment.py', 'r')
for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
    # we can also use token.tok_name[toktype] instead of 'COMMENT'
    # from the token module
    if toktype == tokenize.COMMENT:
        print('COMMENT' + " " + tok)
Output:
COMMENT # -*- coding: utf-8 -*-
COMMENT # this is comment line.
COMMENT # comment in line
COMMENT # comment. there's a # in code.
COMMENT # strange sign ' # ' in comment.
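To try this without a file on disk, the same loop can be run on an in-memory string. The following is only a minimal sketch: `find_comments` is a hypothetical helper name, and `io.StringIO` stands in for the opened file. It also reports where each comment starts, which shows that a `#` inside a string literal is not treated as a comment.

```python
import io
import tokenize


def find_comments(source):
    """Return (row, col, text) for each comment in `source` (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [
        (tok.start[0], tok.start[1], tok.string)  # row is 1-based, col is 0-based
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.COMMENT
    ]


# One line taken from comment.py in the question
sample = 'msg1 = "I\'m #1." # comment. there\'s a # in code.\n'
for row, col, text in find_comments(sample):
    print(row, col, text)
```

Only the trailing comment is reported here, because the `#1` inside the quotes belongs to a STRING token rather than a COMMENT token.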
Then, to get the expected result, i.e. a Python file without comments, you can try this:
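nocomments = []
# Rewind first: the previous loop already read the file to the end
fileObj.seek(0)
for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
    if toktype != tokenize.COMMENT:
        nocomments.append(tok)
# Joining with spaces does not preserve the original spacing
print(' '.join(nocomments))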
Output:
age = 18
msg1 = "I'm #1."
msg2 = 'you are #2. ' + 'He is #3'
print ( 'Waiting your answer' )
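Joining the tokens with spaces changes the spacing, as the `print ( ... )` line in the output shows. If the original layout matters, one alternative is to use only the position of each COMMENT token and cut the comment out of its source line. This is a sketch under stated assumptions, not part of the original answer: `strip_comments` is a hypothetical helper name, the input is a string rather than a file, and comment-only lines are dropped entirely.

```python
import io
import tokenize


def strip_comments(source):
    """Return `source` with every comment removed (hypothetical helper)."""
    # Read the lines the same way tokenize will, so row numbers line up.
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start  # row is 1-based, col is 0-based
        line = lines[row - 1]
        code = line[:col].rstrip()                 # code before the comment
        newline = line[len(line.rstrip("\r\n")):]  # original line ending
        # Drop comment-only lines; otherwise keep the code and the line ending.
        lines[row - 1] = code + newline if code else ""
    return "".join(lines)


sample = '''# this is comment line.
age = 18 # comment in line
msg1 = "I'm #1." # comment. there's a # in code.
msg2 = 'you are #2. ' + 'He is #3' # strange sign ' # ' in comment.
print('Waiting your answer')
'''

print(strip_comments(sample), end="")
```

Note that this also removes shebang and encoding lines, since tokenize reports them as comments, and it leaves docstrings alone because they are STRING tokens.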
Regarding python - regular expression to remove Python comments, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62316306/