python - Regular expression to remove Python comments

This question already has answers here: Script to remove Python comments/docstrings (6 answers).

Closed last year.

I want to remove all comments from a Python file.
For example, a file like this:
--------------- comment.py ---------------

# this is comment line.
age = 18  # comment in line
msg1 = "I'm #1."  # comment. there's a # in code.
msg2 = 'you are #2. ' + 'He is #3'  # strange sign ' # ' in comment. 
print('Waiting your answer')

I have written many regular expressions to match all the comments. Some of them look like this:

(?(?<=['"])(?<=['"])\s*#.*$|\s*#.*$)
get:  #1."  # comment. there's a # in code.

(?<=('|")[^\1]*\1)\s*#.*$|\s*#.*$
wrong. it's not 0-width in lookaround (?<=..)

But none of them work. What is the correct regular expression?
Could you please help me?

Best answer

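You can try the tokenize module instead of a regex. As @OlvinRoght said, parsing code with regular expressions is probably a bad idea in this case, because a regex cannot reliably tell a # inside a string literal from one that starts a comment. You can try something like this to detect the comments: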

import tokenize

# Raw string, so the backslash in the Windows-style path is not read as an escape.
fileObj = open(r'yourpath\comment.py', 'r')
for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
    # we can also use token.tok_name[toktype] instead of 'COMMENT'
    # from the token module
    if toktype == tokenize.COMMENT:
        print('COMMENT' + " " + tok)

Output:

COMMENT # -*- coding: utf-8 -*-
COMMENT # this is comment line.
COMMENT # comment in line
COMMENT # comment. there's a # in code.
COMMENT # strange sign ' # ' in comment.

Then, to get the expected result, that is, the Python file without comments, you can try this:

nocomments = []
# Rewind first: a file object that was already read to the end yields no tokens.
fileObj.seek(0)
for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
    if toktype != tokenize.COMMENT:
        nocomments.append(tok)

print(' '.join(nocomments))

Output:

 age = 18 
 msg1 = "I'm #1." 
 msg2 = 'you are #2. ' + 'He is #3' 
 print ( 'Waiting your answer' )
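
Joining the tokens with spaces changes the layout of the code, as the output above shows (`print ( 'Waiting your answer' )`). If the original formatting matters, one possible variation is to use the start position that tokenize reports for each COMMENT token and cut the corresponding source line at that column. The sketch below is not part of the original answer. The helper name `strip_comments` is hypothetical, and it assumes the source is available as a string. Lines that contain only a comment become blank lines, and special comments such as a shebang or an encoding declaration are removed as well.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Return source with every comment removed, keeping the layout otherwise.

    Hypothetical helper for illustration, not part of the original answer.
    """
    # Split the text the same way the tokenizer will read it, so that the
    # row numbers reported by tokenize index directly into this list.
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # row is 1-based, col is 0-based
            line = lines[row - 1]
            # Keep whatever line ending the line had (possibly none).
            ending = line[len(line.rstrip("\r\n")):]
            # A comment always runs to the end of its line, so cut at col
            # and drop the whitespace that separated code from comment.
            lines[row - 1] = line[:col].rstrip() + ending
    return "".join(lines)


if __name__ == "__main__":
    sample = (
        "# this is comment line.\n"
        "age = 18  # comment in line\n"
        "msg1 = \"I'm #1.\"  # comment. there's a # in code.\n"
        "print('Waiting your answer')\n"
    )
    print(strip_comments(sample))
```

Because only lines that carry a COMMENT token are touched, a # inside a string literal, including a multi-line string, is left alone.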

A similar question about "python - Regular expression to remove Python comments" can be found on Stack Overflow: https://stackoverflow.com/questions/62316306/
