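I'm trying to process a file and need to remove the extraneous information in it. Specifically, I want to remove the bracketed blocks: everything from the first [section] header down to the closing [] line, including the markers themselves, while printing everything outside them.
Here is my text file with sample data: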
$ cat smb
Hi this is my config file.
Please dont delete it
[homes]
browseable = No
comment = Your Home
create mode = 0640
csc policy = disable
directory mask = 0750
public = No
writeable = Yes
[proj]
browseable = Yes
comment = Project directories
csc policy = disable
path = /proj
public = No
writeable = Yes
[]
This last second line.
End of the line.
Expected output:
Hi this is my config file.
Please dont delete it
This last second line.
End of the line.
What I have tried so far, based on my understanding and research:
$ cat test.py
with open("smb", "r") as file:
    for line in file:
        start = line.find('[')
        end = line.find(']')
        if start != -1 and end != -1:
            result = line[start+1:end]
            print(result)
Output:
$ ./test.py
homes
proj
Best answer
Use a regular expression:
import re

with open("smb", "r") as f:
    txt = f.read()

# replace the whole block with a single newline so the surrounding lines stay separate
txt = re.sub(r'(\n\[)(.*?)(\[]\n)', '\n', txt, flags=re.DOTALL)
print(txt)
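As a quick sanity check, the substitution can be tried on an in-memory copy of the data, so no file is needed. This is only a sketch: the helper name `strip_blocks` and the shortened sample string are made up for illustration, and it uses '\n' as the replacement so the lines on either side of the removed block are not joined together.

```python
import re

# Hypothetical helper (not part of the original answer): drop everything from
# the first line starting with '[' through the closing '[]' line.
def strip_blocks(text):
    # re.DOTALL lets '.' match newlines, so the lazy '.*?' can span several lines
    return re.sub(r'\n\[.*?\[]\n', '\n', text, flags=re.DOTALL)

# Shortened version of the sample file, kept in memory for the check
sample = (
    "Hi this is my config file.\n"
    "Please dont delete it\n"
    "[homes]\n"
    "browseable = No\n"
    "[proj]\n"
    "path = /proj\n"
    "[]\n"
    "This last second line.\n"
    "End of the line.\n"
)

cleaned = strip_blocks(sample)
print(cleaned, end="")
```

Without flags=re.DOTALL the same pattern leaves this sample unchanged, because '.' then stops at the first newline and the pattern can never reach the closing [] line.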
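Regex explanation:
(\n\[) matches a newline followed by a [
(\[]\n) matches [] followed by a newline
(.*?) lazily matches everything between (\n\[) and (\[]\n), so the whole span gets removed
re.DOTALL makes . match newline characters as well, so (.*?) can span multiple lines

!!! Pandas update !!!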
The same logic can also be implemented with pandas:
import re
import pandas as pd

# read each line in the file (one row -> one line);
# newer pandas releases reject sep='\n' in read_csv, so the column is built from the lines directly
with open('smb', 'r') as f:
    txt = pd.DataFrame(f.read().splitlines())
# join all the lines in the file, separating them with '\n'
txt = '\n'.join(txt[0].to_list())
# apply the regex to clean the text (the same as above)
txt = re.sub(r'(\n\[)(.*?)(\[]\n)', '\n', txt, flags=re.DOTALL)
print(txt)
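The same result can also be sketched without a regex, by walking the lines and tracking whether the current line sits inside a bracketed region. This is an alternative technique, not the answer's method. The name `filter_blocks` is hypothetical, and the sketch assumes that every section header starts its line with [ and that the region always ends with a line containing only [].

```python
def filter_blocks(lines):
    """Yield only the lines that lie outside the '[section]' ... '[]' region."""
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == "[]":
            # closing marker: drop it and resume keeping lines afterwards
            inside = False
            continue
        if stripped.startswith("["):
            # a header such as [homes] opens (or continues) the region
            inside = True
        if not inside:
            yield line

# Shortened in-memory version of the sample file
sample = [
    "Hi this is my config file.\n",
    "Please dont delete it\n",
    "[homes]\n",
    "browseable = No\n",
    "[proj]\n",
    "path = /proj\n",
    "[]\n",
    "This last second line.\n",
    "End of the line.\n",
]

kept = list(filter_blocks(sample))
print("".join(kept), end="")
```

Since a file object iterates line by line, the generator could also be fed the open file directly, for example `filter_blocks(open("smb"))`. Unlike the regex, this version drops everything up to the end of the file when a header is never followed by a closing [] line, so it only suits input that follows the assumed layout.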
A similar question about removing square brackets and the extraneous information between them in Python can be found on Stack Overflow: https://stackoverflow.com/questions/61638496/