Given a list of actors, each followed by their character name in square brackets, with entries separated by a semicolon (;) or a comma (,):
Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda];
Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily];
Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist];
Alfie Bass [Harry]
How can I parse this into a list of two-tuples of the form [(actor, character), ...]?
--> [('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'),
('Denholm Elliott', 'Mr. Smith; abortionist')]
My original attempt was:
import re
actors = [item.strip().rstrip(']') for item in re.split(r'\[|,|;', data['actors'])]
data['actors'] = [(actors[i], actors[i + 1]) for i in range(0, len(actors), 2)]
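But this doesn't quite work, because it also splits on the ; and , inside the brackets (e.g. "Mr. Smith; abortionist"), which throws off the actor/character pairing.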
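If you would rather keep a split-based approach, one option (not taken from the answer below; a sketch assuming every entry ends with a closing bracket) is to split only on a comma or semicolon that directly follows "]", then split each piece once at "[". The helper name parse_actors is hypothetical.

```python
import re

# Sample input from the question, joined into one string.
s = (
    "Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; "
    "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; "
    "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; "
    "Alfie Bass [Harry]"
)


def parse_actors(text):
    """Hypothetical helper: return [(actor, character), ...]."""
    pairs = []
    # Split only where ',' or ';' directly follows ']', so separators
    # inside the brackets (e.g. "Mr. Smith; abortionist") are kept.
    for entry in re.split(r'\]\s*[,;]\s*', text.strip()):
        if not entry.strip():
            continue  # skip an empty piece left by a trailing separator
        # Each piece looks like "Name [Role" (the last one still ends in "]").
        name, _, role = entry.partition('[')
        pairs.append((name.strip(), role.strip().rstrip(']').strip()))
    return pairs


print(parse_actors(s))
```

This assumes brackets are never nested and every actor has a bracketed role; an entry without brackets would come back with an empty character name.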
Best answer
You could go with something like:
>>> re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
[('Shelley Winters', 'Ruby'),
('Millicent Martin', 'Siddie'),
('Julia Foster', 'Gilda'),
('Jane Asher', 'Annie'),
('Shirley Ann Field', 'Carla'),
('Vivien Merchant', 'Lily'),
('Eleanor Bron', 'Woman Doctor'),
('Denholm Elliott', 'Mr. Smith; abortionist'),
('Alfie Bass', 'Harry')]
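A self-contained version of the call above, with the question's sample data assigned to s (the variable name and the precompiled ACTOR_RE are illustrative choices, not part of the answer):

```python
import re

# Sample input from the question, joined into one string.
s = (
    "Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; "
    "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; "
    "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; "
    "Alfie Bass [Harry]"
)

# The answer's pattern: group 1 is the actor name, group 2 the bracketed role.
# After each ']' any run of ',', ';' or whitespace is consumed.
ACTOR_RE = re.compile(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*')

# With two groups, findall returns a list of (group1, group2) tuples.
pairs = ACTOR_RE.findall(s)
print(pairs)
```

Note that inside the class [,;\s$] the $ is a literal dollar sign rather than an end-of-string anchor; it is harmless here. Because group 1 only allows word characters, whitespace and dots, a name containing a hyphen or apostrophe would be matched only from the part after that character.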
You can also simplify things with the lazy .*? quantifier:
re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
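A runnable sketch of the simplified pattern, this time fed the input laid out over several lines as in the question. The line breaks fall between entries, where [,;\s$]* absorbs them; since . does not match a newline without re.DOTALL, a name or role broken across lines would not be matched.

```python
import re

# Input laid out over several lines, as in the question.
s = """Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda];
Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily];
Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist];
Alfie Bass [Harry]"""

# Lazy .*? ends the actor at the first " [" and the role at the first "]",
# so the ';' inside "Mr. Smith; abortionist" stays part of the role.
pairs = re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
for actor, role in pairs:
    print(f"{actor}: {role}")
```

Stopping the role at the first "]" also means nested brackets are not supported.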
Regarding "python - Regex help splitting a list into two-tuples", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/14904099/