I'm using the imdbpie module to query IMDb and fetch movie titles.
import re
from imdbpie import Imdb

imdb = Imdb()
# Stringify the search results, then blank out punctuation, key names, and IMDb ids.
film = str(imdb.search_for_title('some_title'))
tit = re.sub(r'[^\w]|year|title|imdb_id|tt[0-9]{7}', ' ', film)
print(tit)
I stripped out the patterns I didn't need and got this output:
2015 The Merchant Gaekju 2015 2015 Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century 1993 The Manzai 2015 Pre Masters 2015 The 2015 World Series 2015 2015 Foster Farms Bowl 2015 2015 Nephilim Monsters Giants Conference 2015 2015 The Disaster Diaries 2015 L Agenda Des Cataclysmes 2015 The Lobster 2015 Brooklyn Lobster 2005 The Oscar Nominated Short Films 2015 Animation 2015 The Oscar Nominated Short Films 2015 Live Action 2015 La langosta azul 1954 The Fresh Lobster 1948 The Lobster 2013 The Oscar Nominated Short Films 2015 Documentary 2015 A Visit to a Crab and Lobster Factory 1913 BBC Election Debate 2015 The Reaction 2015 Easter Bowl 2011 Beneath the Surface 2011 The Lonesome Lobster 2010
The string is a single line containing arbitrary "year" and "movie title" values. I would like to format the output like this:
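2015 The Merchant Gaekju 2015
2015 Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century
2015 The 2015 World Series
2015 Foster Farms Bowl
2015 Nephilim Monsters Giants Conference
2015 The Disaster Diaries
2015 L Agenda Des Cataclysmes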
...
...
...
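I changed the code slightly and added a newline character to the output string, which basically gives me what I need, but maybe there is a more elegant way to do this.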
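# rlist is not defined in this snippet; it presumably lists extra characters to keep, such as '}'.
tit = re.sub(r'[^\w'+rlist+']|year|title|imdb_id|tt[0-9]{7}', ' ', film)
# Turn each remaining '}' into a line break.
ntit = re.sub(r'}', '\n', tit)
with open('titles.txt', 'wt') as f:
    print(ntit, file=f)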
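$ cat titles.txt
National Lottery Stars 2015 2015
The Merchant Gaekju 2015 2015
Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century 1993
The Manzai 2015 Pre Masters 2015
The 2015 World Series 2015
2015 Foster Farms Bowl 2015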
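As a possible alternative to stringifying the results and stripping patterns, the entries could be formatted directly. The sketch below assumes each search result is a dictionary with `year`, `title`, and `imdb_id` keys (inferred only from the key names removed by the regex above); the `results` list and its ids are made-up stand-ins, and `format_results` is a hypothetical helper, not part of imdbpie.

```python
# Hypothetical stand-in for the value returned by imdb.search_for_title(...).
# The key names are an assumption based on the names stripped by the regex;
# the imdb_id values are placeholders, not real ids.
results = [
    {"title": "The Lobster", "year": "2015", "imdb_id": "tt0000001"},
    {"title": "Brooklyn Lobster", "year": "2005", "imdb_id": "tt0000002"},
]


def format_results(results):
    """Return one 'year title' line per result, joined with newlines."""
    return "\n".join(f"{r['year']} {r['title']}" for r in results)


text = format_results(results)
print(text)
```

If the real results are shaped differently, only the two key lookups inside `format_results` would need to change.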
Best answer
It's messy, but there is a pattern: a year, followed by 21 spaces, followed by a title, followed by 9 spaces, and then it starts over. Proof:
>>> import re
>>> list(map(len, re.findall(r'\s{4,}', s)))
[21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9]
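The same check can be reproduced without the original data. This is a minimal sketch in which `records` and `s` are made-up sample values built with the spacing described above (21 spaces after the first field, 9 after the second):

```python
import re

# Hypothetical sample: first field, 21 spaces, second field, 9 spaces, repeated.
records = [("2015", "The Lobster"), ("2005", "Brooklyn Lobster")]
s = "".join(first + " " * 21 + second + " " * 9 for first, second in records)

# Runs of four or more whitespace characters are the separators;
# the single spaces inside a title are too short to match.
gap_lengths = [len(gap) for gap in re.findall(r"\s{4,}", s)]
print(gap_lengths)  # [21, 9, 21, 9]
```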
However, relying on those exact numbers isn't wise. Assume instead that large and medium gaps alternate, and capture the fields like this:
>>> from pprint import pprint
>>> pprint(re.findall(r'(.+?)\s{15,}(.+?)\s{5,}', s))
[('2015', 'The Merchant Gaekju 2015'),
 ('2015',
  'Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century'),
 ('1993', 'The Manzai 2015 Pre Masters'),
 ('2015', 'The 2015 World Series'),
 ('2015', '2015 Foster Farms Bowl'),
 ('2015', '2015 Nephilim Monsters Giants Conference'),
 ('2015', '2015 The Disaster Diaries 2015 L Agenda Des Cataclysmes'),
 ('2015', 'The Lobster'),
 ('2015', 'Brooklyn Lobster'),
 ('2005', 'The Oscar Nominated Short Films 2015 Animation'),
 ('2015', 'The Oscar Nominated Short Films 2015 Live Action'),
 ('2015', 'La langosta azul'),
 ('1954', 'The Fresh Lobster'),
 ('1948', 'The Lobster'),
 ('2013', 'The Oscar Nominated Short Films 2015 Documentary'),
 ('2015', 'A Visit to a Crab and Lobster Factory'),
 ('1913', 'BBC Election Debate 2015 The Reaction'),
 ('2015', 'Easter Bowl 2011 Beneath the Surface'),
 ('2011', 'The Lonesome Lobster')]
pprint is used here only to format the output. Here is an explanation of the regular expression.
For "Python - How to format a long output string?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36608149/