python - Splitting a string from BeautifulSoup output in a list

My code produces the following output.

Code: text = soup.get_text()

Output:

Article Title

    Some text: Text blurb.

More blurb.

Even more blurb. 

Some more blurb. 





Second Article Title

Some text: Text blurb.

More blurb.

Even more blurb. 

Some more blurb.

Next, when I run test = text.splitlines(), the output changes to

u'Article Title', u'', u'Some text',u'Text blurb',u'More blurb',u'Even more blurb',u'Some more blurb',, u'', u'', u'', u'', u'',u'Second Article Title', u'', u'Some text:',u'Text blurb',u'More blurb',u'Even more blurb',u'Some more blurb',, u'', u'', u'', u'', u'',

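I want to split on the run of u'', u'', u'', u'', u'' so that I can parse the lines of each article separately. I would have used the tags, but their structure makes them hard to work with.

How do I do the split? I have tried: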

result = [list(g) for k,g in groupby(test,lambda x:x=="u''") if not k]
print result

and

for item in test:
    arr = re.split("u'', u'', u'', u'', u''",item, flags=re.UNICODE)
    print arr

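But neither gives me the output I want.

Best answer

If you look at the raw text, you can see that what you want to split on is the repeated newline character \n: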

text
>> 'Article Title\n\n    Some text: Text blurb.\n\nMore blurb.\n\nEven more blurb. \n\nSome more blurb. \n\n\n\n\n\nSecond Article Title\n\nSome text: Text blurb.\n\nMore blurb.\n\nEven more blurb. \n\nSome more blurb. '
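If it is not obvious where the newlines are, comparing print with repr makes them visible. This is only a small illustration; the sample string below is a shortened, made-up stand-in for the real soup.get_text() output.

```python
# Hypothetical, shortened stand-in for the text returned by soup.get_text().
sample = "Title\n\nBody. \n\n\n\n\n\nSecond Title"

# print renders the newlines as actual line breaks, which hides how many there are.
print(sample)

# repr shows each newline as an escaped \n, so runs of them can be counted.
print(repr(sample))

# Number of newline characters in the sample.
newline_count = sample.count("\n")
```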

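You can then pass a separator argument to split, as in text.split('\n\n\n\n\n'); without an argument, Python simply splits on whitespace. After that first split, you can split each resulting element on \n\n. Note that the gap between the articles in this text is six newlines while the separator is five, so the second title keeps a leading \n.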

[i.split('\n\n') for i in text.split('\n\n\n\n\n')]

>>[['Article Title',
  '    Some text: Text blurb.',
  'More blurb.',
  'Even more blurb. ',
  'Some more blurb. '],
 ['\nSecond Article Title',
  'Some text: Text blurb.',
  'More blurb.',
  'Even more blurb. ',
  'Some more blurb. ']]
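The fixed five-newline separator depends on the exact size of the gap and leaves stray whitespace in the results. As a possible variation (a sketch, not part of the original answer), a regular expression can split on any run of three or more newlines and each line can be stripped afterwards. The function name split_articles and the min_gap parameter are made up for this example, and the sketch assumes the blank lines contain no spaces or tabs.

```python
import re

def split_articles(text, min_gap=3):
    """Split text into articles on runs of at least `min_gap` newlines,
    then split each article into stripped, non-empty lines.

    Hypothetical helper; assumes blank lines hold no spaces or tabs.
    """
    # A run of `min_gap` or more newlines marks the boundary between articles.
    blocks = re.split(r"\n{%d,}" % min_gap, text.strip())
    return [
        [line.strip() for line in block.splitlines() if line.strip()]
        for block in blocks
        if block.strip()
    ]

# The same text as shown in the answer above.
text = (
    "Article Title\n\n    Some text: Text blurb.\n\nMore blurb.\n\n"
    "Even more blurb. \n\nSome more blurb. \n\n\n\n\n\n"
    "Second Article Title\n\nSome text: Text blurb.\n\nMore blurb.\n\n"
    "Even more blurb. \n\nSome more blurb. "
)

articles = split_articles(text)
for article in articles:
    print(article)
```

With min_gap=3, the double newlines inside an article are not treated as boundaries, while the longer run between articles is. If the real pages separate articles by a different number of blank lines, min_gap would need adjusting.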

Regarding python - Splitting a string from BeautifulSoup output in a list, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53586149/
