python - Calling a command while preserving formatting and data types

Tags: python python-3.x beautifulsoup

I have a Linux command:

lodestoner topics


/usr/lib/python2.7/site-packages/beautifulsoup4-4.4.1-py2.7.egg/bs4/ UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

        "body": "<div class=\"area_inner_cont\">\n<a href=\"//\"><img alt=\"\" height=\"149\" src=\"\" width=\"570\"/></a>\n\t\t\t\t\t\tThe date has been set for the twenty-seventh installment of the Letter from the Producer LIVE! Streaming live from <b class=\"text-strong\">Kagoshima</b>, Japan, Producer &amp; Director Yoshi-P will answer questions from players across the globe. Don't miss this chance to get the latest information to come out of Eorzea!<br/>\n<br/>\r\nRead on for <a href=\"//\" rel=\"2f33323151437bd543ae8637766e0bd3c6d741ac\">details</a>.\n\t\t\t\t\t</div>",
        "lang": "en",
        "title": "Letter from the Producer LIVE Part XXVII",
        "timestamp": 1452844800,
        "link": "//",
        "id": "2f33323151437bd543ae8637766e0bd3c6d741ac"


    "title": "The \u201dLetter from the Producer LIVE Part XXVI\u201d Digest Released!",

I want to script this in Python and preserve the formatting. This is what I have so far:

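import subprocess

# Run the command and capture its standard output as bytes
proc = subprocess.Popen(['lodestoner', 'topics'], stdout=subprocess.PIPE)
(xml, err) = proc.communicate()  # err is None because stderr is not piped
exit_code = proc.wait()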


b'[\n    {\n        "body": "<div class=\\"area_inner_cont\\">\\n<a href=\\"//\\"><img alt=\\"\\" height=\\"149\\" src=\\"\\" width=\\"570\\"/></a>\\n\\t\\t\\t\\t\\t\\tThe date has been set for the twenty-seventh installment of the Letter from the Producer LIVE! Streaming live from <b class=\\"text-strong\\">Kagoshima</b>, Japan, Producer &amp; Director Yoshi-P will answer questions from players across the globe. Don\'t miss this chance to get the latest information to come out of Eorzea!<br/>\\n<br/>\\r\\nRead on for <a href=\\"//\\" rel=\\"2f33323151437bd543ae8637766e0bd3c6d741ac\\">details</a>.\\n\\t\\t\\t\\t\\t</div>", \n        "lang": "en", \n        "title": "Letter from the Producer LIVE Part XXVII", \n        "timestamp": 1452844800, \n        "link": "//", \n        "id": "2f33323151437bd543ae8637766e0bd3c6d741ac"\n    }]'

Am I missing something? How can I bring this output into Python and process it properly (BeautifulSoup/XML)? For example, what if I want to print the title?


This is a JSON string (held in a byte string). Decode the bytes to a string first, then parse it with json.loads:

import json

(xml, err) = ...

# Decode the bytes to str, then parse the JSON into a list of dicts
objects = json.loads(xml.decode())
print([o['title'] for o in objects])
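To see the whole round trip in one place, here is a minimal sketch. Since lodestoner may not be installed, it assumes a stand-in child process (a hypothetical `child_code` snippet run with the current Python interpreter) that prints similar JSON to stdout and a warning to stderr. It also assumes the output is UTF-8. Swap `command` for ['lodestoner', 'topics'] to try it against the real tool.

```python
import json
import subprocess
import sys

# Stand-in for `lodestoner topics` (an assumption so the sketch runs anywhere):
# a child Python process that writes JSON to stdout and a warning to stderr.
child_code = (
    "import json, sys\n"
    "sys.stderr.write('UserWarning: No parser was explicitly specified\\n')\n"
    "print(json.dumps([{'title': 'Letter from the Producer LIVE Part XXVII', "
    "'timestamp': 1452844800}], indent=4))\n"
)
command = [sys.executable, "-c", child_code]  # replace with ['lodestoner', 'topics']

# Pipe stdout and stderr separately so warnings never mix into the JSON.
proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()  # both are bytes objects

objects = json.loads(out.decode("utf-8"))  # bytes -> str -> list of dicts
titles = [o["title"] for o in objects]
print(titles)
print(type(objects[0]["timestamp"]))  # JSON numbers come back as Python ints
```

Once parsed, the data types are ordinary Python ones: the top level is a list, each entry is a dict, strings are str, and the timestamp is an int.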

If you only want to print xml, decode it (it is a bytes object) and print the result, for example with print(xml.decode()).
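A minimal sketch of that step, using a shortened sample of the output above as a stand-in for the real `xml` value (the sample bytes are an assumption for illustration):

```python
# Sample bytes standing in for the `xml` value returned by proc.communicate()
xml = b'[\n    {\n        "lang": "en", \n        "title": "Letter from the Producer LIVE Part XXVII"\n    }]'

text = xml.decode()  # bytes -> str; UTF-8 is the default codec
print(text)          # line breaks and indentation now print as in the terminal
```

Printing the bytes object directly shows its repr, with the b'...' prefix and escaped \n sequences, which is why the formatting looked lost in the question.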

