python - How do I create a Python dictionary from unstructured text?

Tags: python python-2.7 dictionary

I have a set of broken-link checker results stored in a text file:

Getting links from: https://www.foo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
├───OK─── http://www.set.com/
├───OK─── http://www.one.com/
5 links found. 0 excluded. 1 broken.

Getting links from: https://www.bar.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
3 links found. 0 excluded. 1 broken.

Getting links from: https://www.boo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
2 links found. 0 excluded. 0 broken.

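I am trying to write a script that reads the file and creates a dictionary in which each root link is a key and the list of its sublinks (including the summary line) is the value.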

The output I am trying to achieve looks like this:

{"Getting links from: https://www.foo.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", "├───OK─── http://www.set.com/", "├───OK─── http://www.one.com/", "5 links found. 0 excluded. 1 broken."], 
"Getting links from: https://www.bar.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", "3 links found. 0 excluded. 1 broken."],
"Getting links from: https://www.boo.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "2 links found. 0 excluded. 0 broken."] }

This is what I have so far:

result_list = []

with open('link_checker_result.txt', 'r') as f:
    temp_list = f.readlines()
    for line in temp_list:
        result_list.append(line)

This gives me the output:

['Getting links from: https://www.foo.com/', '├───OK─── http://www.this.com/', '├───OK─── http://www.is.com/', '├─BROKEN─ http://www.broken.com/', '├───OK─── http://www.set.com/', '├───OK─── http://www.one.com/', '5 links found. 0 excluded. 1 broken.', 'Getting links from: https://www.bar.com/', '├───OK─── http://www.this.com/', '├───OK─── http://www.is.com/', '...'  ]

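I recognize that each of these sets shares some common characteristics: for example, there is an empty line between them, and each one starts with "Getting...". Is that something I should try to split on before writing the dictionary?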

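I am new to Python, so I admit I am not even sure I am heading in the right direction. I would really appreciate an expert's view on this. Thanks in advance!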

Best answer

This can actually be quite short, about 4 lines of code:

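finalDict = {}
with open('link_checker_result.txt', 'r') as f:
    # strip() drops the trailing newline so the last list does not end with an empty string.
    lines = list(map(lambda block: block.split('\n'), f.read().strip().split('\n\n')))
    # The first line of each block becomes the key, the remaining lines the value.
    finalDict = dict((elem[0], elem[1:]) for elem in lines)
print(finalDict)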

Output:

{'Getting links from: https://www.foo.com/': ['+---OK--- http://www.this.com/', '+---OK--- http://www.is.com/', '+-BROKEN- http://www.broken.com/', '+---OK--- http://www.set.com/', '+---OK--- http://www.one.com/', '5 links found. 0 excluded. 1 broken.'], 'Getting links from: https://www.bar.com/': ['+---OK--- http://www.this.com/', '+---OK--- http://www.is.com/', '+-BROKEN- http://www.broken.com/', '3 links found. 0 excluded. 1 broken.'], 'Getting links from: https://www.boo.com/': ['+---OK--- http://www.this.com/', '+---OK--- http://www.is.com/', '2 links found. 0 excluded. 0 broken.']}

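The code above reads the input file and splits it on two consecutive newline characters (\n\n), that is, on the blank lines, so that each root URL and its links end up in one block. Each block is then split on \n into its individual lines.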

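Finally, it builds a tuple from the first element of each list and the remaining elements, and converts those tuples into key-value pairs in the finalDict dictionary.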
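To make the two steps concrete, here is a minimal sketch that runs the same split-and-pair logic on an in-memory string instead of the file. The sample text and the names sample, blocks and result are made up for illustration, and the sketch assumes Python 3.

```python
# Hypothetical sample text standing in for the contents of link_checker_result.txt.
sample = (
    "Getting links from: https://www.foo.com/\n"
    "├───OK─── http://www.this.com/\n"
    "1 links found. 0 excluded. 0 broken.\n"
    "\n"
    "Getting links from: https://www.bar.com/\n"
    "├─BROKEN─ http://www.broken.com/\n"
    "1 links found. 0 excluded. 1 broken.\n"
)

# Step 1: drop the trailing newline, split on blank lines (one block per root URL),
# then split each block into its lines.
blocks = [block.split("\n") for block in sample.strip().split("\n\n")]

# Step 2: pair each block's first line (the key) with the rest (the value).
result = dict((block[0], block[1:]) for block in blocks)

for key, value in result.items():
    print(key, "->", value)
```

Without the strip() call, text that ends in a newline would leave an empty string at the end of the last list, because "a\nb\n".split("\n") returns ['a', 'b', ''].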

Here is a version that is easier to follow:

finalDict = {}
with open('link_checker_result.txt', 'r') as f:
    # Read the data and split on blank lines so that each URL and its links form one list element.
    # strip() removes the trailing newline, which would otherwise leave an empty string in the last list.
    data = f.read().strip().split('\n\n')
    # Split each of those elements into its individual lines.
    links = [block.split('\n') for block in data]
    # Turn each list into a key-value pair (first line -> remaining lines) and build the dictionary.
    finalDict = dict((elem[0], elem[1:]) for elem in links)
print(finalDict)
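The question also suggests using the "Getting..." prefix as the delimiter. A line-by-line sketch of that idea follows; it is an alternative to the blank-line split, not part of the original answer. The function name parse_link_report and the PREFIX constant are hypothetical, and the sketch assumes Python 3 and that every section starts with exactly that prefix.

```python
PREFIX = "Getting links from:"  # assumed marker for the start of each section


def parse_link_report(lines):
    """Group report lines under the most recent 'Getting links from:' line."""
    result = {}
    current_key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank separator lines
        if line.startswith(PREFIX):
            # A new section begins: this line becomes the key.
            current_key = line
            result[current_key] = []
        elif current_key is not None:
            # Any other line belongs to the current section.
            result[current_key].append(line)
    return result


# Usage with an in-memory list of lines (made-up sample data).
demo = [
    "Getting links from: https://www.boo.com/\n",
    "├───OK─── http://www.this.com/\n",
    "├───OK─── http://www.is.com/\n",
    "2 links found. 0 excluded. 0 broken.\n",
    "\n",
    "Getting links from: https://www.bar.com/\n",
    "├─BROKEN─ http://www.broken.com/\n",
    "1 links found. 0 excluded. 1 broken.\n",
]
print(parse_link_report(demo))
```

Because the function only iterates over its argument, an open file object can be passed in place of the list. This variant does not depend on the sections being separated by exactly one blank line, at the cost of a few more lines of code.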

This question, "python - How do I create a Python dictionary from unstructured text?", comes from Stack Overflow: https://stackoverflow.com/questions/53964214/
