python - CSV parsing for a book

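I'm having trouble with a project that parses a csv file. The file will contain the chapters and sections of a textbook and looks something like this: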

Chapter, Section, Lesson  #this line shows how the book will be organized
Ch1Name, Section1Name, Lesson1Name
Ch1Name, Section2Name, Lesson1Name
Ch1Name, Section2Name, Lesson2Name

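I'm creating Django model objects for each part, and each one has a parent attribute that refers to the part it belongs to. I can't figure out a way to process the csv file so that the parent assignments come out right. Any ideas on how to get started would be great.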

Best answer

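First, hopefully you're already using the csv module rather than trying to parse the file by hand.

Second, your question isn't entirely clear, but it sounds like you're trying to build a simple tree structure from the data as you read it.

So, something like this?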

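import collections
import csv

with open('book.csv', newline='') as book:
    reader = csv.reader(book, skipinitialspace=True)  # ignore the space after each comma
    next(reader)  # skip the header row
    # chapter name -> section name -> list of lesson names
    chapters = collections.defaultdict(lambda: collections.defaultdict(list))
    for chapter_name, section_name, lesson_name in reader:
        chapters[chapter_name][section_name].append(lesson_name)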
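To see what this dict of dicts looks like, here is a self-contained sketch that feeds the question's sample rows through the same loop. It uses `io.StringIO` in place of a real `book.csv`, and the helper name `build_tree` is hypothetical, introduced only for illustration.

```python
import collections
import csv
import io

SAMPLE = """Chapter, Section, Lesson
Ch1Name, Section1Name, Lesson1Name
Ch1Name, Section2Name, Lesson1Name
Ch1Name, Section2Name, Lesson2Name
"""

def build_tree(lines):
    """Return {chapter: {section: [lesson, ...]}} (hypothetical helper)."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    chapters = collections.defaultdict(lambda: collections.defaultdict(list))
    for chapter_name, section_name, lesson_name in reader:
        chapters[chapter_name][section_name].append(lesson_name)
    return chapters

tree = build_tree(io.StringIO(SAMPLE))
# Convert the nested defaultdicts to plain dicts for readable printing
print({ch: dict(secs) for ch, secs in tree.items()})
# {'Ch1Name': {'Section1Name': ['Lesson1Name'], 'Section2Name': ['Lesson1Name', 'Lesson2Name']}}
```

The inner `lambda` is what makes the nesting work: `defaultdict` needs a callable that produces a fresh default, not an already-built `defaultdict` instance.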

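Of course, this assumes you need an "associative tree", a dict of dicts. A more ordinary linear tree, such as a list of lists, or an implicit tree in "parent pointer" form, is even simpler.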
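The "parent pointer" form can be sketched as a single dict that maps each node to its parent, which is close to the `parent` attribute described in the question. This is a minimal sketch, assuming each node is identified by its full path tuple so that repeated names such as `Lesson1Name` stay distinct; `build_parents` and `ancestors` are hypothetical names.

```python
import csv
import io

SAMPLE = """Chapter, Section, Lesson
Ch1Name, Section1Name, Lesson1Name
Ch1Name, Section2Name, Lesson1Name
Ch1Name, Section2Name, Lesson2Name
"""

def build_parents(lines):
    """Map each node (a path tuple) to its parent's path; chapters map to None."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    parent = {}
    for chapter_name, section_name, lesson_name in reader:
        chapter = (chapter_name,)
        section = chapter + (section_name,)
        lesson = section + (lesson_name,)
        parent.setdefault(chapter, None)
        parent.setdefault(section, chapter)
        parent.setdefault(lesson, section)
    return parent

def ancestors(node, parent):
    """Walk the parent pointers from node up to its chapter."""
    while parent[node] is not None:
        node = parent[node]
        yield node

parent = build_parents(io.StringIO(SAMPLE))
print(list(ancestors(('Ch1Name', 'Section2Name', 'Lesson1Name'), parent)))
# [('Ch1Name', 'Section2Name'), ('Ch1Name',)]
```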

For example, suppose you have classes defined like this:

class Chapter(object):
    def __init__(self, name):
        self.name = name

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

And you want a dict for each kind of object, mapping names to objects. So:

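import csv

with open('book.csv', newline='') as book:
    chapters, sections, lessons = {}, {}, {}
    reader = csv.reader(book, skipinitialspace=True)  # ignore the space after each comma
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
        # Key by full path: names such as Lesson1Name repeat across sections
        section = sections.setdefault((chapter_name, section_name),
                                      Section(chapter, section_name))
        lesson = lessons.setdefault((chapter_name, section_name, lesson_name),
                                    Lesson(section, lesson_name))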
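Here is a self-contained sketch of that loop run against the question's sample rows via `io.StringIO`. It repeats the three classes so it can run on its own, and it keys `sections` and `lessons` by their full path. That keying is an assumption made here because the sample reuses `Lesson1Name` in two sections; keyed by name alone, the second row would silently get the first section's lesson back from `setdefault`.

```python
import csv
import io

class Chapter(object):
    def __init__(self, name):
        self.name = name

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

SAMPLE = """Chapter, Section, Lesson
Ch1Name, Section1Name, Lesson1Name
Ch1Name, Section2Name, Lesson1Name
Ch1Name, Section2Name, Lesson2Name
"""

chapters, sections, lessons = {}, {}, {}
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
next(reader)  # skip the header row
for chapter_name, section_name, lesson_name in reader:
    chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
    section = sections.setdefault((chapter_name, section_name),
                                  Section(chapter, section_name))
    lessons.setdefault((chapter_name, section_name, lesson_name),
                       Lesson(section, lesson_name))

for lesson in lessons.values():
    # Follow the parent pointers: lesson -> section -> chapter
    print(lesson.section.chapter.name, lesson.section.name, lesson.name)
```

One design note: `setdefault` evaluates its default argument on every row, even when the key already exists. Here the extra object is simply discarded, but if the default had side effects, such as a Django `Model.objects.create(...)` call, it would run every time; an explicit `if key not in chapters:` check avoids that.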

Now you can pick a random lesson and print its chapter and section:

import random

lesson = random.choice(list(lessons.values()))  # dict views aren't indexable in Python 3
print('Chapter {}, Section {}: Lesson {}'.format(lesson.section.chapter.name,
                                                 lesson.section.name, lesson.name))

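One last thing to keep in mind: in this example, the parent references don't create any reference cycles, because parents don't refer to their children. But what if you need them to?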

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name
        self.lessons = {}

# ...

chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
section = sections.setdefault((chapter_name, section_name), Section(chapter, section_name))
chapter.sections[section_name] = section  # the parent now refers back to its child
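A self-contained sketch of the two-way version follows. Unlike the snippet above, it stores each child only in its parent's `sections` or `lessons` dict, which is an assumption of this sketch; because those dicts are scoped to a single parent, plain names are enough as keys and repeated lesson names do not collide.

```python
import csv
import io

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}  # section name -> Section (child links)

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter  # parent link
        self.name = name
        self.lessons = {}  # lesson name -> Lesson (child links)

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

SAMPLE = """Chapter, Section, Lesson
Ch1Name, Section1Name, Lesson1Name
Ch1Name, Section2Name, Lesson1Name
Ch1Name, Section2Name, Lesson2Name
"""

chapters = {}
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
next(reader)  # skip the header row
for chapter_name, section_name, lesson_name in reader:
    chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
    # Register each child in its parent as well as pointing it at the parent
    section = chapter.sections.setdefault(section_name, Section(chapter, section_name))
    section.lessons.setdefault(lesson_name, Lesson(section, lesson_name))

# Walk down from the top; no separate sections/lessons dicts are needed
for chapter in chapters.values():
    for section in chapter.sections.values():
        print(chapter.name, section.name, sorted(section.lessons))
```

Every `Section` here refers to its `Chapter` and the `Chapter` refers back to it, so the structure contains exactly the kind of reference cycle discussed below.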

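So far, so good... but what happens when you're done with all these objects? They contain reference cycles, which can be a problem for garbage collection. It's not an insurmountable problem, but it does mean the objects may not be reclaimed promptly. For example, CPython normally frees an object as soon as its last reference goes away, but objects in a reference cycle keep each other's reference counts above zero, so none of them is freed until the cycle detector's next pass. The solution is to use a weakref for the parent pointers (or a collection of weakrefs to the children).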
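A minimal sketch of the weakref approach for the parent pointer, based on the two-way `Chapter`/`Section` classes above. The private `_chapter` attribute and the `chapter` property are illustrative design choices, not part of the original answer. `weakref.ref(obj)` returns a callable that yields the object, or `None` once the object has been collected.

```python
import gc
import weakref

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}  # strong references down to the children

class Section(object):
    def __init__(self, chapter, name):
        self._chapter = weakref.ref(chapter)  # weak reference up to the parent
        self.name = name
        self.lessons = {}

    @property
    def chapter(self):
        # Dereference the weakref; None if the chapter has been collected
        return self._chapter()

chapter = Chapter('Ch1Name')
section = Section(chapter, 'Section1Name')
chapter.sections[section.name] = section
print(section.chapter.name)  # Ch1Name

# There is no cycle of strong references, so dropping the last strong
# reference to the chapter frees it (immediately, in CPython).
del chapter
gc.collect()  # only needed on implementations without reference counting
print(section.chapter)  # None
```

The trade-off is that something else, such as a top-level `chapters` dict, must keep the parents alive for as long as you need them; a weak parent pointer will not.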

A similar question about python - CSV parsing for a book can be found on Stack Overflow: https://stackoverflow.com/questions/15013380/
