python - 如何从嵌套 for 循环构建 pandas 数据框

标签 python pandas dataframe for-loop

我正在使用谷歌云视频智能 API,并尝试将结果放入 pandas 数据帧中。 API 的输出类是repeatedcompositecontainer。因此,我的想法是在 API 函数中使用的 for 循环内构建一个数据帧。

这是 API 函数处理结果的方式:

    segment_labels = result.annotation_results[0].segment_label_annotations
    for i, segment_label in enumerate(segment_labels):
        print('Video label description: {}'.format(
            segment_label.entity.description))
            
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))

        for i, segment in enumerate(segment_label.segments):
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
        print('\n')

this Stack Overflow article的帮助下我创建了一个空列表并附加了结果,稍后将其转换为 pandas 数据框,如下所示:

    df = []
    
    # Process video/segment level label annotations
    segment_labels = result.annotation_results[0].segment_label_annotations
    for i, segment_label in enumerate(segment_labels):
        print('Video label description: {}'.format(
            segment_label.entity.description))
            
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))
            df.append({'Description': category_entity.description})

        for i, segment in enumerate(segment_label.segments):
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            df.append({'Confidence': segment.confidence, 'Start': start_time, 'End': end_time})
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
        print('\n')

当我只尝试最后一个 for 循环时,它给了我一个很好的结构化数据框架,如下所示

>>> frame = pd.DataFrame(df)
>>> frame
Confidence         End  Start
  0.704168  599.682416    0.0
  0.737053  599.682416    0.0
  0.832496  599.682416    0.0
  0.427637  599.682416    0.0
  0.518693  599.682416    0.0

但是,当我将相同的逻辑添加到 for 循环时,它会给出如下扭曲的数据帧

>>> frame = pd.DataFrame(df)
>>> frame
Confidence    Description         End  Start
       NaN     technology         NaN    NaN
  0.741133            NaN  599.682416    0.0
       NaN       keyboard         NaN    NaN
  0.328138            NaN  599.682416    0.0
       NaN         person         NaN    NaN
  0.436333            NaN  599.682416    0.0
       NaN         person         NaN    NaN

我希望是否有办法修复它并获得如下数据框:

>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description    End        Start
  0.741133  technology   599.682416    0.0
  0.328138  keyboard     599.682416    0.0
  0.436333  person       599.682416    0.0

接下来我可以尝试什么?

最佳答案

更改您的代码,如下所示:

    df = []

    # Process video/segment level label annotations
    segment_labels = result.annotation_results[0].segment_label_annotations
    for i, segment_label in enumerate(segment_labels):
        print('Video label description: {}'.format(
            segment_label.entity.description))
        label_row = {} # Create a dictionary for the label
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))
            # Add the description
            label_row['Description'] = category_entity.description

        for i, segment in enumerate(segment_label.segments):
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            row_segment_info = {'Confidence': segment.confidence, 'Start': start_time, 'End': end_time})
            # Add the segment info for this row
            label_row.update(row_segment_info)
            df.append(label_row) # Now add the row
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
        print('\n')

总之:您在每个子循环中添加行列表。您只想将该行添加一次。

关于python - 如何从嵌套 for 循环构建 pandas 数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57104050/

相关文章:

python - 如何刷新缓存在 Google App Engine 中的用户对象?

Python - 如何在 Mac OS X 上使用 GUI 应用程序制作守护程序?

python - 在 Matplotlib 图中从 Pandas Dataframe 注释点

python - 根据需要创建尽可能多的列,并使用它们将 .apply() 的输出放置在 Pandas 数据框中

r - 使用变量向数据框添加列

python - Pandas:从字典列表创建一个数据框,其中的值都是 numpy 数组?

python - 在保留索引的同时删除包含 NaN 的行

python - 将数据帧值合并到新数据帧

python - Pandas 根据其他列上的复合条件添加一列

python - 将颜色条添加到集群热图