python - Concatenate Pandas rows if the next row has NaN in a specific column

Tags: python pandas dataframe

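I have a CSV file that was parsed from a PDF file, but because the tables in the PDF contain multi-line cells, it was not parsed correctly. After importing it into a pandas DataFrame, it looks like this: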

        Record                            Operational Address       BIC
2   2007-03-03  Omladinskih Brigada 90V 11070 BEOGRAD SERBIA,  AAAARSBG
3          NaN                                    REPUBLIC OF       NaN
4   1994-03-07           SAFAT ALI AL SALEM STREET, MUBARAKIA  AAACKWKW
5          NaN           OPPOSITE PUBLIC LIBRARY 13022 KUWAIT       NaN
6          NaN                                         KUWAIT       NaN
7   2006-06-03           CHEZ NSM 3, AVENUE HOCHE 75008 PARIS  AAADFRP1
8          NaN                                         FRANCE       NaN
9   2006-06-03           10 RUE DU COLISEE 75008 PARIS FRANCE  AAAGFRP1
10         NaN                                            NaN       NaN
11  2014-07-05           152, 6TH OF SEPTEMBER BLVD. BUSINESS  AAAJBG21
12         NaN             CENTER LEGIS 4000 PLOVDIV BULGARIA       NaN
13  1989-03-29       DHABAB STREET HEAD OFFICE BUILDING 11431  AAALSARI
14         NaN                            RIYADH SAUDI ARABIA       NaN

If the next value in the Record column is NaN, I want to concatenate the next row onto the current row.

In other words, I want to get:

        Record                                                                       Operational Address        BIC
2   2007-03-03  Omladinskih Brigada 90V 11070 BEOGRAD SERBIA, REPUBLIC OF                                  AAAARSBG
4   1994-03-07           SAFAT ALI AL SALEM STREET, MUBARAKIA OPPOSITE PUBLIC LIBRARY 13022 KUWAIT KUWAIT  AAACKWKW
7   2006-06-03           CHEZ NSM 3, AVENUE HOCHE 75008 PARIS FRANCE                                       AAADFRP1
9   2006-06-03           10 RUE DU COLISEE 75008 PARIS FRANCE                                              AAAGFRP1
11  2014-07-05           152, 6TH OF SEPTEMBER BLVD. BUSINESS CENTER LEGIS 4000 PLOVDIV BULGARIA           AAAJBG21
13  1989-03-29       DHABAB STREET HEAD OFFICE BUILDING 11431 RIYADH SAUDI ARABIA                          AAALSARI

Here is the DataFrame:

import numpy as np
import pandas as pd

# Rows whose 'Record' is NaN are continuation lines of the row above them.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
data = {'Record': {2: '2007-03-03',
                   3: np.nan,
                   4: '1994-03-07',
                   5: np.nan,
                   6: np.nan,
                   7: '2006-06-03',
                   8: np.nan,
                   9: '2006-06-03',
                   10: np.nan,
                   11: '2014-07-05',
                   12: np.nan,
                   13: '1989-03-29',
                   14: np.nan},
        'Operational Address': {2: 'Omladinskih Brigada 90V 11070 BEOGRAD SERBIA,',
                                3: 'REPUBLIC OF',
                                4: 'SAFAT ALI AL SALEM STREET, MUBARAKIA',
                                5: 'OPPOSITE PUBLIC LIBRARY 13022 KUWAIT',
                                6: 'KUWAIT',
                                7: 'CHEZ NSM 3, AVENUE HOCHE 75008 PARIS',
                                8: 'FRANCE',
                                9: '10 RUE DU COLISEE 75008 PARIS FRANCE',
                                10: np.nan,
                                11: '152, 6TH OF SEPTEMBER BLVD. BUSINESS',
                                12: 'CENTER LEGIS 4000 PLOVDIV BULGARIA',
                                13: 'DHABAB STREET HEAD OFFICE BUILDING 11431',
                                14: 'RIYADH SAUDI ARABIA'},
        'BIC': {2: 'AAAARSBG',
                3: np.nan,
                4: 'AAACKWKW',
                5: np.nan,
                6: np.nan,
                7: 'AAADFRP1',
                8: np.nan,
                9: 'AAAGFRP1',
                10: np.nan,
                11: 'AAAJBG21',
                12: np.nan,
                13: 'AAALSARI',
                14: np.nan}}

df = pd.DataFrame(data=data)

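Best answer

Group the rows with a key built from cumsum (each non-null Record starts a new group), and pass agg a dictionary that specifies the aggregation for each column.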
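
The grouping key is easiest to understand when inspected by itself. The following is a minimal sketch on a made-up toy frame; the name `toy` and its placeholder address values are assumptions for illustration only and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): a non-null Record marks the start of a logical row.
toy = pd.DataFrame({
    "Record": ["2007-03-03", np.nan, "1994-03-07", np.nan, np.nan],
    "Operational Address": ["A1", "A2", "B1", "B2", "B3"],
})

# notna() is True on each starting row. cumsum() turns those flags into a
# running group number, so every continuation row inherits the number of
# the starting row above it.
key = toy["Record"].notna().cumsum()
print(key.tolist())  # [1, 1, 2, 2, 2]
```

Rows that share a number in `key` end up in the same group, which is what lets the aggregation below merge them into a single row.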

agg_d = {'Record': 'first', 
         'Operational Address': lambda x: ' '.join(x.dropna()),
         'BIC': 'first'}

df.groupby(df.Record.notnull().cumsum().rename(None)).agg(agg_d)

       Record                                                               Operational Address       BIC
1  2007-03-03  Omladinskih Brigada 90V 11070 BEOGRAD SERBIA, REPUBLIC OF                         AAAARSBG
2  1994-03-07  SAFAT ALI AL SALEM STREET, MUBARAKIA OPPOSITE PUBLIC LIBRARY 13022 KUWAIT KUWAIT  AAACKWKW
3  2006-06-03  CHEZ NSM 3, AVENUE HOCHE 75008 PARIS FRANCE                                       AAADFRP1
4  2006-06-03  10 RUE DU COLISEE 75008 PARIS FRANCE                                              AAAGFRP1
5  2014-07-05  152, 6TH OF SEPTEMBER BLVD. BUSINESS CENTER LEGIS 4000 PLOVDIV BULGARIA           AAAJBG21
6  1989-03-29  DHABAB STREET HEAD OFFICE BUILDING 11431 RIYADH SAUDI ARABIA                      AAALSARI
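
Note that this result is indexed 1 to 6, while the desired output in the question keeps the labels of the starting rows (2, 4, 7, ...). If those labels matter, one possible variation is sketched below on the first five rows of the question's data. It is not part of the original answer; the names `is_start`, `key`, and `out` are illustrative, and the sketch assumes the first row of the frame has a non-null Record.

```python
import numpy as np
import pandas as pd

# First five rows of the question's data, with the original index labels.
df = pd.DataFrame({
    "Record": ["2007-03-03", np.nan, "1994-03-07", np.nan, np.nan],
    "Operational Address": [
        "Omladinskih Brigada 90V 11070 BEOGRAD SERBIA,",
        "REPUBLIC OF",
        "SAFAT ALI AL SALEM STREET, MUBARAKIA",
        "OPPOSITE PUBLIC LIBRARY 13022 KUWAIT",
        "KUWAIT",
    ],
    "BIC": ["AAAARSBG", np.nan, "AAACKWKW", np.nan, np.nan],
}, index=[2, 3, 4, 5, 6])

is_start = df["Record"].notna()          # True where a logical row begins
key = is_start.cumsum().rename(None)     # running group number per row

out = df.groupby(key).agg({
    "Record": "first",
    "Operational Address": lambda s: " ".join(s.dropna()),
    "BIC": "first",
})

# Reuse the labels of the starting rows as the result's index.
# Assumption: the frame begins with a starting row, so the number of
# groups equals the number of starting rows.
out.index = df.index[is_start.to_numpy()]
print(out.index.tolist())  # [2, 4]
```

Assigning the index afterwards keeps the aggregation itself identical to the answer's approach and only changes how the merged rows are labeled.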

Source: a similar question on Stack Overflow, "Concatenate Pandas rows if the next row has NaN in a specific column": https://stackoverflow.com/questions/55052731/
