New pandas programmer here. I'm trying to split the rows of a pandas DataFrame into new DataFrames. I have a DataFrame that looks like this:
In [1]: print (df)
first_name email organization
0 Brad brad@gmail.com org1
1 Jared jared@gmail.com org2
2 Daniel daniel@gmail.com org3
3 Michael michael@gmail.com org1
4 Jaime jaime@gmail.com org2
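What I want to do is loop through each row of the DataFrame, determine which organization the row belongs to (e.g. Brad belongs to org1, Daniel belongs to org3), and then write that row to a new DataFrame. In this example I want 3 new DataFrames, which look like this:
org1: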
In [3]: print (org1)
first_name email organization
0 Brad brad@gmail.com org1
1 Michael michael@gmail.com org1
org2:
In [4]: print (org2)
first_name email organization
0 Jared jared@gmail.com org2
1 Jaime jaime@gmail.com org2
org3:
In [3]: print (org3)
first_name email organization
0 Daniel daniel@gmail.com org3
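How do I write a loop that goes through each row of the original DataFrame, uses the value in a particular column to decide which DataFrame the row belongs in, and then actually writes the row to that DataFrame?
The first time the loop encounters a value, I want it to create a brand-new DataFrame. After that, any later rows with the same value should be appended to that DataFrame instead of creating a new one. Any help (and any insight into how loops work with DataFrames) would be greatly appreciated.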
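For reference, here is a minimal sketch of the literal row-by-row loop the question describes, assuming the column names shown above (the sample data is rebuilt from the printed DataFrame; `rows_by_org` and `org_frames` are illustrative names). Rows are collected in a plain dict and each DataFrame is built once at the end, because appending to a DataFrame inside a loop copies it on every iteration. For real data the vectorized options in the answer are usually faster.

```python
import pandas as pd

# Sample data rebuilt from the question's printout (assumed values)
df = pd.DataFrame({
    "first_name": ["Brad", "Jared", "Daniel", "Michael", "Jaime"],
    "email": ["brad@gmail.com", "jared@gmail.com", "daniel@gmail.com",
              "michael@gmail.com", "jaime@gmail.com"],
    "organization": ["org1", "org2", "org3", "org1", "org2"],
})

# Hypothetical helper dict: organization -> list of row Series
rows_by_org = {}
for _, row in df.iterrows():
    org = row["organization"]
    if org not in rows_by_org:
        rows_by_org[org] = []        # first time this value is seen: start a new group
    rows_by_org[org].append(row)     # later rows with the same value join that group

# Build one DataFrame per organization, renumbering rows from 0
org_frames = {
    org: pd.DataFrame(rows).reset_index(drop=True)
    for org, rows in rows_by_org.items()
}

print(org_frames["org1"])
```

`iterrows()` yields each row as a Series, which is convenient for inspection but slow on large frames. That is why grouping with `groupby` or boolean selection is generally preferred.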
Best answer
Option 1
groupby
orgs = []
for _, g in df.groupby('organization', as_index=False):
    orgs.append(g)
Or,
orgs = [g for _, g in df.groupby('organization', as_index=False)]
Now orgs is a list of DataFrames, one per organization.
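Each group that `groupby` produces keeps the original row labels, so the org1 frame is indexed 0 and 3 rather than 0 and 1 as in the desired output. A minimal sketch, assuming the same sample data as the question, that renumbers each group with `reset_index(drop=True)`:

```python
import pandas as pd

# Sample data rebuilt from the question's printout (assumed values)
df = pd.DataFrame({
    "first_name": ["Brad", "Jared", "Daniel", "Michael", "Jaime"],
    "email": ["brad@gmail.com", "jared@gmail.com", "daniel@gmail.com",
              "michael@gmail.com", "jaime@gmail.com"],
    "organization": ["org1", "org2", "org3", "org1", "org2"],
})

# groupby sorts keys by default, so the list is ordered org1, org2, org3.
# reset_index(drop=True) discards the original labels (e.g. Michael's 3)
# and renumbers each group from 0.
orgs = [g.reset_index(drop=True) for _, g in df.groupby("organization")]

for g in orgs:
    print(g, end="\n\n")
```

`as_index=False` is left out here because it only affects aggregated output, not iteration over the groups.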
Or, if you want the result as a dict, use:
orgs = {i : g for i, g in df.groupby('organization', as_index=False)}
Now, to access the DataFrame for org1, use orgs['org1'].
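Because iterating a GroupBy object yields `(key, sub-DataFrame)` pairs, the dict comprehension can also be written as `dict(tuple(...))`. When only one organization is needed, `GroupBy.get_group` fetches it without building the dict. A small sketch under the same sample-data assumption (`grouped` is an illustrative name):

```python
import pandas as pd

# Sample data rebuilt from the question's printout (assumed values)
df = pd.DataFrame({
    "first_name": ["Brad", "Jared", "Daniel", "Michael", "Jaime"],
    "email": ["brad@gmail.com", "jared@gmail.com", "daniel@gmail.com",
              "michael@gmail.com", "jaime@gmail.com"],
    "organization": ["org1", "org2", "org3", "org1", "org2"],
})

grouped = df.groupby("organization")

# Equivalent to {i: g for i, g in grouped}
orgs = dict(tuple(grouped))

# Fetch a single group directly
org1 = grouped.get_group("org1")

print(orgs["org1"].equals(org1))
```

Both routes return the rows with their original index labels (0 and 3 for org1).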
Option 2
Another option is to use pd.Series.unique:
orgs = []
for o in df.organization.unique():
    orgs.append(df.query('organization == @o'))
Or,
orgs = [df.query('organization == @o') for o in df.organization.unique()]
Or,
orgs = { o : df.query('organization == @o') for o in df.organization.unique()}
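`df.query('organization == @o')` selects the same rows as an ordinary boolean mask. The `@` prefix tells `query` to read `o` from the surrounding Python scope. A minimal sketch of the mask form, assuming the same sample data:

```python
import pandas as pd

# Sample data rebuilt from the question's printout (assumed values)
df = pd.DataFrame({
    "first_name": ["Brad", "Jared", "Daniel", "Michael", "Jaime"],
    "email": ["brad@gmail.com", "jared@gmail.com", "daniel@gmail.com",
              "michael@gmail.com", "jaime@gmail.com"],
    "organization": ["org1", "org2", "org3", "org1", "org2"],
})

# df["organization"] == o builds a True/False Series; indexing df with it
# keeps only the rows where it is True.
orgs = {o: df[df["organization"] == o] for o in df["organization"].unique()}

print(orgs["org1"])
```

One difference from Option 1: `unique()` returns values in order of first appearance, while `groupby` sorts its keys by default. In this example both orders happen to be org1, org2, org3.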
Regarding "python - Writing rows to new DataFrames", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48196188/