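How can I elegantly port the recursive SQL query below to Python Pandas code? Somehow I don't see a straightforward way to do it without writing my own recursive function.
Python sample code: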
import pandas as pd
data = {
    'ID': [1,2,3,4,5,6,7,8],
    'Name': ['Keith','Josh','Robin','Raja','Tridip','Arijit','Amit','Dev'],
    'MgrID': [0,1,1,2,0,5,5,6]
}
df = pd.DataFrame.from_dict(data)
df.set_index('ID', inplace=True, drop=False, append=False)
# Looks up the manager's row for every employee who has one
# (.loc replaces .ix, which has been removed from pandas)
df.loc[df.query('MgrID >0')['MgrID']]
I'm trying to get this:
nLevel ID Name
================================
1 6 Arijit
2 8 Dev
1 1 Keith
2 2 Josh
2 3 Robin
3 4 Raja
1 5 Tridip
2 7 Amit
Recursive SQL query:
;WITH Employee (ID, Name, MgrID) AS
(
SELECT 1, 'Keith', NULL UNION ALL
SELECT 2, 'Josh', 1 UNION ALL
SELECT 3, 'Robin', 1 UNION ALL
SELECT 4, 'Raja', 2 UNION ALL
SELECT 5, 'Tridip', NULL UNION ALL
SELECT 6, 'Arijit', NULL UNION ALL
SELECT 7, 'Amit', 5 UNION ALL
SELECT 8, 'Dev', 6
)
,Hierarchy AS
(
-- Anchor
SELECT ID
,Name
,MgrID
,nLevel = 1
,Family = ROW_NUMBER() OVER (ORDER BY Name)
FROM Employee
WHERE MgrID IS NULL
UNION ALL
-- Recursive query
SELECT E.ID
,E.Name
,E.MgrID
,H.nLevel+1
,Family
FROM Employee E
JOIN Hierarchy H ON E.MgrID = H.ID
)
SELECT nLevel
,ID
,SPACE(nLevel + (CASE WHEN nLevel > 1 THEN nLevel ELSE 0 END)) + Name AS Name
FROM Hierarchy
ORDER BY Family, nLevel
Best answer
First, you need to fix a typo in the MgrID list in your Python code (in the SQL data, Arijit, ID 6, has no manager):
[0,1,1,2,0,0,5,6]
Second, if this job is done recursively in SQL, why would you expect Python/Pandas to do it without a recursive approach? It's not that hard:
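def nlevel(id, mgr_dict=df.MgrID, _cache={0: 0}):
    # MgrID 0 means "no manager", so it sits at level 0.
    # Every computed level is stored in _cache so each ID is resolved only once.
    if id not in _cache:
        _cache[id] = 1 + nlevel(mgr_dict[id], mgr_dict)
    return _cache[id]
df['nLevel'] = df.ID.map(nlevel)
print(df[['nLevel','ID','Name']])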
The output (the nLevel column) is then what you need (apart from the ordering, which I couldn't work out from your SQL):
nLevel ID Name
ID
1 1 1 Keith
2 2 2 Josh
3 2 3 Robin
4 3 4 Raja
5 1 5 Tridip
6 1 6 Arijit
7 2 7 Amit
8 2 8 Dev
[8 rows x 3 columns]
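On the ordering: the SQL sorts by Family (the ROW_NUMBER() of each top-level employee by Name, inherited by everyone below them) and then by nLevel. The following is a rough sketch, not part of the original answer, that mimics the CTE with a level-by-level merge instead of a recursive function. The names `hierarchy` and `root_marker` are made up for illustration, the data assumes the corrected MgrID list, and the extra sort on ID is an assumption added only to make ties deterministic (the SQL leaves them undefined). The Name indentation produced by SPACE() is not reproduced.

```python
import pandas as pd

# Same data as the question, with the corrected MgrID list (0 means "no manager").
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Name": ["Keith", "Josh", "Robin", "Raja", "Tridip", "Arijit", "Amit", "Dev"],
    "MgrID": [0, 1, 1, 2, 0, 0, 5, 6],
})


def hierarchy(employees, root_marker=0):
    """Mimic the recursive CTE: anchor rows first, then join level by level."""
    # Anchor: employees without a manager, numbered by Name like ROW_NUMBER().
    level = employees[employees["MgrID"] == root_marker].sort_values("Name").copy()
    level["nLevel"] = 1
    level["Family"] = range(1, len(level) + 1)

    levels = []
    while not level.empty:
        levels.append(level)
        # Recursive step: employees whose manager is in the previous level
        # inherit that manager's Family and get the manager's nLevel + 1.
        parents = level[["ID", "nLevel", "Family"]].rename(columns={"ID": "MgrID"})
        level = employees.merge(parents, on="MgrID")
        level["nLevel"] += 1

    result = pd.concat(levels, ignore_index=True)
    # ORDER BY Family, nLevel; ID is an assumed tie-breaker, not in the SQL.
    return result.sort_values(["Family", "nLevel", "ID"]).reset_index(drop=True)


out = hierarchy(df)
print(out[["nLevel", "ID", "Name"]].to_string(index=False))
```

With this data the rows come out in the order Arijit, Dev, Keith, Josh, Robin, Raja, Tridip, Amit, which matches the listing in the question. The loop stops as soon as a merge finds no further subordinates.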
Regarding "python - Recursive SQL CTE query in Pandas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23770595/