python - Combining the pandas DataFrame outputs of a program into one DataFrame

Tags: python json pandas

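After a few weeks of refinement, and thanks to the great people on SO, I have the following code. It produces DataFrames the way I need them, but I'm not sure how to concatenate the DataFrames the program produces into one final DataFrame variable. When I simply assign the concat statement to a variable, I end up with only the last DataFrame.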

{
"zipcode":"08989",
"current":{"canwc":null,"cig":4900,"class":"observation","clds":"OVC","day_ind":"D","dewpt":19,"expireTimeGMT":1385486700,"feels_like":34,"gust":null,"hi":37,"humidex":null,"icon_code":26,"icon_extd":2600,"max_temp":37,"wxMan":"wx1111"},
"triggers":[53,31,9,21,48,7,40,178,55,179,176,26,103,175,33,51,20,57,112,30,50,113]
}
{
"zipcode":"08990",
"current":{"canwc":null,"cig":4900,"class":"observation","clds":"OVC","day_ind":"D","dewpt":19,"expireTimeGMT":1385486700,"feels_like":34,"gust":null,"hi":37,"humidex":null,"icon_code":26,"icon_extd":2600,"max_temp":37, "wxMan":"wx1111"},
"triggers":[53,31,9,21,48,7,40,178,55,179,176,26,103,175,33,51,20,57,112,30,50,113]
}

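import glob
import json
from itertools import chain, islice

import pandas as pd


def lines_per_n(f, n):
    # Yield the file's lines n at a time, joined into a single string.
    for line in f:
        yield ''.join(chain([line], islice(f, n - 1)))


def series_chunk(chunk):
    # Parse one JSON record and keep only the fields of interest.
    try:
        jfile = json.loads(chunk)
        zipcode = jfile['zipcode']
        proc_time = jfile['current']['proc_time']
        triggers = jfile['triggers']
        return pd.Series([zipcode, proc_time, triggers])
    except ValueError:
        # Malformed JSON: return None, which pd.concat silently drops.
        return None


for fin in glob.glob('*.txt'):
    with open(fin) as f:
        # Prints one DataFrame per file; nothing is accumulated across files.
        print(pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 5)], axis=1).T)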
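
To make the chunking step concrete, here is a minimal sketch that runs lines_per_n over an in-memory file. The sample text and the io.StringIO stand-in are assumptions for illustration only, not the asker's data.

```python
import io
from itertools import chain, islice


def lines_per_n(f, n):
    """Yield successive groups of n lines from f, each joined into one string."""
    for line in f:
        # The for loop consumes one line; islice pulls the next n - 1
        # lines from the same iterator.
        yield ''.join(chain([line], islice(f, n - 1)))


# Hypothetical stand-in for a file that holds two 3-line records.
sample = io.StringIO("a1\na2\na3\nb1\nb2\nb3\n")
chunks = list(lines_per_n(sample, 3))
print(chunks)
```

Each yielded string is one complete record, so n has to match the number of lines each JSON object occupies in the file.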

This is the output of the program above, which I need to combine into a single DataFrame:

       0               1                                                  2
0  08988  20131126102946                                                 []
1  08989  20131126102946  [53, 31, 9, 21, 48, 7, 40, 178, 55, 179, 176, ...
       0               1                                                  2
0  08988  20131126102946                                                 []
1  08989  20131126102946  [53, 31, 9, 21, 48, 7, 40, 178, 55, 179, 176, ...

I eventually worked it out. Here is the final code that does what I need:

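dfs = []
for fin in glob.glob('*.txt'):
    with open(fin) as f:
        # Build one DataFrame per file and keep it for the final concat.
        df = pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 7)],
                       axis=1).T
        dfs.append(df)

# Stack the per-file DataFrames and renumber the rows from 0.
df = pd.concat(dfs, ignore_index=True)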
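
As a rough illustration of why the original attempt kept only the last DataFrame, the sketch below contrasts reassigning inside the loop with collecting the frames in a list first. The two small frames are hypothetical stand-ins for the per-file results, not real data.

```python
import pandas as pd

# Hypothetical per-file frames standing in for the result of each *.txt file.
per_file = [
    pd.DataFrame({"zipcode": ["08988", "08989"]}),
    pd.DataFrame({"zipcode": ["08990", "08991"]}),
]

# Pitfall: the variable is overwritten on every pass, so only the
# last file's frame survives the loop.
for frame in per_file:
    last_only = pd.concat([frame])

# Fix: collect every frame, then concatenate once after the loop.
dfs = []
for frame in per_file:
    dfs.append(frame)
combined = pd.concat(dfs, ignore_index=True)

print(len(last_only))
print(combined.index.tolist())
```

With ignore_index=True the combined frame gets a fresh 0..n-1 row index instead of repeating each file's own 0, 1, ... labels.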

Best answer

Glad you solved it. IMO, a slightly cleaner way to do this is with a list comprehension:

def dataframe_from_file(fin):
    with open(fin) as f:
        return pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 7)],
                            axis=1).T

df = pd.concat([dataframe_from_file(fin) for fin in glob.glob('*.txt')],
                  ignore_index=True)

Note: using axis=1 in the final concat might mean you can avoid the earlier transposes (.T).
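
One possible reading of that note, sketched under assumptions: gather every Series from every file into one flat list, concatenate them as columns with axis=1, and transpose a single time at the end. The Series below are hypothetical stand-ins for what series_chunk returns.

```python
import pandas as pd

# Hypothetical records standing in for the Series built from each file.
file_a = [
    pd.Series(["08988", 20131126102946, [53, 31]]),
    pd.Series(["08989", 20131126102946, [53, 31, 9]]),
]
file_b = [
    pd.Series(["08990", 20131126102946, [21, 48]]),
]

# Flatten the per-file lists into one list of Series.
all_series = [s for series_list in (file_a, file_b) for s in series_list]

# Each Series becomes a column; a single transpose turns them into rows.
df = pd.concat(all_series, axis=1, ignore_index=True).T

print(df.shape)
```

This trades the per-file transposes for one at the end. Whether it is actually cleaner depends on how the files are read, so treat it as an option rather than a recommendation.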

This question about combining the pandas DataFrame outputs of a program into one DataFrame comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/20683491/
