python - Loading data into pandas

Tags: python pandas dataframe

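I'm trying to pull license information for pip packages from PyPI and load it into a pandas DataFrame. I previously worked through an example that loaded a list comprehension into pandas, but I can't figure this one out...

Here is what I have written so far.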

from requests import get

import pandas as pd

import pip

url = 'https://pypi.python.org/pypi'

# packages_list = ['numpy','twisted']

installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
     for i in installed_packages])

packages = []
licenses = []
summarys = []

for index, package in enumerate(installed_packages_list):
    package = package.split("==")[0]
    full_url = url+'/'+ package +'/json'
    #print 'url is ' + full_url
    page = get(url+'/'+package+'/json').json()


    #print 'Package: ' + package + ', license is:' + page['info']['license'] + '. ' + page['info']['summary']
    packages.append(package)
    licenses.append(page['info']['license'])
    summarys.append(page['info']['summary'])


print packages


pd_packages = pd.DataFrame(
    {
    "packages":[packages],
    "licenses":[licenses],
    "summarys":[summarys]
    })

print pd_packages

Best answer

Try this:

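import requests  # the question only imports `get`, so the module itself is needed here

def get_pkg_info(pkg, url_pat='https://pypi.python.org/pypi/{}/json'):
    # Returns [package, license, summary]; license and summary are None if the lookup fails
    r = requests.get(url_pat.format(pkg))
    if r.status_code != requests.codes.ok:
        return [pkg, None, None]
    d = r.json()
    if d and 'info' in d:
        return [pkg, d['info'].get('license'), d['info'].get('summary')]
    else:
        return [pkg, None, None]

# One row per installed package; the name is the part before '=='
data = [get_pkg_info(x.split('==')[0]) for x in installed_packages_list]

df = pd.DataFrame(data, columns=['package','license','summary'])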
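A note on where `installed_packages_list` comes from: the question builds it with `pip.get_installed_distributions()`, which is no longer exposed by the `pip` module from pip 10 onwards. One possible replacement, sketched here as an assumption and not as part of the original answer, is the standard library's `importlib.metadata` (Python 3.8+). The helper name `list_installed` is hypothetical.

```python
from importlib import metadata  # standard library, Python 3.8+


def list_installed():
    """Return sorted 'name==version' strings, in the same shape as the question's list."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get('Name')
        if name:  # skip entries whose metadata has no name
            # Lower-casing approximates the `key` attribute used in the question
            pairs.add('{}=={}'.format(name.lower(), dist.version))
    return sorted(pairs)


installed_packages_list = list_installed()
print(installed_packages_list[:5])
```

The resulting list can be passed to `get_pkg_info` exactly as above, since only the part before `==` is used.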

Demo:

In [166]: pd.options.display.max_rows = 15

In [167]: df = pd.DataFrame(data, columns=['package','license','summary'])

In [168]: df
Out[168]:
                package       license                                            summary
0             alabaster          None        A configurable sidebar-enabled Sphinx theme
1       anaconda-client       UNKNOWN         Anaconda Cloud command line client library
2    anaconda-navigator   Proprietary
3      anaconda-project          None                                               None
4            asn1crypto           MIT  Fast ASN.1 parser and serializer with definiti...
5               astroid          LGPL  A abstract syntax tree for Python with inferen...
6               astropy           BSD         Community-developed python astronomy tools
..                  ...           ...                                                ...
216              xarray        Apache          N-D labeled arrays and datasets in Python
217                xlrd           BSD  Library for developers to extract data from Mi...
218          xlsxwriter           BSD     A Python module for creating Excel XLSX files.
219             xlwings  BSD 3-clause  Make Excel fly: Interact with Excel from Pytho...
220                xlwt           BSD  Library to create spreadsheet files compatible...
221           xmltodict           MIT  Makes working with XML feel like you are worki...
222               yapsy           BSD                          Yet another plugin system

[223 rows x 3 columns]
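For comparison, the DataFrame call in the question most likely misbehaves because each list is wrapped in a second pair of brackets (`[packages]`), so pandas sees a single row whose cells each hold a whole list. A minimal sketch of the difference, using three rows copied from the demo output above as stand-in data:

```python
import pandas as pd

# Stand-in data: three rows taken from the demo output
packages = ['alabaster', 'astropy', 'yapsy']
licenses = [None, 'BSD', 'BSD']
summarys = [
    'A configurable sidebar-enabled Sphinx theme',
    'Community-developed python astronomy tools',
    'Yet another plugin system',
]

# As in the question: every value is a list containing one list, giving one row
wrapped = pd.DataFrame({
    "packages": [packages],
    "licenses": [licenses],
    "summarys": [summarys],
})

# Passing the lists directly gives one row per package
flat = pd.DataFrame({
    "packages": packages,
    "licenses": licenses,
    "summarys": summarys,
})

print(wrapped.shape)  # (1, 3)
print(flat.shape)     # (3, 3)
```

Dropping the extra brackets would be enough to fix the original script; the list-of-rows approach in the answer additionally handles packages that PyPI does not know about.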

Regarding "python - Loading data into pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46256419/
