python - Jupyter 笔记本 Ipython : Groupby based on the values alphabetically

标签 python pandas jupyter-notebook pandas-groupby

我是第一次使用 jupyter 笔记本。我尝试对 csv 的一列进行分组并获取值的计数。我用这段代码得到了下面的结果。

import pandas
pandas.read_csv('a.csv', sep=',')
df.groupby('name').name.count()
name
>Aa</TOPONYM>                 4
>Aachen</TOPONYM>             5
>Aartselaar</TOPONYM>         1
>Abadan</TOPONYM>             1
>Abaya</TOPONYM>              1
>Abba</TOPONYM>              12
>Abbey                        2
>Abbeydale</TOPONYM>          1
>Abbot</TOPONYM>              2
>Abbots                       3
>Abbotsford</TOPONYM>        22
>Abbotsinch</TOPONYM>         5
>Abbotts                      1
>Abel</TOPONYM>               1
>Aberchirder</TOPONYM>        2
>Aberdare</TOPONYM>           3
>Aberdeen                     1
>Aberdeen</TOPONYM>         163
>Aberdeenshire</TOPONYM>    286
>Aberdour</TOPONYM>           9
>Aberfan</TOPONYM>            1
>Aberfeldy</TOPONYM>         16
>Abergavenny</TOPONYM>        4
>Aberlady                     1
>Aberlady</TOPONYM>           3
>Abernethy</TOPONYM>          1
>Abertay                      1
>Abertillery</TOPONYM>        6
>Abha</TOPONYM>               2
>Abidjan</TOPONYM>           10
                           ... 
>Zakho</TOPONYM>             20
>Zakopane</TOPONYM>           1
>Zambezi                      2
>Zambezi</TOPONYM>            8
>Zambia</TOPONYM>            19
>Zamboanga</TOPONYM>          4
>Zandak</TOPONYM>             3
>Zanzibar</TOPONYM>          11
>Zaragosa</TOPONYM>           1
>Zaragoza</TOPONYM>           4
>Zeebrugge</TOPONYM>         28
>Zeeland</TOPONYM>            2
>Zemun</TOPONYM>              1
>Zenica</TOPONYM>            12
>Zermatt</TOPONYM>            5
>Zetland</TOPONYM>            1
>Zhizhong</TOPONYM>           1
>Zhongshan</TOPONYM>          2
>Zhuhai</TOPONYM>             1
>Zimbabwe</TOPONYM>         377
>Znamenskoye</TOPONYM>        1
>Zoetermeer</TOPONYM>         1
>Zola</TOPONYM>               1
>Zomba</TOPONYM>              3
>Zulu</TOPONYM>               1
>Zululand</TOPONYM>           2
>Zuni</TOPONYM>               2
>Zurich</TOPONYM>            86
>Zvornik</TOPONYM>            3
>Zwolle</TOPONYM>             1
Name: name, Length: 8585, dtype: int64

是否可以按字母顺序获取计数,首先我应该使用字母 a 运行命令,它应该给出所有带有 a 的值,然后是下一个 b 等等。或者是否可以获取跳过起始 100 个值的值。

我的真实数据如下所示:

<TOPONYM    geonameid="2657540" lat="51.24827"  lon="-0.76389"  >Aldershot</TOPONYM>    
<TOPONYM    geonameid="3037854" lat="49.9"  lon="2.3"   >Amiens</TOPONYM>   
<TOPONYM    geonameid="6216857" lat="-43.59832" lon="171.55011" >Alaska</TOPONYM>   
<TOPONYM    geonameid="3037854" lat="49.9"  lon="2.3"   >Amiens</TOPONYM>   
<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>    
<TOPONYM    geonameid="7216668" lat="28.0106"   lon="-82.1184"  >Alabama</TOPONYM>  
<TOPONYM    geonameid="5884078" lat="48.98339"  lon="-73.34907" >Ally</TOPONYM> 
<TOPONYM    geonameid="2507480" lat="36.7525"   lon="3.04197"   >Algiers</TOPONYM>  
<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>    
<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>    

最佳答案

您可以使用str[1]选择第一个字母,然后使用value_counts :

df = pandas.read_csv('a.csv')

a = df['name'].str[0].value_counts().rename_axis('alph').reset_index(name='count')

另一个解决方案 groupby按第二个字母:

a = df['name'].groupby(df['name'].str[0]).count().reset_index(name='count')
<小时/>
a = df['name'].groupby(df['name'].str[0]).size().reset_index(name='count')

关于python - Jupyter 笔记本 Ipython : Groupby based on the values alphabetically,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47201227/

相关文章:

jupyter-notebook - 在 Google Colaboratory 中使用小部件

Python Pygame 无法正确显示图像

python - 背景上的简单 HTTPServer python

python - 如何腌制一个空文件?

python - Pandas 错误 : "pandas.hashtable.PyObjectHashTable.get_item"

python - 在 Atom 中安装 Jupyter Notebook 错误 : Installing “jupyter-notebook@0.0.10” failed

python - conda创建环境py3.7,只安装jupyter,但发现冲突

python - Mac OS X 10.10 MacPorts Python 选择

python - 如何让pandas像MS Excel一样使用单元格算术?

python - datetime.timestamp 在 pandas apply 和 dataframe 选择中返回不同的值