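I'm using a Jupyter notebook for the first time. I'm trying to group by one column of a CSV file and get the count of each value. With the code below I got the following result.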
import pandas
df = pandas.read_csv('a.csv', sep=',')
df.groupby('name').name.count()
name
>Aa</TOPONYM> 4
>Aachen</TOPONYM> 5
>Aartselaar</TOPONYM> 1
>Abadan</TOPONYM> 1
>Abaya</TOPONYM> 1
>Abba</TOPONYM> 12
>Abbey 2
>Abbeydale</TOPONYM> 1
>Abbot</TOPONYM> 2
>Abbots 3
>Abbotsford</TOPONYM> 22
>Abbotsinch</TOPONYM> 5
>Abbotts 1
>Abel</TOPONYM> 1
>Aberchirder</TOPONYM> 2
>Aberdare</TOPONYM> 3
>Aberdeen 1
>Aberdeen</TOPONYM> 163
>Aberdeenshire</TOPONYM> 286
>Aberdour</TOPONYM> 9
>Aberfan</TOPONYM> 1
>Aberfeldy</TOPONYM> 16
>Abergavenny</TOPONYM> 4
>Aberlady 1
>Aberlady</TOPONYM> 3
>Abernethy</TOPONYM> 1
>Abertay 1
>Abertillery</TOPONYM> 6
>Abha</TOPONYM> 2
>Abidjan</TOPONYM> 10
...
>Zakho</TOPONYM> 20
>Zakopane</TOPONYM> 1
>Zambezi 2
>Zambezi</TOPONYM> 8
>Zambia</TOPONYM> 19
>Zamboanga</TOPONYM> 4
>Zandak</TOPONYM> 3
>Zanzibar</TOPONYM> 11
>Zaragosa</TOPONYM> 1
>Zaragoza</TOPONYM> 4
>Zeebrugge</TOPONYM> 28
>Zeeland</TOPONYM> 2
>Zemun</TOPONYM> 1
>Zenica</TOPONYM> 12
>Zermatt</TOPONYM> 5
>Zetland</TOPONYM> 1
>Zhizhong</TOPONYM> 1
>Zhongshan</TOPONYM> 2
>Zhuhai</TOPONYM> 1
>Zimbabwe</TOPONYM> 377
>Znamenskoye</TOPONYM> 1
>Zoetermeer</TOPONYM> 1
>Zola</TOPONYM> 1
>Zomba</TOPONYM> 3
>Zulu</TOPONYM> 1
>Zululand</TOPONYM> 2
>Zuni</TOPONYM> 2
>Zurich</TOPONYM> 86
>Zvornik</TOPONYM> 3
>Zwolle</TOPONYM> 1
Name: name, Length: 8585, dtype: int64
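Is it possible to get the counts letter by letter? First I would run the command for the letter a and get all values starting with a, then do the same for b, and so on. Alternatively, is it possible to get the values while skipping the first 100?
My real data looks like this: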
<TOPONYM geonameid="2657540" lat="51.24827" lon="-0.76389" >Aldershot</TOPONYM>
<TOPONYM geonameid="3037854" lat="49.9" lon="2.3" >Amiens</TOPONYM>
<TOPONYM geonameid="6216857" lat="-43.59832" lon="171.55011" >Alaska</TOPONYM>
<TOPONYM geonameid="3037854" lat="49.9" lon="2.3" >Amiens</TOPONYM>
<TOPONYM geonameid="2759794" lat="52.37403" lon="4.88969" >Amsterdam</TOPONYM>
<TOPONYM geonameid="7216668" lat="28.0106" lon="-82.1184" >Alabama</TOPONYM>
<TOPONYM geonameid="5884078" lat="48.98339" lon="-73.34907" >Ally</TOPONYM>
<TOPONYM geonameid="2507480" lat="36.7525" lon="3.04197" >Algiers</TOPONYM>
<TOPONYM geonameid="2759794" lat="52.37403" lon="4.88969" >Amsterdam</TOPONYM>
<TOPONYM geonameid="2759794" lat="52.37403" lon="4.88969" >Amsterdam</TOPONYM>
Best answer
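You can index with str[0] to select the first character of each name (use str[1] instead if the values still carry the leading ">" seen in the output above), then count the results with value_counts: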
df = pandas.read_csv('a.csv')
a = df['name'].str[0].value_counts().rename_axis('alph').reset_index(name='count')
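As a rough sketch of this first approach on made-up data: the sample values below only imitate the `>Name</TOPONYM>` shape from the question's output, and the cleaning step with `str.extract` is an assumption about how the names might be normalized, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample imitating the question's 'name' column
df = pd.DataFrame({"name": [
    ">Aldershot</TOPONYM>", ">Amiens</TOPONYM>", ">Alaska</TOPONYM>",
    ">Amiens</TOPONYM>", ">Amsterdam</TOPONYM>", ">Zurich</TOPONYM>",
    ">Zambezi", ">Bath</TOPONYM>",
]})

# Assumed cleaning step: keep the text after '>' up to an optional '<'
clean = df["name"].str.extract(r">([^<]+)", expand=False).str.strip()

# First letter of each cleaned name, then count occurrences per letter
letter_counts = (
    clean.str[0]
    .value_counts()
    .rename_axis("alph")
    .reset_index(name="count")
)
print(letter_counts)
```

Cleaning first also means that variants such as `>Aberdeen` and `>Aberdeen</TOPONYM>` would fall under the same name if you later count per name.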
Another solution uses groupby on the first character:
a = df['name'].groupby(df['name'].str[0]).count().reset_index(name='count')
Or the same with size in place of count:
a = df['name'].groupby(df['name'].str[0]).size().reset_index(name='count')
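The question also asks for per-name counts one letter at a time, and for a way to skip the first 100 values. One possible sketch follows, assuming the names have already been cleaned; the sample data and the helper `counts_for_letter` are hypothetical and are not taken from the original answer.

```python
import pandas as pd

# Hypothetical cleaned names; in the question these would come from a.csv
names = pd.Series(
    ["Aachen", "Aberdeen", "Aberdeen", "Bath", "Zurich", "Zurich", "Zola"],
    name="name",
)

# Per-name counts; groupby sorts the group keys, so the index is alphabetical
counts = names.groupby(names).size()

def counts_for_letter(counts, letter):
    """Return the counts whose name starts with `letter` (case-insensitive)."""
    mask = counts.index.str.upper().str.startswith(letter.upper())
    return counts[mask]

# All names starting with "a"
a_counts = counts_for_letter(counts, "a")
print(a_counts)

# Skip the first n entries by position; n=2 here, it would be 100 in the question
rest = counts.iloc[2:]
print(rest)
```

Calling `counts_for_letter(counts, "b")`, then `"c"`, and so on walks through the alphabet, while `counts.iloc[100:]` would drop the first 100 entries of the full result.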
Regarding python - Jupyter notebook IPython: Groupby based on the values alphabetically, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47201227/