python - How to get unique values from a DataFrame using Pandas?

Tags: python pandas

I have a df:

2016-06-21 06:25:09 upi88@yandex.ru GET HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 application/json    2130    https://edge-chat.facebook.com/pull?channel=p_100006170407238&seq=27&clientid=1d67ca6e&profile=mobile&partition=-2&sticky_token=185&msgs_recv=27&qp=y&cb=1830997782&state=active&sticky_pool=frc3c09_chat-proxy&uid=100006170407238&viewer_uid=100006170407238&m_sess=&__dyn=1Z3p5wnE-4UpwDF3GAgy78qzoC6Erz8B0GxG9xu3Z0QwFzohxO3O2G2a1mwYxm48sxadwpVEy1qK78gwUx6&__req=79&__ajax__=AYlbtcBwGC2suZLI-J88V0PWa58vtQeG3YlQLydFRsAl6UwLSjsSpD7peu8mGl6NsHvd2zxfDcB6A0-XunBugUsYZ1lMYmUu97R43iV7XSfpyg&__user=100006170407238
2016-06-22 06:25:20 upi88@yandex.ru POST HTTP/1.1   Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 application/x-javascript    20248   https://m.facebook.com/stories.php?aftercursor=MTQ2NjY2MzEwNToxNDY2NjYzMTA1Ojg6NzM0ODg0MDExMjAyNDY1MzA5NToxNDY2NjYyNzk1OjA%3D&tab=h_nor&__m_log_async__=1
2016-06-23 06:25:25 upi88@yandex.ru CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-23 06:25:25 upi88@yandex.ru GET HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 text/html   1105    https://m.facebook.com/xti.php?xt=2.qid.6299270070554694533%3Amf_story_key.343726573953754118%3Aei.AI%40ecf11fb3faf9c0b1f73ce2a74bc9f228
2016-06-24 06:25:25 lemuska@mail.ru CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-25 06:25:25 lemuska@mail.ru CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-25 06:25:25 lemuska@mail.ru CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443

For each ID I need to get the unique dates (year, month and day only). Desired output:

upi88@yandex.ru - 2016-06-21, 2016-06-22, 2016-06-23
lemuska@mail.ru - 2016-06-24, 2016-06-25

How can I get these dates?

Best answer

You can first extract the part of each date that you need (the first 10 characters, YYYY-MM-DD):

df['filtered date'] = [w[:10] for w in df['date']]
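A vectorized sketch of the same step, assuming (as the answer does) that the log is already in a DataFrame with a string column 'date' holding date and time and a column 'id' holding the e-mail; the sample rows below are a hypothetical excerpt. `.str[:10]` does the same slicing as the list comprehension, while `pd.to_datetime(...).dt.date` is a swapped-in alternative that also works if 'date' is already a datetime64 column (where slicing would fail).

```python
import pandas as pd

# Hypothetical excerpt of the question's data, reduced to the columns the answer uses.
df = pd.DataFrame({
    'date': ['2016-06-21 06:25:09', '2016-06-22 06:25:20', '2016-06-23 06:25:25'],
    'id': ['upi88@yandex.ru'] * 3,
})

# Vectorized equivalent of [w[:10] for w in df['date']]: first 10 characters of each string.
df['filtered date'] = df['date'].str[:10]

# Alternative: parse to datetime and keep only the calendar date (datetime.date objects).
df['day'] = pd.to_datetime(df['date']).dt.date

print(df[['id', 'filtered date', 'day']])
```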

Then use `drop_duplicates`:

output = df[['id','filtered date']].drop_duplicates()
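To illustrate what `drop_duplicates` does here, a minimal sketch on a hypothetical frame that mirrors the question's seven rows after the date has been trimmed: it compares whole rows of the selected columns and, by default (`keep='first'`), keeps the first occurrence of each (id, date) pair, so the original index labels of the kept rows survive.

```python
import pandas as pd

# Hypothetical frame: 'id' plus the already-trimmed 'filtered date'.
df = pd.DataFrame({
    'id': ['upi88@yandex.ru'] * 4 + ['lemuska@mail.ru'] * 3,
    'filtered date': ['2016-06-21', '2016-06-22', '2016-06-23', '2016-06-23',
                      '2016-06-24', '2016-06-25', '2016-06-25'],
})

# Rows 3 and 6 repeat an earlier (id, date) pair and are dropped.
output = df[['id', 'filtered date']].drop_duplicates()
print(output)
```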

For clarity, you can sort the DataFrame:

output.sort_values(by=['id', 'filtered date'], inplace=True)

You end up with output like this:

    id               filtered date
4   lemuska@mail.ru  2016-06-24
5   lemuska@mail.ru  2016-06-25
0   upi88@yandex.ru  2016-06-21
1   upi88@yandex.ru  2016-06-22
2   upi88@yandex.ru  2016-06-23
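The result above has one row per (id, date) pair. To get the exact "id - date, date" layout asked for in the question, one option (a sketch, not part of the original answer) is to group by id and join each group's distinct dates. The frame below is a hypothetical reconstruction of the question's data using the column names 'date' and 'id' that the answer assumes.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame (only the columns used here).
df = pd.DataFrame({
    'date': ['2016-06-21 06:25:09', '2016-06-22 06:25:20', '2016-06-23 06:25:25',
             '2016-06-23 06:25:25', '2016-06-24 06:25:25', '2016-06-25 06:25:25',
             '2016-06-25 06:25:25'],
    'id': ['upi88@yandex.ru'] * 4 + ['lemuska@mail.ru'] * 3,
})

# Keep only year-month-day.
df['filtered date'] = df['date'].str[:10]

# For each id: distinct dates, sorted, joined into one comma-separated string.
per_id = df.groupby('id')['filtered date'].agg(
    lambda dates: ', '.join(sorted(dates.unique()))
)

# groupby sorts the keys by default; groupby('id', sort=False) keeps first-appearance order.
for user, dates in per_id.items():
    print(f'{user} - {dates}')
```

With the default sorted keys this prints the lemuska@mail.ru line first; pass `sort=False` to `groupby` if the order of first appearance in the log matters.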

Regarding "python - How to get unique values from a DataFrame using Pandas?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38306478/
