python - Replace non-null result cells of a Pandas pivot table with a fixed string

Tags: python pandas pivot-table

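I am trying to turn a CSV like the first example below into a CSV like the second example.

I have been playing with Pandas and think I have the basics down, but I can't figure out how to do the last transformation (from the placeholder values in the pivot to actual English words).

In the code below, the part I need help with is the comment that reads: "I need to figure out something I can put here that will replace any non-null value found in the cells of column pivottally[c] with the string 'Registered'".

Note: if you can suggest a more efficient way to go through the data than a for loop over a list of column names, feel free. The for loop was just a way to test things out during my first time using Pandas.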


Input:

First  Last  Email              Program
john   doe   [email protected]  BasketWeaving
jane   doe   [email protected]  BasketWeaving
jane   doe   [email protected]  Acrobatics
jane   doe   [email protected]  BasketWeaving
mick   jag   [email protected]  StageDiving

Desired output:

First  Last  Email              StatusBasketWeaving__c  StatusAcrobatics__c  StatusStageDiving__c
john   doe   [email protected]  registered
jane   doe   [email protected]  registered              registered
mick   jag   [email protected]                                               registered

(My code actually inserts one more column as well, but that would make the example too wide, so it is not shown here.)


Here is what I have written so far:

import pandas
import numpy

# Read in the First Name, Last Name, Email Address, & "Program Registered For" columns of a log file of registrations conducted that day.
tally = pandas.read_csv('tally.csv', names=['First', 'Last', 'Email', 'Program'])

# Rename the First Name & Last Name columns so that they're Salesforce Contact object field names
tally.rename(columns={'First':'FirstName', 'Last':'LastName'}, inplace=True)

# Create a concatenation of First, Last, & Email that can be used for later Excel-based VLOOKUP-ing Salesforce Contact Ids from a daily export of Id+Calculated_Lastname_Firstname_Email from Salesforce
tally['Calculated_Lastname_Firstname_Email__c'] = tally['LastName'] + tally['FirstName'] + tally['Email']

# Rename the values in Program so that they're ready to become field names for the Salesforce Contact object
tally['Program'] = 'Status' + tally['Program'] + '__c'

# Pivot the data by grouping on First+Last+Email+(Concatenated), listing the old registered-for-Program values as column headings, and putting
# a non-null value under that column heading if the person has any rows indicating that they registered for it.
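# (The old `rows=` / `cols=` keyword names were replaced by `index=` / `columns=` in later pandas releases.)
pivottally = pandas.pivot_table(tally, index=['FirstName', 'LastName', 'Email', 'Calculated_Lastname_Firstname_Email__c'], columns='Program', aggfunc=numpy.size)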

# Grab a list of column names that have to do with the programs themselves (these are where we'll want to replace our non-null placeholder with 'Registered')
statuscolumns = [s for s in (list(pivottally.columns.values)) if s.startswith('Status')]

for c in statuscolumns:
    #pivottally.rename(columns={c:'Hi'+c}, inplace=True) # Just a test line to make sure my for loop worked.
    # I need to figure out something I can put here that will replace any non-null value found in the cells of column pivottally[c] with the string 'Registered'
    pass  # placeholder so the loop body is valid Python until the replacement logic is filled in

print(pivottally.head())

#pivottally.to_csv('pivottally.csv')

Thanks for your help.

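Best answer

A simple boolean selection does the job. Building a list of columns and iterating over it is unnecessary, because every column of the pivot table is a status column; the other fields are in the index.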

pivottally[pandas.notnull(pivottally)] = 'registered'
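
For anyone who wants to try the selection end to end, here is a minimal sketch. Several things in it are assumptions rather than part of the question or the answer: the e-mail addresses are placeholders, the helper column `n` exists only to give `pivot_table` something to count, and the `astype(object)` step is added because newer pandas versions may warn or refuse when a string is written into a numeric column.

```python
import pandas

# Hypothetical sample data modeled on the question's input.
# The e-mail addresses are placeholders, not the original values.
tally = pandas.DataFrame({
    'First': ['john', 'jane', 'jane', 'jane', 'mick'],
    'Last': ['doe', 'doe', 'doe', 'doe', 'jag'],
    'Email': ['john@example.com', 'jane@example.com', 'jane@example.com',
              'jane@example.com', 'mick@example.com'],
    'Program': ['BasketWeaving', 'BasketWeaving', 'Acrobatics',
                'BasketWeaving', 'StageDiving'],
})

# Same renaming step as in the question.
tally['Program'] = 'Status' + tally['Program'] + '__c'

# Helper column (an assumption for this sketch) that the pivot can count.
tally['n'] = 1

# Cells hold a count where the person registered and NaN where they did not.
pivottally = pandas.pivot_table(
    tally, values='n', index=['First', 'Last', 'Email'],
    columns='Program', aggfunc='count')

# Added step, not in the answer: switch to object dtype so that
# strings and NaN can share a column.
pivottally = pivottally.astype(object)

# The selection from the answer: every non-null cell becomes 'registered'.
pivottally[pandas.notnull(pivottally)] = 'registered'

print(pivottally)
```

Because First, Last and Email sit in the index, the boolean mask only covers the Status columns, which is why no loop over column names is needed.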

Here is a screenshot of the result.

[image: result]
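
To get from that result to the flat layout shown under "Desired output", one possible follow-up (not part of the answer) is to blank out the remaining nulls and move the index back into ordinary columns. The sketch below uses a small hand-built frame as a stand-in for the pivot table after the replacement; the e-mail addresses are placeholders.

```python
import pandas

nan = float('nan')

# Hand-built stand-in for the pivot table after the replacement step.
# The e-mail addresses are placeholders, not the original values.
index = pandas.MultiIndex.from_tuples(
    [('jane', 'doe', 'jane@example.com'),
     ('john', 'doe', 'john@example.com'),
     ('mick', 'jag', 'mick@example.com')],
    names=['First', 'Last', 'Email'])
pivottally = pandas.DataFrame(
    {'StatusAcrobatics__c': ['registered', nan, nan],
     'StatusBasketWeaving__c': ['registered', 'registered', nan],
     'StatusStageDiving__c': [nan, nan, 'registered']},
    index=index)

# Replace the remaining nulls with empty strings, then turn the
# First/Last/Email index levels back into regular columns.
flat = pivottally.fillna('').reset_index()

# index=False keeps the row counter out of the CSV text.
print(flat.to_csv(index=False))
```

Passing a file name to `to_csv` instead of printing would write the same content to disk, as the commented-out last line of the question's script intends.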

Source: the original question, "python - Replace non-null result cells of a Pandas pivot table with a fixed string", is on Stack Overflow: https://stackoverflow.com/questions/32727471/
