python - Attaching data source information to a pandas Series

Tags: python pandas metadata series

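Is there a way to attach information about the data source to a pandas Series? At the moment I just add columns to the DataFrame to indicate where each variable came from...

Thanks very much for your ideas and suggestions!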

Best answer

From the official pandas documentation:

To let original data structures have additional properties, you should let pandas know what properties are added. pandas maps unknown properties to data names overriding __getattribute__. Defining original properties can be done in one of 2 ways:

  1. Define _internal_names and _internal_names_set for temporary properties which WILL NOT be passed to manipulation results.

  2. Define _metadata for normal properties which will be passed to manipulation results.

Below is an example to define two original properties, “internal_cache” as a temporary property and “added_property” as a normal property

import pandas as pd

class SubclassedDataFrame2(pd.DataFrame):

    # temporary properties
    _internal_names = pd.DataFrame._internal_names + ['internal_cache']
    _internal_names_set = set(_internal_names)

    # normal properties
    _metadata = ['added_property']

    # make manipulation results come back as this subclass
    @property
    def _constructor(self):
        return SubclassedDataFrame2

>>> df = SubclassedDataFrame2({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> df
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

>>> df.internal_cache = 'cached'
>>> df.added_property = 'property'

>>> df.internal_cache
cached
>>> df.added_property
property

# properties defined in _internal_names is reset after manipulation
>>> df[['A', 'B']].internal_cache
AttributeError: 'SubclassedDataFrame2' object has no attribute 'internal_cache'

# properties defined in _metadata are retained
>>> df[['A', 'B']].added_property
property

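As you can see, the benefit of defining custom attributes through _metadata is that they are propagated automatically during (most) one-to-one DataFrame operations. Note that during many-to-one DataFrame operations (for example merge() or concat()), your custom attributes will still be lost.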
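Since the question is about a Series rather than a DataFrame, here is a minimal sketch of the same _metadata mechanism applied to a pd.Series subclass. The class name SourcedSeries, the attribute name source, and the sample values are hypothetical and chosen only for illustration; the sketch extends pd.Series._metadata instead of replacing it, on the assumption that the Series name should keep propagating as usual.

```python
import pandas as pd


class SourcedSeries(pd.Series):
    """Series subclass with a hypothetical `source` attribute."""

    # Extend the base list so existing Series metadata is kept,
    # and add our own attribute name to be propagated.
    _metadata = pd.Series._metadata + ['source']

    # Make manipulation results come back as this subclass.
    @property
    def _constructor(self):
        return SourcedSeries


s = SourcedSeries([1.0, 2.0, 3.0], name='gdp')
s.source = 'World Bank'  # illustrative value, not real data

# One-to-one operation: the attribute is carried over to the result.
head = s.head(2)
print(type(head).__name__, head.source)
```

As with the DataFrame example, this should only be expected to survive one-to-one operations; after combining several objects, the attribute would need to be set again by hand.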

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/52699153/
