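Is there a way to attach data-source information to a pandas Series? At the moment I just add columns to the DataFrame to indicate the source of each variable...
Thanks a lot for your ideas and suggestions!
Best answer
From the official pandas documentation: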
To let original data structures have additional properties, you should let pandas know what properties are added. pandas maps unknown properties to data names overriding __getattribute__. Defining original properties can be done in one of 2 ways:

1. Define _internal_names and _internal_names_set for temporary properties which WILL NOT be passed to manipulation results.
2. Define _metadata for normal properties which will be passed to manipulation results.

Below is an example to define two original properties, "internal_cache" as a temporary property and "added_property" as a normal property

    import pandas as pd

    class SubclassedDataFrame2(pd.DataFrame):

        # temporary properties
        _internal_names = pd.DataFrame._internal_names + ['internal_cache']
        _internal_names_set = set(_internal_names)

        # normal properties
        _metadata = ['added_property']

        @property
        def _constructor(self):
            # return the subclass so that results of operations keep this type
            return SubclassedDataFrame2

    >>> df = SubclassedDataFrame2({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
    >>> df
       A  B  C
    0  1  4  7
    1  2  5  8
    2  3  6  9

    >>> df.internal_cache = 'cached'
    >>> df.added_property = 'property'

    >>> df.internal_cache
    cached
    >>> df.added_property
    property

    # properties defined in _internal_names is reset after manipulation
    >>> df[['A', 'B']].internal_cache
    AttributeError: 'SubclassedDataFrame2' object has no attribute 'internal_cache'

    # properties defined in _metadata are retained
    >>> df[['A', 'B']].added_property
    property

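As you can see, the benefit of defining custom properties through _metadata is that they are propagated automatically during (most) one-to-one DataFrame operations. Note, however, that your custom properties will still be lost during many-to-one DataFrame operations such as merge() or concat().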
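
Since the question is about a Series rather than a DataFrame, here is a minimal sketch of the same _metadata approach applied to a pd.Series subclass. The class name SourcedSeries, the attribute name source, and the sample values are made up for illustration and do not come from the pandas documentation. The sketch extends pd.Series._metadata instead of replacing it, on the assumption that the parent's own entries (which track the series name) should be kept.

```python
import pandas as pd


class SourcedSeries(pd.Series):
    # Hypothetical attribute "source"; extend the parent's list rather than
    # overwrite it, so the name handling pandas already does stays intact.
    _metadata = pd.Series._metadata + ["source"]

    @property
    def _constructor(self):
        # Make operations on this object return SourcedSeries, not pd.Series
        return SourcedSeries


gdp = SourcedSeries([1.0, 2.0, 3.0], name="gdp")
gdp.source = "World Bank"  # illustrative value only

# One-to-one operation: the result is built from a single input object,
# so pandas copies the _metadata attributes across.
doubled = gdp * 2
print(type(doubled).__name__, doubled.source)

# Many-to-one operation: per the caveat above, the attribute is not expected
# to survive, so read it defensively with a default.
other = SourcedSeries([4.0, 5.0], name="gdp")
other.source = "IMF"  # illustrative value only
combined = pd.concat([gdp, other])
print(getattr(combined, "source", None))
```

If the source has to survive merge() or concat(), you would need to decide yourself how to combine the differing values and reattach the result afterwards. Nothing in the quoted documentation does that for you.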
Regarding python - attaching data-source information to a pandas Series, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52699153/