Adding meta-information/metadata to pandas DataFrame
Introduction
Pandas DataFrames do not have a dedicated metadata system, but there are several ways to attach extra information like data source, creation date, units, or column descriptions. The main approaches are using the attrs dictionary (pandas 1.0+), setting custom attributes directly on the DataFrame, or storing metadata alongside the data in a wrapper structure. Each approach has different trade-offs for persistence and survival through operations like copy, merge, and serialization.
Using DataFrame.attrs (Recommended)
Since pandas 1.0, every DataFrame has an attrs dictionary that is preserved through most pandas operations:
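As a minimal sketch of this, assuming a small made-up DataFrame (the keys `source` and `units` and the column name `temp` are illustrative, not anything pandas defines):

```python
import pandas as pd

df = pd.DataFrame({"temp": [21.5, 19.0, -3.2]})

# attrs is a plain dict attached to the DataFrame
df.attrs["source"] = "sensor_a"          # illustrative key
df.attrs["units"] = {"temp": "degC"}     # illustrative key

# Simple operations return new objects that carry the same attrs
copied = df.copy()
filtered = df[df["temp"] > 0]

print(copied.attrs)
print(filtered.attrs["source"])
```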
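attrs is propagated through copy(), slicing, and many other pandas operations. However, pandas documents attrs as experimental and does not guarantee that it survives every operation: complex transforms such as merge() or groupby() may drop it, and the exact behavior varies between pandas versions.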
Custom Attributes on the DataFrame
You can set arbitrary attributes directly on a DataFrame instance:
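For instance, a short sketch using ordinary Python attribute assignment (the attribute names `source` and `note` are hypothetical and should not collide with column names or existing DataFrame methods):

```python
import pandas as pd

df = pd.DataFrame({"value": [3, 1, 2]})

# Plain attribute assignment; pandas does not track these names
df.source = "lab_export"       # hypothetical attribute name
df.note = "raw, unvalidated"   # hypothetical attribute name

print(df.source)
print(df.note)
```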
This works but the attributes are lost on any operation that returns a new DataFrame:
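A small sketch of the loss, again with a hypothetical attribute name `source`:

```python
import pandas as pd

df = pd.DataFrame({"value": [3, 1, 2]})
df.source = "lab_export"  # hypothetical custom attribute

sorted_df = df.sort_values("value")   # returns a new DataFrame
filtered_df = df[df["value"] > 1]     # so does boolean filtering

print(hasattr(df, "source"))           # True
print(hasattr(sorted_df, "source"))    # False
print(hasattr(filtered_df, "source"))  # False
```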
Custom attributes only survive on the exact same object. Any pandas operation that creates a new DataFrame (filtering, sorting, merging) loses them.
Storing Metadata in a Wrapper Class
For metadata that must survive all operations, wrap the DataFrame:
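One possible shape for such a wrapper is sketched below. The class name `AnnotatedFrame` and its `pipe` method are hypothetical, chosen for illustration; the idea is simply that every transformation goes through the wrapper, which copies the metadata onto the result.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import pandas as pd


@dataclass
class AnnotatedFrame:
    """Hypothetical wrapper pairing a DataFrame with a metadata dict."""

    df: pd.DataFrame
    meta: Dict[str, Any] = field(default_factory=dict)

    def pipe(self, func: Callable[..., pd.DataFrame], *args: Any, **kwargs: Any) -> "AnnotatedFrame":
        """Apply func to the wrapped DataFrame and carry the metadata forward."""
        return AnnotatedFrame(func(self.df, *args, **kwargs), dict(self.meta))


left = AnnotatedFrame(
    pd.DataFrame({"id": [1, 2], "x": [10, 20]}),
    {"source": "sensor_a"},
)
right = pd.DataFrame({"id": [1, 2], "y": [0.1, 0.2]})

# merge() goes through the wrapper, so the metadata is carried over explicitly
merged = left.pipe(pd.merge, right, on="id")
print(merged.meta)
print(list(merged.df.columns))
```

The trade-off is that callers must go through `pipe` (or unwrap `.df` and re-wrap the result) rather than calling DataFrame methods directly.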
Column-Level Metadata
To describe individual columns, store descriptions alongside the DataFrame:
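A minimal sketch, assuming a plain dict keyed by column name (the column names, units, and descriptions are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"temp": [21.5, 19.0], "humidity": [40, 55]})

# One entry per column; the keys "unit" and "description" are illustrative
column_meta = {
    "temp": {"unit": "degC", "description": "Air temperature"},
    "humidity": {"unit": "%", "description": "Relative humidity"},
}

# Flag columns that have no description yet
missing = [col for col in df.columns if col not in column_meta]

# Render the descriptions as a small data-dictionary table
data_dictionary = pd.DataFrame.from_dict(column_meta, orient="index")
print(data_dictionary)
```

The same dict could also be stored under a key in `df.attrs`, with the propagation caveats that apply to attrs in general.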
Persisting Metadata to Disk
HDF5 (Best for Metadata)
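A commonly used pattern is to attach a dict to the stored node through `HDFStore.get_storer(key).attrs`. The sketch below assumes the optional PyTables dependency is installed; the helper names `save_hdf` and `load_hdf` and the attribute name `metadata` are hypothetical.

```python
import pandas as pd


def save_hdf(df, path, key, metadata):
    """Store df under key and attach metadata to the same HDF5 node."""
    with pd.HDFStore(path, mode="w") as store:
        store.put(key, df)
        # "metadata" is an arbitrary, user-chosen attribute name
        store.get_storer(key).attrs.metadata = metadata


def load_hdf(path, key):
    """Return the stored DataFrame together with its metadata dict."""
    with pd.HDFStore(path, mode="r") as store:
        df = store[key]
        metadata = store.get_storer(key).attrs.metadata
    return df, metadata
```

A call such as `save_hdf(df, "readings.h5", "readings", {"source": "sensor_a"})` followed by `load_hdf("readings.h5", "readings")` would be expected to round-trip both the data and the dict.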
Parquet (Partial Support)
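Whether `df.attrs` round-trips through `to_parquet()` and `read_parquet()` depends on the pandas version and engine, so the sketch below writes the metadata explicitly into the Parquet schema with pyarrow. It assumes pyarrow is installed; the helper names and the key `custom_metadata` are hypothetical, and the metadata must be JSON-serializable.

```python
import json

import pandas as pd


def save_parquet(df, path, metadata):
    """Write df to Parquet with metadata stored in the file's schema."""
    import pyarrow as pa            # optional dependency, imported lazily
    import pyarrow.parquet as pq

    table = pa.Table.from_pandas(df)
    existing = table.schema.metadata or {}
    # Keep the pandas-specific schema entries and add our own key
    merged = {**existing, b"custom_metadata": json.dumps(metadata).encode("utf-8")}
    pq.write_table(table.replace_schema_metadata(merged), path)


def load_parquet(path):
    """Return the DataFrame and the metadata dict stored by save_parquet."""
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    raw = (table.schema.metadata or {}).get(b"custom_metadata")
    metadata = json.loads(raw) if raw is not None else {}
    return table.to_pandas(), metadata
```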
JSON Sidecar File
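A sidecar file works with any data format, including CSV. The sketch below assumes `df.attrs` holds only JSON-serializable values; the helper names and the `<file>.meta.json` naming convention are hypothetical.

```python
import json
from pathlib import Path

import pandas as pd


def sidecar_path(csv_path):
    """Hypothetical naming convention: data.csv -> data.csv.meta.json."""
    csv_path = Path(csv_path)
    return csv_path.with_name(csv_path.name + ".meta.json")


def save_with_sidecar(df, csv_path):
    """Write the data as CSV and df.attrs as a JSON file next to it."""
    df.to_csv(csv_path, index=False)
    sidecar_path(csv_path).write_text(json.dumps(df.attrs, indent=2), encoding="utf-8")


def load_with_sidecar(csv_path):
    """Read the CSV and restore attrs from the sidecar file if it exists."""
    df = pd.read_csv(csv_path)
    sidecar = sidecar_path(csv_path)
    if sidecar.exists():
        df.attrs.update(json.loads(sidecar.read_text(encoding="utf-8")))
    return df
```

Keeping the two files together is the caller's responsibility: if the CSV is moved or renamed without its sidecar, the metadata is silently left behind.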
Metadata Survival Through Operations
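| Operation | attrs preserved | Custom attributes preserved |
|---|---|---|
| df.copy() | Yes | No |
| df[df['col'] > 0] | Yes | No |
| df.head() | Yes | No |
| df.merge(other) | No | No |
| df.groupby().agg() | No | No |
| pd.concat([df1, df2]) | No, unless all inputs have identical attrs (version-dependent) | No |
| df.to_csv() / read_csv() | No | No |
| df.to_parquet() / read_parquet() | Partial | No |

Results for merge, groupby, and concat vary between pandas versions; treat the "No" entries as the conservative assumption and re-attach attrs after these operations.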
Common Pitfalls
- Assuming attrs survives all operations: attrs is propagated through simple operations like copy() and slicing, but merge(), groupby(), concat(), and pivot_table() may not preserve it, and the behavior differs between pandas versions (recent versions of concat(), for example, keep attrs only when all inputs carry identical attrs). Always re-attach metadata after complex transforms.
- Setting custom attributes and expecting persistence: df.my_attr = 'value' works on the current object but is lost whenever pandas creates a new DataFrame. This happens on nearly every operation.
- Storing metadata in CSV files: CSV has no mechanism for metadata. The data is saved but attrs and custom attributes are lost. Use HDF5, Parquet, or a sidecar JSON file instead.
- Mutating attrs on a derived DataFrame: Depending on the pandas version, attrs may be copied only shallowly to the result of a slice or copy. Mutating a nested mutable value (a list or dict stored in attrs) on the derived DataFrame can then unexpectedly change the original DataFrame's attrs too.
- Using __dict__ for metadata: While df.__dict__ stores custom attributes, directly manipulating it is fragile and not part of the pandas API. Use attrs for metadata that pandas explicitly supports.
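Re-attaching metadata after a complex transform can be wrapped in a small helper. The sketch below is one way to do it; the function name `reattach_attrs` is hypothetical, and later sources overwrite earlier ones when keys collide.

```python
import copy

import pandas as pd


def reattach_attrs(result, *sources):
    """Set result.attrs to the combined attrs of the source DataFrames."""
    merged = {}
    for source in sources:
        # Deep-copy so nested values are not shared with the sources
        merged.update(copy.deepcopy(source.attrs))
    result.attrs = merged
    return result


left = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
left.attrs["source"] = "sensor_a"
right = pd.DataFrame({"id": [1, 2], "y": [0.1, 0.2]})
right.attrs["units"] = {"y": "m"}

joined = reattach_attrs(left.merge(right, on="id"), left, right)
print(joined.attrs)
```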
Summary
- Use df.attrs (pandas 1.0+) for metadata that survives copy and slicing operations
- Custom attributes (df.source = 'x') are lost on any operation that creates a new DataFrame
- Wrap DataFrames in a custom class when metadata must survive all operations
- HDF5 is the best format for persisting metadata alongside data
- CSV and most serialization formats do not preserve metadata, so use sidecar files
- Always re-attach metadata after merge(), groupby(), concat(), and similar transforms, since propagation there is not guaranteed

