Feature Engineering and Feature Stores
Every machine learning model, no matter how sophisticated, sees the world through features. A feature is a single measurable property of the phenomenon you are modeling: a column in the table that the model actually reads. Raw data almost never arrives in a form the model can use directly. A timestamp like "2025-03-15T14:32:07Z" means nothing to a gradient-boosted tree. But "hour of day = 14", "is_weekend = true" (March 15, 2025 is a Saturday), and "minutes since last purchase = 47" are features the model can learn from.
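As a minimal sketch of the timestamp example above, the function below derives the three features from a raw event timestamp. The function names and the second argument (the time of the previous purchase) are assumptions made for illustration, and the parser only handles UTC timestamps in the exact "Z"-suffixed format shown.

```python
from datetime import datetime, timezone


def _parse_utc(ts: str) -> datetime:
    # Parse an ISO-8601 timestamp with a literal "Z" suffix as UTC.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def timestamp_features(event_ts: str, last_purchase_ts: str) -> dict:
    """Turn a raw event timestamp into features a model can read."""
    event = _parse_utc(event_ts)
    last_purchase = _parse_utc(last_purchase_ts)
    elapsed_seconds = (event - last_purchase).total_seconds()
    return {
        "hour_of_day": event.hour,
        # datetime.weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
        "is_weekend": event.weekday() >= 5,
        "minutes_since_last_purchase": int(elapsed_seconds // 60),
    }


# Hypothetical previous purchase, 47 minutes before the event.
features = timestamp_features("2025-03-15T14:32:07Z", "2025-03-15T13:45:07Z")
print(features)
```

Which derived features are worth keeping depends on the problem; the point is that each one is a plain number or flag computed from the raw value.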
The gap between raw data and useful features is where feature engineering lives, and it is often the single biggest lever for model quality. Teams that obsess over model architecture while ignoring feature quality are optimizing the wrong thing. On tabular problems, a logistic regression with excellent features will often outperform a deep neural network with poor features.
Feature Types
Numerical features are continuous or discrete numbers the model uses directly. Age, transaction amount, number of items in cart, days since account creation. These are the simplest features because they are already numbers, but they still require thought — should you use raw age or age bucket? Raw transaction amount or log-transformed amount?
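The two choices raised above (raw age versus age bucket, raw amount versus log-transformed amount) can be sketched as follows. The bucket edges are hypothetical and would normally come from domain knowledge or the training distribution; `log1p` is used here so that an amount of zero maps cleanly to zero.

```python
import math
from bisect import bisect_right

# Hypothetical bucket edges, chosen only for illustration.
AGE_BUCKET_EDGES = [18, 25, 35, 50, 65]


def age_bucket(age: float) -> int:
    """Return the index of the bucket containing `age`.

    Bucket 0 is everything below the first edge; bucket len(edges) is
    everything at or above the last edge.
    """
    return bisect_right(AGE_BUCKET_EDGES, age)


def log_amount(amount: float) -> float:
    """Compress a heavy-tailed, non-negative amount with log(1 + x)."""
    return math.log1p(max(amount, 0.0))


print(age_bucket(30), log_amount(99.0))
```

Bucketing trades resolution for robustness to outliers and nonlinear effects, while the log transform keeps ordering but shrinks the gap between very large values.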
Categorical features represent discrete groups, usually with no inherent ordering. Country, device type, payment method, subscription tier (which is ordered, though the order is rarely a meaningful numeric scale). Most models cannot consume strings directly, so these must be encoded: one-hot encoding for low-cardinality features (payment method with 5 options), target encoding or embeddings for high-cardinality features (city with 10,000 options).
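A rough sketch of the two encodings named above, written in plain Python rather than with any particular library. The smoothed form of target encoding shown here is one common variant and is an assumption, as are the function names and the `smoothing` parameter; the encoding should be fitted on training data only, otherwise the target leaks into the features.

```python
from collections import defaultdict


def one_hot(value: str, vocabulary: list) -> list:
    """One slot per known category; an unseen value yields all zeros."""
    return [1 if value == known else 0 for known in vocabulary]


def fit_target_encoding(categories: list, targets: list, smoothing: float = 10.0):
    """Map each category to a smoothed mean of the target.

    Rare categories are pulled toward the global mean, so a city seen
    only once does not get an extreme encoded value.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }
    # The global mean doubles as the fallback for categories unseen in training.
    return encoding, global_mean


print(one_hot("card", ["card", "paypal", "wire"]))
encoding, fallback = fit_target_encoding(["a", "a", "b"], [1, 1, 0], smoothing=1.0)
print(encoding.get("c", fallback))
```

One-hot grows one column per category, which is why it suits the 5-option case; target encoding stays at a single column regardless of cardinality.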
Embedding features are dense vector representations learned from data. A user embedding from a collaborative filtering model captures purchasing behavior in 64 dimensions. A text embedding from a sentence transformer captures semantic meaning in 384 dimensions. These are powerful because they compress complex patterns into fixed-size vectors the model can use alongside tabular features.
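Using an embedding "alongside tabular features" usually just means concatenating it onto the feature vector. The sketch below assumes the embedding has already been produced by some upstream model; the function name and the dimension check are illustrative, not part of any specific library.

```python
def build_feature_vector(tabular: list, user_embedding: list, expected_dim: int = 64) -> list:
    """Concatenate tabular features with a fixed-size embedding."""
    if len(user_embedding) != expected_dim:
        # A dimension mismatch usually means the embedding model changed
        # without the downstream model being retrained.
        raise ValueError(
            f"expected embedding of size {expected_dim}, got {len(user_embedding)}"
        )
    return list(tabular) + list(user_embedding)


# Placeholder values: three tabular features plus a 64-dimensional embedding.
vector = build_feature_vector([34.0, 1.0, 4.6], [0.0] * 64)
print(len(vector))
```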
Aggregation features summarize behavior over time windows. Count of purchases in the last 7 days, average session duration over the past month, maximum transaction amount in the last hour. These are often the most predictive features in fraud detection, recommendation, and churn prediction because they capture behavioral trends that point-in-time snapshots miss.
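A small sketch of a windowed aggregation, assuming events arrive as `(timestamp, amount)` pairs for a single entity. The names are hypothetical. Note that the window excludes anything at or after the `as_of` time, so the feature only uses information that was available when the prediction is made.

```python
from datetime import datetime, timedelta


def window_aggregates(events: list, as_of: datetime, window: timedelta) -> dict:
    """Summarize amounts for events in the half-open window [as_of - window, as_of)."""
    window_start = as_of - window
    amounts = [amount for ts, amount in events if window_start <= ts < as_of]
    return {
        "count": len(amounts),
        "max_amount": max(amounts, default=0.0),
        "mean_amount": sum(amounts) / len(amounts) if amounts else 0.0,
    }


purchases = [
    (datetime(2025, 3, 1, 9, 0), 500.0),   # older than 7 days, excluded
    (datetime(2025, 3, 10, 9, 0), 50.0),
    (datetime(2025, 3, 14, 12, 0), 20.0),
]
last_7_days = window_aggregates(purchases, datetime(2025, 3, 15, 12, 0), timedelta(days=7))
print(last_7_days)
```

The same function with `timedelta(hours=1)` or `timedelta(days=30)` gives the other windows mentioned above; at scale the scan over a Python list would be replaced by a batch or streaming job.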
The difference between a mediocre model and a great model is more often features than architecture. A senior ML engineer typically spends more time understanding the data and crafting features than tuning hyperparameters. Domain knowledge is the key ingredient: knowing that 'days since last login' matters more than 'total logins' for churn prediction requires understanding the business, not just the math.
Why Feature Engineering Is Hard at Scale
On a laptop with a CSV file, feature engineering is straightforward: write a pandas script, compute your features, train your model. At production scale, everything changes. Features must be computed for millions of entities. Some features need real-time computation (what the user clicked 30 seconds ago). Others need historical computation (average spend over 90 days). The same feature must produce identical values during training and serving. If it does not, the model receives inputs in production that differ from what it was trained on (training-serving skew), and its predictions degrade in ways that are hard to detect.
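One simple way to keep training and serving values identical is to route both paths through a single feature function instead of maintaining two implementations. The sketch below illustrates that idea only; the field names (`amount`, `device_type`) and function names are hypothetical, and a real system also has to guarantee that both paths read the same underlying data.

```python
import math


def compute_features(raw: dict) -> dict:
    """Single source of truth for feature logic, shared by both paths."""
    return {
        "log_amount": math.log1p(max(raw["amount"], 0.0)),
        "is_mobile": 1 if raw["device_type"] == "mobile" else 0,
    }


def build_training_rows(history: list) -> list:
    # Offline path: apply the shared logic to every historical record.
    return [compute_features(record) for record in history]


def serve_features(request: dict) -> dict:
    # Online path: apply the same logic to one live request.
    return compute_features(request)


record = {"amount": 99.0, "device_type": "mobile"}
assert build_training_rows([record])[0] == serve_features(record)
```

If the offline path were instead a separate SQL query or pandas script, any small divergence (a different rounding rule, a different default for missing values) would silently change what the model sees in production.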
This is the problem that feature stores solve, and the rest of this lesson builds toward that architecture. But first, you need to understand the transformation patterns that turn raw data into model-ready features.