PCA's Impact on Regression Performance
A comparative empirical study of how Principal Component Analysis (PCA) affects prediction metrics across regression models, including a comparison against XGBoost regression on the same shared datasets.
Mar 2023 - Aug 2023 • 5 months
Tech Stack
Python, scikit-learn, XGBoost, Statistical Analysis
Summary
Dimensionality reduction is often applied as a regularization or noise-suppression step before regression. But how much does it actually help, and when does it hurt? This study runs a controlled comparison.
- Models compared: linear regression, ridge, random forest, XGBoost
- Metrics: RMSE, MAE, R²
- Variables: with vs. without PCA preprocessing and, when PCA is applied, the number of retained components
- Findings: PCA helps tree-based models least and ill-conditioned linear models most
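The setup in the list above can be sketched as a scikit-learn pipeline loop. This is a minimal sketch under stated assumptions, not the study's actual code: synthetic data from `make_regression` stands in for the shared datasets (which this page does not name), features are standardized before PCA, the component grid and model hyperparameters are placeholders, and the `xgboost` model is included only if that package is installed. The helpers `make_models` and `evaluate` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

try:
    from xgboost import XGBRegressor  # optional dependency; skipped if missing
except ImportError:
    XGBRegressor = None


def make_models():
    """Hypothetical helper: fresh, unfitted estimators (placeholder hyperparameters)."""
    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    if XGBRegressor is not None:
        models["xgboost"] = XGBRegressor(n_estimators=100, random_state=0)
    return models


def evaluate(X, y, component_grid=(None, 2, 5, 10), seed=0):
    """Hypothetical helper: map (model, n_components) -> (RMSE, MAE, R^2).

    n_components=None means the baseline run without PCA.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    results = {}
    for k in component_grid:
        for name, model in make_models().items():
            # Assumption: standardize first, then (optionally) project onto k PCs.
            steps = [StandardScaler()]
            if k is not None:
                steps.append(PCA(n_components=k, random_state=seed))
            pipe = make_pipeline(*steps, model)
            pipe.fit(X_tr, y_tr)  # scaler and PCA are fit on training rows only
            pred = pipe.predict(X_te)
            rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
            mae = float(mean_absolute_error(y_te, pred))
            r2 = float(r2_score(y_te, pred))
            results[(name, k)] = (rmse, mae, r2)
    return results


if __name__ == "__main__":
    # Synthetic stand-in data: effective_rank < n_features yields strongly
    # collinear (ill-conditioned) features. These are not the study's datasets.
    X, y = make_regression(
        n_samples=500, n_features=20, n_informative=10,
        effective_rank=5, noise=5.0, random_state=0,
    )
    print("condition number of scaled X:",
          np.linalg.cond(StandardScaler().fit_transform(X)))
    for (name, k), (rmse, mae, r2) in evaluate(X, y).items():
        label = "none" if k is None else str(k)
        print(f"{name:>13}  PCA k={label:>4}  "
              f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

Fitting the scaler and PCA inside the pipeline means the projection is learned from the training split alone, so every with-PCA and without-PCA configuration is scored on the same untouched test rows. Printing the condition number of the scaled design matrix gives a quick check of how ill-conditioned a dataset is before reading the linear-model results.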