Credit Risk Forecasting
1. Project Overview
This repository is a full pipeline for the CS5344 (Track 2) finance task: it ingests tabular/temporal credit data, engineers domain-specific features, benchmarks baseline models, and trains optimized ensembles for default risk prediction. The repository covers data prep, feature engineering, baseline experiments, Bayesian hyperparameter search, and final models ready for evaluation.
2. Workflow & Components
- Data & docs: data/, plus column and feature explanations (列说明文档.md, 特征说明文档.md, Feature_Documentation_EN.md).
- Feature engineering: feature_engineering/ scripts plus feature_tests/ for validation.
- Baselines: baseline.py, with results stored under baseline_models/results/.
- Optimization: Bayesian Optimization.py for tuning key hyperparameters.
- Final models: packaged in final_models/, with model.py for inference.
- Utilities: requirements.txt for environment setup; PDF problem statement (CS5344_Formal_Problem_Formulation.pdf).
3. Models & Techniques
- Gradient boosting and tree ensembles as core predictors.
- Bayesian optimization to search learning rates, depths, and regularization.
- Feature importance checks to align signals with financial intuition.
- Train/validation splits and a hold-out test set to check generalization.
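The train/validation/hold-out idea in the last bullet can be sketched in plain Python. This is a minimal illustration, not the repository's actual splitting code: the function name `stratified_split`, the 60/20/20 fractions, the seed, and the synthetic labels are all assumptions made for the example. Stratifying by label is used here because default labels are typically imbalanced, so a plain random split could leave one subset with too few defaults.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.2, test_frac=0.2, seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions.

    The fractions and seed are illustrative assumptions, not repository settings.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to less than 1")

    rng = random.Random(seed)

    # Group row indices by class label so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        n_val = round(len(indices) * val_frac)
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return sorted(train), sorted(val), sorted(test)


if __name__ == "__main__":
    # Synthetic labels for illustration only: 1 = default, 0 = no default.
    y = [1] * 100 + [0] * 900
    train_idx, val_idx, test_idx = stratified_split(y)
    for name, idx in (("train", train_idx), ("val", val_idx), ("test", test_idx)):
        rate = sum(y[i] for i in idx) / len(idx)
        print(f"{name}: n={len(idx)}, default rate={rate:.3f}")
```

For temporal credit data, a split by time (train on earlier periods, hold out the latest) may be more appropriate than a random stratified split. Which of the two the project uses is not stated here.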
4. How to Run
- pip install -r requirements.txt
- Prepare data under data/ following the provided column spec.
- Run baselines: python baseline.py (results in baseline_models/results/).
- Tune: python "Bayesian Optimization.py" to sweep hyperparameters.
- Train/evaluate final model: python model.py
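The scripted steps above could be chained with a small driver, sketched below. The driver itself is hypothetical (the repository does not ship one as far as this README says); only the script names and their order come from the steps listed. It assumes it is run from the repository root and that data has already been placed under data/, since that step is manual.

```python
import shlex
import subprocess
import sys

# Commands mirror the steps listed in this README; running them from the
# repository root is an assumption of this sketch.
STEPS = [
    ("install dependencies", [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]),
    ("baselines", [sys.executable, "baseline.py"]),
    ("hyperparameter tuning", [sys.executable, "Bayesian Optimization.py"]),
    ("final model", [sys.executable, "model.py"]),
]


def run_pipeline(steps=STEPS, dry_run=True):
    """Run each step in order, stopping at the first failure.

    Returns the names of the steps that ran (or would run in a dry run).
    """
    done = []
    for name, cmd in steps:
        print(f"[{name}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on top of a failed one.
            subprocess.run(cmd, check=True)
        done.append(name)
    return done


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the commands.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Passing each command as an argument list avoids shell quoting problems with the space in the file name Bayesian Optimization.py.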
5. Highlights
- Clear documentation for columns/features eases reproducibility.
- Modular scripts split by stage (baseline → feature eng → tuning → final).
- Results and artifacts versioned for comparison.