Daegu Apartment Price Prediction
This project develops a machine learning model that predicts apartment prices in Daegu, South Korea. The pipeline covers preprocessing (encoding, VIF analysis), benchmarking four baseline models, hyperparameter tuning with RandomizedSearchCV, ensemble methods (Voting and Stacking), and a final evaluation on a holdout test set. The final model is saved in .pkl format for deployment.
Detailed Insights
Data Cleaning & Preprocessing
The 4,123 raw rows were reduced to 2,701 after duplicate removal (1,422 rows, or 34.5%, removed). An Age feature was engineered as Age = CurrentYear - YearBuilt, and multicollinearity was addressed through VIF analysis (VIF: 142.4).
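The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy data, the `current_year` value, and the helper names `add_age` and `vif` are assumptions. VIF is computed directly from its definition, 1 / (1 - R²), where R² comes from regressing each feature on all the others.

```python
import numpy as np
import pandas as pd


def add_age(df, current_year):
    """Drop exact duplicate rows, then derive Age = current_year - YearBuilt."""
    out = df.drop_duplicates().reset_index(drop=True)
    out["Age"] = current_year - out["YearBuilt"]
    return out


def vif(df):
    """Variance inflation factor per column: 1 / (1 - R^2)."""
    X = df.to_numpy(dtype=float)
    scores = {}
    for j, name in enumerate(df.columns):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        scores[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(scores)


# Toy data (hypothetical values): the first two rows are duplicates.
raw = pd.DataFrame({"YearBuilt": [2000, 2000, 2010, 2015],
                    "Size(sqf)": [900, 900, 1200, 700]})
clean = add_age(raw, current_year=2024)  # current_year is an assumption

# Synthetic features: "c" is almost a copy of "a", so both get a large VIF.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
vif_scores = vif(pd.DataFrame({"a": a, "b": b, "c": c}))
```

A feature with a very high VIF is largely explained by the other features, which makes it a natural candidate for removal or merging.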
Model Benchmarking & Evaluation
Four models were benchmarked: Linear Regression, Random Forest, XGBoost, and Decision Tree. Random Forest with a log transform was selected as the best model, reaching R² = 0.786 on the test set.
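One common way to combine a Random Forest with a log transform is to log-transform the target and invert the transform at prediction time. The sketch below assumes that this is what "log transform" refers to here. The synthetic data, the hyperparameters, and the helper name `make_log_rf` are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def make_log_rf(random_state=42):
    """Random Forest fitted on log1p(price); predictions are mapped back with expm1."""
    return TransformedTargetRegressor(
        regressor=RandomForestRegressor(n_estimators=50, random_state=random_state),
        func=np.log1p,
        inverse_func=np.expm1,
    )


# Synthetic stand-in data: price grows with size, with multiplicative noise.
rng = np.random.default_rng(0)
size = rng.uniform(300, 2000, size=300)
price = 150 * size * np.exp(rng.normal(0, 0.1, size=300))
X = size.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)
model = make_log_rf().fit(X_train, y_train)

# score() reports R^2 on the original price scale, not the log scale.
score = model.score(X_test, y_test)
predictions = model.predict(X_test)
```

Because the inverse transform is applied automatically, metrics computed on `predictions` are directly comparable with those of models trained on the untransformed price.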
Feature Importance
Apartment size (feature Size(sqf)) and hallway type (feature HallwayType terraced) dominated the predictions, with a combined contribution of ~47%. The results also confirmed that proximity to universities is associated with premium property prices in Daegu.
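A combined contribution like the one above can be obtained by ranking a fitted forest's importances and summing the top entries. The sketch below assumes impurity-based importances (`feature_importances_`), which may differ from the method actually used in this project. The data is synthetic, and the column names `HallwayType_terraced` and `noise` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data with one strong, one moderate, and one irrelevant feature.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "Size(sqf)": rng.uniform(300, 2000, 400),
    "HallwayType_terraced": rng.integers(0, 2, 400),
    "noise": rng.normal(size=400),
})
y = (150 * X["Size(sqf)"]
     + 80_000 * X["HallwayType_terraced"]
     + rng.normal(0, 5_000, 400))

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances sum to 1, so they can be read as shares.
importances = (pd.Series(forest.feature_importances_, index=X.columns)
               .sort_values(ascending=False))
top2_share = importances.iloc[:2].sum()
```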
Tech Stack
Key Results
- R²=0.786, MAPE=18.7%, RMSE=47,708 KRW
- Top feature: Size(sqf) 30.7%
- Model saved (.pkl, 2.99 MB)
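For reference, the three reported metrics and the .pkl export can be sketched as below. The numbers in the example are toy values, not the project's data. The helper name `regression_metrics` and the file name `model.pkl` are assumptions, as is the use of the standard `pickle` module (the project may have used joblib instead).

```python
import math
import os
import pickle
import tempfile


def regression_metrics(y_true, y_pred):
    """Return R^2, MAPE (in percent), and RMSE for paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mape": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Toy values for illustration only.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])

# Round-trip a stand-in object through a .pkl file; a fitted model
# would be saved and loaded the same way.
stand_in_model = {"name": "stand-in for a fitted model"}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name
    with open(path, "wb") as fh:
        pickle.dump(stand_in_model, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)
```

Note that RMSE is expressed in the same unit as the target price, while MAPE is unit-free, which is why the two are usually reported together.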