Akbar Kanugraha

Data Analyst | Data Scientist

Machine Learning & Regression

Daegu Apartment Price Prediction

A machine learning model that predicts apartment prices in Daegu, South Korea. The pipeline covers preprocessing (encoding, VIF analysis), benchmarking of four baseline models, hyperparameter tuning with RandomizedSearchCV, ensemble methods (Voting and Stacking), and final evaluation on a holdout test set. The final model is saved in .pkl format for deployment.
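The tuning and ensemble steps can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the search space, the choice of base estimators, and the Ridge meta-learner are all assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the apartment data (assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Randomized search over a small, hypothetical Random Forest search space.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=4,
    cv=3,
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)
tuned_rf = search.best_estimator_

# Base learners shared by both ensembles (the selection is illustrative).
base = [
    ("lr", LinearRegression()),
    ("dt", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ("rf", tuned_rf),
]
voting = VotingRegressor(estimators=base).fit(X_train, y_train)
stacking = StackingRegressor(estimators=base, final_estimator=Ridge(), cv=3).fit(X_train, y_train)

# Final evaluation on the holdout split only.
holdout_r2 = {
    "voting": voting.score(X_test, y_test),
    "stacking": stacking.score(X_test, y_test),
}
```

Keeping the holdout split out of both the search and the ensemble fitting is what makes the final score an honest estimate.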


Detailed Insights

Data Cleaning & Preprocessing

The 4,123 raw rows were reduced to 2,701 after duplicate removal (1,422 rows, or 34.5%, removed). A new feature was engineered as Age = CurrentYear - YearBuilt, and the multicollinearity flagged by VIF analysis (VIF: 142.4) was addressed.
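A rough sketch of these three steps on toy data is shown below. The `CURRENT_YEAR` constant, the helper `vif_table`, and the generated rows are hypothetical; VIF is computed directly from its definition, VIF_j = 1 / (1 - R²_j), rather than with whatever library the project used.

```python
import numpy as np
import pandas as pd

CURRENT_YEAR = 2024  # assumption: the reference year is not stated in the write-up


def vif_table(features: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R2_j), with R2_j from regressing column j on the other columns."""
    X = features.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(features.columns):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1.0 - residuals.var() / target.var()
        vifs[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")


# Toy rows (hypothetical): 200 unique apartments plus 50 repeated rows.
rng = np.random.default_rng(0)
n = 200
unique_rows = pd.DataFrame({
    "YearBuilt": rng.integers(1980, 2016, size=n),
    "Size(sqf)": rng.permutation(np.arange(300, 300 + 8 * n, 8)),
})
raw = pd.concat([unique_rows, unique_rows.iloc[:50]], ignore_index=True)

# 1. Duplicate removal.
clean = raw.drop_duplicates().reset_index(drop=True)
removed_share = 1 - len(clean) / len(raw)

# 2. Feature engineering.
clean["Age"] = CURRENT_YEAR - clean["YearBuilt"]

# 3. Age is a linear function of YearBuilt, so keeping both inflates VIF;
#    dropping one of them brings the values back near 1.
vif_before = vif_table(clean[["YearBuilt", "Age", "Size(sqf)"]])
vif_after = vif_table(clean[["Age", "Size(sqf)"]])
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a feature is largely explained by the others.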

Model Benchmarking & Evaluation

Four models were benchmarked: Linear Regression, Decision Tree, Random Forest, and XGBoost. Random Forest with a log transform was selected as the best model, with R² = 0.786 on the test set.
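One way to run such a benchmark is sketched below, assuming the log transform is applied to the target (the write-up does not say which variable was transformed). The data is synthetic, and XGBoost is left out only to keep the example dependent on scikit-learn alone; `xgboost.XGBRegressor` follows the same estimator interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, positive, right-skewed target standing in for sale prices (assumption).
X, y_linear = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=42)
y = np.exp(y_linear / (2 * y_linear.std()))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    # Fits on log1p(y) and maps predictions back with expm1 automatically.
    "Random Forest + log target": TransformedTargetRegressor(
        regressor=RandomForestRegressor(n_estimators=100, random_state=42),
        func=np.log1p,
        inverse_func=np.expm1,
    ),
}

# Every model is scored on the same test split, on the original price scale.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
```

Wrapping the regressor in `TransformedTargetRegressor` keeps the metric comparable across models, because predictions are converted back to the original scale before scoring.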

Feature Importance

Apartment size (Size(sqf)) and terraced hallway type (HallwayType: terraced) dominated the predictions, with a combined contribution of ~47%. Proximity to universities was also associated with a price premium in Daegu.
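If the percentages are impurity-based importances from the Random Forest (an assumption; they could equally come from permutation importance or SHAP), a combined share like the one above can be read off as follows. The data, the `noise_feature` column, and the `HallwayType_terraced` column name are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data (hypothetical): price driven mostly by size, then by a terraced-hallway flag.
rng = np.random.default_rng(1)
n = 500
X = pd.DataFrame({
    "Size(sqf)": rng.uniform(300, 2000, size=n),
    "HallwayType_terraced": rng.integers(0, 2, size=n),
    "noise_feature": rng.normal(size=n),
})
y = 3.0 * X["Size(sqf)"] + 800.0 * X["HallwayType_terraced"] + rng.normal(scale=100.0, size=n)

rf = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

# Impurity-based importances are normalized to sum to 1, so they read as shares.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
top_two_share = importances.iloc[:2].sum()
```

Impurity-based importances describe how the model splits, not causal effects, so they are best read alongside domain knowledge.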

Tech Stack

Scikit-learn, XGBoost, Random Forest, RandomizedSearchCV

Key Results

  • R²=0.786, MAPE=18.7%, RMSE=47,708 KRW
  • Top feature: Size(sqf) 30.7%
  • Model saved (.pkl, 2.99 MB)
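For reference, the three reported metrics and the .pkl round trip can be sketched as below. The helper `regression_report`, the toy numbers, and the stand-in LinearRegression model are illustrative only; the model is pickled in memory here, whereas a deployment would write the bytes to a file.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression


def regression_report(y_true, y_pred):
    """Return R2, MAPE (as a fraction), and RMSE (in the target's units)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mape": np.mean(np.abs(residuals / y_true)),
        "rmse": np.sqrt(np.mean(residuals ** 2)),
    }


# Toy values, not the project's predictions.
report = regression_report([100, 200, 300, 400], [110, 190, 310, 380])

# Serialize a fitted model and restore it, as a deployment would.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
model = LinearRegression().fit(X, [100, 200, 300, 400])
blob = pickle.dumps(model)
restored = pickle.loads(blob)
same_predictions = bool(np.allclose(model.predict(X), restored.predict(X)))
```

MAPE is unitless and RMSE shares the target's unit, which is why the two are reported side by side: one gives the relative error, the other the typical absolute error.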