{"id":26252,"date":"2025-03-29T09:06:00","date_gmt":"2025-03-29T02:06:00","guid":{"rendered":"https:\/\/interdata.vn\/blog\/?p=26252"},"modified":"2025-03-29T09:06:00","modified_gmt":"2025-03-29T02:06:00","slug":"xgboost-la-gi","status":"publish","type":"post","link":"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/","title":{"rendered":"What Is XGBoost? Structure, Features &#038; Applications in Machine Learning"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-white ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">CONTENTS<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 eztoc-toggle-hide-by-default' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a 
class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#XGBoost-la-gi\" >What Is XGBoost?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Cach-hoat-dong-cua-XGBoost\" >How XGBoost Works<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Gradient-Boosting\" >Gradient Boosting<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Cai-tien-trong-XGBoost\" >Improvements in XGBoost<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Cau-truc-cua-XGBoost\" >Structure of XGBoost<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Decision-Trees-Cay-quyet-dinh\" >Decision Trees<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Loss-Function-Ham-mat-mat\" >Loss Function<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Learning-Rate-Toc-do-hoc\" >Learning Rate<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Regularization-Dieu-chuan\" >Regularization<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Cac-tinh-nang-cua-mo-hinh-XGBoost\" >Features of the XGBoost Model<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Mot-so-loi-ich-va-han-che-cua-thuat-toan-XGBoost\" >Benefits and Limitations of the XGBoost Algorithm<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Loi-ich-cua-XGBoost\" >Benefits of XGBoost<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Han-che-cua-XGBoost\" >Limitations of XGBoost<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#So-sanh-XGBoost-voi-Gradient-Boosting\" >XGBoost vs. Gradient Boosting<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Khac-biet-ve-Regularization-Dieu-chuan-hoa\" >Differences in Regularization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Toc-do-va-Kha-nang-xu-ly-song-song\" >Speed and Parallel Processing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Xu-ly-Gia-tri-bi-thieu-Missing-Values\" >Handling 
Missing Values<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/#Ung-dung-cua-XGBoost-hien-nay\" >Applications of XGBoost Today<\/a><\/li><\/ul><\/nav><\/div>\n<p>XGBoost is one of today's most powerful and widely used machine learning <a href=\"https:\/\/interdata.vn\/blog\/thuat-toan-algorithm\/\">algorithms<\/a>, especially for classification and regression problems. Built on the boosting principle, XGBoost delivers strong performance thanks to features such as memory optimization, parallel processing, and regularization. This article explains <a href=\"https:\/\/interdata.vn\/blog\/xgboost-la-gi\/\"><strong>what XGBoost is<\/strong><\/a>, how it works, how it is structured, and where it is applied in machine learning.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"XGBoost-la-gi\"><\/span>What Is XGBoost?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><strong>XGBoost<\/strong> is a machine learning algorithm based on boosting, and more specifically on gradient boosting, that has become a widely favored tool for both classification and regression problems.<\/p>\n<figure id=\"attachment_26265\" aria-describedby=\"caption-attachment-26265\" 
style=\"width: 800px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/XGBoost-la-gi.png\" alt=\"What is XGBoost\" width=\"800\" height=\"449\" class=\"size-full wp-image-26265\" title=\"\" srcset=\"https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/XGBoost-la-gi.png 800w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/XGBoost-la-gi-300x168.png 300w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/XGBoost-la-gi-768x431.png 768w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/XGBoost-la-gi-750x421.png 750w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-26265\" class=\"wp-caption-text\">What is XGBoost?<\/figcaption><\/figure>\n<p>In essence, XGBoost builds on and extends the Gradient Boosting <a href=\"https:\/\/interdata.vn\/blog\/decision-tree-la-gi\/\">Decision Tree<\/a> (GBDT) algorithm, adding a series of significant improvements that raise performance and make it capable of handling large-scale datasets.<\/p>\n<p>What sets XGBoost apart is its optimized computation, its built-in mechanisms for limiting overfitting, and its ability to run computations in parallel.<\/p>\n<h2><span class=\"ez-toc-section\" 
id=\"Cach-hoat-dong-cua-XGBoost\"><\/span>How XGBoost Works<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>XGBoost follows the core principle of boosting: models are not built independently but sequentially, as a chain. Each new model in the chain is trained primarily to correct the weaknesses and errors left by the model before it.<\/p>\n<p>To do this, XGBoost uses decision trees as its base models and applies gradient boosting to build them one after another, steering each tree toward reducing the overall error of the ensemble.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Gradient-Boosting\"><\/span>Gradient Boosting<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Gradient Boosting is an ensemble learning technique that builds a model additively, one step at a time, to keep improving accuracy.<\/p>\n<p>At the heart of the method is <strong>minimizing a predefined 
loss function<\/strong>. At each iteration the algorithm creates a new decision tree. That tree does not predict the final result directly; it predicts the error that remains (the residual, or more generally the negative gradient of the loss) after the trees built so far.<\/p>\n<p>Each new tree is trained so that it reduces the error of the model built so far as effectively as possible. The final prediction of a Gradient Boosting model is a weighted combination of all the decision trees built along the way.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Cai-tien-trong-XGBoost\"><\/span>Improvements in XGBoost<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>XGBoost improves on basic gradient boosting through several mechanisms:<\/p>\n<ul>\n<li><strong>Regularization:<\/strong> XGBoost builds two forms of regularization (L1 and L2) directly into the objective function used during training. 
This acts as a &#8220;barrier&#8221; against excessive model complexity and goes a long way toward limiting overfitting, which is especially useful on large or inherently complex datasets.<\/li>\n<li><strong>Parallelization:<\/strong> Unlike many earlier gradient boosting implementations, XGBoost is designed to exploit the parallel processing capabilities of the hardware. The trees are still added one after another, but the work of building each tree can be spread over many threads, which significantly shortens training time.<\/li>\n<li><strong>Memory Efficiency:<\/strong> XGBoost uses data structures and algorithms optimized for efficient memory use. 
As a result, it can process very large datasets without demanding excessive memory, which makes it more practical for large-scale problems.<\/li>\n<li><strong>Missing Value Handling:<\/strong> The algorithm handles missing values in the input data automatically. Users do not necessarily have to impute or drop those values in a complex preprocessing step, which saves time and effort during data preparation.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Cau-truc-cua-XGBoost\"><\/span>Structure of XGBoost<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To understand how XGBoost works in depth, it helps to look closely at its underlying structure and its key components.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Decision-Trees-Cay-quyet-dinh\"><\/span>Decision Trees<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>In XGBoost's architecture, decision trees serve as the <strong>individual predictive 
models<\/strong> that form the foundation of the whole ensemble.<\/p>\n<p>A decision tree works like a flowchart in which the data is split at nodes according to the values of specific features. Each internal node represents a test (a split) on one feature, and the branches leaving that node correspond to the possible outcomes of the test.<\/p>\n<p>Following the branches from the root to a leaf yields a prediction. This structure lets the model learn and represent relationships, sometimes complex ones, between the input features and the target variable.<\/p>\n<figure id=\"attachment_26267\" aria-describedby=\"caption-attachment-26267\" style=\"width: 800px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Cau-truc-cua-XGBoost.png\" alt=\"Structure of XGBoost\" width=\"800\" height=\"500\" class=\"size-full wp-image-26267\" title=\"\" srcset=\"https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Cau-truc-cua-XGBoost.png 800w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Cau-truc-cua-XGBoost-300x188.png 300w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Cau-truc-cua-XGBoost-768x480.png 768w, 
https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Cau-truc-cua-XGBoost-750x469.png 750w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-26267\" class=\"wp-caption-text\">Structure of XGBoost<\/figcaption><\/figure>\n<h3><span class=\"ez-toc-section\" id=\"Loss-Function-Ham-mat-mat\"><\/span>Loss Function<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><strong>The loss function<\/strong> is central and indispensable to XGBoost's training process. It serves as a <strong>quantitative measure of the discrepancy<\/strong> between the model's predictions and the actual values.<\/p>\n<p>Throughout training, the algorithm adjusts its parameters, namely the structure and leaf weights of the decision trees, so that the value of this loss function is as small as possible.<\/p>\n<p>XGBoost supports many loss functions, and the right choice depends closely on the problem at hand: log loss is commonly used for binary classification, for example, while mean squared error (MSE) suits regression problems.<\/p>\n<h3><span class=\"ez-toc-section\" 
id=\"Learning-Rate-Toc-do-hoc\"><\/span>Learning Rate<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><strong>The learning rate<\/strong> is a highly influential hyperparameter that <strong>regulates XGBoost's training process<\/strong>. It sets the &#8220;weight&#8221;, or degree of contribution, of each new decision tree added to the overall model at each iteration.<\/p>\n<p>A learning rate that is too large can make the model learn too quickly and overfit the training data, losing its ability to generalize. Conversely, a learning rate that is too small makes convergence very slow, requires many more boosting rounds, and risks stopping at a solution that is not yet the best one if the number of rounds is limited.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Regularization-Dieu-chuan\"><\/span>Regularization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Regularization is a key built-in component of XGBoost and contributes directly to the algorithm's success. 
Its main job is to control the complexity of the learned model, preventing it from becoming so elaborate that it fits only the training data (that is, preventing overfitting).<\/p>\n<p>XGBoost applies two common regularization techniques, L1 (also known as Lasso) and L2 (also known as Ridge). Both add a &#8220;penalty&#8221; to the objective function based on the magnitude of the model's weights (for tree boosters, the scores assigned to the leaves), which discourages extreme predictions from any single tree and reduces over-reliance on any one or a few features.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cac-tinh-nang-cua-mo-hinh-XGBoost\"><\/span>Features of the XGBoost Model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>XGBoost is a popular implementation of gradient boosting. Below are some of the features that make it stand out:<\/p>\n<p><strong>Regularization: <\/strong>XGBoost can penalize complex models through both L1 and L2 regularization. 
Regularization helps prevent overfitting.<\/p>\n<p><strong>Handling sparse data: <\/strong>Missing values and preprocessing steps such as one-hot encoding can make data sparse. XGBoost uses a sparsity-aware split-finding algorithm to handle the different kinds of sparsity patterns in the data.<\/p>\n<p><strong>Weighted quantile sketch: <\/strong>Most existing tree-based algorithms can find split points when the data points have equal weights (using a quantile sketch algorithm), but they are not equipped to handle weighted data. XGBoost has a distributed weighted quantile sketch algorithm that handles weighted data efficiently.<\/p>\n<p><strong>Block structure for parallel learning: <\/strong>To speed up computation, XGBoost can use multiple cores on the <a href=\"https:\/\/interdata.vn\/blog\/cpu-server\/\">CPU<\/a>. 
This is possible thanks to the block structure in its system design: data is sorted and stored in in-memory units called blocks. Unlike in other algorithms, this lets subsequent iterations reuse the data layout instead of computing it again from scratch.<\/p>\n<p><strong>Cache awareness: <\/strong>In XGBoost, fetching gradient statistics by row index requires non-contiguous memory access. Tianqi Chen therefore designed XGBoost to make better use of the hardware. This is done by allocating an internal buffer in each thread, where the gradient statistics can be stored. 
XGBoost is also available through bindings for other languages, including Scala, Julia, and <a href=\"https:\/\/interdata.vn\/blog\/ngon-ngu-lap-trinh-java\/\">Java<\/a>.<\/p>\n<p><strong>Out-of-core computing: <\/strong>This feature makes the most of the available disk space when processing large datasets that do not fit in memory.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Mot-so-loi-ich-va-han-che-cua-thuat-toan-XGBoost\"><\/span>Benefits and Limitations of the XGBoost Algorithm<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3><span class=\"ez-toc-section\" id=\"Loi-ich-cua-XGBoost\"><\/span>Benefits of XGBoost<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>XGBoost is highly scalable and efficient because it is designed to handle large datasets with millions or even billions of instances and features.<\/p>\n<p>XGBoost uses <strong>parallel processing techniques and hardware optimizations<\/strong>, such as GPU acceleration, to speed up training. 
This scalability and efficiency make XGBoost well suited to big-data applications and real-time prediction.<\/p>\n<p>XGBoost offers a wide range of tunable parameters and regularization techniques, so users can tailor the model to their specific needs.<\/p>\n<p>XGBoost provides built-in feature importance analysis, which helps identify the most influential features in a dataset. This information can be valuable for feature selection, dimensionality reduction, and a better understanding of the underlying patterns in the data.<\/p>\n<p>XGBoost not only delivers excellent performance but has also become a tool <strong>favored by data scientists and machine learning practitioners<\/strong> around the world. 
Models built with it have repeatedly placed at or near the top of Kaggle competitions, which shows how effective it is at producing high-quality predictive models.<\/p>\n<figure id=\"attachment_26266\" aria-describedby=\"caption-attachment-26266\" style=\"width: 777px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Loi-ich-va-han-che-cua-XGBoost.png\" alt=\"Benefits and limitations of XGBoost\" width=\"777\" height=\"404\" class=\"size-full wp-image-26266\" title=\"\" srcset=\"https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Loi-ich-va-han-che-cua-XGBoost.png 777w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Loi-ich-va-han-che-cua-XGBoost-300x156.png 300w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Loi-ich-va-han-che-cua-XGBoost-768x399.png 768w, https:\/\/interdata.vn\/blog\/wp-content\/uploads\/2025\/03\/Loi-ich-va-han-che-cua-XGBoost-750x390.png 750w\" sizes=\"auto, (max-width: 777px) 100vw, 777px\" \/><figcaption id=\"caption-attachment-26266\" class=\"wp-caption-text\">Benefits and limitations of XGBoost<\/figcaption><\/figure>\n<h3><span class=\"ez-toc-section\" id=\"Han-che-cua-XGBoost\"><\/span>Limitations of XGBoost<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>XGBoost can be computationally expensive, especially when training complex models, which makes it less suitable for resource-constrained systems.<\/p>\n<p>Although robust, XGBoost can still be <strong>sensitive to noisy data or 
outliers<\/strong>, so careful data preprocessing is needed for the best performance.<\/p>\n<p>XGBoost can overfit on small datasets or when the model uses too many trees.<\/p>\n<p>Although feature importance scores are available, the overall model can be harder to interpret than simpler methods such as <a href=\"https:\/\/interdata.vn\/blog\/hoi-quy-tuyen-tinh\/\">linear regression<\/a> or a single decision tree. This lack of transparency can be a drawback in fields such as healthcare or finance, where interpretability is very important.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"So-sanh-XGBoost-voi-Gradient-Boosting\"><\/span>XGBoost vs. Gradient Boosting<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>XGBoost is essentially <strong>an optimized, more powerful version<\/strong> of the Gradient Boosting Machine (GBM) algorithm. 
Although both share the same roots in boosting with decision trees, XGBoost brings <strong>substantial improvements<\/strong> in speed, model control, and data handling over its predecessor.<\/p>\n<p>Both algorithms build the model sequentially, but it is the <strong>specific engineering improvements<\/strong> that have made XGBoost a leading choice for many problems. The sections below walk through the main differences and show why XGBoost is so popular in the data science community today.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Khac-biet-ve-Regularization-Dieu-chuan-hoa\"><\/span>Differences in Regularization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The key difference is that XGBoost <strong>builds L1 (Lasso) and L2 (Ridge) regularization<\/strong> into its objective function, which helps control model complexity effectively. 
By contrast, traditional GBM <strong>usually lacks this mechanism<\/strong> and relies mainly on constraining tree parameters to prevent overfitting.<\/p>\n<p>Regularization is the technique of adding a &#8220;penalty&#8221; term to training in order to <strong>keep the model from becoming too complex<\/strong>. XGBoost does this by penalizing the magnitude of the leaf weights, which encourages simpler models that generalize better. Building the penalty directly into the objective function gives <strong>more principled<\/strong> control over overfitting.<\/p>\n<p>Meanwhile, to combat overfitting, GBM users typically have to tune parameters such as <code>max_depth<\/code> (the maximum tree depth) or <code>min_samples_leaf<\/code> (the minimum number of samples in a leaf). 
This works, but it <strong>does not directly control the magnitude of the weights<\/strong> the way L1\/L2 does in XGBoost, so control over overfitting is somewhat more limited, especially on complex data.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Toc-do-va-Kha-nang-xu-ly-song-song\"><\/span>Speed and Parallel Processing<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>XGBoost <strong>trains noticeably faster<\/strong> thanks to parallel processing and low-level optimizations. It makes efficient use of multi-core hardware, whereas basic GBM implementations <strong>typically run sequentially<\/strong> and cannot. This makes a large difference in training time, especially on large datasets.<\/p>\n<p>Parallel processing lets XGBoost <strong>perform many computations at once<\/strong>. Trees are still added one after another; the parallelism is within each tree, where the search for the best split point at each node can run across multiple CPU cores. 
XGBoost stores data in pre-sorted &#8220;block&#8221; structures, which makes parallel access and computation <strong>very efficient<\/strong>.<\/p>\n<p>By contrast, traditional GBM implementations usually build each tree serially and are <strong>not designed to exploit<\/strong> multi-core CPUs. XGBoost also includes other optimizations such as &#8220;cache-aware access&#8221;, which reduces the time spent waiting on memory reads and <strong>further speeds up<\/strong> the algorithm overall.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Xu-ly-Gia-tri-bi-thieu-Missing-Values\"><\/span>Handling Missing Values<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>XGBoost can <strong>handle missing values automatically<\/strong> through its &#8220;sparsity-aware&#8221; split finding. 
Traditional GBM, by contrast, <strong>requires users to preprocess<\/strong> missing cells (for example by imputation, that is, filling in values) before training, so XGBoost's built-in handling saves considerable effort.<\/p>\n<p>When it encounters a missing value, XGBoost <strong>does not discard the sample<\/strong>. Instead, during training it tries sending the samples with a missing value down both the left and the right branch of a node and &#8220;learns&#8221; which direction (left or right) gives the better result for missing values at that node. This default direction <strong>is stored and applied<\/strong> to new data.<\/p>\n<p>With GBM, you have to decide for yourself how to handle missing values: fill in the mean or median, create a special sentinel value, or even drop the row altogether. Each approach <strong>has its own trade-offs<\/strong> and takes preprocessing effort. 
XGBoost's automatic handling <strong>keeps the workflow simpler<\/strong> and considerably more efficient.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Ung-dung-cua-XGBoost-hien-nay\"><\/span>Applications of XGBoost Today<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>XGBoost and gradient-boosted decision trees are used in many data science applications, including:<\/p>\n<ul>\n<li><strong>Learning to rank:<\/strong> One of the most common uses of XGBoost is as a ranker. In information retrieval, the goal of learning to rank is to serve users content ordered by relevance. In XGBoost, XGBRanker is built on the LambdaMART algorithm.<\/li>\n<li><strong>Ad click-through rate prediction:<\/strong> Researchers trained an XGBoost model on 10 days of click data to estimate how often online ads are clicked. 
The aim of the study was to measure the effectiveness of online advertising and show which ads perform well.<\/li>\n<li><strong>Store sales forecasting:<\/strong> XGBoost can be used for predictive modeling, as shown in a study in which sales at 45 Walmart stores were forecast with an XGBoost model.<\/li>\n<li><strong>Malware classification:<\/strong> Using an XGBoost classifier, engineers at the Technical University of Ko\u0161ice were able to classify malware accurately, as described in their paper.<\/li>\n<li><strong>Kaggle competitions:<\/strong> XGBoost has been a popular winning algorithm in Kaggle competitions, as documented on the DMLC (Distributed (Deep) <a href=\"https:\/\/interdata.vn\/blog\/machine-learning-la-gi\/\">Machine Learning<\/a> Community) page, which lists recent Kaggle winners who used XGBoost in their entries.<\/li>\n<\/ul>\n<p>XGBoost is a powerful machine learning tool, especially when working with large, complex datasets. Although it has limitations, such as a tendency to overfit and heavy computational requirements, XGBoost remains a popular choice thanks to its performance and extensive optimization.<\/p>\n<p>Applying XGBoost to real-world problems such as ad click-through rate prediction, malware classification, and store sales forecasting has demonstrated its effectiveness.<\/p>\n<p>Training and deploying compute-intensive machine learning models such as XGBoost efficiently requires solid infrastructure. 
If you are looking for a solution, <a href=\"https:\/\/interdata.vn\/thue-vps\/\">renting a quality, low-cost VPS<\/a> from InterData can be a good starting point: it provides a stable environment on recent hardware such as Intel Xeon Platinum\/AMD EPYC CPUs and high-speed U.2 NVMe SSDs.<\/p>\n<p>For more complex training workloads, or when flexible scaling is needed, InterData's <a href=\"https:\/\/interdata.vn\/cloud-server\/\">high-speed, low-cost Cloud Server rental<\/a> offers stronger performance. The platform is built on high-end hardware with optimized storage and unlimited <a href=\"https:\/\/interdata.vn\/blog\/bang-thong-la-gi\/\">bandwidth<\/a>, providing fast processing for machine learning projects that need powerful configurations, much like its dedicated high-speed, low-cost Hosting plans.<\/p>\n<div class=\"entry-content no-share\">\n<div class=\"content-inner \">\n<p>Contact us today for advice on the solution that fits your needs!<\/p>\n<p><strong>INTERDATA<\/strong><\/p>\n<ul>\n<li><strong>Website:<\/strong>\u00a0Interdata.vn<\/li>\n<li><strong>Hotline:<\/strong>\u00a01900-636822<\/li>\n<li><strong>Email:<\/strong>\u00a0Info@interdata.vn<\/li>\n<li><strong>Representative office:<\/strong>\u00a0240 Nguy\u1ec5n \u0110\u00ecnh Ch\u00ednh, P.11. Q. 
Ph\u00fa Nhu\u1eadn, TP. H\u1ed3 Ch\u00ed Minh<\/li>\n<li><strong>Transaction office:<\/strong>\u00a0S\u1ed1 211 \u0110\u01b0\u1eddng s\u1ed1 5, K\u0110T Lakeview City, P. An Ph\u00fa, TP. Th\u1ee7 \u0110\u1ee9c, TP. H\u1ed3 Ch\u00ed Minh<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div class=\"jeg_share_bottom_container\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>XGBoost is one of the most powerful and popular machine learning algorithms today, especially for classification and regression problems. Built on the boosting principle, XGBoost delivers strong performance thanks to features such as memory optimization, parallel<\/p>\n","protected":false},"author":11,"featured_media":26268,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[108],"tags":[],"class_list":["post-26252","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai"],"_links":{"self":[{"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/posts\/26252","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/comments?post=26252"}],"version-history":[{"count":2,"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/posts\/26252\/revisions"}],"predecessor-version":[{"id":26269,"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/posts\/26252\/revisions\/26269"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/interdata.vn\
/blog\/wp-json\/wp\/v2\/media\/26268"}],"wp:attachment":[{"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/media?parent=26252"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/categories?post=26252"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/interdata.vn\/blog\/wp-json\/wp\/v2\/tags?post=26252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}