Deep Learning vs. Gradient Boosted Trees
One of the most important decisions when building a machine learning model is choosing the right algorithm. For many problems, there isn’t a simple answer. It’s not uncommon for machine learning engineers to try multiple algorithms before selecting the best approach. In recent years, deep learning has become the dominant machine learning technique, but it still has competitors. In particular, gradient boosted trees can often produce solutions comparable to, or better than, deep learning, frequently with less hassle. In this post, we’ll look at the differences between these two popular machine learning techniques and how to pick the right one for your problem.
Machine learning is advancing at an incredible rate, with new state-of-the-art results generated regularly. Of the myriad machine learning techniques, however, deep learning and gradient boosted trees have consistently outperformed their competitors. They are powerful algorithms that have proven themselves across a wide range of problem domains and datasets.
Computer Vision and Natural Language Processing
Deep learning has been particularly successful in the fields of computer vision and natural language processing (NLP). Its performance in these domains is unrivaled, so there’s currently no reason to consider any other technique when working on problems in these fields. Deep learning works so well here because it addresses something called the “representation problem.” For example, deep convolutional neural networks (CNNs), a type of neural network architecture commonly used in computer vision, work by looking at groups of pixels simultaneously. This allows them to use the spatial relationships between pixels to learn higher-order concepts like edges and patterns.

Contextual representation is also important when solving NLP problems. Individual words in a sentence are not very useful by themselves; the surrounding words are necessary to establish a word’s context and derive its meaning. Deep learning can find patterns in combinations of features that carry little information individually. Gradient boosted trees, by contrast, work best when features are individually informative; if a feature doesn’t carry much information on its own, gradient boosted trees will have a tough time finding a good solution. If your problem involves computer vision or NLP, deep learning is the best approach.
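To make the “groups of pixels” idea concrete, here is a minimal sketch (using only NumPy, not a real CNN framework) of the convolution operation at the heart of a CNN layer. The image, kernel, and `convolve2d` helper are illustrative constructions, not from any library: no single pixel in the toy image is informative, but the convolution of a Sobel-like kernel over pixel neighborhoods lights up exactly at the edge.

```python
import numpy as np

# A tiny 6x6 "image": left half dark (0), right half bright (1).
# No single pixel reveals the edge; the pattern between neighbors does.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A Sobel-like horizontal-gradient kernel, the kind of filter
# early CNN layers learn for detecting vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

def convolve2d(img, k):
    """Valid-mode 2D cross-correlation: the core operation of a CNN layer."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Combine a whole neighborhood of pixels at once.
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

response = convolve2d(image, kernel)
# The response is zero in the flat regions and strongest
# exactly where the dark/light boundary lies.
```

This is the representation deep learning builds automatically: higher layers compose such edge responses into patterns and objects.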
If you’re working with tabular data (e.g., spreadsheet-style data), gradient boosted trees can be an excellent choice. Gradient boosted trees require very little data pre-processing and handle missing data automatically. But there’s an important caveat: deep learning can sometimes solve complex tabular data problems with a higher degree of success than gradient boosted trees. In general, gradient boosted trees work best on tabular problems whose categorical features have limited cardinality. A categorical feature is an input that takes one of a fixed set of possible values (e.g., high, medium, low). If your problem contains categorical features with tens of thousands of possible values, deep learning is a better choice.
Deep learning models are notoriously difficult to interpret. It’s common for modern deep learning models to have hundreds of hidden layers and billions of parameters, resulting in very low explainability. In contrast, gradient boosted trees are relatively easy to interpret and have good explainability. Generating a feature importance plot from a trained gradient boosted tree is simple and lets you directly observe the relationships the model has discovered. If interpretability and explainability are requirements for your machine learning model, gradient boosted trees are the best choice.
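A minimal sketch of that feature-importance workflow, again assuming scikit-learn is available (the data here is synthetic, constructed so that only the first feature matters): after fitting, the `feature_importances_` attribute is what a feature importance plot would display.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)

# Three features, but only the first actually drives the label;
# the other two are pure noise.
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Normalized importances (they sum to 1). Plotting these as a bar
# chart is the standard "feature importance plot".
importances = model.feature_importances_
```

Here the model’s own accounting should attribute nearly all of its predictive power to the first feature, directly exposing the relationship it learned.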
Interestingly, deep learning and gradient boosted trees take roughly the same amount of time to train. However, gradient boosted trees will almost certainly be faster at inference time once trained. For problems that require low model inference latency, gradient boosted trees should be preferred.
Deep Learning Pros:
- Unsurpassed in computer vision and NLP
- Handles extremely large input feature spaces
- Works well in most problem domains
Deep Learning Cons:
- Computationally expensive
- Often requires large amounts of training data
- Hyperparameter tuning can be difficult
- Very low explainability
- Difficult to troubleshoot when training fails
Gradient Boosted Tree Pros:
- Easy to interpret and explain
- Very little data pre-processing required
- Handles missing data well
Gradient Boosted Tree Cons:
- Prone to overfitting
- Not well-suited for some problem domains (e.g. images and text)
Use Deep Learning when:
- Working in computer vision or NLP
- Input features are hard to represent
- Explainability isn’t important
- Speed of the trained model is less important
Use Gradient Boosted Trees when:
- Using tabular data
- Features can be easily represented
- Explainability is important
- Speed of the trained model is important