Extend your GBM workflow (visual or programmatic) to apply cross-validation (CV) before training.
Does CV affect the accuracy of your model?
How does the confusion matrix change?
What happens if you use a bigger dataset, e.g. Wine Quality (https://archive.ics.uci.edu/dataset/186/wine+quality)?
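
For the programmatic route, one possible starting point is sketched below. It assumes scikit-learn, which the exercise does not prescribe, and uses its GradientBoostingClassifier as the GBM. The bundled load_wine data (178 samples, 3 classes) is only a small stand-in so the sketch runs offline; it is not the UCI Wine Quality dataset linked above. The fold count, number of trees, and random seeds are illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

# Stand-in data (assumption): scikit-learn's small bundled wine dataset.
X, y = load_wine(return_X_y=True)

# GBM with illustrative settings; tune these for your own workflow.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# 5-fold stratified CV keeps the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy value per fold; the spread shows how stable the estimate is.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("Accuracy per fold:", scores)
print("Mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))

# Out-of-fold predictions: each sample is predicted by a model that
# did not see it during training, so the matrix covers the whole dataset.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)
print("Confusion matrix from out-of-fold predictions:")
print(cm)
```

Comparing the mean and spread of the fold accuracies with the accuracy from your single train/test split is one way to approach the first question; the out-of-fold confusion matrix can be set against the one from your original workflow for the second.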