In semantic segmentation, label probabilities are often uncalibrated, because they are typically only a by-product of the segmentation task. Intersection over Union (IoU) and the Dice score are commonly used as criteria for segmentation success, while metrics related to label probabilities are rarely explored. Probability calibration approaches, which aim to match predicted probabilities with empirically observed error rates, have been studied, but they mainly target classification rather than semantic segmentation. We therefore propose a learning-based calibration method for multi-label semantic segmentation. Specifically, we adopt a tree-like convolutional neural network to predict local temperature values for probability calibration. One advantage of our approach is that it does not change prediction accuracy, so calibration can be applied as a post-processing step. Experiments on the COCO and LPBA40 datasets demonstrate improved calibration performance across several metrics. We also demonstrate the performance of our method for multi-atlas brain segmentation from magnetic resonance images.
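The following is a minimal sketch of local (per-pixel) temperature scaling, assuming NumPy, logits arranged as (labels, height, width), and a temperature map that is already available. The tree-like network that predicts the temperatures is not reproduced here. The function name `local_temperature_scale` and the random temperature map are hypothetical stand-ins. The sketch shows why this kind of calibration leaves accuracy unchanged: dividing each pixel's logits by a positive scalar preserves their order, so the argmax label at every pixel stays the same while the confidence changes.

```python
import numpy as np


def softmax(logits, axis=0):
    # Subtract the max along the label axis for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def local_temperature_scale(logits, temperature):
    """Hypothetical helper: calibrate logits with a per-pixel temperature map.

    logits: array of shape (C, H, W), one score map per label.
    temperature: array of shape (H, W) with strictly positive values.
    Returns probabilities of shape (C, H, W).
    """
    if np.any(temperature <= 0):
        raise ValueError("temperatures must be strictly positive")
    # Broadcast the (H, W) temperature map across the label axis.
    return softmax(logits / temperature[None, :, :], axis=0)


rng = np.random.default_rng(0)
num_labels, height, width = 4, 8, 8
logits = rng.normal(size=(num_labels, height, width))
# Placeholder for a network-predicted temperature map (assumed values).
temperature = rng.uniform(0.5, 3.0, size=(height, width))

uncalibrated = softmax(logits, axis=0)
calibrated = local_temperature_scale(logits, temperature)

# A positive per-pixel temperature preserves the label ranking,
# so the predicted segmentation (argmax over labels) is unchanged.
assert np.array_equal(uncalibrated.argmax(axis=0), calibrated.argmax(axis=0))
# Calibrated probabilities still sum to one at every pixel.
assert np.allclose(calibrated.sum(axis=0), 1.0)
```

Because only the per-pixel confidence changes, a temperature map of this form can be fitted after the segmentation network has been trained. Temperatures above 1 soften overconfident pixels, and temperatures below 1 sharpen underconfident ones.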