# install required libraries
# !pip install opencv-python==4.5.1.48
# !pip install numpy==1.21.0
# !pip install matplotlib==3.4.2
Pramesh Gautam
March 27, 2022
Data augmentation is one of the main techniques used to improve a model's performance on new data (i.e. its generalization). It is especially useful when only limited training data are available. In this post we'll discuss some augmentation techniques (beyond affine transformations) that help computer vision models generalize better.
The main motivation of Cutout is to simulate object occlusion, which is commonly encountered in tasks such as object recognition or human pose estimation. We occlude a part of the image at random. Instead of the model seeing the same image every time, it sees different parts of that image, which helps it generalize. The occluded part does not contain any information. In the example below, the randomly occluded part is replaced by all 0s.
import numpy as np

class Cutout(object):
    def __init__(self, cutout_width):
        # width of the square to occlude
        self.cutout_width = cutout_width

    def __call__(self, img):
        h = img.shape[0]
        w = img.shape[1]
        mask = np.ones(img.shape)
        # pick a random centre for the square
        y = np.random.randint(h)
        x = np.random.randint(w)
        # clip the square so it stays inside the image
        x1 = np.clip(x - self.cutout_width // 2, 0, w)
        x2 = np.clip(x + self.cutout_width // 2, 0, w)
        y1 = np.clip(y - self.cutout_width // 2, 0, h)
        y2 = np.clip(y + self.cutout_width // 2, 0, h)
        mask[y1:y2, x1:x2] = 0.0
        # occlude the part of the image using the mask (zero out that part)
        img = img * mask
        return img
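As a quick sanity check, here is a minimal usage sketch. The image is just a random array standing in for a real photo, and the cutout width of 16 is an arbitrary choice:

img = np.random.rand(32, 32, 3)    # stand-in for a real 32x32 RGB image
cutout = Cutout(cutout_width=16)   # arbitrary square size
augmented = cutout(img)
# the occluded square (whatever remains after clipping) is all zeros
print(augmented.shape, (augmented == 0).any())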
Mixup linearly mixes two images and their labels in the same proportion. With this approach the model sees a different image with a different label during training. It encourages the model to form smooth decision boundaries, since we train on linearly interpolated images and labels instead of hard, binary decisions. As can be seen in the example below, after mixing the images and labels, the new image has the labels [0.19, 0.81], which means the class distribution is tilted more towards dog, and the mixup visualization reflects this as well.
class Mixup:
    def __init__(self, img1, img2, label1, label2):
        self.img1 = img1
        self.img2 = img2
        self.label1 = label1
        self.label2 = label2

    def __call__(self):
        alpha = 1
        lam = np.random.beta(alpha, alpha)
        # mix image and labels
        mix_img = lam * self.img1 + (1 - lam) * self.img2
        mix_label = lam * self.label1 + (1 - lam) * self.label2
        return mix_img, mix_label
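A minimal usage sketch, assuming two images of the same size and one-hot labels over the classes [cat, dog]; the arrays here are random stand-ins for the actual images:

cat_img = np.random.rand(32, 32, 3)   # stand-in for the cat image
dog_img = np.random.rand(32, 32, 3)   # stand-in for the dog image
cat_label = np.array([1.0, 0.0])      # one-hot over [cat, dog]
dog_label = np.array([0.0, 1.0])

mixup = Mixup(cat_img, dog_img, cat_label, dog_label)
mix_img, mix_label = mixup()
print(mix_label)  # e.g. [0.19, 0.81], depending on the sampled lam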
As we saw above, the occluded part in Cutout does not contain any information, which is wasteful since it increases resource usage without adding any value. CutMix combines the above two techniques: it mixes the images while making use of the occluded part. As can be seen in the example, the cut-out portion of the dog image is replaced by the same portion of the cat image. It also linearly interpolates the labels, as in Mixup. In the example below, the new labels are [0.18, 0.82] for cat and dog respectively, and this can be verified from the image as well.
class Cutmix:
    def __init__(self, img1, img2, lbl1, lbl2):
        self.img1 = img1
        self.img2 = img2
        self.lbl1 = lbl1
        self.lbl2 = lbl2

    def __call__(self):
        # sample bounding box B
        alpha = 1
        lam = np.random.beta(alpha, alpha)
        h, w, c = self.img1.shape
        r_x = np.random.randint(0, w)
        r_y = np.random.randint(0, h)
        r_w = np.int32(w * np.sqrt(1 - lam))
        r_h = np.int32(h * np.sqrt(1 - lam))
        mid_x, mid_y = r_w // 2, r_h // 2
        point1_x = np.clip(r_x - mid_x, 0, w)
        point2_x = np.clip(r_x + mid_x, 0, w)
        point1_y = np.clip(r_y - mid_y, 0, h)
        point2_y = np.clip(r_y + mid_y, 0, h)
        # refer to the paper to see how M is computed; in short, it keeps the
        # mixing ratio of the images and of the labels the same
        # (rows are indexed by y, columns by x)
        M = np.ones(self.img1.shape)
        M[point1_y:point2_y, point1_x:point2_x, :] = 0
        cut_mix_img = M * self.img1 + (1 - M) * self.img2
        cut_mix_label = lam * self.lbl1 + (1 - lam) * self.lbl2
        return cut_mix_img, cut_mix_label
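And a similar sketch for CutMix, matching the cat/dog example above: the dog image is the base (img1) and the cut-out box is filled from the cat image (img2). Again, random arrays stand in for the real images:

cat_img = np.random.rand(32, 32, 3)
dog_img = np.random.rand(32, 32, 3)
cat_label = np.array([1.0, 0.0])      # one-hot over [cat, dog]
dog_label = np.array([0.0, 1.0])

cutmix = Cutmix(dog_img, cat_img, dog_label, cat_label)
cut_mix_img, cut_mix_label = cutmix()
print(cut_mix_img.shape, cut_mix_label)  # e.g. (32, 32, 3) [0.18 0.82]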
Label smoothing is used to prevent the model from memorizing the training data and becoming over-confident. It adds noise to the labels without modifying the data itself. If we are using a 5-class classifier, then instead of the full probability being assigned to one class, it is distributed among all the classes. As seen in the example below, \(\epsilon\) controls the amount of smoothing.
ϵ = 0.01
num_labels = 5
classes = ["cat", "dog", "horse", "bus", "carrot"]
original_label = [1, 0, 0, 0, 0]
new_neg_labels = ϵ/(num_labels-1)
# after label smoothing, cat gets 1 - ϵ and each other class gets ϵ/(num_labels - 1) probability
smooth_labels = [1 - ϵ, new_neg_labels, new_neg_labels, new_neg_labels, new_neg_labels]
smooth_labels
[0.99, 0.0025, 0.0025, 0.0025, 0.0025]
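The same idea can be written as a small helper that works for any one-hot label. This is just a sketch of the formula used above (the true class keeps 1 - ϵ, every other class gets ϵ/(num_labels - 1)); the function name is my own:

import numpy as np

def smooth(one_hot, eps=0.01):
    # keep 1 - eps on the true class, spread eps over the negative classes
    num_labels = len(one_hot)
    one_hot = np.asarray(one_hot, dtype=np.float64)
    return one_hot * (1 - eps) + (1 - one_hot) * eps / (num_labels - 1)

smooth([1, 0, 0, 0, 0])
# array([0.99  , 0.0025, 0.0025, 0.0025, 0.0025])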
All the above implementations assume that both images have the same resolution. There might be some minor differences compared with the original papers, but the main motivation and results of these techniques are the same. Please feel free to post any comment, question or suggestion. I'll see you in the next one. :)