Thoughts on CosineEmbeddingLoss

pytorch
Published August 15, 2023

$$
\text{loss}(x, y) =
\begin{cases}
1 - \cos(x_1, x_2) & \text{if } y = 1 \\
\max(0, \cos(x_1, x_2) - \text{margin}) & \text{if } y = -1
\end{cases}
$$

nn.CosineEmbeddingLoss takes three arguments: the predicted embedding (x1), the label embedding (x2), and a label (y) indicating whether the embeddings should be pulled closer together (y=1) or pushed further apart (y=-1).
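A minimal usage sketch (the batch size, embedding dimension, and margin value here are illustrative, not from the post):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
loss_fn = nn.CosineEmbeddingLoss(margin=0.5)

x1 = torch.randn(4, 128)          # predicted embeddings
x2 = torch.randn(4, 128)          # label embeddings
y = torch.tensor([1, 1, -1, -1])  # 1 = pull together, -1 = push apart

loss = loss_fn(x1, x2, y)
print(loss.item())
```

The loss is averaged over the batch by default (`reduction='mean'`).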

In the y=1 case, the distance between the two points (1 - cos(x1, x2)) is minimised, while in the y=-1 case the similarity cos(x1, x2) is minimised. However, once the similarity drops below the margin (which defaults to 0; values between 0 and 0.5 are suggested), nothing happens: the max function clips the loss to zero, so these examples contribute zero gradient.
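The clipping behaviour can be verified directly: for a y=-1 pair whose cosine similarity is already below the margin, the loss is exactly zero and no gradient flows back to the embedding. The vectors below are a made-up example chosen so that their similarity (about -0.995) sits well under the margin:

```python
import torch
import torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss(margin=0.5)

# Nearly opposite vectors: cos(x1, x2) ≈ -0.995, well below the 0.5 margin
x1 = torch.tensor([[1.0, 0.0]], requires_grad=True)
x2 = torch.tensor([[-1.0, 0.1]])
y = torch.tensor([-1])

loss = loss_fn(x1, x2, y)
loss.backward()
print(loss.item())  # 0.0 — max(0, cos - margin) clips to zero
print(x1.grad)      # all zeros — this example produces no learning signal
```

This is why, with a large margin, many "negative" pairs in a batch can silently stop contributing to training once they are already dissimilar enough.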