machine learning - Misunderstanding of the gradient calculation


I'm trying to understand how the loss_metric class in dlib calculates the gradient. Here is the code (full version):

    // It should be noted that the derivative of length(x-y) with respect
    // to the x vector is the unit vector (x-y)/length(x-y).  If you stare
    // at the code below long enough you will see that it's just an
    // application of this formula.
    if (x_label == y_label)
    {
        // Things with the same label should have distances < dist_thresh between
        // them.  If not then we experience non-zero loss.
        if (d2 < dist_thresh-margin) // d2 - distance between the x and y samples
        {
            gm[r*temp.num_samples() + c] = 0;
        }
        else
        {
            // The whole objective function is multiplied by scale to scale the loss
            // relative to the number of things in the mini-batch.
            // scale = 0.5/num_pos_samps;
            loss += scale*(d2 - (dist_thresh-margin));
            // r - x sample index, c - y sample index
            gm[r*temp.num_samples() + r] += scale/d2;
            gm[r*temp.num_samples() + c] = -scale/d2;
        }
    }
    else
    {
        // Things with different labels should have distances > dist_thresh between
        // them.  If not then we experience non-zero loss.
        if (d2 > dist_thresh+margin || d2 > neg_thresh)
        {
            gm[r*temp.num_samples() + c] = 0;
        }
        else
        {
            loss += scale*((dist_thresh+margin) - d2);
            // don't divide by zero (or a really small number)
            d2 = std::max(d2, 0.001f);
            gm[r*temp.num_samples() + r] -= scale/d2;
            gm[r*temp.num_samples() + c] = scale/d2;
        }
    }

    //...
    // gemm - matrix multiplication
    // grad - final gradient
    // grad_mul - gm
    // output_tensor - output tensor of the last layer
    tt::gemm(0, grad, 1, grad_mul, false, output_tensor, false);

Let's look at the loss for the same classes (line 1030). I think the gradient of scale*(d2 - (dist_thresh-margin)) should equal the gradient of c1*(||x1 - x2|| - (c2-c3)), where cn are constants and xn are output vectors, so the gradient should be c1*x1 and -c1*x2. Instead of this, the calculations are the ones on lines 1031, 1032, and 1056.

The same goes for the gradient for different classes (starting at line 1048).
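For reference, the formula quoted in the first code comment can be worked out by hand. This is only a sketch, assuming that d2 is the plain Euclidean distance ||x1 - x2|| (as the comment next to it says) and that c1 stands for scale, c2 for dist_thresh and c3 for margin; the symbols s, o_r and o_c are introduced here for illustration and are not dlib names.

```latex
% Distance between two output vectors, written component-wise.
\[
  \|x_1 - x_2\| = \Big( \sum_k (x_{1,k} - x_{2,k})^2 \Big)^{1/2}
\]

% Chain rule: the constants c_2, c_3 vanish, the square root contributes 1/(2\|\cdot\|).
\[
  \frac{\partial}{\partial x_1}\, c_1 \big( \|x_1 - x_2\| - (c_2 - c_3) \big)
    = c_1 \, \frac{x_1 - x_2}{\|x_1 - x_2\|},
  \qquad
  \frac{\partial}{\partial x_2}\, c_1 \big( \|x_1 - x_2\| - (c_2 - c_3) \big)
    = -\, c_1 \, \frac{x_1 - x_2}{\|x_1 - x_2\|}
\]

% Different labels: the distance enters with a minus sign, so the sign flips.
\[
  \frac{\partial}{\partial x_1}\, c_1 \big( (c_2 + c_3) - \|x_1 - x_2\| \big)
    = -\, c_1 \, \frac{x_1 - x_2}{\|x_1 - x_2\|}
\]

% Row r of gm times the output matrix, for one pair (r, c) with s = scale / d2:
\[
  s \, o_r - s \, o_c = \mathrm{scale} \cdot \frac{o_r - o_c}{\|o_r - o_c\|}
\]
```

Read this way, the entries +scale/d2 and -scale/d2 written into gm appear to be the coefficients of x1 and x2 in the expressions above, and the final gemm call multiplies them by the output vectors.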

Unfortunately, the hint in the first comment does not make it any clearer. I don't have enough experience to understand this, and I think a more experienced person can show where I made a mistake.

So, what gradient formula is used here? How was it derived?
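One way to probe the question is a small numerical check. The sketch below does not use dlib; all function names in it are hypothetical helpers written for this illustration. It compares three things for a single pair of vectors: the unit-vector formula from the first code comment, a central finite-difference estimate of the gradient of scale*||x - y||, and the row that results from multiplying the two gm coefficients (scale/d and -scale/d) by the two output vectors. It assumes x != y, so the distance is non-zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Euclidean distance ||x - y||.
double dist(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += (x[i] - y[i]) * (x[i] - y[i]);
    }
    return std::sqrt(sum);
}

// Analytic gradient of scale * ||x - y|| with respect to x,
// i.e. scale * (x - y) / ||x - y|| (the formula from the code comment).
Vec analytic_grad(const Vec& x, const Vec& y, double scale) {
    const double d = dist(x, y);
    Vec g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        g[i] = scale * (x[i] - y[i]) / d;
    }
    return g;
}

// Central finite-difference estimate of the same gradient.
Vec numeric_grad(Vec x, const Vec& y, double scale, double h = 1e-6) {
    Vec g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double old = x[i];
        x[i] = old + h;
        const double f_plus = scale * dist(x, y);
        x[i] = old - h;
        const double f_minus = scale * dist(x, y);
        x[i] = old;  // restore the coordinate before moving on
        g[i] = (f_plus - f_minus) / (2.0 * h);
    }
    return g;
}

// One row of (gm * outputs) for a single same-label pair (r, c):
// gm[r][r] = scale/d and gm[r][c] = -scale/d, all other entries zero.
Vec gm_row_times_outputs(const Vec& x_r, const Vec& x_c, double scale) {
    const double d = dist(x_r, x_c);
    const double gm_rr = scale / d;
    const double gm_rc = -scale / d;
    Vec g(x_r.size());
    for (std::size_t i = 0; i < x_r.size(); ++i) {
        g[i] = gm_rr * x_r[i] + gm_rc * x_c[i];
    }
    return g;
}

// Largest absolute component-wise difference between two vectors.
double max_abs_diff(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        m = std::max(m, std::fabs(a[i] - b[i]));
    }
    return m;
}
```

Calling analytic_grad, numeric_grad and gm_row_times_outputs on, say, x = {1, 2, 3} and y = {0.5, -1, 2} with scale = 0.25 and comparing the results with max_abs_diff should show agreement up to finite-difference error. The sketch covers only one ordered pair; in the dlib snippet the diagonal entry is accumulated with += over many pairs, and the different-label branch uses the opposite signs.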

