Continuing from #1461...
We attempt to avoid zero or negative values in the log (or division by zero or negatives in the gradient) by thresholding the quotient. Unfortunately, the code is not consistent, even after many attempts.
First, introducing notation:
$T(e, y) = \max(e, y/Q)$
(i.e. a lower threshold on $e$), and the (full) forward projection $e = P\lambda + b$.
Formulas
A strategy could be
$\log L = \sum_i \left( y_i \log T(e_i,y_i) - T(e_i,y_i) \right)$ [1]
This function has a gradient w.r.t. $\lambda$ (with some element-wise multiplications etc., and ignoring the non-differentiable points)
$ { P^T {\partial T \over \partial e} (e, y) ( {y \over T(e,y) } - 1) }$ [2]
with
${\partial T \over \partial e}(e,y) = 1$ if $e \ge y/Q$ and $0$ otherwise [3]
The quotient part of this gradient corresponds to back-projecting
e >= y/Q ? y/e : 0 [4]
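For concreteness, a minimal C++ sketch of $T$, [3] and [4] (helper names are mine, not STIR API; $Q = 10000$ matches the value noted in the Conclusion):

```cpp
#include <algorithm>

// T(e, y) = max(e, y/Q): lower threshold on the estimated projection e.
inline float T(float e, float y, float Q = 10000.F) { return std::max(e, y / Q); }

// [3]: dT/de is 1 above the threshold and 0 below it.
inline float dT_de(float e, float y, float Q = 10000.F) { return e >= y / Q ? 1.F : 0.F; }

// [4]: the thresholded quotient that gets back-projected.
inline float quotient(float e, float y, float Q = 10000.F) { return e >= y / Q ? y / e : 0.F; }
```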
Log-likelihood code
The function [1] is what is computed by `PoissonLogLikelihoodWithLinearModelForMeanAndProjData` for the value, via STIR/src/buildblock/recon_array_functions.cxx (lines 367 to 371 at 12bfa87):
```cpp
const float new_estimate = max(estimated_projections[r][b], projection_data[r][b] / max_quotient);
if (projection_data[r][b] <= small_value)
  sub_result += -double(new_estimate);
else
  sub_result += projection_data[r][b] * log(double(new_estimate)) - double(new_estimate);
```

Here `max_quotient` plays the role of $Q$ and `new_estimate` is $T(e_i, y_i)$.
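A self-contained sketch of this per-bin term, handy for checking values by hand (the function name and default arguments are hypothetical, not STIR's):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Stand-alone version of the per-bin term above:
// y * log(T(e, y)) - T(e, y), with T(e, y) = max(e, y/Q).
double per_bin_logL(float y, float e, float Q = 10000.F, float small_value = 1e-6F)
{
  const float new_estimate = std::max(e, y / Q);
  if (y <= small_value)
    return -double(new_estimate);
  return y * std::log(double(new_estimate)) - double(new_estimate);
}

int main()
{
  std::printf("%f\n", per_bin_logL(5.F, 4.F));   // usual case: 5*log(4) - 4
  std::printf("%f\n", per_bin_logL(5.F, 1e-5F)); // e < y/Q = 5e-4: the log uses 5e-4 instead of e
}
```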
The arguments of this function are computed in STIR/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx (lines 1451 to 1467 at 12bfa87):
```cpp
forward_projector_sptr->forward_project(estimated_viewgrams);

if (additive_binwise_correction_ptr != NULL)
  {
    estimated_viewgrams += (*additive_binwise_correction_ptr);
  };

if (mult_viewgrams_ptr != NULL)
  {
    estimated_viewgrams *= (*mult_viewgrams_ptr);
  }

RelatedViewgrams<float>::iterator meas_viewgrams_iter = measured_viewgrams_ptr->begin();
RelatedViewgrams<float>::const_iterator est_viewgrams_iter = estimated_viewgrams.begin();
// call function that does the actual work, it sits in recon_array_funtions.cxx (TODO)
for (; meas_viewgrams_iter != measured_viewgrams_ptr->end(); ++meas_viewgrams_iter, ++est_viewgrams_iter)
  accumulate_loglikelihood(*meas_viewgrams_iter, *est_viewgrams_iter, rim_truncation_sino, log_likelihood_ptr);
```
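In the notation of the "Gradient formulas" section below, these steps build up the full forward projection (my reading of the snippet above):

$e = m(G\lambda + a) = P\lambda + b$, with $P = mG$ and $b = ma$,

i.e. the forward projector applies $G$, the additive correction adds $a$, and the multiplicative viewgrams supply $m$.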
Gradient code
For the gradient we use the optimisation that any multiplicative factors drop out.
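Ignoring the thresholds for a moment, this follows from $e = mf$, with $f = G\lambda + a$ the forward projection without the multiplicative factors (a one-line derivation, consistent with the "Gradient formulas" section below):

$P^T \left( {y \over e} - 1 \right) = G^T m \left( {y \over mf} - 1 \right) = G^T \left( {y \over f} - m \right)$

so back-projecting with $G$ and the quotient $y/f$ suffices, and $m$ only survives in the subtracted "sensitivity" term.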
From STIR/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx (lines 1407 to 1433 at 12bfa87):

```cpp
forward_projector_sptr->forward_project(estimated_viewgrams);

if (additive_binwise_correction_ptr != NULL)
  estimated_viewgrams += (*additive_binwise_correction_ptr);

// for sinogram division
divide_and_truncate(*measured_viewgrams_ptr, estimated_viewgrams, rim_truncation_sino, count, count2, log_likelihood_ptr);

// adding the sensitivity: backproj[y/ybar] *
// not adding the sensitivity computes the gradient: backproj[y/ybar - 1] *
// * ignoring normalisation *
if (!add_sensitivity)
  {
    if (mult_viewgrams_ptr)
      {
        // subtract normalised ones from the data [y/ybar - 1/N]
        *measured_viewgrams_ptr -= *mult_viewgrams_ptr;
      }
    else
      {
        // No mult_viewgrams_ptr, subtract ones [y/ybar - 1]
        *measured_viewgrams_ptr -= 1;
      }
  }

// back project
back_projector_sptr->back_project(*measured_viewgrams_ptr);
```
Unfortunately, the `divide_and_truncate` function (which implements the division [4]) does not know about the multiplicative factors, and therefore uses a different threshold than the logL value.
Gradient formulas
Writing the full forward projection as $e = P\lambda + b = m(G\lambda + a)$, and the forward projection without $m$ as $f = G\lambda + a$, the current gradient computation is
G.back_project((f >= y/Q ? y/f : 0) - m)
This is equivalent to
P.back_project((e >= m*y/Q ? y/e : 0) - 1)
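A tiny numeric illustration of the resulting mismatch (a self-contained sketch; the values are arbitrary, not from the source):

```cpp
#include <cstdio>

int main()
{
  const float Q = 10000.F; // max_quotient in both code paths
  const float y = 5.F;     // measured counts in some bin
  const float m = 0.5F;    // multiplicative factor (m < 1)

  // The gradient code thresholds f = e/m at y/Q, i.e. e at m*y/Q.
  // The logL value code thresholds e at y/Q directly.
  const float grad_threshold_on_e = m * y / Q; // 2.5e-4
  const float logL_threshold_on_e = y / Q;     // 5e-4

  // Any e in (2.5e-4, 5e-4) is thresholded in the value but not in the gradient.
  std::printf("inconsistent region for e: (%g, %g)\n", grad_threshold_on_e, logL_threshold_on_e);
}
```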
Conclusion
There are 2 inconsistencies:
- In both `divide_and_truncate` and `accumulate_loglikelihood`, $Q = 10000$, but they threshold different quantities ($f$ vs. $e = mf$). The quotient in the gradient computation is therefore not consistent with the log-term in the logL value wherever only one of the two reaches the threshold, i.e. if $m < 1$, wherever $m y/Q < e < y/Q$.
- The "sensitivity" term is thresholded in the current logL computation (via $T(e, y)$), but subtracted without any threshold in the gradient computation.