Continuing from #1461...
We attempt to avoid zero or negative values in the log (or division by zero or negatives in the gradient) by thresholding the quotient. Unfortunately, the code is not consistent, even after many attempts.
First, introducing notation:
$T(e, y) = \max(e, y/Q)$
(i.e. a lower threshold on $e$), and the (full) forward projection $e = P\lambda + b$.
Formulas
A strategy could be
$\log L = \sum_i \left( y_i \log T(e_i,y_i) - T(e_i,y_i) \right)$ [1]
This function has a gradient w.r.t. $\lambda$ (with some element-wise multiplications etc., and ignoring the non-differentiable points)
$ { P^T {\partial T \over \partial e} (e, y) ( {y \over T(e,y) } - 1) }$ [2]
with
${\partial T \over \partial e}(e,y) = 1$ if $e \ge y/Q$ and $0$ otherwise [3]
The quotient part of this gradient corresponds to back-projecting
e >= y/Q ? y/e : 0 [4]
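For concreteness, a minimal C++ sketch of $T$, [3] and [4] (helper names are mine, not STIR API; $Q = 10000$ matches the value noted in the Conclusion):

```cpp
#include <algorithm>

// T(e, y) = max(e, y/Q): lower threshold on the estimated projection e.
inline float T(float e, float y, float Q = 10000.F) { return std::max(e, y / Q); }

// [3]: dT/de is 1 above the threshold and 0 below it.
inline float dT_de(float e, float y, float Q = 10000.F) { return e >= y / Q ? 1.F : 0.F; }

// [4]: the thresholded quotient that gets back-projected.
inline float quotient(float e, float y, float Q = 10000.F) { return e >= y / Q ? y / e : 0.F; }
```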
Log-likelihood code
The function [1] is what is computed by `PoissonLogLikelihoodWithLinearModelForMeanAndProjData` for the value, via STIR/src/buildblock/recon_array_functions.cxx (lines 367 to 371 at 12bfa87):
```cpp
const float new_estimate = max(estimated_projections[r][b], projection_data[r][b] / max_quotient);
if (projection_data[r][b] <= small_value)
  sub_result += -double(new_estimate);
else
  sub_result += projection_data[r][b] * log(double(new_estimate)) - double(new_estimate);
```

Here `max_quotient` plays the role of $Q$ and `new_estimate` is $T(e_i, y_i)$.
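A self-contained sketch of this per-bin term, handy for checking values by hand (the function name and default arguments are hypothetical, not STIR's):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Stand-alone version of the per-bin term above:
// y * log(T(e, y)) - T(e, y), with T(e, y) = max(e, y/Q).
double per_bin_logL(float y, float e, float Q = 10000.F, float small_value = 1e-6F)
{
  const float new_estimate = std::max(e, y / Q);
  if (y <= small_value)
    return -double(new_estimate);
  return y * std::log(double(new_estimate)) - double(new_estimate);
}

int main()
{
  std::printf("%f\n", per_bin_logL(5.F, 4.F));   // usual case: 5*log(4) - 4
  std::printf("%f\n", per_bin_logL(5.F, 1e-5F)); // e < y/Q = 5e-4: the log uses 5e-4 instead of e
}
```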
The arguments of this function are computed in STIR/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx (lines 1451 to 1467 at 12bfa87):
```cpp
forward_projector_sptr->forward_project(estimated_viewgrams);

if (additive_binwise_correction_ptr != NULL)
  {
    estimated_viewgrams += (*additive_binwise_correction_ptr);
  };

if (mult_viewgrams_ptr != NULL)
  {
    estimated_viewgrams *= (*mult_viewgrams_ptr);
  }

RelatedViewgrams<float>::iterator meas_viewgrams_iter = measured_viewgrams_ptr->begin();
RelatedViewgrams<float>::const_iterator est_viewgrams_iter = estimated_viewgrams.begin();
// call function that does the actual work, it sits in recon_array_funtions.cxx (TODO)
for (; meas_viewgrams_iter != measured_viewgrams_ptr->end(); ++meas_viewgrams_iter, ++est_viewgrams_iter)
  accumulate_loglikelihood(*meas_viewgrams_iter, *est_viewgrams_iter, rim_truncation_sino, log_likelihood_ptr);
```
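In the notation of the "Gradient formulas" section below, these steps build up the full forward projection (my reading of the snippet above):

$e = m(G\lambda + a) = P\lambda + b$, with $P = mG$ and $b = ma$,

i.e. the forward projector applies $G$, the additive correction adds $a$, and the multiplicative viewgrams supply $m$.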
Gradient code
For the gradient we use the optimisation that any multiplicative factors drop out.
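Ignoring the thresholds for a moment, this follows from $e = mf$, with $f = G\lambda + a$ the forward projection without the multiplicative factors (a one-line derivation, consistent with the "Gradient formulas" section below):

$P^T \left( {y \over e} - 1 \right) = G^T m \left( {y \over mf} - 1 \right) = G^T \left( {y \over f} - m \right)$

so back-projecting with $G$ and the quotient $y/f$ suffices, and $m$ only survives in the subtracted "sensitivity" term.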
From STIR/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx (lines 1407 to 1433 at 12bfa87):

```cpp
forward_projector_sptr->forward_project(estimated_viewgrams);

if (additive_binwise_correction_ptr != NULL)
  estimated_viewgrams += (*additive_binwise_correction_ptr);

// for sinogram division
divide_and_truncate(*measured_viewgrams_ptr, estimated_viewgrams, rim_truncation_sino, count, count2, log_likelihood_ptr);

// adding the sensitivity: backproj[y/ybar] *
// not adding the sensitivity computes the gradient: backproj[y/ybar - 1] *
// * ignoring normalisation *
if (!add_sensitivity)
  {
    if (mult_viewgrams_ptr)
      {
        // subtract normalised ones from the data [y/ybar - 1/N]
        *measured_viewgrams_ptr -= *mult_viewgrams_ptr;
      }
    else
      {
        // No mult_viewgrams_ptr, subtract ones [y/ybar - 1]
        *measured_viewgrams_ptr -= 1;
      }
  }

// back project
back_projector_sptr->back_project(*measured_viewgrams_ptr);
```
Unfortunately, the `divide_and_truncate` function (which implements the division [4]) does not know about the multiplicative factors, and therefore uses a different threshold than the logL value.
Gradient formulas
Writing the full forward projection as $e = P\lambda + b = m(G\lambda + a)$, and the forward projection without $m$ as $f = G\lambda + a$, the current gradient computation is
G.back_project((f >= y/Q ? y/f : 0) - m)
This is equivalent to
P.back_project((e >= m*y/Q ? y/e : 0) - 1)
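A tiny numeric illustration of the resulting mismatch (a self-contained sketch; the values are arbitrary, not from the source):

```cpp
#include <cstdio>

int main()
{
  const float Q = 10000.F; // max_quotient in both code paths
  const float y = 5.F;     // measured counts in some bin
  const float m = 0.5F;    // multiplicative factor (m < 1)

  // The gradient code thresholds f = e/m at y/Q, i.e. e at m*y/Q.
  // The logL value code thresholds e at y/Q directly.
  const float grad_threshold_on_e = m * y / Q; // 2.5e-4
  const float logL_threshold_on_e = y / Q;     // 5e-4

  // Any e in (2.5e-4, 5e-4) is thresholded in the value but not in the gradient.
  std::printf("inconsistent region for e: (%g, %g)\n", grad_threshold_on_e, logL_threshold_on_e);
}
```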
Conclusion
There are 2 inconsistencies:
- In both `divide_and_truncate` and `accumulate_loglikelihood`, $Q = 10000$, but they threshold different quantities ($f$ vs. $e = mf$). The quotient in the gradient computation is therefore not consistent with the log-term in the logL value wherever only one of the two reaches the threshold, i.e. if $m < 1$, wherever $m y/Q < e < y/Q$.
- The "sensitivity" term is thresholded in the current logL computation (via $T(e, y)$), but subtracted without any threshold in the gradient computation.