Skip to content

(Closes #3409) reproducible reduction performance improvements#3452

Open
LonelyCat124 wants to merge 13 commits into
masterfrom
3409_reproducible_improvements
Open

(Closes #3409) reproducible reduction performance improvements#3452
LonelyCat124 wants to merge 13 commits into
masterfrom
3409_reproducible_improvements

Conversation

@LonelyCat124

Copy link
Copy Markdown
Collaborator

Assuming tests are all successful here, then ready for a first look. This does the changes we discussed in the issue, there are some rogue newline statements that seem to commen from adding preceding_comments, though I could also be tempted to remove the preceding comments if they seems a bit over the top.

@LonelyCat124 LonelyCat124 requested a review from sergisiso June 9, 2026 14:51
@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (a7ea7cc) to head (fda2d89).

Additional details and impacted files
@@            Coverage Diff            @@
##            master     #3452   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          393       393           
  Lines        55067     55097   +30     
=========================================
+ Hits         55067     55097   +30     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

@sergisiso I think this is ready for a first look now.

@sergisiso

Copy link
Copy Markdown
Collaborator

@LonelyCat124 Today LFRic meeting discussion has reminded me that we still don't test reproducible reductions in our integration tests and we should because this is what the MetOffice always use. Can you update:
examples/lfric/scripts/KGOs/*.cfg
to have
REPRODUCIBLE_REDUCTIONS = true
and then launch the integration tests on this branch?

@sergisiso

Copy link
Copy Markdown
Collaborator

A second step would be to update the:
python ${PSYCLONE_LFRIC_DIR}/compare_output.py ${PSYCLONE_LFRIC_DIR}/KGOs/gungho_model-checksums.txt gungho_model-checksums.txt
to just:
diff ${PSYCLONE_LFRIC_DIR}/KGOs/gungho_model-checksums.txt gungho_model-checksums.txt
but I do not expect this to work because ${PSYCLONE_LFRIC_DIR}/KGOs/gungho_model-checksums.txt is very likely generated with different number of ranks.

We could regenerate the KGO by doing manual run of
./build/local_build.py -j ${NUM_PARALLEL} -w working-gh-passthrough gungho_model (without the -p)
and then
OMP_NUM_THREADS=1 mpirun -n 12 ../bin/gungho_model configuration.nml

There is another issue for this #3207, so I am not sure if it should be here or a separate PR. But it will be great to be confident that this produced the exact expected results (and continues doing so going forward).

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

The LFRic colour tiling IT fails - I'll investigate.

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

@sergisiso So I setup running this and its an issue due to OMPParallelDoDirective doing a reproducible reduction. This is not supportable with the new way of creating them, since we have

!$omp parallel do
do i = 1, n
   ! some statement with a reproducible reduction
end do
!$omp end parallel do

This means we can't have asum_temp_local = 0.0 after the omp parallel, nor asum_local(1, th_idx) = asum_temp_local after the end do

This gives 2 options as I see, first is to make the transformation do something different if reprod is True (i.e. use parallel then do directives separately) or disallow reprod on OmpParallelDoTransformation. Do you have any other ideas or preference in which I should do?

@sergisiso

sergisiso commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

The standard has this reproducible order-modifier https://www.openmp.org/spec-html/5.2/openmpse59.html, Is this something we can use / the compilers implement?

@sergisiso

Copy link
Copy Markdown
Collaborator

Any other suggestions @markb-epcc ?

@markb-epcc

Copy link
Copy Markdown

@sergisiso AFAIK, the reproducible order-modifier applies to the mapping of loop iterations to threads, not to the order of reduction operations. The spec says "The location in the OpenMP program at which values are combined and the order in which values are combined are unspecified. Thus, when comparing sequential and parallel executions, or when comparing one parallel execution to another (even if the number of threads used is the same), bitwise-identical results are not guaranteed."

I would have thought that padding the temporary array would solve the problem, but that is not done in https://github.com/MetOffice/lfric_core/blob/bb10f897f7ecc4a931906b8d54bf1692f4a66b7f/components/science/source/psy/sci_psykal_light_mod.f90 where I noticed the false sharing problem coming from.

If you want to implement this with !$omp parallel do then would you not need to split the parallel and do in any case, so that omp_get_thread_num() is only called once and not in every loop iteration?

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

Yeah, I'm not sure what the best solution is here - essentially how should PSyclone do the splitting (its a mostly a bit of an inheritance nightmare).

I could simply skip the reproducible reductions being created if the node is a OMPParallelDoDirective, but weirdly that requires checking the type in OMPParallelDirective which is a mess.

The entire LFRicOMPParallelLoopTrans infrastructure currently never checks to see if reproducible reductions are enabled, it just (now? Maybe it did before and we never tested but the old version just worked) magically happens at lowering time, when it hits OMPParallelDirective.lower_to_language_level which performs the new reproducible reductions setup, and results in :

E           psyclone.errors.GenerationError: Generation Error: An OMPParallelDoDirective can only be applied to a single loop but this Node has 4 children: [<psyclone.psyir.nodes.assignment.Assignment object at 0x7bb197a97da0>, <psyclone.psyir.nodes.assignment.Assignment object at 0x7bb197a97350>, <psyclone.psyir.nodes.loop.Loop object at 0x7bb197a97590>, <psyclone.psyir.nodes.assignment.Assignment object at 0x7bb197b38050>]

This is a good sanity error checking test at least, but the only thing I can thing to do is when lowering the OMPParallelDoDirective, to do the following:

    reprod_red_call_list = self.reductions(reprod=True) # I don't love this argument name being reprod...
    if reprod_red_call_list:
       # Replace this node with a parallel and a do
       dir_body = self.dir_body.detach()
       dodir = OMPDoDirective(
   omp_schedule=self.omp_schedule,
   collapse = self.collapse,
    reprod=self.reprod,
    nowait=self.nowait,
    children=dir_body
      )
     pdir = OMPParallelDirective.create(dodir)
     self.replace_with(pdir)
     return pdir.lower_to_language_level()


        # Calling the super() explicitly to avoid confusion
        # with the multiple-inheritance
        OMPParallelDirective.lower_to_language_level(self)
        self.children[4].replace_with(OMPScheduleClause(self._omp_schedule))

        return self

I'm not sure how you feel about changing the tree like this during lowering @sergisiso ? This is a rough problem to solve in another way though - its possible I could do it at transformation application (or just in general, if someone requests LFRICParallelLoopTrans with reprod set to true in the config it applies the 2 transformations separately instead regardless of whether reductions exist), but I don't know if we know whether there are reductions required until after transformation time.

Let me know if you have a preference between the options.

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

I decided to implement the transformation approach, and have a test in PSyclone now that works (we never tested that functionality anywhere before...). I'm hoping the ITs pass now but waiting on them for now.

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

@sergisiso This is ready for a look now, ITs all green.

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

So I can restrict it to

if node.walk(Kern) and node.walk(Kern)[0].is_reduction:
   # Do OMPDo and OMPParallel

But I have some questions from reading the Kern documentation which I suspect isn't true:

    @property
    def reprod_reduction(self):
        '''
        :returns: whether this kernel/built-in is enclosed within an OpenMP
                  do loop. If so report whether it has the reproducible flag
                  set. Note, this also catches OMPParallelDo Directives but
                  they have reprod set to False so it is OK.
        :rtype: bool

        '''
        ancestor = self.ancestor(OMPDoDirective)
        if ancestor:
            return ancestor.reprod
        return False

This claims that ancestor.reprod is always False for OMPParallelDo Directives, but I'm not sure that this is true.
Its only defined on OMPDoDirective, and the only place I can see this value being set is in OMPDoDirective.__init__ where its set with self._reprod = Config.get().reproducible_reductions (or = reprod if its passed to the constructor, but OMPParallelLoopTrans doesn't pass that argument), so maybe we're not doing the currently expected behaviour already? We had no tests that failed with the changes here apart from the one I added.

@arporter

Copy link
Copy Markdown
Member

Ooh, that doesn't look like it's right anymore. Can we remove the reprod_reduction method altogether perhaps?

@LonelyCat124

Copy link
Copy Markdown
Collaborator Author

Ooh, that doesn't look like it's right anymore. Can we remove the reprod_reduction method altogether perhaps?

I'm not sure, its tricky, in theory we need to check the ancestor's reprod behaviour, as it could be set to False in which case we need to know its not reproducible in places for determining whether some symbols are needed so I don't think so - I just think that part of the docstring is wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants