Skip to content

Improve indexing command#2133

Open
mascarpon3 wants to merge 6 commits intomainfrom
improve-indexing-command
Open

Improve indexing command#2133
mascarpon3 wants to merge 6 commits intomainfrom
improve-indexing-command

Conversation

@mascarpon3
Copy link
Copy Markdown
Contributor

@mascarpon3 mascarpon3 commented Mar 25, 2026

Purpose

to index all documents we need to be able to restart from last successful batch in case of crash. to do so we need to be able to specify time bounds on the bulk index to allow recovering from checkpoint after crash and asynchronously delegate it to celery worker.

Proposal

  • ✨ cache the last successful document.updated_at
  • ✨ add lower_time_bound and upper_time_bound args
  • ✨ apply async batch_document_indexer_task
  • ✨ make the command available in django admin

External contributions

Thank you for your contribution! 🎉

Please ensure the following items are checked before submitting your pull request:

  • I have read and followed the contributing guidelines
  • I have read and agreed to the Code of Conduct
  • I have signed off my commits with git commit --signoff (DCO compliance)
  • I have signed my commits with my SSH or GPG key (git commit -S)
  • My commit messages follow the required format: <gitmoji>(type) title description
  • I have added a changelog entry under ## [Unreleased] section (if noticeable change)
  • I have added corresponding tests for new features or bug fixes (if applicable)

@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch 6 times, most recently from a4f6689 to 2641629 Compare March 30, 2026 11:55
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 30, 2026

Walkthrough

Adds time-bounded indexing and optional async execution across the backend: new CLI flags (--lower-time-bound, --upper-time-bound, --async) and Makefile passthrough; a new admin form/view and URL to trigger indexing; a queryset helper filter_updated_at for inclusive time-range selection; indexer changes to accept a queryset, use iterator-based batching, and support crash-safe checkpointing; Celery task and trigger adjustments for bound-based dispatch; factory, form, template, and extensive tests to exercise sync/async and crash-recovery behaviors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • jmaupetit
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Improve indexing command' directly relates to the main change: enhancing the indexing command with crash recovery, time bounds, async support, and admin access.
Description check ✅ Passed The description clearly explains the purpose (crash-safe recovery via checkpoints and async delegation), outlines the implementation approach with specific checklist items, and confirms compliance with contribution guidelines.
Docstring Coverage ✅ Passed Docstring coverage is 95.24% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch improve-indexing-command
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch improve-indexing-command

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch 2 times, most recently from 8bdd083 to d086f36 Compare March 30, 2026 15:39
i think some docstrings, about a counter, were outdated
i add more details to help understand the logic
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the document indexing workflow by adding time-bounded batch selection, crash-safe checkpointing for restartability, and making bulk indexing runnable asynchronously via Celery (including a Django admin trigger).

Changes:

  • Added lower_time_bound / upper_time_bound filtering to build an indexable document queryset.
  • Updated the indexer batching logic to support crash-safe checkpointing (cache last successful updated_at).
  • Made the index management command enqueue a Celery batch task and added an admin page to trigger indexing.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
src/backend/core/utils.py Adds build_indexable_documents_queryset() to filter documents by updated/deleted timestamps.
src/backend/core/tests/test_utils.py Adds test coverage for the new time-bound queryset builder.
src/backend/core/tests/test_services_search_indexers.py Updates tests to the new index(queryset, ...) signature.
src/backend/core/tests/conftest.py Adds helper to create documents with controlled updated_at values for time-bound tests.
src/backend/core/tests/commands/test_index.py Expands command tests for bounds, crash-safe mode, and checkpoint recovery behavior.
src/backend/core/templates/runindexing.html Introduces an admin HTML form to trigger indexing with parameters.
src/backend/core/tasks/search.py Removes old search indexing tasks module.
src/backend/core/tasks/indexing.py Adds new Celery tasks module for single/batch indexing + debounce/throttle.
src/backend/core/signals.py Switches signal-triggered indexing to the new tasks module.
src/backend/core/services/search_indexers.py Adds checkpoint constant, updates batching implementation, and extends index() API with crash-safe mode.
src/backend/core/models.py Adds RunIndexing model used to expose the admin entry point.
src/backend/core/migrations/0032_runindexing.py Creates the RunIndexing model via migration.
src/backend/core/management/commands/index.py Updates the command to accept bounds/crash-safe mode and enqueue the Celery batch task.
src/backend/core/admin.py Registers RunIndexing admin and implements the POST handler to call the command.
docs/commands/index.md Adds documentation for CLI/admin/Make usage and parameters.
Makefile Extends make index to pass optional parameters through to manage.py index.
CHANGELOG.md Adds an entry describing the indexing command improvements.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/commands/index.md Outdated
Comment thread src/backend/core/templates/runindexing.html Outdated
Comment thread src/backend/core/management/commands/index.py Outdated
Comment thread src/backend/core/migrations/0032_runindexing.py Outdated
Comment thread src/backend/core/services/search_indexers.py
Comment thread src/backend/core/tasks/search.py
Comment thread src/backend/core/templates/runindexing.html Outdated
Comment thread src/backend/core/templates/runindexing.html Outdated
Comment thread src/backend/core/management/commands/index.py Outdated
Comment thread src/backend/core/management/commands/index.py
@suitenumerique suitenumerique deleted a comment from Copilot AI Mar 30, 2026
Comment thread src/backend/core/management/commands/index.py
Comment thread src/backend/core/management/commands/index.py Outdated
Comment thread src/backend/core/templates/runindexing.html
Comment thread src/backend/core/tests/conftest.py Outdated
Comment thread src/backend/core/tasks/search.py
Comment thread src/backend/core/models.py Outdated
Comment thread src/backend/core/utils.py Outdated
Comment thread src/backend/core/utils.py Outdated
Comment thread Makefile Outdated
Comment thread src/backend/core/admin.py Outdated
Comment thread src/backend/core/services/search_indexers.py Outdated
@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch 4 times, most recently from 94b921a to da5556a Compare April 3, 2026 15:07
@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch 2 times, most recently from 030084f to 980befa Compare April 8, 2026 15:13
@mascarpon3 mascarpon3 marked this pull request as ready for review April 8, 2026 16:07
@mascarpon3 mascarpon3 requested a review from qbey April 8, 2026 16:07
Comment thread src/backend/core/tests/conftest.py Outdated
Comment thread src/backend/core/templates/admin/index.html Outdated
Comment thread src/backend/core/templates/runindexing.html Outdated
Comment thread src/backend/core/tests/test_models_documents.py Outdated
Comment thread src/backend/core/tests/test_models_documents.py Outdated
Comment thread src/backend/core/models.py
@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch from 980befa to f71cbe0 Compare April 22, 2026 13:28
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

♻️ Duplicate comments (1)
src/backend/core/admin.py (1)

19-50: 🧹 Nitpick | 🔵 Trivial

Monkey-patching admin.site.get_app_list is fragile; prefer a custom AdminSite or an admin index template override.

This rebinds get_app_list on the default AdminSite instance at import time, which:

  • Silently breaks if another app tries the same trick (last import wins).
  • Is sensitive to import ordering (if admin.py is imported before all other apps register, the original captured reference could change semantics in edge cases).
  • Injects a pseudo-"app" that Django’s permission framework never checked — has_module_perms: True is set unconditionally, so the menu item shows even for staff users who have no business running indexing.

Consider subclassing AdminSite, or simply linking to /admin/core/run-indexing/ from a small admin/index.html template override guarded by a permission check. At minimum, gate the commands_app entry on request.user.is_superuser (or a specific permission) so non-privileged staff don't see a button they (hopefully, per the fix above) cannot use.

♻️ Minimal hardening
 def custom_get_app_list(self, request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)
 
+    if not request.user.is_superuser:
+        return app_list
+
     # Add Commands app with Run Indexing command
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 19 - 50, The current monkey-patch of
admin.site.get_app_list in custom_get_app_list is fragile and exposes a Commands
entry to all staff; change this by either (A) creating a custom AdminSite
subclass and installing it instead of rebinding admin.site.get_app_list, or (B)
at minimum gate the commands_app addition inside custom_get_app_list on a
permission check (e.g., request.user.is_superuser or a specific permission) so
the pseudo-app is only appended for authorized users; update references to
custom_get_app_list, admin.site.get_app_list and the commands_app dict
accordingly and remove the import-time rebinding if you implement the AdminSite
subclass approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 17-20: Move the changelog entry "✨(backend) improve indexing
command" out of the dated v4.8.3 section and place it under the top-level
"[Unreleased]" heading; specifically, locate the block containing "✨(backend)
improve indexing command" with its subpoints "checkpoint recovery",
"asynchronicity", and "admin command trigger" and cut/paste it beneath the
"[Unreleased]" header so the entry is no longer listed under the v4.8.3 dated
release.

In `@docs/commands/index.md`:
- Around line 64-65: Update the documentation text to use the correct field name
`updated_at` (underscore) instead of `updated.at`; specifically edit the
sentence that describes saving the last document's timestamp into the
`bulk-indexer-checkpoint` cache and its use as the `lower-time-bound` so it
reads "updated_at" where the field name is referenced.
- Line 5: The top-level sentence in docs/commands/index.md is misleading; update
the description to state the command runs synchronously by default and only
dispatches to Celery when the --async flag is passed (matching the argparse
default async_mode=False in the Command implementation in
src/backend/core/management/commands/index.py); reference the --async flag and
the Command class/async_mode setting to ensure docs and code behavior are
consistent.
- Around line 22-28: Update the example CLI invocation to use the correct flag
name: replace the incorrect `--async_mode` with `--async` so it matches the
argparse definition (`dest="async_mode"`); ensure any other examples showing the
async flag use `--async` (not `--async_mode`) to avoid "unrecognized arguments"
errors when running the `index` command.
- Around line 34-36: Fix the broken ordered list and inconsistent button label:
update the numbered steps in docs/commands/index.md so they read 1, 2, 3, 4
(insert the missing step 2 or renumber sequentially) and make the button text in
the docs match the actual submit value used in the template (runindexing.html) —
either change the docs to say "Run Indexing" or update the template's submit
value to "Run Indexing Command" so both the markdown steps and the template
submit button value are consistent.
- Around line 57-60: Replace the incorrect heading `## --async_mode` with the
correct CLI flag name and heading level by changing it to `### --async` to match
other parameter headings; fix the description line ("Runs asynchronously is
async_mode==True.") to a clear grammatical sentence such as "Runs asynchronously
when the --async flag is set." and keep the type as "Boolean flag" and default
"False" unchanged.

In `@src/backend/core/admin.py`:
- Around line 274-286: After validating RunIndexingForm, stop re-reading
request.POST and use form.cleaned_data for batch_size, lower_time_bound, and
upper_time_bound when calling call_command("index"); replace
int(request.POST.get("batch_size")) and
convert_to_isoformat(request.POST.get(...)) with the form.cleaned_data values
(e.g., form.cleaned_data["batch_size"], form.cleaned_data["lower_time_bound"],
form.cleaned_data["upper_time_bound"]) so you rely on the form's coercion and
cross-field validation; remove or stop using convert_to_isoformat for these
parameters if the form already returns the correct datetime/typed values.
- Around line 271-291: The run_indexing_view currently lacks any
authentication/permission checks allowing unauthenticated users to trigger
indexing; restrict access by decorating or checking permissions before
processing POST: add a decorator such as `@staff_member_required` or
`@login_required` plus a permission check (e.g. user.is_superuser or Django's
`@permission_required`("core.run_indexing")) on run_indexing_view (or enforce
inside the function by validating request.user and returning
HttpResponseForbidden), ensure the RunIndexingForm handling remains the same,
and import the appropriate decorators (or permission utilities) and use them on
run_indexing_view to prevent unauthorized POSTs.
- Around line 287-291: The view currently renders the template directly after
handling the POST which allows form resubmission on refresh; change the POST
handling to follow Post/Redirect/Get: after calling messages.success(request,
_("Indexing triggered!")) perform a redirect (e.g., return
redirect(request.path) or return redirect(reverse(...))) instead of falling
through to template rendering, and likewise after messages.error(...) redirect
back to the form page so both the success and error POST branches return a
redirect; also ensure django.shortcuts.redirect (and django.urls.reverse if
used) are imported and that RunIndexingForm() remains used for GET.

In `@src/backend/core/forms.py`:
- Around line 33-40: The cross-field validation in the form's clean() currently
raises forms.ValidationError which creates a non-field error; change it to
attach the error to the specific field by calling
self.add_error("upper_time_bound", ...) when lower_time_bound >
upper_time_bound, and wrap the message with gettext_lazy (use _(...)) to keep
i18n consistent; update the clean method where lower_time_bound and
upper_time_bound are compared to use self.add_error instead of raise.

In `@src/backend/core/management/commands/index.py`:
- Around line 65-76: Wrap the async dispatch call to
batch_document_indexer_task.apply_async in a try/except block (similar to the
sync branch) so enqueue failures are caught and re-raised as a clean
CommandError; specifically, catch the concrete broker-related exceptions (e.g.,
kombu.exceptions.OperationalError) or a general Exception around the apply_async
call, log the failure via logger.error, and raise CommandError with a clear
message including the original exception details; keep the logger.info("Document
indexing task sent to worker") only on success.

In `@src/backend/core/models.py`:
- Around line 875-892: The time-bounded filter in filter_updated_at() currently
checks only updated_at and ancestors_deleted_at, which misses restored documents
because restore() clears/moves ancestors_deleted_at without bumping updated_at;
change the restore path (restore()/descendant restore logic) to also update a
field that the filter can use (either increment updated_at or set a dedicated
indexation timestamp like index_updated_at on soft-delete and restore), and
update filter_updated_at() to include that indexation timestamp in its Q
conditions so restored documents fall inside bounded re-index runs.

In `@src/backend/core/services/search_indexers.py`:
- Around line 163-167: The code only logs the checkpoint when crash_safe_mode is
true; instead persist the last successful checkpoint value
serialized_batch[-1]["updated_at"] into a durable cache/store so the indexing
run can resume; modify the block that currently calls logger.info(...) (the
crash_safe_mode branch) to also write that timestamp into your caching layer
(e.g., cache.set("search_indexer:checkpoint",
serialized_batch[-1]["updated_at"]) or use the existing cache client in this
module/service) and ensure any error writing the checkpoint is handled/logged so
it won't break the indexing loop.

In `@src/backend/core/templates/runindexing.html`:
- Around line 11-13: The paragraph text in runindexing.html ("This command
triggers the indexing of all documents within the specified time bound.") is a
hardcoded user-facing string and must be wrapped for translation to match the
template's i18n convention (the submit button already uses {% translate %});
update the <p> element to use either {% translate %} or {% blocktranslate %} so
the text is internationalized (keep {% blocktranslate %} if you later add
punctuation/formatting).

In `@src/backend/core/tests/commands/test_index.py`:
- Around line 169-175: The test calls call_command("index", ...) with mixed
types for time bounds—lower_time_bound passed as a datetime
(documents[3].updated_at) while upper_time_bound is a string (.isoformat())—so
make the test realistic by passing both bounds as strings just like the CLI
would; update the call_command invocation in the test to convert
lower_time_bound to a string (e.g., documents[3].updated_at.isoformat()) so both
lower_time_bound and upper_time_bound are ISO-formatted strings and the mocked
FindDocumentIndexer.push behavior remains unchanged.

In `@src/backend/core/tests/conftest.py`:
- Around line 34-37: The test fixture is mutating global logger propagation
(logging.getLogger(logger_name).propagate = True) without restoring it, which
can leak into other tests; change the fixture to record the original propagate
values for each logger from settings.LOGGING, set propagate True for the test,
yield caplog, and then in a finally/teardown restore each logger's propagate to
its saved value so original state is preserved (reference settings.LOGGING,
logging.getLogger(logger_name).propagate, and caplog).

In `@src/backend/core/tests/test_models_documents.py`:
- Around line 1709-1787: The tests (e.g.,
test_models_documents_manager_time_filter_oldest_updated_at,
test_models_documents_manager_time_filter_newest_updated_at,
test_models_documents_manager_time_filter_both_bounds, and
test_utils_build_indexable_documents_queryset_ancestors_deleted_at) currently
construct naive datetimes with datetime(...) and pass them to
DocumentFactory(updated_at=...) and
models.Document.objects.filter_updated_at(...); update those datetime(...) calls
to produce timezone-aware datetimes (for example datetime(...,
tzinfo=timezone.utc) or use django.utils.timezone.make_aware) so the
DateTimeField comparisons are consistent across timezones; ensure every
lower_time_bound and upper_time_bound and any updated_at / ancestors_deleted_at
factory args are changed accordingly and imports adjusted (timezone or
make_aware) where used.

In `@src/backend/impress/urls.py`:
- Around line 15-19: The URL for run_indexing_view is currently registered under
"admin/core/run-indexing/" which risks colliding with the Django admin app
namespace; move the route out of the `core` namespace by changing the
urlpatterns entry that references run_indexing_view so it uses a non-conflicting
path such as "admin/run-indexing/" (update the urlpatterns list containing
run_indexing_view and admin.site.urls), or alternatively register the view via a
custom AdminSite.get_urls() override so the endpoint is properly namespaced
under the admin site (refer to run_indexing_view and admin.site.urls when making
the change).

---

Duplicate comments:
In `@src/backend/core/admin.py`:
- Around line 19-50: The current monkey-patch of admin.site.get_app_list in
custom_get_app_list is fragile and exposes a Commands entry to all staff; change
this by either (A) creating a custom AdminSite subclass and installing it
instead of rebinding admin.site.get_app_list, or (B) at minimum gate the
commands_app addition inside custom_get_app_list on a permission check (e.g.,
request.user.is_superuser or a specific permission) so the pseudo-app is only
appended for authorized users; update references to custom_get_app_list,
admin.site.get_app_list and the commands_app dict accordingly and remove the
import-time rebinding if you implement the AdminSite subclass approach.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e277ab42-da6c-44f6-a14c-ed1e160cc90d

📥 Commits

Reviewing files that changed from the base of the PR and between aed8ae7 and f71cbe0.

📒 Files selected for processing (16)
  • CHANGELOG.md
  • Makefile
  • docs/commands/index.md
  • src/backend/core/admin.py
  • src/backend/core/factories.py
  • src/backend/core/forms.py
  • src/backend/core/management/commands/index.py
  • src/backend/core/models.py
  • src/backend/core/services/search_indexers.py
  • src/backend/core/tasks/search.py
  • src/backend/core/templates/runindexing.html
  • src/backend/core/tests/commands/test_index.py
  • src/backend/core/tests/conftest.py
  • src/backend/core/tests/test_models_documents.py
  • src/backend/core/tests/test_services_search_indexers.py
  • src/backend/impress/urls.py

Comment thread CHANGELOG.md Outdated
Comment thread docs/commands/index.md Outdated
Comment thread docs/commands/index.md
Comment thread docs/commands/index.md Outdated
Comment thread docs/commands/index.md Outdated
Comment thread src/backend/core/templates/runindexing.html
Comment thread src/backend/core/tests/commands/test_index.py
Comment thread src/backend/core/tests/conftest.py Outdated
Comment thread src/backend/core/tests/test_models_documents.py Outdated
Comment thread src/backend/impress/urls.py
@mascarpon3 mascarpon3 requested a review from qbey April 23, 2026 08:19
crash-save mode consist in indexing documents in ascending
updated_at order and save the last document.update_at.
This allows resuming indexing from the last successful batch
in case of a crash.

Signed-off-by: charles <charles.englebert@protonmail.com>
we need to be able to specify time bounds on the bulk index
to allow recovering from checkpoint after crash
i am documenting the index command

Signed-off-by: charles <charles.englebert@protonmail.com>
we want to run the indexing from the admin.
in `dmin/core/runindexing/`is a form to do so.

Signed-off-by: charles <charles.englebert@protonmail.com>

✨(backend) add async_mode flag

the command must be killable.
this adds a async_mode flag to preserve async
feature and allow running sync.

Signed-off-by: charles <charles.englebert@protonmail.com>
the index command has new params to add on the make command

Signed-off-by: charles <charles.englebert@protonmail.com>
@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch from f71cbe0 to c75f9b7 Compare April 24, 2026 10:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

♻️ Duplicate comments (1)
CHANGELOG.md (1)

12-13: ⚠️ Potential issue | 🟡 Minor

Remove trailing whitespace.

Lines 12 and 13 have trailing whitespace after "checkpoint recovery" and "asynchronicity", which is inconsistent with line 14. This should be cleaned up for consistency.

🧹 Proposed fix to remove trailing whitespace
 - ✨(backend) improve indexing command
-  - checkpoint recovery 
-  - asynchronicity 
+  - checkpoint recovery
+  - asynchronicity
   - admin command trigger
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 12 - 13, Remove the trailing whitespace at the end
of the lines containing the list items "checkpoint recovery" and
"asynchronicity" in CHANGELOG.md; update those two lines so they end immediately
after the text (no spaces or tabs) to match the formatting of the surrounding
lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/commands/index.md`:
- Line 58: The documentation line contains a leftover diff marker ("+-
**description:** When set, dispatches the indexing job to a Celery worker...");
locate that exact string in docs/commands/index.md and remove the leading '+' so
the line becomes a normal Markdown bullet (e.g., "- **description:** When set,
...") or simply "**description:** ..." depending on surrounding list style,
ensuring no stray diff markers remain.

In `@src/backend/core/admin.py`:
- Around line 17-50: Replace the fragile monkey-patch of admin.site.get_app_list
by creating a proper AdminSite subclass that overrides get_app_list (move
custom_get_app_list logic into that override) and register it as the default
site via AppConfig.default_site (or instantiate and use your subclass when
configuring the app), or alternatively implement a normal admin entry for the
RunIndexing command by registering a ModelAdmin/proxy model or adding a custom
admin view via admin.site.get_urls so the "Commands" app appears through
Django's standard app-list machinery; remove the lambda rebind of
admin.site.get_app_list and the unused _self parameter in custom_get_app_list
once migrated.
- Around line 26-41: The Commands entry is always visible because
custom_get_app_list unconditionally appends the commands_app with
has_module_perms=True; update custom_get_app_list to only include or set
has_module_perms for the commands_app when the current request.user has staff
privileges (e.g., request.user.is_staff) or the specific permission you want to
gate; locate the commands_app dict and the custom_get_app_list function and
either skip appending commands_app for non-staff users or set its
"has_module_perms" to False when request.user.is_staff is False so the
"RunIndexing" link is hidden from non-staff.

In `@src/backend/core/forms.py`:
- Around line 17-22: The DateTimeField widgets lower_time_bound and
upper_time_bound need explicit input_formats to accept HTML5 "datetime-local"
values (e.g. "%Y-%m-%dT%H:%M" and "%Y-%m-%dT%H:%M:%S") and you must normalize
timezone awareness before passing results to filter_updated_at; update those
fields to include input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"] and
then, depending on your project USE_TZ setting, either convert the parsed naive
datetimes to aware datetimes using
django.utils.timezone.make_aware(timezone.get_current_timezone()) before calling
filter_updated_at or adjust filter_updated_at to accept naive values by
localizing them consistently.

In `@src/backend/core/services/search_indexers.py`:
- Around line 144-166: Crash-safe recovery currently orders documents only by
updated_at which is non-deterministic for tied timestamps and can cause
duplicate pushes at checkpoint boundaries; either confirm with the Find team
that the indexing endpoint used by push() is idempotent (upsert by id) and
document that guarantee, or make the code deterministic by changing the ordering
to include a secondary key (e.g., order_by("updated_at", "id")) and update the
checkpoint logic to resume with a compound predicate (updated_at >= last_time
AND id > last_id) so documents at the boundary are not re-pushed; update
serialize_document/get_batch_accesses_by_users_and_teams usage as needed to
return the last_id for checkpointing and adjust logger.info to include both
updated_at and id when emitting the checkpoint.

In `@src/backend/core/templates/runindexing.html`:
- Line 6: The form element in runindexing.html is relying on the default POST
target; update the <form> tag to include an explicit action attribute (for
example action="" or action="{% url 'run_indexing' %}") so the template is safe
for reuse/inclusion—modify the <form method="POST"> declaration in the template
to include that action value.

In `@src/backend/core/tests/test_admin_run_indexing.py`:
- Line 5: The test contains an unused mock return and an unnecessary import:
remove the HttpResponse import and drop the assignment
mock_call_command.return_value = HttpResponse("Mocked render") in
test_admin_run_indexing.py (also remove the duplicated lines at 35-37); the view
under test (run_indexing_view) does a redirect and does not consume
call_command's return value, so simply mock call_command without setting a
return_value and keep assertions about call invocation and redirect behavior.

In `@src/backend/core/tests/test_models_documents.py`:
- Around line 1699-1796: Remove the redundant import "from core.factories import
DocumentFactory" and update the tests to use the already-imported factories
module (i.e., factories.DocumentFactory) so all factory usage is consistent;
locate references to DocumentFactory in the new test functions (e.g.,
test_models_documents_manager_time_filter_* and
test_utils_build_indexable_documents_queryset_ancestors_deleted_at) and replace
them with factories.DocumentFactory, then delete the duplicate import statement.

---

Duplicate comments:
In `@CHANGELOG.md`:
- Around line 12-13: Remove the trailing whitespace at the end of the lines
containing the list items "checkpoint recovery" and "asynchronicity" in
CHANGELOG.md; update those two lines so they end immediately after the text (no
spaces or tabs) to match the formatting of the surrounding lines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2856f8d-9c81-4714-943a-4025c2b2022a

📥 Commits

Reviewing files that changed from the base of the PR and between f71cbe0 and c75f9b7.

📒 Files selected for processing (17)
  • CHANGELOG.md
  • Makefile
  • docs/commands/index.md
  • src/backend/core/admin.py
  • src/backend/core/factories.py
  • src/backend/core/forms.py
  • src/backend/core/management/commands/index.py
  • src/backend/core/models.py
  • src/backend/core/services/search_indexers.py
  • src/backend/core/tasks/search.py
  • src/backend/core/templates/runindexing.html
  • src/backend/core/tests/commands/test_index.py
  • src/backend/core/tests/conftest.py
  • src/backend/core/tests/test_admin_run_indexing.py
  • src/backend/core/tests/test_models_documents.py
  • src/backend/core/tests/test_services_search_indexers.py
  • src/backend/impress/urls.py

Comment thread docs/commands/index.md
### `--async`
- **type:** Boolean flag
- **default:** `False`
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stray + diff marker leftover in rendered documentation.

The line literally starts with +- **description:** — a leftover from a previously applied suggested diff that was pasted verbatim. Markdown will render this with a leading +.

📝 Proposed fix
-+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/commands/index.md` at line 58, The documentation line contains a
leftover diff marker ("+- **description:** When set, dispatches the indexing job
to a Celery worker..."); locate that exact string in docs/commands/index.md and
remove the leading '+' so the line becomes a normal Markdown bullet (e.g., "-
**description:** When set, ...") or simply "**description:** ..." depending on
surrounding list style, ensuring no stray diff markers remain.

Comment thread src/backend/core/admin.py
Comment on lines +17 to +50
# Customize the default admin site's get_app_list method
_original_get_app_list = admin.site.get_app_list


def custom_get_app_list(_self, request, app_label=None):
"""Add custom commands to the app list."""
app_list = _original_get_app_list(request, app_label)

# Add Commands app with Run Indexing command
commands_app = {
"name": _("Commands"),
"app_label": "commands",
"app_url": "#",
"has_module_perms": True,
"models": [
{
"name": _("Run indexing"),
"object_name": "RunIndexing",
"admin_url": "/admin/run-indexing/",
"view_only": False,
"add_url": None,
"change_url": None,
}
],
}

app_list.append(commands_app)
return app_list


# Monkey-patch the admin site
admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
admin.site, request, app_label
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Monkey-patching admin.site.get_app_list is fragile; prefer a AdminSite subclass or clean override.

Directly reassigning admin.site.get_app_list to a lambda:

  • Is not composable: if another module (now or in the future) does the same thing, only the last patch wins, and the order becomes import-order-dependent.
  • Makes the override hard to discover and to test in isolation.
  • Captures _original_get_app_list at import time; any later rebinding by another app is invisible to this wrapper.

Cleaner alternatives:

  1. Subclass AdminSite, override get_app_list, and set default_site via AppConfig.default_site.
  2. Or add a proper ModelAdmin/entry for RunIndexing (e.g. a tiny proxy model or a dedicated admin view registered via admin.site.get_urls) so the "Commands" entry comes through the normal app-list machinery.

Also, custom_get_app_list(_self, request, app_label=None) never uses _self; you can drop it and the lambda’s admin.site argument:

♻️ Suggested simplification
-def custom_get_app_list(_self, request, app_label=None):
+def custom_get_app_list(request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)
     ...
     app_list.append(commands_app)
     return app_list


-# Monkey-patch the admin site
-admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
-    admin.site, request, app_label
-)
+admin.site.get_app_list = custom_get_app_list
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 17 - 50, Replace the fragile
monkey-patch of admin.site.get_app_list by creating a proper AdminSite subclass
that overrides get_app_list (move custom_get_app_list logic into that override)
and register it as the default site via AppConfig.default_site (or instantiate
and use your subclass when configuring the app), or alternatively implement a
normal admin entry for the RunIndexing command by registering a ModelAdmin/proxy
model or adding a custom admin view via admin.site.get_urls so the "Commands"
app appears through Django's standard app-list machinery; remove the lambda
rebind of admin.site.get_app_list and the unused _self parameter in
custom_get_app_list once migrated.

Comment thread src/backend/core/admin.py
Comment on lines +26 to +41
commands_app = {
"name": _("Commands"),
"app_label": "commands",
"app_url": "#",
"has_module_perms": True,
"models": [
{
"name": _("Run indexing"),
"object_name": "RunIndexing",
"admin_url": "/admin/run-indexing/",
"view_only": False,
"add_url": None,
"change_url": None,
}
],
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

"Commands" entry is always shown, regardless of staff status.

custom_get_app_list unconditionally appends the commands_app with "has_module_perms": True. Any user reaching the admin index page will see the "Run indexing" link, even if they are not staff (the view itself is now correctly staff-gated, so clicking it will redirect to login — but the entry is still visible). Consider gating visibility on request.user.is_staff (or a more specific permission) so non-staff users don't see a link they can't use.

🔒 Suggested fix
 def custom_get_app_list(_self, request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)

+    if not request.user.is_staff:
+        return app_list
+
     # Add Commands app with Run Indexing command
     commands_app = { ... }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 26 - 41, The Commands entry is always
visible because custom_get_app_list unconditionally appends the commands_app
with has_module_perms=True; update custom_get_app_list to only include or set
has_module_perms for the commands_app when the current request.user has staff
privileges (e.g., request.user.is_staff) or the specific permission you want to
gate; locate the commands_app dict and the custom_get_app_list function and
either skip appending commands_app for non-staff users or set its
"has_module_perms" to False when request.user.is_staff is False so the
"RunIndexing" link is hidden from non-staff.

Comment thread src/backend/core/forms.py
Comment on lines +17 to +22
lower_time_bound = forms.DateTimeField(
required=False, widget=forms.TextInput(attrs={"type": "datetime-local"})
)
upper_time_bound = forms.DateTimeField(
required=False, widget=forms.TextInput(attrs={"type": "datetime-local"})
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at Django's default DATETIME_INPUT_FORMATS and project overrides
rg -nP 'DATETIME_INPUT_FORMATS|USE_TZ\s*=' --type=py -C1
# Check how the index command's datetime.fromisoformat() handles the produced string
rg -nP 'fromisoformat|isoformat\(' src/backend/core --type=py -C2

Repository: suitenumerique/docs

Length of output: 50376


🏁 Script executed:

# Search for fromisoformat usage more specifically
rg -n 'fromisoformat' src/backend --type=py -B2 -A2

Repository: suitenumerique/docs

Length of output: 1019


🏁 Script executed:

# Look for admin.py and how it uses the form
fd -type f -name 'admin.py' src/backend

Repository: suitenumerique/docs

Length of output: 234


🏁 Script executed:

# Search for DATETIME_INPUT_FORMATS customization
rg -n 'DATETIME_INPUT_FORMATS' src/backend --type=py -A3

Repository: suitenumerique/docs

Length of output: 45


🏁 Script executed:

# Find the index command that uses these bounds
rg -n 'lower_time_bound|upper_time_bound' src/backend --type=py -B2 -A2

Repository: suitenumerique/docs

Length of output: 22766


🏁 Script executed:

# Find Django version requirement
find . -name 'requirements*.txt' -o -name 'pyproject.toml' -o -name 'setup.py' | xargs grep -l 'django' 2>/dev/null | head -5

Repository: suitenumerique/docs

Length of output: 92


🏁 Script executed:

# Check for tests of the admin form itself with datetime submission
rg -n 'RunIndexingForm|test.*admin.*indexing' src/backend/core/tests --type=py -l

Repository: suitenumerique/docs

Length of output: 45


🏁 Script executed:

# Look at actual admin.py to see the context around form usage
cat -n src/backend/core/admin.py | sed -n '270,295p'

Repository: suitenumerique/docs

Length of output: 1256


🏁 Script executed:

# Check Django version in pyproject.toml
cat src/backend/pyproject.toml | grep -A 20 -B 5 'django'

Repository: suitenumerique/docs

Length of output: 3191


🏁 Script executed:

# Look for any tests of the RunIndexingForm with actual datetime-local input
rg -n 'RunIndexingForm|run_indexing_view' src/backend/core/tests --type=py -B 3 -A 5

Repository: suitenumerique/docs

Length of output: 1649


🏁 Script executed:

# Check if there's a settings file that customizes DATETIME_INPUT_FORMATS
find src/backend -name 'settings.py' -o -name 'settings_*' | xargs cat 2>/dev/null | rg -A 5 'DATETIME_INPUT_FORMATS' || echo "No DATETIME_INPUT_FORMATS found"

Repository: suitenumerique/docs

Length of output: 95


🏁 Script executed:

# Get the full test file to see how the form is tested
cat -n src/backend/core/tests/test_admin_run_indexing.py

Repository: suitenumerique/docs

Length of output: 1928


🏁 Script executed:

# Check Django's default DATETIME_INPUT_FORMATS to understand what it accepts
python3 << 'EOF'
from django.conf import settings
import django
# Try to get Django's default input formats
print("Python code to understand Django's DATETIME_INPUT_FORMATS")
# In Django 3.2+, the defaults include ISO 8601 formats
# We need to know which version of Django is being used
# Version constraint is: django<6.0.0 (from pyproject.toml)
# So it could be Django 3.2, 4.0, 4.1, 4.2, or 5.x

# Let's check what formats modern Django supports
try:
    import django
    print(f"Django version in environment: {django.get_version() if hasattr(django, 'get_version') else 'unknown'}")
except:
    pass

# The key issue: does DateTimeField accept "2024-02-01T12:00" format?
# Django's default DATETIME_INPUT_FORMATS (since 3.2) are:
# '%Y-%m-%d %H:%M:%S',
# '%Y-%m-%d %H:%M:%S.%f',
# '%Y-%m-%d %H:%M',
# Note: NO T-separated format by default

print("Django defaults do NOT include T-separated ISO format by default")
print("HTML5 datetime-local submits: 2024-02-01T12:00")
print("Django defaults expect: 2024-02-01 12:00")
print("\nThis is a parsing mismatch!")
EOF

Repository: suitenumerique/docs

Length of output: 182


🏁 Script executed:

# Look at the complete forms.py file to see all form definitions
cat -n src/backend/core/forms.py

Repository: suitenumerique/docs

Length of output: 1684


🏁 Script executed:

# Search for any test that actually submits datetime values to the form
rg -n 'lower_time_bound.*=' src/backend/core/tests --type=py -B 2 -A 2 | head -50

Repository: suitenumerique/docs

Length of output: 4398


🏁 Script executed:

# Check if there's a custom DateTimeField or widget in the codebase
rg -n 'class.*DateTimeField|class.*DateTimeInput' src/backend --type=py

Repository: suitenumerique/docs

Length of output: 45


Add explicit input_formats to DateTimeField to accept HTML5 datetime-local format.

The HTML5 datetime-local widget submits values as YYYY-MM-DDTHH:MM[SS] (T-separated, no timezone), but Django's default DATETIME_INPUT_FORMATS only include space-separated formats (%Y-%m-%d %H:%M[:%S]). Form validation will fail with "Enter a valid date/time" when users submit via the HTML5 widget.

Also ensure consistent timezone handling: form fields produce naive datetimes (no tzinfo); these are then passed to filter_updated_at() which compares against tz-aware updated_at values. Either use USE_TZ=False or explicitly handle timezone awareness in the form and downstream queries.

Suggested fix
lower_time_bound = forms.DateTimeField(
    required=False,
    widget=forms.TextInput(attrs={"type": "datetime-local"}),
    input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
upper_time_bound = forms.DateTimeField(
    required=False,
    widget=forms.TextInput(attrs={"type": "datetime-local"}),
    input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/forms.py` around lines 17 - 22, The DateTimeField widgets
lower_time_bound and upper_time_bound need explicit input_formats to accept
HTML5 "datetime-local" values (e.g. "%Y-%m-%dT%H:%M" and "%Y-%m-%dT%H:%M:%S")
and you must normalize timezone awareness before passing results to
filter_updated_at; update those fields to include
input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"] and then, depending on
your project USE_TZ setting, either convert the parsed naive datetimes to aware
datetimes using
django.utils.timezone.make_aware(timezone.get_current_timezone()) before calling
filter_updated_at or adjust filter_updated_at to accept naive values by
localizing them consistently.

Comment on lines +144 to +166
if crash_safe_mode:
queryset = queryset.order_by("updated_at")

for documents_batch in itertools.batched(queryset.iterator(), batch_size):
doc_paths = [doc.path for doc in documents_batch]
last_id = documents_batch[-1].id
accesses_by_document_path = get_batch_accesses_by_users_and_teams(doc_paths)

serialized_batch = [
self.serialize_document(document, accesses_by_document_path)
for document in documents_batch
if document.content or document.title
]

if serialized_batch:
self.push(serialized_batch)
count += len(serialized_batch)
if not serialized_batch:
continue

self.push(serialized_batch)
count += len(serialized_batch)

if crash_safe_mode:
logger.info(
"Indexing checkpoint: %s.",
serialized_batch[-1]["updated_at"],
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the Find push endpoint contract (upsert vs create-only) via docs or comments
rg -nH -C3 -e 'indexing|upsert|POST.*documents' -g '!**/node_modules/**' --iglob '*.md' --iglob '*.py'

Repository: suitenumerique/docs

Length of output: 24864


🏁 Script executed:

# Find the FindDocumentIndexer.push implementation
rg -nH --type py "def push" src/backend/core/services/search_indexers.py -A 20

Repository: suitenumerique/docs

Length of output: 2767


🏁 Script executed:

# Check the FindDocumentIndexer class structure to understand the push endpoint
rg -nH --type py "class FindDocumentIndexer" src/backend/core/services/search_indexers.py -A 50

Repository: suitenumerique/docs

Length of output: 4082


🏁 Script executed:

# Look for the Find API request handler to see if it's upsert or create-only
rg -nH --type py "requests\.(post|put)" src/backend/core/services/search_indexers.py -B 5 -A 5

Repository: suitenumerique/docs

Length of output: 1769


🏁 Script executed:

# Check test_index_with_crash_recovery to see what idempotency assumptions it validates
sed -n '100,200p' src/backend/core/tests/commands/test_index.py | cat -n

Repository: suitenumerique/docs

Length of output: 5116


Crash-safe recovery relies on undocumented push idempotency.

order_by("updated_at") is non-deterministic when multiple documents share the same timestamp. With inclusive lower_time_bound on resume, documents at the checkpoint boundary will be re-pushed on restart. The test confirms this behavior (documents[3] is re-indexed after recovery), but push() idempotency is neither documented in code nor verified in practice.

Before deploying this feature, confirm with the Find team that the indexing endpoint is idempotent (upsert by id, not create-only). If uncertain, add a secondary ordering key (order_by("updated_at", "id")) and advance the checkpoint by updated_at >= X AND id > last_id to eliminate duplicate pushes at boundaries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/services/search_indexers.py` around lines 144 - 166,
Crash-safe recovery currently orders documents only by updated_at which is
non-deterministic for tied timestamps and can cause duplicate pushes at
checkpoint boundaries; either confirm with the Find team that the indexing
endpoint used by push() is idempotent (upsert by id) and document that
guarantee, or make the code deterministic by changing the ordering to include a
secondary key (e.g., order_by("updated_at", "id")) and update the checkpoint
logic to resume with a compound predicate (updated_at >= last_time AND id >
last_id) so documents at the boundary are not re-pushed; update
serialize_document/get_batch_accesses_by_users_and_teams usage as needed to
return the last_id for checkpointing and adjust logger.info to include both
updated_at and id when emitting the checkpoint.


{% block content %}

<form method="POST" >
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Nit: be explicit about the form target.

Relying on the default (posting to the current URL) works, but an explicit action="" (or {% url 'run_indexing' %}) makes the template safer against future re-use/inclusion.

✏️ Suggested tweak
-<form method="POST" >
+<form method="POST" action="{% url 'run_indexing' %}">

Also the two HTMLHint warnings (doctype-first, input-requires-label) are false positives here — the doctype comes from admin/base_site.html and labels are rendered by {{ form.as_p }}.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<form method="POST" >
<form method="POST" action="{% url 'run_indexing' %}">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/templates/runindexing.html` at line 6, The form element in
runindexing.html is relying on the default POST target; update the <form> tag to
include an explicit action attribute (for example action="" or action="{% url
'run_indexing' %}") so the template is safe for reuse/inclusion—modify the <form
method="POST"> declaration in the template to include that action value.


from unittest.mock import patch

from django.http import HttpResponse
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Minor: unused mock return value and unused HttpResponse import.

mock_call_command.return_value = HttpResponse("Mocked render") has no effect — run_indexing_view never uses the return value of call_command; it redirects on success. The line and the HttpResponse import can be dropped for clarity.

🧹 Suggested cleanup
 from unittest.mock import patch

-from django.http import HttpResponse
-
 import pytest
@@
     with patch("core.admin.call_command") as mock_call_command:
-        mock_call_command.return_value = HttpResponse("Mocked render")
         response = client.post("/admin/run-indexing/", {"batch_size": batch_size})

Also applies to: 35-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/tests/test_admin_run_indexing.py` at line 5, The test
contains an unused mock return and an unnecessary import: remove the
HttpResponse import and drop the assignment mock_call_command.return_value =
HttpResponse("Mocked render") in test_admin_run_indexing.py (also remove the
duplicated lines at 35-37); the view under test (run_indexing_view) does a
redirect and does not consume call_command's return value, so simply mock
call_command without setting a return_value and keep assertions about call
invocation and redirect behavior.

Comment on lines +1699 to +1796
def test_models_documents_manager_time_filter_no_filters():
"""Test time_filter with no filters returns all documents."""
factories.DocumentFactory.create_batch(3)

queryset = models.Document.objects.filter_updated_at()

assert queryset.count() == 3


def test_models_documents_manager_time_filter_oldest_updated_at():
"""
Test filtering by oldest_updated_at includes documents updated after or at
lower_time_bound.
"""
lower_time_bound = datetime(2024, 2, 1, tzinfo=base_timezone.utc)

DocumentFactory(updated_at=lower_time_bound - timedelta(days=30))
document_at_boundary = DocumentFactory(updated_at=lower_time_bound)
document_recent = DocumentFactory(updated_at=lower_time_bound + timedelta(days=15))

queryset = models.Document.objects.filter_updated_at(
lower_time_bound=lower_time_bound
)

assert queryset.count() == 2
assert sorted(queryset.values_list("pk", flat=True)) == sorted(
[document_at_boundary.pk, document_recent.pk]
)


def test_models_documents_manager_time_filter_newest_updated_at():
"""Test filtering by newest_updated_at includes documents updated before timestamp."""
upper_time_bound = datetime(2024, 2, 1, tzinfo=base_timezone.utc)

document_old = DocumentFactory(updated_at=upper_time_bound - timedelta(days=30))
document_at_boundary = DocumentFactory(updated_at=upper_time_bound)
document_too_recent = DocumentFactory(
updated_at=upper_time_bound + timedelta(days=15)
)

queryset = models.Document.objects.filter_updated_at(
upper_time_bound=upper_time_bound
)

assert queryset.count() == 2
assert document_old in queryset
assert document_at_boundary in queryset
assert document_too_recent not in queryset


def test_models_documents_manager_time_filter_both_bounds():
"""Test filtering with both oldest and newest bounds."""
lower_time_bound = datetime(2024, 2, 1, tzinfo=base_timezone.utc)
upper_time_bound = lower_time_bound + timedelta(days=30)

document_too_early = DocumentFactory(
updated_at=lower_time_bound - timedelta(days=10)
)
document_in_window = DocumentFactory(
updated_at=lower_time_bound + timedelta(days=5)
)
other_document_in_window = DocumentFactory(
updated_at=lower_time_bound + timedelta(days=15)
)
document_too_late = DocumentFactory(
updated_at=upper_time_bound + timedelta(days=10)
)

queryset = models.Document.objects.filter_updated_at(
lower_time_bound=lower_time_bound, upper_time_bound=upper_time_bound
)

assert queryset.count() == 2
assert document_too_early not in queryset
assert document_in_window in queryset
assert other_document_in_window in queryset
assert document_too_late not in queryset


def test_utils_build_indexable_documents_queryset_ancestors_deleted_at():
"""Test filtering includes documents with ancestors_deleted_at in range."""
lower_time_bound = datetime(2024, 2, 1, tzinfo=base_timezone.utc)

document_ancestor_deleted_in_range = DocumentFactory(
updated_at=lower_time_bound - timedelta(days=30),
ancestors_deleted_at=lower_time_bound + timedelta(days=15),
)
document_updated_before_range = DocumentFactory(
updated_at=lower_time_bound - timedelta(days=30)
)

queryset = models.Document.objects.filter_updated_at(
lower_time_bound=lower_time_bound
)

assert queryset.count() == 1
assert document_ancestor_deleted_in_range in queryset
assert document_updated_before_range not in queryset
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

LGTM — comprehensive coverage of filter_updated_at semantics.

Good tests covering: no-filter passthrough, lower/upper/both bounds with boundary inclusion (gte/lte), and the ancestors_deleted_at OR-branch. Timezone-aware datetimes are used consistently, addressing earlier feedback.

One tiny nit (optional): since factories is already imported at line 23, the duplicate from core.factories import DocumentFactory at line 24 can be dropped and the new tests can use factories.DocumentFactory like the rest of the file for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/tests/test_models_documents.py` around lines 1699 - 1796,
Remove the redundant import "from core.factories import DocumentFactory" and
update the tests to use the already-imported factories module (i.e.,
factories.DocumentFactory) so all factory usage is consistent; locate references
to DocumentFactory in the new test functions (e.g.,
test_models_documents_manager_time_filter_* and
test_utils_build_indexable_documents_queryset_ancestors_deleted_at) and replace
them with factories.DocumentFactory, then delete the duplicate import statement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants