Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ and this project adheres to

### Changed

- ✨(backend) improve indexing command
- checkpoint recovery
- asynchronicity
- admin command trigger
- 💄(frontend) improve comments highlights #1961

## [v4.8.3] - 2026-03-23
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ demo: ## flush db then create a demo for load testing purpose
.PHONY: demo

index: ## index all documents to remote search
@$(MANAGE) index
@$(MANAGE) index $(args)
.PHONY: index

# Nota bene: Black should come after isort just in case they don't agree...
Expand Down
63 changes: 63 additions & 0 deletions docs/commands/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Index Command

The `index` management command is used to index documents to the remote search indexer.

## Usage

### Make Command

```bash
# Basic usage with defaults
make index

# With custom parameters
make index args="--batch-size 100 --lower-time-bound 2024-01-01T00:00:00 --upper-time-bound 2026-01-01T00:00:00"

```

### Command line

```bash
python manage.py index \
--lower-time-bound "2024-01-01T00:00:00" \
--upper-time-bound "2024-01-31T23:59:59" \
--batch-size 200 \
--async
```
Comment thread
mascarpon3 marked this conversation as resolved.

### Django Admin

The command is available in the Django admin interface:

1. Go to `/admin/run-indexing/`, you arrive at the "Run Indexing Command" page
2. Fill in the form with the desired parameters
3. Click **"Run Indexing Command"**

## Parameters

### `--batch-size`
- **type:** Integer
- **default:** `settings.SEARCH_INDEXER_BATCH_SIZE`
- **description:** Number of documents to process per batch. Higher values may improve performance but use more memory.

### `--lower-time-bound`
- **optional**: true
- **type:** ISO 8601 datetime string
- **default:** `None`
- **description:** Only documents updated after this date will be indexed.

### `--upper-time-bound`
- **optional**: true
- **type:** ISO 8601 datetime string
- **default:** `None`
- **description:** Only documents updated before this date will be indexed.

### `--async`
- **type:** Boolean flag
- **default:** `False`
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stray + diff marker leftover in rendered documentation.

The line literally starts with +- **description:** — a leftover from a previously applied suggested diff that was pasted verbatim. Markdown will render this with a leading +.

📝 Proposed fix
-+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/commands/index.md` at line 58, The documentation line contains a
leftover diff marker ("+- **description:** When set, dispatches the indexing job
to a Celery worker..."); locate that exact string in docs/commands/index.md and
remove the leading '+' so the line becomes a normal Markdown bullet (e.g., "-
**description:** When set, ...") or simply "**description:** ..." depending on
surrounding list style, ensuring no stray diff markers remain.


## Crash Safe Mode

The command saves the `updated_at` of the last document of each successful batch into the `bulk-indexer-checkpoint` cache variable.
If the process crashes, this value can be used as `lower-time-bound` to resume from the last successfully indexed document.
77 changes: 76 additions & 1 deletion src/backend/core/admin.py
Original file line number Diff line number Diff line change
@@ -1,15 +1,54 @@
"""Admin classes and registrations for core app."""

from django.contrib import admin, messages
from django.contrib.admin.views.decorators import staff_member_required
from django.contrib.auth import admin as auth_admin
from django.shortcuts import redirect
from django.core.management import call_command
from django.http import HttpRequest
from django.shortcuts import redirect, render
from django.utils.translation import gettext_lazy as _

from treebeard.admin import TreeAdmin

from core import models
from core.forms import RunIndexingForm
from core.tasks.user_reconciliation import user_reconciliation_csv_import_job

# Customize the default admin site's get_app_list method
_original_get_app_list = admin.site.get_app_list


def custom_get_app_list(_self, request, app_label=None):
"""Add custom commands to the app list."""
app_list = _original_get_app_list(request, app_label)

# Add Commands app with Run Indexing command
commands_app = {
"name": _("Commands"),
"app_label": "commands",
"app_url": "#",
"has_module_perms": True,
"models": [
{
"name": _("Run indexing"),
"object_name": "RunIndexing",
"admin_url": "/admin/run-indexing/",
"view_only": False,
"add_url": None,
"change_url": None,
}
],
}
Comment on lines +26 to +41
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

"Commands" entry is always shown, regardless of staff status.

custom_get_app_list unconditionally appends the commands_app with "has_module_perms": True. Any user reaching the admin index page will see the "Run indexing" link, even if they are not staff (the view itself is now correctly staff-gated, so clicking it will redirect to login — but the entry is still visible). Consider gating visibility on request.user.is_staff (or a more specific permission) so non-staff users don't see a link they can't use.

🔒 Suggested fix
 def custom_get_app_list(_self, request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)

+    if not request.user.is_staff:
+        return app_list
+
     # Add Commands app with Run Indexing command
     commands_app = { ... }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 26 - 41, The Commands entry is always
visible because custom_get_app_list unconditionally appends the commands_app
with has_module_perms=True; update custom_get_app_list to only include or set
has_module_perms for the commands_app when the current request.user has staff
privileges (e.g., request.user.is_staff) or the specific permission you want to
gate; locate the commands_app dict and the custom_get_app_list function and
either skip appending commands_app for non-staff users or set its
"has_module_perms" to False when request.user.is_staff is False so the
"RunIndexing" link is hidden from non-staff.


app_list.append(commands_app)
return app_list


# Monkey-patch the admin site
admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
admin.site, request, app_label
)
Comment on lines +17 to +50
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Monkey-patching admin.site.get_app_list is fragile; prefer a AdminSite subclass or clean override.

Directly reassigning admin.site.get_app_list to a lambda:

  • Is not composable: if another module (now or in the future) does the same thing, only the last patch wins, and the order becomes import-order-dependent.
  • Makes the override hard to discover and to test in isolation.
  • Captures _original_get_app_list at import time; any later rebinding by another app is invisible to this wrapper.

Cleaner alternatives:

  1. Subclass AdminSite, override get_app_list, and set default_site via AppConfig.default_site.
  2. Or add a proper ModelAdmin/entry for RunIndexing (e.g. a tiny proxy model or a dedicated admin view registered via admin.site.get_urls) so the "Commands" entry comes through the normal app-list machinery.

Also, custom_get_app_list(_self, request, app_label=None) never uses _self; you can drop it and the lambda’s admin.site argument:

♻️ Suggested simplification
-def custom_get_app_list(_self, request, app_label=None):
+def custom_get_app_list(request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)
     ...
     app_list.append(commands_app)
     return app_list


-# Monkey-patch the admin site
-admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
-    admin.site, request, app_label
-)
+admin.site.get_app_list = custom_get_app_list
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 17 - 50, Replace the fragile
monkey-patch of admin.site.get_app_list by creating a proper AdminSite subclass
that overrides get_app_list (move custom_get_app_list logic into that override)
and register it as the default site via AppConfig.default_site (or instantiate
and use your subclass when configuring the app), or alternatively implement a
normal admin entry for the RunIndexing command by registering a ModelAdmin/proxy
model or adding a custom admin view via admin.site.get_urls so the "Commands"
app appears through Django's standard app-list machinery; remove the lambda
rebind of admin.site.get_app_list and the unused _self parameter in
custom_get_app_list once migrated.



@admin.register(models.User)
class UserAdmin(auth_admin.UserAdmin):
Expand Down Expand Up @@ -227,3 +266,39 @@ class InvitationAdmin(admin.ModelAdmin):
def save_model(self, request, obj, form, change):
obj.issuer = request.user
obj.save()


@staff_member_required
def run_indexing_view(request: HttpRequest):
"""Custom admin view for running indexing commands."""
if request.method == "POST":
form = RunIndexingForm(request.POST)
if form.is_valid():
lower_time_bound = form.cleaned_data.get("lower_time_bound")
upper_time_bound = form.cleaned_data.get("upper_time_bound")
call_command(
"index",
batch_size=form.cleaned_data["batch_size"],
lower_time_bound=lower_time_bound.isoformat()
if lower_time_bound
else None,
upper_time_bound=upper_time_bound.isoformat()
if upper_time_bound
else None,
async_mode=True,
)
Comment thread
mascarpon3 marked this conversation as resolved.
messages.success(request, _("Indexing triggered!"))
return redirect("run_indexing")
messages.error(request, _("Please correct the errors below."))
else:
form = RunIndexingForm()
Comment thread
mascarpon3 marked this conversation as resolved.
Comment thread
mascarpon3 marked this conversation as resolved.

return render(
request=request,
template_name="runindexing.html",
context={
**admin.site.each_context(request),
"title": "Run Indexing Command",
"form": form,
},
)
15 changes: 15 additions & 0 deletions src/backend/core/factories.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from django.contrib.auth.hashers import make_password

import factory.fuzzy
from factory import post_generation
from faker import Faker

from core import models
Expand Down Expand Up @@ -159,6 +160,20 @@ def masked_by(self, create, extracted, **kwargs):
document=self, user=item, defaults={"is_masked": True}
)

@post_generation
def updated_at(self, create, extracted, **kwargs):
"""
the BaseModel.updated_at has auto_now=True.
This prevents setting a specific updated_at value with the factory.

This post_generation method bypasses this behavior.
"""
if not create or not extracted:
return

self.__class__.objects.filter(pk=self.pk).update(updated_at=extracted)
self.refresh_from_db()


class UserDocumentAccessFactory(factory.django.DjangoModelFactory):
"""Create fake document user accesses for testing."""
Expand Down
42 changes: 42 additions & 0 deletions src/backend/core/forms.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
"""Forms for the core app."""

from django import forms
from django.conf import settings
from django.utils.translation import gettext_lazy as _


class RunIndexingForm(forms.Form):
"""
Form for running the indexing process.
"""

batch_size = forms.IntegerField(
min_value=1,
initial=settings.SEARCH_INDEXER_BATCH_SIZE,
)
lower_time_bound = forms.DateTimeField(
required=False, widget=forms.TextInput(attrs={"type": "datetime-local"})
)
upper_time_bound = forms.DateTimeField(
required=False, widget=forms.TextInput(attrs={"type": "datetime-local"})
)
Comment on lines +17 to +22
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at Django's default DATETIME_INPUT_FORMATS and project overrides
rg -nP 'DATETIME_INPUT_FORMATS|USE_TZ\s*=' --type=py -C1
# Check how the index command's datetime.fromisoformat() handles the produced string
rg -nP 'fromisoformat|isoformat\(' src/backend/core --type=py -C2

Repository: suitenumerique/docs

Length of output: 50376


🏁 Script executed:

# Search for fromisoformat usage more specifically
rg -n 'fromisoformat' src/backend --type=py -B2 -A2

Repository: suitenumerique/docs

Length of output: 1019


🏁 Script executed:

# Look for admin.py and how it uses the form
fd -type f -name 'admin.py' src/backend

Repository: suitenumerique/docs

Length of output: 234


🏁 Script executed:

# Search for DATETIME_INPUT_FORMATS customization
rg -n 'DATETIME_INPUT_FORMATS' src/backend --type=py -A3

Repository: suitenumerique/docs

Length of output: 45


🏁 Script executed:

# Find the index command that uses these bounds
rg -n 'lower_time_bound|upper_time_bound' src/backend --type=py -B2 -A2

Repository: suitenumerique/docs

Length of output: 22766


🏁 Script executed:

# Find Django version requirement
find . -name 'requirements*.txt' -o -name 'pyproject.toml' -o -name 'setup.py' | xargs grep -l 'django' 2>/dev/null | head -5

Repository: suitenumerique/docs

Length of output: 92


🏁 Script executed:

# Check for tests of the admin form itself with datetime submission
rg -n 'RunIndexingForm|test.*admin.*indexing' src/backend/core/tests --type=py -l

Repository: suitenumerique/docs

Length of output: 45


🏁 Script executed:

# Look at actual admin.py to see the context around form usage
cat -n src/backend/core/admin.py | sed -n '270,295p'

Repository: suitenumerique/docs

Length of output: 1256


🏁 Script executed:

# Check Django version in pyproject.toml
cat src/backend/pyproject.toml | grep -A 20 -B 5 'django'

Repository: suitenumerique/docs

Length of output: 3191


🏁 Script executed:

# Look for any tests of the RunIndexingForm with actual datetime-local input
rg -n 'RunIndexingForm|run_indexing_view' src/backend/core/tests --type=py -B 3 -A 5

Repository: suitenumerique/docs

Length of output: 1649


🏁 Script executed:

# Check if there's a settings file that customizes DATETIME_INPUT_FORMATS
find src/backend -name 'settings.py' -o -name 'settings_*' | xargs cat 2>/dev/null | rg -A 5 'DATETIME_INPUT_FORMATS' || echo "No DATETIME_INPUT_FORMATS found"

Repository: suitenumerique/docs

Length of output: 95


🏁 Script executed:

# Get the full test file to see how the form is tested
cat -n src/backend/core/tests/test_admin_run_indexing.py

Repository: suitenumerique/docs

Length of output: 1928


🏁 Script executed:

# Check Django's default DATETIME_INPUT_FORMATS to understand what it accepts
python3 << 'EOF'
from django.conf import settings
import django
# Try to get Django's default input formats
print("Python code to understand Django's DATETIME_INPUT_FORMATS")
# In Django 3.2+, the defaults include ISO 8601 formats
# We need to know which version of Django is being used
# Version constraint is: django<6.0.0 (from pyproject.toml)
# So it could be Django 3.2, 4.0, 4.1, 4.2, or 5.x

# Let's check what formats modern Django supports
try:
    import django
    print(f"Django version in environment: {django.get_version() if hasattr(django, 'get_version') else 'unknown'}")
except:
    pass

# The key issue: does DateTimeField accept "2024-02-01T12:00" format?
# Django's default DATETIME_INPUT_FORMATS (since 3.2) are:
# '%Y-%m-%d %H:%M:%S',
# '%Y-%m-%d %H:%M:%S.%f',
# '%Y-%m-%d %H:%M',
# Note: NO T-separated format by default

print("Django defaults do NOT include T-separated ISO format by default")
print("HTML5 datetime-local submits: 2024-02-01T12:00")
print("Django defaults expect: 2024-02-01 12:00")
print("\nThis is a parsing mismatch!")
EOF

Repository: suitenumerique/docs

Length of output: 182


🏁 Script executed:

# Look at the complete forms.py file to see all form definitions
cat -n src/backend/core/forms.py

Repository: suitenumerique/docs

Length of output: 1684


🏁 Script executed:

# Search for any test that actually submits datetime values to the form
rg -n 'lower_time_bound.*=' src/backend/core/tests --type=py -B 2 -A 2 | head -50

Repository: suitenumerique/docs

Length of output: 4398


🏁 Script executed:

# Check if there's a custom DateTimeField or widget in the codebase
rg -n 'class.*DateTimeField|class.*DateTimeInput' src/backend --type=py

Repository: suitenumerique/docs

Length of output: 45


Add explicit input_formats to DateTimeField to accept HTML5 datetime-local format.

The HTML5 datetime-local widget submits values as YYYY-MM-DDTHH:MM[SS] (T-separated, no timezone), but Django's default DATETIME_INPUT_FORMATS only include space-separated formats (%Y-%m-%d %H:%M[:%S]). Form validation will fail with "Enter a valid date/time" when users submit via the HTML5 widget.

Also ensure consistent timezone handling: form fields produce naive datetimes (no tzinfo); these are then passed to filter_updated_at() which compares against tz-aware updated_at values. Either use USE_TZ=False or explicitly handle timezone awareness in the form and downstream queries.

Suggested fix
lower_time_bound = forms.DateTimeField(
    required=False,
    widget=forms.TextInput(attrs={"type": "datetime-local"}),
    input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
upper_time_bound = forms.DateTimeField(
    required=False,
    widget=forms.TextInput(attrs={"type": "datetime-local"}),
    input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/forms.py` around lines 17 - 22, The DateTimeField widgets
lower_time_bound and upper_time_bound need explicit input_formats to accept
HTML5 "datetime-local" values (e.g. "%Y-%m-%dT%H:%M" and "%Y-%m-%dT%H:%M:%S")
and you must normalize timezone awareness before passing results to
filter_updated_at; update those fields to include
input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"] and then, depending on
your project USE_TZ setting, either convert the parsed naive datetimes to aware
datetimes using
django.utils.timezone.make_aware(timezone.get_current_timezone()) before calling
filter_updated_at or adjust filter_updated_at to accept naive values by
localizing them consistently.


def clean(self):
"""Override clean to validate time bounds."""
cleaned_data = super().clean()
self.check_time_bounds()
return cleaned_data

def check_time_bounds(self):
"""Validate that lower_time_bound is before upper_time_bound."""
lower_time_bound = self.cleaned_data.get("lower_time_bound")
upper_time_bound = self.cleaned_data.get("upper_time_bound")
if (
lower_time_bound
and upper_time_bound
and lower_time_bound > upper_time_bound
):
self.add_error(
"upper_time_bound",
_("Upper time bound must be after lower time bound."),
)
Comment thread
mascarpon3 marked this conversation as resolved.
77 changes: 62 additions & 15 deletions src/backend/core/management/commands/index.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,16 @@

import logging
import time
from datetime import datetime

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError

from core import models
from core.services.search_indexers import get_document_indexer
from core.tasks.search import batch_document_indexer_task

logger = logging.getLogger("docs.search.bootstrap_search")
logger = logging.getLogger(__name__)


class Command(BaseCommand):
Expand All @@ -24,9 +28,32 @@ def add_arguments(self, parser):
action="store",
dest="batch_size",
type=int,
default=50,
default=settings.SEARCH_INDEXER_BATCH_SIZE,
help="Indexation query batch size",
)
parser.add_argument(
"--lower-time-bound",
action="store",
dest="lower_time_bound",
type=datetime.fromisoformat,
default=None,
help="DateTime in ISO format. Only documents updated after this date will be indexed",
)
parser.add_argument(
"--upper-time-bound",
action="store",
dest="upper_time_bound",
type=datetime.fromisoformat,
default=None,
help="DateTime in ISO format. Only documents updated before this date will be indexed",
)
parser.add_argument(
"--async",
action="store_true",
dest="async_mode",
default=False,
help="Whether to execute indexing asynchronously in a Celery task (default: False)",
)

def handle(self, *args, **options):
"""Launch and log search index generation."""
Expand All @@ -35,18 +62,38 @@ def handle(self, *args, **options):
if not indexer:
raise CommandError("The indexer is not enabled or properly configured.")

logger.info("Starting to regenerate Find index...")
start = time.perf_counter()
batch_size = options["batch_size"]
if options["async_mode"]:
try:
batch_document_indexer_task.apply_async(
kwargs={
"lower_time_bound": options["lower_time_bound"],
"upper_time_bound": options["upper_time_bound"],
"batch_size": options["batch_size"],
"crash_safe_mode": True,
},
)
except Exception as err:
raise CommandError("Unable to dispatch indexing task") from err
logger.info("Document indexing task sent to worker")
else:
logger.info("Starting to regenerate Find index...")
start = time.perf_counter()

try:
count = indexer.index(batch_size=batch_size)
except Exception as err:
raise CommandError("Unable to regenerate index") from err
try:
count = indexer.index(
queryset=models.Document.objects.filter_updated_at(
lower_time_bound=options["lower_time_bound"],
upper_time_bound=options["upper_time_bound"],
),
batch_size=options["batch_size"],
crash_safe_mode=True,
)
except Exception as err:
raise CommandError("Unable to regenerate index") from err

duration = time.perf_counter() - start
logger.info(
"Search index regenerated from %d document(s) in %.2f seconds.",
count,
duration,
)
duration = time.perf_counter() - start
logger.info(
"Search index regenerated from %d document(s) in %.2f seconds.",
count,
duration,
)
Loading
Loading