-
Notifications
You must be signed in to change notification settings - Fork 582
Improve indexing command #2133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve indexing command #2133
Changes from all commits
a223e49
7181844
3427c68
71ff920
9d7f876
c75f9b7
2213d7a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,63 @@ | ||||||
| # Index Command | ||||||
|
|
||||||
| The `index` management command is used to index documents to the remote search indexer. | ||||||
|
|
||||||
| ## Usage | ||||||
|
|
||||||
| ### Make Command | ||||||
|
|
||||||
| ```bash | ||||||
| # Basic usage with defaults | ||||||
| make index | ||||||
|
|
||||||
| # With custom parameters | ||||||
| make index args="--batch-size 100 --lower-time-bound 2024-01-01T00:00:00 --upper-time-bound 2026-01-01T00:00:00" | ||||||
|
|
||||||
| ``` | ||||||
|
|
||||||
| ### Command line | ||||||
|
|
||||||
| ```bash | ||||||
| python manage.py index \ | ||||||
| --lower-time-bound "2024-01-01T00:00:00" \ | ||||||
| --upper-time-bound "2024-01-31T23:59:59" \ | ||||||
| --batch-size 200 \ | ||||||
| --async | ||||||
| ``` | ||||||
|
|
||||||
| ### Django Admin | ||||||
|
|
||||||
| The command is available in the Django admin interface: | ||||||
|
|
||||||
| 1. Go to `/admin/run-indexing/`, you arrive at the "Run Indexing Command" page | ||||||
| 2. Fill in the form with the desired parameters | ||||||
| 3. Click **"Run Indexing Command"** | ||||||
|
|
||||||
| ## Parameters | ||||||
|
|
||||||
| ### `--batch-size` | ||||||
| - **type:** Integer | ||||||
| - **default:** `settings.SEARCH_INDEXER_BATCH_SIZE` | ||||||
| - **description:** Number of documents to process per batch. Higher values may improve performance but use more memory. | ||||||
|
|
||||||
| ### `--lower-time-bound` | ||||||
| - **optional**: true | ||||||
| - **type:** ISO 8601 datetime string | ||||||
| - **default:** `None` | ||||||
| - **description:** Only documents updated after this date will be indexed. | ||||||
|
|
||||||
| ### `--upper-time-bound` | ||||||
| - **optional**: true | ||||||
| - **type:** ISO 8601 datetime string | ||||||
| - **default:** `None` | ||||||
| - **description:** Only documents updated before this date will be indexed. | ||||||
|
|
||||||
| ### `--async` | ||||||
| - **type:** Boolean flag | ||||||
| - **default:** `False` | ||||||
| +- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously. | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Stray The line literally starts with 📝 Proposed fix-+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||
|
|
||||||
| ## Crash Safe Mode | ||||||
|
|
||||||
| The command saves the `updated_at` of the last document of each successful batch into the `bulk-indexer-checkpoint` cache variable. | ||||||
| If the process crashes, this value can be used as `lower-time-bound` to resume from the last successfully indexed document. | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,15 +1,54 @@ | ||
| """Admin classes and registrations for core app.""" | ||
|
|
||
| from django.contrib import admin, messages | ||
| from django.contrib.admin.views.decorators import staff_member_required | ||
| from django.contrib.auth import admin as auth_admin | ||
| from django.shortcuts import redirect | ||
| from django.core.management import call_command | ||
| from django.http import HttpRequest | ||
| from django.shortcuts import redirect, render | ||
| from django.utils.translation import gettext_lazy as _ | ||
|
|
||
| from treebeard.admin import TreeAdmin | ||
|
|
||
| from core import models | ||
| from core.forms import RunIndexingForm | ||
| from core.tasks.user_reconciliation import user_reconciliation_csv_import_job | ||
|
|
||
| # Customize the default admin site's get_app_list method | ||
| _original_get_app_list = admin.site.get_app_list | ||
|
|
||
|
|
||
| def custom_get_app_list(_self, request, app_label=None): | ||
| """Add custom commands to the app list.""" | ||
| app_list = _original_get_app_list(request, app_label) | ||
|
|
||
| # Add Commands app with Run Indexing command | ||
| commands_app = { | ||
| "name": _("Commands"), | ||
| "app_label": "commands", | ||
| "app_url": "#", | ||
| "has_module_perms": True, | ||
| "models": [ | ||
| { | ||
| "name": _("Run indexing"), | ||
| "object_name": "RunIndexing", | ||
| "admin_url": "/admin/run-indexing/", | ||
| "view_only": False, | ||
| "add_url": None, | ||
| "change_url": None, | ||
| } | ||
| ], | ||
| } | ||
|
Comment on lines
+26
to
+41
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
🔒 Suggested fix def custom_get_app_list(_self, request, app_label=None):
"""Add custom commands to the app list."""
app_list = _original_get_app_list(request, app_label)
+ if not request.user.is_staff:
+ return app_list
+
# Add Commands app with Run Indexing command
commands_app = { ... }🤖 Prompt for AI Agents |
||
|
|
||
| app_list.append(commands_app) | ||
| return app_list | ||
|
|
||
|
|
||
| # Monkey-patch the admin site | ||
| admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list( | ||
| admin.site, request, app_label | ||
| ) | ||
|
Comment on lines
+17
to
+50
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion | 🟠 Major Monkey-patching Directly reassigning
Cleaner alternatives:
Also, ♻️ Suggested simplification-def custom_get_app_list(_self, request, app_label=None):
+def custom_get_app_list(request, app_label=None):
"""Add custom commands to the app list."""
app_list = _original_get_app_list(request, app_label)
...
app_list.append(commands_app)
return app_list
-# Monkey-patch the admin site
-admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
- admin.site, request, app_label
-)
+admin.site.get_app_list = custom_get_app_list🤖 Prompt for AI Agents |
||
|
|
||
|
|
||
| @admin.register(models.User) | ||
| class UserAdmin(auth_admin.UserAdmin): | ||
|
|
@@ -227,3 +266,39 @@ class InvitationAdmin(admin.ModelAdmin): | |
| def save_model(self, request, obj, form, change): | ||
| obj.issuer = request.user | ||
| obj.save() | ||
|
|
||
|
|
||
| @staff_member_required | ||
| def run_indexing_view(request: HttpRequest): | ||
| """Custom admin view for running indexing commands.""" | ||
| if request.method == "POST": | ||
| form = RunIndexingForm(request.POST) | ||
| if form.is_valid(): | ||
| lower_time_bound = form.cleaned_data.get("lower_time_bound") | ||
| upper_time_bound = form.cleaned_data.get("upper_time_bound") | ||
| call_command( | ||
| "index", | ||
| batch_size=form.cleaned_data["batch_size"], | ||
| lower_time_bound=lower_time_bound.isoformat() | ||
| if lower_time_bound | ||
| else None, | ||
| upper_time_bound=upper_time_bound.isoformat() | ||
| if upper_time_bound | ||
| else None, | ||
| async_mode=True, | ||
| ) | ||
|
mascarpon3 marked this conversation as resolved.
|
||
| messages.success(request, _("Indexing triggered!")) | ||
| return redirect("run_indexing") | ||
| messages.error(request, _("Please correct the errors below.")) | ||
| else: | ||
| form = RunIndexingForm() | ||
|
mascarpon3 marked this conversation as resolved.
mascarpon3 marked this conversation as resolved.
|
||
|
|
||
| return render( | ||
| request=request, | ||
| template_name="runindexing.html", | ||
| context={ | ||
| **admin.site.each_context(request), | ||
| "title": "Run Indexing Command", | ||
| "form": form, | ||
| }, | ||
| ) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| """Forms for the core app.""" | ||
|
|
||
| from django import forms | ||
| from django.conf import settings | ||
| from django.utils.translation import gettext_lazy as _ | ||
|
|
||
|
|
||
| class RunIndexingForm(forms.Form): | ||
| """ | ||
| Form for running the indexing process. | ||
| """ | ||
|
|
||
| batch_size = forms.IntegerField( | ||
| min_value=1, | ||
| initial=settings.SEARCH_INDEXER_BATCH_SIZE, | ||
| ) | ||
| lower_time_bound = forms.DateTimeField( | ||
| required=False, widget=forms.TextInput(attrs={"type": "datetime-local"}) | ||
| ) | ||
| upper_time_bound = forms.DateTimeField( | ||
| required=False, widget=forms.TextInput(attrs={"type": "datetime-local"}) | ||
| ) | ||
|
Comment on lines
+17
to
+22
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Look at Django's default DATETIME_INPUT_FORMATS and project overrides
rg -nP 'DATETIME_INPUT_FORMATS|USE_TZ\s*=' --type=py -C1
# Check how the index command's datetime.fromisoformat() handles the produced string
rg -nP 'fromisoformat|isoformat\(' src/backend/core --type=py -C2Repository: suitenumerique/docs Length of output: 50376 🏁 Script executed: # Search for fromisoformat usage more specifically
rg -n 'fromisoformat' src/backend --type=py -B2 -A2Repository: suitenumerique/docs Length of output: 1019 🏁 Script executed: # Look for admin.py and how it uses the form
fd -type f -name 'admin.py' src/backendRepository: suitenumerique/docs Length of output: 234 🏁 Script executed: # Search for DATETIME_INPUT_FORMATS customization
rg -n 'DATETIME_INPUT_FORMATS' src/backend --type=py -A3Repository: suitenumerique/docs Length of output: 45 🏁 Script executed: # Find the index command that uses these bounds
rg -n 'lower_time_bound|upper_time_bound' src/backend --type=py -B2 -A2Repository: suitenumerique/docs Length of output: 22766 🏁 Script executed: # Find Django version requirement
find . -name 'requirements*.txt' -o -name 'pyproject.toml' -o -name 'setup.py' | xargs grep -l 'django' 2>/dev/null | head -5Repository: suitenumerique/docs Length of output: 92 🏁 Script executed: # Check for tests of the admin form itself with datetime submission
rg -n 'RunIndexingForm|test.*admin.*indexing' src/backend/core/tests --type=py -lRepository: suitenumerique/docs Length of output: 45 🏁 Script executed: # Look at actual admin.py to see the context around form usage
cat -n src/backend/core/admin.py | sed -n '270,295p'Repository: suitenumerique/docs Length of output: 1256 🏁 Script executed: # Check Django version in pyproject.toml
cat src/backend/pyproject.toml | grep -A 20 -B 5 'django'Repository: suitenumerique/docs Length of output: 3191 🏁 Script executed: # Look for any tests of the RunIndexingForm with actual datetime-local input
rg -n 'RunIndexingForm|run_indexing_view' src/backend/core/tests --type=py -B 3 -A 5Repository: suitenumerique/docs Length of output: 1649 🏁 Script executed: # Check if there's a settings file that customizes DATETIME_INPUT_FORMATS
find src/backend -name 'settings.py' -o -name 'settings_*' | xargs cat 2>/dev/null | rg -A 5 'DATETIME_INPUT_FORMATS' || echo "No DATETIME_INPUT_FORMATS found"Repository: suitenumerique/docs Length of output: 95 🏁 Script executed: # Get the full test file to see how the form is tested
cat -n src/backend/core/tests/test_admin_run_indexing.pyRepository: suitenumerique/docs Length of output: 1928 🏁 Script executed: # Check Django's default DATETIME_INPUT_FORMATS to understand what it accepts
python3 << 'EOF'
from django.conf import settings
import django
# Try to get Django's default input formats
print("Python code to understand Django's DATETIME_INPUT_FORMATS")
# In Django 3.2+, the defaults include ISO 8601 formats
# We need to know which version of Django is being used
# Version constraint is: django<6.0.0 (from pyproject.toml)
# So it could be Django 3.2, 4.0, 4.1, 4.2, or 5.x
# Let's check what formats modern Django supports
try:
import django
print(f"Django version in environment: {django.get_version() if hasattr(django, 'get_version') else 'unknown'}")
except:
pass
# The key issue: does DateTimeField accept "2024-02-01T12:00" format?
# Django's default DATETIME_INPUT_FORMATS (since 3.2) are:
# '%Y-%m-%d %H:%M:%S',
# '%Y-%m-%d %H:%M:%S.%f',
# '%Y-%m-%d %H:%M',
# Note: NO T-separated format by default
print("Django defaults do NOT include T-separated ISO format by default")
print("HTML5 datetime-local submits: 2024-02-01T12:00")
print("Django defaults expect: 2024-02-01 12:00")
print("\nThis is a parsing mismatch!")
EOFRepository: suitenumerique/docs Length of output: 182 🏁 Script executed: # Look at the complete forms.py file to see all form definitions
cat -n src/backend/core/forms.pyRepository: suitenumerique/docs Length of output: 1684 🏁 Script executed: # Search for any test that actually submits datetime values to the form
rg -n 'lower_time_bound.*=' src/backend/core/tests --type=py -B 2 -A 2 | head -50Repository: suitenumerique/docs Length of output: 4398 🏁 Script executed: # Check if there's a custom DateTimeField or widget in the codebase
rg -n 'class.*DateTimeField|class.*DateTimeInput' src/backend --type=pyRepository: suitenumerique/docs Length of output: 45 Add explicit The HTML5 Also ensure consistent timezone handling: form fields produce naive datetimes (no tzinfo); these are then passed to Suggested fixlower_time_bound = forms.DateTimeField(
required=False,
widget=forms.TextInput(attrs={"type": "datetime-local"}),
input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
upper_time_bound = forms.DateTimeField(
required=False,
widget=forms.TextInput(attrs={"type": "datetime-local"}),
input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)🤖 Prompt for AI Agents |
||
|
|
||
| def clean(self): | ||
| """Override clean to validate time bounds.""" | ||
| cleaned_data = super().clean() | ||
| self.check_time_bounds() | ||
| return cleaned_data | ||
|
|
||
| def check_time_bounds(self): | ||
| """Validate that lower_time_bound is before upper_time_bound.""" | ||
| lower_time_bound = self.cleaned_data.get("lower_time_bound") | ||
| upper_time_bound = self.cleaned_data.get("upper_time_bound") | ||
| if ( | ||
| lower_time_bound | ||
| and upper_time_bound | ||
| and lower_time_bound > upper_time_bound | ||
| ): | ||
| self.add_error( | ||
| "upper_time_bound", | ||
| _("Upper time bound must be after lower time bound."), | ||
| ) | ||
|
mascarpon3 marked this conversation as resolved.
|
||
Uh oh!
There was an error while loading. Please reload this page.