
41526 Storage [Backend] Create a file service#53

Open
EBirkenfeld wants to merge 141 commits into master from 41526__create_file_service

EBirkenfeld (Collaborator) commented Oct 24, 2025

1. Description (Problem)

Pneumatic's file storage system previously relied on Google Cloud Storage (GCS) through a legacy Django storage module. All file operations (upload/download) passed through the monolithic Django backend, which created performance bottlenecks and vendor lock-in and left no flexibility for self-hosted deployments.

Issues addressed:

  • All file operations routed through monolithic Django backend
  • Hard dependency on GCS with no local storage option
  • Legacy Attachment model (guardian-based permissions) without support for public/guest token access
  • Frontend uploaded files via backend proxy instead of directly to storage
  • No rate limiting, security headers, or granular file access control

2. Context

  • Business: Transition to self-hosted model requires local storage support (SeaweedFS) alongside cloud (GCS). File service must operate independently from Django backend.
  • Technical: Architectural decision to extract file operations into a dedicated FastAPI microservice (storage/) with its own DB, authentication via shared Redis (DRF-compatible tokens), and S3-compatible storage (SeaweedFS for local, GCS S3 API for cloud).
  • Related: Affects FileAttachment model, Attachment model, workflow/task serializers, frontend upload components, nginx configuration, docker-compose.

3. Solution

  1. New storage/ microservice (FastAPI + SQLAlchemy + Alembic) — file upload/download via S3 API
  2. Dual storage backend — SeaweedFS (local) and GCS S3 (cloud), selected via STORAGE_TYPE
  3. Authentication — DRF token verification via shared Redis, support for User/Public/Guest token types
  4. Access control — owner-check + fallback to Django backend /attachments/check-permission
  5. Frontend migration — direct upload to file-service instead of Django proxy, new parseMarkdownFiles utility for Workflow Log attachments
  6. Infrastructure — nginx proxy pass, docker-compose configuration, class-based settings hierarchy
  7. Data migration — management commands for migrating existing FileAttachment → new Attachment model with scoped uniqueness

4. Implementation Details

4.1 New storage/ Microservice (108 files, ~15K lines)

| Layer | Modules | Description |
| --- | --- | --- |
| Presentation | api/files.py | POST /upload, GET /{file_id} with Range support |
| Application | use_cases/ | UploadFileUseCase, DownloadFileUseCase |
| Domain | entities/file_record.py | FileRecord entity |
| Infrastructure | adapters/storage_service.py | S3-compatible adapter (aioboto3) |
| Shared Kernel | auth/, middleware/, permissions.py, config.py | Auth, rate limit, security headers, DI container |

Key decisions:

  • Class-based settings (BaseAppSettings → Testing/Development/Production)
  • SecurityHeadersMiddleware (CSP, X-Frame-Options, HSTS)
  • RateLimitMiddleware (per-endpoint, Redis-backed)
  • Sanitization: secure_filename(), sanitize_content_type()
  • Pickle deserialization RCE protection in Redis client

4.2 Django Backend (172 files, ~19K lines changed)

  • New fields on FileAttachment: file_id, access_type (migration 0254)
  • TaskFieldService — migrated to scoped uniqueness (composite: account_id + file_id)
  • AttachmentService — updated refresh/clone attachment logic
  • Permission endpoint /attachments/check-permission — for file-service
  • Management commands: fill_file_attachment_file_id, migrate_file_attachment_to_attachment, replace_storage_links_with_file_service, sync_files_to_file_service
  • Removed: legacy GCS integration, FileSyncViewSet, comment attachment logic, STORAGE feature flag

4.3 Frontend (79 files, ~3.2K lines)

  • uploadFiles.ts — switched to direct file-service upload
  • fileServiceUpload.ts — new API client for file-service
  • parseMarkdownFiles.ts — utility for extracting files from Markdown (Workflow Log)
  • getErrorMessage.ts — improved API error handling
  • Attachment components — updated for new URL patterns
  • getConfig.ts — updated URL mapping configuration

4.4 Infrastructure (12 files)

  • docker-compose: added pneumatic_file_service, seaweedfs-* services
  • nginx: proxy_to_file_service.conf, updated location blocks
  • Railway: separate nginx + Dockerfile for Railway deployment
  • start.sh — Alembic migration integration on startup

5. What to Test

5.1 Preconditions

  • Dev environment with services running: docker-compose up
  • Two users in the same account (for access control tests)
  • Access to DevTools → Network for API verification
  • Files of different types: image (PNG/JPG), document (PDF), video (MP4)

5.2 Positive Scenarios

  1. Upload file to task:

    • Open workflow → task → attach file via RichEditor
    • Expected: file uploads, preview displays (for images)
    • Verify: file URL contains /files/ path
  2. Download file:

    • Click on attached file
    • Expected: file downloads with correct name and type
  3. Kickoff with file field:

    • Create template with file field → run workflow → upload file
    • Expected: file saved and displayed in outputs
  4. Public form:

    • Open public form with file field → upload file
    • Expected: upload works without authentication (public token)
  5. Workflow Log — Attachments tab:

    • Open workflow with attachments → "Attachments" tab
    • Expected: files displayed, available for download
  6. Image preview:

    • Upload PNG/JPG → check inline preview in task
    • Expected: image renders correctly

5.3 Negative Scenarios and Edge Cases

  1. File too large (>100MB):

    • Upload >100MB file → Expected: error message about size limit
  2. Unauthenticated access:

    • Open file URL without token → Expected: redirect to login (browser) or 401
  3. Access to another user's file:

    • User A uploads file → User B (different account) tries to open URL
    • Expected: 403 Forbidden
  4. File with non-ASCII name:

    • Upload file with Unicode name → download → verify filename
    • Expected: filename preserved (UTF-8 encoding in Content-Disposition)
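
For the non-ASCII filename case, a header of the following shape is what a tester would typically look for. This is a sketch under assumptions: the helper name `content_disposition` is hypothetical and the file service may build the header differently. It uses the `filename*=UTF-8''...` extended parameter with percent-encoding, plus an ASCII-only fallback `filename` for older clients.

```python
from urllib.parse import quote


def content_disposition(filename):
    """Build an attachment Content-Disposition value for any Unicode filename."""
    # ASCII fallback: replace anything outside printable ASCII, and the two
    # characters that would break a quoted-string (double quote, backslash).
    fallback = "".join(
        ch if 32 <= ord(ch) < 127 and ch not in '"\\' else "_"
        for ch in filename
    )
    # Extended parameter: UTF-8 bytes, percent-encoded.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("résumé.pdf"))
# attachment; filename="r_sum_.pdf"; filename*=UTF-8''r%C3%A9sum%C3%A9.pdf
```

Browsers that understand `filename*` use the UTF-8 name. The fallback also strips control characters, which keeps a crafted filename from injecting extra header lines.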

5.4 Verification Points

  • UI: files display, previews work, downloads correct
  • API: POST /files/upload returns { public_url, file_id }; GET /files/{file_id} streams file
  • Network (DevTools): requests go to /files/ endpoint, not old Django endpoint
  • Docker: pneumatic_file_service container is running and responsive

5.5 API Checks

  • Upload: POST /files/upload — multipart/form-data, response { "public_url": "...", "file_id": "..." }
  • Download: GET /files/{file_id} — headers Content-Disposition, Accept-Ranges, X-Content-Type-Options: nosniff
  • Range requests: GET /files/{file_id} with Range: bytes=0-1023 → 206 Partial Content
  • Permission check: file-service calls POST /attachments/check-permission on Django backend
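
To make the Range check concrete, here is a small sketch of how a single-range `Range` header maps to the byte span and headers of a 206 response. The function names are hypothetical and this is not the file service's code. It handles only single ranges and returns None for anything it cannot satisfy, leaving the choice between a full 200 response and a 416 to the caller.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header, file_size):
    """Return an inclusive (start, end) pair, or None if invalid/unsatisfiable."""
    match = _RANGE_RE.match(header.strip())
    if not match or file_size <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        return max(file_size - length, 0), file_size - 1
    start = int(first)
    end = int(last) if last else file_size - 1
    if start >= file_size or start > end:
        return None
    # Clamp the end to the last byte of the file.
    return start, min(end, file_size - 1)


def partial_content_headers(start, end, file_size):
    """Headers that accompany a 206 Partial Content response."""
    return {
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
```

For the request in the checklist, `parse_byte_range("bytes=0-1023", 5000)` gives `(0, 1023)`, so the response should carry `Content-Range: bytes 0-1023/5000` and `Content-Length: 1024`.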

5.6 What Was NOT Tested

  • Production environment (tested in dev only)
  • Load testing
  • Mobile devices
  • Data migration (management commands) — requires separate test plan
  • GCS storage type (only SeaweedFS/local verified)

6. Testing Affected Areas (Dependencies)

| Area | What to Check |
| --- | --- |
| RichEditor (all forms with attachments) | File upload/display/delete |
| Kickoff outputs (FileOutput) | File display in task outputs |
| Workflow Log | Attachments tab, Markdown parsing |
| Comments | Attachments in comments (old logic removed) |
| Public forms | File upload via public form |
| User avatar / Account logo | Profile photo and account logo upload |
| Clone workflow | Kickoff attachment copying on clone |

7. Refactoring

  • Removed legacy GCS module (backend/src/storage/) — replaced with new file-service
  • Removed STORAGE feature flag and related env variables
  • Removed comment attachment logic from Django (moved to Markdown format)
  • TaskFieldService — migrated from flat uniqueness to scoped (composite index)
  • Config — from flat CONFIG string checks to class-based settings hierarchy

Additional testing: all areas from section 6 — previous behavior must be preserved.

8. Release Notes

Added a new File Service microservice based on FastAPI, replacing the direct GCS integration. Supports local storage (SeaweedFS) and cloud storage (GCS S3 API). Includes authentication, rate limiting, security headers, access control, and public form support.

Add a dedicated file storage microservice to replace Google Cloud Storage

  • Introduces a new FastAPI storage microservice (storage/src/main.py) with upload/download endpoints, SeaweedFS/GCS S3 backends, JWT-based auth middleware, rate limiting, and security headers.
  • Replaces legacy FileAttachment-based attachment handling with a new Django Attachment model (backend/src/storage/models.py) using django-guardian object permissions for fine-grained access control (PUBLIC / ACCOUNT / RESTRICTED).
  • File values in task fields, comments, templates, and workflows are now represented as markdown links ([name](url)) rather than integer attachment IDs; TaskFieldService, CommentService, and related serializers are updated accordingly.
  • Adds Django management commands (run_file_migration, fill_file_attachment_file_id, migrate_file_attachment_to_attachment, sync_files_to_file_service, replace_storage_links_with_file_service) to migrate existing GCS-hosted FileAttachment records to the new service.
  • Nginx configs and both docker-compose files are updated to add file-service, file-postgres, and SeaweedFS stack services; the frontend uses a new fileServiceUrl config value and uploads via uploadFileToFileService.
  • Risk: attachments fields are removed from WorkflowEventSerializer, TaskFieldSerializer, and comment/workflow API payloads — clients that depend on these fields will break. The /workflows/attachments and /workflows/public/attachments API routes are also removed.

Changes since #53 opened

  • Relaxed Content-Security-Policy header from default-src 'none' to a multi-directive policy allowing self, inline scripts and styles, and external resources from specific CDNs [53c7500]
  • Increased client_max_body_size directive from 100m to 105m in nginx configuration files [7d3d6d5]
  • Changed the cookie path for file_service_auth cookie from /files/ to / in FileServiceAuthMiddleware.process_response middleware hook [ce42e6f]

Macroscope summarized 30c69a9.


High Risk
Large cross-cutting change touching authentication (public token cache), file access control, and removed API fields (attachments on events/fields/comments). Production rollout depends on migration commands and coordinated file-service deployment.

Overview
Replaces in-backend Google Cloud Storage with a dedicated file-service (wired via FILE_SERVICE_URL / FileServiceClient) and local SeaweedFS stack in docker-compose, plus a separate file-postgres database.

Upload paths for user/contact/integration photos, Microsoft Graph avatars, and account logos now go through the file service; sync_account_file_fields keeps Attachment records aligned when those URLs change. django-guardian and a custom GroupObjectPermission model back object-level file access; admin no longer triggers legacy bucket public/private Celery tasks.

API and model shifts: workflow event/field serializers drop attachments; comments require text only (no attachment IDs); file fields validate markdown link lists instead of integer attachment IDs. Analytics for uploads now key off storage.Attachment / file_id. Public/embed auth caches template account_id in Redis with invalidation when templates are deactivated or access is revoked.
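
As an illustration of what validating a markdown link list instead of integer IDs can look like, here is a sketch with hypothetical function names. The regular expression and error handling are assumptions, not the serializer code in this PR.

```python
import re

# [name](url): name without "]", url without ")" or whitespace.
_MD_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def parse_markdown_links(text):
    """Extract every (name, url) pair found anywhere in a markdown string."""
    return [(m.group(1), m.group(2)) for m in _MD_LINK_RE.finditer(text)]


def validate_file_field(items):
    """Require each list item to be exactly one markdown link."""
    links = []
    for item in items:
        match = _MD_LINK_RE.fullmatch(str(item).strip())
        if match is None:
            raise ValueError(f"Not a markdown link: {item!r}")
        links.append((match.group(1), match.group(2)))
    return links
```

Under this sketch a legacy payload such as `[42]` is rejected, while `["[report.pdf](https://example.com/files/abc)"]` yields one `(name, url)` pair.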

Migration tooling adds orchestrated commands (run_file_migration, fill file_id, migrate FileAttachment → Attachment, sync rows into the file DB, rewrite GCS URLs). Legacy FileAttachment remains temporarily with file_id / access_type fields.

Repo hygiene: root .pre-commit-config.yaml covers backend and storage/; google-cloud-storage removed from backend dependencies (lockfile still pulls GCS libs transitively).

Reviewed by Cursor Bugbot for commit 7d3d6d5.

…ion API

- Add FileAttachment.access_type field with account/restricted options
- Add FileAttachment.file_id field for unique file identification
- Create FileAttachmentPermission model for user-specific file access
- Implement AttachmentService.check_user_permission method
- Add AttachmentsViewSet with check-permission endpoint
- Add Redis caching for public template authentication
- Add response_forbidden method to BaseResponseMixin
- Include comprehensive test coverage for permission checking

This enables fine-grained file access control with two access types:
- account: accessible by all users in the same account
- restricted: accessible only by users with explicit permissions
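
The two access types can be read as the following decision rule. This is a sketch with in-memory stand-ins: the `User` and `FileAttachment` dataclasses and the `allowed_user_ids` set are hypothetical simplifications of the Django models and the FileAttachmentPermission table, and the real AttachmentService.check_user_permission may apply further rules.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    account_id: int


@dataclass
class FileAttachment:
    file_id: str
    account_id: int
    access_type: str  # "account" or "restricted"
    # Stand-in for explicit per-user permission rows.
    allowed_user_ids: set = field(default_factory=set)


def check_user_permission(user, attachment):
    """Return True if the user may access the attachment."""
    if attachment.access_type == "account":
        # Any user in the same account.
        return user.account_id == attachment.account_id
    if attachment.access_type == "restricted":
        # Only users with an explicit permission.
        return user.id in attachment.allowed_user_ids
    # Unknown access types are denied by default.
    return False
```

Denying unknown access types by default is a design choice of the sketch: a new type added later fails closed until it is handled explicitly.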
…integration

- Add FastAPI-based file upload and download endpoints with streaming support
- Implement Clean Architecture with domain entities, use cases, and repositories
- Add authentication middleware with JWT token validation and Redis caching
- Integrate Google Cloud Storage S3-compatible API for file storage
- Add comprehensive error handling with custom exceptions and HTTP status codes
- Implement file access permissions validation through external HTTP service
- Add database models and Alembic migrations for file metadata storage
- Include Docker containerization with docker-compose for local development
- Add comprehensive test suite with unit, integration, and e2e tests
- Configure pre-commit hooks with ruff, mypy, and pytest for code quality
…e_access_rights' into 41526__сreate_file_service

# Conflicts:
#	backend/src/processes/enums.py
#	backend/src/processes/models/__init__.py
#	backend/src/processes/models/workflows/attachment.py
#	backend/src/processes/services/attachments.py
#	backend/src/processes/tests/test_services/test_attachments.py
@EBirkenfeld EBirkenfeld self-assigned this Oct 24, 2025
@EBirkenfeld EBirkenfeld added the Backend API changes request label Oct 24, 2025
Comment thread storage/src/application/use_cases/file_upload.py
Comment thread storage/src/shared_kernel/exceptions/domain_exceptions.py
Comment thread storage/src/infra/repositories/storage_service.py Outdated
Comment thread storage/src/shared_kernel/middleware/auth_middleware.py
Comment thread storage/src/infra/repositories/storage_service.py Outdated
….toml to dedicated config files

- Move mypy configuration from pyproject.toml to mypy.ini for better separation of concerns
- Simplify ruff.toml configuration by removing extensive rule selections and using "ALL" selector
- Update ruff target version from py37 to py311 to match project Python version
- Remove redundant ruff configuration from pyproject.toml to avoid duplication
- Apply code formatting fixes across entire codebase
- Standardize import statements and code style according to new linting rules
- Update test files to comply with new formatting standards
Comment thread storage/src/shared_kernel/database/migrations/env.py
Comment thread storage/src/shared_kernel/exceptions/validation_exceptions.py Outdated
Comment thread storage/src/shared_kernel/config.py
Comment thread storage/src/shared_kernel/exceptions/base_exceptions.py Outdated
Comment thread storage/src/shared_kernel/exceptions/error_codes.py
Comment thread storage/src/shared_kernel/uow/unit_of_work.py Outdated
Comment thread backend/src/processes/services/attachments.py Outdated
Comment thread storage/src/shared_kernel/exceptions/exception_handler.py
Comment thread storage/src/shared_kernel/exceptions/validation_exceptions.py Outdated
Comment thread storage/src/infra/repositories/file_record_repository.py
…ore rule in ruff configuration

- Update docstrings across various modules to ensure consistency and clarity.
- Remove unused "D" rule from ruff.toml configuration.
- Enhance readability and maintainability of the codebase.
Comment thread storage/src/presentation/api/files.py Outdated
Comment thread storage/src/presentation/api/files.py
Comment thread storage/src/infra/http_client.py
…ling for consistency

- Adjust import paths in test files to ensure they reference the correct locations.
- Replace instances of FileNotFoundError with DomainFileNotFoundError for better clarity in exception handling.
- Streamline fixture definitions and improve code readability across various test modules.
… configuration

- Update docstrings across various test files for consistency and clarity.
- Add new linting rules in ruff.toml for improved code quality.
- Enhance readability and maintainability of the codebase by refining fixture definitions and mock implementations.
…line permission handling

- Refactor the AuthenticationMiddleware to enhance error handling and response formatting.
- Update permission classes to use direct Request type hints instead of string annotations.
- Consolidate permission checks into FastAPI dependency wrappers for better clarity and usability.
- Remove unused exception classes and error messages to clean up the codebase.
- Adjust test cases to reflect changes in authentication and permission handling.
Comment thread backend/src/processes/views/attachments.py Outdated
Comment thread storage/src/shared_kernel/middleware/auth_middleware.py Outdated
Comment thread storage/src/presentation/api/files.py
…exception tests

- Remove unused infrastructure error codes from error_codes.py to streamline the codebase.
- Update the AuthenticationMiddleware constructor to use direct FastAPI type hints for clarity.
- Add new tests for validation exceptions, including file size and storage errors, to improve coverage and ensure accurate error handling.
Comment thread storage/src/shared_kernel/exceptions/permission_exceptions.py
Comment thread storage/src/shared_kernel/exceptions/external_service_exceptions.py
Comment thread storage/src/shared_kernel/auth/dependencies.py Outdated
Comment thread storage/src/infra/http_client.py
Comment thread storage/src/shared_kernel/auth/public_token.py
Comment thread storage/src/shared_kernel/exceptions/base_exceptions.py Outdated
Comment thread storage/tests/fixtures/e2e.py Outdated
Comment thread storage/src/shared_kernel/auth/guest_token.py
Comment thread storage/tests/fixtures/e2e.py
Comment thread backend/src/processes/models/workflows/attachment.py Outdated
Comment thread storage/src/infra/http_client.py Outdated
Comment thread storage/src/infra/mappers/file_record_mapper.py
@railway-app railway-app Bot temporarily deployed to divine-insight / production June 12, 2026 00:51 Inactive
@railway-app railway-app Bot temporarily deployed to feisty-wisdom / production June 12, 2026 01:28 Inactive
Comment thread backend/src/authentication/services/public_auth.py
Comment thread backend/src/processes/models/templates/template.py
Comment thread storage/src/presentation/api/files.py
@railway-app railway-app Bot temporarily deployed to hospitable-love / production June 15, 2026 13:35 Inactive
Comment thread backend/src/processes/management/commands/run_file_migration.py
…s hierarchy

Migrated from single Settings class with 7 scattered CONFIG string checks to BaseAppSettings -> TestingSettings / DevelopmentSettings / ProductionSettings class hierarchy.

Changes:

- config.py: class-based inheritance with environment-specific flags (HSTS_ENABLED, RATE_LIMIT_ENABLED, RELOAD, WORKERS)

- main.py: removed all CONFIG string comparisons, docs always enabled, added root_path derived from FASTAPI_BASE_URL

- security_headers.py: CSP updated to allow Swagger/ReDoc CDN resources

- DI container + API: Settings -> BaseAppSettings type hints

- Tests: CONFIG=Testing in conftest, updated imports
- Add SVG and WEBP to IMAGE_FILE_EXTENSIONS for correct <img> rendering
- Implement filename-based fallback in getLinkEntityType for UUID URLs
  that lack file extensions (customMarkdownPlugins.ts)
- Create getAttachmentEntityTypeByFilename to classify attachments by
  extension (Image/Video/File/Link) as single source of truth
- Replace duplicated IMAGE_EXTENSION_RE regex in parseMarkdownFiles.ts
  with canonical getAttachmentTypeByFilename function
- Add comprehensive unit/integration tests for new detection logic
Comment thread backend/src/processes/management/commands/run_file_migration.py
@railway-app railway-app Bot temporarily deployed to enthusiastic-inspiration / production June 15, 2026 22:03 Inactive
…igration

Comment thread backend/src/storage/services/attachments.py
…service

# Conflicts:
#	frontend/src/public/api/commonRequest.ts
@railway-app railway-app Bot temporarily deployed to fulfilling-warmth / production June 17, 2026 09:49 Inactive

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 109 total unresolved issues (including 108 from previous reviews).

Reviewed by Cursor Bugbot for commit 7d3d6d5.

try:
    token = PublicAuthService.get_token(raw_token)
    if token:
        auth_data = await PublicAuthService.authenticate_public_token(

Public cookie lacks Token prefix

Medium Severity

Public-token authentication from the public-token cookie passes the raw cookie value into PublicAuthService.get_token, which only accepts a two-part Token <value> header string. Browser clients store the bare token in that cookie, so public-form file requests using cookies fail auth unless the prefixed header is also sent.
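
One possible way to handle both forms, shown only as a sketch: the `extract_token` helper below is hypothetical, is not part of PublicAuthService, and is not necessarily the fix chosen for this PR. It accepts either the two-part header form or a bare cookie value and rejects anything else.

```python
def extract_token(raw, prefix="Token"):
    """Return the token from 'Token <value>' or a bare '<value>', else None."""
    if not raw:
        return None
    parts = raw.strip().split()
    if len(parts) == 1:
        # Bare value (cookie form); a lone prefix with no value is invalid.
        return None if parts[0] == prefix else parts[0]
    if len(parts) == 2 and parts[0] == prefix:
        # Header form: 'Token <value>'.
        return parts[1]
    # Wrong scheme, extra parts, or whitespace-only input.
    return None
```

An alternative with the same effect is to prepend the prefix to the cookie value before calling the existing parser, which keeps a single parsing path.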

@railway-app railway-app Bot temporarily deployed to fulfilling-warmth / production June 21, 2026 10:44 Inactive
Labels

Backend API changes request Frontend Web client changes request

4 participants