Indexing/Connecting Codebase #10217
Replies: 1 comment
-
|
There are three ways to get your codebase indexed in Onyx, depending on your version and setup:

Option 1: GitHub connector, enable file indexing (check your version)
The GitHub connector in recent Onyx versions supports indexing source code files, not just PRs and Issues. When configuring the connector, look for a "File extensions" or "Index repository files" field. If it's present, add the extensions you want. If your Onyx instance is older and that option isn't there, updating to the latest release is the quickest fix: file indexing via GitHub was added as part of expanded connector support.

Option 2: File connector, .py files should work as plain text
Python files are plain text, so the file connector should accept them. The documented restriction is against binary formats, not source code files. A few things to try:

Option 3: Programmatic ingestion via the Document API (most flexible)
For indexing an entire codebase, Onyx exposes a document ingestion API that lets you push any text content directly into the index. This is the most reliable approach for code:

    import os

    import requests

    ONYX_URL = "http://your-onyx-instance"
    API_KEY = "your-api-key"
    REPO_PATH = "/path/to/your/repo"
    # Ingestion route is an assumption; confirm the exact path in the
    # API docs for your Onyx version.
    INGEST_URL = f"{ONYX_URL}/api/onyx-api/ingestion"
    EXTENSIONS = (".py", ".ts", ".js", ".md", ".yaml")

    for root, _, files in os.walk(REPO_PATH):
        for fname in files:
            if not fname.endswith(EXTENSIONS):
                continue
            fpath = os.path.join(root, fname)
            # errors="ignore" drops undecodable bytes instead of raising
            with open(fpath, "r", encoding="utf-8", errors="ignore") as f:
                content = f.read()
            if not content.strip():
                continue  # skip empty files
            response = requests.post(
                INGEST_URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                json={
                    "document": {
                        "id": fpath,
                        "sections": [{"text": content, "link": fpath}],
                        "source": "file",
                        "semantic_identifier": fname,
                        "metadata": {"language": fname.rsplit(".", 1)[-1]},
                    }
                },
                timeout=30,
            )
            response.raise_for_status()  # stop on the first failed upload

Run this once to seed the index, then set up a cron job or git hook to re-index changed files on each push.

Recommended path: start with Option 1. Update to the latest Onyx and check whether the GitHub connector now exposes file extension filtering. If not, Option 3 gives you full control over what gets indexed and lets you attach metadata (language, file path, module name) that makes search results much more precise. |
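
The re-indexing of changed files and the richer metadata mentioned above could be sketched roughly as follows. This is a minimal sketch, not part of Onyx: the helper names `changed_files`, `filter_source_files`, and `build_metadata` are hypothetical, the metadata keys are only illustrative, and it assumes `git` is on the PATH and both revisions exist in the local clone.

```python
import subprocess
from pathlib import PurePosixPath

# Same allow-list as the seeding script; adjust to your repo.
SOURCE_EXTENSIONS = (".py", ".ts", ".js", ".md", ".yaml")


def changed_files(repo_path, old_rev, new_rev):
    """Return repo-relative paths added, copied, modified, or renamed
    between two revisions (deleted files are excluded)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "diff", "--name-only",
         "--diff-filter=ACMR", old_rev, new_rev],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def filter_source_files(paths, extensions=SOURCE_EXTENSIONS):
    """Keep only paths whose extension is in the allow-list."""
    return [p for p in paths if p.endswith(extensions)]


def build_metadata(rel_path):
    """Derive illustrative metadata (language, file path, module name)
    from a repo-relative path that uses forward slashes."""
    path = PurePosixPath(rel_path)
    return {
        "language": path.suffix.lstrip("."),
        "file_path": rel_path,
        "module": ".".join(path.with_suffix("").parts),
    }
```

In a cron job or git hook, something like `filter_source_files(changed_files(REPO_PATH, "HEAD~1", "HEAD"))` would yield the files to re-send, and `build_metadata(path)` could replace the single-key `metadata` dict in the upload payload.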
-
I want to connect and index my codebase hosted on GitHub. Although I can find and connect to the repo with the GitHub connector, it does not index the actual code files. From the documentation it is clear that this connector is used only for PRs and Issues.
Additionally, I tried uploading a .py file manually through the File connector, and no indexing took place there either. The docs pages seem to confirm this, since only text formats are allowed (PDFs etc.).
So my question is: how can I connect my actual codebase(s) so that they are indexed and I can work with them? Is there no such option?