PHASE 1 MVP + MIGRATIONS

Enterprise RAG & Knowledge Graph Engine

An open-source reference architecture for building secure, permission-aware vector search pipelines, document parsing, and structured entity graphs.

$ python -m nexus_document_parser.cli docs/vault/ --source-type obsidian
Initializing Ingestion Pipeline...
Connecting to PostgreSQL at localhost:5432... Success
Connecting to Qdrant at localhost:6333... Success
Loading vault files: 12 markdown documents discovered.
Extracting YAML Frontmatter & wikilinks...
Splitting into 84 chunks using Markdown-aware chunker.
Generating embeddings using BAAI/bge-m3...
Upserting vector points to collection 'nexus_chunks'...
=== Ingestion Completed Successfully ===
{ "run_id": "8f2a1b9d-4e8c-4c4c-8b8a-9f5b2d8e4f5a", "documents_indexed": 12, "chunks_indexed": 84 }

Engine Features

Components are chosen to decouple the ingestion pipeline from the storage layers.

Obsidian Ingestion

Extracts YAML metadata, wiki-style links (`[[InternalLink]]`), and markdown tags. Prepares rich seeds for building relation graphs in future stages.
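As a rough illustration of the extraction described above, the sketch below pulls frontmatter, wikilinks, and tags out of a single note using only the standard library. The function name `parse_note` and the regular expressions are assumptions made for this example, not the repository's actual parser, and the frontmatter handling is a simplified `key: value` reader rather than a full YAML parser.

```python
import re

# Hypothetical patterns for illustration; the real parser may differ.
FRONTMATTER_RE = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)
# Captures the link target and ignores an optional "#heading" or "|alias" part.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# A tag is "#" followed by a letter, not preceded by a word character or "#".
TAG_RE = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


def parse_note(text: str) -> dict:
    """Split an Obsidian-style note into metadata, links, tags, and body."""
    metadata = {}
    body = text
    match = FRONTMATTER_RE.match(text)
    if match:
        body = text[match.end():]
        # Simplified "key: value" parsing; nested YAML is not handled here.
        for line in match.group(1).splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                metadata[key.strip()] = value.strip()
    links = sorted({target.strip() for target in WIKILINK_RE.findall(body)})
    tags = sorted(set(TAG_RE.findall(body)))
    return {"metadata": metadata, "links": links, "tags": tags, "body": body}
```

The deduplicated link targets are the kind of seed data a later graph-building stage could turn into edges between notes.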

Dual-Store Architecture

Structured document logs, metadata records, and audit trails are persisted in PostgreSQL, while vector embeddings reside in Qdrant.
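One way to picture the split is that each chunk produces two records that share an identifier: a relational row and a vector point. The sketch below is an assumption about how that could look; `ChunkRow`, `VectorPoint`, and `split_for_stores` are hypothetical names, and no database client is involved.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class ChunkRow:
    """Hypothetical row destined for the relational store."""
    chunk_id: str
    document_path: str
    chunk_index: int
    text: str


@dataclass
class VectorPoint:
    """Hypothetical point destined for the vector store."""
    id: str
    vector: List[float]
    payload: Dict[str, object]


def split_for_stores(document_path: str, chunk_index: int, text: str,
                     embedding: Iterable[float],
                     tags: Iterable[str]) -> Tuple[ChunkRow, VectorPoint]:
    # A deterministic UUID means re-ingesting the same chunk yields the same id,
    # so an upsert overwrites the earlier record instead of duplicating it.
    chunk_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{document_path}#{chunk_index}"))
    row = ChunkRow(chunk_id, document_path, chunk_index, text)
    # The payload carries only filterable metadata; the full text stays in the row.
    payload = {"document_path": document_path, "tags": sorted(tags)}
    point = VectorPoint(chunk_id, list(embedding), payload)
    return row, point
```

Keeping the shared id as the only link between the two records lets either store be rebuilt from the other side's identifiers.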

Hybrid Reranking

Retrieval combines vector cosine similarity with metadata filters (tags, directories) and text scoring on titles and headings to rank results.
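The combination can be sketched as a filter step followed by a weighted score. Everything below is an illustrative assumption: the `Candidate` type, the `rerank` function, the word-overlap title score, and the `title_weight` default are not taken from the project, which may weigh or compute these signals differently.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Candidate:
    """Hypothetical search candidate with its vector and metadata."""
    title: str
    vector: List[float]
    tags: Set[str] = field(default_factory=set)
    directory: str = ""


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def title_overlap(query: str, title: str) -> float:
    """Fraction of query words that also appear in the title."""
    query_words = set(query.lower().split())
    title_words = set(title.lower().split())
    return len(query_words & title_words) / len(query_words) if query_words else 0.0


def rerank(query: str, query_vector: List[float], candidates: List[Candidate],
           required_tags: Optional[Set[str]] = None,
           directory: Optional[str] = None,
           title_weight: float = 0.2) -> List[Tuple[float, Candidate]]:
    required = set(required_tags or ())
    scored = []
    for candidate in candidates:
        # Metadata filters exclude candidates outright rather than lowering scores.
        if not required <= candidate.tags:
            continue
        if directory is not None and not candidate.directory.startswith(directory):
            continue
        score = ((1 - title_weight) * cosine(query_vector, candidate.vector)
                 + title_weight * title_overlap(query, candidate.title))
        scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

With equal vector similarity, a candidate whose title shares words with the query ranks above one whose title does not.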

Alembic Database Versioning

Uses SQLAlchemy ORM models with Alembic for database versioning. Schema changes are tracked as versioned migration scripts and applied with `alembic upgrade head`.

Governed AI Workflows

Repository boundaries are enforced through code templates, CI/CD scripts, and agent rules that keep raw customer logs and secrets from leaking into the repository.

HA Vector Demo

Includes a runnable multi-node Qdrant cluster compose file equipped with Prometheus monitoring dashboards for validating network workloads.

Getting Started

Quick instructions to spin up the local development stack and ingest your first documents.

Windows (PowerShell)

1. Set Up Virtual Environment

Initialize your Python environment and install the required packages:

python -m venv .venv; .venv\Scripts\Activate.ps1; pip install -r requirements.txt

2. Bring Up Storage Engines

Spin up local PostgreSQL and Qdrant containers in the background:

docker compose up -d

3. Apply Database Migrations

Migrate the database schema to the latest revision using Alembic:

alembic -c infrastructure/alembic.ini upgrade head

4. Launch the FastAPI Server

Initialize environment variables and start the server:

$env:PYTHONPATH="packages/shared-contracts;packages/vector-client;workers/document-parser;services/nexus-api"; uvicorn nexus_api.main:app --reload

macOS / Linux (bash)

1. Set Up Virtual Environment

Initialize your Python environment and install the required packages:

python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

2. Bring Up Storage Engines

Spin up local PostgreSQL and Qdrant containers in the background:

docker compose up -d

3. Apply Database Migrations

Migrate the database schema using Alembic:

alembic -c infrastructure/alembic.ini upgrade head

4. Launch the FastAPI Server

Initialize environment variables and start the server:

export PYTHONPATH="packages/shared-contracts:packages/vector-client:workers/document-parser:services/nexus-api" && uvicorn nexus_api.main:app --reload
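If migrations or ingestion fail to connect, a quick reachability check on the storage ports can narrow the problem down. The sketch below is a small standard-library helper; the function name `is_port_open` is an assumption, and the ports are the ones shown in the sample ingestion output (5432 for PostgreSQL, 6333 for Qdrant).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, and timed-out connections all count as closed.
        return False


if __name__ == "__main__":
    # Ports taken from the sample ingestion output; adjust if yours differ.
    for name, port in (("PostgreSQL", 5432), ("Qdrant", 6333)):
        status = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {status}")
```

An open port only shows that something is listening; it does not confirm credentials or that the schema has been migrated.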

Documentation Hub

Read repository operating guides, safety boundaries, and contributing procedures directly below without leaving the page.

Governance & Policy

Repository Agent Guide

This file defines the operating baseline for AI coding agents and human contributors working in this repository.

Scope

The repository contains:

  • Architecture documentation for Enterprise Knowledge Hub.
  • Project-scoped agent configuration under .codex/, .claude/, .agents/, and .hermes/.
  • A runnable Qdrant multi-node demo under qdrant-multi-node-cluster/.

Required Workflow

  • Read README.md and docs/architecture.md before changing architecture, rules, or implementation structure.
  • Check the target area before editing:
    • Root and docs/: project documentation and governance.
    • .codex/, .claude/, .agents/, .hermes/: agent configuration and local automation.
    • qdrant-multi-node-cluster/: runnable Python and Docker demo.
  • Keep changes scoped. Do not rewrite unrelated generated files, local runtime state, or private configuration.
  • Prefer explicit, reproducible commands. Do not rely on undocumented local state.
  • Before writing code, inspect existing patterns and update tests or verification steps when behavior changes.

Architecture Rules

  • Treat docs/architecture.md as the source of truth.
  • Use clean boundaries between UI, backend services, workers, MCP connectors, and storage.
  • Do not bypass RBAC, audit logging, or human review in workflows that process sensitive documents.
  • Keep LLM calls behind a gateway that supports model routing, caching, retries, and observability.
  • Keep raw documents, model files, vector snapshots, and audit exports out of git.
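The gateway rule above can be sketched as a thin wrapper that owns routing, caching, retries, and basic counters, so callers never talk to a model backend directly. This is a minimal in-memory illustration under stated assumptions: the class name `LLMGateway`, its constructor parameters, and the `metrics` dictionary are hypothetical, and backends are plain callables standing in for real model clients.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LLMGateway:
    """Hypothetical gateway: routes by model name, caches, retries, counts calls."""

    def __init__(self, backends: Dict[str, Callable[[str], str]],
                 max_retries: int = 2, backoff_seconds: float = 0.0) -> None:
        self._backends = backends
        self._max_retries = max_retries
        self._backoff_seconds = backoff_seconds
        self._cache: Dict[Tuple[str, str], str] = {}
        # Minimal observability: counters a real system would export as metrics.
        self.metrics = {"cache_hits": 0, "backend_calls": 0, "failures": 0}

    def complete(self, model: str, prompt: str) -> str:
        key = (model, prompt)
        if key in self._cache:
            self.metrics["cache_hits"] += 1
            return self._cache[key]
        if model not in self._backends:
            raise ValueError(f"no backend registered for model {model!r}")
        backend = self._backends[model]
        last_error: Optional[Exception] = None
        for attempt in range(self._max_retries + 1):
            self.metrics["backend_calls"] += 1
            try:
                result = backend(prompt)
            except Exception as exc:
                self.metrics["failures"] += 1
                last_error = exc
                # Exponential backoff between attempts.
                time.sleep(self._backoff_seconds * (2 ** attempt))
            else:
                self._cache[key] = result
                return result
        raise RuntimeError(f"model {model!r} failed after retries") from last_error
```

Because every call passes through `complete`, retry policy and cache behavior can change in one place without touching workers or services.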

Security Rules

[SECURITY RULE] Never commit credentials, API keys, internal hostnames, customer data, logs, local sessions, or agent memory.

  • Use .env.example for required configuration keys and document expected values without secrets.
  • Prefer least-privilege access for MCP connectors and storage clients.
  • Fail closed for authorization checks and document access checks.
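To make "fail closed" concrete, the sketch below grants access only when a check positively succeeds, and treats missing entries and lookup errors as denials. The `User` type, the `can_read` function, and the role-based ACL mapping are assumptions for illustration, not the project's authorization model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping


@dataclass(frozen=True)
class User:
    """Hypothetical user with a set of role names."""
    user_id: str
    roles: FrozenSet[str]


def can_read(user: User, document_id: str,
             acl: Mapping[str, FrozenSet[str]]) -> bool:
    """Return True only when access is positively established."""
    try:
        allowed_roles = acl.get(document_id)
        if not allowed_roles:
            # Unknown document or empty ACL entry: deny rather than assume public.
            return False
        return bool(user.roles & allowed_roles)
    except Exception:
        # Any failure during the lookup is treated as a denial, not a pass.
        return False
```

The opposite default, returning True when the ACL backend is unavailable, would turn an outage into an access grant.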

Verification Guidance

  • Documentation changes: review links and headings manually.
  • Qdrant demo changes: run tests from qdrant-multi-node-cluster/ with make test or python -m unittest discover -s tests.
  • Agent configuration changes: verify that referenced files exist and local-only files remain ignored.
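The "referenced files exist" check for agent configuration can be scripted. The sketch below is one possible helper; the function name `missing_references` is an assumption, and the list of paths to check would come from whatever the configuration actually references.

```python
from pathlib import Path
from typing import Iterable, List


def missing_references(root: Path, referenced: Iterable[str]) -> List[str]:
    """Return the referenced relative paths that do not exist under root."""
    return [rel for rel in referenced if not (root / rel).exists()]


if __name__ == "__main__":
    # Paths named in this guide; extend the list for your own configuration.
    missing = missing_references(Path("."), ["README.md", "docs/architecture.md"])
    print("missing:", missing if missing else "none")
```

For the second half of the check, `git check-ignore -v <path>` reports which ignore rule, if any, matches a given local-only file.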

Writing Standards

  • Be direct and technical.
  • Avoid marketing claims that are not backed by implementation or documentation.
  • Use consistent terminology: RAG, Knowledge Graph, MCP, RBAC, audit, queue, worker, vector database.
  • Keep examples public-safe and free of company-private data.

Contributing

Thank you for considering a contribution to Enterprise Knowledge Hub.

Before You Start

  • Read README.md, docs/architecture.md, and AGENTS.md.
  • Keep changes small and focused.
  • Do not commit secrets, runtime logs, local sessions, generated cache, raw documents, model weights, or database data.

Contribution Types

Good first contributions include:

  • Improving architecture documentation.
  • Adding tests or verification commands for the Qdrant demo.
  • Hardening security guidance.
  • Clarifying AI-agent rules.
  • Adding implementation modules that follow the documented architecture.

Development Workflow

  • Create a feature branch from the default branch.
  • Make the smallest coherent change.
  • Update documentation when behavior, architecture, or commands change.
  • Run the relevant verification:
    • Documentation-only changes: check links and headings manually.
    • Qdrant demo changes: run make test in qdrant-multi-node-cluster.
  • Open a pull request with a concise summary, verification results, and any known limitations.

Pull Request Standard

Each pull request should include:

  • What changed.
  • Why it changed.
  • How it was verified.
  • Any follow-up work or risk.

Style

  • Prefer clear prose over broad claims.
  • Keep architecture terms consistent.
  • Avoid examples that expose internal systems or private data.
  • Keep generated artifacts out of source control.

Security Policy

Supported Scope

Security reports are accepted for:

  • Repository documentation that could encourage unsafe deployment.
  • Qdrant demo configuration that exposes credentials or unsafe defaults.
  • Agent rules that could leak secrets, local sessions, logs, or private documents.
  • Future application code added under the documented architecture.

Reporting a Vulnerability

[IMPORTANT] Do not open a public issue for exploitable vulnerabilities or leaked secrets.

Send a private report to the maintainers (or open a GitHub Security Advisory on this repository) with:

  • Affected file or component.
  • Reproduction steps.
  • Impact assessment.
  • Suggested mitigation, if available.

If no private channel is published yet, create a public issue that only says you have a security report and need a maintainer contact. Do not include exploit details in that issue.

Secret Handling

Never commit:

  • API keys, tokens, passwords, certificates, or private keys.
  • Internal URLs, customer identifiers, or production hostnames.
  • Raw documents, model weights, vector database snapshots, audit exports, or logs.
  • Local agent memory, sessions, or authentication files.

Use .env.example to document configuration names without real values.
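A startup check can use the same file to confirm that every documented name is set in the environment. The sketch below assumes a simple `KEY=value` layout with `#` comments; the function names and the key names in the usage example (`DATABASE_URL`, `QDRANT_URL`) are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def required_keys(example_text: str) -> List[str]:
    """Collect configuration names from .env.example-style text."""
    keys = []
    for line in example_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str,
                 environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return documented names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [key for key in required_keys(example_text) if not env.get(key)]
```

For example, with an example file containing `DATABASE_URL=` and `QDRANT_URL=`, calling `missing_keys(text, {"DATABASE_URL": "set"})` returns `["QDRANT_URL"]`.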