Enterprise RAG & Knowledge Graph Engine
An open-source reference architecture for building secure, permission-aware vector search pipelines, document parsing, and structured entity graphs.
Engine Features
Built with reliable, enterprise-grade components designed to decouple pipeline ingestion from storage layers.
Obsidian Ingestion
Extracts YAML metadata, wiki-style links (`[[InternalLink]]`), and markdown tags. Prepares rich seeds for building relation graphs in future stages.
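The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the project's parser: `parse_note`, `ParsedNote`, and the regular expressions are hypothetical names, and the frontmatter handling is simplified to flat `key: value` pairs so the example needs no YAML library.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration; the project's real parser may differ.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n?", re.DOTALL)
# Captures the link target and ignores an optional "#heading" or "|alias" part.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# A tag is "#" followed by a letter; "# Heading" and "##" do not match.
TAG_RE = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


@dataclass
class ParsedNote:
    metadata: dict[str, str] = field(default_factory=dict)
    links: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    body: str = ""


def parse_note(text: str) -> ParsedNote:
    """Split a note into flat frontmatter, wiki-link targets, tags, and body."""
    metadata: dict[str, str] = {}
    body = text
    match = FRONTMATTER_RE.match(text)
    if match:
        body = text[match.end():]
        for line in match.group(1).splitlines():
            # Simplification: only flat `key: value` pairs are handled here.
            key, sep, value = line.partition(":")
            if sep and key.strip():
                metadata[key.strip()] = value.strip()
    links = [target.strip() for target in WIKILINK_RE.findall(body)]
    tags = TAG_RE.findall(body)
    return ParsedNote(metadata, links, tags, body)
```

The link targets and tags returned here are the kind of "seeds" a later stage could turn into graph edges; nested YAML values would need a real YAML parser.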
Dual-Store Architecture
Structured document logs, metadata records, and audit paths are persisted in PostgreSQL, while vector coordinates reside in Qdrant.
Hybrid Reranking
Search retrievals combine vector cosine similarity with metadata filters (tags, directories) and title-heading text scoring for precise answers.
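One way to picture this combination is the sketch below. It is an assumption-laden illustration rather than the engine's scoring code: the `Candidate` type, the `rerank` function, and the 0.7/0.3 weights are all hypothetical, and the title score is a simple term-overlap ratio standing in for whatever text scoring the project uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    title: str
    path: str
    tags: frozenset
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def title_score(query: str, title: str) -> float:
    """Fraction of query terms that appear in the title (case-insensitive)."""
    terms = set(query.lower().split())
    title_terms = set(title.lower().split())
    return len(terms & title_terms) / len(terms) if terms else 0.0


def rerank(query, query_vec, candidates, required_tags=frozenset(),
           directory="", w_vec=0.7, w_title=0.3):
    """Filter by metadata, then sort by a weighted vector + title score."""
    scored = []
    for c in candidates:
        if not required_tags <= c.tags:
            continue  # metadata filter: every required tag must be present
        if directory and not c.path.startswith(directory):
            continue  # metadata filter: restrict to a directory prefix
        score = (w_vec * cosine(query_vec, c.embedding)
                 + w_title * title_score(query, c.title))
        scored.append((score, c.doc_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

In a deployment the tag and directory filters would more likely be pushed down into the vector store query so that filtered-out documents are never fetched; the in-memory filter here only keeps the example self-contained.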
Alembic Database Versioning
Full support for SQLAlchemy ORM and database versioning. Schema changes are applied through versioned Alembic migration scripts.
Governed AI Workflows
Configured with robust repo boundaries. Code templates, CI/CD scripts, and agent rules prevent leaks of raw customer logs or secrets.
HA Vector Demo
Includes a runnable multi-node Qdrant cluster compose file equipped with Prometheus monitoring dashboards for validating network workloads.
Getting Started
Quick instructions to spin up the local development stack and ingest your first documents.
1. Set Up Virtual Environment
Initialize your Python environment and install the required packages:
2. Bring Up Storage Engines
Spin up local PostgreSQL and Qdrant containers in the background:
3. Apply Database Migrations
Migrate the database schema to the latest revision (`head`) using Alembic:
4. Launch the FastAPI Server
Initialize environment variables and start the server:
Documentation Hub
Read repository operating guides, safety boundaries, and contributing procedures directly below without leaving the page.
Repository Agent Guide
This file defines the operating baseline for AI coding agents and human contributors working in this repository.
Scope
The repository contains:
- Architecture documentation for Enterprise Knowledge Hub.
- Project-scoped agent configuration under `.codex/`, `.claude/`, `.agents/`, and `.hermes/`.
- A runnable Qdrant multi-node demo under `qdrant-multi-node-cluster/`.
Required Workflow
- Read `README.md` and `docs/architecture.md` before changing architecture, rules, or implementation structure.
- Check the target area before editing:
  - Root and `docs/`: project documentation and governance.
  - `.codex/`, `.claude/`, `.agents/`, `.hermes/`: agent configuration and local automation.
  - `qdrant-multi-node-cluster/`: runnable Python and Docker demo.
- Keep changes scoped. Do not rewrite unrelated generated files, local runtime state, or private configuration.
- Prefer explicit, reproducible commands. Do not rely on undocumented local state.
- Before writing code, inspect existing patterns and update tests or verification steps when behavior changes.
Architecture Rules
- Treat `docs/architecture.md` as the source of truth.
- Use clean boundaries between UI, backend services, workers, MCP connectors, and storage.
- Do not bypass RBAC, audit logging, or human review in workflows that process sensitive documents.
- Keep LLM calls behind a gateway that supports model routing, caching, retries, and observability.
- Keep raw documents, model files, vector snapshots, and audit exports out of git.
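To make the gateway rule above concrete, here is a minimal sketch of a gateway that routes by model name, caches responses, retries failures, and logs latency. The `LLMGateway` class and the `Backend` callable interface are hypothetical and not taken from the repository; real backends would wrap whichever model clients the project uses.

```python
import hashlib
import logging
import time
from typing import Callable

logger = logging.getLogger("llm_gateway")

# Assumed interface: a backend is any callable mapping a prompt to text.
Backend = Callable[[str], str]


class LLMGateway:
    def __init__(self, routes: dict, max_retries: int = 2, backoff_s: float = 0.5):
        self._routes = routes          # model name -> backend callable
        self._max_retries = max_retries
        self._backoff_s = backoff_s
        self._cache: dict = {}         # in-memory cache, for illustration only

    def complete(self, model: str, prompt: str) -> str:
        backend = self._routes.get(model)
        if backend is None:
            raise KeyError(f"no route configured for model {model!r}")
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self._cache:
            logger.info("cache hit model=%s", model)
            return self._cache[key]
        last_error = None
        for attempt in range(self._max_retries + 1):
            started = time.perf_counter()
            try:
                result = backend(prompt)
            except Exception as exc:  # retry on any backend failure
                last_error = exc
                # Log the error type only, never the prompt contents.
                logger.warning("model=%s attempt=%d failed: %s",
                               model, attempt, type(exc).__name__)
                if attempt < self._max_retries:
                    time.sleep(self._backoff_s * (2 ** attempt))
                continue
            latency_ms = (time.perf_counter() - started) * 1000
            logger.info("model=%s attempt=%d latency_ms=%.1f",
                        model, attempt, latency_ms)
            self._cache[key] = result
            return result
        raise RuntimeError(
            f"model {model!r} failed after {self._max_retries + 1} attempts"
        ) from last_error
```

Keeping prompts out of the log lines is deliberate, since prompts may contain document text covered by the security rules in this guide.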
Security Rules
[SECURITY RULE] Never commit credentials, API keys, internal hostnames, customer data, logs, local sessions, or agent memory.
- Use `.env.example` for required configuration keys and document expected values without secrets.
- Prefer least-privilege access for MCP connectors and storage clients.
- Fail closed for authorization checks and document access checks.
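As an illustration of the fail-closed rule, the sketch below denies access whenever the policy lookup errors out or returns nothing. The function name `can_read` and the role-set ACL model are assumptions made for this example, not the repository's authorization API.

```python
from typing import Callable, Optional


def can_read(user_roles: set, doc_id: str,
             lookup_acl: Callable[[str], Optional[set]]) -> bool:
    """Return True only when access is positively established."""
    try:
        allowed_roles = lookup_acl(doc_id)
    except Exception:
        # Policy store unreachable or misbehaving: deny rather than allow.
        return False
    if not allowed_roles:
        # Missing or empty ACL is treated as "no access", not "public".
        return False
    return bool(user_roles & allowed_roles)
```

The key design choice is that every uncertain path returns `False`; access is granted only by the final line, when the user's roles intersect an ACL that was actually found.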
Verification Guidance
- Documentation changes: review links and headings manually.
- Qdrant demo changes: run tests from `qdrant-multi-node-cluster/` with `make test` or `python -m unittest discover -s tests`.
- Agent configuration changes: verify that referenced files exist and local-only files remain ignored.
Writing Standards
- Be direct and technical.
- Avoid marketing claims that are not backed by implementation or documentation.
- Use consistent terminology: RAG, Knowledge Graph, MCP, RBAC, audit, queue, worker, vector database.
- Keep examples public-safe and free of company-private data.
Contributing
Thank you for considering a contribution to Enterprise Knowledge Hub.
Before You Start
- Read `README.md`, `docs/architecture.md`, and `AGENTS.md`.
- Keep changes small and focused.
- Do not commit secrets, runtime logs, local sessions, generated cache, raw documents, model weights, or database data.
Contribution Types
Good first contributions include:
- Improving architecture documentation.
- Adding tests or verification commands for the Qdrant demo.
- Hardening security guidance.
- Clarifying AI-agent rules.
- Adding implementation modules that follow the documented architecture.
Development Workflow
- Create a feature branch off of the default branch.
- Make the smallest coherent change.
- Update documentation when behavior, architecture, or commands change.
- Run the relevant verification:
  - Documentation-only changes: check links and headings manually.
  - Qdrant demo changes: run `make test` in `qdrant-multi-node-cluster`.
- Open a pull request with a concise summary, verification results, and any known limitations.
Pull Request Standard
Each pull request should include:
- What changed.
- Why it changed.
- How it was verified.
- Any follow-up work or risk.
Style
- Prefer clear prose over broad claims.
- Keep architecture terms consistent.
- Avoid examples that expose internal systems or private data.
- Keep generated artifacts out of source control.
Security Policy
Supported Scope
Security reports are accepted for:
- Repository documentation that could encourage unsafe deployment.
- Qdrant demo configuration that exposes credentials or unsafe defaults.
- Agent rules that could leak secrets, local sessions, logs, or private documents.
- Future application code added under the documented architecture.
Reporting a Vulnerability
[IMPORTANT] Do not open a public issue for exploitable vulnerabilities or leaked secrets.
Send a private report to the maintainers (or open a GitHub Security Advisory on this repository) with:
- Affected file or component.
- Reproduction steps.
- Impact assessment.
- Suggested mitigation, if available.
If no private channel is published yet, create a public issue that only says you have a security report and need a maintainer contact. Do not include exploit details in that issue.
Secret Handling
Never commit:
- API keys, tokens, passwords, certificates, or private keys.
- Internal URLs, customer identifiers, or production hostnames.
- Raw documents, model weights, vector database snapshots, audit exports, or logs.
- Local agent memory, sessions, or authentication files.
Use `.env.example` to document configuration names without real values.
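A small startup check can make this convention enforceable. The sketch below reads key names from the text of `.env.example` and reports which ones are unset in the environment; the helper names and the `DATABASE_URL` and `QDRANT_URL` keys in the usage are hypothetical placeholders, not the project's actual configuration.

```python
def required_keys(example_text: str) -> list:
    """Collect key names from `.env.example`-style text, skipping comments."""
    keys = []
    for raw in example_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, environ) -> list:
    """Return keys named in the example that are unset or empty in environ."""
    return [key for key in required_keys(example_text) if not environ.get(key)]


if __name__ == "__main__":
    # Hypothetical keys, used only to demonstrate the check.
    example = "# Storage\nDATABASE_URL=\nQDRANT_URL=\n"
    print(missing_keys(example, {"DATABASE_URL": "postgresql://localhost/dev"}))
```

In real use, `example_text` would come from reading `.env.example` and `environ` would be `os.environ`, so the committed file lists names only while values stay local.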