kura

❯ kura add notes

created inbox ~/Kura_notes -> ~/.config/kuradb/notes/inbox

drop your PDFs, docs, sheets, or markdown here

❯ kura

watch notes · 14 files · 312 chunks

embed 312/312 · text-embedding-3-small · dim 512

serving http://127.0.0.1:48213

❯ curl '127.0.0.1:48213/api/semantic?db=notes&q=retry+backoff+strategy'

{ "results": [{

"source": "resilience-patterns.pdf",

"matches": [{ "chunk": 7, "content": "Exponential backoff with jitter..." }]

}] }

Features

A RAG Backend That Stays Out of Your Way

Watch, parse, embed, and serve from a single Go binary, with no external vector database.

♻

Automatic Ingestion

Drop a file in the watched folder. KuraDB sniffs, parses, chunks, and queues it for embedding — no commands, no schemas.

★

Semantic Vector Search

Two-stage cosine search: coarse source-level ranking, then a parallel chunk-level scan across CPU cores. Self-implemented, no ToriiDB.

⌨

CJK Keyword Search

gse-powered Chinese segmentation with stopwords. Scores rows by keyword hit count for precise, language-aware matching.

🔒

Read-Only Contract

The HTTP API has no mutation endpoints by design. Data enters one way only — through the watcher pipeline. A trust boundary, not a config flag.

⚖

SQLite Source of Truth

Every chunk and vector lives in SQLite. The in-memory vector cache is never authoritative: it can be rebuilt from SQLite after a restart, so nothing is lost.

⚡

Smart Re-embedding

Upsert only invalidates a vector when content actually changed. Identical files keep their embeddings — no wasted OpenAI round-trips.
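One way to picture the re-embedding rule is a content-hash check at upsert time. This is a minimal sketch, not KuraDB's actual code: the `row` and `store` types, the SHA-256 hash, and the `upsert` signature are all assumptions, and a map stands in for the SQLite table.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// row is a hypothetical stand-in for one stored chunk.
type row struct {
	hash   string    // SHA-256 of the chunk content (assumed change detector)
	vector []float32 // nil means "pending embedding"
}

// store is an in-memory stand-in for the SQLite table.
type store map[string]*row

func contentHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// upsert writes the chunk and reports whether it must be (re-)embedded.
// An unchanged hash keeps the existing vector untouched.
func (s store) upsert(key, content string) bool {
	h := contentHash(content)
	if r, ok := s[key]; ok && r.hash == h {
		return false // identical content: keep the embedding
	}
	s[key] = &row{hash: h} // new or changed: vector reset to pending
	return true
}

func main() {
	s := store{}
	fmt.Println(s.upsert("a.md#0", "hello")) // true: first sight
	s["a.md#0"].vector = []float32{0.1}      // pretend the embedder ran
	fmt.Println(s.upsert("a.md#0", "hello")) // false: nothing changed
	fmt.Println(s.upsert("a.md#0", "hello, world")) // true: content changed
}
```

Only a `true` result would cost an embedding round-trip; re-saving an identical file costs nothing.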

⏳

Query Cache

Query embeddings are cached in a global SQLite store. Repeated searches skip the embedding call and go straight to the vector scan.
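The query cache behaves like a memoized embedding call. The sketch below assumes an exact-match key on the query text and keeps vectors in a map rather than the SQLite store; `queryCache`, `embedFunc`, and `get` are illustrative names, not KuraDB's API.

```go
package main

import (
	"fmt"
	"sync"
)

// embedFunc stands in for the remote embedding call.
type embedFunc func(query string) ([]float32, error)

// queryCache memoizes query vectors. KuraDB persists them in SQLite;
// a map is used here only to keep the sketch self-contained.
type queryCache struct {
	mu      sync.Mutex
	vectors map[string][]float32
	embed   embedFunc
}

func newQueryCache(embed embedFunc) *queryCache {
	return &queryCache{vectors: map[string][]float32{}, embed: embed}
}

// get returns the cached vector, calling embed only on a miss.
func (c *queryCache) get(query string) ([]float32, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.vectors[query]; ok {
		return v, nil
	}
	v, err := c.embed(query)
	if err != nil {
		return nil, err // failures are not cached
	}
	c.vectors[query] = v
	return v, nil
}

func main() {
	calls := 0
	cache := newQueryCache(func(q string) ([]float32, error) {
		calls++
		return []float32{float32(len(q))}, nil // fake vector
	})
	for i := 0; i < 3; i++ {
		if _, err := cache.get("retry backoff strategy"); err != nil {
			fmt.Println("embed failed:", err)
			return
		}
	}
	fmt.Println("embedding calls:", calls) // embedding calls: 1
}
```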

♻

Soft-Delete Consistency

Removed files are dismissed, not dropped. Every semantic and keyword query strictly filters dismissed content from results.

🔌

Agenvoy-Native

Binds a random local port and publishes its URL to an endpoint file, so Agenvoy discovers and consumes it as a child process automatically.
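The port-and-endpoint-file handshake can be sketched with the standard library alone. Asking the OS for port 0 yields a free port; the file name, its one-line format, and both function names below are assumptions, since the page does not describe the endpoint file's layout.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// listenAndPublish binds a free localhost port (port 0 lets the OS choose)
// and writes the resulting base URL to endpointPath.
func listenAndPublish(endpointPath string) (net.Listener, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, "", err
	}
	url := "http://" + ln.Addr().String()
	if err := os.WriteFile(endpointPath, []byte(url+"\n"), 0o600); err != nil {
		ln.Close()
		return nil, "", err
	}
	return ln, url, nil
}

// readEndpoint is the consumer side: a parent process reads the URL back.
func readEndpoint(endpointPath string) (string, error) {
	b, err := os.ReadFile(endpointPath)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	const path = "kura.endpoint" // hypothetical file name
	ln, url, err := listenAndPublish(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer ln.Close()
	defer os.Remove(path)

	got, err := readEndpoint(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("published:", url, "discovered:", got)
}
```

A real server would pass `ln` to its HTTP handler after publishing the URL, so the consumer never needs a fixed port.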

Pipeline

One-Way Ingestion, by Design

Data enters through a single path. The database package is the only write entry point — nothing else touches SQLite.

Step 1

Watch

Poll the inbox every 10s. Detect changes via a persisted size + mtime snapshot.

walkFiles →
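The Watch step's change detection amounts to diffing two size + mtime snapshots. Here is a rough sketch under stated assumptions: `scan`, `diff`, and the `snapshot` type are illustrative names, and persisting the snapshot between polls is left out.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"sort"
)

// fileState is the per-file fingerprint: size plus modification time.
type fileState struct {
	Size    int64
	ModTime int64 // UnixNano
}

type snapshot map[string]fileState

// scan walks root and records a fingerprint for every regular entry.
func scan(root string) (snapshot, error) {
	snap := snapshot{}
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		snap[path] = fileState{info.Size(), info.ModTime().UnixNano()}
		return nil
	})
	return snap, err
}

// diff returns paths that are new or modified, and paths that disappeared.
func diff(prev, cur snapshot) (changed, removed []string) {
	for p, st := range cur {
		if old, ok := prev[p]; !ok || old != st {
			changed = append(changed, p)
		}
	}
	for p := range prev {
		if _, ok := cur[p]; !ok {
			removed = append(removed, p)
		}
	}
	sort.Strings(changed)
	sort.Strings(removed)
	return changed, removed
}

func main() {
	cur, err := scan(".")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	changed, removed := diff(snapshot{}, cur) // empty previous snapshot: all new
	fmt.Printf("%d changed, %d removed\n", len(changed), len(removed))
}
```

In this sketch, `changed` would feed the parser and `removed` would feed the soft-delete path; a poller would call `scan` every 10s and keep the latest snapshot for the next round.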

Step 2

Parse

Sniff text, dispatch by type, split into chunks — PDF, Office, sheets, markdown.

parser →

Step 3

Embed

Tick every 5s, batch of 64. Embed pending rows, write vectors back to SQLite.

openai →
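The Embed step's tick-and-batch loop might look like the following. The interval and batch size come from the text above; `drain`, `nextBatch`, and the fake embedder are illustrative, and a slice stands in for the pending rows in SQLite.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

const (
	embedInterval = 5 * time.Second // production tick, per the text above
	batchSize     = 64
)

// embedFunc stands in for the remote embedding call.
type embedFunc func(ctx context.Context, texts []string) ([][]float32, error)

// nextBatch splits off up to n pending items.
func nextBatch(pending []string, n int) (batch, rest []string) {
	if n > len(pending) {
		n = len(pending)
	}
	return pending[:n], pending[n:]
}

// drain embeds one batch per tick until nothing is pending.
func drain(ctx context.Context, interval time.Duration, pending []string, embed embedFunc) (int, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	done := 0
	for len(pending) > 0 {
		select {
		case <-ctx.Done():
			return done, ctx.Err()
		case <-ticker.C:
			var batch []string
			batch, pending = nextBatch(pending, batchSize)
			vecs, err := embed(ctx, batch)
			if err != nil {
				return done, err
			}
			done += len(vecs) // a real worker would write vecs back to SQLite here
		}
	}
	return done, nil
}

func main() {
	fake := func(_ context.Context, texts []string) ([][]float32, error) {
		return make([][]float32, len(texts)), nil // placeholder vectors
	}
	pending := make([]string, 312)
	// A short interval keeps the demo fast; embedInterval is the real cadence.
	n, err := drain(context.Background(), 10*time.Millisecond, pending, fake)
	if err != nil {
		fmt.Println("embed failed:", err)
		return
	}
	fmt.Printf("embed %d/%d\n", n, len(pending)) // embed 312/312
}
```

With 312 pending chunks, five ticks are enough: four full batches of 64 and one of 56.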

Step 4

Serve

Read-only HTTP over localhost. Two-stage vector + keyword search over fresh rows.

gin

Search

Two Ways to Find Anything

Semantic for meaning, keyword for exact terms. Both filter dismissed content and never leak internal scoring fields.

Semantic

Search by Meaning

Query cache → embed on miss → two-stage cosine search → drop hits below 0.3 → hydrate full rows from SQLite, grouped by source.

                # GET /api/semantic

                GET /api/semantic?db=notes&q=retry+strategy&limit=10

                {

                  "results": [{

                    "source": "resilience.pdf",

                    "matches": [{ "chunk": 7, "content": "..." }]

                  }]

                }

cosine similarity · two-stage scan · minScore 0.3
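The two-stage search described above can be sketched in plain Go. Treat this as an illustration under assumptions: the page does not say how source-level vectors are built (the demo uses hand-picked ones), and `topSources`, `scanChunks`, and the striped worker layout are invented for the example. Only the cosine metric, the per-core parallel scan, and the 0.3 cutoff come from the text.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"sort"
	"sync"
)

const minScore = 0.3 // hits below this are dropped

type chunk struct {
	Source string
	Index  int
	Vec    []float32
}

type hit struct {
	Source string
	Index  int
	Score  float64
}

func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot, na, nb = dot+x*y, na+x*x, nb+y*y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// topSources is stage 1: keep the k sources whose source-level vector is closest.
func topSources(q []float32, sourceVecs map[string][]float32, k int) map[string]bool {
	names := make([]string, 0, len(sourceVecs))
	for n := range sourceVecs {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool {
		si, sj := cosine(q, sourceVecs[names[i]]), cosine(q, sourceVecs[names[j]])
		if si != sj {
			return si > sj
		}
		return names[i] < names[j]
	})
	if k > len(names) {
		k = len(names)
	}
	keep := make(map[string]bool, k)
	for _, n := range names[:k] {
		keep[n] = true
	}
	return keep
}

// scanChunks is stage 2: score surviving chunks in parallel, one stripe per core.
func scanChunks(q []float32, chunks []chunk, keep map[string]bool, limit int) []hit {
	workers := runtime.NumCPU()
	parts := make([][]hit, workers) // each goroutine writes only its own slot
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(chunks); i += workers {
				c := chunks[i]
				if !keep[c.Source] {
					continue
				}
				if s := cosine(q, c.Vec); s >= minScore {
					parts[w] = append(parts[w], hit{c.Source, c.Index, s})
				}
			}
		}(w)
	}
	wg.Wait()
	var hits []hit
	for _, p := range parts {
		hits = append(hits, p...)
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].Source+fmt.Sprint(hits[i].Index) < hits[j].Source+fmt.Sprint(hits[j].Index)
	})
	if len(hits) > limit {
		hits = hits[:limit]
	}
	return hits
}

func main() {
	chunks := []chunk{
		{"resilience.pdf", 0, []float32{1, 0}},
		{"resilience.pdf", 1, []float32{0.9, 0.1}},
		{"recipes.md", 0, []float32{0, 1}},
	}
	sourceVecs := map[string][]float32{"resilience.pdf": {0.95, 0.05}, "recipes.md": {0, 1}}
	q := []float32{1, 0}
	for _, h := range scanChunks(q, chunks, topSources(q, sourceVecs, 1), 10) {
		fmt.Printf("%s#%d %.2f\n", h.Source, h.Index, h.Score)
	}
}
```

The coarse pass shrinks the candidate set before the full scan, which is what keeps a linear scan practical at personal scale.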

Keyword

Search by Term

Tokenize the query with CJK-aware segmentation, score each row by how many keyword clauses match, order by hit count — always filtering dismissed rows.

                # GET /api/keyword

                GET /api/keyword?db=notes&q=重試+退避&limit=10

                {

                  "results": [{

                    "source": "retry-notes.md",

                    "matches": [{ "chunk": 2, "content": "..." }]

                  }]

                }

go-ego/gse · stopwords · hit-count rank
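Hit-count ranking is easy to sketch once the query is tokenized. In the example below a whitespace splitter stands in for gse segmentation (a deliberate simplification, not the library's API), and the `row` type, the tiny stopword set, and `keywordSearch` are all illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type row struct {
	Source    string
	Chunk     int
	Content   string
	Dismissed bool // soft-deleted rows never appear in results
}

type scored struct {
	Row  row
	Hits int
}

// stopwords is a tiny illustrative set; a real list would be much longer.
var stopwords = map[string]bool{"the": true, "a": true, "of": true, "的": true}

// tokenize splits on whitespace as a stand-in for CJK-aware segmentation,
// dropping stopwords and duplicate terms.
func tokenize(q string) []string {
	var out []string
	seen := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(q)) {
		if stopwords[t] || seen[t] {
			continue
		}
		seen[t] = true
		out = append(out, t)
	}
	return out
}

// keywordSearch counts how many query terms each live row contains
// and orders rows by that hit count, highest first.
func keywordSearch(rows []row, q string, limit int) []scored {
	terms := tokenize(q)
	var out []scored
	for _, r := range rows {
		if r.Dismissed {
			continue
		}
		text := strings.ToLower(r.Content)
		hits := 0
		for _, t := range terms {
			if strings.Contains(text, t) {
				hits++
			}
		}
		if hits > 0 {
			out = append(out, scored{r, hits})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Hits > out[j].Hits })
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	rows := []row{
		{"retry-notes.md", 2, "重試時使用指數退避", false},
		{"queue.md", 0, "失敗後重試三次", false},
		{"old.md", 1, "重試 退避", true},
	}
	for _, s := range keywordSearch(rows, "重試 退避", 10) {
		fmt.Printf("%s chunk %d: %d hits\n", s.Row.Source, s.Row.Chunk, s.Hits)
	}
}
```

The dismissed row matches both terms but is filtered out before scoring, mirroring the soft-delete guarantee.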

Formats

Bring Any Document

Text sniffing skips binaries and media automatically. Everything else is parsed, chunked, and embedded.

📄

PDF

Full text extraction, page-aware chunking

📝

DOCX · PPTX

Word and PowerPoint document bodies

📊

XLSX · CSV

Tabular sheets flattened to rows

✎

Markdown

Markdown and plain UTF-8 text

⇄

REST API

Read-only HTTP for any consumer

Interface

Read-Only API & a Tiny CLI

Four endpoints to query, four subcommands to manage. No mutation surface anywhere.

HTTP Endpoints

All under /api, bound to localhost on a random free port. limit defaults to 10, max 100.

GET /api/health

GET /api/list

GET /api/semantic?db=&q=&limit=

GET /api/keyword?db=&q=&limit=
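A small Go client shows how a consumer might build these URLs and decode the responses. The struct fields mirror the JSON shown in the examples on this page. `clampLimit` only illustrates the default-10, max-100 rule on the client side; how the server treats out-of-range values is an assumption, and the port is the one from the demo, not a fixed value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

type match struct {
	Chunk   int    `json:"chunk"`
	Content string `json:"content"`
}

type result struct {
	Source  string  `json:"source"`
	Matches []match `json:"matches"`
}

type response struct {
	Results []result `json:"results"`
}

// clampLimit applies the documented default (10) and maximum (100).
func clampLimit(n int) int {
	switch {
	case n <= 0:
		return 10
	case n > 100:
		return 100
	}
	return n
}

// searchURL builds a query URL for the "semantic" or "keyword" endpoint.
func searchURL(base, endpoint, db, q string, limit int) string {
	v := url.Values{}
	v.Set("db", db)
	v.Set("q", q)
	v.Set("limit", strconv.Itoa(clampLimit(limit)))
	return base + "/api/" + endpoint + "?" + v.Encode()
}

// decode parses a search response body.
func decode(body []byte) (response, error) {
	var r response
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	fmt.Println(searchURL("http://127.0.0.1:48213", "semantic", "notes", "retry backoff strategy", 0))

	sample := []byte(`{"results":[{"source":"resilience-patterns.pdf",
		"matches":[{"chunk":7,"content":"Exponential backoff with jitter..."}]}]}`)
	r, err := decode(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, res := range r.Results {
		for _, m := range res.Matches {
			fmt.Printf("%s chunk %d: %s\n", res.Source, m.Chunk, m.Content)
		}
	}
}
```

Because the port is random, a real consumer would take the base URL from the published endpoint file instead of hard-coding it.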

CLI Commands

Run kura with no args to start the server. Subcommands manage the database registry.

kura · start server

kura add <name>

kura list

kura remove <name>

kura edit <old> <new>

kura help

Under the Hood

Lean Stack, No Heavy Dependencies

Personal-scale data needs a linear scan and a good index — not a distributed vector cluster.

Go 1.25

Single static binary

SQLite

Source of truth

OpenAI

text-embedding-3-small · dim 512

Gin

Read-only HTTP

go-sqlkit

Read/write pools

go-pkg

Parsers · keychain

go-ego/gse

CJK segmentation

Vector

Self-implemented

Get Started

From Folder to Search in Four Steps

Install once, register a database, drop your files, and start querying.

1

Install

Bootstraps Go and builds the kura binary. macOS and Linux.

curl -fsSL https://kuradb.agenvoy.com/scripts/install.sh | bash

2

Add a Database

Creates an inbox and a ~/Kura_notes symlink.

kura add notes

3

Drop Files

Copy documents into the symlinked inbox folder.

cp ~/Documents/*.pdf ~/Kura_notes/

4

Start & Query

Server auto-loads, watches, and embeds.

kura

Drop a Folder.
Get a Searchable Knowledge Base.


Turn Your Files Into Knowledge
