API Tutorial

Three production workflows with real curl commands. Replace $BASE, $TOKEN, and $DOC_TYPE with your values.

Setup Shell variables

export BASE=http://localhost:8001    # dev mode; production uses port 8000
export DOC_TYPE=fin-report

Authentication

All /api/* routes require a JWT. Log in once and reuse the token for 8 hours.

User login — single step

TOKEN=$(curl -s -X POST $BASE/auth/user/login \
  -H "Content-Type: application/json" \
  -d '{"username":"alice","password":"MyAdminPass123!!!"}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

Regular user (with group selection)

TOKEN=$(curl -s -X POST $BASE/auth/user/login \
  -H "Content-Type: application/json" \
  -d '{"username":"bob","password":"GroupPass1!","group_id":1}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

Use on every subsequent request: -H "Authorization: Bearer $TOKEN"
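For scripts that run longer than a single shell session, the "log in once, reuse for 8 hours" rule can be sketched as a small token cache. This is an illustration only: `TokenCache`, the `login` callable, and the 60-second safety margin are assumptions, not part of the API.

```python
import time

TOKEN_LIFETIME_S = 8 * 60 * 60  # 8 hours, as stated in the Authentication section


class TokenCache:
    """Caches a JWT and logs in again only when the token is close to expiry."""

    def __init__(self, login, lifetime_s=TOKEN_LIFETIME_S, margin_s=60, clock=time.time):
        self._login = login            # hypothetical callable returning a fresh access token
        self._lifetime_s = lifetime_s
        self._margin_s = margin_s      # assumed safety margin before expiry
        self._clock = clock
        self._token = None
        self._issued_at = 0.0

    def get(self):
        """Return the cached token, refreshing it when it is missing or nearly expired."""
        age = self._clock() - self._issued_at
        if self._token is None or age >= self._lifetime_s - self._margin_s:
            self._token = self._login()
            self._issued_at = self._clock()
        return self._token

    def headers(self):
        """Build the Authorization header used on every /api/* request."""
        return {"Authorization": f"Bearer {self.get()}"}
```

In practice `login` would wrap the POST to /auth/user/login shown above and return the `access_token` field.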

API Key Management

LLM provider keys (OpenAI, Gemini, Anthropic, etc.) are stored encrypted per workspace. Admins manage them from the UI.

Keys Panel workflow (Admin page)

  1. Navigate to Admin → Keys Panel.
  2. Select a workspace (group + doc type). Keys are displayed masked (first 4 chars + dots).
  3. Click Unlock and enter your admin password (not the master password).
  4. The textarea shows decrypted keys in YAML format. Edit or paste new keys.
  5. Click Save — keys are encrypted to .keys.yaml.enc automatically.
  6. Click Lock to return to the masked view.
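The masked view in step 2 (first 4 characters plus dots) can be sketched as follows. The function name, the number of dots, and the handling of very short keys are assumptions made for illustration; the UI may differ.

```python
def mask_key(key: str, visible: int = 4, dots: int = 8) -> str:
    """Show the first `visible` characters and replace the rest with dots."""
    if len(key) <= visible:
        # Assumed behaviour: a key this short is hidden entirely.
        return "•" * len(key)
    return key[:visible] + "•" * dots
```

For example, `mask_key("sk-abcdef123456")` keeps only `sk-a` readable.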

CLI key management

# Encrypt plaintext .keys.yaml → .keys.yaml.enc
python -m app.backend.security.keys encrypt \
  --workspace reedcapital --doc-type fin-report --remove-plaintext

# Decrypt and verify
python -m app.backend.security.keys decrypt \
  --workspace reedcapital --doc-type fin-report

# Check status
python -m app.backend.security.keys status

Keys are encrypted with Fernet, using a key derived via PBKDF2 from the MASTER_PASSWORD in auth.db. A plaintext .keys.yaml is treated as a read-only fallback and is auto-migrated to the encrypted file on first save. Keys are never stored in .env.
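As a rough illustration of the PBKDF2 step, the sketch below derives a Fernet-compatible key (32 bytes, URL-safe base64) from a password using only the standard library. The hash function, salt handling, and iteration count are assumptions; the application's actual parameters are not documented here.

```python
import base64
import hashlib


def derive_fernet_key(master_password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 and encode it as Fernet expects."""
    raw = hashlib.pbkdf2_hmac(
        "sha256",                        # assumed hash; not confirmed by the source
        master_password.encode("utf-8"),
        salt,                            # assumed to be stored alongside the ciphertext
        iterations,                      # assumed work factor
        dklen=32,
    )
    # Fernet keys are 32 raw bytes in URL-safe base64.
    return base64.urlsafe_b64encode(raw)
```

The resulting bytes could be handed to a Fernet implementation such as the one in the `cryptography` package.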

01 Workflow 1 — Daily Ingest

Upload PDF reports, convert them to Markdown, and archive them to the permanent tree.

Step 1

Ingest — Fast (synchronous)

Returns immediately with Markdown output. VisionInterceptor auto-detects charts.

# Single file
curl -X POST "$BASE/api/$DOC_TYPE/ingest-fast" \
  -H "Authorization: Bearer $TOKEN" \
  -F "files=@blackrock_q1_2025.pdf"

# Multiple files
curl -X POST "$BASE/api/$DOC_TYPE/ingest-fast" \
  -H "Authorization: Bearer $TOKEN" \
  -F "files=@blackrock_q1.pdf" \
  -F "files=@vanguard_q1.pdf"

Response: {"status": "ok", "filename": "...", "stats": {"pages": 12, "images": 3, "elapsed_s": 8.4}}

Step 1b

Ingest — Standard (background, full DOCINDEX tree)

Use for complex documents needing deep section-level structure. Runs asynchronously.

curl -X POST "$BASE/api/$DOC_TYPE/ingest" \
  -H "Authorization: Bearer $TOKEN" \
  -F "files=@blackrock_q1.pdf" \
  -F "files=@vanguard_q1.pdf" \
  -F "files=@pimco_q1.pdf"

Returns immediately: {"message": "Accepted for processing", "results": [...]}. Processing continues in background (30–120 s/file).

Step 1c

Ingest existing file with page range

Use this when the file is already in temp/01-input/. It is useful for testing specific pages.

curl -X POST "$BASE/api/$DOC_TYPE/ingest-fast-file" \
  -H "Authorization: Bearer $TOKEN" \
  -G \
  --data-urlencode "filename=blackrock_q1_2025.pdf" \
  --data-urlencode "pages=1-4,8,10-12"
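The `pages` value combines single pages and inclusive ranges. The sketch below shows one plausible reading of that syntax; how the server actually parses it (1-based numbering, duplicates, ordering) is an assumption.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec like '1-4,8,10-12' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are treated as inclusive
        else:
            pages.add(int(part))
    return sorted(pages)
```

Under this reading, `1-4,8,10-12` selects eight pages.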

Step 2

Archive — move to permanent tree

The LLM infers dimension values (entity, date, strategy…) from each Markdown file, moves the files to their permanent path, and updates _index/index.json.

curl -X POST "$BASE/api/$DOC_TYPE/archive" \
  -H "Authorization: Bearer $TOKEN"

Response: {"archived_count": 3, "results": [...]}

Step 3

Clean up temp folder

curl -X DELETE "$BASE/api/$DOC_TYPE/temp/cleanup" \
  -H "Authorization: Bearer $TOKEN"

Permanent tree after archive: workspaces/{group}/{doc_type}/{entity}_{fund}/{quarter}/{stem}.pdf + .md + .json
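To make the path template concrete, here is a small sketch that expands it for one document. The function name and the example `stem` are hypothetical; the directory layout is taken from the template above.

```python
from pathlib import PurePosixPath


def archive_paths(group, doc_type, entity, fund, quarter, stem):
    """Return the three archived files (.pdf, .md, .json) for one document."""
    folder = PurePosixPath("workspaces") / group / doc_type / f"{entity}_{fund}" / quarter
    return [str(folder / f"{stem}{ext}") for ext in (".pdf", ".md", ".json")]


# Hypothetical values for illustration only.
for path in archive_paths("reedcapital", "fin-report", "amundi", "initiative-impact", "2024q4", "report"):
    print(path)
```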

02 Workflow 2 — Mapping Review

Before extraction, ensure every entity has a specific mapping template.

Step 1

Check which entities are missing a mapping

curl "$BASE/api/$DOC_TYPE/mappings/status" \
  -H "Authorization: Bearer $TOKEN"

Response: {"missing": ["amundi_initiative-impact", "pimco_total-return"]}

Step 2

Generate mapping via LLM

curl -X POST "$BASE/api/$DOC_TYPE/mappings/generate" \
  -H "Authorization: Bearer $TOKEN"

The LLM reads the archived Markdown and generates a specific mapping template, which is saved to _mapping_templates/.

Step 3

Verify all entities are mapped

curl "$BASE/api/$DOC_TYPE/mappings/status" \
  -H "Authorization: Bearer $TOKEN"
# → { "missing": [] }
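The three steps of this workflow amount to a check, generate, re-check loop. A minimal sketch follows, assuming `get_status` and `generate` are hypothetical callables that wrap the mappings/status and mappings/generate endpoints; the retry limit is also an assumption.

```python
def ensure_mappings(get_status, generate, max_rounds=3):
    """Generate mappings until none are missing; return whatever is still missing."""
    for _ in range(max_rounds):
        missing = get_status().get("missing", [])
        if not missing:
            return []
        generate()  # asks the LLM to create templates for the missing entities
    return get_status().get("missing", [])
```

An empty return value corresponds to the `{"missing": []}` response in Step 3.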

03 Workflow 3 — Extraction & Query

Extract structured data from archived documents, then query and chat.

Step 1

Extract — single document

curl -X POST "$BASE/api/$DOC_TYPE/extract" \
  -H "Authorization: Bearer $TOKEN" \
  -G \
  --data-urlencode "path_level2=amundi_initiative-impact" \
  --data-urlencode "path_level3=2024q4" \
  --data-urlencode "schema_suffix=prtf"

Outputs: extract-singles-prtf.md and extract-tables-prtf.md alongside the PDF.

Step 1b

Extract — all entities for a quarter

curl -X POST "$BASE/api/$DOC_TYPE/extract/batch" \
  -H "Authorization: Bearer $TOKEN" \
  -G \
  --data-urlencode "path_level3=2024q4" \
  --data-urlencode "schema_suffix=prtf"

# Override output format (csv | md | kv-md)
curl -X POST "$BASE/api/$DOC_TYPE/extract/batch" \
  -H "Authorization: Bearer $TOKEN" \
  -G \
  --data-urlencode "path_level3=2024q4" \
  --data-urlencode "schema_suffix=prtf" \
  --data-urlencode "table_format=csv"

Step 2

Query — select documents

# Snapshot — one doc per entity at a single date
curl -X POST "$BASE/api/$DOC_TYPE/select/snapshot?date=2024q4" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Series — all docs for one entity across a date range
curl -X POST "$BASE/api/$DOC_TYPE/select/series?start_date=2024q1&end_date=2024q4" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config_params": {"asset-manager": "amundi"}}'

# Trend — paired T vs T-1 docs (entities present at both dates only)
curl -X POST "$BASE/api/$DOC_TYPE/select/trend?date=2024q4&previous_date=2024q3" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Filtered snapshot by dimension
curl -X POST "$BASE/api/$DOC_TYPE/select/snapshot?date=2024q4" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config_params": {"asset-manager": "amundi"}}'

Response includes matched_files — pass these to chat. Trend responses also include entity_count (number of matched pairs).
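The three select modes differ only in their query parameters. The sketch below builds the URLs used in the curl examples; treating the listed parameters as required is an assumption, and `select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Query parameters used by each mode in the curl examples.
# Treating them as mandatory is an assumption.
EXPECTED_PARAMS = {
    "snapshot": ("date",),
    "series": ("start_date", "end_date"),
    "trend": ("date", "previous_date"),
}


def select_url(base, doc_type, mode, **params):
    """Build the select URL for one mode, checking the expected parameters."""
    if mode not in EXPECTED_PARAMS:
        raise ValueError(f"unknown select mode: {mode!r}")
    missing = [p for p in EXPECTED_PARAMS[mode] if p not in params]
    if missing:
        raise ValueError(f"{mode} is missing parameters: {missing}")
    return f"{base}/api/{doc_type}/select/{mode}?{urlencode(params)}"
```

Dimension filters such as `{"config_params": {"asset-manager": "amundi"}}` still go in the JSON body, as in the examples above.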

Step 3

Chat — grounded Q&A (single-pass)

Use this when the selection's total token count is at or below the threshold (≤ 100 000 tokens). All documents are sent in one call.

# 1. Select documents
RESULT=$(curl -s -X POST "$BASE/api/$DOC_TYPE/select/snapshot?date=2024q4" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}')

# Quote "$RESULT" so the shell does not split or glob-expand the JSON
MATCHED=$(echo "$RESULT" | python3 -c \
  "import sys,json; print(json.dumps(json.load(sys.stdin)['matched_files']))")
TOKENS=$(echo "$RESULT" | python3 -c \
  "import sys,json; print(json.load(sys.stdin).get('total_tokens', 0))")
echo "Total tokens: $TOKENS"

# 2. Chat grounded in those documents
curl -X POST "$BASE/api/$DOC_TYPE/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"matched_files\": $MATCHED, \"prompt\": \"What is the total AUM as of Q4 2024?\"}"
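Interpolating JSON into a shell string breaks as soon as the prompt contains quotes. The sketch below builds the chat body with `json.dumps` and applies the 100 000-token threshold from this step to pick a route. The function names are hypothetical, and the shape of the `matched_files` entries is not specified by the source.

```python
import json

SINGLE_PASS_LIMIT = 100_000  # threshold stated in this step


def choose_route(total_tokens: int) -> str:
    """Return 'chat' for single-pass Q&A, 'transmute' for a larger corpus."""
    return "chat" if total_tokens <= SINGLE_PASS_LIMIT else "transmute"


def chat_body(select_response: dict, prompt: str) -> str:
    """Serialise the /chat request body; json.dumps handles all quoting."""
    return json.dumps({
        "matched_files": select_response["matched_files"],
        "prompt": prompt,
    })
```

The returned string can be written to a file and sent with `curl -d @body.json`.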

Step 3b

Transmuted Query — large corpus (parallel per-doc)

Used when total_tokens > 100 000. The question is rewritten as a single-doc question, executed in parallel across all selected documents, then reduced to a final answer. Three steps:

Step 3b-1 — Transmute the question

curl -X POST "$BASE/query/transmute" \
  -H "Content-Type: application/json" \
  -d '{
    "group": "reedcapital",
    "doc_type": "fin-report",
    "query_type": "snapshot",
    "user_query": "Which fund had the highest Sharpe ratio in Q4 2024?",
    "target_date": "2024q4"
  }'
# → {"single_doc_question": "What is the Sharpe ratio reported in this document? Return as a single numeric value.",
#    "return_type": "scalar", "reduce_operation": "max", "validated": false, ...}
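The `reduce_operation` field suggests a map-reduce shape: ask the single-doc question of every document in parallel, then reduce the answers. The sketch below illustrates that idea only; `ask_one` is a hypothetical callable, `min` is an assumed additional operation, and the server's real implementation is not shown in this tutorial.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce(docs, ask_one, reduce_operation="max", max_workers=4):
    """Run ask_one on each document in parallel, then reduce the scalar answers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(ask_one, docs))  # results keep the order of docs
    # Drop documents that produced no value.
    pairs = [(v, d) for v, d in zip(values, docs) if v is not None]
    if not pairs:
        raise ValueError("no document returned a value")
    if reduce_operation == "max":
        value, doc = max(pairs, key=lambda p: p[0])
    elif reduce_operation == "min":  # assumed; only "max" appears in the example
        value, doc = min(pairs, key=lambda p: p[0])
    else:
        raise ValueError(f"unsupported reduce operation: {reduce_operation!r}")
    return {"doc": doc, "value": value}
```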

Step 3b-2 — Confirm (set validated=true)

Review the single_doc_question in the response. In the UI this is a confirmation button; via API set validated: true in the execute payload.
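A sketch of the confirmation step: copy the reviewed fields from the transmute response into the execute payload and set `validated` to true. Field names come from the two curl examples in this section; `build_execute_payload` is a hypothetical helper, and which fields the server requires is an assumption.

```python
def build_execute_payload(transmute_response, group, doc_type, matched_files,
                          user_query, query_type="snapshot"):
    """Assemble the /query/execute body from a reviewed transmute response."""
    metadata = {
        "query_type": query_type,
        "single_doc_question": transmute_response["single_doc_question"],
        "reduce_operation": transmute_response["reduce_operation"],
        "validated": True,  # the API equivalent of clicking Confirm in the UI
    }
    return {
        "group": group,
        "doc_type": doc_type,
        "matched_files": matched_files,
        "user_query": user_query,
        "doc_format": "minified_json",
        "metadata": metadata,
    }
```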

Step 3b-3 — Execute (SSE stream)

curl -X POST "$BASE/query/execute" \
  -H "Content-Type: application/json" \
  -d "{
    \"group\": \"reedcapital\",
    \"doc_type\": \"fin-report\",
    \"matched_files\": $MATCHED,
    \"user_query\": \"Which fund had the highest Sharpe ratio in Q4 2024?\",
    \"doc_format\": \"minified_json\",
    \"metadata\": { \"query_type\": \"snapshot\", \"single_doc_question\": \"...\",
                    \"reduce_operation\": \"max\", \"validated\": true }
  }"
# Streams SSE events: query.routing → query.doc.done (×N) → query.reduce.start → query.complete

The UI handles this automatically — select documents, enable Transmuted, click Transmute, review the single-doc question, click Confirm & Execute.
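When consuming the stream from a script, the events need to be split out of the text/event-stream format. The parser below is a minimal sketch that assumes the server sends the names above in standard `event:` fields followed by `data:` lines; the payload contents are not documented here.

```python
def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event (lenient: data may be empty).
            if event is not None or data:
                yield event or "message", "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```

A consumer can stop reading once it sees `query.complete`. With curl, the `-N` flag disables output buffering so events appear as they arrive.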

Ref Mapping Fallback Logic

Priority  Path                                                Condition
1         _mapping_templates/specific-mapping-{suffix}.json  Always preferred if found
2         _extract-templates/extract-schema-{suffix}.yaml    Generic workspace schema fallback
3         (none)                                              Skip; entity logged as error

Run Workflow 2 (mapping review) before a quarterly extraction to ensure all new entities have specific mappings.
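The priority order above can be sketched as a lookup that returns the first template found. The `root` directory that holds both template folders is an assumption, as is the function name; a `None` result corresponds to priority 3 (skip and log an error).

```python
from pathlib import Path
from typing import Optional


def resolve_schema(root: Path, suffix: str) -> Optional[Path]:
    """Return the specific mapping if present, else the generic schema, else None."""
    candidates = [
        root / "_mapping_templates" / f"specific-mapping-{suffix}.json",  # priority 1
        root / "_extract-templates" / f"extract-schema-{suffix}.yaml",    # priority 2
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # priority 3: caller skips the entity and logs an error
```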

AI Review of Text Fields (UI feature)

Any editable text field in the application supports AI-assisted review via right-click. This applies to query inputs, refinement boxes, template editors, and any contenteditable output area.

1 Right-click to review

  1. Click inside any editable text area (query input, refinement, template, output in Edit mode).
  2. Optionally type or paste your draft text.
  3. Right-click inside the field — a small context menu appears.
  4. Click Review by AI. The model rewrites the text to be clearer and more precise.
  5. If the result is unsatisfactory, right-click again and choose Undo AI Review to restore the original.

The AI reviewer sharpens phrasing and resolves ambiguity — it does not add or invent information. If the input is already clear, it is returned unchanged.

2 Adding instructions (optional)

To guide the rewrite, prepend a one-line instruction to your text before right-clicking:

[Instruction: make this more formal and concise]
Which fund had the highest sharpe ratio? Show a table sorted by fund name.

The reviewer strips the instruction prefix after applying it. You can also append instructions at the end:

compare bond allocation across all funds
// Note: emphasise YoY delta, not absolute values
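To make the two instruction forms concrete, the sketch below separates a leading `[Instruction: ...]` line and a trailing `// Note: ...` line from the text to be reviewed. It only illustrates the syntax shown above; the reviewer's actual parsing is not documented, and `split_instruction` is a hypothetical name.

```python
import re

PREFIX = re.compile(r"^\[Instruction:\s*(.*?)\]\s*\n?", re.IGNORECASE)
SUFFIX = re.compile(r"\n//\s*Note:\s*(.*)\s*$", re.IGNORECASE)


def split_instruction(text):
    """Return (body, instructions) with any prefix or suffix instruction removed."""
    instructions = []
    match = PREFIX.match(text)
    if match:
        instructions.append(match.group(1).strip())
        text = text[match.end():]
    match = SUFFIX.search(text)
    if match:
        instructions.append(match.group(1).strip())
        text = text[:match.start()]
    return text.strip(), instructions
```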

3 Output editing and review

After a query runs, the output panel shows a ✎ Edit button. Click it to make the output editable, then right-click → Review by AI to improve the language or structure of the response — without re-fetching the source documents.

Use Edit → Review by AI on the output when you only want to rephrase the result. Use the Refinement box when you need a different answer (it re-reads the documents).

Click Cancel in the output panel to discard all edits and revert to the original response.