Installation

Install the full pdf-autofillr SDK (v1.1.1) with all four modules — chatbot, doc_upload, mapper, and RAG — or install only the modules you need. A single pdf-autofillr setup command scaffolds your entire project structure, config files, and environment template.

Python 3.10+ | pip install "pdf-autofillr[all]" | pdf-autofillr setup | ragpdf init-vectors | All 4 modules | chatbot-cli / doc-upload-cli | v1.1.1

What this page covers

Python version requirements before installing

pip install with extras — full or partial stack

Git clone — when you need to edit source code

pdf-autofillr setup — scaffolds all config files and folders

Configuring .env for all four modules

Configuring mapper_config.ini for the mapper module

ragpdf init-vectors — initialise the RAG vector database

pdf-autofillr status — verify every module is configured

Running chatbot-cli, doc-upload-cli, pdf-mapper, and ragpdf

Tips for safe and effective usage

1. Prerequisites

Before installing, confirm your environment meets the minimum requirements. The SDK requires Python 3.10 or later because it uses structural pattern matching (match statements) and the X | Y type union syntax, both introduced in that release. A virtual environment is strongly recommended to avoid dependency conflicts with other Python projects.

terminal

python --version
# Expected: Python 3.10.x or higher
# e.g. Python 3.12.0

pip --version
# Expected: pip 22.0 or higher
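The same version check can be done from Python, for example at the top of a project bootstrap script. This is a minimal sketch; the helper name meets_minimum is hypothetical and is not part of the SDK.

```python
import sys

# Minimum interpreter version stated on this page.
MINIMUM_PYTHON = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def describe(version_info=sys.version_info):
    """Build a one-line, human-readable verdict for the given interpreter."""
    label = f"Python {version_info[0]}.{version_info[1]}"
    if meets_minimum(version_info):
        return f"{label} is supported"
    return f"{label} is too old; 3.10 or later is required"
```

Calling print(describe()) in a setup script gives an early, readable failure instead of a syntax error deep inside an import.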

Requirement	Minimum version	Notes
Python	3.10	Structural pattern matching required. 3.12 recommended.
pip	22.0	Ships with Python 3.10+. Run pip install --upgrade pip if older.
OpenAI API	Any	You must have an active OpenAI account and API key (sk-…).
Redis	Optional	Required only when rate limiting with Redis backend.
Docker	Optional	Only if deploying via the provided Dockerfile.

Use a virtual environment

Run python -m venv .venv && source .venv/bin/activate (or .venv\Scripts\activate on Windows) before installing to keep the SDK's dependencies isolated from your system Python.

OpenAI key is always required

The SDK calls the OpenAI API for every message extraction. If OPENAI_API_KEY is not set, the server will raise an EnvironmentError on startup and refuse to start.
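To catch a missing key before starting a server, a fail-fast check along these lines can run in your own launcher script. It is a sketch of the behaviour described above, not the SDK's actual startup code; require_env is a hypothetical helper.

```python
import os


def require_env(name, environ=None):
    """Return the value of `name`, or raise EnvironmentError if it is missing or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise EnvironmentError(f"{name} is not set; add it to your .env or shell")
    return value
```

For example, require_env("OPENAI_API_KEY") returns the key when it is exported and raises otherwise. In Python 3, EnvironmentError is an alias of OSError, so either name can be caught.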

2. Option A - pip install (Recommended)

This is the recommended path for most users. The package is published to PyPI with named extras so you install only the modules you need. Use pdf-autofillr[all] to get everything, or choose a specific combination. All CLI entry points and server commands are registered automatically by pip and are available as soon as the install completes.

terminal

# Full stack — all four modules (recommended for new projects)
pip install "pdf-autofillr[all]"

# Or choose only what you need:
pip install "pdf-autofillr[chatbot]"          # chatbot + mapper
pip install "pdf-autofillr[doc-upload]"       # doc_upload + mapper
pip install "pdf-autofillr[chatbot,rag]"      # chatbot + mapper + RAG
pip install "pdf-autofillr[doc-upload,rag]"   # doc_upload + mapper + RAG

# Verify installed modules and configuration state
pdf-autofillr status

Command	Module	Description
chatbot-cli	chatbot	Run an interactive terminal chatbot session
chatbot-server	chatbot	FastAPI REST API server — port 8000
doc-upload-cli	doc_upload	Extract fields from an uploaded document
doc-upload-server	doc_upload	FastAPI REST API server — port 8001
pdf-mapper	mapper	CLI: extract, map, embed, fill a PDF
pdf-mapper-server	mapper	FastAPI REST API server — port 8002
ragpdf	rag	CLI: system info, init vectors, metrics, feedback
ragpdf-server	rag	FastAPI REST API server — port 8003
pdf-autofillr	umbrella	setup, status — project scaffolding and health checks
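Which of these commands are present depends on the extras you installed. The sketch below uses the standard library's shutil.which to list the entry points from the table that are on your PATH; the helper names are hypothetical.

```python
import shutil

# Command names taken from the table above.
ENTRY_POINTS = [
    "chatbot-cli", "chatbot-server",
    "doc-upload-cli", "doc-upload-server",
    "pdf-mapper", "pdf-mapper-server",
    "ragpdf", "ragpdf-server",
    "pdf-autofillr",
]


def find_entry_points(names=ENTRY_POINTS, which=shutil.which):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def report(found):
    """Format one 'command: path' line per entry point."""
    return [f"{name}: {path or 'not found'}" for name, path in found.items()]
```

Running print("\n".join(report(find_entry_points()))) inside the activated virtual environment shows at a glance which modules pip registered.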

All runtime dependencies are bundled

Each extras group pulls in its own FastAPI, uvicorn, LiteLLM, and storage dependencies. You do not need to install separate requirements files when using PyPI.
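If you want to confirm what pip actually pulled in, importlib.metadata can report installed distribution versions. This sketch assumes the PyPI distribution names pdf-autofillr, fastapi, uvicorn, and litellm; the exact dependency set of each extra is not listed on this page.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the packages mentioned above.
DISTRIBUTIONS = ["pdf-autofillr", "fastapi", "uvicorn", "litellm"]


def installed_versions(names=DISTRIBUTIONS):
    """Return {distribution: version string, or None when it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

A None value for pdf-autofillr usually means the install ran in a different environment than the one you are checking.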

3. Option B - Clone the Repository

Use this path when you need to customise conversation handlers, add new states, edit the extraction prompt, or modify any module's source code. After cloning, install requirements-full.txt to get all four modules and their dependencies.

terminal

git clone https://github.com/yourorg/pdf-autofillr.git
cd pdf-autofillr

# Install everything from source
pip install -r requirements-full.txt

# Add src/ to Python path so imports work
export PYTHONPATH="$(pwd)/src"

Set PYTHONPATH when running from the repo

The SDK's source lives under src/. If you skip the export PYTHONPATH=$(pwd)/src step, Python will not be able to find any module and every import will fail with a ModuleNotFoundError.
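When you cannot export PYTHONPATH (for example in a notebook or an IDE run configuration), the same effect can be achieved at runtime by prepending src/ to sys.path. This is a sketch; add_src_to_path is a hypothetical helper and repo_root is wherever you cloned the repository.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root="."):
    """Prepend <repo_root>/src to sys.path once and return the resolved path."""
    root = Path(repo_root).resolve()
    src = root / "src"
    if not src.is_dir():
        raise FileNotFoundError(f"No src/ directory under {root}")
    if str(src) not in sys.path:
        sys.path.insert(0, str(src))
    return src
```

Call add_src_to_path() before the first SDK import. Unlike the environment variable, this only affects the current Python process.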

4. Project Setup & Config Files

After installation, run pdf-autofillr setup once. It detects which modules are installed and automatically creates the complete folder tree, all required config files, and a .env.example template populated with every environment variable your installed modules need.

terminal

# Run once after any install — detects installed modules
# and creates all folders, configs, and .env.example
pdf-autofillr setup

# What it creates:
# .env.example                ← template with all env vars
# configs/
#   form_keys.json            ← field schema (edit for your PDF)
#   mandatory.json
#   field_questions.json
#   mapper_config.ini         ← mapper LLM, chunking, storage, RAG toggle
# data/
#   input/                    ← drop your blank PDF here
#   chatbot/
#   doc_upload/
#   mapper/
#   rag/                      ← pre-loaded with 137 real vectors

# Verify what's installed and configured
pdf-autofillr status

Generated configs/ directory

form_keys.json: Master field schema listing all field names and value types. Edit to match your PDF.

mandatory.json: Required fields per investor type

field_questions.json: Human-readable prompts per field (optional)

mapper_config.ini: Mapper LLM model, chunking strategy, cloud storage, RAG toggle

.env.example: All environment variables with sensible defaults. Copy to .env and fill in secrets.

Edit form_keys.json to match your PDF

The sample config uses generic field names. Before going to production, open configs/form_keys.json and replace the field names with the actual field IDs from your blank PDF form.
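After running setup, you can confirm the scaffold matches the tree listed above with a short script. The file and folder names come from that listing (for a full install); missing_scaffold is a hypothetical helper, not an SDK function.

```python
from pathlib import Path

# Files and folders that the setup listing above says are created.
EXPECTED_FILES = [
    ".env.example",
    "configs/form_keys.json",
    "configs/mandatory.json",
    "configs/field_questions.json",
    "configs/mapper_config.ini",
]
EXPECTED_DIRS = [
    "data/input",
    "data/chatbot",
    "data/doc_upload",
    "data/mapper",
    "data/rag",
]


def missing_scaffold(root="."):
    """Return the expected files and folders that do not exist under `root`."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means everything is in place. With a partial install, some module folders may legitimately be absent.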

5. Configure .env

Copy .env.example to .env and fill in your secrets. The file is split into sections — one per module. The only universally required variable is your LLM API key. Every other value has a sensible default that works for local development out of the box.

.env

# Step 1 - copy the template (run this in your terminal, not inside .env):
#   cp .env.example .env

# Step 2 - minimum required for a working full-stack setup:

# ── LLM (pick one) ─────────────────────────────────────
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...

# ── Chatbot ────────────────────────────────────────────
CHATBOT_LLM_MODEL=openai/gpt-4o-mini
chatbot_PDF_FILLER=mapper
chatbot_PDF_PATH=./data/input/blank_form.pdf

# ── Doc Upload ─────────────────────────────────────────
DOC_UPLOAD_LLM_MODEL=openai/gpt-4.1-mini
DOC_UPLOAD_PDF_FILLER=mapper
DOC_UPLOAD_PDF_PATH=./data/input/blank_form.pdf

# ── Mapper (inprocess by default - no URL needed) ──────
# MAPPER_API_URL=        ← leave empty for inprocess mode

# ── RAG ────────────────────────────────────────────────
RAG_ENABLED=true
RAG_MODE=inprocess
RAGPDF_EMBEDDING_BACKEND=openai
RAGPDF_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
RAGPDF_VECTOR_STORE=local
RAGPDF_DATA_PATH=./data/rag
RAGPDF_CORRECTOR_BACKEND=noop

# ── System ─────────────────────────────────────────────
LITELLM_LOG=ERROR
PYTHONUTF8=1

Set your LLM API key

Add OPENAI_API_KEY (or ANTHROPIC_API_KEY). This is the only key required in every deployment mode — all four modules use it for LLM calls.

Point chatbot and doc_upload at your PDF

Set chatbot_PDF_PATH and DOC_UPLOAD_PDF_PATH to the path of your blank PDF form (e.g. ./data/input/blank_form.pdf). Set chatbot_PDF_FILLER=mapper and DOC_UPLOAD_PDF_FILLER=mapper to enable filling.

Configure RAG (optional but recommended)

Set RAG_ENABLED=true and RAGPDF_EMBEDDING_BACKEND=openai. The RAG module ships with 137 pre-loaded vectors — you get predictive field filling from day one without any training data.

Add system settings to suppress noise

Set LITELLM_LOG=ERROR to suppress LiteLLM verbose logs. On Windows, also set PYTHONUTF8=1 to prevent encoding errors in PDF text extraction.

Mapper runs inprocess by default

Leave MAPPER_API_URL empty (or unset) and the mapper runs inside the same process — no separate server needed. Set it to http://localhost:8002 only when you want to run the mapper as a standalone HTTP service.
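To sanity-check a .env file without starting any server, a small parser like the one below is enough. It is a deliberately simplified sketch (no variable expansion, no multi-line values, no inline comments); how the SDK itself loads .env is not stated on this page, and dedicated loaders such as python-dotenv handle more cases.

```python
def parse_env(text):
    """Parse KEY=VALUE lines; skip blank lines, comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_llm_key(values):
    """True when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY has a value."""
    return not (values.get("OPENAI_API_KEY") or values.get("ANTHROPIC_API_KEY"))


def mapper_mode(values):
    """Report 'inprocess' when MAPPER_API_URL is empty or unset, else 'http'."""
    return "http" if values.get("MAPPER_API_URL") else "inprocess"
```

For example, mapper_mode(parse_env(open(".env").read())) tells you whether the mapper will run in the same process or be called over HTTP, following the rule described above.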

6. Configure mapper_config.ini

The configs/mapper_config.ini file controls the mapper module's LLM model, chunking strategy for dense PDF forms, cloud storage backend, and RAG integration. Review the key settings below — the defaults work for local development but you will want to tune them for production.

configs/mapper_config.ini

# configs/mapper_config.ini — key settings to review:

[general]
source_type = local          # change to aws / azure / gcp for cloud storage
pdf_cache_enabled = true     # skip re-processing if PDF hasn't changed

[mapping]
llm_model = gpt-4o           # your preferred LLM for field mapping
chunking_strategy = page     # page (default) | window (better for dense forms)

[rag]
enabled = true               # must also set RAG_ENABLED=true in .env
mode = inprocess             # inprocess | http

Setting	Options	When to change
source_type	local | aws | azure | gcp	Change to aws/azure/gcp for cloud PDF storage
pdf_cache_enabled	true | false	Keep true; it caches field extraction to skip re-processing unchanged PDFs
llm_model	any LiteLLM model string	Use a more powerful model for complex, multi-column forms
chunking_strategy	page | window	Switch to window for dense forms where fields span page boundaries
rag.enabled	true | false	Must match RAG_ENABLED in .env
rag.mode	inprocess | http	Use http only when running ragpdf-server separately
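The settings above can be inspected with Python's configparser, for instance to check that rag.enabled agrees with RAG_ENABLED. This is a sketch under assumptions: the SDK's own parser is not documented here, and the sample text mirrors the excerpt on this page. Note that configparser ignores inline # comments only when inline_comment_prefixes is set; by default the comment would become part of the value.

```python
import configparser

# Sample mirroring the excerpt shown on this page.
SAMPLE = """
[general]
source_type = local          # change to aws / azure / gcp
pdf_cache_enabled = true

[mapping]
llm_model = gpt-4o
chunking_strategy = page     # page | window

[rag]
enabled = true               # must also set RAG_ENABLED=true in .env
mode = inprocess
"""


def load_mapper_config(text):
    """Parse INI text, stripping trailing '# ...' comments from values."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def rag_flags_agree(parser, env):
    """True when [rag] enabled in the INI matches RAG_ENABLED in `env`."""
    ini_enabled = parser.getboolean("rag", "enabled", fallback=False)
    env_enabled = env.get("RAG_ENABLED", "false").strip().lower() == "true"
    return ini_enabled == env_enabled
```

To check a real file, pass Path("configs/mapper_config.ini").read_text() to load_mapper_config and os.environ (or a parsed .env) to rag_flags_agree.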

Windows PowerShell users

On Windows, use PowerShell's Copy-Item to copy the generated config if needed: Copy-Item configs\mapper_config.ini config.ini. On macOS/Linux use cp configs/mapper_config.ini config.ini.

7. Initialize RAG Vectors (if RAG enabled)

If you set RAG_ENABLED=true in your .env, run ragpdf init-vectors once before your first fill. This loads the 137 pre-built investor field vectors into the local vector database. You only need to repeat this if you want to switch embedding backends or force a full rebuild.

ragpdf init-vectors: Load bundled vectors into the local database (first-time setup)

ragpdf init-vectors --source data/rag/vectors/source/vector_source.json --force: Force rebuild from a custom source file (advanced)

ragpdf init-vectors --backend sentence_transformer --force: Rebuild using a different embedding backend (changes RAGPDF_EMBEDDING_BACKEND)

Run again after adding new documents to the source

If you edit data/rag/vectors/source/vector_source.json to add new field definitions, re-run ragpdf init-vectors --force to rebuild the database with the updated source. The --force flag is required to overwrite an existing database.
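If you trigger rebuilds from a script, it helps to assemble the command as an argument list rather than a shell string. The sketch below only uses the flags shown above (--source, --backend, --force); whether --source and --backend can be combined in one call is not confirmed here, and init_vectors_command is a hypothetical helper.

```python
import shlex


def init_vectors_command(source=None, backend=None, force=False):
    """Build the argv list for `ragpdf init-vectors` from optional settings."""
    cmd = ["ragpdf", "init-vectors"]
    if source:
        cmd += ["--source", str(source)]
    if backend:
        cmd += ["--backend", backend]
    if force:
        cmd.append("--force")
    return cmd


def as_shell(cmd):
    """Render the argv list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Pass the resulting list to subprocess.run(cmd, check=True) to execute it and raise on a non-zero exit status.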

8. Verify Setup

Before running any module, confirm every installed module is correctly configured. pdf-autofillr status checks all required environment variables and reports any that are missing. ragpdf system-info shows the vector database state and embedding configuration.

pdf-autofillr status: Shows each installed module with a ✓ configured or ✗ misconfigured status

ragpdf system-info: Shows RAG vector database status, embedding backend, and vector count

Fix all misconfigured modules before running

If pdf-autofillr status shows any module as misconfigured, add or correct the variable it reports as missing in your .env, then re-run the command. Running a misconfigured module will raise an EnvironmentError on startup.

9. Drop Your PDF and Run

Copy your blank PDF form to data/input/blank_form.pdf, then use the CLI commands below to run any module. Each module can also run as an independent HTTP server — see the Advanced page for the full server setup.

terminal

# Drop your blank PDF into data/input/
# data/input/blank_form.pdf

# Run chatbot (interactive session)
chatbot-cli --pdf-path data/input/blank_form.pdf --report

# Run doc upload (extract from a document)
doc-upload-cli --document data/input/Avery.pdf --schema configs/form_keys.json --report

# Run mapper directly (extract + map + embed + fill)
pdf-mapper run-all --pdf data/input/blank_form.pdf --data collected_data.json

# Check RAG system info
ragpdf system-info

# View global prediction metrics
ragpdf metrics --type global

What each CLI command does

chatbot-cli --pdf-path ... --report: Runs an interactive chatbot session and prints a fill report at the end

doc-upload-cli --document ... --schema ...: Extracts fields from a document and maps them to your form schema

pdf-mapper run-all --pdf ... --data ...: Runs the full mapper pipeline: extract → map → embed → fill

ragpdf system-info: Shows RAG vector database status and embedding configuration

Interactive Swagger UI available for every server

Each module's server exposes a Swagger UI at /docs. Visit http://localhost:8000/docs (chatbot), http://localhost:8001/docs (doc_upload), http://localhost:8002/docs (mapper), or http://localhost:8003/docs (RAG) to try every endpoint interactively without writing any code.
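To check quickly which servers are up, you can build the four Swagger URLs from the port table and probe them with the standard library. This is a sketch assuming the default ports listed on this page and servers running on the same host; the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Default ports from the command table on this page.
SERVER_PORTS = {"chatbot": 8000, "doc_upload": 8001, "mapper": 8002, "rag": 8003}


def docs_urls(host="localhost", ports=SERVER_PORTS):
    """Return {module: Swagger UI URL} for each server."""
    return {module: f"http://{host}:{port}/docs" for module, port in ports.items()}


def is_reachable(url, timeout=2.0):
    """True when a GET on `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, {m: is_reachable(u) for m, u in docs_urls().items()} gives a per-module up/down map.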

10. Tips

Match paths in .env to your actual folder structure

If you move data/input/ or rename your blank PDF, update chatbot_PDF_PATH, DOC_UPLOAD_PDF_PATH, and RAGPDF_DATA_PATH in .env accordingly. Mismatched paths are the most common cause of startup errors.
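A quick way to catch mismatched paths is to resolve each path variable and check that it exists. The variable names are the ones listed above; missing_paths is a hypothetical helper, and relative paths are assumed to be resolved against the project root.

```python
from pathlib import Path

# Path-valued variables named in the tip above.
PATH_VARS = ["chatbot_PDF_PATH", "DOC_UPLOAD_PDF_PATH", "RAGPDF_DATA_PATH"]


def missing_paths(env, base_dir="."):
    """Return {variable: problem} for path variables that are unset or do not exist."""
    base = Path(base_dir)
    problems = {}
    for name in PATH_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not (base / value).exists():
            problems[name] = f"path does not exist: {value}"
    return problems
```

Pass os.environ, or a dictionary parsed from your .env, as env. An empty result means every configured path resolves.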

Re-run ragpdf init-vectors after adding new vector sources

If you add new field definitions to data/rag/vectors/source/vector_source.json, run ragpdf init-vectors --force to rebuild the database. The --force flag is required to overwrite the existing database.

Use --report for debugging and fill insights

Both chatbot-cli and doc-upload-cli support a --report flag that prints a detailed fill summary at the end of the session — showing which fields were filled, which were skipped, and the overall fill percentage.

Keep your API key secure

Never commit .env to version control. Add .env to your .gitignore. For production deployments, inject API keys through environment variables from your secret manager (AWS Secrets Manager, Azure Key Vault, etc.) rather than from a file.
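As a lightweight guard, a pre-commit script can verify that .env is listed in .gitignore. The sketch below does a simplified literal match against a few common patterns and does not implement full gitignore semantics; git check-ignore .env remains the authoritative check. env_is_ignored is a hypothetical helper.

```python
from pathlib import Path

# Literal .gitignore entries that cover a top-level .env file.
ENV_PATTERNS = {".env", "/.env", ".env*"}


def env_is_ignored(repo_root="."):
    """True when .gitignore under repo_root contains a pattern covering .env."""
    gitignore = Path(repo_root) / ".gitignore"
    if not gitignore.is_file():
        return False
    lines = gitignore.read_text(encoding="utf-8").splitlines()
    entries = {line.strip() for line in lines}
    return bool(entries & ENV_PATTERNS)
```

A False result is a prompt to add .env to .gitignore before the next commit.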

Run pdf-autofillr status after any configuration change

Any time you change .env or mapper_config.ini, re-run pdf-autofillr status to confirm all modules are still correctly configured before restarting the server.

Next steps

Configuration: Full reference for config.ini and .env, covering every option and its default.

REST API: All API endpoints, request/response schemas, and usage examples.

Python Library: Full Python library reference, covering classes, methods, and storage backends.
