Chatbot SDK

Installation

Install the full pdf-autofillr SDK (v1.1.1) with all four modules — chatbot, doc_upload, mapper, and RAG — or install only the modules you need. A single pdf-autofillr setup command scaffolds your entire project structure, config files, and environment template.

What this page covers
Python version requirements before installing
pip install with extras — full or partial stack
Git clone — when you need to edit source code
pdf-autofillr setup — scaffolds all config files and folders
Configuring .env for all four modules
Configuring mapper_config.ini for the mapper module
ragpdf init-vectors — initialise the RAG vector database
pdf-autofillr status — verify every module is configured
Running chatbot-cli, doc-upload-cli, pdf-mapper, and ragpdf
Tips for safe and effective usage

1. Prerequisites

Before installing, confirm your environment meets the minimum requirements. The SDK requires Python 3.10 or later because it uses structural pattern matching and the X | Y type union syntax, both introduced in Python 3.10. A virtual environment is strongly recommended to avoid dependency conflicts with other Python projects.

terminal
python --version
# Expected: Python 3.10.x or higher
# e.g. Python 3.12.0

pip --version
# Expected: pip 22.0 or higher
Requirement | Minimum version | Notes
Python | 3.10 | Structural pattern matching required. 3.12 recommended.
pip | 22.0 | Ships with Python 3.10+. Run pip install --upgrade pip if older.
OpenAI API | Any | You must have an active OpenAI account and API key (sk-…).
Redis | Optional | Required only when rate limiting with Redis backend.
Docker | Optional | Only if deploying via the provided Dockerfile.

Use a virtual environment

Run python -m venv .venv && source .venv/bin/activate (or .venv\Scripts\activate on Windows) before installing to keep the SDK's dependencies isolated from your system Python.

OpenAI key is always required

The SDK calls the OpenAI API for every message extraction. If OPENAI_API_KEY is not set, the server will raise an EnvironmentError on startup and refuse to start.


2. Option A - pip install (Recommended)

This is the recommended path for most users. The package is published to PyPI with named extras so you install only the modules you need. Use pdf-autofillr[all] to get everything, or choose a specific combination. All CLI entry points and server commands are registered automatically by pip and are available as soon as the install completes.

terminal
# Full stack — all four modules (recommended for new projects)
pip install "pdf-autofillr[all]"

# Or choose only what you need:
pip install "pdf-autofillr[chatbot]"          # chatbot + mapper
pip install "pdf-autofillr[doc-upload]"       # doc_upload + mapper
pip install "pdf-autofillr[chatbot,rag]"      # chatbot + mapper + RAG
pip install "pdf-autofillr[doc-upload,rag]"   # doc_upload + mapper + RAG

# Verify installed modules and configuration state
pdf-autofillr status
Command | Module | Description
chatbot-cli | chatbot | Run an interactive terminal chatbot session
chatbot-server | chatbot | FastAPI REST API server, port 8000
doc-upload-cli | doc_upload | Extract fields from an uploaded document
doc-upload-server | doc_upload | FastAPI REST API server, port 8001
pdf-mapper | mapper | CLI: extract, map, embed, fill a PDF
pdf-mapper-server | mapper | FastAPI REST API server, port 8002
ragpdf | rag | CLI: system info, init vectors, metrics, feedback
ragpdf-server | rag | FastAPI REST API server, port 8003
pdf-autofillr | umbrella | setup, status: project scaffolding and health checks

All runtime dependencies are bundled
Each extras group pulls in its own FastAPI, uvicorn, LiteLLM, and storage dependencies. You do not need to install separate requirements files when using PyPI.
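
To confirm that pip registered the entry points listed in the table, you can look them up on your PATH. The sketch below uses only the standard library; the helper name `missing_commands` is hypothetical, and with a partial install (for example only the chatbot extra) some commands will legitimately be absent.

```python
import shutil
from typing import Callable, Iterable, Optional

# Command names copied from the table above.
CLI_COMMANDS = [
    "chatbot-cli", "chatbot-server",
    "doc-upload-cli", "doc-upload-server",
    "pdf-mapper", "pdf-mapper-server",
    "ragpdf", "ragpdf-server",
    "pdf-autofillr",
]


def missing_commands(
    commands: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if which(name) is None]


# Prints the commands that are not installed in the active environment.
print("Not on PATH:", missing_commands(CLI_COMMANDS) or "none")
```

Run it inside the same virtual environment you installed into, otherwise every command will be reported as missing.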

3. Option B - Clone the Repository

Use this path when you need to customise conversation handlers, add new states, edit the extraction prompt, or modify any module's source code. After cloning, install requirements-full.txt to get all four modules and their dependencies.

terminal
git clone https://github.com/yourorg/pdf-autofillr.git
cd pdf-autofillr

# Install everything from source
pip install -r requirements-full.txt

# Add src/ to Python path so imports work
export PYTHONPATH=$(pwd)/src
Set PYTHONPATH when running from the repo
The SDK's source lives under src/. If you skip the export PYTHONPATH=$(pwd)/src step, Python cannot find the SDK's packages and importing them fails with a ModuleNotFoundError.

4. Project Setup & Config Files

After installation, run pdf-autofillr setup once. It detects which modules are installed and automatically creates the complete folder tree, all required config files, and a .env.example template populated with every environment variable your installed modules need.

terminal
# Run once after any install — detects installed modules
# and creates all folders, configs, and .env.example
pdf-autofillr setup

# What it creates:
# .env.example                ← template with all env vars
# configs/
#   form_keys.json            ← field schema (edit for your PDF)
#   mandatory.json
#   field_questions.json
#   mapper_config.ini         ← mapper LLM, chunking, storage, RAG toggle
# data/
#   input/                    ← drop your blank PDF here
#   chatbot/
#   doc_upload/
#   mapper/
#   rag/                      ← pre-loaded with 137 real vectors

# Verify what's installed and configured
pdf-autofillr status
Generated configs/ directory
File | Purpose
form_keys.json | Master field schema: all field names and value types. Edit to match your PDF.
mandatory.json | Required fields per investor type
field_questions.json | Human-readable prompts per field (optional)
mapper_config.ini | Mapper LLM model, chunking strategy, cloud storage, RAG toggle
.env.example | All environment variables with sensible defaults. Copy to .env and fill in secrets.

Edit form_keys.json to match your PDF
The sample config uses generic field names. Before going to production, open configs/form_keys.json and replace the field names with the actual field IDs from your blank PDF form.
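
A quick way to confirm that setup produced the layout listed above is to check for each path. The list in this sketch is copied from the "What it creates" listing; the helper name `missing_paths` is hypothetical, and it is an assumption that every data/ folder is created regardless of which modules you installed.

```python
from pathlib import Path

# Paths taken from the "What it creates" listing in this section.
EXPECTED_PATHS = [
    ".env.example",
    "configs/form_keys.json",
    "configs/mandatory.json",
    "configs/field_questions.json",
    "configs/mapper_config.ini",
    "data/input",
    "data/chatbot",
    "data/doc_upload",
    "data/mapper",
    "data/rag",
]


def missing_paths(project_root, expected=EXPECTED_PATHS) -> list[str]:
    """Return the expected files and folders that do not exist yet."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).exists()]


# Run from the project root after `pdf-autofillr setup`.
print("Missing after setup:", missing_paths(".") or "nothing")
```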

5. Configure .env

Copy .env.example to .env and fill in your secrets. The file is split into sections — one per module. The only universally required variable is your LLM API key. Every other value has a sensible default that works for local development out of the box.

.env
# Step 1 - copy the template (run this in your terminal, not inside .env):
#   cp .env.example .env

# Step 2 - minimum required for a working full-stack setup:

# ── LLM (pick one) ─────────────────────────────────────
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...

# ── Chatbot ────────────────────────────────────────────
CHATBOT_LLM_MODEL=openai/gpt-4o-mini
chatbot_PDF_FILLER=mapper
chatbot_PDF_PATH=./data/input/blank_form.pdf

# ── Doc Upload ─────────────────────────────────────────
DOC_UPLOAD_LLM_MODEL=openai/gpt-4.1-mini
DOC_UPLOAD_PDF_FILLER=mapper
DOC_UPLOAD_PDF_PATH=./data/input/blank_form.pdf

# ── Mapper (inprocess by default - no URL needed) ──────
# MAPPER_API_URL=        ← leave empty for inprocess mode

# ── RAG ────────────────────────────────────────────────
RAG_ENABLED=true
RAG_MODE=inprocess
RAGPDF_EMBEDDING_BACKEND=openai
RAGPDF_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
RAGPDF_VECTOR_STORE=local
RAGPDF_DATA_PATH=./data/rag
RAGPDF_CORRECTOR_BACKEND=noop

# ── System ─────────────────────────────────────────────
LITELLM_LOG=ERROR
PYTHONUTF8=1
1) Set your LLM API key

Add OPENAI_API_KEY (or ANTHROPIC_API_KEY). This is the only key required in every deployment mode; all four modules use it for LLM calls.

2) Point chatbot and doc_upload at your PDF

Set chatbot_PDF_PATH and DOC_UPLOAD_PDF_PATH to the path of your blank PDF form (e.g. ./data/input/blank_form.pdf). Set chatbot_PDF_FILLER=mapper and DOC_UPLOAD_PDF_FILLER=mapper to enable filling.

3) Configure RAG (optional but recommended)

Set RAG_ENABLED=true and RAGPDF_EMBEDDING_BACKEND=openai. The RAG module ships with 137 pre-loaded vectors, so you get predictive field filling from day one without any training data.

4) Add system settings to suppress noise

Set LITELLM_LOG=ERROR to suppress LiteLLM verbose logs. On Windows, also set PYTHONUTF8=1 to prevent encoding errors in PDF text extraction.

Mapper runs inprocess by default
Leave MAPPER_API_URL empty (or unset) and the mapper runs inside the same process — no separate server needed. Set it to http://localhost:8002 only when you want to run the mapper as a standalone HTTP service.
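
Before starting a module, it can help to confirm that your .env actually contains an LLM key. The following sketch parses simple KEY=VALUE lines with the standard library only. It is not the SDK's own loader: `parse_env` and `has_llm_key` are hypothetical helpers, and the parser deliberately ignores inline comments and shell-style expansion.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_llm_key(env: dict[str, str]) -> bool:
    """True when at least one of the two LLM keys shown above is non-empty."""
    return any(env.get(name) for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"))


# Placeholder values only; never paste a real key into example code.
SAMPLE = "# LLM (pick one)\nOPENAI_API_KEY=sk-...\n# ANTHROPIC_API_KEY=sk-ant-...\nRAG_ENABLED=true\n"
print(has_llm_key(parse_env(SAMPLE)))
```

To check your real file, pass `open(".env", encoding="utf-8").read()` to `parse_env` instead of the sample string.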

6. Configure mapper_config.ini

The configs/mapper_config.ini file controls the mapper module's LLM model, chunking strategy for dense PDF forms, cloud storage backend, and RAG integration. Review the key settings below — the defaults work for local development but you will want to tune them for production.

configs/mapper_config.ini
# configs/mapper_config.ini — key settings to review:

[general]
source_type = local          # change to aws / azure / gcp for cloud storage
pdf_cache_enabled = true     # skip re-processing if PDF hasn't changed

[mapping]
llm_model = gpt-4o           # your preferred LLM for field mapping
chunking_strategy = page     # page (default) | window (better for dense forms)

[rag]
enabled = true               # must also set RAG_ENABLED=true in .env
mode = inprocess             # inprocess | http
Setting | Options | When to change
source_type | local, aws, azure, gcp | Change to aws/azure/gcp for cloud PDF storage
pdf_cache_enabled | true, false | Keep true. Caches field extraction to skip re-processing unchanged PDFs
llm_model | any LiteLLM model string | Use a more powerful model for complex, multi-column forms
chunking_strategy | page, window | Switch to window for dense forms where fields span page boundaries
rag.enabled | true, false | Must match RAG_ENABLED in .env
rag.mode | inprocess, http | Use http only when running ragpdf-server separately

Windows PowerShell users
On Windows, use PowerShell's Copy-Item to copy the generated config if needed: Copy-Item configs\mapper_config.ini config.ini. On macOS/Linux use cp configs/mapper_config.ini config.ini.
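
Because rag.enabled in mapper_config.ini has to agree with RAG_ENABLED in .env, a small consistency check can catch a mismatch early. This sketch reads the file with Python's standard configparser and enables `inline_comment_prefixes` so the trailing `# ...` comments in the sample are ignored. How the mapper itself parses the file is not documented here, and `load_mapper_config` and `rag_settings_agree` are hypothetical helpers.

```python
import configparser

# Trimmed copy of the sample shown in this section.
SAMPLE = """\
[general]
source_type = local          # change to aws / azure / gcp for cloud storage
pdf_cache_enabled = true

[mapping]
llm_model = gpt-4o
chunking_strategy = page     # page (default) | window

[rag]
enabled = true               # must also set RAG_ENABLED=true in .env
mode = inprocess
"""


def load_mapper_config(text: str) -> configparser.ConfigParser:
    """Parse INI text, treating ' # ...' after a value as a comment."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def rag_settings_agree(parser: configparser.ConfigParser, env_rag_enabled: str) -> bool:
    """Compare [rag] enabled with the RAG_ENABLED value from .env."""
    ini_enabled = parser.getboolean("rag", "enabled")
    env_enabled = env_rag_enabled.strip().lower() in {"1", "true", "yes", "on"}
    return ini_enabled == env_enabled


config = load_mapper_config(SAMPLE)
print(config.get("mapping", "chunking_strategy"), rag_settings_agree(config, "true"))
```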

7. Initialize RAG Vectors (if RAG enabled)

If you set RAG_ENABLED=true in your .env, run ragpdf init-vectors once before your first fill. This loads the 137 pre-built investor field vectors into the local vector database. Re-run it with --force only when you switch embedding backends, edit the vector source file, or need a full rebuild.

Command | Purpose
ragpdf init-vectors | Load bundled vectors into the local database (first time setup)
ragpdf init-vectors --source data/rag/vectors/source/vector_source.json --force | Force rebuild from a custom source file (advanced)
ragpdf init-vectors --backend sentence_transformer --force | Rebuild using a different embedding backend (changes RAGPDF_EMBEDDING_BACKEND)

Re-run after editing the vector source file
If you edit data/rag/vectors/source/vector_source.json to add new field definitions, re-run ragpdf init-vectors --force to rebuild the database with the updated source. The --force flag is required to overwrite an existing database.
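
If you trigger rebuilds from a script, for example in a deployment job, the three command variants above can be assembled from one helper. This sketch uses only the flags shown in this section (--source, --backend, --force); the function name `init_vectors_command` is hypothetical and nothing here is part of the SDK.

```python
import shlex
from typing import Optional


def init_vectors_command(
    source: Optional[str] = None,
    backend: Optional[str] = None,
    force: bool = False,
) -> list[str]:
    """Build the argument list for `ragpdf init-vectors`."""
    cmd = ["ragpdf", "init-vectors"]
    if source:
        cmd += ["--source", source]
    if backend:
        cmd += ["--backend", backend]
    if force:
        cmd.append("--force")  # needed to overwrite an existing database
    return cmd


# Show the command line that would be executed.
print(shlex.join(init_vectors_command(backend="sentence_transformer", force=True)))
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)` from the project root.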

8. Verify Setup

Before running any module, confirm every installed module is correctly configured. pdf-autofillr status checks all required environment variables and reports any that are missing. ragpdf system-info shows the vector database state and embedding configuration.

Command | What it shows
pdf-autofillr status | Each installed module with a ✓ configured or ✗ misconfigured status
ragpdf system-info | RAG vector database status, embedding backend, and vector count

Fix all misconfigured modules before running
If pdf-autofillr status shows any module as misconfigured, add the reported missing variable to your .env, then re-run pdf-autofillr status before starting the module. Running a misconfigured module raises an EnvironmentError on startup.

9. Drop Your PDF and Run

Copy your blank PDF form to data/input/blank_form.pdf, then use the CLI commands below to run any module. Each module can also run as an independent HTTP server — see the Advanced page for the full server setup.

terminal
# Drop your blank PDF into data/input/
# data/input/blank_form.pdf

# Run chatbot (interactive session)
chatbot-cli --pdf-path data/input/blank_form.pdf --report

# Run doc upload (extract from a document)
doc-upload-cli --document data/input/Avery.pdf --schema configs/form_keys.json --report

# Run mapper directly (extract + map + embed + fill)
pdf-mapper run-all --pdf data/input/blank_form.pdf --data collected_data.json

# Check RAG system info
ragpdf system-info

# View global prediction metrics
ragpdf metrics --type global

What each CLI command does

Command | What it does
chatbot-cli --pdf-path ... --report | Runs interactive chatbot session and prints a fill report at the end
doc-upload-cli --document ... --schema ... | Extracts fields from a document and maps them to your form schema
pdf-mapper run-all --pdf ... --data ... | Runs the full mapper pipeline: extract → map → embed → fill
ragpdf system-info | Shows RAG vector database status and embedding configuration

Interactive Swagger UI available for every server
Each module's server exposes a Swagger UI at /docs. Visit http://localhost:8000/docs (chatbot), http://localhost:8001/docs (doc_upload), http://localhost:8002/docs (mapper), or http://localhost:8003/docs (RAG) to try every endpoint interactively without writing any code.
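
When several servers are running, a short script can report which Swagger pages respond. The ports below are the ones listed on this page; the helper names `docs_urls` and `is_reachable` are hypothetical, and the sketch assumes each server is bound to localhost.

```python
import http.client
import urllib.error
import urllib.request

# Default ports from the command table on this page.
PORTS = {"chatbot": 8000, "doc_upload": 8001, "mapper": 8002, "rag": 8003}


def docs_urls(host: str = "localhost") -> dict[str, str]:
    """Map each module name to its Swagger UI URL."""
    return {module: f"http://{host}:{port}/docs" for module, port in PORTS.items()}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False
```

For example, `for name, url in docs_urls().items(): print(name, is_reachable(url))` lists each module and whether its server is up.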

10. Tips

Match paths in .env to your actual folder structure

If you move data/input/ or rename your blank PDF, update chatbot_PDF_PATH, DOC_UPLOAD_PDF_PATH, and RAGPDF_DATA_PATH in .env accordingly. Mismatched paths are the most common cause of startup errors.

Re-run ragpdf init-vectors after adding new vector sources

If you add new field definitions to data/rag/vectors/source/vector_source.json, run ragpdf init-vectors --force to rebuild the database. The --force flag is required to overwrite the existing database.

Use --report for debugging and fill insights

Both chatbot-cli and doc-upload-cli support a --report flag that prints a detailed fill summary at the end of the session — showing which fields were filled, which were skipped, and the overall fill percentage.

Keep your API key secure

Never commit .env to version control. Add .env to your .gitignore. For production deployments, inject API keys through environment variables from your secret manager (AWS Secrets Manager, Azure Key Vault, etc.) rather than from a file.

Run pdf-autofillr status after any configuration change

Any time you change .env or mapper_config.ini, re-run pdf-autofillr status to confirm all modules are still correctly configured before restarting the server.
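
The "Keep your API key secure" tip can be sketched as follows for code you write around the SDK: read the key from the process environment, where a secret manager can inject it, and fail early when it is absent. The helper `load_llm_key` is hypothetical and is not the SDK's own startup check. It accepts either of the two key names shown in the .env section.

```python
import os
from typing import Mapping

LLM_KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def load_llm_key(environ: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (variable name, value) for the first LLM key that is set."""
    for name in LLM_KEY_NAMES:
        value = environ.get(name, "").strip()
        if value:
            return name, value
    # EnvironmentError is an alias of OSError in Python 3.
    raise EnvironmentError("Set one of: " + ", ".join(LLM_KEY_NAMES))
```

Log only the variable name returned by `load_llm_key()`, never the value itself.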

