Installation
Install the full pdf-autofillr SDK (v1.1.1) with all four modules — chatbot, doc_upload, mapper, and RAG — or install only the modules you need. A single pdf-autofillr setup command scaffolds your entire project structure, config files, and environment template.
1. Prerequisites
Before installing, confirm your environment meets the minimum requirements. The SDK requires Python 3.10 or higher because it uses structural pattern matching and the X | Y type union syntax, both introduced in that release. A virtual environment is strongly recommended to avoid dependency conflicts with other Python projects.
python --version
# Expected: Python 3.10.x or higher
# e.g. Python 3.12.0
pip --version
# Expected: pip 22.0 or higher

| Requirement | Minimum version | Notes |
|---|---|---|
| Python | 3.10 | Structural pattern matching required. 3.12 recommended. |
| pip | 22.0 | Ships with Python 3.10+. Run pip install --upgrade pip if older. |
| OpenAI API | Any | You must have an active OpenAI account and API key (sk-…). |
| Redis | Optional | Required only when rate limiting uses the Redis backend. |
| Docker | Optional | Only if deploying via the provided Dockerfile. |
Run python -m venv .venv && source .venv/bin/activate (or .venv\Scripts\activate on Windows) before installing to keep the SDK's dependencies isolated from your system Python.
The SDK calls the OpenAI API for every message extraction. If OPENAI_API_KEY is not set, the server will raise an EnvironmentError on startup and refuse to start.
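The two hard prerequisites above (interpreter version and an LLM key) can be checked before installing anything. The following is a minimal sketch, not part of the SDK: the function name check_prerequisites is hypothetical, and it assumes that either OPENAI_API_KEY or ANTHROPIC_API_KEY is acceptable, as the .env section of this page suggests.

```python
import os
import sys

MIN_PYTHON = (3, 10)  # minimum version from the requirements table


def check_prerequisites(version_info=sys.version_info, environ=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, 3.10 or higher required"
        )
    # Assumption: one of the two keys is enough (see the .env section).
    if not (environ.get("OPENAI_API_KEY") or environ.get("ANTHROPIC_API_KEY")):
        problems.append("no LLM API key set (OPENAI_API_KEY or ANTHROPIC_API_KEY)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("PROBLEM:", problem)
```

Passing version_info and environ as parameters keeps the check easy to exercise without changing the real environment.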
2. Option A - pip install (Recommended)
This is the recommended path for most users. The package is published to PyPI with named extras so you install only the modules you need. Use pdf-autofillr[all] to get everything, or choose a specific combination. All CLI entry points and server commands are registered automatically by pip and are available as soon as the install completes.
# Full stack — all four modules (recommended for new projects)
pip install "pdf-autofillr[all]"
# Or choose only what you need:
pip install "pdf-autofillr[chatbot]" # chatbot + mapper
pip install "pdf-autofillr[doc-upload]" # doc_upload + mapper
pip install "pdf-autofillr[chatbot,rag]" # chatbot + mapper + RAG
pip install "pdf-autofillr[doc-upload,rag]" # doc_upload + mapper + RAG
# Verify installed modules and configuration state
pdf-autofillr status

| Command | Module | Description |
|---|---|---|
| chatbot-cli | chatbot | Run an interactive terminal chatbot session |
| chatbot-server | chatbot | FastAPI REST API server — port 8000 |
| doc-upload-cli | doc_upload | Extract fields from an uploaded document |
| doc-upload-server | doc_upload | FastAPI REST API server — port 8001 |
| pdf-mapper | mapper | CLI: extract, map, embed, fill a PDF |
| pdf-mapper-server | mapper | FastAPI REST API server — port 8002 |
| ragpdf | rag | CLI: system info, init vectors, metrics, feedback |
| ragpdf-server | rag | FastAPI REST API server — port 8003 |
| pdf-autofillr | umbrella | setup, status — project scaffolding and health checks |
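If you installed only some extras, it can help to see which of the entry points in the table actually landed on your PATH. The sketch below uses only the standard library (shutil.which); the command names are copied from the table, while the helper name installed_entry_points is hypothetical. It complements pdf-autofillr status and does not replace it.

```python
import shutil

# Entry points and their modules, copied from the command table.
ENTRY_POINTS = {
    "chatbot-cli": "chatbot",
    "chatbot-server": "chatbot",
    "doc-upload-cli": "doc_upload",
    "doc-upload-server": "doc_upload",
    "pdf-mapper": "mapper",
    "pdf-mapper-server": "mapper",
    "ragpdf": "rag",
    "ragpdf-server": "rag",
    "pdf-autofillr": "umbrella",
}


def installed_entry_points(which=shutil.which):
    """Map each module name to the entry points found on PATH."""
    found = {}
    for command, module in ENTRY_POINTS.items():
        if which(command) is not None:
            found.setdefault(module, []).append(command)
    return found


if __name__ == "__main__":
    for module, commands in installed_entry_points().items():
        print(f"{module}: {', '.join(commands)}")
```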
3. Option B - Clone the Repository
Use this path when you need to customise conversation handlers, add new states, edit the extraction prompt, or modify any module's source code. After cloning, install requirements-full.txt to get all four modules and their dependencies.
git clone https://github.com/yourorg/pdf-autofillr.git
cd pdf-autofillr
# Install everything from source
pip install -r requirements-full.txt
# Add src/ to Python path so imports work
export PYTHONPATH=$(pwd)/src
If you skip the export PYTHONPATH=$(pwd)/src step, Python will not be able to find any module and every import will fail with a ModuleNotFoundError.
4. Project Setup & Config Files
After installation, run pdf-autofillr setup once. It detects which modules are installed and automatically creates the complete folder tree, all required config files, and a .env.example template populated with every environment variable your installed modules need.
# Run once after any install — detects installed modules
# and creates all folders, configs, and .env.example
pdf-autofillr setup
# What it creates:
# .env.example ← template with all env vars
# configs/
# form_keys.json ← field schema (edit for your PDF)
# mandatory.json
# field_questions.json
# mapper_config.ini ← mapper LLM, chunking, storage, RAG toggle
# data/
# input/ ← drop your blank PDF here
# chatbot/
# doc_upload/
# mapper/
# rag/ ← pre-loaded with 137 real vectors
# Verify what's installed and configured
pdf-autofillr status

| File | Purpose |
|---|---|
| form_keys.json | Master field schema: all field names and value types. Edit to match your PDF. |
| mandatory.json | Required fields per investor type |
| field_questions.json | Human-readable prompts per field (optional) |
| mapper_config.ini | Mapper LLM model, chunking strategy, cloud storage, RAG toggle |
| .env.example | All environment variables with sensible defaults. Copy to .env and fill in secrets. |

Edit configs/form_keys.json and replace the field names with the actual field IDs from your blank PDF form.
5. Configure .env
Copy .env.example to .env and fill in your secrets. The file is split into sections — one per module. The only universally required variable is your LLM API key. Every other value has a sensible default that works for local development out of the box.
# Step 1 - copy the template
cp .env.example .env
# Step 2 - minimum required for a working full-stack setup:
# ── LLM (pick one) ─────────────────────────────────────
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# ── Chatbot ────────────────────────────────────────────
CHATBOT_LLM_MODEL=openai/gpt-4o-mini
chatbot_PDF_FILLER=mapper
chatbot_PDF_PATH=./data/input/blank_form.pdf
# ── Doc Upload ─────────────────────────────────────────
DOC_UPLOAD_LLM_MODEL=openai/gpt-4.1-mini
DOC_UPLOAD_PDF_FILLER=mapper
DOC_UPLOAD_PDF_PATH=./data/input/blank_form.pdf
# ── Mapper (inprocess by default - no URL needed) ──────
# MAPPER_API_URL= ← leave empty for inprocess mode
# ── RAG ────────────────────────────────────────────────
RAG_ENABLED=true
RAG_MODE=inprocess
RAGPDF_EMBEDDING_BACKEND=openai
RAGPDF_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
RAGPDF_VECTOR_STORE=local
RAGPDF_DATA_PATH=./data/rag
RAGPDF_CORRECTOR_BACKEND=noop
# ── System ─────────────────────────────────────────────
LITELLM_LOG=ERROR
PYTHONUTF8=1
Add OPENAI_API_KEY (or ANTHROPIC_API_KEY). This is the only key required in every deployment mode, because all four modules use it for LLM calls.
Set chatbot_PDF_PATH and DOC_UPLOAD_PDF_PATH to the path of your blank PDF form (e.g. ./data/input/blank_form.pdf). Set chatbot_PDF_FILLER=mapper and DOC_UPLOAD_PDF_FILLER=mapper to enable filling.
Set RAG_ENABLED=true and RAGPDF_EMBEDDING_BACKEND=openai. The RAG module ships with 137 pre-loaded vectors — you get predictive field filling from day one without any training data.
Set LITELLM_LOG=ERROR to suppress LiteLLM verbose logs. On Windows, also set PYTHONUTF8=1 to prevent encoding errors in PDF text extraction.
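The steps above can be sanity-checked with a few lines of Python before you start any module. This is a deliberately simplified sketch: parse_env handles only plain KEY=VALUE lines (no quoting, no export prefix), both function names are hypothetical, and the list of settings treated as required is an assumption based on the steps above. Which ones you really need depends on the modules you installed.

```python
def parse_env(text):
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(values):
    """Return the names of settings from the steps above that are empty or absent."""
    problems = []
    # One LLM key is enough: OpenAI or Anthropic.
    if not (values.get("OPENAI_API_KEY") or values.get("ANTHROPIC_API_KEY")):
        problems.append("OPENAI_API_KEY or ANTHROPIC_API_KEY")
    # Assumption: both PDF paths are wanted for a full-stack setup.
    for key in ("chatbot_PDF_PATH", "DOC_UPLOAD_PDF_PATH"):
        if not values.get(key):
            problems.append(key)
    return problems


if __name__ == "__main__":
    try:
        with open(".env", encoding="utf-8") as handle:
            print(missing_settings(parse_env(handle.read())) or "all set")
    except FileNotFoundError:
        print("no .env file found; copy .env.example to .env first")
```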
Leave MAPPER_API_URL empty (or unset) and the mapper runs inside the same process, with no separate server needed. Set it to http://localhost:8002 only when you want to run the mapper as a standalone HTTP service.
6. Configure mapper_config.ini
The configs/mapper_config.ini file controls the mapper module's LLM model, chunking strategy for dense PDF forms, cloud storage backend, and RAG integration. Review the key settings below — the defaults work for local development but you will want to tune them for production.
# configs/mapper_config.ini — key settings to review:
[general]
source_type = local # change to aws / azure / gcp for cloud storage
pdf_cache_enabled = true # skip re-processing if PDF hasn't changed
[mapping]
llm_model = gpt-4o # your preferred LLM for field mapping
chunking_strategy = page # page (default) | window (better for dense forms)
[rag]
enabled = true # must also set RAG_ENABLED=true in .env
mode = inprocess # inprocess | http

| Setting | Options | When to change |
|---|---|---|
| source_type | local, aws, azure, gcp | Change to aws, azure, or gcp for cloud PDF storage |
| pdf_cache_enabled | true, false | Keep true: caches field extraction to skip re-processing unchanged PDFs |
| llm_model | any LiteLLM model string | Use a more powerful model for complex, multi-column forms |
| chunking_strategy | page, window | Switch to window for dense forms where fields span page boundaries |
| rag.enabled | true, false | Must match RAG_ENABLED in .env |
| rag.mode | inprocess, http | Use http only when running ragpdf-server separately |
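Because rag.enabled has to agree with RAG_ENABLED, a small consistency check can catch a mismatch early. The sketch below reads the file with the standard library's configparser; how the SDK itself parses the file is not documented here, so treat this as an assumption. Note that inline_comment_prefixes must be set explicitly, since configparser otherwise keeps a trailing "# ..." comment as part of the value. The helper names and the SAMPLE_INI text are illustrative.

```python
import configparser

# Illustrative excerpt in the same shape as configs/mapper_config.ini.
SAMPLE_INI = """
[general]
source_type = local        # change to aws / azure / gcp for cloud storage
pdf_cache_enabled = true

[mapping]
llm_model = gpt-4o
chunking_strategy = page   # page (default) | window

[rag]
enabled = true
mode = inprocess
"""


def load_mapper_config(text):
    """Parse ini text, stripping trailing '# ...' comments from values."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def rag_settings_agree(parser, env):
    """True when [rag] enabled matches RAG_ENABLED in the given env mapping."""
    ini_enabled = parser.getboolean("rag", "enabled", fallback=False)
    env_enabled = env.get("RAG_ENABLED", "false").strip().lower() == "true"
    return ini_enabled == env_enabled


if __name__ == "__main__":
    config = load_mapper_config(SAMPLE_INI)
    print(config.get("mapping", "chunking_strategy"))
    print(rag_settings_agree(config, {"RAG_ENABLED": "true"}))
```

To check a real project, read configs/mapper_config.ini from disk and pass os.environ (or your parsed .env values) as env.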
On Windows (PowerShell), use Copy-Item to copy the generated config if needed: Copy-Item configs\mapper_config.ini config.ini. On macOS/Linux use cp configs/mapper_config.ini config.ini.
7. Initialize RAG Vectors (if RAG enabled)
If you set RAG_ENABLED=true in your .env, run ragpdf init-vectors once before your first fill. This loads the 137 pre-built investor field vectors into the local vector database. You only need to repeat this if you want to switch embedding backends or force a full rebuild.
| Command | Description |
|---|---|
| ragpdf init-vectors | Load bundled vectors into the local database (first-time setup) |
| ragpdf init-vectors --source data/rag/vectors/source/vector_source.json --force | Force rebuild from a custom source file (advanced) |
| ragpdf init-vectors --backend sentence_transformer --force | Rebuild using a different embedding backend (changes RAGPDF_EMBEDDING_BACKEND) |

If you edit data/rag/vectors/source/vector_source.json to add new field definitions, re-run ragpdf init-vectors --force to rebuild the database with the updated source. The --force flag is required to overwrite an existing database.
8. Verify Setup
Before running any module, confirm every installed module is correctly configured. pdf-autofillr status checks all required environment variables and reports any that are missing. ragpdf system-info shows the vector database state and embedding configuration.
| Command | Description |
|---|---|
| pdf-autofillr status | Shows each installed module with a ✓ configured or ✗ misconfigured status |
| ragpdf system-info | Shows RAG vector database status, embedding backend, and vector count |

If pdf-autofillr status shows any module as misconfigured, check the reported missing variable in your .env and restart. Running a misconfigured module will raise an EnvironmentError on startup.
9. Drop Your PDF and Run
Copy your blank PDF form to data/input/blank_form.pdf, then use the CLI commands below to run any module. Each module can also run as an independent HTTP server — see the Advanced page for the full server setup.
# Drop your blank PDF into data/input/
# data/input/blank_form.pdf
# Run chatbot (interactive session)
chatbot-cli --pdf-path data/input/blank_form.pdf --report
# Run doc upload (extract from a document)
doc-upload-cli --document data/input/Avery.pdf --schema configs/form_keys.json --report
# Run mapper directly (extract + map + embed + fill)
pdf-mapper run-all --pdf data/input/blank_form.pdf --data collected_data.json
# Check RAG system info
ragpdf system-info
# View global prediction metrics
ragpdf metrics --type global
What each CLI command does

| Command | Description |
|---|---|
| chatbot-cli --pdf-path ... --report | Runs interactive chatbot session and prints a fill report at the end |
| doc-upload-cli --document ... --schema ... | Extracts fields from a document and maps them to your form schema |
| pdf-mapper run-all --pdf ... --data ... | Runs the full mapper pipeline: extract → map → embed → fill |
| ragpdf system-info | Shows RAG vector database status and embedding configuration |

Each server exposes interactive API docs at /docs. Visit http://localhost:8000/docs (chatbot), http://localhost:8001/docs (doc_upload), http://localhost:8002/docs (mapper), or http://localhost:8003/docs (RAG) to try every endpoint interactively without writing any code.
10. Tips
If you move data/input/ or rename your blank PDF, update chatbot_PDF_PATH, DOC_UPLOAD_PDF_PATH, and RAGPDF_DATA_PATH in .env accordingly. Mismatched paths are the most common cause of startup errors.
If you add new field definitions to data/rag/vectors/source/vector_source.json, run ragpdf init-vectors --force to rebuild the database. The --force flag is required to overwrite the existing database.
Both chatbot-cli and doc-upload-cli support a --report flag that prints a detailed fill summary at the end of the session — showing which fields were filled, which were skipped, and the overall fill percentage.
Never commit .env to version control. Add .env to your .gitignore. For production deployments, inject API keys through environment variables from your secret manager (AWS Secrets Manager, Azure Key Vault, etc.) rather than from a file.
Any time you change .env or mapper_config.ini, re-run pdf-autofillr status to confirm all modules are still correctly configured before restarting the server.
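Since mismatched paths are named above as the most common cause of startup errors, a quick existence check on the path variables can save a restart cycle. This is a minimal sketch with a hypothetical helper name, missing_paths; it takes the settings as a plain mapping (for example os.environ) and resolves relative paths against the project root.

```python
from pathlib import Path

# Path-valued variables named in the first tip.
PATH_VARIABLES = ("chatbot_PDF_PATH", "DOC_UPLOAD_PDF_PATH", "RAGPDF_DATA_PATH")


def missing_paths(env, base_dir="."):
    """Return {variable: value} for configured paths that do not exist on disk."""
    base = Path(base_dir)
    missing = {}
    for name in PATH_VARIABLES:
        value = env.get(name)
        # Unset variables are skipped; only configured paths are checked.
        if value and not (base / value).exists():
            missing[name] = value
    return missing


if __name__ == "__main__":
    import os

    for name, value in missing_paths(os.environ).items():
        print(f"{name} points to a missing path: {value}")
```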