pdf-autofillr SDK
A full-stack Python SDK for intelligent, automated PDF form filling. Install once, configure once, and fill PDF forms from two directions — through a natural-language chatbot conversation or by uploading an existing document. Four cooperative modules handle every step: data collection, document parsing, field mapping, and predictive improvement with RAG.
1. What This SDK Does
pdf-autofillr automates the full lifecycle of PDF form filling — from collecting investor data to delivering a completed document. It gives you two input paths (conversational chatbot or document upload), a shared PDF filling engine (mapper), and an optional RAG layer that makes field predictions smarter after every fill.
The SDK is not a generic AI framework. It is purpose-built for regulated financial onboarding: it understands ten investor types, handles address copy shortcuts, manages three-state boolean checkbox groups, filters Form PF fields for US investors, and tracks mandatory vs optional fields per investor type — all configured through JSON files without touching Python.
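Two of the behaviors named above, three-state checkbox answers and address copying, can be illustrated with a small sketch. Everything in it is hypothetical: Tristate, parse_tristate, copy_address, ADDRESS_SUBFIELDS and the registered_/mailing_ key prefixes are illustrative names, not the SDK's API. In the real SDK the sub-field names come from the JSON config files.

```python
from enum import Enum
from typing import Optional


class Tristate(Enum):
    """Hypothetical three-state answer for one checkbox (not the SDK's API)."""
    TRUE = "true"
    FALSE = "false"
    NOT_APPLICABLE = "not_applicable"


def parse_tristate(answer: str) -> Optional[Tristate]:
    """Map a free-text answer to one of the three states, or None if unclear."""
    text = answer.strip().lower()
    if text in {"yes", "y", "true"}:
        return Tristate.TRUE
    if text in {"no", "n", "false"}:
        return Tristate.FALSE
    if text in {"n/a", "na", "not applicable"}:
        return Tristate.NOT_APPLICABLE
    return None  # unclear answer: the caller should re-ask instead of guessing


# Hypothetical sub-field names; the real ones are defined in the JSON configs.
ADDRESS_SUBFIELDS = ["line1", "line2", "city", "state", "postal_code", "country"]


def copy_address(data: dict, src: str = "registered", dst: str = "mailing") -> dict:
    """Copy every address sub-field present under src_* to dst_* (investor said yes)."""
    for sub in ADDRESS_SUBFIELDS:
        key = f"{src}_{sub}"
        if key in data:
            data[f"{dst}_{sub}"] = data[key]
    return data
```

Returning None for an unclear answer, rather than defaulting to FALSE, keeps "not applicable" distinct from "no" and lets the conversation simply ask again.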
2. The Four Modules
Each module is independent and exposes its own CLI and REST API server. Install any combination using pip extras. All four share the same config folder, data folder, and cloud storage backend.
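When the modules run as separate servers, each listens on its own default port (chatbot 8000, doc_upload 8001, mapper 8002, RAG 8003). A quick way to see which of them are up is a plain TCP probe. This sketch uses only the standard library and assumes the servers listen on localhost with the default ports; is_listening and server_status are hypothetical helpers, and a successful probe only shows that something accepts connections, not that it is the right service.

```python
import socket

# Default ports per module as documented on this page; adjust if you changed them.
DEFAULT_PORTS = {"chatbot": 8000, "doc_upload": 8001, "mapper": 8002, "rag": 8003}


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def server_status(host: str = "127.0.0.1") -> dict:
    """Map each module name to whether its default port accepts connections."""
    return {name: is_listening(host, port) for name, port in DEFAULT_PORTS.items()}


if __name__ == "__main__":
    for name, up in server_status().items():
        print(f"{name:<11} {DEFAULT_PORTS[name]}  {'up' if up else 'not running'}")
```

Because mapper and RAG run inside the calling module's process by default, ports 8002 and 8003 reporting "not running" is expected unless you started pdf-mapper-server or ragpdf-server yourself.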
Module      Purpose                                                  CLI             Port
chatbot     Conversational LLM chat: collects data, fills PDF        chatbot-cli     8000
doc_upload  Upload any document: extracts fields, fills PDF          doc-upload-cli  8001
mapper      Reads blank PDF, maps fields, embeds values, writes PDF  pdf-mapper      8002
rag         Vector predictions that learn from every fill            ragpdf          8003
You do not need to start pdf-mapper-server or ragpdf-server separately. By default both run inside the calling module's process. Start them as standalone servers only when you need to scale them independently or share them across multiple services.
3. Quick Start — 7-Step Flow
From a fresh Python environment to a running, PDF-filling system in seven steps. Follow these in order the first time — they only take a few minutes.
# Step 1 — Install (full stack or pick what you need)
pip install "pdf-autofillr[all]"
# Step 2 — Scaffold project: configs/, data/, .env.example
pdf-autofillr setup
# Step 3 — Copy and fill in your secrets
cp .env.example .env
# Edit .env — minimum: OPENAI_API_KEY=sk-...
# Step 4 — Drop your blank PDF form
# data/input/blank_form.pdf
# Step 5 — Initialize RAG (if RAG_ENABLED=true)
ragpdf init-vectors
# Step 6 — Verify all modules are correctly configured
pdf-autofillr status
ragpdf system-info
# Step 7 — Run chatbot or doc-upload
chatbot-cli --pdf-path data/input/blank_form.pdf --report
doc-upload-cli --document data/input/Avery.pdf --schema configs/form_keys.json --report
Use pip install "pdf-autofillr[all]" for the full stack, or pick a specific extras combination for just the modules you need.
pdf-autofillr setup detects installed modules and creates configs/, data/, and .env.example — a fully scaffolded project in one command.
Copy .env.example to .env and fill in your secrets. The minimum required variable is OPENAI_API_KEY. Set chatbot_PDF_PATH and DOC_UPLOAD_PDF_PATH to enable PDF filling.
Drop your blank PDF form at data/input/blank_form.pdf. This is the template the mapper fills. Make sure the path matches what you set in .env.
If RAG_ENABLED=true, run ragpdf init-vectors once. This loads the 137 bundled vectors into the local database. Re-run with --force if you update the vector source file.
Run pdf-autofillr status to confirm all modules are correctly configured. Run ragpdf system-info to check the vector database state. Fix any reported issues before running.
Use chatbot-cli for a conversational session. Use doc-upload-cli to extract fields from an existing document. Add --report to both commands for a detailed fill summary.
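Steps 3 and 4 are the usual source of first-run failures. Below is a minimal preflight sketch, assuming a plain KEY=VALUE .env file: it checks that OPENAI_API_KEY is set and that every variable whose name ends in _PDF_PATH points at an existing file. read_env and preflight are hypothetical helpers, not part of the SDK; pdf-autofillr status remains the authoritative check.

```python
from pathlib import Path


def read_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def preflight(env: dict) -> list:
    """Return a list of problems found; an empty list means the basics look right."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing from .env")
    # Any *_PDF_PATH variable that is set must point at a real file.
    for key, value in env.items():
        if key.upper().endswith("_PDF_PATH") and value and not Path(value).is_file():
            problems.append(f"{key} points to a missing file: {value}")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        issues = preflight(read_env(str(env_file)))
    else:
        issues = [".env not found; copy .env.example first"]
    print("\n".join(issues) or "preflight OK")
```

The hand-rolled parser ignores quoting edge cases and variable expansion; python-dotenv's dotenv_values() is a more complete alternative if it is already installed.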
4. Which Install Is Right for You?
Use pip extras to install only the modules you need. Every combination includes the mapper module automatically — it is the shared engine used by both chatbot and doc_upload for PDF filling.
Command                                             Installs
pip install "pdf-autofillr[all]"                    chatbot + doc_upload + mapper + RAG
pip install "pdf-autofillr[chatbot]"                chatbot + mapper
pip install "pdf-autofillr[doc-upload]"             doc_upload + mapper
pip install "pdf-autofillr[chatbot,rag]"            chatbot + mapper + RAG
pip install "pdf-autofillr[doc-upload,rag]"         doc_upload + mapper + RAG
Clone repo + pip install -r requirements-full.txt   All modules, full editable source
After installing, run pdf-autofillr setup once. It detects installed modules, creates the full folder tree, and writes configs/form_keys.json, configs/mapper_config.ini, and .env.example. Then run pdf-autofillr status to confirm every module is ready.
… C:\pdf-fillr on Windows or ~/pdf-fillr on macOS/Linux) and re-run pdf-autofillr setup. Avoid spaces and special characters in the path as well.
5. The Chatbot Module — 13-State Conversation
The chatbot module drives a finite state machine. Each state is a distinct phase of the investor onboarding flow. The machine moves forward deterministically — it never asks for information it already has, retries gracefully when extraction fails, and routes differently based on investor type.
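The rule that the machine never asks for information it already has can be sketched as two operations: merge whatever a turn extracted without overwriting earlier answers with blanks, then ask only for the mandatory fields that are still missing. The JSON layout, the field names and both helpers (missing_mandatory, merge_turn) are hypothetical; the SDK's real per-investor-type configuration lives in its JSON config files and may look different.

```python
import json

# Hypothetical excerpt of a per-investor-type field config (illustrative only).
CONFIG_JSON = """
{
  "Individual": {"mandatory": ["full_name", "date_of_birth", "tax_id"],
                 "optional": ["phone"]},
  "Trust":      {"mandatory": ["trust_name", "trustee_name", "tax_id"],
                 "optional": ["phone"]}
}
"""


def missing_mandatory(config: dict, investor_type: str, collected: dict) -> list:
    """Mandatory fields for this investor type that still have no value."""
    required = config[investor_type]["mandatory"]
    return [field for field in required if collected.get(field) in (None, "")]


def merge_turn(collected: dict, extracted: dict) -> dict:
    """Merge one turn's (possibly partial) extraction, keeping earlier answers."""
    for key, value in extracted.items():
        if value not in (None, ""):
            collected[key] = value
    return collected


config = json.loads(CONFIG_JSON)
# A partial turn: the name was understood, the tax ID was not.
collected = merge_turn({}, {"full_name": "Ada Lovelace", "tax_id": None})
# Only the still-missing fields are asked for on the next turn.
print(missing_mandatory(config, "Individual", collected))  # → ['date_of_birth', 'tax_id']
```

Because a failed or partial extraction only shrinks the missing list instead of resetting it, a retry naturally re-asks for just the gaps.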
State.GREETING: Bot introduces itself and asks the investor if they are ready to begin the onboarding process.
State.INVESTOR_TYPE: Identifies the investor type: Individual, LLC, Trust, Partnership, Corporation, and seven others. All subsequent field routing depends on this.
State.MANDATORY_FIELDS: GPT-4o-mini extracts all mandatory KYC fields for the selected investor type. Retries gracefully when a turn yields only partial data.
State.ADDRESS_COPY: Asks whether the mailing address matches the registered address. Copies all address sub-fields automatically when the investor says yes.
State.CHECKBOX_GROUPS: Collects boolean checkbox fields (PEP declarations, ERISA, FATCA) using the three-state system: true, false, or not applicable.
State.FORM_PF: Collects Form PF regulatory fields. This state is only entered for US-based investors and is skipped entirely for all others.
State.OPTIONAL_FIELDS: Offers to collect optional fields one by one. The investor can skip any or all of them; each skip is recorded.
State.REVIEW: Presents a summary of all collected data and asks the investor to confirm accuracy or flag corrections before finalising.
State.CORRECTION: Handles correction requests from the REVIEW state. Re-extracts specified fields and returns to REVIEW when done.
State.FILL_PDF: Triggers the mapper module to fill the blank PDF with all collected data. It runs in a background thread and does not block the conversation.
State.WAITING_FOR_FILL: Polls the fill workflow status. Informs the investor when the PDF is ready, or reports an error if the fill fails.
State.COMPLETE: Session is marked complete. All collected data is saved as structured JSON. The fill report is written to disk.
State.ERROR: Entered when an unrecoverable error occurs (LLM failure, storage failure). Logs the error and gracefully ends the session.
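The routing described above can be sketched as a pure transition function, assuming the states are visited in the order they are listed. State mirrors the documented state names, but next_state, run, LINEAR and the session keys (missing, country, confirmed, fill_status, fatal_error) are hypothetical; this is not the SDK's StateRouter or BaseHandler API. Sending a failed fill to ERROR is also an assumption, since the description above only says the failure is reported.

```python
from enum import Enum, auto


class State(Enum):
    """The 13 documented states (names mirror the SDK; values are arbitrary)."""
    GREETING = auto()
    INVESTOR_TYPE = auto()
    MANDATORY_FIELDS = auto()
    ADDRESS_COPY = auto()
    CHECKBOX_GROUPS = auto()
    FORM_PF = auto()
    OPTIONAL_FIELDS = auto()
    REVIEW = auto()
    CORRECTION = auto()
    FILL_PDF = auto()
    WAITING_FOR_FILL = auto()
    COMPLETE = auto()
    ERROR = auto()


TERMINAL = {State.COMPLETE, State.ERROR}

# States that always move to one fixed successor (hypothetical ordering).
LINEAR = {
    State.GREETING: State.INVESTOR_TYPE,
    State.INVESTOR_TYPE: State.MANDATORY_FIELDS,
    State.MANDATORY_FIELDS: State.ADDRESS_COPY,
    State.ADDRESS_COPY: State.CHECKBOX_GROUPS,
    State.FORM_PF: State.OPTIONAL_FIELDS,
    State.OPTIONAL_FIELDS: State.REVIEW,
    State.CORRECTION: State.REVIEW,
    State.FILL_PDF: State.WAITING_FOR_FILL,
}


def next_state(state: State, session: dict) -> State:
    """Hypothetical deterministic transition function; not the SDK's StateRouter."""
    if state in TERMINAL:
        return state
    if session.get("fatal_error"):  # e.g. LLM or storage failure
        return State.ERROR
    if state is State.MANDATORY_FIELDS and session.get("missing"):
        return State.MANDATORY_FIELDS  # retry until every mandatory field is filled
    if state is State.CHECKBOX_GROUPS:
        # Form PF is only entered for US-based investors.
        return State.FORM_PF if session.get("country") == "US" else State.OPTIONAL_FIELDS
    if state is State.REVIEW:
        return State.FILL_PDF if session.get("confirmed") else State.CORRECTION
    if state is State.WAITING_FOR_FILL:
        status = session.get("fill_status")
        if status == "done":
            return State.COMPLETE
        # Assumption: a failed fill ends the session in ERROR; otherwise keep polling.
        return State.ERROR if status == "failed" else State.WAITING_FOR_FILL
    return LINEAR[state]


def run(session: dict, start: State = State.GREETING, max_steps: int = 50) -> list:
    """Follow transitions from start until a terminal state (or max_steps)."""
    path = [start]
    while path[-1] not in TERMINAL and len(path) < max_steps:
        path.append(next_state(path[-1], session))
    return path
```

Keeping the transitions in one deterministic function makes the investor-type routing (FORM_PF only for US investors) and the REVIEW/CORRECTION loop easy to unit-test without calling an LLM.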
Extending the state machine: … BaseHandler, registering the handler in StateRouter.build(), and routing to it from an existing handler. See the Advanced page for a complete step-by-step walkthrough.
Explore the AI SDK
Start with Installation if this is your first time. Each page below covers one topic in depth — jump directly to what you need.
pip install with extras, pdf-autofillr setup, configure .env for all modules, ragpdf init-vectors, verify setup, and run your first fill.
Full reference for mapper_config.ini and .env — every option for all four modules, required vs optional, and defaults.
All HTTP endpoints for all four servers — chatbot (8000), doc_upload (8001), mapper (8002), and RAG (8003) — with request/response schemas.
Use any module directly in your Python code without running a server. chatbotClient, storage backends, FormConfig, and PDF filler interfaces.
Running all four servers, cloud storage (AWS/Azure/GCP), data folder layout, rate limiting, telemetry, custom PromptBuilder, and extending the state machine.
Every startup, import, API, and config error with exact fixes. Complete setup scenarios and a troubleshooting quick-reference table.
PDFFILLR.AI
The intelligent layer for modern fund administration. Automating high-stakes documentation with precision and speed.