AI SDK — v1.1.1

pdf-autofillr SDK

A full-stack Python SDK for intelligent, automated PDF form filling. Install once, configure once, and fill PDF forms from two directions — through a natural-language chatbot conversation or by uploading an existing document. Four cooperative modules handle every step: data collection, document parsing, field mapping, and predictive improvement with RAG.

Python 3.10+ · GPT-4o-mini powered · chatbot + doc_upload + mapper + RAG · REST API + CLI · AWS · Azure · GCP · v1.1.1

1. What This SDK Does

pdf-autofillr automates the full lifecycle of PDF form filling — from collecting investor data to delivering a completed document. It gives you two input paths (conversational chatbot or document upload), a shared PDF filling engine (mapper), and an optional RAG layer that makes field predictions smarter after every fill.

The SDK is not a generic AI framework. It is purpose-built for regulated financial onboarding: it understands ten investor types, handles address copy shortcuts, manages three-state boolean checkbox groups, filters Form PF fields for US investors, and tracks mandatory vs optional fields per investor type — all configured through JSON files without touching Python.
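A small sketch can make three of these behaviours concrete: the three-state checkbox value, the address copy shortcut, and the Form PF filter. Everything in it is an assumption made for illustration. The `Tri` enum, the field names, the schema layout, and both helper functions are hypothetical and do not reflect the SDK's actual config schema or API.

```python
from enum import Enum


class Tri(Enum):
    """Three-state checkbox value: ticked, unticked, or not applicable."""
    TRUE = "true"
    FALSE = "false"
    NOT_APPLICABLE = "n/a"


def copy_address(data: dict, src: str = "registered", dst: str = "mailing") -> dict:
    """Copy every '<src>_*' address sub-field to the matching '<dst>_*' key."""
    out = dict(data)
    for key, value in data.items():
        if key.startswith(f"{src}_"):
            out[f"{dst}_{key[len(src) + 1:]}"] = value
    return out


def applicable_fields(fields: list[dict], investor_type: str, is_us: bool) -> list[str]:
    """Keep fields for this investor type; drop Form PF fields for non-US investors."""
    names = []
    for field in fields:
        if investor_type not in field["investor_types"]:
            continue
        if field.get("form_pf", False) and not is_us:
            continue
        names.append(field["name"])
    return names


# Hypothetical two-field schema, for illustration only.
schema = [
    {"name": "legal_name", "investor_types": ["Individual", "LLC"]},
    {"name": "pf_assets", "investor_types": ["LLC"], "form_pf": True},
]
filled = copy_address({"registered_city": "Austin", "registered_zip": "73301"})
print(filled["mailing_city"], applicable_fields(schema, "LLC", is_us=False))
```

In the SDK itself these rules live in the JSON config files rather than in Python; the sketch only shows the kind of logic those files drive.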

Conversational data collection
The chatbot module runs a 13-state multi-turn conversation, collects all KYC fields through natural language using GPT-4o-mini, and returns a structured JSON output, with or without PDF filling.
chatbot module · 13 states

Document-based extraction
The doc_upload module accepts PDF, Word, or Excel files, uses an LLM to extract field values matching your form schema, and passes them to the mapper. No conversation is needed.
doc_upload module · any document format

Semantic PDF filling
The mapper module reads your blank PDF form, identifies every fillable field, semantically maps collected data to those fields, embeds the values, and writes the completed PDF. Used by both input modules.
mapper module · extract → map → embed → fill

Self-improving RAG predictions
The RAG module ships with 137 pre-loaded investor field vectors. It predicts field values before each session starts, then records corrections after every fill, improving accuracy automatically over time.
rag module · 137 pre-loaded vectors

2. The Four Modules

Each module is independent and exposes its own CLI and REST API server. Install any combination using pip extras. All four share the same config folder, data folder, and cloud storage backend.

Module | Role | CLI | Server port
chatbot | Conversational LLM chat → collects data → fills PDF | chatbot-cli | 8000
doc_upload | Upload any document → extract fields → fill PDF | doc-upload-cli | 8001
mapper | Read blank PDF → map fields → embed values → write PDF | pdf-mapper | 8002
rag | Vector predictions, learning from every fill | ragpdf | 8003
Mapper and RAG run in-process by default
You do not need to start pdf-mapper-server or ragpdf-server separately. By default both run inside the calling module's process. Start them as standalone servers only when you need to scale them independently or share them across multiple services.

3. Quick Start — 7-Step Flow

From a fresh Python environment to a running, PDF-filling system in seven steps. Follow these in order the first time — they only take a few minutes.

terminal
# Step 1 — Install (full stack or pick what you need)
pip install "pdf-autofillr[all]"

# Step 2 — Scaffold project: configs/, data/, .env.example
pdf-autofillr setup

# Step 3 — Copy and fill in your secrets
cp .env.example .env
# Edit .env — minimum: OPENAI_API_KEY=sk-...

# Step 4 — Drop your blank PDF form
# data/input/blank_form.pdf

# Step 5 — Initialize RAG (if RAG_ENABLED=true)
ragpdf init-vectors

# Step 6 — Verify all modules are correctly configured
pdf-autofillr status
ragpdf system-info

# Step 7 — Run chatbot or doc-upload
chatbot-cli --pdf-path data/input/blank_form.pdf --report
doc-upload-cli --document data/input/Avery.pdf --schema configs/form_keys.json --report
Step 1: Install

Use pip install "pdf-autofillr[all]" for the full stack, or pick a specific extras combination for just the modules you need.

Step 2: Run the setup wizard

pdf-autofillr setup detects installed modules and creates configs/, data/, and .env.example, giving you a fully scaffolded project in one command.

Step 3: Configure .env

Copy .env.example to .env and fill in your secrets. The minimum required variable is OPENAI_API_KEY. Set CHATBOT_PDF_PATH and DOC_UPLOAD_PDF_PATH to enable PDF filling.

Step 4: Add your blank PDF

Drop your blank PDF form at data/input/blank_form.pdf. This is the template the mapper fills. Make sure the path matches what you set in .env.

Step 5: Initialize RAG vectors

If RAG_ENABLED=true, run ragpdf init-vectors once. This loads the 137 bundled vectors into the local database. Re-run with --force if you update the vector source file.

Step 6: Verify setup

Run pdf-autofillr status to confirm all modules are correctly configured. Run ragpdf system-info to check the vector database state. Fix any reported issues before running.

Step 7: Run chatbot or doc-upload

Use chatbot-cli for a conversational session. Use doc-upload-cli to extract fields from an existing document. Add --report to either command for a detailed fill summary.
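Before the first run, a short pre-flight script can catch the two most common omissions from the steps above: a missing OPENAI_API_KEY and a missing blank form. This is an illustrative sketch that uses only the standard library; `read_env` and `preflight` are hypothetical helpers, and the script complements `pdf-autofillr status` rather than replacing it.

```python
from pathlib import Path


def read_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(project: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    env = read_env(project / ".env")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing from .env")
    if not (project / "data" / "input" / "blank_form.pdf").is_file():
        problems.append("data/input/blank_form.pdf not found")
    return problems


if __name__ == "__main__":
    issues = preflight(Path.cwd())
    for issue in issues:
        print("-", issue)
    if not issues:
        print("Pre-flight checks passed.")
```

Run it from the project root created by `pdf-autofillr setup`.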


4. Which Install Is Right for You?

Use pip extras to install only the modules you need. Every combination includes the mapper module automatically — it is the shared engine used by both chatbot and doc_upload for PDF filling.

My goal | Install command | Modules included
Full stack (everything) | pip install "pdf-autofillr[all]" | chatbot + doc_upload + mapper + RAG
Chatbot only (conversation → PDF) | pip install "pdf-autofillr[chatbot]" | chatbot + mapper
Document upload only (doc → PDF) | pip install "pdf-autofillr[doc-upload]" | doc_upload + mapper
Chatbot + predictive suggestions | pip install "pdf-autofillr[chatbot,rag]" | chatbot + mapper + RAG
Doc upload + predictive suggestions | pip install "pdf-autofillr[doc-upload,rag]" | doc_upload + mapper + RAG
Customise source code / add states | Clone repo + pip install -r requirements-full.txt | All (full source, editable)
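The rule behind the table is simple: each extra adds its own module, and the mapper comes along with every combination. The sketch below restates that rule in code so you can check what a given extras string implies. `EXTRA_MODULES`, `modules_for`, and `install_command` are hypothetical helpers written for this page, not part of the package.

```python
# Modules each extra adds, read off the install table above.
EXTRA_MODULES = {
    "chatbot": {"chatbot"},
    "doc-upload": {"doc_upload"},
    "rag": {"rag"},
    "all": {"chatbot", "doc_upload", "rag"},
}


def modules_for(extras: str) -> list[str]:
    """Resolve a comma-separated extras string to the modules it implies."""
    modules = {"mapper"}  # the mapper is included with every combination
    for extra in filter(None, (e.strip() for e in extras.split(","))):
        if extra not in EXTRA_MODULES:
            raise ValueError(f"unknown extra: {extra!r}")
        modules |= EXTRA_MODULES[extra]
    return sorted(modules)


def install_command(extras: str) -> str:
    """Build the pip command for a given extras string."""
    return f'pip install "pdf-autofillr[{extras}]"'


print(install_command("chatbot,rag"), "->", modules_for("chatbot,rag"))
```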
pdf-autofillr setup does the heavy lifting
After any install, run pdf-autofillr setup once. It detects installed modules, creates the full folder tree, and writes configs/form_keys.json, configs/mapper_config.ini, and .env.example. Then run pdf-autofillr status to confirm every module is ready.
LLM Lite — use short, simple folder paths
If you are using LLM Lite and encountering errors during installation or setup, the most common cause is a long or deeply nested working directory path. Move your project to a short, top-level folder (for example C:\pdf-fillr on Windows or ~/pdf-fillr on macOS/Linux) and re-run pdf-autofillr setup. Avoid spaces and special characters in the path as well.

5. The Chatbot Module — 13-State Conversation

The chatbot module drives a finite state machine. Each state is a distinct phase of the investor onboarding flow. The machine moves forward deterministically — it never asks for information it already has, retries gracefully when extraction fails, and routes differently based on investor type.

01. GREETING (State.GREETING)

Bot introduces itself and asks the investor if they are ready to begin the onboarding process.

02. INVESTOR_TYPE (State.INVESTOR_TYPE)

Identifies the investor type: Individual, LLC, Trust, Partnership, Corporation, or one of the other supported types. All subsequent field routing depends on this.

03. MANDATORY_FIELDS (State.MANDATORY_FIELDS)

GPT-4o-mini extracts all mandatory KYC fields for the selected investor type. Retries gracefully when a turn yields only partial data.

04. ADDRESS_COPY (State.ADDRESS_COPY)

Asks whether the mailing address matches the registered address. Copies all address sub-fields automatically when the investor says yes.

05. CHECKBOX_GROUPS (State.CHECKBOX_GROUPS)

Collects boolean checkbox fields (PEP declarations, ERISA, FATCA) using the three-state system: true, false, or not applicable.

06. FORM_PF (State.FORM_PF)

Collects Form PF regulatory fields. This state is entered only for US-based investors and is skipped entirely for all others.

07. OPTIONAL_FIELDS (State.OPTIONAL_FIELDS)

Offers to collect optional fields one by one. The investor can skip any or all of them; each skip is recorded.

08. REVIEW (State.REVIEW)

Presents a summary of all collected data and asks the investor to confirm accuracy or flag corrections before finalising.

09. CORRECTION (State.CORRECTION)

Handles correction requests from the REVIEW state. Re-extracts specified fields and returns to REVIEW when done.

10. FILL_PDF (State.FILL_PDF)

Triggers the mapper module to fill the blank PDF with all collected data. Runs in a background thread, so it does not block the conversation.

11. WAITING_FOR_FILL (State.WAITING_FOR_FILL)

Polls the fill workflow status. Informs the investor when the PDF is ready, or reports an error if the fill fails.

12. COMPLETE (State.COMPLETE)

Session is marked complete. All collected data is saved as structured JSON. The fill report is written to disk.

13. ERROR (State.ERROR)

Entered when an unrecoverable error occurs (LLM failure, storage failure). Logs the error and gracefully ends the session.
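The transitions described in the thirteen states above can be sketched as a single routing function. This is a minimal illustration under stated assumptions, not the SDK's router: the `State` enum reuses the state names from this page, but `next_state` and the context keys (`is_us`, `missing_mandatory`, `wants_correction`, `fill_done`, `fatal_error`) are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    INVESTOR_TYPE = auto()
    MANDATORY_FIELDS = auto()
    ADDRESS_COPY = auto()
    CHECKBOX_GROUPS = auto()
    FORM_PF = auto()
    OPTIONAL_FIELDS = auto()
    REVIEW = auto()
    CORRECTION = auto()
    FILL_PDF = auto()
    WAITING_FOR_FILL = auto()
    COMPLETE = auto()
    ERROR = auto()


def next_state(state: State, ctx: dict) -> State:
    """Pick the next state from the current one and the session context."""
    if state in (State.COMPLETE, State.ERROR):
        return state  # terminal states
    if ctx.get("fatal_error"):
        return State.ERROR
    if state is State.MANDATORY_FIELDS and ctx.get("missing_mandatory"):
        return State.MANDATORY_FIELDS  # retry until nothing mandatory is missing
    if state is State.CHECKBOX_GROUPS and not ctx.get("is_us"):
        return State.OPTIONAL_FIELDS  # Form PF is skipped for non-US investors
    if state is State.REVIEW:
        return State.CORRECTION if ctx.get("wants_correction") else State.FILL_PDF
    if state is State.CORRECTION:
        return State.REVIEW
    if state is State.WAITING_FOR_FILL:
        return State.COMPLETE if ctx.get("fill_done") else State.WAITING_FOR_FILL
    order = list(State)  # every other state advances to the next one in order
    return order[order.index(state) + 1]


# Walk a non-US session from GREETING to REVIEW.
state, path = State.GREETING, [State.GREETING.name]
while state is not State.REVIEW:
    state = next_state(state, {"is_us": False})
    path.append(state.name)
print(" -> ".join(path))
```

Keeping the routing in one pure function makes each transition easy to unit-test without an LLM in the loop.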

States are fully extensible
Add new states by subclassing BaseHandler, registering the handler in StateRouter.build(), and routing to it from an existing handler. See the Advanced page for a complete step-by-step walkthrough.
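The three steps (subclass, register, route) might look roughly like the sketch below. The real signatures of BaseHandler and StateRouter.build() are not shown on this page, so the classes here are self-contained stand-ins that only borrow those names. The `handle` method, the `dispatch` method, the `name` attribute, and the TAX_RESIDENCY state are all hypothetical.

```python
class BaseHandler:
    """Stand-in base class (hypothetical): one handler per state."""
    name = "BASE"

    def handle(self, message: str, ctx: dict) -> str:
        """Process one user turn, update ctx, and return the next state's name."""
        raise NotImplementedError


class OptionalFieldsHandler(BaseHandler):
    """Existing handler, edited to route into the new state (step 3)."""
    name = "OPTIONAL_FIELDS"

    def handle(self, message: str, ctx: dict) -> str:
        return "TAX_RESIDENCY"


class TaxResidencyHandler(BaseHandler):
    """New state added by subclassing the base handler (step 1)."""
    name = "TAX_RESIDENCY"

    def handle(self, message: str, ctx: dict) -> str:
        ctx["tax_residency"] = message.strip().upper()
        return "REVIEW"


class StateRouter:
    """Stand-in router (hypothetical): maps state names to handlers."""

    def __init__(self, handlers: list[BaseHandler]) -> None:
        self.handlers = {h.name: h for h in handlers}

    @classmethod
    def build(cls) -> "StateRouter":
        # Step 2: register the new handler alongside the existing ones.
        return cls([OptionalFieldsHandler(), TaxResidencyHandler()])

    def dispatch(self, state: str, message: str, ctx: dict) -> str:
        return self.handlers[state].handle(message, ctx)


router = StateRouter.build()
ctx: dict = {}
nxt = router.dispatch("OPTIONAL_FIELDS", "skip the rest", ctx)
nxt = router.dispatch(nxt, " us ", ctx)
print(nxt, ctx)
```

Treat this as a map of where the changes go; the Advanced page remains the reference for the actual class interfaces.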


PDFFILLR.AI

The intelligent layer for modern fund administration. Automating high-stakes documentation with precision and speed.

Powered by EngineersMind