
AI SDK — v1.1.1

pdf-autofillr SDK

A full-stack Python SDK for intelligent, automated PDF form filling. Install once, configure once, and fill PDF forms from two directions — through a natural-language chatbot conversation or by uploading an existing document. Four cooperative modules handle every step: data collection, document parsing, field mapping, and predictive improvement with RAG.

Python 3.10+ | GPT-4o-mini powered | chatbot + doc_upload + mapper + RAG | REST API + CLI | AWS · Azure · GCP | v1.1.1

1. What This SDK Does

pdf-autofillr automates the full lifecycle of PDF form filling — from collecting investor data to delivering a completed document. It gives you two input paths (conversational chatbot or document upload), a shared PDF filling engine (mapper), and an optional RAG layer that makes field predictions smarter after every fill.

The SDK is not a generic AI framework. It is purpose-built for regulated financial onboarding: it understands ten investor types, handles address copy shortcuts, manages three-state boolean checkbox groups, filters Form PF fields for US investors, and tracks mandatory vs optional fields per investor type — all configured through JSON files without touching Python.
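The per-investor-type field tracking, three-state checkboxes, and address copy shortcut described above can be sketched in plain Python. This is a minimal illustration, not the SDK's code: the JSON shape, the field names (full_name, registered_city, and so on), and the helpers missing_mandatory, parse_checkbox, and copy_registered_to_mailing are all hypothetical.

```python
import json
from enum import Enum

# Hypothetical field config; the SDK's real JSON files under configs/ may use
# a different shape and different field names.
FIELD_CONFIG = json.loads("""
{
  "Individual": {"mandatory": ["full_name", "date_of_birth", "tax_id"],
                 "optional": ["phone"]},
  "Trust": {"mandatory": ["trust_name", "trustee_name", "tax_id"],
            "optional": ["website"]}
}
""")


def missing_mandatory(investor_type, collected):
    """Return the mandatory fields for this investor type that are still empty."""
    required = FIELD_CONFIG[investor_type]["mandatory"]
    return [field for field in required if not collected.get(field)]


class TriState(Enum):
    """Three-state checkbox value: checked, unchecked, or not applicable."""
    TRUE = "true"
    FALSE = "false"
    NOT_APPLICABLE = "not_applicable"


def parse_checkbox(answer):
    """Map a free-text answer to a TriState (hypothetical vocabulary)."""
    text = answer.strip().lower()
    if text in {"yes", "y", "true"}:
        return TriState.TRUE
    if text in {"no", "n", "false"}:
        return TriState.FALSE
    if text in {"n/a", "na", "not applicable"}:
        return TriState.NOT_APPLICABLE
    raise ValueError(f"unrecognised checkbox answer: {answer!r}")


# Hypothetical address sub-field suffixes.
ADDRESS_PARTS = ("street", "city", "state", "postal_code", "country")


def copy_registered_to_mailing(collected):
    """Address copy shortcut: fill each mailing_* sub-field from registered_*."""
    result = dict(collected)
    for part in ADDRESS_PARTS:
        result[f"mailing_{part}"] = collected.get(f"registered_{part}")
    return result
```

For example, missing_mandatory("Individual", {"full_name": "Avery"}) returns ["date_of_birth", "tax_id"]. Keeping the per-type lists in JSON is what lets the field set change without touching Python.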

Conversational data collection

The chatbot module runs a 13-state multi-turn conversation, collects all KYC fields through natural language using GPT-4o-mini, and returns a structured JSON output, with or without PDF filling.

chatbot module · 13 states

Document-based extraction

The doc_upload module accepts PDF, Word, or Excel files, uses an LLM to extract field values matching your form schema, and passes them to the mapper. No conversation is needed.

doc_upload module · any document format
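One step worth making concrete is the hand-off between the LLM and the mapper: whatever the model returns has to be reconciled with the form schema. A minimal sketch, assuming the LLM replies with a JSON object and the schema is a flat list of field keys (the real format of configs/form_keys.json may differ); reconcile_with_schema is a hypothetical helper and the LLM call itself is not shown.

```python
import json


def reconcile_with_schema(llm_reply, schema_keys):
    """Split an LLM's JSON reply into schema-matched values, unknown keys, and gaps.

    Assumes llm_reply is a JSON object string and schema_keys is a flat list of
    field names; both are simplifications for illustration.
    """
    extracted = json.loads(llm_reply)
    # Keep only non-empty values for fields the form actually has.
    matched = {key: value for key, value in extracted.items()
               if key in schema_keys and value not in (None, "")}
    # Keys the model returned that the schema does not define.
    unknown = sorted(set(extracted) - set(schema_keys))
    # Schema fields the document did not supply, in schema order.
    missing = [key for key in schema_keys if key not in matched]
    return matched, unknown, missing
```

Only the matched dictionary would be passed on for filling; reporting unknown and missing keys separately makes extraction gaps visible instead of silently dropping them.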

Semantic PDF filling

The mapper module reads your blank PDF form, identifies every fillable field, semantically maps collected data to those fields, embeds the values, and writes the completed PDF. Used by both input modules.

mapper module · extract → map → embed → fill
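The map stage can be pictured as matching each collected key to the closest PDF field name. The sketch below swaps in plain string similarity from Python's difflib for the SDK's semantic mapping, which is a much weaker technique; it only illustrates the shape of the step. normalize, map_fields, and the 0.6 cutoff are hypothetical, and the embed and fill stages, which need a PDF library, are omitted.

```python
import difflib
import re


def normalize(name):
    """Lower-case a field name and collapse non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def map_fields(collected, pdf_field_names, cutoff=0.6):
    """Map collected data keys onto PDF field names by string similarity.

    Returns (mapping, unmatched): mapping is {pdf_field_name: value} for the
    later embed/fill stages; unmatched lists keys with no close enough field.
    """
    by_normalized = {normalize(name): name for name in pdf_field_names}
    candidates = list(by_normalized)
    mapping, unmatched = {}, []
    for key, value in collected.items():
        best = difflib.get_close_matches(normalize(key), candidates, n=1, cutoff=cutoff)
        if best:
            mapping[by_normalized[best[0]]] = value
        else:
            unmatched.append(key)
    return mapping, unmatched
```

Here "investor_name" matches a PDF field labelled "Investor Name", but an abbreviation such as "dob" does not match "Date of Birth". That gap is exactly why a semantic (meaning-based) mapper is used instead of string matching.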

Self-improving RAG predictions

The RAG module ships with 137 pre-loaded investor field vectors. It predicts field values before each session starts, then records corrections after every fill, improving accuracy automatically over time.

rag module · 137 pre-loaded vectors
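The predict-then-learn loop can be illustrated with a toy in-memory store. This sketch assumes each vector maps a PDF field label to a canonical investor field key, and it uses character-trigram counts with cosine similarity as a stand-in for real embeddings. ToyFieldStore, predict, record_correction, and the 0.5 threshold are hypothetical names and values, not the rag module's API.

```python
import math
from collections import Counter


def embed(text):
    """Character-trigram counts: a crude stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyFieldStore:
    """In-memory vector store: PDF field label -> canonical field key."""

    def __init__(self):
        self.entries = {}  # label -> (vector, canonical_key)

    def add(self, label, key):
        self.entries[label] = (embed(label), key)

    def predict(self, label, min_score=0.5):
        """Return the key of the most similar stored label, or None."""
        query = embed(label)
        best_key, best_score = None, 0.0
        for vector, key in self.entries.values():
            score = cosine(query, vector)
            if score > best_score:
                best_key, best_score = key, score
        return best_key if best_score >= min_score else None

    def record_correction(self, label, correct_key):
        """Learn from a fill: store (or overwrite) the corrected mapping."""
        self.add(label, correct_key)
```

The pre-loaded vectors play the role of the initial add() calls. Each correction recorded after a fill becomes another entry, so the next prediction for a similar label improves without retraining anything.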

2. The Four Modules

Each module is independent and exposes its own CLI and REST API server. Install any combination using pip extras. All four share the same config folder, data folder, and cloud storage backend.

Module | Role | CLI | Server port
chatbot | Conversational LLM chat → collects data → fills PDF | chatbot-cli | 8000
doc_upload | Upload any document → extract fields → fill PDF | doc-upload-cli | 8001
mapper | Read blank PDF → map fields → embed values → write PDF | pdf-mapper | 8002
rag | Vector predictions → learn from every fill | ragpdf | 8003

Mapper and RAG run in-process by default

You do not need to start pdf-mapper-server or ragpdf-server separately. By default both run inside the calling module's process. Start them as standalone servers only when you need to scale them independently or share them across multiple services.
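Because mapper and RAG normally run in-process, it can help to confirm which standalone servers, if any, are actually listening. A minimal standard-library sketch, assuming the servers listen on localhost at the default ports from the module table. running_servers is a hypothetical helper, and an open port only shows that something is listening there, not that it is the SDK.

```python
import socket

# Default ports from the module table; the host is an assumption.
DEFAULT_PORTS = {"chatbot": 8000, "doc_upload": 8001, "mapper": 8002, "rag": 8003}


def running_servers(host="127.0.0.1", ports=DEFAULT_PORTS, timeout=0.5):
    """Return {module: True/False} depending on whether a TCP connect succeeds."""
    status = {}
    for module, port in ports.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            status[module] = sock.connect_ex((host, port)) == 0
    return status
```

In a default in-process setup, only the module you started (for example chatbot on 8000) would be expected to report True.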

3. Quick Start — 7-Step Flow

From a fresh Python environment to a running, PDF-filling system in seven steps. Follow these in order the first time — they only take a few minutes.

terminal

# Step 1 — Install (full stack or pick what you need)
pip install "pdf-autofillr[all]"

# Step 2 — Scaffold project: configs/, data/, .env.example
pdf-autofillr setup

# Step 3 — Copy and fill in your secrets
cp .env.example .env
# Edit .env — minimum: OPENAI_API_KEY=sk-...

# Step 4 — Drop your blank PDF form
# data/input/blank_form.pdf

# Step 5 — Initialize RAG (if RAG_ENABLED=true)
ragpdf init-vectors

# Step 6 — Verify all modules are correctly configured
pdf-autofillr status
ragpdf system-info

# Step 7 — Run chatbot or doc-upload
chatbot-cli --pdf-path data/input/blank_form.pdf --report
doc-upload-cli --document data/input/Avery.pdf --schema configs/form_keys.json --report

Install

Use pip install "pdf-autofillr[all]" for the full stack, or pick a specific extras combination for just the modules you need.

Run the setup wizard

pdf-autofillr setup detects installed modules and creates configs/, data/, and .env.example — a fully scaffolded project in one command.

Configure .env

Copy .env.example to .env and fill in your secrets. The minimum required variable is OPENAI_API_KEY. Set chatbot_PDF_PATH and DOC_UPLOAD_PDF_PATH to enable PDF filling.

Add your blank PDF

Drop your blank PDF form at data/input/blank_form.pdf. This is the template the mapper fills. Make sure the path matches what you set in .env.

Initialize RAG vectors

If RAG_ENABLED=true, run ragpdf init-vectors once. This loads the 137 bundled vectors into the local database. Re-run with --force if you update the vector source file.

Verify setup

Run pdf-autofillr status to confirm all modules are correctly configured. Run ragpdf system-info to check the vector database state. Fix any reported issues before running.

Run chatbot or doc-upload

Use chatbot-cli for a conversational session. Use doc-upload-cli to extract fields from an existing document. Add --report to both commands for a detailed fill summary.
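Steps 3 and 4 are the ones most easily missed. The sketch below is a rough pre-check to run before pdf-autofillr status. It assumes the default paths from this page (.env in the project root, data/input/blank_form.pdf) and uses a deliberately simplified .env parser that ignores inline comments. parse_env and preflight are hypothetical helpers, not part of the SDK.

```python
from pathlib import Path


def parse_env(text):
    """Minimal KEY=VALUE parser: skips blanks and comments, strips quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def preflight(project_root="."):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(project_root)
    problems = []
    env_file = root / ".env"
    if not env_file.is_file():
        problems.append(".env not found: copy .env.example to .env")
    elif not parse_env(env_file.read_text()).get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing or empty in .env")
    if not (root / "data" / "input" / "blank_form.pdf").is_file():
        problems.append("blank PDF not found at data/input/blank_form.pdf")
    return problems
```

Running print(preflight()) from the project root lists anything to fix; pdf-autofillr status remains the authoritative check.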

4. Which Install Is Right for You?

Use pip extras to install only the modules you need. Every combination includes the mapper module automatically — it is the shared engine used by both chatbot and doc_upload for PDF filling.

My goal | Install command | Modules included
Full stack (everything) | pip install "pdf-autofillr[all]" | chatbot + doc_upload + mapper + RAG
Chatbot only (conversation → PDF) | pip install "pdf-autofillr[chatbot]" | chatbot + mapper
Document upload only (doc → PDF) | pip install "pdf-autofillr[doc-upload]" | doc_upload + mapper
Chatbot + predictive suggestions | pip install "pdf-autofillr[chatbot,rag]" | chatbot + mapper + RAG
Doc upload + predictive suggestions | pip install "pdf-autofillr[doc-upload,rag]" | doc_upload + mapper + RAG
Customise source code / add states | Clone repo + pip install -r requirements-full.txt | All (full source editable)
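The table reduces to a simple rule: list the extras you want, or use all when you want everything. A hypothetical helper that encodes that rule is sketched below. The chatbot plus doc_upload combination without RAG is not a row in the table, so the command it produces for that case relies on the assumption that pip combines the two extras as usual.

```python
def install_command(chatbot=False, doc_upload=False, rag=False):
    """Build the pip command for a module combination (mapper is always included)."""
    if chatbot and doc_upload and rag:
        return 'pip install "pdf-autofillr[all]"'
    if not (chatbot or doc_upload):
        raise ValueError("choose chatbot and/or doc_upload; the table has no RAG-only row")
    wanted = (("chatbot", chatbot), ("doc-upload", doc_upload), ("rag", rag))
    extras = [name for name, selected in wanted if selected]
    return f'pip install "pdf-autofillr[{",".join(extras)}]"'
```

Note the spelling difference the table already shows: the extra is doc-upload (hyphen) while the module is doc_upload (underscore).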

pdf-autofillr setup does the heavy lifting

After any install, run pdf-autofillr setup once. It detects installed modules, creates the full folder tree, writes configs/form_keys.json, configs/mapper_config.ini, and .env.example. Then pdf-autofillr status confirms every module is ready.

LLM Lite — use short, simple folder paths

If you are using LLM Lite and encountering errors during installation or setup, the most common cause is a long or deeply nested working directory path. Move your project to a short, top-level folder (for example C:\pdf-fillr on Windows or ~/pdf-fillr on macOS/Linux) and re-run pdf-autofillr setup. Avoid spaces and special characters in the path as well.
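The folder-path advice can be turned into a quick check. This is a hedged sketch: the documentation gives no exact length limit, so the 40-character threshold below is an assumption, as is the set of characters treated as safe. path_warnings is a hypothetical helper.

```python
import re

# Assumed limit; the docs only say "short", not a specific length.
MAX_PATH_CHARS = 40
# Anything other than letters, digits, and common path punctuation is flagged (assumption).
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.\-/\\:~ ]")


def path_warnings(path, max_chars=MAX_PATH_CHARS):
    """Return reasons why a project folder path may cause setup trouble."""
    text = str(path)
    warnings = []
    if len(text) > max_chars:
        warnings.append(f"path is {len(text)} characters long (over {max_chars})")
    if " " in text:
        warnings.append("path contains spaces")
    unsafe = sorted(set(UNSAFE_CHARS.findall(text)))
    if unsafe:
        warnings.append("path contains special characters: " + "".join(unsafe))
    return warnings
```

Both suggested locations, C:\pdf-fillr and ~/pdf-fillr, produce no warnings; call it with os.getcwd() to check the current project folder.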

5. The Chatbot Module — 13-State Conversation

The chatbot module drives a finite state machine. Each state is a distinct phase of the investor onboarding flow. The machine moves forward deterministically — it never asks for information it already has, retries gracefully when extraction fails, and routes differently based on investor type.
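The routing rules in this paragraph and in the state list below can be summarised as a transition function. This is only a sketch, assuming a plain-dict session with hypothetical keys (country, missing_mandatory, corrections, fill_done, fatal_error). The State enum merely mirrors the documented names; the real handlers decide transitions internally.

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    INVESTOR_TYPE = auto()
    MANDATORY_FIELDS = auto()
    ADDRESS_COPY = auto()
    CHECKBOX_GROUPS = auto()
    FORM_PF = auto()
    OPTIONAL_FIELDS = auto()
    REVIEW = auto()
    CORRECTION = auto()
    FILL_PDF = auto()
    WAITING_FOR_FILL = auto()
    COMPLETE = auto()
    ERROR = auto()


# Unconditional forward edges; conditional ones are handled in next_state().
FORWARD = {
    State.GREETING: State.INVESTOR_TYPE,
    State.INVESTOR_TYPE: State.MANDATORY_FIELDS,
    State.MANDATORY_FIELDS: State.ADDRESS_COPY,
    State.ADDRESS_COPY: State.CHECKBOX_GROUPS,
    State.FORM_PF: State.OPTIONAL_FIELDS,
    State.OPTIONAL_FIELDS: State.REVIEW,
    State.CORRECTION: State.REVIEW,
    State.FILL_PDF: State.WAITING_FOR_FILL,
    State.WAITING_FOR_FILL: State.COMPLETE,
    State.COMPLETE: State.COMPLETE,
    State.ERROR: State.ERROR,
}


def next_state(state, session):
    """Pick the next state from the current one and a plain-dict session."""
    if session.get("fatal_error"):  # e.g. LLM or storage failure
        return State.ERROR
    if state is State.MANDATORY_FIELDS and session.get("missing_mandatory"):
        return State.MANDATORY_FIELDS  # partial extraction: ask only for what is missing
    if state is State.CHECKBOX_GROUPS:
        # Form PF questions only apply to US-based investors.
        return State.FORM_PF if session.get("country") == "US" else State.OPTIONAL_FIELDS
    if state is State.REVIEW:
        return State.CORRECTION if session.get("corrections") else State.FILL_PDF
    if state is State.WAITING_FOR_FILL and not session.get("fill_done"):
        return State.WAITING_FOR_FILL  # keep polling the fill status
    return FORWARD[state]
```

A non-US investor with no corrections passes straight from CHECKBOX_GROUPS to OPTIONAL_FIELDS and from REVIEW to FILL_PDF, never visiting FORM_PF or CORRECTION.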

GREETING (State.GREETING)

Bot introduces itself and asks the investor if they are ready to begin the onboarding process.

INVESTOR_TYPE (State.INVESTOR_TYPE)

Identifies the investor type: Individual, LLC, Trust, Partnership, Corporation, and the other supported investor types. All subsequent field routing depends on this choice.

MANDATORY_FIELDS (State.MANDATORY_FIELDS)

GPT-4o-mini extracts all mandatory KYC fields for the selected investor type. Retries gracefully when a turn yields only partial data.

ADDRESS_COPY (State.ADDRESS_COPY)

Asks whether the mailing address matches the registered address. Copies all address sub-fields automatically when the investor says yes.

CHECKBOX_GROUPS (State.CHECKBOX_GROUPS)

Collects boolean checkbox fields (PEP declarations, ERISA, FATCA) using the three-state system: true, false, or not applicable.

FORM_PF (State.FORM_PF)

Collects Form PF regulatory fields. This state is only entered for US-based investors — skipped entirely for all others.

OPTIONAL_FIELDS (State.OPTIONAL_FIELDS)

Offers to collect optional fields one by one. The investor can skip any or all of them — each skip is recorded.

REVIEW (State.REVIEW)

Presents a summary of all collected data and asks the investor to confirm accuracy or flag corrections before finalising.

CORRECTION (State.CORRECTION)

Handles correction requests from the REVIEW state. Re-extracts specified fields and returns to REVIEW when done.

FILL_PDF (State.FILL_PDF)

Triggers the mapper module to fill the blank PDF with all collected data. Runs in a background thread — does not block the conversation.

WAITING_FOR_FILL (State.WAITING_FOR_FILL)

Polls the fill workflow status. Informs the investor when the PDF is ready, or reports an error if the fill fails.

COMPLETE (State.COMPLETE)

Session is marked complete. All collected data is saved as structured JSON. The fill report is written to disk.

ERROR (State.ERROR)

Entered when an unrecoverable error occurs (LLM failure, storage failure). Logs the error and gracefully ends the session.
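FILL_PDF and WAITING_FOR_FILL split the fill into a background job and a status check. The sketch below shows that pattern with the standard threading module. FillJob and wait_for_fill are hypothetical, fill_fn stands in for the mapper call, and the blocking poll loop is a simplification, since a chat handler would more likely check the status once per turn.

```python
import threading
import time


class FillJob:
    """Run a fill function on a background thread and expose its status."""

    def __init__(self, fill_fn, data):
        self.status = "running"  # becomes "done" or "failed"
        self.result = None
        self.error = None
        self._thread = threading.Thread(target=self._run, args=(fill_fn, data), daemon=True)
        self._thread.start()

    def _run(self, fill_fn, data):
        try:
            self.result = fill_fn(data)
            self.status = "done"
        except Exception as exc:  # surface any fill failure to the poller
            self.error = exc
            self.status = "failed"


def wait_for_fill(job, poll_interval=0.1, timeout=30.0):
    """Poll the job until it finishes or the timeout passes; return a chat reply."""
    deadline = time.monotonic() + timeout
    while job.status == "running":
        if time.monotonic() >= deadline:
            return "Your PDF is still being filled; please check back shortly."
        time.sleep(poll_interval)
    if job.status == "done":
        return f"Your PDF is ready: {job.result}"
    return f"Sorry, the PDF fill failed: {job.error}"
```

Because the fill runs on its own thread, the conversation can keep responding while the mapper works, which is the non-blocking behaviour described for FILL_PDF.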

States are fully extensible

Add new states by subclassing BaseHandler, registering the handler in StateRouter.build(), and routing to it from an existing handler. See the Advanced page for a complete step-by-step walkthrough.
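The three extension steps can be illustrated with minimal stand-ins. Everything here is hypothetical: BaseHandler and StateRouter below are local classes that only borrow the names from this page, their methods (handle, dispatch, build) are assumptions rather than the SDK's signatures, and ACCREDITATION is an invented example state. The Advanced page documents the real interfaces.

```python
from abc import ABC, abstractmethod


class BaseHandler(ABC):
    """Local stand-in for the SDK's BaseHandler (real signature may differ)."""

    state = None  # name of the state this handler serves

    @abstractmethod
    def handle(self, message, session):
        """Process one user turn; return (reply_text, next_state_name)."""


class OptionalFieldsHandler(BaseHandler):
    state = "OPTIONAL_FIELDS"

    def handle(self, message, session):
        # Step 3: route to the new state instead of going straight to REVIEW.
        return "One more question: are you an accredited investor?", "ACCREDITATION"


class AccreditationHandler(BaseHandler):
    """Step 1: subclass the base handler for the new state."""

    state = "ACCREDITATION"  # invented example state

    def handle(self, message, session):
        session["accredited"] = message.strip().lower() in {"yes", "y"}
        return "Thanks, noted.", "REVIEW"


class StateRouter:
    """Local stand-in that maps state names to handler instances."""

    def __init__(self, handlers):
        self.handlers = {handler.state: handler for handler in handlers}

    @classmethod
    def build(cls):
        # Step 2: register the new handler alongside the existing ones.
        return cls([OptionalFieldsHandler(), AccreditationHandler()])

    def dispatch(self, state, message, session):
        return self.handlers[state].handle(message, session)
```

The new state only becomes reachable once an existing handler returns its name, which is why the routing change in step 3 matters as much as the registration in step 2.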

AI SDK documentation

Explore the AI SDK

Start with Installation if this is your first time. Each page below covers one topic in depth — jump directly to what you need.

Installation

pip install with extras, pdf-autofillr setup, configure .env for all modules, ragpdf init-vectors, verify setup, and run your first fill.


Configuration

Full reference for mapper_config.ini and .env — every option for all four modules, required vs optional, and defaults.


REST API

All HTTP endpoints for all four servers — chatbot (8000), doc_upload (8001), mapper (8002), and RAG (8003) — with request/response schemas.


Python Library

Use any module directly in your Python code without running a server. Covers chatbotClient, storage backends, FormConfig, and PDF filler interfaces.


Advanced

Running all four servers, cloud storage (AWS/Azure/GCP), data folder layout, rate limiting, telemetry, custom PromptBuilder, and extending the state machine.


Errors & Scenarios

Every startup, import, API, and config error with exact fixes. Complete setup scenarios and a troubleshooting quick-reference table.


PDFFILLR.AI

The intelligent layer for modern fund administration. Automating high-stakes documentation with precision and speed.

Powered by EngineersMind
