Redact PDF for AI
Redact PDF files before sending to AI.
PDFs are the worst-case file format for AI workflows: contracts, medical reports, invoices, and depositions all arrive as PDFs, and they're full of PII. Cypherz extracts text via poppler's `pdftotext` for native PDFs and falls back to Tesseract OCR for scanned pages. Either way, you get clean, tokenized output you can pass to any LLM.
01
Embedded + scanned
Native PDFs are extracted with poppler. Scanned PDFs are detected automatically by their low text density and run through OCR page by page.
02
All 16 detectors
Every detector — emails, names, IDs, money, dates — runs against the extracted text.
03
Redacted PDF download
Get a portable PDF artifact you can share with the team or send to your AI.
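The native-versus-scanned fallback described in the first card can be sketched roughly as follows. This is a minimal illustration, not Cypherz's implementation: the 50-character threshold, the 300 DPI setting, and all function names are assumptions, and it expects the `pdftotext`, `pdftoppm`, and `tesseract` command-line tools to be installed.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical threshold: pages with fewer non-whitespace characters than
# this are treated as scanned. Cypherz's real cutoff is not documented here.
MIN_CHARS_PER_PAGE = 50


def needs_ocr(page_text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when the embedded text layer is too sparse to trust."""
    dense = "".join(page_text.split())  # drop all whitespace
    return len(dense) < min_chars


def embedded_text(pdf: Path, page: int) -> str:
    """Extract one page's embedded text with poppler's pdftotext."""
    result = subprocess.run(
        ["pdftotext", "-f", str(page), "-l", str(page), str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ocr_text(pdf: Path, page: int, dpi: int = 300) -> str:
    """Rasterize one page with pdftoppm, then OCR it with Tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(
            ["pdftoppm", "-f", str(page), "-l", str(page), "-r", str(dpi),
             "-png", "-singlefile", str(pdf), str(prefix)],
            check=True,
        )
        result = subprocess.run(
            ["tesseract", str(prefix) + ".png", "stdout"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


def extract_page(pdf: Path, page: int) -> str:
    """Use the embedded text when it is dense enough, otherwise fall back to OCR."""
    text = embedded_text(pdf, page)
    return ocr_text(pdf, page) if needs_ocr(text) else text
```

Deciding per page rather than per file means a mixed document, such as a typed contract with a scanned signature page, only pays the OCR cost where it is needed.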
Upload via the REST API
curl -X POST https://api.cypherz.app/v1/files \
  -H "authorization: Bearer $CYPHERZ_KEY" \
  -F "file=@contract.pdf"
Common questions
Frequently asked.
What happens to my PDF file when I upload it?
Cypherz extracts text, detects PII, tokenizes it, and stores the original encrypted (AES-256-GCM, per-project key) for download. You get a tokenized extraction back immediately. Files can be deleted at any time and their encryption key destroyed.
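To make the detect-and-tokenize step concrete, here is a rough sketch using a single email detector. It is an illustration only: the regular expression is deliberately simplified, and the `[EMAIL_n]` token format and the `tokenize_emails` function are hypothetical, not Cypherz's actual output format or API.

```python
import re

# Simplified email pattern for illustration; a production detector is stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a stable placeholder token.

    Returns the tokenized text and a token -> original mapping. The
    [EMAIL_n] token format is hypothetical, not Cypherz's actual format.
    """
    token_for: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if value not in token_for:
            token_for[value] = f"[EMAIL_{len(token_for) + 1}]"
        return token_for[value]

    tokenized = EMAIL_RE.sub(replace, text)
    vault = {token: value for value, token in token_for.items()}
    return tokenized, vault
```

Reusing the same token for repeated values keeps the text coherent for the LLM: it can still tell that two mentions refer to the same person without ever seeing the address.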
What's the max file size?
25 MB on managed cloud. Configurable up to 100 MB on self-hosted deployments.
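A client can check the size locally before uploading. The sketch below assumes "MB" means decimal megabytes (1,000,000 bytes), which is the stricter reading; the function names are illustrative and the server remains the authority on the actual limit.

```python
import os

MANAGED_CLOUD_LIMIT_MB = 25   # managed cloud limit stated above
BYTES_PER_MB = 1_000_000      # assumption: decimal megabytes (the stricter reading)


def within_limit(size_bytes: int, limit_mb: int = MANAGED_CLOUD_LIMIT_MB) -> bool:
    """Return True if a file of size_bytes fits under the upload limit."""
    return size_bytes <= limit_mb * BYTES_PER_MB


def check_file(path: str, limit_mb: int = MANAGED_CLOUD_LIMIT_MB) -> None:
    """Raise ValueError before uploading a file that is too large."""
    size = os.path.getsize(path)
    if not within_limit(size, limit_mb):
        raise ValueError(f"{path} is {size} bytes, over the {limit_mb} MB limit")
```

On a self-hosted deployment, pass the configured value instead, for example `check_file("contract.pdf", limit_mb=100)`.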
Can I download a redacted version of the file?
Yes — GET /v1/files/{id}/redacted returns a downloadable redacted artifact. Text-native formats keep their original shape; binary formats fall back to a clean PDF.
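Fetching the redacted artifact might look like the sketch below, using only the Python standard library. The host and the Bearer header are taken from the upload example; that the same authentication applies to this endpoint, and how the file ID is obtained from the upload response, are assumptions not confirmed here.

```python
import shutil
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.cypherz.app"  # host taken from the upload example


def redacted_request(file_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for a file's redacted artifact."""
    url = f"{API_BASE}/v1/files/{quote(file_id, safe='')}/redacted"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def download_redacted(file_id: str, api_key: str, dest: str) -> None:
    """Stream the redacted artifact to dest without loading it into memory."""
    with urllib.request.urlopen(redacted_request(file_id, api_key)) as resp, \
            open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for files near the upload limit.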
Is the original ever stored unencrypted?
No. From the moment we receive your file, it's encrypted at rest with a project-scoped data encryption key.
Get started
Upload your first PDF now.
Sign up, create a project, copy your API key. The first request is tokenized in under sixty seconds.
More file types
Redact DOCX
Upload a Word document, get a tokenized extraction. Cypherz uses mammoth to extr…
Redact XLSX
Upload XLSX, get cell-level PII tokenization with the spreadsheet structure pres…
Redact CSV
Upload a CSV, get a tokenized version you can feed to ChatGPT, Claude, Gemini, o…
Redact PNG / JPG
Cypherz runs Tesseract OCR on PNG, JPG, and WebP uploads, then tokenizes every d…