Redact PDF for AI

Redact PDF files before sending them to AI.

PDFs are the worst-case file format for AI workflows: contracts, medical reports, invoices, and depositions all arrive as PDFs, and they're full of PII. Cypherz extracts text with poppler's `pdftotext` for native PDFs and falls back to Tesseract OCR for scanned pages. Either way, you get clean, tokenized output you can pass to any LLM.

  • 01

    Embedded + scanned

    Native PDFs go through poppler. Scanned PDFs are detected automatically (by low text density) and run through OCR page by page.

  • 02

    All 16 detectors

    Every detector, including emails, names, IDs, money, and dates, runs against the extracted text.

  • 03

    Redacted PDF download

    Get a portable PDF artifact you can share with the team or send to your AI.
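The native-versus-scanned split described above can be approximated locally with the same tools. The sketch below is only an illustration, not Cypherz's actual pipeline: the 50-characters-per-page cutoff, the function names, and the 300 DPI rasterization setting are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: threshold, function names, and DPI are assumptions,
# not Cypherz's real implementation.

MIN_CHARS_PER_PAGE=50   # hypothetical cutoff for "low text density"

# Succeeds (exit 0) when the average characters per page is below the cutoff.
is_low_density() {
  local chars=$1 pages=$2
  [ "$pages" -gt 0 ] && [ $(( chars / pages )) -lt "$MIN_CHARS_PER_PAGE" ]
}

# Extract text from a PDF, falling back to per-page OCR for scanned files.
extract_text() {
  local pdf=$1 text pages chars
  text=$(pdftotext -layout "$pdf" -)                      # "-" writes to stdout
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  chars=$(printf '%s' "$text" | tr -d '[:space:]' | wc -c)

  if is_low_density "$chars" "$pages"; then
    local tmp; tmp=$(mktemp -d)
    pdftoppm -r 300 -png "$pdf" "$tmp/page"               # one PNG per page
    for img in "$tmp"/page-*.png; do
      tesseract "$img" stdout 2>/dev/null                 # OCR text to stdout
    done
    rm -rf "$tmp"
  else
    printf '%s\n' "$text"
  fi
}
```

Running `extract_text contract.pdf` would print the embedded text for a native PDF and OCR output for a scanned one, provided poppler-utils and Tesseract are installed.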

Upload via the REST API

curl -X POST https://api.cypherz.app/v1/files \
  -H "authorization: Bearer $CYPHERZ_KEY" \
  -F "file=@contract.pdf"

Common questions

Frequently asked.

What happens to my PDF file when I upload it?

Cypherz extracts text, detects PII, tokenizes it, and stores the original encrypted (AES-256-GCM, per-project key) for download. You get a tokenized extraction back immediately. Files can be deleted at any time and their encryption key destroyed.
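To give a feel for what "tokenized" means here, the following is a deliberately simplified illustration. It is not Cypherz's detector or token format: a single email regex stands in for the real detectors, and the `[EMAIL]` placeholder is an assumed token shape that is not reversible.

```shell
#!/usr/bin/env bash
# Illustration only: NOT Cypherz's detectors or token format.
# One email regex stands in for the full detector set.

# Replace every email address on stdin with a fixed placeholder.
tokenize_emails() {
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[EMAIL]/g'
}

printf 'Contact jane.doe@example.com or bob@example.org.\n' | tokenize_emails
# prints: Contact [EMAIL] or [EMAIL].
```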

What's the max file size?

25 MB on managed cloud. Configurable up to 100 MB on self-hosted deployments.
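A client-side size check before uploading can save a round trip. This is a minimal sketch: the 25 MB figure comes from the answer above, but reading it as 25 × 1024 × 1024 bytes is an assumption, and `within_limit` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: treating 25 MB as 25 * 1024 * 1024 bytes is an assumption.

MAX_BYTES=$(( 25 * 1024 * 1024 ))

# Succeeds when the file exists and its size is within the limit.
# Usage: within_limit FILE [LIMIT_BYTES]
within_limit() {
  local file=$1 limit=${2:-$MAX_BYTES} size
  [ -f "$file" ] || return 1
  size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  [ "$size" -le "$limit" ]
}

# Usage:
#   within_limit contract.pdf && curl -X POST https://api.cypherz.app/v1/files \
#     -H "authorization: Bearer $CYPHERZ_KEY" -F "file=@contract.pdf"
```

On a self-hosted deployment with a raised limit, pass the configured value as the second argument.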

Can I download a redacted version of the file?

Yes. `GET /v1/files/{id}/redacted` returns a downloadable redacted artifact. Text-native formats keep their original shape; binary formats fall back to a clean PDF.
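A download call might look like the sketch below. The endpoint path is taken from the answer above and the host and auth header from the upload example; `FILE_ID` is a placeholder, since how the upload response exposes the file id is not documented here, and the helper names and output filename are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: FILE_ID is a placeholder; the shape of the upload response
# (and therefore where the id comes from) is not documented here.

API_BASE="https://api.cypherz.app/v1"

# Build the redacted-download URL for a given file id.
redacted_url() {
  printf '%s/files/%s/redacted\n' "$API_BASE" "$1"
}

# Download the redacted artifact; --fail makes curl exit non-zero on HTTP errors.
download_redacted() {
  local file_id=$1 out=$2
  curl --fail --silent --show-error \
    -H "authorization: Bearer $CYPHERZ_KEY" \
    -o "$out" \
    "$(redacted_url "$file_id")"
}

# Usage:
#   download_redacted "$FILE_ID" contract.redacted.pdf
```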

Is the original ever stored unencrypted?

No. From the moment we receive your file, it's encrypted at rest with a project-scoped data encryption key.

Get started

Upload your first PDF now.

Sign up, create a project, and copy your API key. The first request is tokenized in under sixty seconds.