PDF Tools: AI Prompts for PDF Metadata Cleaning and Sensitive-Information Redaction Auditing

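This post collects a set of AI prompts for generating online tools around PDF metadata cleaning and sensitive-information redaction auditing. Each prompt asks the AI to produce a complete project that can be run, built, and deployed: a clear file structure, key source code, run commands, Docker deployment instructions, and at least 5 test cases or a QA checklist. The tools target privacy and compliance checks on contracts, résumés, bid documents, research reports, and similar PDFs before they are shared or uploaded.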

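Online PDF Metadata Viewer and One-Click Cleaner

For pre-upload self-checks: displays and cleans metadata such as author, creating software, timestamps, and custom properties, and outputs an audit report.

Prompt: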

You are a senior full-stack engineer. Build an online PDF metadata inspector + cleaner web app.
Scope:
- Upload a PDF (no external network calls required).
- Show extracted metadata (Info dict, XMP if present) in a readable table.
- Provide toggles to remove fields: Author, Creator, Producer, CreationDate, ModDate, Title, Subject, Keywords, plus any custom keys.
- Allow setting safe replacements (e.g., Author="", Title="").
- Generate a JSON audit report (before/after) and allow download.
Tech stack:
- Frontend: React + TypeScript + Vite.
- Backend: Node.js (Express) or Python (FastAPI); pick one and justify.
- PDF libraries: choose robust open-source libs (e.g., pdf-lib, pikepdf/qpdf, or similar).
Deliverables (must include all):
1) Project file tree.
2) Full source code for frontend and backend.
3) Clear run commands for dev and production.
4) Dockerfile + docker-compose.yml.
5) Security notes: file size limits, MIME sniffing, temp file handling.
6) Tests: at least 5 automated tests OR a QA checklist with 10 steps.
Functional requirements:
- Works for common PDFs.
- Shows warnings for encrypted/secured PDFs.
- Outputs a cleaned PDF for download.
Do not generate images. Focus on PDF processing only.

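Summary: Asks the AI to generate a complete project for an online tool with the flow "upload PDF → view metadata → choose what to clean → download the cleaned PDF and audit report", suited to removing sensitive traces such as author, software, and timestamps before sharing.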
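The before/after audit report that the prompt asks for can be sketched without any PDF library. The snippet below is only an illustration: it assumes the metadata has already been read into a plain dict, and `clean_metadata` and `DEFAULT_REMOVE` are hypothetical names, not part of any PDF library.

```python
import json

# Hypothetical default; a real tool would take this from the UI toggles.
DEFAULT_REMOVE = ("Author", "Creator", "Producer", "CreationDate", "ModDate")

def clean_metadata(info, remove=DEFAULT_REMOVE, replace=None):
    """Return (cleaned_info, audit_report) without mutating the input dict."""
    replace = replace or {}
    cleaned = {}
    changes = []
    for key, value in info.items():
        if key in replace:
            # Safe replacement wins over removal when both are configured.
            cleaned[key] = replace[key]
            changes.append({"field": key, "action": "replaced",
                            "before": value, "after": replace[key]})
        elif key in remove:
            changes.append({"field": key, "action": "removed",
                            "before": value, "after": None})
        else:
            cleaned[key] = value
    report = {"before": dict(info), "after": cleaned, "changes": changes}
    return cleaned, report

# Made-up sample metadata, standing in for a parsed Info dictionary.
info = {"Author": "Alice", "Producer": "SomeWriter 1.0", "Title": "Q3 bid"}
cleaned, report = clean_metadata(info, replace={"Title": ""})
print(json.dumps(report, indent=2))
```

Keeping the report as plain data makes it easy to offer as a JSON download alongside the cleaned PDF.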

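PDF Sensitive-Information Scanner and Suspicious-Snippet Locator

Runs rule-based and dictionary-based scans over PDF text to locate possible phone numbers, email addresses, national ID numbers, addresses, and similar data, reporting the page number, coordinates, and surrounding context for each hit.

Prompt: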

Build a web tool that scans PDF text for sensitive patterns and produces a review report.
Patterns to detect:
- Email, phone numbers, ID-like numbers, bank card-like numbers (use conservative rules), and custom keywords list.
Features:
- Extract text per page.
- For each match: page number, surrounding context (±30 chars), and confidence score.
- Allow user to export the findings as CSV and JSON.
- Provide a UI to add/remove custom keywords and regexes.
Implementation constraints:
- Include a clear file structure.
- Use TypeScript end-to-end (Node + React) OR Python backend + TS frontend.
- Must include rate limiting, max upload size, and safe temporary storage.
Deliverables:
- Full code + run commands.
- Docker setup.
- Tests/QA: minimum 5 meaningful checks, including tricky PDFs and false positive control.
Do NOT generate any images or rely on external services.

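Summary: Asks the AI to generate a "PDF sensitive-information scanner" that outputs a reviewable list of hits plus export files, for privacy-risk screening before upload.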
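The per-page scan with ±30 characters of context might look roughly like the sketch below. It assumes page text has already been extracted into a list of strings. The two patterns are illustrative assumptions (the phone rule only covers 11-digit mainland-China mobile numbers), and confidence scoring is left out.

```python
import re

# Illustrative patterns only; real rule sets need tuning against false positives.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone_cn": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}

def scan_pages(pages, patterns=PATTERNS, context=30):
    """Return one finding per match with a 1-based page number and context."""
    findings = []
    for page_no, text in enumerate(pages, start=1):
        for name, rx in patterns.items():
            for m in rx.finditer(text):
                start = max(0, m.start() - context)
                end = min(len(text), m.end() + context)
                findings.append({
                    "page": page_no,
                    "type": name,
                    "match": m.group(),
                    "context": text[start:end],
                })
    return findings

# Made-up page text for demonstration.
pages = [
    "Contact: alice@example.com or 13812345678.",
    "Nothing here, order no. 2024001.",
]
findings = scan_pages(pages)
for f in findings:
    print(f["page"], f["type"], f["match"])
```

The digit lookarounds in the phone rule are one way to keep long order or serial numbers from being reported as phone numbers.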

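PDF Text Redaction (Black-Out/Replace) and Version Comparison Tool

Provides a reproducible redaction workflow: applies irreversible redaction to the PDF according to the matched rules and generates a summary of the differences before and after redaction.

Prompt: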

Create an online PDF redaction tool that performs irreversible redaction.
Requirements:
- User uploads a PDF.
- Tool scans for sensitive matches (email/phone/custom keywords).
- User reviews matches and selects which to redact.
- Apply redaction properly (remove underlying text, not just draw a rectangle).
- Output: redacted PDF + a change summary (pages changed, number of redactions).
Tech:
- Prefer a backend approach that guarantees true redaction (e.g., qpdf + pikepdf, or a proven PDF redaction library).
Deliverables:
- Full project code, file tree.
- Dev/prod commands.
- Docker compose.
- Tests/QA: at least 5 cases, including:
1) Redaction removes searchable text.
2) Works on multi-page PDFs.
3) Handles rotated pages.
4) Handles encrypted PDFs gracefully.
5) Ensures no sensitive strings in output bytes (basic check).
No image generation.

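Summary: Asks the AI to generate a truly irreversible redaction tool: scan → select → apply redaction → output the redacted PDF and a change summary, avoiding the problem where only a black box is drawn and the text underneath can still be copied.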
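Test case 5 in the prompt (no sensitive strings in the output bytes) could be approximated as follows. This is a deliberately basic sketch with hypothetical helper names: it looks at the raw bytes and at streams that inflate with plain zlib. It does not handle encryption, other filters, or font-specific character encodings, so an empty result is not proof that the text is gone.

```python
import re
import zlib

# Rough stream locator; good enough for a smoke test, not a PDF parser.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def inflate_streams(pdf_bytes):
    """Yield decompressed bodies of streams that happen to be Flate-encoded."""
    for m in STREAM_RE.finditer(pdf_bytes):
        try:
            yield zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not Flate data, or a filter this sketch ignores

def leaked_strings(pdf_bytes, secrets):
    """Return the secrets still findable as UTF-8, UTF-16BE, or hex text."""
    haystacks = [pdf_bytes, *inflate_streams(pdf_bytes)]
    leaked = []
    for secret in secrets:
        raw = secret.encode("utf-8")
        needles = {
            raw,
            secret.encode("utf-16-be"),
            raw.hex().encode("ascii"),
            raw.hex().upper().encode("ascii"),
        }
        if any(n in h for n in needles for h in haystacks):
            leaked.append(secret)
    return leaked

# Tiny synthetic fragment with one Flate-compressed content stream.
content = zlib.compress(b"BT (alice@example.com) Tj ET")
sample = (b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
          + content + b"\nendstream\nendobj\n")
print(leaked_strings(sample, ["alice@example.com", "bob@example.com"]))
```

Running the same check on the file before and after redaction gives a cheap regression test for the "black box only" failure mode.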

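PDF Font and Embedded-Resource Audit Tool

For compliance and delivery checks: lists the fonts in a PDF, whether each is embedded, and whether it is subsetted, and generates risk warnings.

Prompt: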

Build a PDF assets audit web app.
Goals:
- Upload PDF.
- Extract and list fonts used (names, embedded/subset flags), images count, and basic PDF version info.
- Provide warnings: non-embedded fonts risk, large images risk, unusual metadata keys.
- Export audit report as JSON.
Stack:
- Frontend: React + TS.
- Backend: Python FastAPI with pikepdf or similar.
Deliverables:
- Full code + file tree.
- Commands and Docker.
- Tests/QA: at least 5, including a PDF with non-embedded fonts and one with subset fonts.
Do not generate images.

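Summary: Asks the AI to generate a "PDF asset audit" online tool that focuses on font embedding and subsetting information, supporting print, delivery, and compliance checks.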
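Subset fonts are conventionally named with a six-uppercase-letter tag and a plus sign in front of the font name (for example `ABCDEF+Helvetica`). Assuming the backend has already pulled out each font's `BaseFont` name and knows whether its descriptor carries an embedded font file, the classification step could be sketched as below; `classify_font` is a hypothetical helper.

```python
import re

# Six uppercase letters followed by "+" mark a subset font name.
SUBSET_RE = re.compile(r"^[A-Z]{6}\+(.+)$")

def classify_font(base_font, has_font_file):
    """Summarise one font entry.

    base_font: the BaseFont name, with or without the leading slash.
    has_font_file: True if the font descriptor references an embedded font
    program (assumed to be determined by the caller).
    """
    name = base_font.lstrip("/")
    m = SUBSET_RE.match(name)
    warnings = []
    if not has_font_file:
        warnings.append("font is not embedded; rendering may differ across machines")
    return {
        "name": m.group(1) if m else name,
        "subset": m is not None,
        "embedded": has_font_file,
        "warnings": warnings,
    }

print(classify_font("/ABCDEF+Helvetica", True))
print(classify_font("/Arial", False))
```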

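Template-Based PDF Metadata Rewriting and Batch Processing Queue

For team workflows: upload multiple PDFs and rewrite or clear their metadata uniformly according to a template, with a job queue, retry on failure, and exportable logs.

Prompt: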

Implement a multi-file PDF metadata batch processing web tool.
Features:
- Upload multiple PDFs.
- Define a metadata policy template (fields to set/clear).
- Process files in a queue with progress and retry.
- Download results as a ZIP.
- Produce a CSV log: filename, status, cleaned fields, warnings.
Non-functional:
- Handle large files safely; max file size configurable.
- Use worker threads / background jobs.
Deliverables:
- Full stack code.
- Job queue implementation.
- Docker compose.
- Tests/QA: at least 5, including batch of mixed valid/invalid PDFs.
No image generation.

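Summary: Asks the AI to generate a multi-file batch PDF metadata cleaner with a queue and logs, suited to enforcing uniform standards in bulk for deliveries, bids, and similar scenarios.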
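The queue-with-retry behaviour and the CSV log can be illustrated with a single-threaded, in-memory sketch. A real deployment would hand jobs to background workers. `run_batch`, `to_csv`, and `fake_process` are hypothetical names, and `fake_process` merely stands in for the actual PDF cleaning step.

```python
import csv
import io
from collections import deque

LOG_FIELDS = ["filename", "status", "attempts", "cleaned_fields", "warnings"]

def run_batch(filenames, process, max_attempts=3):
    """Process files FIFO; requeue failures until max_attempts is reached."""
    queue = deque((name, 1) for name in filenames)
    rows = []
    while queue:
        name, attempt = queue.popleft()
        try:
            cleaned = process(name)  # expected to return cleaned field names
        except Exception as exc:
            if attempt < max_attempts:
                queue.append((name, attempt + 1))  # retry behind other jobs
            else:
                rows.append({"filename": name, "status": "failed",
                             "attempts": attempt, "cleaned_fields": "",
                             "warnings": str(exc)})
            continue
        rows.append({"filename": name, "status": "ok", "attempts": attempt,
                     "cleaned_fields": ";".join(cleaned), "warnings": ""})
    return rows

def to_csv(rows):
    """Serialise log rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=LOG_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Stand-in for real processing: one file always fails, one fails once.
calls = {}

def fake_process(name):
    calls[name] = calls.get(name, 0) + 1
    if name == "broken.pdf":
        raise ValueError("not a PDF")
    if name == "flaky.pdf" and calls[name] == 1:
        raise IOError("temporary read error")
    return ["Author", "Producer"]

rows = run_batch(["a.pdf", "flaky.pdf", "broken.pdf"], fake_process)
print(to_csv(rows))
```

Requeueing at the back rather than retrying immediately keeps one bad file from blocking the rest of the batch.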

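PDF Structural Health Check and Repair Advice Tool

Checks PDF structure and consistency (xref, object streams, linearization, and so on) and outputs readable diagnostics and repair suggestions.

Prompt: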

Build a web-based PDF health checker.
What to detect:
- Encrypted / password protected PDFs.
- Corrupted xref or structural anomalies.
- PDF version, linearized flag, object stream usage.
- Suspicious features: embedded files, JavaScript actions.
Output:
- A diagnostic report with severity levels and remediation suggestions.
Implementation:
- Use reliable CLI tools via backend (e.g., qpdf, pdfinfo) in a safe sandbox.
- Provide exact commands used and sanitize filenames.
Deliverables:
- Full project code + Docker.
- Run commands.
- Tests/QA: at least 5.
No image generation.

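Summary: Asks the AI to generate a "PDF health check" online tool that outputs structural diagnostics and risk warnings (such as embedded files or scripts) along with actionable repair suggestions.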
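Two pieces of the implementation notes, sanitizing filenames and recording the exact command, can be sketched with the standard library alone. The snippet builds the argument list for `qpdf --check` but does not run it. The helper names and the byte-level "hints" are assumptions for illustration and are much weaker than what qpdf itself reports.

```python
import re
from pathlib import PurePosixPath

def safe_name(filename):
    """Reduce an uploaded filename to a conservative basename."""
    base = PurePosixPath(filename.replace("\\", "/")).name
    # Keep a small whitelist; strip leading dots/dashes so the name cannot
    # look like a hidden file or a command-line option.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".-")
    return cleaned or "upload.pdf"

def qpdf_check_argv(workdir, filename):
    """argv for `qpdf --check`; meant for subprocess.run without shell=True."""
    return ["qpdf", "--check", f"{workdir}/{safe_name(filename)}"]

def quick_facts(pdf_bytes):
    """Cheap byte-level hints gathered before any external tool runs."""
    header = re.match(rb"%PDF-(\d\.\d)", pdf_bytes)
    return {
        "version": header.group(1).decode() if header else None,
        "linearized_hint": b"/Linearized" in pdf_bytes[:1024],
        "encrypt_hint": b"/Encrypt" in pdf_bytes,
    }

print(qpdf_check_argv("/tmp/job1", "../../etc/my report.pdf"))
print(quick_facts(b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 >>\nendobj\n"))
```

Passing an argument list instead of a shell string means the filename is never interpreted by a shell, which is the main point of the sanitization requirement.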

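PDF Bookmark and Table-of-Contents Consistency Validator

Checks bookmark hierarchy, target page numbers, and duplicate titles, and outputs repair suggestions or reorganizes the bookmarks in one click.

Prompt: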

Create an online PDF bookmarks validator.
Features:
- Upload PDF.
- Parse bookmarks/outline.
- Validate: broken destinations, out-of-range pages, duplicate titles, inconsistent hierarchy.
- Offer a "fix" mode: normalize titles, remove broken entries, re-indent based on rules.
- Output fixed PDF + a validation report.
Deliverables:
- Full code, file tree, run commands.
- Docker.
- Tests/QA: at least 5, including PDFs with missing destinations.
No image generation.

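Summary: Asks the AI to generate a "PDF bookmark/outline validation and repair" online tool that outputs a validation report and the fixed PDF, suited to checking long documents before delivery.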
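Once the outline has been flattened into simple records, the four validation rules reduce to a short pass over the list. The sketch below assumes each entry is a dict with `title`, a 1-based `level`, and a 1-based `page` (or `None` when the destination could not be resolved). That record shape and the issue labels are illustrative choices.

```python
from collections import Counter

def validate_outline(entries, page_count):
    """Return (index, issue) pairs for a flattened outline."""
    issues = []
    titles = Counter(e["title"].strip() for e in entries)
    prev_level = 0
    for i, e in enumerate(entries):
        page = e.get("page")
        if page is None:
            issues.append((i, "broken_destination"))
        elif not 1 <= page <= page_count:
            issues.append((i, "page_out_of_range"))
        if titles[e["title"].strip()] > 1:
            issues.append((i, "duplicate_title"))
        # A child may be at most one level deeper than the previous entry.
        if e["level"] > prev_level + 1:
            issues.append((i, "level_jump"))
        prev_level = e["level"]
    return issues

# Made-up outline for a 10-page document.
entries = [
    {"title": "Intro", "level": 1, "page": 1},
    {"title": "Details", "level": 3, "page": 2},
    {"title": "Intro", "level": 1, "page": 99},
    {"title": "Appendix", "level": 1, "page": None},
]
issues = validate_outline(entries, page_count=10)
for index, issue in issues:
    print(index, entries[index]["title"], issue)
```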

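PDF Page-Level Watermark and Watermark-Removal Compliance Checker

Detection and annotation only, no image generation: identifies common text watermarks and overlay layers, and reports the pages and characteristics where a watermark may be present.

Prompt: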

Build a PDF watermark detector web app (detection only).
Goals:
- Upload PDF.
- Analyze content streams to detect repeated text patterns that look like watermarks (e.g., same string across many pages, low opacity graphics state).
- Report suspected watermark pages and the detected string(s) with confidence.
- Provide an exportable report.
Constraints:
- Do not generate or render new images.
- Focus on analysis and reporting.
Deliverables:
- Full code + Docker.
- QA checklist: at least 10 steps.
No image generation.

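Summary: Asks the AI to generate a "PDF watermark detection" online tool that only performs analysis and report export, supporting compliance review and delivery acceptance.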
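The "same string across many pages" heuristic can be sketched over already-extracted text strings. Using the share of pages as the confidence value is an assumption made here for simplicity. Running headers and footers would trigger it too, so a real detector would combine it with other signals such as opacity or rotation.

```python
from collections import defaultdict

def suspect_watermarks(pages, min_ratio=0.8, min_pages=3):
    """pages: one list of text strings per page. Returns suspects by confidence."""
    seen_on = defaultdict(set)
    for page_no, strings in enumerate(pages, start=1):
        for s in strings:
            key = " ".join(s.split()).upper()  # normalise spacing and case
            if key:
                seen_on[key].add(page_no)
    total = len(pages)
    suspects = []
    for key, page_set in seen_on.items():
        ratio = len(page_set) / total
        if len(page_set) >= min_pages and ratio >= min_ratio:
            suspects.append({"text": key, "pages": sorted(page_set),
                             "confidence": round(ratio, 2)})
    return sorted(suspects, key=lambda s: -s["confidence"])

# Made-up per-page strings; the fourth page lacks the repeated text.
pages = [
    ["CONFIDENTIAL", "Chapter 1"],
    ["Confidential", "Chapter 2"],
    ["CONFIDENTIAL ", "Chapter 3"],
    ["Chapter 4"],
]
suspects = suspect_watermarks(pages, min_ratio=0.7)
print(suspects)
```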

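PDF Attachment, Embedded-File, and External-Link Audit Tool

Checks whether a PDF contains embedded files, external links, or suspicious actions, and generates a risk list with handling recommendations.

Prompt: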

Create a PDF security audit web tool.
Checks:
- Embedded files / attachments.
- External links / URI actions.
- JavaScript actions.
- Launch actions.
Output:
- Findings list with severity and remediation steps.
- Optionally produce a "sanitized" PDF that removes high-risk features (explain limitations).
Deliverables:
- Full project code.
- Docker.
- Tests/QA: at least 5.
No image generation.

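Summary: Asks the AI to generate a "PDF security audit" online tool that focuses on risk points such as embedded files, external links, and scripts, and provides exportable handling recommendations.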
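A first-pass triage for these checks can count risky PDF name tokens in the raw bytes, as sketched below. The severity labels are assumptions. The scan also misses names hidden inside compressed object streams or written with `#xx` escapes, so a full audit would still need a proper parser.

```python
import re

# Name tokens worth flagging; severities are illustrative, not a standard.
RISKY_NAMES = {
    "/JavaScript": "high",
    "/JS": "high",
    "/Launch": "high",
    "/EmbeddedFile": "medium",
    "/OpenAction": "medium",
    "/URI": "low",
}

def scan_risky_names(pdf_bytes):
    """Count occurrences of each risky name token in the raw file bytes."""
    findings = []
    for name, severity in RISKY_NAMES.items():
        # Require a non-alphanumeric character (or end of data) after the
        # name so that /JS does not match a longer name such as /JSFoo.
        pattern = re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])"
        count = len(re.findall(pattern, pdf_bytes))
        if count:
            findings.append({"name": name, "severity": severity, "count": count})
    return findings

# Synthetic fragment: one JavaScript action and one URI action.
sample = (b"<< /Type /Action /S /JavaScript /JS (app.alert(1)) >> "
          b"<< /S /URI /URI (http://example.com) >>")
findings = scan_risky_names(sample)
for f in findings:
    print(f["name"], f["severity"], f["count"])
```

Counts are token occurrences, not distinct actions: a URI action typically contains `/URI` twice, once as the action type and once as the key holding the target.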

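PDF Redaction Workflow: Rule-Set Management and Audit-Trail Back Office

Ties together redaction rules, operation records, and audit reports: rule sets are configurable, and processing history is tracked per project or batch.

Prompt: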

Design and implement a minimal PDF privacy workflow app.
Features:
- Manage redaction policies (regex + keyword lists).
- Upload PDFs for a "batch".
- Record processing history (who, when, policy version, counts).
- Store audit reports and allow download.
Stack:
- Backend: FastAPI + SQLite/Postgres.
- Frontend: React + TS.
- Include authentication (simple email+password or magic link) and RBAC for admin/editor/viewer.
Deliverables:
- Full code + migrations.
- Docker compose.
- Seed data.
- Tests/QA: at least 5.
No image generation.

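Summary: Asks the AI to generate a small "PDF redaction workflow" system: versioned rule sets + processing batches + audit trail, suited to putting a team compliance process into practice.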
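Two ideas from this prompt, role-based permissions and pinning each history record to a policy version, can be illustrated in a few lines. The permission names, the role-to-permission mapping, and the choice of a truncated SHA-256 hash as the version identifier are all assumptions for this sketch. A real app would store these in the database.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical permission sets for the three roles named in the prompt.
ROLE_PERMISSIONS = {
    "admin": {"policy:edit", "batch:run", "report:read"},
    "editor": {"batch:run", "report:read"},
    "viewer": {"report:read"},
}

def can(role, permission):
    """True if the role grants the permission; unknown roles grant nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())

def policy_version(policy):
    """Short stable hash of a policy so history rows pin the exact rule set."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def history_row(user, role, policy, redaction_count):
    """Build one audit-trail record: who, when, policy version, count."""
    if not can(role, "batch:run"):
        raise PermissionError(f"role {role!r} may not run batches")
    return {
        "who": user,
        "when": datetime.now(timezone.utc).isoformat(),
        "policy_version": policy_version(policy),
        "count": redaction_count,
    }

# Made-up policy and user for demonstration.
policy = {"keywords": ["Project X"], "regexes": [r"\d{11}"]}
print(history_row("alice@example.com", "editor", policy, 7))
```

Hashing a canonical JSON form means two policies with the same rules get the same version string regardless of key order, which keeps audit records comparable across batches.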