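This article collects a set of AI prompts for generating online tools in the area of "online PDF metadata cleanup and sensitive-information redaction auditing". Each prompt asks the AI to produce a complete project that can be run, built, and deployed directly. The project must include a clear file structure, the key source code, run commands, Docker deployment instructions, and at least 5 test cases or a QA checklist. The tools are intended for privacy and compliance checks on PDFs such as contracts, résumés, bid documents, and research reports before they are shared or uploaded.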
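Online PDF Metadata Viewer and One-Click Cleaner
For self-checks before upload: displays and removes metadata such as author, creating software, timestamps, and custom properties, and outputs an audit report.
English prompt: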
You are a senior full-stack engineer. Build an online PDF metadata inspector + cleaner web app.
Scope:
- Upload a PDF (no external network calls required).
- Show extracted metadata (Info dict, XMP if present) in a readable table.
- Provide toggles to remove fields: Author, Creator, Producer, CreationDate, ModDate, Title, Subject, Keywords, plus any custom keys.
- Allow setting safe replacements (e.g., Author="", Title="").
- Generate a JSON audit report (before/after) and allow download.
Tech stack:
- Frontend: React + TypeScript + Vite.
- Backend: Node.js (Express) or Python (FastAPI) — pick one and justify.
- PDF libraries: choose robust open-source libs (e.g., pdf-lib, pikepdf/qpdf, or similar).
Deliverables (must include all):
1) Project file tree.
2) Full source code for frontend and backend.
3) Clear run commands for dev and production.
4) Dockerfile + docker-compose.yml.
5) Security notes: file size limits, MIME sniffing, temp file handling.
6) Tests: at least 5 automated tests OR a QA checklist with 10 steps.
Functional requirements:
- Works for common PDFs.
- Shows warnings for encrypted/secured PDFs.
- Outputs a cleaned PDF for download.
Do not generate images. Focus on PDF processing only.
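Summary: Asks the AI to generate a complete project for an online tool that follows "upload PDF → view metadata → choose fields to clean → download the cleaned PDF + audit report", suited to removing sensitive traces such as author, software, and timestamps before sharing.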
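The before/after audit report this prompt asks for can be sketched as a pure function over the metadata fields. This is a minimal sketch, assuming the backend has already read the Info dictionary into a plain `dict` with unprefixed keys (libraries such as pikepdf expose them with a leading slash, e.g. `/Author`); `clean_metadata` and `report_to_json` are hypothetical names. XMP is not handled here, and a real cleaner would need to strip or rewrite it as well.

```python
import json
from typing import Any

# Standard keys of the PDF document information (Info) dictionary;
# anything else is treated as a custom key.
STANDARD_INFO_KEYS = {
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate", "Trapped",
}


def clean_metadata(
    before: dict[str, str],
    remove: set[str],
    replacements: dict[str, str],
    remove_custom: bool = False,
) -> tuple[dict[str, str], dict[str, Any]]:
    """Apply removal toggles and safe replacements to an Info-like mapping.

    A replacement wins over a removal toggle for the same key.
    Returns the cleaned mapping and a before/after audit report.
    """
    after: dict[str, str] = {}
    changes: list[dict[str, Any]] = []
    for key, value in before.items():
        is_custom = key not in STANDARD_INFO_KEYS
        if key in replacements:
            after[key] = replacements[key]
            changes.append({"field": key, "action": "replaced",
                            "before": value, "after": replacements[key]})
        elif key in remove or (remove_custom and is_custom):
            changes.append({"field": key, "action": "removed",
                            "before": value, "after": None})
        else:
            after[key] = value  # kept unchanged
    report = {"before": dict(before), "after": dict(after), "changes": changes}
    return after, report


def report_to_json(report: dict[str, Any]) -> str:
    """Serialize the report for download, keeping non-ASCII text readable."""
    return json.dumps(report, ensure_ascii=False, indent=2)


before = {"Author": "Alice", "Producer": "Word", "Title": "Q3 bid", "Dept": "Sales"}
after, report = clean_metadata(before, {"Producer"}, {"Author": ""}, remove_custom=True)
print(after)  # {'Author': '', 'Title': 'Q3 bid'}
```

Keeping the diff logic free of PDF I/O means it can be unit-tested without sample files, and the PDF library is only responsible for reading and writing the fields.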
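PDF Sensitive-Information Scanner and Suspicious-Snippet Locator
Runs rule- and dictionary-based scans over PDF text to locate likely phone numbers, email addresses, ID numbers, postal addresses, and similar data, reporting the page number, coordinates, and a context snippet for each hit.
English prompt: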
Build a web tool that scans PDF text for sensitive patterns and produces a review report.
Patterns to detect:
- Email, phone numbers, ID-like numbers, bank card-like numbers (use conservative rules), and custom keywords list.
Features:
- Extract text per page.
- For each match: page number, surrounding context (±30 chars), and confidence score.
- Allow user to export the findings as CSV and JSON.
- Provide a UI to add/remove custom keywords and regexes.
Implementation constraints:
- Include a clear file structure.
- Use TypeScript end-to-end (Node + React) OR Python backend + TS frontend.
- Must include rate limiting, max upload size, and safe temporary storage.
Deliverables:
- Full code + run commands.
- Docker setup.
- Tests/QA: minimum 5 meaningful checks, including tricky PDFs and false positive control.
Do NOT generate any images or rely on external services.
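Summary: Asks the AI to generate a "PDF sensitive-information scanner" that outputs a reviewable list of hits plus export files, for screening privacy risks before upload.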
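One way to make the "conservative rules" concrete is to pair each regex with a checksum where one exists: the Luhn check for card-like numbers and the GB 11643 check digit for 18-character mainland China resident ID numbers. The sketch below assumes text has already been extracted per page. The confidence values, the mainland-China mobile pattern, and the names `scan_page` and `to_csv` are illustrative assumptions, not calibrated figures. Address detection is left out because it generally needs a dictionary or an NER model rather than a single regex.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class Finding:
    page: int          # 1-based page number
    kind: str
    match: str
    context: str       # up to 30 characters on each side of the match
    confidence: float  # illustrative heuristic, not a calibrated probability


EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")    # mainland China mobile (assumption)
ID_RE = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")   # 18-character resident ID
CARD_RE = re.compile(r"(?<!\d)\d{13,19}(?![\dXx])")    # card-like digit run

ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
ID_CHECK_CHARS = "10X98765432"


def id_checksum_ok(value: str) -> bool:
    """GB 11643 check digit: weighted sum of the first 17 digits mod 11."""
    total = sum(int(d) * w for d, w in zip(value[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == value[17].upper()


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit counted from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def scan_page(page: int, text: str, keywords: Iterable[str] = ()) -> list[Finding]:
    findings: list[Finding] = []

    def add(kind: str, m: re.Match, confidence: float) -> None:
        ctx = text[max(0, m.start() - 30): m.end() + 30]
        findings.append(Finding(page, kind, m.group(), ctx, confidence))

    for m in EMAIL_RE.finditer(text):
        add("email", m, 0.9)
    for m in PHONE_RE.finditer(text):
        add("phone", m, 0.7)
    id_spans = set()
    for m in ID_RE.finditer(text):
        if id_checksum_ok(m.group()):  # keep only checksum-valid IDs
            id_spans.add(m.span())
            add("id_number", m, 0.95)
    for m in CARD_RE.finditer(text):
        if m.span() not in id_spans and luhn_ok(m.group()):
            add("bank_card", m, 0.6)   # Luhn alone passes about 1 in 10 random numbers
    for kw in keywords:
        if kw:
            for m in re.finditer(re.escape(kw), text, re.IGNORECASE):
                add("keyword", m, 1.0)
    return findings


def to_csv(findings: list[Finding]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Finding)])
    writer.writeheader()
    for item in findings:
        writer.writerow(asdict(item))
    return buf.getvalue()
```

Requiring a valid checksum before reporting an ID or card number is what keeps arbitrary long digit runs (order numbers, invoice IDs) out of the report; a JSON export is simply `json.dumps([asdict(f) for f in findings])`.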
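PDF Text Redaction (Black-Out/Replace) and Version Comparison Tool
Provides a reproducible redaction workflow: applies irreversible black-out redaction to the PDF according to the matched rules and produces a before/after difference summary.
English prompt: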
Create an online PDF redaction tool that performs irreversible redaction.
Requirements:
- User uploads a PDF.
- Tool scans for sensitive matches (email/phone/custom keywords).
- User reviews matches and selects which to redact.
- Apply redaction properly (remove underlying text, not just draw a rectangle).
- Output: redacted PDF + a change summary (pages changed, number of redactions).
Tech:
- Prefer a backend approach that guarantees true redaction (e.g., qpdf + pikepdf, or a proven PDF redaction library).
Deliverables:
- Full project code, file tree.
- Dev/prod commands.
- Docker compose.
- Tests/QA: at least 5 cases, including:
1) Redaction removes searchable text.
2) Works on multi-page PDFs.
3) Handles rotated pages.
4) Handles encrypted PDFs gracefully.
5) Ensures no sensitive strings in output bytes (basic check).
No image generation.
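Summary: Asks the AI to generate an online tool that performs truly irreversible redaction (scan → select → apply redaction → output the redacted PDF and a change summary), avoiding the common failure where a black box is drawn but the text underneath can still be copied.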
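Test case 5 ("no sensitive strings in output bytes") can be approximated by searching both the raw file and every Flate-decompressible stream for each redacted string, in UTF-8 and UTF-16BE. This is a basic sketch with a hypothetical name (`find_leaks`). Text drawn with hex strings, CID font encodings, or split across `TJ` kerning arrays will not be found, so passing this check is necessary but not sufficient evidence of true redaction.

```python
import re
import zlib

# "stream" keyword not preceded by "end", then an EOL, then data up to "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def decoded_chunks(pdf: bytes):
    """Yield the raw file plus every stream body that inflates with zlib."""
    yield pdf
    for m in STREAM_RE.finditer(pdf):
        inflater = zlib.decompressobj()
        try:
            # Any trailing EOL after the zlib data lands in inflater.unused_data.
            data = inflater.decompress(m.group(1))
        except zlib.error:
            continue  # not FlateDecode, or a filter chain this sketch does not handle
        yield data


def find_leaks(pdf: bytes, secrets: list[str]) -> list[str]:
    """Return the secrets still present in UTF-8 or UTF-16BE form."""
    chunks = list(decoded_chunks(pdf))
    leaked = []
    for secret in secrets:
        needles = (secret.encode("utf-8"), secret.encode("utf-16-be"))
        if any(needle in chunk for chunk in chunks for needle in needles):
            leaked.append(secret)
    return leaked
```

Running this over the redacted output in CI, with the list of strings the user selected, turns the "basic check" into an automated regression test.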
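PDF Font and Embedded-Resource Audit Tool
For compliance and delivery checks: lists the fonts in a PDF, whether each one is embedded or subset, and generates risk warnings.
English prompt: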
Build a PDF assets audit web app.
Goals:
- Upload PDF.
- Extract and list fonts used (names, embedded/subset flags), images count, and basic PDF version info.
- Provide warnings: non-embedded fonts risk, large images risk, unusual metadata keys.
- Export audit report as JSON.
Stack:
- Frontend: React + TS.
- Backend: Python FastAPI with pikepdf or similar.
Deliverables:
- Full code + file tree.
- Commands and Docker.
- Tests/QA: at least 5, including a PDF with non-embedded fonts and one with subset fonts.
Do not generate images.
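Summary: Asks the AI to generate an online "PDF resource audit" tool that focuses on font embedding and subsetting, to support print, delivery, and compliance checks.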
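Embedding and subsetting can be read from the font dictionaries themselves. A font counts as embedded when its font descriptor carries `/FontFile`, `/FontFile2`, or `/FontFile3`, and subset fonts conventionally prefix `/BaseFont` with a six-uppercase-letter tag plus `+`. The sketch below assumes the fonts were already collected (for example from each page's `/Resources /Font` via pikepdf) and converted to plain dicts with unprefixed keys; `classify_font` and `font_warnings` are hypothetical names. For composite (Type0) fonts the descriptor sits on the descendant CIDFont, and Type3 fonts are treated as embedded because their glyphs are defined inside the file.

```python
import re
from dataclasses import dataclass
from typing import Any

SUBSET_TAG_RE = re.compile(r"^[A-Z]{6}\+")  # e.g. "ABCDEF+NotoSans"
EMBED_KEYS = ("FontFile", "FontFile2", "FontFile3")


@dataclass
class FontInfo:
    name: str
    subtype: str
    embedded: bool
    subset: bool


def classify_font(font: dict[str, Any]) -> FontInfo:
    """Classify one font dictionary (keys without the leading slash)."""
    name = str(font.get("BaseFont", ""))
    subtype = str(font.get("Subtype", ""))
    if subtype == "Type0":
        # Composite fonts keep their descriptor on the descendant CIDFont.
        descendants = font.get("DescendantFonts") or [{}]
        descriptor = descendants[0].get("FontDescriptor", {})
    else:
        descriptor = font.get("FontDescriptor", {})
    # Type3 glyphs are content-stream procedures stored in the file itself.
    embedded = subtype == "Type3" or any(k in descriptor for k in EMBED_KEYS)
    return FontInfo(name, subtype, embedded, bool(SUBSET_TAG_RE.match(name)))


def font_warnings(fonts: list[dict[str, Any]]) -> list[str]:
    warnings = []
    for info in map(classify_font, fonts):
        if not info.embedded:
            warnings.append(f"{info.name or '<unnamed>'}: not embedded; "
                            "appearance depends on fonts installed on the viewer")
    return warnings
```

The two test PDFs the prompt requires (one with non-embedded fonts, one with subset fonts) map directly onto the `embedded` and `subset` flags.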
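Templated PDF Metadata Rewriting and Batch Processing Queue
For team workflows: upload multiple PDFs and rewrite or clear their metadata uniformly from a template, with a task queue, retry on failure, and log export.
English prompt:
Implement a multi-file PDF metadata batch processing web tool.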
Features:
- Upload multiple PDFs.
- Define a metadata policy template (fields to set/clear).
- Process files in a queue with progress and retry.
- Download results as a ZIP.
- Produce a CSV log: filename, status, cleaned fields, warnings.
Non-functional:
- Handle large files safely; max file size configurable.
- Use worker threads / background jobs.
Deliverables:
- Full stack code.
- Job queue implementation.
- Docker compose.
- Tests/QA: at least 5, including batch of mixed valid/invalid PDFs.
No image generation.
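Summary: Asks the AI to generate a "multi-file batch" PDF metadata cleaning tool with a queue and logs, suited to applying uniform rules in bulk for deliveries, bids, and similar scenarios.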
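A minimal in-process version of the queue, retry, and CSV log could look like the following. It assumes a caller-supplied `process(filename)` function (hypothetical) that returns the cleaned field names and warnings, or raises on failure. A production deployment would more likely use a persistent job queue so that work survives restarts; this sketch only shows the worker-thread, retry, and logging logic.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical worker signature: returns (cleaned_fields, warnings) or raises.
Processor = Callable[[str], tuple[list[str], list[str]]]


@dataclass
class JobResult:
    filename: str
    status: str  # "ok" or "failed"
    attempts: int
    cleaned_fields: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def run_job(filename: str, process: Processor, max_attempts: int = 3) -> JobResult:
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            cleaned, warnings = process(filename)
            return JobResult(filename, "ok", attempt, cleaned, warnings)
        except Exception as exc:  # a real queue would retry only transient errors
            last_error = f"attempt {attempt}: {exc}"
    return JobResult(filename, "failed", max_attempts, warnings=[last_error])


def run_batch(filenames: list[str], process: Processor,
              workers: int = 4, max_attempts: int = 3) -> list[JobResult]:
    # pool.map returns results in input order, which keeps the CSV log stable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda name: run_job(name, process, max_attempts), filenames))


def to_csv_log(results: list[JobResult]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["filename", "status", "cleaned_fields", "warnings"])
    for r in results:
        writer.writerow([r.filename, r.status, ";".join(r.cleaned_fields), ";".join(r.warnings)])
    return buf.getvalue()
```

Mixed valid and invalid inputs (the prompt's required test) then show up as `ok` and `failed` rows in the same log instead of aborting the whole batch.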
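PDF Structural Health Check and Repair Suggestion Tool
Checks PDF structure and consistency (xref, object streams, linearization, etc.) and outputs a readable diagnosis with repair suggestions.
English prompt:
Build a web-based PDF health checker.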
What to detect:
- Encrypted / password protected PDFs.
- Corrupted xref or structural anomalies.
- PDF version, linearized flag, stream usage.
- Suspicious features: embedded files, JavaScript actions.
Output:
- A diagnostic report with severity levels and remediation suggestions.
Implementation:
- Use reliable CLI tools via backend (e.g., qpdf, pdfinfo) in a safe sandbox.
- Provide exact commands used and sanitize filenames.
Deliverables:
- Full project code + Docker.
- Run commands.
- Tests/QA: at least 5.
No image generation.
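Summary: Asks the AI to generate an online "PDF health check" tool that outputs structural diagnostics and risk warnings (such as embedded files or scripts) together with actionable repair suggestions.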
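Two details from this prompt are easy to get wrong: filenames and the shell. The sketch below keeps the user-supplied name only for display (after sanitizing it), stores the upload under a random name, and calls `qpdf --check` with an argument list rather than a shell string. It assumes qpdf is installed in the container and relies on qpdf's documented exit codes (0 for no problems, 2 for errors, 3 for warnings). The function names are hypothetical.

```python
import re
import shlex
import subprocess
import tempfile
import uuid
from pathlib import Path
from typing import Optional

UNSAFE_RUN_RE = re.compile(r"[^A-Za-z0-9._-]+")
# qpdf exit status: 0 = no problems, 2 = errors, 3 = warnings only.
QPDF_SEVERITY = {0: "ok", 2: "error", 3: "warning"}


def sanitize_filename(name: str, max_len: int = 100) -> str:
    """Display-safe name: basename only, unsafe runs replaced, no leading dots."""
    base = Path(name.replace("\\", "/")).name
    cleaned = UNSAFE_RUN_RE.sub("_", base).lstrip(".")
    return cleaned[:max_len] or "upload.pdf"


def store_upload(data: bytes, workdir: Optional[Path] = None) -> Path:
    """Write the upload under a random name so user input never reaches argv."""
    directory = workdir or Path(tempfile.mkdtemp(prefix="pdfcheck-"))
    path = directory / f"{uuid.uuid4().hex}.pdf"
    path.write_bytes(data)
    return path


def check_command(path: Path) -> list[str]:
    return ["qpdf", "--check", str(path)]


def run_check(path: Path, timeout: int = 60) -> dict:
    """Run qpdf without a shell; the report records the exact command used."""
    cmd = check_command(path)
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=False)
    return {
        "command": shlex.join(cmd),
        "severity": QPDF_SEVERITY.get(proc.returncode, "error"),
        "output": proc.stdout + proc.stderr,
    }
```

Because the on-disk name is a random hex string ending in `.pdf`, it can never start with `-` and be misread as a qpdf option, and `shlex.join` gives the report a copy-pasteable command.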
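PDF Bookmark and Table-of-Contents Consistency Validator
Checks bookmark hierarchy, target page numbers, and duplicate titles, and outputs repair suggestions or re-orders the bookmarks in one click.
English prompt: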
Create an online PDF bookmarks validator.
Features:
- Upload PDF.
- Parse bookmarks/outline.
- Validate: broken destinations, out-of-range pages, duplicate titles, inconsistent hierarchy.
- Offer a "fix" mode: normalize titles, remove broken entries, re-indent based on rules.
- Output fixed PDF + a validation report.
Deliverables:
- Full code, file tree, run commands.
- Docker.
- Tests/QA: at least 5, including PDFs with missing destinations.
No image generation.
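Summary: Asks the AI to generate an online "PDF bookmark/TOC validation and repair" tool that outputs a validation report and a repaired PDF, suited to checking long documents before delivery.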
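The validation and fix rules can be expressed over a simple outline tree before any PDF is written back. This sketch assumes the outline has already been parsed into `Bookmark` nodes with 0-based page indexes (a hypothetical structure). It checks duplicate titles only among siblings and flags a child that points to an earlier page than its parent as a hierarchy inconsistency. In fix mode it drops broken entries but promotes their children. These rules are design choices, not a standard.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    title: str
    page: int | None  # 0-based target page; None when the destination is missing
    children: list[Bookmark] = field(default_factory=list)


def normalize_title(title: str) -> str:
    return re.sub(r"\s+", " ", title).strip()


def validate(outline: list[Bookmark], page_count: int) -> list[tuple[str, str]]:
    issues: list[tuple[str, str]] = []

    def walk(nodes: list[Bookmark], path: list[str], parent_page: int | None) -> None:
        seen: set[str] = set()  # duplicates are checked among siblings only
        for node in nodes:
            title = normalize_title(node.title)
            label = " > ".join(path + [title])
            valid = node.page is not None and 0 <= node.page < page_count
            if node.page is None:
                issues.append(("missing_destination", label))
            elif not valid:
                issues.append(("page_out_of_range", label))
            elif parent_page is not None and node.page < parent_page:
                issues.append(("child_before_parent", label))
            if title.casefold() in seen:
                issues.append(("duplicate_title", label))
            seen.add(title.casefold())
            walk(node.children, path + [title], node.page if valid else None)

    walk(outline, [], None)
    return issues


def fix(outline: list[Bookmark], page_count: int) -> list[Bookmark]:
    """Normalize titles; drop broken entries but keep (promote) their children."""
    fixed: list[Bookmark] = []
    for node in outline:
        children = fix(node.children, page_count)
        if node.page is None or not 0 <= node.page < page_count:
            fixed.extend(children)
        else:
            fixed.append(Bookmark(normalize_title(node.title), node.page, children))
    return fixed
```

Separating `validate` (report only) from `fix` (returns a new tree) lets the UI show the report first and apply the fix only after the user confirms.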
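PDF Page-Level Watermark and Watermark-Removal Compliance Checker
Detection and labeling only, with no image generation: identifies common text watermarks and overlay layers, and outputs the pages likely to carry a watermark along with its characteristics.
English prompt: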
Build a PDF watermark detector web app (detection only).
Goals:
- Upload PDF.
- Analyze content streams to detect repeated text patterns that look like watermarks (e.g., same string across many pages, low opacity graphics state).
- Report suspected watermark pages and the detected string(s) with confidence.
- Provide an exportable report.
Constraints:
- Do not generate or render new images.
- Focus on analysis and reporting.
Deliverables:
- Full code + Docker.
- QA checklist: at least 10 steps.
No image generation.
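Summary: Asks the AI to generate an online "PDF watermark detection" tool that only performs analysis and report export, to support compliance review and delivery acceptance.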
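The repeated-string heuristic described in the prompt can be sketched as counting, for each normalized text run, how many pages it appears on. The sketch assumes the text runs per page have already been extracted from the content streams (for example from `Tj`/`TJ` operands), and that pages using a graphics state with fill opacity below 1 were collected separately. The thresholds, the +0.2 opacity bonus, and the name `detect_watermarks` are hypothetical. Running headers and footers will also match, so results need human review.

```python
from collections import defaultdict
from typing import Iterable


def detect_watermarks(
    pages: list[list[str]],
    min_ratio: float = 0.6,
    min_pages: int = 2,
    low_opacity_pages: Iterable[int] = (),
) -> list[dict]:
    """Flag text runs repeated across many pages (1-based page numbers)."""
    total = len(pages)
    hits: dict[str, set[int]] = defaultdict(set)
    for page_no, runs in enumerate(pages, start=1):
        for run in runs:
            text = " ".join(run.split())  # collapse whitespace differences
            if len(text) >= 3:            # skip very short runs
                hits[text].add(page_no)
    faint = set(low_opacity_pages)
    results = []
    for text, page_set in hits.items():
        ratio = len(page_set) / total if total else 0.0
        if len(page_set) < min_pages or ratio < min_ratio:
            continue
        confidence = ratio
        if page_set <= faint:             # every hit page uses a low-opacity state
            confidence = min(1.0, confidence + 0.2)
        results.append({"text": text, "pages": sorted(page_set),
                        "confidence": round(confidence, 2)})
    return sorted(results, key=lambda r: r["confidence"], reverse=True)
```

The result list is already in an exportable shape (`json.dumps(results)`), which matches the prompt's "exportable report" requirement without rendering anything.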
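PDF Attachment/Embedded-File and External-Link Audit Tool
Checks whether a PDF contains embedded files, external links, or suspicious actions, and generates a risk list with remediation suggestions.
English prompt: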
Create a PDF security audit web tool.
Checks:
- Embedded files / attachments.
- External links / URI actions.
- JavaScript actions.
- Launch actions.
Output:
- Findings list with severity and remediation steps.
- Optionally produce a "sanitized" PDF that removes high-risk features (explain limitations).
Deliverables:
- Full project code.
- Docker.
- Tests/QA: at least 5.
No image generation.
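Summary: Asks the AI to generate an online "PDF security audit" tool that focuses on risks such as embedded files, external links, and scripts, and provides exportable remediation suggestions.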
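A first-pass triage for these checks is to scan the file for the relevant PDF name tokens after decoding `#xx` escapes, so that an obfuscated name such as `/J#61vaScript` is still recognized. This is a sketch under stated assumptions: the severities and remediation texts are illustrative, and random bytes inside binary streams can produce false positives. Names hidden inside compressed object streams are invisible to a raw byte scan, so expanding the file first (for example with `qpdf --qdf --object-streams=disable`) makes the scan more complete.

```python
import re
from collections import Counter

NAME_RE = re.compile(rb"/([^\s/<>\[\]()%{}]+)")    # PDF name token after "/"
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")  # "#61" -> "a"

# Hypothetical severity and remediation table.
RISKY_NAMES = {
    "JavaScript": ("high", "Remove JavaScript actions and the /JavaScript name tree"),
    "JS": ("high", "Remove the script source referenced by /JS"),
    "Launch": ("high", "Remove Launch actions; they can start external programs"),
    "OpenAction": ("medium", "Check what runs or opens when the document is opened"),
    "AA": ("medium", "Review additional actions triggered by page or field events"),
    "EmbeddedFile": ("medium", "Review attachments; extract and scan or remove them"),
    "URI": ("low", "Review external links before sharing"),
}
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def normalize_name(raw: bytes) -> str:
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def audit_names(pdf: bytes) -> list[dict]:
    counts = Counter(normalize_name(m.group(1)) for m in NAME_RE.finditer(pdf))
    findings = [
        {"feature": name, "count": counts[name], "severity": sev, "remediation": advice}
        for name, (sev, advice) in RISKY_NAMES.items()
        if counts[name]
    ]
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
```

A byte scan like this is cheap enough to run on every upload; files it flags can then be handed to a full parser for the "sanitized" output the prompt describes.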
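PDF Redaction Workflow: Rule-Set Management and Audit-Trail Back Office
Ties redaction rules, operation records, and audit reports together: rule sets are configurable, and processing history is tracked per project and batch.
English prompt: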
Design and implement a minimal PDF privacy workflow app.
Features:
- Manage redaction policies (regex + keyword lists).
- Upload PDFs for a "batch".
- Record processing history (who, when, policy version, counts).
- Store audit reports and allow download.
Stack:
- Backend: FastAPI + SQLite/Postgres.
- Frontend: React + TS.
- Include authentication (simple email+password or magic link) and RBAC for admin/editor/viewer.
Deliverables:
- Full code + migrations.
- Docker compose.
- Seed data.
- Tests/QA: at least 5.
No image generation.
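Summary: Asks the AI to generate a small "PDF redaction workflow" system (versioned rule sets + processing batches + audit trail), suited to putting a team compliance process into practice.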
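The core of the audit trail is that every processing record points at an immutable policy version, and that each action is checked against the caller's role. The sketch below assumes a hypothetical permission map for the admin/editor/viewer roles and keeps history in a Python list where the real app would use the SQLite/Postgres tables the prompt describes. All names are illustrative.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from enum import Enum
from typing import Iterable, Optional


class Role(str, Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"


# Hypothetical permission map; adjust to the team's actual process.
PERMISSIONS = {
    Role.ADMIN: {"policy:write", "batch:run", "report:read", "user:manage"},
    Role.EDITOR: {"batch:run", "report:read"},
    Role.VIEWER: {"report:read"},
}


def require(role: Role, permission: str) -> None:
    if permission not in PERMISSIONS[role]:
        raise PermissionError(f"{role.value} may not {permission}")


@dataclass(frozen=True)
class Policy:
    name: str
    version: int
    regexes: tuple[str, ...]
    keywords: tuple[str, ...]


def new_version(policy: Policy, role: Role,
                regexes: Optional[Iterable[str]] = None,
                keywords: Optional[Iterable[str]] = None) -> Policy:
    """Policies are immutable: every edit yields a new version number."""
    require(role, "policy:write")
    return replace(
        policy,
        version=policy.version + 1,
        regexes=tuple(regexes) if regexes is not None else policy.regexes,
        keywords=tuple(keywords) if keywords is not None else policy.keywords,
    )


@dataclass(frozen=True)
class HistoryEntry:
    user: str
    batch_id: str
    policy_name: str
    policy_version: int
    redaction_count: int
    at: datetime


def record_run(log: list[HistoryEntry], user: str, role: Role, batch_id: str,
               policy: Policy, redaction_count: int) -> HistoryEntry:
    require(role, "batch:run")
    entry = HistoryEntry(user, batch_id, policy.name, policy.version,
                         redaction_count, datetime.now(timezone.utc))
    log.append(entry)
    return entry
```

Because old policy versions are never mutated, any history entry can be replayed against exactly the rules that were in force when the batch ran.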