# Dealspace — Watermark & File Protection Pipeline

**Version:** 0.1 — 2026-02-28
**Status:** Design specification for section 6.2 of SPEC.md
**Author:** James (subagent)

---

## 1. Design Principles

1. **Original files are sacred.** Storage always contains the clean, unmodified original. Watermarks are applied at serve time, never persisted.
2. **Watermarks are forensic, not decorative.** If a document leaks, we must trace it to a specific user/org/timestamp. Watermarks are evidence, not theater.
3. **FIPS 140-3 throughout.** All crypto operations use FIPS-approved algorithms. No exceptions.
4. **Performance over perfection.** A 50ms watermark that traces 99% of leaks beats a 5s watermark that's "perfect." Users won't wait.
5. **Graceful degradation.** If watermarking fails (corrupted file, unsupported variant), serve with audit log + fallback watermark strategy, never block access entirely.

---

## 2. Watermark Content

Standard watermark string (configurable per project):

```
{user_name} · {org_name} · {iso_timestamp} · CONFIDENTIAL
```

Example:

```
John Smith · Acme Capital · 2026-02-28T14:32:17Z · CONFIDENTIAL
```

### 2.1 Watermark Variants

| Variant | Use Case | Content |
|---------|----------|---------|
| `standard` | Normal access | Full string as above |
| `screen_deterrent` | Tiled background for PDF preview | Repeated diagonal pattern |
| `minimal` | Fallback when processing fails | `{user_id}:{timestamp}` (short, traceable) |

### 2.2 Watermark Styling (Project Config)

```go
type WatermarkConfig struct {
	Text       string  // Template: "{user_name} · {org_name} · {iso_timestamp} · CONFIDENTIAL"
	FontFamily string  // Default: "Helvetica" (PDF), "Calibri" (Office)
	FontSize   int     // Default: 10 (text), 48 (tiled background)
	Color      string  // RGBA hex: "#FF0000AA" (semi-transparent red)
	Position   string  // "footer" | "header" | "diagonal" | "tiled"
	Opacity    float64 // 0.0-1.0, default 0.3 for diagonal/tiled
}
```
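The placeholder template from section 2 can be expanded in a single pass. The sketch below is one possible shape for that step, not the spec's implementation: `expandWatermarkTemplate` and both example functions are hypothetical names, and accepting `{timestamp}` as an alias for `{iso_timestamp}` is an assumption made because both spellings appear in this document.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// expandWatermarkTemplate is a hypothetical helper that fills the section 2
// placeholders. strings.Replacer works in one pass and never rescans text it
// has already substituted, so braces inside a user or org name stay literal.
func expandWatermarkTemplate(tmpl, userName, orgName string, ts time.Time) string {
	iso := ts.UTC().Format(time.RFC3339)
	return strings.NewReplacer(
		"{user_name}", userName,
		"{org_name}", orgName,
		"{iso_timestamp}", iso,
		"{timestamp}", iso, // assumed alias, see lead-in
	).Replace(tmpl)
}

// exampleWatermark rebuilds the example string shown in section 2.
func exampleWatermark() string {
	ts := time.Date(2026, time.February, 28, 14, 32, 17, 0, time.UTC)
	return expandWatermarkTemplate(
		"{user_name} · {org_name} · {iso_timestamp} · CONFIDENTIAL",
		"John Smith", "Acme Capital", ts)
}

// literalBracesExample shows that a user name containing a placeholder
// is not expanded a second time.
func literalBracesExample() string {
	return expandWatermarkTemplate("{user_name} / {org_name}", "{org_name}", "Acme", time.Unix(0, 0))
}

func main() {
	fmt.Println(exampleWatermark())
	fmt.Println(literalBracesExample())
}
```

Using a single-pass replacer rather than chained `strings.ReplaceAll` calls keeps user-controlled names from being reinterpreted as template syntax, which matters because the resulting string is treated as forensic evidence.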
---

## 3. File Type Implementations

### 3.1 PDF Watermarking

**Library:** `github.com/pdfcpu/pdfcpu` (pure Go, FIPS-compatible, actively maintained)

**Approach:**

1. Parse PDF into memory
2. Add watermark as text annotation or stamped content on each page
3. Serialize modified PDF to output stream

**Watermark Placement:**

- **Footer watermark:** Bottom center of each page, 10pt gray text
- **Diagonal tiled (screen deterrent):** 45° repeated pattern across entire page, 0.15 opacity

**Algorithm:**

```go
// Sketch of the intended flow. The pdfcpu identifiers are simplified and must be
// checked against the pinned pdfcpu version before implementation.
func WatermarkPDF(input io.Reader, output io.Writer, wm WatermarkParams) error {
	// 1. Read PDF
	ctx, err := pdfcpu.ReadContext(input, nil)
	if err != nil {
		return fmt.Errorf("pdf parse: %w", err)
	}

	// 2. Build watermark spec
	wmSpec := pdfcpu.TextWatermark{
		Text:     wm.Text,
		FontName: "Helvetica",
		FontSize: 10,
		Color:    pdfcpu.Gray,
		Pos:      pdfcpu.BottomCenter,
	}

	// 3. Apply to all pages
	if err := pdfcpu.AddWatermarks(ctx, nil, wmSpec); err != nil {
		return fmt.Errorf("pdf watermark: %w", err)
	}

	// 4. Optionally add diagonal tiled pattern for screen deterrent
	if wm.ScreenDeterrent {
		tiledSpec := pdfcpu.TextWatermark{
			Text:     wm.Text,
			FontSize: 48,
			Color:    pdfcpu.LightGray,
			Opacity:  0.15,
			Rotation: 45,
			Diagonal: true,
		}
		// Do not ignore this error: a missing deterrent layer must trigger the fallback
		if err := pdfcpu.AddWatermarks(ctx, nil, tiledSpec); err != nil {
			return fmt.Errorf("pdf tiled watermark: %w", err)
		}
	}

	// 5. Write output
	return pdfcpu.WriteContext(ctx, output)
}
```

**Performance:**

- Small PDF (1-10 pages): ~20-50ms
- Large PDF (100+ pages): ~200-500ms
- Memory: ~2x file size during processing

**Caching:** ❌ **Never cache watermarked PDFs.** Each serve includes a user-specific timestamp. Caching would serve stale timestamps or wrong user identities. The whole point is forensic traceability.

**Edge Cases:**

| Case | Handling |
|------|----------|
| Password-protected PDF | Reject with error: "Cannot watermark encrypted PDF. Contact administrator." Log to audit. |
| Corrupted PDF | Attempt parse; if it fails, serve original with `minimal` watermark in filename + audit log |
| PDF/A strict | Footer watermark needs no special handling. PDF/A-1 forbids transparency, so skip the semi-transparent tiled layer when conformance must be preserved, and validate the output. |
| Scanned PDF (images) | Watermark overlays images; no text extraction needed |
| 1000+ page PDF | Stream processing; set timeout at 30s, fall back to minimal if exceeded |

---

### 3.2 Word Document (.docx) Watermarking

**Library:** `github.com/unidoc/unioffice` (pure Go, Office Open XML manipulation)

**Approach:**

1. Unzip DOCX (it's a ZIP of XML files)
2. Modify `word/document.xml` to reference the footer
3. Create/modify `word/footer1.xml` with watermark text
4. Update `[Content_Types].xml` and relationships
5. Rezip and serve

**Watermark Placement:**

- **Footer:** Centered text in document footer, appears on every page
- **Header alternative:** For "CONFIDENTIAL" prominence, add to header

**Algorithm:**

```go
// Sketch of the intended flow. The unioffice identifiers are simplified and must be
// checked against the pinned unioffice version before implementation.
func WatermarkDOCX(input io.Reader, output io.Writer, wm WatermarkParams) error {
	// 1. Open DOCX (the reader needs random access and a size, so buffer the input first)
	data, err := io.ReadAll(input)
	if err != nil {
		return fmt.Errorf("docx read: %w", err)
	}
	doc, err := document.Read(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return fmt.Errorf("docx parse: %w", err)
	}

	// 2. Create footer (an existing footer should be appended to instead; see edge cases)
	footer := doc.AddFooter()
	footer.SetParagraphProperties(document.ParagraphStyleFooter)

	// 3. Add watermark paragraph
	para := footer.AddParagraph()
	para.SetAlignment(document.AlignmentCenter)
	run := para.AddRun()
	run.AddText(wm.Text)
	run.Properties().SetColor(color.Gray)
	run.Properties().SetSize(10)

	// 4. Apply footer to all sections
	for _, section := range doc.Sections() {
		section.SetFooter(footer, document.FooterTypeDefault)
	}

	// 5. Save
	return doc.Save(output)
}
```

**Performance:**

- Typical DOCX: ~30-80ms
- Large DOCX with images: ~100-300ms
- Memory: ~3x file size (uncompressed XML is verbose)

**Caching:** ❌ Never. Same reasoning as PDF.

**Edge Cases:**

| Case | Handling |
|------|----------|
| Password-protected DOCX | Reject with error. Office encryption prevents modification. |
| Corrupted DOCX | Attempt parse; fall back to encrypted-download-only mode |
| DOCX with existing footer | Append watermark to existing footer, don't replace |
| DOCM (macro-enabled) | Same process; macros preserved. Consider security warning. |
| DOC (legacy binary) | Convert via LibreOffice CLI first, or reject. See 3.2.1. |

#### 3.2.1 Legacy DOC Handling

Binary `.doc` files cannot be watermarked with pure Go. Options:

1. **Convert to PDF on upload** (recommended for M&A: preserves formatting, prevents editing)
2. **LibreOffice CLI conversion** at serve time: `libreoffice --headless --convert-to docx`
3. **Reject with message:** "Legacy format. Please upload .docx"

Recommendation: Option 1 for new uploads; Option 3 for existing files in MVP.

---

### 3.3 Excel (.xlsx) Watermarking

**Library:** `github.com/xuri/excelize/v2` (pure Go, actively maintained, 15k+ stars)

**Approach:**

1. Open XLSX
2. For each sheet: insert header row with watermark text
3. Optionally: add sheet-level "protection" (cosmetic, not security; easily bypassed)
4. Save to output stream

**Watermark Placement:**

- **Header row (Row 1):** Merged cells spanning data width, light gray background, watermark text
- **Sheet header/footer:** Print-only watermark (visible when printed)

**Algorithm:**

```go
func WatermarkXLSX(input io.Reader, output io.Writer, wm WatermarkParams) error {
	// 1. Open workbook
	f, err := excelize.OpenReader(input)
	if err != nil {
		return fmt.Errorf("xlsx parse: %w", err)
	}
	defer f.Close()

	// 2. Watermark each sheet
	for _, sheet := range f.GetSheetList() {
		// Get data dimensions (parseColumnCount and columnLetter are small helpers defined elsewhere)
		dim, err := f.GetSheetDimension(sheet)
		if err != nil {
			return fmt.Errorf("xlsx dimension %q: %w", sheet, err)
		}
		cols := parseColumnCount(dim) // e.g., "A1:J50" → 10 columns

		// Insert row at top; fail rather than serve a sheet without a watermark
		if err := f.InsertRows(sheet, 1, 1); err != nil {
			return fmt.Errorf("xlsx insert row %q: %w", sheet, err)
		}

		// Merge cells for watermark banner
		endCol := columnLetter(cols)
		if err := f.MergeCell(sheet, "A1", endCol+"1"); err != nil {
			return fmt.Errorf("xlsx merge %q: %w", sheet, err)
		}

		// Set watermark text
		if err := f.SetCellValue(sheet, "A1", wm.Text); err != nil {
			return fmt.Errorf("xlsx set text %q: %w", sheet, err)
		}

		// Style: light gray background, centered, small font
		styleID, err := f.NewStyle(&excelize.Style{
			Fill:      excelize.Fill{Type: "pattern", Color: []string{"#EEEEEE"}, Pattern: 1},
			Font:      &excelize.Font{Size: 9, Color: "#888888"},
			Alignment: &excelize.Alignment{Horizontal: "center"},
		})
		if err == nil {
			f.SetCellStyle(sheet, "A1", endCol+"1", styleID) // Styling is cosmetic; ignore failures
		}

		// Add print header/footer
		f.SetHeaderFooter(sheet, &excelize.HeaderFooterOptions{
			OddFooter: "&C" + wm.Text,
		})
	}

	// 3. Optional: add sheet protection (cosmetic only)
	if wm.AddProtection {
		for _, sheet := range f.GetSheetList() {
			// Options type is named SheetProtectionOptions in recent excelize v2 releases
			f.ProtectSheet(sheet, &excelize.SheetProtectionOptions{
				Password:          "", // No password; just prevents casual editing
				SelectLockedCells: true,
			})
		}
	}

	// 4. Write output
	return f.Write(output)
}
```

**Performance:**

- Small XLSX: ~20-50ms
- Large XLSX (10k+ rows): ~100-400ms
- Memory: ~2-4x file size

**Caching:** ❌ Never.

**Edge Cases:**

| Case | Handling |
|------|----------|
| Password-protected XLSX | Reject. Cannot modify encrypted workbook. |
| Workbook with VBA macros (.xlsm) | Process same as .xlsx; macros preserved |
| Very wide sheets (1000+ columns) | Skip merge, add watermark to A1 only |
| Charts/pivot tables | Unaffected; watermark is in data area |
| XLS (legacy binary) | Reject or convert via LibreOffice. Same as DOC. |

---

### 3.4 Image Watermarking (JPG, PNG, WebP)

**Library:** Standard library `image` + `golang.org/x/image` + `github.com/fogleman/gg` (2D graphics)

**Approach:**

1. Decode image
2. Draw semi-transparent text overlay
3. Encode to output format

**Watermark Placement:**

- **Bottom-right corner:** Primary watermark, semi-transparent white text with drop shadow
- **Tiled diagonal (optional):** For high-value images, repeated pattern across entire image

**Algorithm:**

```go
func WatermarkImage(input io.Reader, output io.Writer, format string, wm WatermarkParams) error {
	// 1. Decode image (decoders must be registered via blank imports,
	//    e.g. _ "image/jpeg", _ "image/png", _ "golang.org/x/image/webp")
	img, _, err := image.Decode(input)
	if err != nil {
		return fmt.Errorf("image decode: %w", err)
	}
	bounds := img.Bounds()
	width, height := bounds.Dx(), bounds.Dy()

	// 2. Create drawing context
	dc := gg.NewContextForImage(img)

	// 3. Calculate font size based on image dimensions
	fontSize := float64(width) / 50 // ~2% of width
	if fontSize < 12 {
		fontSize = 12
	}
	if fontSize > 48 {
		fontSize = 48
	}
	fontPath := "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"
	if err := dc.LoadFontFace(fontPath, fontSize); err != nil {
		return fmt.Errorf("load font: %w", err)
	}

	// 4. Position: bottom-right with padding
	textWidth, _ := dc.MeasureString(wm.Text)
	x := float64(width) - textWidth - 20
	y := float64(height) - 20

	// 5. Draw drop shadow
	dc.SetRGBA(0, 0, 0, 0.5)
	dc.DrawString(wm.Text, x+2, y+2)

	// 6. Draw watermark text
	dc.SetRGBA(1, 1, 1, 0.7)
	dc.DrawString(wm.Text, x, y)

	// 7. Optional: diagonal tiled pattern
	if wm.ScreenDeterrent {
		dc.SetRGBA(0.5, 0.5, 0.5, 0.15)
		if err := dc.LoadFontFace(fontPath, fontSize*2); err != nil {
			return fmt.Errorf("load font: %w", err)
		}
		// Re-measure at the larger size; guard the steps so the loops always advance
		tileWidth, _ := dc.MeasureString(wm.Text)
		stepX, stepY := int(tileWidth*1.5), int(fontSize*4)
		if stepX < 1 {
			stepX = 1
		}
		for row := -height; row < height*2; row += stepY {
			for col := -width; col < width*2; col += stepX {
				dc.Push()
				dc.RotateAbout(gg.Radians(45), float64(col), float64(row))
				dc.DrawString(wm.Text, float64(col), float64(row))
				dc.Pop()
			}
		}
	}

	// 8. Encode output
	switch format {
	case "jpeg", "jpg":
		return jpeg.Encode(output, dc.Image(), &jpeg.Options{Quality: 90})
	case "png":
		return png.Encode(output, dc.Image())
	case "webp":
		return webp.Encode(output, dc.Image(), &webp.Options{Quality: 90})
	default:
		return png.Encode(output, dc.Image())
	}
}
```

**Performance:**

- Small image (<1MB): ~10-30ms
- Large image (10MB+): ~100-300ms
- Memory: ~4 bytes per pixel once decoded (RGBA in memory)

**Caching:** ❌ Never.

**Edge Cases:**

| Case | Handling |
|------|----------|
| Animated GIF | Extract first frame, watermark, serve as static. Or reject. |
| Very small image (<200px) | Reduce font size; may become illegible (accepted) |
| HEIC/HEIF | Convert to JPEG first (Apple format, limited Go support) |
| TIFF | Decode with `golang.org/x/image/tiff`; serve as PNG |
| RAW formats | Reject. Convert on upload. |
| SVG | Skip pixel watermarking; add text element to XML |

---

### 3.5 Video Watermarking (MP4, MOV)

**Tool:** FFmpeg (external binary); no pure Go solution exists for video processing

**Approach:**

1. Pipe original video to FFmpeg stdin
2. FFmpeg overlays text watermark
3. Stream FFmpeg stdout to HTTP response

**Watermark Placement:**

- **Bottom-right corner:** Semi-transparent text overlay, visible but not distracting
- **Optional burn-in:** More prominent for high-sensitivity content

**Algorithm:**

```go
func WatermarkVideo(ctx context.Context, objectID string, w http.ResponseWriter, wm WatermarkParams) error {
	// 1. Build FFmpeg command
	// Escape drawtext special characters in one pass so backslashes are not double-escaped.
	// drawtext escaping is layered; for untrusted text, the textfile= option is safer.
	escapedText := strings.NewReplacer(`\`, `\\`, ":", `\:`, "'", `\'`).Replace(wm.Text)

	// drawtext filter
	filter := fmt.Sprintf(
		"drawtext=text='%s':fontsize=24:fontcolor=white@0.7:x=w-tw-20:y=h-th-20:shadowcolor=black@0.5:shadowx=2:shadowy=2",
		escapedText,
	)

	cmd := exec.CommandContext(ctx, "ffmpeg",
		"-i", "pipe:0", // Read from stdin (MP4/MOV input must have its moov atom up front to be readable from a pipe)
		"-vf", filter, // Apply text filter
		"-c:v", "libx264", // Re-encode video
		"-preset", "fast", // Speed over compression
		"-crf", "23", // Quality (lower = better)
		"-c:a", "copy", // Copy audio unchanged
		"-movflags", "frag_keyframe+empty_moov", // Fragmented MP4 for pipe output (faststart needs seekable output)
		"-f", "mp4", // Output format
		"pipe:1", // Write to stdout
	)

	// 2. Set up pipes
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return fmt.Errorf("ffmpeg stdin: %w", err)
	}
	cmd.Stdout = w
	cmd.Stderr = os.Stderr // Log errors

	// 3. Load the object before starting FFmpeg so read failures surface as errors
	obj, err := store.Read(objectID)
	if err != nil {
		return fmt.Errorf("object read: %w", err)
	}
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("ffmpeg start: %w", err)
	}

	// 4. Stream input file to FFmpeg
	go func() {
		defer stdin.Close()
		io.Copy(stdin, bytes.NewReader(obj))
	}()

	// 5. Wait for completion
	return cmd.Wait()
}
```

**Performance:**

- 1-minute video: ~5-15 seconds (re-encoding required)
- 10-minute video: ~30-90 seconds
- **Recommendation:** For videos >5 minutes, use async processing + notification when ready

**Caching:** ⚠️ **Consider selective caching for large videos.**

- Risk: Cached version has wrong timestamp for subsequent views
- Mitigation: Cache key includes `{object_id}:{user_id}:{date}`, so the same user on the same day gets the cached copy
- Invalidate cache at midnight or on project config change

**Edge Cases:**

| Case | Handling |
|------|----------|
| Very long video (>1hr) | Async processing; return 202 with job ID; poll for completion |
| Corrupted video | FFmpeg will error; return 500 with audit log |
| Unsupported codec | FFmpeg handles most; truly exotic formats: reject |
| Audio-only file | No video stream to watermark; add metadata comment instead |
| MKV, AVI, WMV | Convert to MP4 on serve (FFmpeg handles this) |

---

### 3.6 Other File Types

**Strategy:** Encrypted download only. No preview, no watermarking.

**Affected Types:**

- ZIP, TAR, 7Z (archives)
- CAD files (DWG, DXF)
- Database exports (SQL, CSV with sensitive data)
- Executables (rare but possible)
- Unknown/binary files

**Watermark Alternative:**

1. Filename includes minimal watermark: `report_{user_id}_{timestamp}.zip`
2. Audit log captures full context
3. "CONFIDENTIAL" wrapper: Serve inside a new ZIP containing the file + a `NOTICE.txt`

```go
func ServeEncryptedDownload(w http.ResponseWriter, objectID string, wm WatermarkParams) error {
	obj, err := store.Read(objectID)
	if err != nil {
		return fmt.Errorf("object read: %w", err)
	}

	// Create wrapper ZIP with notice
	buf := new(bytes.Buffer)
	zw := zip.NewWriter(buf)

	// Add notice file
	notice, err := zw.Create("NOTICE.txt")
	if err != nil {
		return err
	}
	fmt.Fprintf(notice, "CONFIDENTIAL\n\nDownloaded by: %s\nOrganization: %s\nTimestamp: %s\n\nUnauthorized distribution is prohibited.",
		wm.UserName, wm.OrgName, wm.Timestamp.Format(time.RFC3339))

	// Add original file
	original, err := zw.Create(wm.OriginalFilename)
	if err != nil {
		return err
	}
	if _, err := original.Write(obj); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}

	// Set download filename with watermark info (guard against user IDs shorter than 8 chars)
	shortID := wm.UserID
	if len(shortID) > 8 {
		shortID = shortID[:8]
	}
	filename := fmt.Sprintf("%s_%s_%s.zip",
		strings.TrimSuffix(wm.OriginalFilename, filepath.Ext(wm.OriginalFilename)),
		shortID,
		wm.Timestamp.Format("20060102"))

	w.Header().Set("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, filename))
	w.Header().Set("Content-Type", "application/zip")
	_, err = w.Write(buf.Bytes())
	return err
}
```

---

## 4. Screen Capture Protection

**Reality Check:** True screen capture protection is impossible. Any DRM can be defeated by pointing a camera at a screen. Our goal is **deterrence and traceability**, not prevention.

### 4.1 Visual Deterrent Strategy

**For PDFs served in-browser:**

1. Apply diagonal tiled watermark pattern (45°, repeated every 200px)
2. Use user-specific text in the pattern
3. Opacity 0.15: visible in screenshots but doesn't obstruct reading

**For images:**

1. Same diagonal tiled pattern
2. Consider more aggressive opacity (0.25) for high-sensitivity images

**For video:**

1. Persistent corner watermark (already implemented)
2. Optional: periodic full-screen flash of watermark text (every 60s, 2s duration, 0.3 opacity)

### 4.2 Additional Deterrents

| Technique | Effectiveness | Implementation |
|-----------|---------------|----------------|
| Diagonal tiled watermark | High | Built into watermark functions |
| Random position micro-watermarks | Medium | Add 5-10 tiny (8px) watermarks at random positions |
| Invisible watermarks (steganography) | Low (easily stripped) | Not recommended: complexity outweighs value |
| JavaScript screenshot detection | Low (easily bypassed) | Not recommended |
| CSS `user-select: none` (with `-webkit-` prefix) | Cosmetic only | Add to viewer CSS |

### 4.3 Recommended Approach

```go
type ScreenDeterrentLevel int

const (
	DeterrentNone     ScreenDeterrentLevel = 0
	DeterrentStandard ScreenDeterrentLevel = 1 // Footer + diagonal tiled
	DeterrentHigh     ScreenDeterrentLevel = 2 // + random micro-watermarks
)
```

Default to `DeterrentStandard` for all data room documents. Project admins can escalate to `DeterrentHigh` for specific folders.

---

## 5. Audit Trail

Every file serve MUST be logged. No exceptions.

### 5.1 Audit Entry Structure

```go
type FileServeAudit struct {
	ID          string    `json:"id"`           // UUID
	ProjectID   string    `json:"project_id"`
	ObjectID    string    `json:"object_id"`    // File being served
	EntryID     string    `json:"entry_id"`     // Parent answer entry
	ActorID     string    `json:"actor_id"`     // User requesting
	ActorOrg    string    `json:"actor_org"`    // Organization
	Action      string    `json:"action"`       // "view" | "download" | "print"
	IP          string    `json:"ip"`           // Client IP (X-Forwarded-For aware)
	UserAgent   string    `json:"user_agent"`
	Timestamp   time.Time `json:"timestamp"`
	WatermarkID string    `json:"watermark_id"` // Unique ID embedded in watermark
	FileType    string    `json:"file_type"`    // "pdf", "docx", etc.
	FileSize    int64     `json:"file_size"`
	Success     bool      `json:"success"`
	ErrorMsg    string    `json:"error_msg,omitempty"`
}
```

### 5.2 Watermark ID

Each watermark includes a unique, traceable ID:

```go
func GenerateWatermarkID(actorID, objectID string, timestamp time.Time) string {
	// Short, human-readable, and unique for practical purposes (64 bits of HMAC output)
	h := hmac.New(sha256.New, watermarkSecret)
	// NUL separators keep ("ab", "c") and ("a", "bc") from hashing to the same input
	h.Write([]byte(actorID + "\x00" + objectID + "\x00" + timestamp.Format(time.RFC3339)))
	sum := h.Sum(nil)
	return base32.StdEncoding.EncodeToString(sum[:8])[:13] // e.g., "JBSWY3DPEHPK3"
}
```

This ID appears in the watermark text and audit log. If a document leaks, look up the ID in `file_serves` (indexed on `watermark_id`) for immediate attribution.

### 5.3 Audit Table

```sql
CREATE TABLE file_serves (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    object_id TEXT NOT NULL,
    entry_id TEXT,
    actor_id TEXT NOT NULL,
    actor_org TEXT,
    action TEXT NOT NULL,
    ip TEXT NOT NULL,
    user_agent TEXT,
    ts INTEGER NOT NULL,
    watermark_id TEXT NOT NULL,
    file_type TEXT NOT NULL,
    file_size INTEGER,
    success INTEGER NOT NULL,
    error_msg TEXT
);

CREATE INDEX idx_serves_project ON file_serves(project_id);
CREATE INDEX idx_serves_actor ON file_serves(actor_id);
CREATE INDEX idx_serves_object ON file_serves(object_id);
CREATE INDEX idx_serves_watermark ON file_serves(watermark_id);
CREATE INDEX idx_serves_ts ON file_serves(ts);
```

### 5.4 Audit Logging Function

```go
func LogFileServe(ctx context.Context, audit FileServeAudit) error {
	audit.ID = uuid.NewString()
	if audit.Timestamp.IsZero() {
		audit.Timestamp = time.Now()
	}

	// Pack sensitive fields (action, user_agent, error_msg)
	packed, err := Pack(audit)
	if err != nil {
		return err
	}

	_, err = db.ExecContext(ctx, `
		INSERT INTO file_serves
			(id, project_id, object_id, entry_id, actor_id, actor_org, action, ip,
			 user_agent, ts, watermark_id, file_type, file_size, success, error_msg)
		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
		audit.ID, audit.ProjectID, audit.ObjectID, audit.EntryID,
		audit.ActorID, audit.ActorOrg, packed.Action, audit.IP,
		packed.UserAgent, audit.Timestamp.UnixMilli(), audit.WatermarkID,
		audit.FileType, audit.FileSize, boolToInt(audit.Success), packed.ErrorMsg,
	)
	return err
}
```

---

## 6. Burn After Reading Mode

Optional per-file setting: a file can be downloaded at most N times in total, or N times per user.

### 6.1 Configuration

```go
type BurnConfig struct {
	Enabled      bool   `json:"enabled"`
	MaxDownloads int    `json:"max_downloads"`  // Total across all users (0 = unlimited)
	MaxPerUser   int    `json:"max_per_user"`   // Per individual user (0 = unlimited)
	ExpiresAt    *int64 `json:"expires_at"`     // Unix ms timestamp (optional)
	NotifyOnBurn bool   `json:"notify_on_burn"` // Alert admins when limit reached
}
```

### 6.2 Tracking Table

```sql
CREATE TABLE burn_tracking (
    object_id TEXT NOT NULL,
    actor_id TEXT NOT NULL,
    download_count INTEGER NOT NULL DEFAULT 0,
    last_download INTEGER,
    PRIMARY KEY (object_id, actor_id)
);

CREATE TABLE burn_totals (
    object_id TEXT PRIMARY KEY,
    total_downloads INTEGER NOT NULL DEFAULT 0,
    burned_at INTEGER -- When limit was hit
);
```

### 6.3 Burn Check Function

```go
func CheckBurnLimit(ctx context.Context, objectID, actorID string) (allowed bool, remaining int, err error) {
	// 1. Load burn config from entry Data
	config, err := GetBurnConfig(ctx, objectID)
	if err != nil {
		return false, 0, err // Fail closed: a lookup error must not lift the limit
	}
	if !config.Enabled {
		return true, -1, nil // No burn limit
	}

	// 2. Check expiration
	if config.ExpiresAt != nil && time.Now().UnixMilli() > *config.ExpiresAt {
		return false, 0, nil
	}

	// 3. Check total downloads (no row yet means zero downloads)
	var total int
	err = db.QueryRowContext(ctx,
		"SELECT total_downloads FROM burn_totals WHERE object_id = ?",
		objectID).Scan(&total)
	if err != nil && !errors.Is(err, sql.ErrNoRows) {
		return false, 0, err
	}
	if config.MaxDownloads > 0 && total >= config.MaxDownloads {
		return false, 0, nil
	}

	// 4. Check per-user downloads
	var userCount int
	err = db.QueryRowContext(ctx,
		"SELECT download_count FROM burn_tracking WHERE object_id = ? AND actor_id = ?",
		objectID, actorID).Scan(&userCount)
	if err != nil && !errors.Is(err, sql.ErrNoRows) {
		return false, 0, err
	}
	if config.MaxPerUser > 0 && userCount >= config.MaxPerUser {
		return false, 0, nil
	}

	// 5. Calculate remaining
	remaining = -1 // Unlimited
	if config.MaxDownloads > 0 {
		remaining = config.MaxDownloads - total
	}
	if config.MaxPerUser > 0 {
		userRemaining := config.MaxPerUser - userCount
		if remaining < 0 || userRemaining < remaining {
			remaining = userRemaining
		}
	}
	return true, remaining, nil
}

func IncrementBurnCount(ctx context.Context, objectID, actorID string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // No-op after a successful Commit

	// Upsert user tracking
	now := time.Now().UnixMilli()
	if _, err := tx.ExecContext(ctx, `
		INSERT INTO burn_tracking (object_id, actor_id, download_count, last_download)
		VALUES (?, ?, 1, ?)
		ON CONFLICT (object_id, actor_id)
		DO UPDATE SET download_count = download_count + 1, last_download = ?`,
		objectID, actorID, now, now); err != nil {
		return err
	}

	// Upsert total
	if _, err := tx.ExecContext(ctx, `
		INSERT INTO burn_totals (object_id, total_downloads)
		VALUES (?, 1)
		ON CONFLICT (object_id)
		DO UPDATE SET total_downloads = total_downloads + 1`,
		objectID); err != nil {
		return err
	}
	return tx.Commit()
}
```

### 6.4 Burn Notification

When a file hits its limit:

```go
func NotifyBurn(ctx context.Context, objectID string, config BurnConfig) {
	// Record burned_at regardless of the notification setting
	db.ExecContext(ctx,
		"UPDATE burn_totals SET burned_at = ? WHERE object_id = ?",
		time.Now().UnixMilli(), objectID)

	if !config.NotifyOnBurn {
		return
	}

	// Notify project admins (via existing notification system)
	entry, err := GetEntryByObjectID(ctx, objectID)
	if err != nil {
		return
	}
	NotifyProjectAdmins(ctx, entry.ProjectID, NotificationBurnLimitReached, map[string]any{
		"object_id": objectID,
		"filename":  entry.Data["filename"],
	})
}
```
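Because `CheckBurnLimit` and `IncrementBurnCount` run as separate steps, two concurrent requests can both pass the check before either increments, letting a file exceed its limit. One way to close that gap is to check and count in a single critical section. The sketch below illustrates the idea in memory only; `burnCounter`, `limits`, `Reserve`, and `countAllowed` are hypothetical names, and a database-backed version would do the same thing inside one transaction or a conditional `UPDATE`.

```go
package main

import (
	"fmt"
	"sync"
)

// limits mirrors the two quota fields of BurnConfig (0 = unlimited).
type limits struct {
	MaxDownloads int
	MaxPerUser   int
}

// burnCounter is a hypothetical in-memory stand-in for the burn tables.
type burnCounter struct {
	mu      sync.Mutex
	lim     limits
	total   int
	perUser map[string]int
}

func newBurnCounter(l limits) *burnCounter {
	return &burnCounter{lim: l, perUser: make(map[string]int)}
}

// Reserve checks both limits and, if the download is allowed, counts it
// before releasing the lock, so no other request can slip in between.
// remaining is what is left for this actor afterwards (-1 = unlimited).
func (b *burnCounter) Reserve(actorID string) (allowed bool, remaining int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.lim.MaxDownloads > 0 && b.total >= b.lim.MaxDownloads {
		return false, 0
	}
	if b.lim.MaxPerUser > 0 && b.perUser[actorID] >= b.lim.MaxPerUser {
		return false, 0
	}
	b.total++
	b.perUser[actorID]++
	remaining = -1
	if b.lim.MaxDownloads > 0 {
		remaining = b.lim.MaxDownloads - b.total
	}
	if b.lim.MaxPerUser > 0 {
		if u := b.lim.MaxPerUser - b.perUser[actorID]; remaining < 0 || u < remaining {
			remaining = u
		}
	}
	return true, remaining
}

// countAllowed fires concurrent requests and reports how many were admitted.
func countAllowed(attempts int, l limits) int {
	c := newBurnCounter(l)
	var wg sync.WaitGroup
	var mu sync.Mutex
	admitted := 0
	for i := 0; i < attempts; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if ok, _ := c.Reserve("alice"); ok {
				mu.Lock()
				admitted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return admitted
}

func main() {
	fmt.Println(countAllowed(50, limits{MaxDownloads: 3}))
}
```

The trade-off is that a reservation is counted before the file has actually been delivered, so a failed serve would need a compensating decrement if failed attempts should not consume quota.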
---

## 7. Go Implementation Design

### 7.1 lib/watermark.go — Function Signatures

```go
package lib

import (
	"context"
	"io"
	"time"
)

// WatermarkParams contains all info needed to generate a watermark
type WatermarkParams struct {
	UserID      string
	UserName    string
	OrgID       string
	OrgName     string
	Timestamp   time.Time
	WatermarkID string // Unique traceable ID

	// Styling (from project config)
	Config WatermarkConfig

	// Original file info
	OriginalFilename string
	FileType         string // "pdf", "docx", "xlsx", "jpg", etc.

	// Options
	ScreenDeterrent bool // Add aggressive visual deterrent
}

// WatermarkConfig is project-level styling configuration
type WatermarkConfig struct {
	Text           string // Template with placeholders
	FontFamily     string
	FontSize       int
	Color          string // RGBA hex
	Position       string // "footer", "header", "diagonal", "tiled"
	Opacity        float64
	DeterrentLevel ScreenDeterrentLevel
}

// ScreenDeterrentLevel controls anti-screenshot measures
type ScreenDeterrentLevel int

const (
	DeterrentNone     ScreenDeterrentLevel = 0
	DeterrentStandard ScreenDeterrentLevel = 1
	DeterrentHigh     ScreenDeterrentLevel = 2
)

// BuildWatermarkText generates the actual watermark string from params
func BuildWatermarkText(params WatermarkParams) string

// GenerateWatermarkID creates a unique, traceable watermark identifier
func GenerateWatermarkID(actorID, objectID string, timestamp time.Time) string

// ---- Per-Type Watermarking Functions ----

// WatermarkPDF adds watermark to PDF, writing result to output
func WatermarkPDF(input io.Reader, output io.Writer, params WatermarkParams) error

// WatermarkDOCX adds watermark to Word document
func WatermarkDOCX(input io.Reader, output io.Writer, params WatermarkParams) error

// WatermarkXLSX adds watermark to Excel spreadsheet
func WatermarkXLSX(input io.Reader, output io.Writer, params WatermarkParams) error

// WatermarkImage adds watermark to image (JPG, PNG, WebP)
// Returns the output format used (may differ from input for unsupported formats)
func WatermarkImage(input io.Reader, output io.Writer, inputFormat string, params WatermarkParams) (outputFormat string, err error)

// WatermarkVideo streams video with watermark overlay
// This is special: it streams straight to w (typically an http.ResponseWriter)
func WatermarkVideo(ctx context.Context, objectReader io.Reader, w io.Writer, params WatermarkParams) error

// ServeProtectedFile is the unified entry point for the protection pipeline
// It detects file type and applies appropriate watermarking
// input is the decrypted object (step 4 of the serve pipeline), matching the call in 7.4
func ServeProtectedFile(ctx context.Context, input io.Reader, w io.Writer, params WatermarkParams) error

// ServeEncryptedDownload wraps non-watermarkable files with NOTICE.txt
func ServeEncryptedDownload(ctx context.Context, objectID string, w io.Writer, params WatermarkParams) error

// ---- Audit Functions ----

// LogFileServe records every file access to the audit table
func LogFileServe(ctx context.Context, audit FileServeAudit) error

// FileServeAudit contains all info about a file serve event
type FileServeAudit struct {
	ID          string
	ProjectID   string
	ObjectID    string
	EntryID     string
	ActorID     string
	ActorOrg    string
	Action      string // "view", "download", "print"
	IP          string
	UserAgent   string
	Timestamp   time.Time
	WatermarkID string
	FileType    string
	FileSize    int64
	Success     bool
	ErrorMsg    string
}

// ---- Burn After Reading ----

// BurnConfig defines download limits for a file
type BurnConfig struct {
	Enabled      bool
	MaxDownloads int // Total across all users
	MaxPerUser   int // Per individual user
	ExpiresAt    *int64
	NotifyOnBurn bool
}

// CheckBurnLimit verifies if download is allowed and returns remaining count
func CheckBurnLimit(ctx context.Context, objectID, actorID string) (allowed bool, remaining int, err error)

// IncrementBurnCount records a successful download
func IncrementBurnCount(ctx context.Context, objectID, actorID string) error

// GetBurnConfig retrieves burn settings for an object
func GetBurnConfig(ctx context.Context, objectID string) (BurnConfig, error)

// SetBurnConfig updates burn settings for an object
func SetBurnConfig(ctx context.Context, objectID string, config BurnConfig) error
```

### 7.2 Dependencies (go.mod additions)

```go
// PDF processing
require github.com/pdfcpu/pdfcpu v0.8.0

// Office documents
require github.com/unidoc/unioffice v1.33.0
require github.com/xuri/excelize/v2 v2.8.1

// Image processing
require github.com/fogleman/gg v1.3.0
require golang.org/x/image v0.18.0

// WebP support (for image encoding)
require github.com/chai2010/webp v1.1.1

// Video (FFmpeg is external binary, no Go dep needed)
// Ensure ffmpeg is installed: apt install ffmpeg

// Crypto (FIPS 140-3 compliance)
// Use standard library crypto/aes, crypto/hmac, crypto/sha256
// These are backed by the BoringCrypto module when built with GOEXPERIMENT=boringcrypto
```

**FIPS 140-3 Note:** Build with `GOEXPERIMENT=boringcrypto` to route standard-library crypto through BoringCrypto, the FIPS-validated core of BoringSSL (this requires cgo). Go 1.24 and later also ship a native FIPS 140-3 mode (`GOFIPS140` build setting, `GODEBUG=fips140=on`), which is worth evaluating as an alternative. Confirm which validation certificate covers the pinned toolchain before claiming compliance.

### 7.3 Serve Pipeline

Complete request flow from HTTP request to watermarked response:

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                               FILE SERVE PIPELINE                               │
└─────────────────────────────────────────────────────────────────────────────────┘

HTTP Request: GET /api/files/{object_id}
        │
        ▼
┌──────────────────────┐
│ 1. Auth Middleware   │  Extract JWT, validate session
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 2. CheckAccess()     │  Verify actor can access this object
│    (lib/rbac.go)     │  Walk to parent entry → workstream → project
└──────────────────────┘  Return 403 if denied
        │
        ▼
┌──────────────────────┐
│ 3. CheckBurnLimit()  │  Verify download quota not exceeded
│    (lib/watermark.go)│  Return 410 Gone if burned
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 4. ObjectRead()      │  Fetch encrypted object from store
│    (lib/store.go)    │  Decrypt with project key
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 5. Build Params      │  Construct WatermarkParams from:
│                      │  - Actor (user_id, name, org)
│                      │  - Project config (styling)
│                      │  - Timestamp + generated watermark ID
│                      │  - File metadata (type, name)
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 6. Route by Type     │  Detect file type from extension/magic bytes
│                      │
│  PDF ──────────►     │  WatermarkPDF()
│  DOCX ─────────►     │  WatermarkDOCX()
│  XLSX ─────────►     │  WatermarkXLSX()
│  Image ────────►     │  WatermarkImage()
│  Video ────────►     │  WatermarkVideo()
│  Other ────────►     │  ServeEncryptedDownload()
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 7. Stream to Client  │  Set Content-Type, Content-Disposition
│                      │  Write watermarked content to ResponseWriter
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 8. LogFileServe()    │  Record to audit table (async, don't block)
│    (lib/watermark.go)│
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ 9. IncrementBurn()   │  Update download counters if burn enabled
│    (lib/watermark.go)│  Check if limit now hit → notify admins
└──────────────────────┘
        │
        ▼
HTTP Response: Watermarked file stream
```

### 7.4 Handler Implementation

```go
// api/handlers.go
func (h *Handler) ServeFile(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	objectID := chi.URLParam(r, "object_id") // Must match the route pattern /api/files/{object_id}
	actor := auth.ActorFromContext(ctx)

	// 1. Access check
	entry, err := lib.GetEntryByObjectID(ctx, h.db, objectID)
	if err != nil {
		http.Error(w, "not found", 404)
		return
	}
	if !lib.CheckAccess(ctx, h.db, actor.ID, entry.ProjectID, entry.ID, "read") {
		http.Error(w, "forbidden", 403)
		return
	}

	// 2. Burn limit check (signature as declared in 7.1)
	allowed, remaining, err := lib.CheckBurnLimit(ctx, objectID, actor.ID)
	if err != nil {
		http.Error(w, "internal error", 500)
		return
	}
	if !allowed {
		http.Error(w, "download limit exceeded", 410) // Gone
		return
	}

	// 3. Read object
	data, err := lib.ObjectRead(ctx, h.store, h.projectKey, objectID)
	if err != nil {
		http.Error(w, "not found", 404)
		return
	}

	// 4. Build watermark params
	fileInfo := entry.Data["files"].([]any)[0].(map[string]any) // Simplified
	projectConfig, _ := lib.GetProjectWatermarkConfig(ctx, h.db, entry.ProjectID)
	now := time.Now() // One timestamp for watermark text, watermark ID, and audit row
	params := lib.WatermarkParams{
		UserID:           actor.ID,
		UserName:         actor.Name,
		OrgID:            actor.OrgID,
		OrgName:          actor.OrgName,
		Timestamp:        now,
		WatermarkID:      lib.GenerateWatermarkID(actor.ID, objectID, now),
		Config:           projectConfig,
		OriginalFilename: fileInfo["name"].(string),
		FileType:         fileInfo["type"].(string),
		ScreenDeterrent:  projectConfig.DeterrentLevel >= lib.DeterrentStandard,
	}

	// 5. Prepare audit entry (log after response)
	audit := lib.FileServeAudit{
		ProjectID:   entry.ProjectID,
		ObjectID:    objectID,
		EntryID:     entry.ID,
		ActorID:     actor.ID,
		ActorOrg:    actor.OrgName,
		Action:      "download",
		IP:          realIP(r),
		UserAgent:   r.UserAgent(),
		Timestamp:   now,
		WatermarkID: params.WatermarkID,
		FileType:    params.FileType,
		FileSize:    int64(len(data)),
		Success:     true,
	}

	// 6. Set response headers
	contentType := mimeTypeFromExt(params.FileType)
	w.Header().Set("Content-Type", contentType)
	w.Header().Set("Content-Disposition",
		fmt.Sprintf(`attachment; filename="%s"`, params.OriginalFilename))
	if remaining > 0 {
		w.Header().Set("X-Downloads-Remaining", fmt.Sprintf("%d", remaining))
	}

	// 7. Apply watermark and stream
	err = lib.ServeProtectedFile(ctx, bytes.NewReader(data), w, params)
	if err != nil {
		audit.Success = false
		audit.ErrorMsg = err.Error()
		// Still log the attempt
	}

	// 8. Log and update burn count (async); surface failures instead of dropping them
	go func() {
		bg := context.Background()
		if err := lib.LogFileServe(bg, audit); err != nil {
			log.Printf("audit log failed for object %s: %v", objectID, err)
		}
		if audit.Success {
			if err := lib.IncrementBurnCount(bg, objectID, actor.ID); err != nil {
				log.Printf("burn count update failed for object %s: %v", objectID, err)
			}
		}
	}()
}
```
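Step 6 of the pipeline routes on "extension/magic bytes". A minimal sketch of that decision follows, assuming the route is chosen only when the sniffed content type and the file extension agree; `routeFor` and its return labels are hypothetical, and `http.DetectContentType` is used as the sniffer. DOCX and XLSX are both ZIP containers, so the extension is what separates them, and formats the sniffer does not recognise (MOV may be one of them) fall through to the encrypted-download path unless a dedicated check is added.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// routeFor is a hypothetical router: it returns which watermark function
// family should handle the file, or "other" for ServeEncryptedDownload.
// head is the first bytes of the decrypted object (512 bytes is enough).
func routeFor(filename string, head []byte) string {
	ext := strings.ToLower(filepath.Ext(filename))
	mime := http.DetectContentType(head)
	switch {
	case mime == "application/pdf" && ext == ".pdf":
		return "pdf"
	case mime == "application/zip" && ext == ".docx":
		return "docx"
	case mime == "application/zip" && ext == ".xlsx":
		return "xlsx"
	case mime == "image/jpeg" && (ext == ".jpg" || ext == ".jpeg"),
		mime == "image/png" && ext == ".png",
		mime == "image/webp" && ext == ".webp":
		return "image"
	case mime == "video/mp4" && ext == ".mp4":
		return "video"
	default:
		// Unknown type, or extension and content disagree: do not guess.
		return "other"
	}
}

func main() {
	fmt.Println(routeFor("deck.pdf", []byte("%PDF-1.7\n")))
	fmt.Println(routeFor("model.xlsx", []byte("PK\x03\x04")))
	fmt.Println(routeFor("fake.pdf", []byte("PK\x03\x04")))
}
```

Requiring agreement between the two signals means a renamed file is never handed to the wrong parser; it is simply served through the wrapper path with its audit entry.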
Performance Considerations ### 8.1 Processing Time Budgets | File Type | Target | Acceptable | Timeout | |-----------|--------|------------|---------| | PDF (≤10pg) | <100ms | <500ms | 10s | | PDF (>100pg) | <1s | <5s | 30s | | DOCX | <100ms | <500ms | 10s | | XLSX | <100ms | <500ms | 10s | | Image | <50ms | <200ms | 5s | | Video (1min) | <10s | <30s | 120s | | Video (10min) | async | async | 300s | ### 8.2 Memory Management - PDF: Stream pages when possible; full memory load for complex watermarks - DOCX/XLSX: Full memory load required (ZIP structure) - Image: Decode → process → encode; ~4x pixel dimensions in RAM - Video: Stream through FFmpeg; no full file in memory **Memory limits per request:** - PDF: 256MB max - Office: 128MB max - Image: 512MB max (4K images at 4 bytes/pixel) - Video: Streaming, no limit ### 8.3 Caching Strategy | What | Cache? | Rationale | |------|--------|-----------| | Watermarked files | ❌ Never | Timestamp and user-specific content | | Project watermark config | ✅ 5min TTL | Rarely changes | | User/org lookups | ✅ 5min TTL | Rarely changes | | Object decryption | ⚠️ Per-request | Could cache decrypted bytes briefly | | Burn counts | ❌ Never | Must be accurate | ### 8.4 Async Processing for Large Files Videos >5 minutes and PDFs >500 pages should use async processing: ```go func (h *Handler) ServeFileLarge(w http.ResponseWriter, r *http.Request) { // ... access checks ... if isLargeFile(params.FileType, fileSize) { // Queue for background processing jobID := lib.QueueWatermarkJob(ctx, h.queue, objectID, params) w.Header().Set("Content-Type", "application/json") w.WriteHeader(202) // Accepted json.NewEncoder(w).Encode(map[string]string{ "status": "processing", "job_id": jobID, "poll_url": "/api/files/jobs/" + jobID, }) return } // Normal sync processing... } ``` --- ## 9. 
Error Handling & Fallbacks ### 9.1 Graceful Degradation | Error | Fallback | User Message | |-------|----------|--------------| | PDF parse failure | Serve original with audit + filename watermark | "Processing unavailable" | | Office parse failure | Encrypt + download only | "Preview unavailable" | | Image decode failure | Serve original with audit | "Processing unavailable" | | Video FFmpeg failure | Encrypt + download only | "Streaming unavailable" | | Timeout exceeded | Serve original with audit | "Processing timeout" | ### 9.2 Error Logging All watermark failures are logged with full context: ```go type WatermarkError struct { ObjectID string FileType string Error string Stack string Fallback string // What we did instead Timestamp time.Time } ``` --- ## 10. Security Considerations ### 10.1 FIPS 140-3 Compliance All crypto operations use FIPS-approved algorithms: - AES-256-GCM for encryption - SHA-256 for hashing - HMAC-SHA256 for watermark ID generation - Build with `GOEXPERIMENT=boringcrypto` ### 10.2 Watermark Tampering Resistance Watermarks are applied at serve time, so tampering with stored files doesn't help. However: - Digital signatures on PDFs are invalidated by watermarking (expected) - Office documents could have watermark row deleted (accepted risk; audit trail remains) - Image watermarks can be cropped (tiled pattern mitigates) - Video watermarks can be cropped (corner + periodic full-screen mitigates) ### 10.3 Audit Integrity Audit logs should be: - Write-only (no DELETE endpoint) - Integrity-checked (HMAC of each row, chain hash) - Replicated off-server for SOX/compliance --- ## 11. 
Testing Strategy ### 11.1 Unit Tests ```go func TestWatermarkPDF(t *testing.T) func TestWatermarkPDFPassword(t *testing.T) // Should fail gracefully func TestWatermarkPDFCorrupt(t *testing.T) // Should fallback func TestWatermarkDOCX(t *testing.T) func TestWatermarkXLSX(t *testing.T) func TestWatermarkImage(t *testing.T) func TestBurnLimitEnforced(t *testing.T) func TestBurnLimitPerUser(t *testing.T) func TestAuditLogCreated(t *testing.T) func TestWatermarkIDUnique(t *testing.T) ``` ### 11.2 Integration Tests - Upload file → download → verify watermark present - Exceed burn limit → verify 410 response - Concurrent downloads → verify accurate burn counting - Large video → verify async handling ### 11.3 Sample Files Maintain a test corpus of: - Normal files (all types) - Password-protected files - Corrupted files - Maximum-size files (stress test) - Edge cases per type (see sections 3.x) --- ## 12. Open Questions / Future Work 1. **PDF/A compliance:** Verify pdfcpu maintains PDF/A compliance after watermarking. May need explicit flag. 2. **Office 365 online preview:** When files are previewed in Office Online, watermarks must persist. May need server-side rendering instead. 3. **Mobile app considerations:** Native mobile viewers may strip/ignore some watermarks. Test thoroughly. 4. **Print watermarks:** Physical prints should show watermark. PDF print header/footer may be more robust than visual overlay. 5. **Invisible forensic watermarks:** Steganographic watermarks that survive screenshots/prints. Complex, may add later. 6. **Video DRM:** HLS with encryption + Widevine. Overkill for MVP, but worth considering for future. --- *This specification is complete and ready for implementation. Questions → Johan.*