# Dealspace — Watermark & File Protection Pipeline

**Version:** 0.1 (2026-02-28)
**Status:** Design specification for section 6.2 of SPEC.md
**Author:** James (subagent)

---

## 1. Design Principles

1. **Original files are sacred.** Storage always contains the clean, unmodified original. Watermarks are applied at serve time, never persisted.

2. **Watermarks are forensic, not decorative.** If a document leaks, we must trace it to a specific user/org/timestamp. Watermarks are evidence, not theater.

3. **FIPS 140-3 throughout.** All crypto operations use FIPS-approved algorithms. No exceptions.

4. **Performance over perfection.** A 50ms watermark that traces 99% of leaks beats a 5s watermark that's "perfect." Users won't wait.

5. **Graceful degradation.** If watermarking fails (corrupted file, unsupported variant), serve with audit log + fallback watermark strategy, never block access entirely.

---

## 2. Watermark Content

Standard watermark string (configurable per project):

```
{user_name} · {org_name} · {iso_timestamp} · CONFIDENTIAL
```

Example:
```
John Smith · Acme Capital · 2026-02-28T14:32:17Z · CONFIDENTIAL
```
|
||
|
||
### 2.1 Watermark Variants

| Variant | Use Case | Content |
|---------|----------|---------|
| `standard` | Normal access | Full string as above |
| `screen_deterrent` | Tiled background for PDF preview | Repeated diagonal pattern |
| `minimal` | Fallback when processing fails | `{user_id}:{timestamp}` (short, traceable) |

### 2.2 Watermark Styling (Project Config)

```go
type WatermarkConfig struct {
	Text       string  // Template: "{user_name} · {org_name} · {timestamp} · CONFIDENTIAL"
	FontFamily string  // Default: "Helvetica" (PDF), "Calibri" (Office)
	FontSize   int     // Default: 10 (text), 48 (tiled background)
	Color      string  // RGBA hex: "#FF0000AA" (semi-transparent red)
	Position   string  // "footer" | "header" | "diagonal" | "tiled"
	Opacity    float64 // 0.0-1.0, default 0.3 for diagonal/tiled
}
```
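
The `Color` field uses an 8-digit `#RRGGBBAA` string, which each renderer has to turn into channel values. A minimal parsing sketch follows; the helper name `parseRGBAHex` is an assumption and is not part of the spec.

```go
package main

import (
	"fmt"
	"image/color"
	"strconv"
	"strings"
)

// parseRGBAHex converts "#RRGGBBAA" into a non-premultiplied color.
// Hypothetical helper illustrating the WatermarkConfig.Color format.
func parseRGBAHex(s string) (color.NRGBA, error) {
	h := strings.TrimPrefix(s, "#")
	if len(h) != 8 {
		return color.NRGBA{}, fmt.Errorf("color %q: want #RRGGBBAA", s)
	}
	v, err := strconv.ParseUint(h, 16, 32)
	if err != nil {
		return color.NRGBA{}, fmt.Errorf("color %q: %w", s, err)
	}
	return color.NRGBA{
		R: uint8(v >> 24),
		G: uint8(v >> 16),
		B: uint8(v >> 8),
		A: uint8(v),
	}, nil
}
```

Note that the config carries both an alpha channel in `Color` and a separate `Opacity` field; how the two combine is not stated in the spec and would need to be decided.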
|
||
|
||
---
|
||
|
||
## 3. File Type Implementations

### 3.1 PDF Watermarking

**Library:** `github.com/pdfcpu/pdfcpu` (pure Go, FIPS-compatible, actively maintained)

**Approach:**
1. Parse PDF into memory
2. Add watermark as text annotation or stamped content on each page
3. Serialize modified PDF to output stream

**Watermark Placement:**
- **Footer watermark:** Bottom center of each page, 10pt gray text
- **Diagonal tiled (screen deterrent):** 45° repeated pattern across entire page, 0.15 opacity

**Algorithm:**
```go
// NOTE: the pdfcpu identifiers below sketch the flow only; the library's
// public API (package pkg/api) uses different names and signatures.
func WatermarkPDF(input io.Reader, output io.Writer, wm WatermarkParams) error {
	// 1. Read PDF
	ctx, err := pdfcpu.ReadContext(input, nil)
	if err != nil {
		return fmt.Errorf("pdf parse: %w", err)
	}

	// 2. Build watermark spec
	wmSpec := pdfcpu.TextWatermark{
		Text:     wm.Text,
		FontName: "Helvetica",
		FontSize: 10,
		Color:    pdfcpu.Gray,
		Pos:      pdfcpu.BottomCenter,
	}

	// 3. Apply to all pages
	if err := pdfcpu.AddWatermarks(ctx, nil, wmSpec); err != nil {
		return fmt.Errorf("pdf watermark: %w", err)
	}

	// 4. Optionally add diagonal tiled pattern for screen deterrent
	if wm.ScreenDeterrent {
		tiledSpec := pdfcpu.TextWatermark{
			Text:     wm.Text,
			FontSize: 48,
			Color:    pdfcpu.LightGray,
			Opacity:  0.15,
			Rotation: 45,
			Diagonal: true,
		}
		// Check the error: a silently missing deterrent layer defeats its purpose
		if err := pdfcpu.AddWatermarks(ctx, nil, tiledSpec); err != nil {
			return fmt.Errorf("pdf tiled watermark: %w", err)
		}
	}

	// 5. Write output
	return pdfcpu.WriteContext(ctx, output)
}
```

**Performance:**
- Small PDF (1-10 pages): ~20-50ms
- Large PDF (100+ pages): ~200-500ms
- Memory: ~2x file size during processing

**Caching:** ❌ **Never cache watermarked PDFs.** Each serve includes user-specific timestamp. Caching would serve stale timestamps or wrong user identities. The whole point is forensic traceability.

**Edge Cases:**
| Case | Handling |
|------|----------|
| Password-protected PDF | Reject with error: "Cannot watermark encrypted PDF. Contact administrator." Log to audit. |
| Corrupted PDF | Attempt parse; if fails, serve original with `minimal` watermark in filename + audit log |
| PDF/A strict | pdfcpu preserves PDF/A compliance; no special handling needed |
| Scanned PDF (images) | Watermark overlays images; no text extraction needed |
| 1000+ page PDF | Stream processing; set timeout at 30s, fallback to minimal if exceeded |
|
||
---
|
||
|
||
### 3.2 Word Document (.docx) Watermarking

**Library:** `github.com/unidoc/unioffice` (pure Go, Office Open XML manipulation)

**Approach:**
1. Unzip DOCX (it's a ZIP of XML files)
2. Modify `word/document.xml` to add footer content
3. Create/modify `word/footer1.xml` with watermark text
4. Update `[Content_Types].xml` and relationships
5. Rezip and serve

**Watermark Placement:**
- **Footer:** Centered text in document footer, appears on every page
- **Header alternative:** For "CONFIDENTIAL" prominence, add to header

**Algorithm:**
```go
func WatermarkDOCX(input io.Reader, output io.Writer, wm WatermarkParams) error {
	// 1. Open DOCX. document.Read needs an io.ReaderAt plus the size,
	// so buffer the stream first.
	data, err := io.ReadAll(input)
	if err != nil {
		return fmt.Errorf("docx read: %w", err)
	}
	doc, err := document.Read(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return fmt.Errorf("docx parse: %w", err)
	}

	// Steps 2-4 are schematic; verify method names against the unioffice
	// release in use.

	// 2. Get or create footer
	footer := doc.AddFooter()
	footer.SetParagraphProperties(document.ParagraphStyleFooter)

	// 3. Add watermark paragraph
	para := footer.AddParagraph()
	para.SetAlignment(document.AlignmentCenter)
	run := para.AddRun()
	run.AddText(wm.Text)
	run.Properties().SetColor(color.Gray)
	run.Properties().SetSize(10)

	// 4. Apply footer to all sections
	for _, section := range doc.Sections() {
		section.SetFooter(footer, document.FooterTypeDefault)
	}

	// 5. Save
	return doc.Save(output)
}
```

**Performance:**
- Typical DOCX: ~30-80ms
- Large DOCX with images: ~100-300ms
- Memory: ~3x file size (uncompressed XML is verbose)

**Caching:** ❌ Never. Same reasoning as PDF.

**Edge Cases:**
| Case | Handling |
|------|----------|
| Password-protected DOCX | Reject with error. Office encryption prevents modification. |
| Corrupted DOCX | Attempt parse; fallback to encrypted-download-only mode |
| DOCX with existing footer | Append watermark to existing footer, don't replace |
| DOCM (macro-enabled) | Same process; macros preserved. Consider security warning. |
| DOC (legacy binary) | Convert via LibreOffice CLI first, or reject. See 3.2.1. |

#### 3.2.1 Legacy DOC Handling

Binary `.doc` files cannot be watermarked with pure Go. Options:

1. **Convert to PDF on upload** (recommended for M&A: preserves formatting, prevents editing)
2. **LibreOffice CLI conversion** at serve time: `libreoffice --headless --convert-to docx`
3. **Reject with message:** "Legacy format. Please upload .docx"

Recommendation: Option 1 for new uploads; Option 3 for existing files in MVP.
|
||
|
||
---
|
||
|
||
### 3.3 Excel (.xlsx) Watermarking

**Library:** `github.com/xuri/excelize/v2` (pure Go, actively maintained, 15k+ stars)

**Approach:**
1. Open XLSX
2. For each sheet: insert header row with watermark text
3. Optionally: add sheet-level "protection" (cosmetic, not security; easily bypassed)
4. Save to output stream

**Watermark Placement:**
- **Header row (Row 1):** Merged cells spanning data width, light gray background, watermark text
- **Sheet header/footer:** Print-only watermark (visible when printed)

**Algorithm:**
```go
func WatermarkXLSX(input io.Reader, output io.Writer, wm WatermarkParams) error {
	// 1. Open workbook
	f, err := excelize.OpenReader(input)
	if err != nil {
		return fmt.Errorf("xlsx parse: %w", err)
	}
	defer f.Close()

	// Banner style: light gray background, centered, small font
	styleID, err := f.NewStyle(&excelize.Style{
		Fill:      excelize.Fill{Type: "pattern", Color: []string{"#EEEEEE"}, Pattern: 1},
		Font:      &excelize.Font{Size: 9, Color: "#888888"},
		Alignment: &excelize.Alignment{Horizontal: "center"},
	})
	if err != nil {
		return fmt.Errorf("xlsx style: %w", err)
	}

	// 2. Watermark each sheet. Any failure aborts the serve: silently
	// skipping a sheet would hand out unwatermarked data.
	for _, sheet := range f.GetSheetList() {
		// Get data dimensions
		dim, err := f.GetSheetDimension(sheet)
		if err != nil {
			return fmt.Errorf("xlsx dimension %q: %w", sheet, err)
		}
		cols := parseColumnCount(dim) // e.g., "A1:J50" → 10 columns
		endCell := columnLetter(cols) + "1"

		// Insert row at top
		if err := f.InsertRows(sheet, 1, 1); err != nil {
			return fmt.Errorf("xlsx insert row %q: %w", sheet, err)
		}

		// Merge cells for watermark banner, then set text and style
		if err := f.MergeCell(sheet, "A1", endCell); err != nil {
			return fmt.Errorf("xlsx merge %q: %w", sheet, err)
		}
		if err := f.SetCellValue(sheet, "A1", wm.Text); err != nil {
			return fmt.Errorf("xlsx set text %q: %w", sheet, err)
		}
		if err := f.SetCellStyle(sheet, "A1", endCell, styleID); err != nil {
			return fmt.Errorf("xlsx set style %q: %w", sheet, err)
		}

		// Add print header/footer
		if err := f.SetHeaderFooter(sheet, &excelize.HeaderFooterOptions{
			OddFooter: "&C" + wm.Text,
		}); err != nil {
			return fmt.Errorf("xlsx footer %q: %w", sheet, err)
		}

		// 3. Optional: add sheet protection (cosmetic only)
		if wm.AddProtection {
			if err := f.ProtectSheet(sheet, &excelize.SheetProtectionOptions{
				Password:          "", // No password: only prevents casual editing
				SelectLockedCells: true,
			}); err != nil {
				return fmt.Errorf("xlsx protect %q: %w", sheet, err)
			}
		}
	}

	// 4. Write output
	return f.Write(output)
}
```

**Performance:**
- Small XLSX: ~20-50ms
- Large XLSX (10k+ rows): ~100-400ms
- Memory: ~2-4x file size

**Caching:** ❌ Never.

**Edge Cases:**
| Case | Handling |
|------|----------|
| Password-protected XLSX | Reject. Cannot modify encrypted workbook. |
| Workbook with VBA macros (.xlsm) | Process same as .xlsx; macros preserved |
| Very wide sheets (1000+ columns) | Skip merge, add watermark to A1 only |
| Charts/pivot tables | Unaffected; watermark is in data area |
| XLS (legacy binary) | Reject or convert via LibreOffice. Same as DOC. |
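
The XLSX algorithm calls `parseColumnCount` and `columnLetter` without defining them. One possible implementation is sketched below in plain Go; the function bodies (and the extra `columnIndex` helper) are assumptions, since the spec only shows the call sites. excelize also ships `ColumnNumberToName` and `ColumnNameToNumber`, which could replace the conversion helpers.

```go
package main

import "strings"

// columnLetter converts a 1-based column index to its Excel name
// (1 → "A", 26 → "Z", 27 → "AA").
func columnLetter(n int) string {
	name := ""
	for n > 0 {
		n-- // shift to 0-based so that 26 maps to "Z"
		name = string(rune('A'+n%26)) + name
		n /= 26
	}
	return name
}

// columnIndex is the inverse: "J" → 10. Digits and other characters are
// skipped, so a full cell reference such as "J50" also yields 10.
func columnIndex(ref string) int {
	n := 0
	for _, r := range strings.ToUpper(ref) {
		if r >= 'A' && r <= 'Z' {
			n = n*26 + int(r-'A') + 1
		}
	}
	return n
}

// parseColumnCount returns the right-most column of a sheet dimension such
// as "A1:J50". A single-cell or empty dimension yields 1.
func parseColumnCount(dim string) int {
	ref := dim
	if i := strings.LastIndex(dim, ":"); i >= 0 {
		ref = dim[i+1:]
	}
	if cols := columnIndex(ref); cols >= 1 {
		return cols
	}
	return 1
}
```

Returning the right-most column index rather than a true count is deliberate: the banner is merged from `A1`, so it has to reach the last used column even when the data does not start in column A.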
|
||
---
|
||
|
||
### 3.4 Image Watermarking (JPG, PNG, WebP)

**Library:** Standard library `image` + `golang.org/x/image` + `github.com/fogleman/gg` (2D graphics)

**Approach:**
1. Decode image
2. Draw semi-transparent text overlay
3. Encode to output format

**Watermark Placement:**
- **Bottom-right corner:** Primary watermark, semi-transparent white text with drop shadow
- **Tiled diagonal (optional):** For high-value images, repeated pattern across entire image

**Algorithm:**
```go
func WatermarkImage(input io.Reader, output io.Writer, format string, wm WatermarkParams) error {
	const fontPath = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"

	// 1. Decode image
	img, _, err := image.Decode(input)
	if err != nil {
		return fmt.Errorf("image decode: %w", err)
	}

	bounds := img.Bounds()
	width, height := bounds.Dx(), bounds.Dy()

	// 2. Create drawing context
	dc := gg.NewContextForImage(img)

	// 3. Calculate font size based on image dimensions, clamped to 12-48
	fontSize := float64(width) / 50 // ~2% of width
	if fontSize < 12 {
		fontSize = 12
	}
	if fontSize > 48 {
		fontSize = 48
	}
	if err := dc.LoadFontFace(fontPath, fontSize); err != nil {
		return fmt.Errorf("load font: %w", err)
	}

	// 4. Position: bottom-right with padding (y is the text baseline)
	textWidth, _ := dc.MeasureString(wm.Text)
	x := float64(width) - textWidth - 20
	y := float64(height) - 20

	// 5. Draw drop shadow
	dc.SetRGBA(0, 0, 0, 0.5)
	dc.DrawString(wm.Text, x+2, y+2)

	// 6. Draw watermark text
	dc.SetRGBA(1, 1, 1, 0.7)
	dc.DrawString(wm.Text, x, y)

	// 7. Optional: diagonal tiled pattern
	if wm.ScreenDeterrent {
		dc.SetRGBA(0.5, 0.5, 0.5, 0.15)
		if err := dc.LoadFontFace(fontPath, fontSize*2); err != nil {
			return fmt.Errorf("load tile font: %w", err)
		}
		// Re-measure at the larger size so neighboring tiles do not overlap
		tileWidth, _ := dc.MeasureString(wm.Text)
		rowStep := int(fontSize * 4)
		colStep := int(tileWidth*1.5) + 1 // +1 keeps the loop advancing for empty text
		for row := -height; row < height*2; row += rowStep {
			for col := -width; col < width*2; col += colStep {
				dc.Push()
				dc.RotateAbout(gg.Radians(45), float64(col), float64(row))
				dc.DrawString(wm.Text, float64(col), float64(row))
				dc.Pop()
			}
		}
	}

	// 8. Encode output
	switch format {
	case "jpeg", "jpg":
		return jpeg.Encode(output, dc.Image(), &jpeg.Options{Quality: 90})
	case "png":
		return png.Encode(output, dc.Image())
	case "webp":
		return webp.Encode(output, dc.Image(), &webp.Options{Quality: 90})
	default:
		return png.Encode(output, dc.Image())
	}
}
```

**Performance:**
- Small image (<1MB): ~10-30ms
- Large image (10MB+): ~100-300ms
- Memory: ~4x pixel dimensions (RGBA in memory)

**Caching:** ❌ Never.

**Edge Cases:**
| Case | Handling |
|------|----------|
| Animated GIF | Extract first frame, watermark, serve as static. Or reject. |
| Very small image (<200px) | Reduce font size; may become illegible (accept this) |
| HEIC/HEIF | Convert to JPEG first (Apple format, limited Go support) |
| TIFF | Decode with `golang.org/x/image/tiff`; serve as PNG |
| RAW formats | Reject. Convert on upload. |
| SVG | Skip pixel watermarking; add text element to XML |
|
||
---
|
||
|
||
### 3.5 Video Watermarking (MP4, MOV)

**Tool:** FFmpeg (external binary); no pure Go solution exists for video processing

**Approach:**
1. Pipe original video to FFmpeg stdin
2. FFmpeg overlays text watermark
3. Stream FFmpeg stdout to HTTP response

**Watermark Placement:**
- **Bottom-right corner:** Semi-transparent text overlay, visible but not distracting
- **Optional burn-in:** More prominent for high-sensitivity content

**Algorithm:**
```go
func WatermarkVideo(ctx context.Context, objectID string, w http.ResponseWriter, wm WatermarkParams) error {
	// 1. Build FFmpeg command
	// Text escape: replace special chars
	escapedText := strings.ReplaceAll(wm.Text, ":", "\\:")
	escapedText = strings.ReplaceAll(escapedText, "'", "\\'")

	// drawtext filter
	filter := fmt.Sprintf(
		"drawtext=text='%s':fontsize=24:fontcolor=white@0.7:x=w-tw-20:y=h-th-20:shadowcolor=black@0.5:shadowx=2:shadowy=2",
		escapedText,
	)

	cmd := exec.CommandContext(ctx, "ffmpeg",
		"-i", "pipe:0", // Read from stdin
		"-vf", filter, // Apply text filter
		"-c:v", "libx264", // Re-encode video
		"-preset", "fast", // Speed over compression
		"-crf", "23", // Quality (lower = better)
		"-c:a", "copy", // Copy audio unchanged
		// Fragmented MP4 can be written to a non-seekable pipe. +faststart is
		// dropped: it needs a second pass over a seekable output file.
		"-movflags", "frag_keyframe+empty_moov",
		"-f", "mp4", // Output format
		"pipe:1", // Write to stdout
	)

	// 2. Set up pipes
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return fmt.Errorf("ffmpeg stdin: %w", err)
	}
	cmd.Stdout = w
	cmd.Stderr = os.Stderr // Log errors

	// 3. Start FFmpeg
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("ffmpeg start: %w", err)
	}

	// 4. Stream input file to FFmpeg
	go func() {
		defer stdin.Close()
		obj, err := store.Read(objectID)
		if err != nil {
			// Closing stdin with no data makes FFmpeg exit with an error
			log.Printf("video watermark: read %s: %v", objectID, err)
			return
		}
		if _, err := io.Copy(stdin, bytes.NewReader(obj)); err != nil {
			log.Printf("video watermark: feed %s: %v", objectID, err)
		}
	}()

	// 5. Wait for completion
	return cmd.Wait()
}
```

**Performance:**
- 1-minute video: ~5-15 seconds (re-encoding required)
- 10-minute video: ~30-90 seconds
- **Recommendation:** For videos >5 minutes, use async processing + notification when ready

**Caching:** ⚠️ **Consider selective caching for large videos.**
- Risk: Cached version has wrong timestamp for subsequent views
- Mitigation: Cache key includes `{object_id}:{user_id}:{date}`, so the same user on the same day gets the cached copy
- Invalidate cache at midnight or on project config change

**Edge Cases:**
| Case | Handling |
|------|----------|
| Very long video (>1hr) | Async processing; return 202 with job ID; poll for completion |
| Corrupted video | FFmpeg will error; return 500 with audit log |
| Unsupported codec | FFmpeg handles most; truly exotic formats: reject |
| Audio-only file | No video stream to watermark; add metadata comment instead |
| MKV, AVI, WMV | Convert to MP4 on serve (FFmpeg handles this) |
|
||
---
|
||
|
||
### 3.6 Other File Types

**Strategy:** Encrypted download only. No preview, no watermarking.

**Affected Types:**
- ZIP, TAR, 7Z (archives)
- CAD files (DWG, DXF)
- Database exports (SQL, CSV with sensitive data)
- Executables (rare but possible)
- Unknown/binary files

**Watermark Alternative:**
1. Filename includes minimal watermark: `report_{user_id}_{timestamp}.zip`
2. Audit log captures full context
3. "CONFIDENTIAL" wrapper: Serve inside a new ZIP containing the file + a `NOTICE.txt`

```go
func ServeEncryptedDownload(w http.ResponseWriter, objectID string, wm WatermarkParams) error {
	// Create wrapper ZIP with notice
	buf := new(bytes.Buffer)
	zw := zip.NewWriter(buf)

	// Add notice file
	notice, err := zw.Create("NOTICE.txt")
	if err != nil {
		return fmt.Errorf("zip notice: %w", err)
	}
	fmt.Fprintf(notice, "CONFIDENTIAL\n\nDownloaded by: %s\nOrganization: %s\nTimestamp: %s\n\nUnauthorized distribution is prohibited.",
		wm.UserName, wm.OrgName, wm.Timestamp.Format(time.RFC3339))

	// Add original file
	original, err := zw.Create(wm.OriginalFilename)
	if err != nil {
		return fmt.Errorf("zip entry: %w", err)
	}
	obj, err := store.Read(objectID)
	if err != nil {
		return fmt.Errorf("read object: %w", err)
	}
	if _, err := original.Write(obj); err != nil {
		return fmt.Errorf("zip write: %w", err)
	}
	if err := zw.Close(); err != nil {
		return fmt.Errorf("zip close: %w", err)
	}

	// Set download filename with watermark info.
	// Guard the slice: UserID[:8] panics on IDs shorter than 8 bytes.
	userPrefix := wm.UserID
	if len(userPrefix) > 8 {
		userPrefix = userPrefix[:8]
	}
	filename := fmt.Sprintf("%s_%s_%s.zip",
		strings.TrimSuffix(wm.OriginalFilename, filepath.Ext(wm.OriginalFilename)),
		userPrefix,
		wm.Timestamp.Format("20060102")) // same instant as NOTICE.txt

	w.Header().Set("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, filename))
	w.Header().Set("Content-Type", "application/zip")
	_, err = w.Write(buf.Bytes())
	return err
}
```
|
||
|
||
---
|
||
|
||
## 4. Screen Capture Protection

**Reality Check:** True screen capture protection is impossible. Any DRM can be defeated by pointing a camera at a screen. Our goal is **deterrence and traceability**, not prevention.

### 4.1 Visual Deterrent Strategy

**For PDFs served in-browser:**
1. Apply diagonal tiled watermark pattern (45°, repeated every 200px)
2. Use user-specific text in the pattern
3. Opacity 0.15: visible in screenshots but doesn't obstruct reading

**For images:**
1. Same diagonal tiled pattern
2. Consider more aggressive opacity (0.25) for high-sensitivity images

**For video:**
1. Persistent corner watermark (already implemented)
2. Optional: periodic full-screen flash of watermark text (every 60s, 2s duration, 0.3 opacity)
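
The periodic flash for video could be expressed with FFmpeg's timeline `enable` option on a second `drawtext` filter. The sketch below only builds the filter string; `flashFilter` is a hypothetical helper, the text is assumed to be escaped already, and the fixed `fontsize=48` is a placeholder rather than a value from the spec.

```go
package main

import "fmt"

// flashFilter returns a drawtext filter that shows centered text for
// durationSec seconds at the start of every periodSec-second window.
// escapedText must already be escaped for use inside a quoted drawtext value.
func flashFilter(escapedText string, periodSec, durationSec int, opacity float64) string {
	return fmt.Sprintf(
		"drawtext=text='%s':fontsize=48:fontcolor=white@%.2f:x=(w-tw)/2:y=(h-th)/2:enable='lt(mod(t,%d),%d)'",
		escapedText, opacity, periodSec, durationSec,
	)
}
```

The result would be appended to the corner-watermark filter with a comma, so both `drawtext` stages run in the same `-vf` chain. With the values from the list above (60, 2, 0.3) the text is visible during seconds 0-2, 60-62, 120-122, and so on.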

### 4.2 Additional Deterrents

| Technique | Effectiveness | Implementation |
|-----------|---------------|----------------|
| Diagonal tiled watermark | High | Built into watermark functions |
| Random position micro-watermarks | Medium | Add 5-10 tiny (8px) watermarks at random positions |
| Invisible watermarks (steganography) | Low (easily stripped) | Not recommended (complexity vs. value) |
| JavaScript screenshot detection | Low (easily bypassed) | Not recommended |
| CSS `-webkit-user-select: none` | Cosmetic only | Add to viewer CSS |

### 4.3 Recommended Approach

```go
type ScreenDeterrentLevel int

const (
	DeterrentNone     ScreenDeterrentLevel = 0
	DeterrentStandard ScreenDeterrentLevel = 1 // Footer + diagonal tiled
	DeterrentHigh     ScreenDeterrentLevel = 2 // + random micro-watermarks
)
```

Default to `DeterrentStandard` for all data room documents. Project admins can escalate to `DeterrentHigh` for specific folders.
|
||
|
||
---
|
||
|
||
## 5. Audit Trail

Every file serve MUST be logged. No exceptions.

### 5.1 Audit Entry Structure

```go
type FileServeAudit struct {
	ID          string    `json:"id"`           // UUID
	ProjectID   string    `json:"project_id"`
	ObjectID    string    `json:"object_id"`    // File being served
	EntryID     string    `json:"entry_id"`     // Parent answer entry
	ActorID     string    `json:"actor_id"`     // User requesting
	ActorOrg    string    `json:"actor_org"`    // Organization
	Action      string    `json:"action"`       // "view" | "download" | "print"
	IP          string    `json:"ip"`           // Client IP (X-Forwarded-For aware)
	UserAgent   string    `json:"user_agent"`
	Timestamp   time.Time `json:"timestamp"`
	WatermarkID string    `json:"watermark_id"` // Unique ID embedded in watermark
	FileType    string    `json:"file_type"`    // "pdf", "docx", etc.
	FileSize    int64     `json:"file_size"`
	Success     bool      `json:"success"`
	ErrorMsg    string    `json:"error_msg,omitempty"`
}
```
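
The `IP` field is described as "X-Forwarded-For aware". One way to fill it is sketched below, under an assumption the spec does not state: exactly one trusted reverse proxy sits in front of the service and appends the peer address it saw. `clientIP` is a hypothetical helper.

```go
package main

import (
	"net"
	"strings"
)

// clientIP returns the address to store in FileServeAudit.IP.
// behindProxy must only be true when a trusted proxy sets X-Forwarded-For.
func clientIP(xForwardedFor, remoteAddr string, behindProxy bool) string {
	if behindProxy && xForwardedFor != "" {
		parts := strings.Split(xForwardedFor, ",")
		// The right-most entry was appended by our own proxy. Entries to its
		// left are supplied by the client and can be forged.
		last := strings.TrimSpace(parts[len(parts)-1])
		if ip := net.ParseIP(last); ip != nil {
			return ip.String()
		}
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; use the value as-is
	}
	return host
}
```

Taking the left-most entry instead would let any client choose the address recorded in the audit trail, which undermines its value as evidence. With more than one proxy hop the index to trust changes accordingly.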

### 5.2 Watermark ID

Each watermark includes a unique, traceable ID:

```go
func GenerateWatermarkID(actorID, objectID string, timestamp time.Time) string {
	// Short and human-readable; the 64-bit HMAC prefix makes collisions
	// negligible in practice
	h := hmac.New(sha256.New, watermarkSecret)
	// NUL separators keep ("ab", "c") and ("a", "bc") from hashing identically
	h.Write([]byte(actorID + "\x00" + objectID + "\x00" + timestamp.UTC().Format(time.RFC3339)))
	sum := h.Sum(nil)
	return base32.StdEncoding.EncodeToString(sum[:8])[:13] // e.g., "JBSWY3DPEHPK3"
}
```

This ID appears in the watermark text and audit log. If a document leaks, grep for the ID → instant attribution.
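
The "grep for the ID" step could be automated on text recovered from a leaked copy (OCR output, extracted PDF text). The sketch below assumes the 13-character base32 format produced above; `findWatermarkIDs` is a hypothetical helper, and its matches are only candidates until confirmed against `watermark_id` in the audit table.

```go
package main

import "regexp"

// idPattern matches a standalone 13-character base32 token (A-Z, 2-7).
var idPattern = regexp.MustCompile(`\b[A-Z2-7]{13}\b`)

// findWatermarkIDs returns the distinct candidate watermark IDs found in
// text, in order of first appearance.
func findWatermarkIDs(text string) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, m := range idPattern.FindAllString(text, -1) {
		if !seen[m] {
			seen[m] = true
			ids = append(ids, m)
		}
	}
	return ids
}
```

Any 13-letter uppercase word also matches the pattern, so the lookup against the audit table is what actually establishes attribution.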

### 5.3 Audit Table

```sql
CREATE TABLE file_serves (
    id           TEXT PRIMARY KEY,
    project_id   TEXT NOT NULL,
    object_id    TEXT NOT NULL,
    entry_id     TEXT,
    actor_id     TEXT NOT NULL,
    actor_org    TEXT,
    action       TEXT NOT NULL,
    ip           TEXT NOT NULL,
    user_agent   TEXT,
    ts           INTEGER NOT NULL,
    watermark_id TEXT NOT NULL,
    file_type    TEXT NOT NULL,
    file_size    INTEGER,
    success      INTEGER NOT NULL,
    error_msg    TEXT
);

CREATE INDEX idx_serves_project ON file_serves(project_id);
CREATE INDEX idx_serves_actor ON file_serves(actor_id);
CREATE INDEX idx_serves_object ON file_serves(object_id);
CREATE INDEX idx_serves_watermark ON file_serves(watermark_id);
CREATE INDEX idx_serves_ts ON file_serves(ts);
```

### 5.4 Audit Logging Function

```go
func LogFileServe(ctx context.Context, audit FileServeAudit) error {
	audit.ID = uuid.NewString()
	if audit.Timestamp.IsZero() {
		audit.Timestamp = time.Now()
	}

	// Pack sensitive fields (action, user_agent, error_msg)
	packed, err := Pack(audit)
	if err != nil {
		return err
	}

	_, err = db.ExecContext(ctx, `
		INSERT INTO file_serves
		(id, project_id, object_id, entry_id, actor_id, actor_org, action, ip, user_agent, ts, watermark_id, file_type, file_size, success, error_msg)
		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
		audit.ID, audit.ProjectID, audit.ObjectID, audit.EntryID,
		audit.ActorID, audit.ActorOrg, packed.Action, audit.IP, packed.UserAgent,
		audit.Timestamp.UnixMilli(), audit.WatermarkID, audit.FileType, audit.FileSize,
		boolToInt(audit.Success), packed.ErrorMsg,
	)
	return err
}
```
|
||
|
||
---
|
||
|
||
## 6. Burn After Reading Mode

Optional per-file setting: file can only be downloaded N times total, or N times per user.

### 6.1 Configuration

```go
type BurnConfig struct {
	Enabled      bool   `json:"enabled"`
	MaxDownloads int    `json:"max_downloads"`  // Total across all users (0 = unlimited)
	MaxPerUser   int    `json:"max_per_user"`   // Per individual user (0 = unlimited)
	ExpiresAt    *int64 `json:"expires_at"`     // Unix ms timestamp (optional)
	NotifyOnBurn bool   `json:"notify_on_burn"` // Alert admins when limit reached
}
```

### 6.2 Tracking Table

```sql
CREATE TABLE burn_tracking (
    object_id      TEXT NOT NULL,
    actor_id       TEXT NOT NULL,
    download_count INTEGER NOT NULL DEFAULT 0,
    last_download  INTEGER,
    PRIMARY KEY (object_id, actor_id)
);

CREATE TABLE burn_totals (
    object_id       TEXT PRIMARY KEY,
    total_downloads INTEGER NOT NULL DEFAULT 0,
    burned_at       INTEGER -- When limit was hit
);
```

### 6.3 Burn Check Function

```go
func CheckBurnLimit(ctx context.Context, objectID, actorID string) (allowed bool, remaining int, err error) {
	// 1. Load burn config from entry Data
	config, err := GetBurnConfig(ctx, objectID)
	if err != nil {
		// Fail closed: a lookup failure must not lift the limit. Assumes
		// GetBurnConfig returns a zero-value config, not an error, when no
		// limit is configured.
		return false, 0, fmt.Errorf("load burn config: %w", err)
	}
	if !config.Enabled {
		return true, -1, nil // No burn limit
	}

	// 2. Check expiration
	if config.ExpiresAt != nil && time.Now().UnixMilli() > *config.ExpiresAt {
		return false, 0, nil
	}

	// 3. Check total downloads (no row yet means zero downloads)
	var total int
	err = db.QueryRowContext(ctx, "SELECT total_downloads FROM burn_totals WHERE object_id = ?", objectID).Scan(&total)
	if err != nil && !errors.Is(err, sql.ErrNoRows) {
		return false, 0, fmt.Errorf("read burn total: %w", err)
	}
	if config.MaxDownloads > 0 && total >= config.MaxDownloads {
		return false, 0, nil
	}

	// 4. Check per-user downloads
	var userCount int
	err = db.QueryRowContext(ctx,
		"SELECT download_count FROM burn_tracking WHERE object_id = ? AND actor_id = ?",
		objectID, actorID).Scan(&userCount)
	if err != nil && !errors.Is(err, sql.ErrNoRows) {
		return false, 0, fmt.Errorf("read burn count: %w", err)
	}
	if config.MaxPerUser > 0 && userCount >= config.MaxPerUser {
		return false, 0, nil
	}

	// 5. Calculate remaining
	remaining = -1 // Unlimited
	if config.MaxDownloads > 0 {
		remaining = config.MaxDownloads - total
	}
	if config.MaxPerUser > 0 {
		userRemaining := config.MaxPerUser - userCount
		if remaining < 0 || userRemaining < remaining {
			remaining = userRemaining
		}
	}

	return true, remaining, nil
}

func IncrementBurnCount(ctx context.Context, objectID, actorID string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin burn tx: %w", err)
	}
	defer tx.Rollback() // no-op after a successful Commit

	now := time.Now().UnixMilli()

	// Upsert user tracking
	if _, err := tx.ExecContext(ctx, `
		INSERT INTO burn_tracking (object_id, actor_id, download_count, last_download)
		VALUES (?, ?, 1, ?)
		ON CONFLICT (object_id, actor_id) DO UPDATE SET
			download_count = download_count + 1,
			last_download = ?`,
		objectID, actorID, now, now); err != nil {
		return fmt.Errorf("update burn_tracking: %w", err)
	}

	// Upsert total
	if _, err := tx.ExecContext(ctx, `
		INSERT INTO burn_totals (object_id, total_downloads)
		VALUES (?, 1)
		ON CONFLICT (object_id) DO UPDATE SET
			total_downloads = total_downloads + 1`,
		objectID); err != nil {
		return fmt.Errorf("update burn_totals: %w", err)
	}

	return tx.Commit()
}
```
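
`CheckBurnLimit` and `IncrementBurnCount` run as two separate steps, so two concurrent requests can both pass the check before either increments the counters. The in-memory sketch below illustrates the alternative of checking and recording in one atomic step. `burnCounter` and `reserve` are hypothetical names; a database-backed version would do the same inside a single transaction or with a conditional `UPDATE`.

```go
package main

import "sync"

// burnCounter tracks downloads per object and per (object, actor) pair.
type burnCounter struct {
	mu      sync.Mutex
	total   map[string]int            // objectID -> downloads
	perUser map[string]map[string]int // objectID -> actorID -> downloads
}

func newBurnCounter() *burnCounter {
	return &burnCounter{
		total:   make(map[string]int),
		perUser: make(map[string]map[string]int),
	}
}

// reserve checks both limits and, if the download is allowed, records it
// before releasing the lock. A limit of 0 means unlimited, as in BurnConfig.
func (b *burnCounter) reserve(objectID, actorID string, maxTotal, maxPerUser int) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	if maxTotal > 0 && b.total[objectID] >= maxTotal {
		return false
	}
	users := b.perUser[objectID] // reading a nil map yields zero counts
	if maxPerUser > 0 && users[actorID] >= maxPerUser {
		return false
	}
	if users == nil {
		users = make(map[string]int)
		b.perUser[objectID] = users
	}
	b.total[objectID]++
	users[actorID]++
	return true
}
```

Reserving before the file is sent means a failed transfer still consumes a download; whether to refund it is a policy choice the spec leaves open.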

### 6.4 Burn Notification

When a file hits its limit:

```go
func NotifyBurn(ctx context.Context, objectID string, config BurnConfig) {
	// Record burned_at first, regardless of the notification setting
	if _, err := db.ExecContext(ctx, "UPDATE burn_totals SET burned_at = ? WHERE object_id = ?",
		time.Now().UnixMilli(), objectID); err != nil {
		log.Printf("burn: set burned_at for %s: %v", objectID, err)
	}

	if !config.NotifyOnBurn {
		return
	}

	// Notify project admins (via existing notification system)
	entry, err := GetEntryByObjectID(ctx, objectID)
	if err != nil {
		log.Printf("burn: load entry for %s: %v", objectID, err)
		return
	}
	NotifyProjectAdmins(ctx, entry.ProjectID, NotificationBurnLimitReached, map[string]any{
		"object_id": objectID,
		"filename":  entry.Data["filename"],
	})
}
```
|
||
|
||
---
|
||
|
||
## 7. Go Implementation Design
|
||
|
||
### 7.1 lib/watermark.go — Function Signatures
|
||
|
||
```go
|
||
package lib
|
||
|
||
import (
|
||
"context"
|
||
"io"
|
||
"time"
|
||
)
|
||
|
||
// WatermarkParams contains all info needed to generate a watermark
|
||
type WatermarkParams struct {
|
||
UserID string
|
||
UserName string
|
||
OrgID string
|
||
OrgName string
|
||
Timestamp time.Time
|
||
WatermarkID string // Unique traceable ID
|
||
|
||
// Styling (from project config)
|
||
Config WatermarkConfig
|
||
|
||
// Original file info
|
||
OriginalFilename string
|
||
FileType string // "pdf", "docx", "xlsx", "jpg", etc.
|
||
|
||
// Options
|
||
ScreenDeterrent bool // Add aggressive visual deterrent
|
||
}
|
||
|
||
// WatermarkConfig is project-level styling configuration
|
||
type WatermarkConfig struct {
|
||
Text string // Template with placeholders
|
||
FontFamily string
|
||
FontSize int
|
||
Color string // RGBA hex
|
||
Position string // "footer", "header", "diagonal", "tiled"
|
||
Opacity float64
|
||
DeterrentLevel ScreenDeterrentLevel
|
||
}
|
||
|
||
// ScreenDeterrentLevel controls anti-screenshot measures
|
||
type ScreenDeterrentLevel int
|
||
|
||
const (
|
||
DeterrentNone ScreenDeterrentLevel = 0
|
||
DeterrentStandard ScreenDeterrentLevel = 1
|
||
DeterrentHigh ScreenDeterrentLevel = 2
|
||
)
|
||
|
||
// BuildWatermarkText generates the actual watermark string from params
|
||
func BuildWatermarkText(params WatermarkParams) string
|
||
|
||
// GenerateWatermarkID creates a unique, traceable watermark identifier
|
||
func GenerateWatermarkID(actorID, objectID string, timestamp time.Time) string
|
||
|
||
// ---- Per-Type Watermarking Functions ----
|
||
|
||
// WatermarkPDF adds watermark to PDF, writing result to output
|
||
func WatermarkPDF(input io.Reader, output io.Writer, params WatermarkParams) error
|
||
|
||
// WatermarkDOCX adds watermark to Word document
|
||
func WatermarkDOCX(input io.Reader, output io.Writer, params WatermarkParams) error
|
||
|
||
// WatermarkXLSX adds watermark to Excel spreadsheet
|
||
func WatermarkXLSX(input io.Reader, output io.Writer, params WatermarkParams) error
|
||
|
||
// WatermarkImage adds watermark to image (JPG, PNG, WebP)
|
||
// Returns the output format used (may differ from input for unsupported formats)
func WatermarkImage(input io.Reader, output io.Writer, inputFormat string, params WatermarkParams) (outputFormat string, err error)

// WatermarkVideo streams video with watermark overlay
// This is special: it writes directly to http.ResponseWriter, handling streaming
func WatermarkVideo(ctx context.Context, objectReader io.Reader, w io.Writer, params WatermarkParams) error

// ServeProtectedFile is the unified entry point for the protection pipeline
// It detects file type and applies appropriate watermarking
func ServeProtectedFile(ctx context.Context, objectID string, w io.Writer, params WatermarkParams) error

// ServeEncryptedDownload wraps non-watermarkable files with NOTICE.txt
func ServeEncryptedDownload(ctx context.Context, objectID string, w io.Writer, params WatermarkParams) error

// ---- Audit Functions ----

// LogFileServe records every file access to the audit table
func LogFileServe(ctx context.Context, audit FileServeAudit) error

// FileServeAudit contains all info about a file serve event
type FileServeAudit struct {
	ID          string
	ProjectID   string
	ObjectID    string
	EntryID     string
	ActorID     string
	ActorOrg    string
	Action      string // "view", "download", "print"
	IP          string
	UserAgent   string
	Timestamp   time.Time
	WatermarkID string
	FileType    string
	FileSize    int64
	Success     bool
	ErrorMsg    string
}

// ---- Burn After Reading ----

// BurnConfig defines download limits for a file
type BurnConfig struct {
	Enabled      bool
	MaxDownloads int // Total across all users
	MaxPerUser   int // Per individual user
	ExpiresAt    *int64
	NotifyOnBurn bool
}

// CheckBurnLimit verifies if download is allowed and returns remaining count
func CheckBurnLimit(ctx context.Context, objectID, actorID string) (allowed bool, remaining int, err error)

// IncrementBurnCount records a successful download
func IncrementBurnCount(ctx context.Context, objectID, actorID string) error

// GetBurnConfig retrieves burn settings for an object
func GetBurnConfig(ctx context.Context, objectID string) (BurnConfig, error)

// SetBurnConfig updates burn settings for an object
func SetBurnConfig(ctx context.Context, objectID string, config BurnConfig) error
```
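
The decision behind `CheckBurnLimit` can be sketched as a pure function over the configured limits and the current counters. This is only an illustration: `checkBurn`, the `unlimited` sentinel, and the rule that a limit of 0 means "no limit" are assumptions, and the trimmed `BurnConfig` omits `ExpiresAt` and `NotifyOnBurn`.

```go
package main

import "fmt"

// BurnConfig is a trimmed copy of the spec's struct for this sketch.
type BurnConfig struct {
	Enabled      bool
	MaxDownloads int // Total across all users
	MaxPerUser   int // Per individual user
}

// unlimited is a hypothetical sentinel for "no remaining-count applies".
const unlimited = -1

// clampZero keeps remaining counts from going negative.
func clampZero(n int) int {
	if n < 0 {
		return 0
	}
	return n
}

// checkBurn reports whether one more download is allowed and how many remain
// for this user. Assumption: a limit of 0 means that limit is not enforced.
func checkBurn(cfg BurnConfig, totalCount, userCount int) (allowed bool, remaining int) {
	if !cfg.Enabled {
		return true, unlimited
	}
	remaining = unlimited
	if cfg.MaxDownloads > 0 {
		remaining = clampZero(cfg.MaxDownloads - totalCount)
	}
	if cfg.MaxPerUser > 0 {
		left := clampZero(cfg.MaxPerUser - userCount)
		if remaining == unlimited || left < remaining {
			remaining = left
		}
	}
	return remaining != 0, remaining
}

func main() {
	cfg := BurnConfig{Enabled: true, MaxDownloads: 5, MaxPerUser: 2}
	allowed, remaining := checkBurn(cfg, 3, 1)
	fmt.Println(allowed, remaining)
}
```

The tighter of the two limits wins, so a user who still has personal quota is refused once the file-wide total is used up.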

### 7.2 Dependencies (go.mod additions)

```go
// PDF processing
require github.com/pdfcpu/pdfcpu v0.8.0

// Office documents
require github.com/unidoc/unioffice v1.33.0
require github.com/xuri/excelize/v2 v2.8.1

// Image processing
require github.com/fogleman/gg v1.3.0
require golang.org/x/image v0.18.0

// WebP support (for image encoding)
require github.com/chai2010/webp v1.1.1

// Video (FFmpeg is external binary, no Go dep needed)
// Ensure ffmpeg is installed: apt install ffmpeg

// Crypto (FIPS 140-3 compliance)
// Use standard library crypto/aes, crypto/hmac, crypto/sha256
// The algorithms are FIPS-approved; building with GOEXPERIMENT=boringcrypto
// routes them through the BoringCrypto module
```

**FIPS 140-3 Note:** Build with `GOEXPERIMENT=boringcrypto` so the standard library crypto packages are backed by BoringCrypto, the FIPS module at the core of BoringSSL. AES, HMAC, and SHA-256 are FIPS-approved algorithms in any build; the build setting determines whether they execute inside a validated module. Go 1.24 and later also provide a native FIPS 140-3 mode (`GOFIPS140` at build time, `GODEBUG=fips140=on` at run time). Confirm the current validation status of whichever module is chosen before claiming compliance.

### 7.3 Serve Pipeline

Complete request flow from HTTP request to watermarked response:

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                               FILE SERVE PIPELINE                               │
└─────────────────────────────────────────────────────────────────────────────────┘

HTTP Request: GET /api/files/{object_id}
           │
           ▼
┌──────────────────────┐
│ 1. Auth Middleware   │  Extract JWT, validate session
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 2. CheckAccess()     │  Verify actor can access this object
│    (lib/rbac.go)     │  Walk to parent entry → workstream → project
└──────────────────────┘  Return 403 if denied
           │
           ▼
┌──────────────────────┐
│ 3. CheckBurnLimit()  │  Verify download quota not exceeded
│    (lib/watermark.go)│  Return 410 Gone if burned
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 4. ObjectRead()      │  Fetch encrypted object from store
│    (lib/store.go)    │  Decrypt with project key
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 5. Build Params      │  Construct WatermarkParams from:
│                      │  - Actor (user_id, name, org)
│                      │  - Project config (styling)
│                      │  - Timestamp + generated watermark ID
│                      │  - File metadata (type, name)
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 6. Route by Type     │  Detect file type from extension/magic bytes
│                      │
│   PDF ──────────►    │  WatermarkPDF()
│   DOCX ─────────►    │  WatermarkDOCX()
│   XLSX ─────────►    │  WatermarkXLSX()
│   Image ────────►    │  WatermarkImage()
│   Video ────────►    │  WatermarkVideo()
│   Other ────────►    │  ServeEncryptedDownload()
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 7. Stream to Client  │  Set Content-Type, Content-Disposition
│                      │  Write watermarked content to ResponseWriter
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 8. LogFileServe()    │  Record to audit table (async, don't block)
│    (lib/watermark.go)│
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│ 9. IncrementBurn()   │  Update download counters if burn enabled
│    (lib/watermark.go)│  Check if limit now hit → notify admins
└──────────────────────┘
           │
           ▼
HTTP Response: Watermarked file stream
```
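
Step 6 (route by type) could look roughly like the sketch below, which trusts magic bytes first and uses the extension only to tell DOCX from XLSX inside a ZIP container. `routeFor` and its string return values are hypothetical; they name the spec's functions rather than calling them. `http.DetectContentType` is the standard library sniffer.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// routeFor returns the name of the pipeline function that should handle the
// file. head should hold the first bytes of the decrypted object (the sniffer
// looks at no more than 512).
func routeFor(filename string, head []byte) string {
	ext := strings.ToLower(filepath.Ext(filename))
	sniffed := http.DetectContentType(head)
	switch {
	case sniffed == "application/pdf":
		return "WatermarkPDF"
	case strings.HasPrefix(sniffed, "image/"):
		return "WatermarkImage"
	case strings.HasPrefix(sniffed, "video/"):
		return "WatermarkVideo"
	case sniffed == "application/zip" && ext == ".docx":
		return "WatermarkDOCX"
	case sniffed == "application/zip" && ext == ".xlsx":
		return "WatermarkXLSX"
	}
	// Anything unrecognized takes the non-watermarkable path.
	return "ServeEncryptedDownload"
}

func main() {
	fmt.Println(routeFor("report.pdf", []byte("%PDF-1.7")))
	fmt.Println(routeFor("model.xlsx", []byte("PK\x03\x04")))
	fmt.Println(routeFor("notes.bin", []byte("plain text")))
}
```

Sniffing before consulting the extension means a PDF renamed to `.txt` still gets a PDF watermark instead of falling through to the download path.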

### 7.4 Handler Implementation

```go
// api/handlers.go

func (h *Handler) ServeFile(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	objectID := chi.URLParam(r, "objectID")
	actor := auth.ActorFromContext(ctx)

	// 1. Access check
	entry, err := lib.GetEntryByObjectID(ctx, h.db, objectID)
	if err != nil {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}

	if !lib.CheckAccess(ctx, h.db, actor.ID, entry.ProjectID, entry.ID, "read") {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}

	// 2. Burn limit check
	allowed, remaining, err := lib.CheckBurnLimit(ctx, h.db, objectID, actor.ID)
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	if !allowed {
		http.Error(w, "download limit exceeded", http.StatusGone)
		return
	}

	// 3. Read object
	data, err := lib.ObjectRead(ctx, h.store, h.projectKey, objectID)
	if err != nil {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}

	// 4. Build watermark params
	// Simplified: production code must check these type assertions instead of panicking
	fileInfo := entry.Data["files"].([]any)[0].(map[string]any)
	projectConfig, err := lib.GetProjectWatermarkConfig(ctx, h.db, entry.ProjectID)
	if err != nil {
		// Do not block access; continue with the zero-value config
		log.Printf("watermark config lookup failed for project %s: %v", entry.ProjectID, err)
	}

	// One timestamp shared by the visible watermark, its ID, and the audit row
	now := time.Now().UTC()

	params := lib.WatermarkParams{
		UserID:           actor.ID,
		UserName:         actor.Name,
		OrgID:            actor.OrgID,
		OrgName:          actor.OrgName,
		Timestamp:        now,
		WatermarkID:      lib.GenerateWatermarkID(actor.ID, objectID, now),
		Config:           projectConfig,
		OriginalFilename: fileInfo["name"].(string),
		FileType:         fileInfo["type"].(string),
		ScreenDeterrent:  projectConfig.DeterrentLevel >= lib.DeterrentStandard,
	}

	// 5. Prepare audit entry (log after response)
	audit := lib.FileServeAudit{
		ProjectID:   entry.ProjectID,
		ObjectID:    objectID,
		EntryID:     entry.ID,
		ActorID:     actor.ID,
		ActorOrg:    actor.OrgName,
		Action:      "download",
		IP:          realIP(r),
		UserAgent:   r.UserAgent(),
		Timestamp:   now,
		WatermarkID: params.WatermarkID,
		FileType:    params.FileType,
		FileSize:    int64(len(data)),
		Success:     true,
	}

	// 6. Set response headers
	contentType := mimeTypeFromExt(params.FileType)
	w.Header().Set("Content-Type", contentType)
	// mime.FormatMediaType quotes and escapes the filename, so a name containing
	// quotes or control characters cannot break the header
	w.Header().Set("Content-Disposition",
		mime.FormatMediaType("attachment", map[string]string{"filename": params.OriginalFilename}))
	if remaining > 0 {
		w.Header().Set("X-Downloads-Remaining", strconv.Itoa(remaining))
	}

	// 7. Apply watermark and stream
	err = lib.ServeProtectedFile(ctx, bytes.NewReader(data), w, params)
	if err != nil {
		audit.Success = false
		audit.ErrorMsg = err.Error()
		// Still log the attempt
	}

	// 8. Log and update burn count (async)
	go func() {
		// The request context is canceled once the handler returns
		bg := context.Background()
		if err := lib.LogFileServe(bg, h.db, audit); err != nil {
			log.Printf("audit log failed for object %s: %v", objectID, err)
		}
		if audit.Success {
			if err := lib.IncrementBurnCount(bg, h.db, objectID, actor.ID); err != nil {
				log.Printf("burn count update failed for object %s: %v", objectID, err)
			}
		}
	}()
}
```

---

## 8. Performance Considerations

### 8.1 Processing Time Budgets

| File Type | Target | Acceptable | Timeout |
|-----------|--------|------------|---------|
| PDF (≤10pg) | <100ms | <500ms | 10s |
| PDF (>100pg) | <1s | <5s | 30s |
| DOCX | <100ms | <500ms | 10s |
| XLSX | <100ms | <500ms | 10s |
| Image | <50ms | <200ms | 5s |
| Video (1min) | <10s | <30s | 120s |
| Video (10min) | async | async | 300s |
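
The Timeout column maps naturally onto `context.WithTimeout`. The following is a minimal sketch, not the spec's implementation: `timeoutFor`, `withBudget`, the lowercase type keys, the `large` flag (standing in for ">100 pages" or "10 minutes"), and the 10s default for unknown types are all assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// timeoutFor returns the hard timeout from the budget table.
func timeoutFor(fileType string, large bool) time.Duration {
	switch fileType {
	case "pdf":
		if large {
			return 30 * time.Second
		}
		return 10 * time.Second
	case "docx", "xlsx":
		return 10 * time.Second
	case "image":
		return 5 * time.Second
	case "video":
		if large {
			return 300 * time.Second
		}
		return 120 * time.Second
	}
	return 10 * time.Second // assumed default for unlisted types
}

// withBudget runs fn under the timeout for the file type and returns
// context.DeadlineExceeded if the budget is exhausted first.
func withBudget(ctx context.Context, fileType string, large bool, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeoutFor(fileType, large))
	defer cancel()
	done := make(chan error, 1) // buffered so fn's goroutine never blocks
	go func() { done <- fn(ctx) }()
	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	err := withBudget(context.Background(), "image", false, func(ctx context.Context) error {
		return nil // stand-in for a watermark call that honors ctx
	})
	fmt.Println("image budget:", timeoutFor("image", false), "err:", err)
}
```

The watermark function still has to honor `ctx` for the work itself to stop; `withBudget` only guarantees that the caller is released on time.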

### 8.2 Memory Management

- PDF: Stream pages when possible; full memory load for complex watermarks
- DOCX/XLSX: Full memory load required (ZIP structure)
- Image: Decode → process → encode; the decoded bitmap needs roughly width × height × 4 bytes (RGBA) in RAM
- Video: Stream through FFmpeg; no full file in memory

**Memory limits per request:**
- PDF: 256MB max
- Office: 128MB max
- Image: 512MB max (decoded bitmaps at 4 bytes/pixel; a 3840×2160 image is about 33MB per buffer)
- Video: Streaming, no limit
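
For images, the limit can be checked before decoding by reading only the header. This is a sketch under assumptions: `estimateImageBytes`, `withinImageLimit`, and `samplePNG` are hypothetical helpers, and the estimate covers a single RGBA buffer, not any working copies the watermark step may allocate.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png" // also registers the PNG decoder for image.DecodeConfig
)

// maxImageBytes is the 512MB per-request image limit.
const maxImageBytes = 512 << 20

// estimateImageBytes reads only the image header and estimates the decoded
// size at 4 bytes per pixel, without allocating the bitmap.
func estimateImageBytes(data []byte) (int64, error) {
	cfg, _, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	return int64(cfg.Width) * int64(cfg.Height) * 4, nil
}

// withinImageLimit reports whether the decoded image fits the budget.
func withinImageLimit(data []byte) (bool, error) {
	n, err := estimateImageBytes(data)
	if err != nil {
		return false, err
	}
	return n <= maxImageBytes, nil
}

// samplePNG builds a blank PNG so the example is self-contained.
func samplePNG(w, h int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	data := samplePNG(64, 32)
	n, err := estimateImageBytes(data)
	fmt.Println(n, err)
}
```

Checking the header first means a small compressed file that decodes to a huge bitmap is rejected before it consumes memory.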

### 8.3 Caching Strategy

| What | Cache? | Rationale |
|------|--------|-----------|
| Watermarked files | ❌ Never | Timestamp and user-specific content |
| Project watermark config | ✅ 5min TTL | Rarely changes |
| User/org lookups | ✅ 5min TTL | Rarely changes |
| Object decryption | ⚠️ Per-request | Could cache decrypted bytes briefly |
| Burn counts | ❌ Never | Must be accurate |
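
The two 5min TTL rows could be served by a small in-process cache along these lines. `ttlCache`, `fakeClock`, and `configTTL` are hypothetical names; the clock is injectable only so that expiry can be demonstrated without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// configTTL matches the 5min TTL in the caching table.
const configTTL = 5 * time.Minute

type cacheEntry[V any] struct {
	value   V
	expires time.Time
}

// ttlCache is a mutex-guarded map whose entries expire after ttl.
type ttlCache[V any] struct {
	mu    sync.Mutex
	ttl   time.Duration
	now   func() time.Time
	items map[string]cacheEntry[V]
}

func newTTLCache[V any](ttl time.Duration, now func() time.Time) *ttlCache[V] {
	return &ttlCache[V]{ttl: ttl, now: now, items: make(map[string]cacheEntry[V])}
}

// Set stores v under key with a fresh expiry.
func (c *ttlCache[V]) Set(key string, v V) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = cacheEntry[V]{value: v, expires: c.now().Add(c.ttl)}
}

// Get returns the cached value, or false if it is missing or expired.
func (c *ttlCache[V]) Get(key string) (V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || !c.now().Before(e.expires) {
		delete(c.items, key)
		var zero V
		return zero, false
	}
	return e.value, true
}

// fakeClock lets the example advance time manually.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clk := &fakeClock{t: time.Unix(0, 0)}
	cache := newTTLCache[string](configTTL, clk.Now)
	cache.Set("project-1", "watermark config")
	_, hit := cache.Get("project-1")
	fmt.Println("fresh hit:", hit)
	clk.Advance(configTTL)
	_, hit = cache.Get("project-1")
	fmt.Println("hit after TTL:", hit)
}
```

In production the constructor would receive `time.Now`. Burn counts and watermarked output stay out of any such cache, as the table requires.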

### 8.4 Async Processing for Large Files

Videos >5 minutes and PDFs >500 pages should use async processing:

```go
func (h *Handler) ServeFileLarge(w http.ResponseWriter, r *http.Request) {
	// ... access checks ...

	if isLargeFile(params.FileType, fileSize) {
		// Queue for background processing
		jobID := lib.QueueWatermarkJob(ctx, h.queue, objectID, params)

		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(202) // Accepted
		json.NewEncoder(w).Encode(map[string]string{
			"status":   "processing",
			"job_id":   jobID,
			"poll_url": "/api/files/jobs/" + jobID,
		})
		return
	}

	// Normal sync processing...
}
```

---

## 9. Error Handling & Fallbacks

### 9.1 Graceful Degradation

| Error | Fallback | User Message |
|-------|----------|--------------|
| PDF parse failure | Serve original with audit + filename watermark | "Processing unavailable" |
| Office parse failure | Encrypt + download only | "Preview unavailable" |
| Image decode failure | Serve original with audit | "Processing unavailable" |
| Video FFmpeg failure | Encrypt + download only | "Streaming unavailable" |
| Timeout exceeded | Serve original with audit | "Processing timeout" |
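
One way to implement the fallback column is to run the primary watermark into a buffer, so a failure halfway through never sends a partial response, and only then fall back. The sketch below assumes a simplified `watermarkFunc` signature; `serveWithFallback`, `corruptInput`, `tagged`, and `demo` are hypothetical names used to exercise the idea.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// watermarkFunc stands in for the WatermarkPDF/WatermarkImage family.
type watermarkFunc func(in io.Reader, out io.Writer) error

// serveWithFallback writes the primary result to w, or the fallback result if
// the primary fails. It returns the name of the strategy that was used.
func serveWithFallback(original []byte, w io.Writer, primary, fallback watermarkFunc) (string, error) {
	var buf bytes.Buffer
	strategy := "primary"
	if err := primary(bytes.NewReader(original), &buf); err != nil {
		buf.Reset() // discard any partial primary output
		strategy = "fallback"
		if err := fallback(bytes.NewReader(original), &buf); err != nil {
			return strategy, err
		}
	}
	_, err := w.Write(buf.Bytes())
	return strategy, err
}

// corruptInput simulates a parse failure.
func corruptInput(io.Reader, io.Writer) error { return errors.New("parse failure") }

// tagged simulates a watermark by prefixing the content with a tag.
func tagged(tag string) watermarkFunc {
	return func(in io.Reader, out io.Writer) error {
		if _, err := io.WriteString(out, tag+":"); err != nil {
			return err
		}
		_, err := io.Copy(out, in)
		return err
	}
}

// demo returns what the client would receive and which strategy produced it.
func demo(primary watermarkFunc) (string, string) {
	var out bytes.Buffer
	strategy, err := serveWithFallback([]byte("doc"), &out, primary, tagged("minimal"))
	if err != nil {
		return "", "error"
	}
	return out.String(), strategy
}

func main() {
	body, strategy := demo(corruptInput)
	fmt.Println(body, strategy)
}
```

Buffering costs memory, so it suits documents and images under the limits in section 8.2; streamed video needs a different approach. The returned strategy name is the kind of value the `Fallback` field of an error log entry could record.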

### 9.2 Error Logging

All watermark failures are logged with full context:

```go
type WatermarkError struct {
	ObjectID  string
	FileType  string
	Error     string
	Stack     string
	Fallback  string // What we did instead
	Timestamp time.Time
}
```

---

## 10. Security Considerations

### 10.1 FIPS 140-3 Compliance

All crypto operations use FIPS-approved algorithms:
- AES-256-GCM for encryption
- SHA-256 for hashing
- HMAC-SHA256 for watermark ID generation
- Build with `GOEXPERIMENT=boringcrypto` so these run inside the BoringCrypto module
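
An HMAC-SHA256 watermark ID could be derived as in the sketch below. It is an illustration under assumptions: the spec's `GenerateWatermarkID(actorID, objectID, time)` does not show where the key comes from, so the explicit `key` parameter, the NUL-separated message layout, and the truncation to 128 bits are all hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// watermarkID returns a 32-character hex ID bound to the user, object, and
// timestamp. Without the key, nobody can forge or enumerate valid IDs.
func watermarkID(key []byte, userID, objectID string, unixNano int64) string {
	mac := hmac.New(sha256.New, key)
	// NUL separators keep ("ab", "c") and ("a", "bc") from colliding.
	fmt.Fprintf(mac, "%s\x00%s\x00%d", userID, objectID, unixNano)
	return hex.EncodeToString(mac.Sum(nil)[:16]) // assumed truncation to 128 bits
}

func main() {
	key := []byte("demo-key-not-for-production")
	id := watermarkID(key, "user-42", "obj-7", time.Now().UnixNano())
	fmt.Println(id)
}
```

Because the ID is deterministic for a given key and inputs, an ID recovered from a leaked document can be recomputed from the audit row and compared.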

### 10.2 Watermark Tampering Resistance

Watermarks are applied at serve time, so tampering with stored files doesn't help. However:
- Digital signatures on PDFs are invalidated by watermarking (expected)
- Office documents could have watermark row deleted (accepted risk; audit trail remains)
- Image watermarks can be cropped (tiled pattern mitigates)
- Video watermarks can be cropped (corner + periodic full-screen mitigates)

### 10.3 Audit Integrity

Audit logs should be:
- Append-only (no UPDATE or DELETE endpoint)
- Integrity-checked (HMAC of each row, chain hash)
- Replicated off-server for SOX/compliance
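
The "HMAC of each row, chain hash" idea can be sketched as follows: each row's MAC covers its payload and the previous row's MAC, so editing or deleting any row invalidates everything after it. `auditRow`, `appendRow`, and `verifyChain` are hypothetical names, and the payload is a plain string for brevity.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// auditRow is a minimal stand-in for a stored audit record.
type auditRow struct {
	Payload  string
	PrevHash string // MAC of the previous row ("" for the first row)
	MAC      string // HMAC over PrevHash and Payload
}

// rowMAC computes the chained MAC for one row.
func rowMAC(key []byte, prev, payload string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(prev))
	mac.Write([]byte{0}) // separator between the two fields
	mac.Write([]byte(payload))
	return hex.EncodeToString(mac.Sum(nil))
}

// appendRow links a new row to the end of the chain.
func appendRow(key []byte, chain []auditRow, payload string) []auditRow {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].MAC
	}
	return append(chain, auditRow{Payload: payload, PrevHash: prev, MAC: rowMAC(key, prev, payload)})
}

// verifyChain returns the index of the first inconsistent row, or -1 if the
// whole chain verifies.
func verifyChain(key []byte, chain []auditRow) int {
	prev := ""
	for i, row := range chain {
		want := rowMAC(key, prev, row.Payload)
		if row.PrevHash != prev || !hmac.Equal([]byte(row.MAC), []byte(want)) {
			return i
		}
		prev = row.MAC
	}
	return -1
}

func main() {
	key := []byte("demo-key-not-for-production")
	var chain []auditRow
	chain = appendRow(key, chain, "user-1 download obj-7")
	chain = appendRow(key, chain, "user-2 view obj-7")
	fmt.Println("first bad row:", verifyChain(key, chain))
}
```

Truncating the tail of the chain is not detectable from the chain alone, which is one reason off-server replication still matters.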

---

## 11. Testing Strategy

### 11.1 Unit Tests

```go
func TestWatermarkPDF(t *testing.T)
func TestWatermarkPDFPassword(t *testing.T) // Should fail gracefully
func TestWatermarkPDFCorrupt(t *testing.T)  // Should fallback
func TestWatermarkDOCX(t *testing.T)
func TestWatermarkXLSX(t *testing.T)
func TestWatermarkImage(t *testing.T)
func TestBurnLimitEnforced(t *testing.T)
func TestBurnLimitPerUser(t *testing.T)
func TestAuditLogCreated(t *testing.T)
func TestWatermarkIDUnique(t *testing.T)
```

### 11.2 Integration Tests

- Upload file → download → verify watermark present
- Exceed burn limit → verify 410 response
- Concurrent downloads → verify accurate burn counting
- Large video → verify async handling
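
The concurrent-download case can be prototyped in memory before it is wired to the real store. In this sketch, `burnCounter` and `runConcurrent` are hypothetical; the point is that the limit check and the increment happen in one critical section, which a database-backed version would need to mirror with a single conditional update or a transaction.

```go
package main

import (
	"fmt"
	"sync"
)

// burnCounter enforces a download limit with an atomic check-and-increment.
type burnCounter struct {
	mu    sync.Mutex
	limit int
	count int
}

// tryDownload grants a download only while the limit has not been reached.
func (b *burnCounter) tryDownload() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.count >= b.limit {
		return false
	}
	b.count++
	return true
}

// runConcurrent fires attempts simultaneous downloads and returns how many
// were granted.
func runConcurrent(limit, attempts int) int {
	b := &burnCounter{limit: limit}
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		granted int
	)
	for i := 0; i < attempts; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.tryDownload() {
				mu.Lock()
				granted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return granted
}

func main() {
	fmt.Println("granted:", runConcurrent(5, 100))
}
```

Running the equivalent integration test with `go test -race` helps catch a check and increment that have drifted apart.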

### 11.3 Sample Files

Maintain a test corpus of:
- Normal files (all types)
- Password-protected files
- Corrupted files
- Maximum-size files (stress test)
- Edge cases per type (see sections 3.x)

---

## 12. Open Questions / Future Work

1. **PDF/A compliance:** Verify pdfcpu maintains PDF/A compliance after watermarking. May need explicit flag.

2. **Office 365 online preview:** When files are previewed in Office Online, watermarks must persist. May need server-side rendering instead.

3. **Mobile app considerations:** Native mobile viewers may strip/ignore some watermarks. Test thoroughly.

4. **Print watermarks:** Physical prints should show watermark. PDF print header/footer may be more robust than visual overlay.

5. **Invisible forensic watermarks:** Steganographic watermarks that survive screenshots/prints. Complex, may add later.

6. **Video DRM:** HLS with encryption + Widevine. Overkill for MVP, but worth considering for future.

---

*This specification is complete and ready for implementation. Questions → Johan.*