Dealspace — Watermark & File Protection Pipeline
Version: 0.1 — 2026-02-28
Status: Design specification for section 6.2 of SPEC.md
Author: James (subagent)
1. Design Principles
- Original files are sacred. Storage always contains the clean, unmodified original. Watermarks are applied at serve time, never persisted.
- Watermarks are forensic, not decorative. If a document leaks, we must trace it to a specific user/org/timestamp. Watermarks are evidence, not theater.
- FIPS 140-3 throughout. All crypto operations use FIPS-approved algorithms. No exceptions.
- Performance over perfection. A 50ms watermark that traces 99% of leaks beats a 5s watermark that's "perfect." Users won't wait.
- Graceful degradation. If watermarking fails (corrupted file, unsupported variant), serve with audit log + fallback watermark strategy, never block access entirely.
2. Watermark Content
Standard watermark string (configurable per project):
{user_name} · {org_name} · {iso_timestamp} · CONFIDENTIAL
Example:
John Smith · Acme Capital · 2026-02-28T14:32:17Z · CONFIDENTIAL
2.1 Watermark Variants
| Variant | Use Case | Content |
|---|---|---|
| standard | Normal access | Full string as above |
| screen_deterrent | Tiled background for PDF preview | Repeated diagonal pattern |
| minimal | Fallback when processing fails | {user_id}:{timestamp} (short, traceable) |
2.2 Watermark Styling (Project Config)
type WatermarkConfig struct {
Text string // Template: "{user_name} · {org_name} · {iso_timestamp} · CONFIDENTIAL"
FontFamily string // Default: "Helvetica" (PDF), "Calibri" (Office)
FontSize int // Default: 10 (text), 48 (tiled background)
Color string // RGBA hex: "#FF0000AA" (semi-transparent red)
Position string // "footer" | "header" | "diagonal" | "tiled"
Opacity float64 // 0.0-1.0, default 0.3 for diagonal/tiled
}
3. File Type Implementations
3.1 PDF Watermarking
Library: github.com/pdfcpu/pdfcpu (pure Go, FIPS-compatible, actively maintained)
Approach:
- Parse PDF into memory
- Add watermark as text annotation or stamped content on each page
- Serialize modified PDF to output stream
Watermark Placement:
- Footer watermark: Bottom center of each page, 10pt gray text
- Diagonal tiled (screen deterrent): 45° repeated pattern across entire page, 0.15 opacity
Algorithm:
func WatermarkPDF(input io.Reader, output io.Writer, wm WatermarkParams) error {
// 1. Read PDF
ctx, err := pdfcpu.ReadContext(input, nil)
if err != nil {
return fmt.Errorf("pdf parse: %w", err)
}
// 2. Build watermark spec
wmSpec := pdfcpu.TextWatermark{
Text: wm.Text,
FontName: "Helvetica",
FontSize: 10,
Color: pdfcpu.Gray,
Pos: pdfcpu.BottomCenter,
}
// 3. Apply to all pages
if err := pdfcpu.AddWatermarks(ctx, nil, wmSpec); err != nil {
return fmt.Errorf("pdf watermark: %w", err)
}
// 4. Optionally add diagonal tiled pattern for screen deterrent
if wm.ScreenDeterrent {
tiledSpec := pdfcpu.TextWatermark{
Text: wm.Text,
FontSize: 48,
Color: pdfcpu.LightGray,
Opacity: 0.15,
Rotation: 45,
Diagonal: true,
}
if err := pdfcpu.AddWatermarks(ctx, nil, tiledSpec); err != nil {
return fmt.Errorf("pdf tiled watermark: %w", err)
}
}
// 5. Write output
return pdfcpu.WriteContext(ctx, output)
}
Performance:
- Small PDF (1-10 pages): ~20-50ms
- Large PDF (100+ pages): ~200-500ms
- Memory: ~2x file size during processing
Caching: ❌ Never cache watermarked PDFs. Each serve includes user-specific timestamp. Caching would serve stale timestamps or wrong user identities. The whole point is forensic traceability.
Edge Cases:
| Case | Handling |
|---|---|
| Password-protected PDF | Reject with error: "Cannot watermark encrypted PDF. Contact administrator." Log to audit. |
| Corrupted PDF | Attempt parse; if fails, serve original with minimal watermark in filename + audit log |
| PDF/A strict | pdfcpu preserves PDF/A compliance; no special handling needed |
| Scanned PDF (images) | Watermark overlays images; no text extraction needed |
| 1000+ page PDF | Stream processing; set timeout at 30s, fallback to minimal if exceeded |
3.2 Word Document (.docx) Watermarking
Library: github.com/unidoc/unioffice (pure Go, Office Open XML manipulation)
Approach:
- Unzip DOCX (it's a ZIP of XML files)
- Modify word/document.xml to add footer content
- Create/modify word/footer1.xml with watermark text
- Update [Content_Types].xml and relationships
- Rezip and serve
Watermark Placement:
- Footer: Centered text in document footer, appears on every page
- Header alternative: For "CONFIDENTIAL" prominence, add to header
Algorithm:
func WatermarkDOCX(input io.Reader, output io.Writer, wm WatermarkParams) error {
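// 1. Open DOCX (document.Read needs an io.ReaderAt plus a size, so buffer the stream first)
data, err := io.ReadAll(input)
if err != nil {
return fmt.Errorf("docx read: %w", err)
}
doc, err := document.Read(bytes.NewReader(data), int64(len(data)))
if err != nil {
return fmt.Errorf("docx parse: %w", err)
}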
// 2. Get or create footer
footer := doc.AddFooter()
footer.SetParagraphProperties(document.ParagraphStyleFooter)
// 3. Add watermark paragraph
para := footer.AddParagraph()
para.SetAlignment(document.AlignmentCenter)
run := para.AddRun()
run.AddText(wm.Text)
run.Properties().SetColor(color.Gray)
run.Properties().SetSize(10)
// 4. Apply footer to all sections
for _, section := range doc.Sections() {
section.SetFooter(footer, document.FooterTypeDefault)
}
// 5. Save
return doc.Save(output)
}
Performance:
- Typical DOCX: ~30-80ms
- Large DOCX with images: ~100-300ms
- Memory: ~3x file size (uncompressed XML is verbose)
Caching: ❌ Never. Same reasoning as PDF.
Edge Cases:
| Case | Handling |
|---|---|
| Password-protected DOCX | Reject with error. Office encryption prevents modification. |
| Corrupted DOCX | Attempt parse; fallback to encrypted-download-only mode |
| DOCX with existing footer | Append watermark to existing footer, don't replace |
| DOCM (macro-enabled) | Same process; macros preserved. Consider security warning. |
| DOC (legacy binary) | Convert via LibreOffice CLI first, or reject. See 3.2.1. |
3.2.1 Legacy DOC Handling
Binary .doc files cannot be watermarked with pure Go. Options:
- Convert to PDF on upload (recommended for M&A — preserves formatting, prevents editing)
- LibreOffice CLI conversion at serve time: libreoffice --headless --convert-to docx
- Reject with message: "Legacy format. Please upload .docx"
Recommendation: Option 1 for new uploads; Option 3 for existing files in MVP.
3.3 Excel (.xlsx) Watermarking
Library: github.com/xuri/excelize/v2 (pure Go, actively maintained, 15k+ stars)
Approach:
- Open XLSX
- For each sheet: insert header row with watermark text
- Optionally: add sheet-level "protection" (cosmetic, not security — easily bypassed)
- Save to output stream
Watermark Placement:
- Header row (Row 1): Merged cells spanning data width, light gray background, watermark text
- Sheet header/footer: Print-only watermark (visible when printed)
Algorithm:
func WatermarkXLSX(input io.Reader, output io.Writer, wm WatermarkParams) error {
// 1. Open workbook
f, err := excelize.OpenReader(input)
if err != nil {
return fmt.Errorf("xlsx parse: %w", err)
}
defer f.Close()
// 2. Watermark each sheet
for _, sheet := range f.GetSheetList() {
// Get data dimensions
dim, _ := f.GetSheetDimension(sheet)
cols := parseColumnCount(dim) // e.g., "A1:J50" → 10 columns
// Insert row at top
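if err := f.InsertRows(sheet, 1, 1); err != nil {
// Fail the whole call rather than silently serving a sheet without its watermark
return fmt.Errorf("xlsx insert row (%s): %w", sheet, err)
}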
// Merge cells for watermark banner
endCol := columnLetter(cols)
f.MergeCell(sheet, "A1", endCol+"1")
// Set watermark text
f.SetCellValue(sheet, "A1", wm.Text)
// Style: light gray background, centered, small font
styleID, _ := f.NewStyle(&excelize.Style{
Fill: excelize.Fill{Type: "pattern", Color: []string{"#EEEEEE"}, Pattern: 1},
Font: &excelize.Font{Size: 9, Color: "#888888"},
Alignment: &excelize.Alignment{Horizontal: "center"},
})
f.SetCellStyle(sheet, "A1", endCol+"1", styleID)
// Add print header/footer
f.SetHeaderFooter(sheet, &excelize.HeaderFooterOptions{
OddFooter: "&C" + wm.Text,
})
}
// 3. Optional: add sheet protection (cosmetic only)
if wm.AddProtection {
for _, sheet := range f.GetSheetList() {
f.ProtectSheet(sheet, &excelize.SheetProtectionOptions{
Password: "", // No password — just prevents casual editing
SelectLockedCells: true,
})
}
}
// 4. Write output
return f.Write(output)
}
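The algorithm above calls parseColumnCount and columnLetter without defining them. One possible standard-library sketch is shown here, assuming the dimension string has the form "A1:J50" (or a single cell such as "A1") and that data starts in column A, so the last column index equals the column count. excelize also ships its own column name/number conversion helpers, which may be preferable in practice.

```go
package main

import (
	"fmt"
	"strings"
)

// columnNumber converts column letters ("A", "J", "AA") to a 1-based index.
// It returns 0 for empty or non-letter input.
func columnNumber(letters string) int {
	n := 0
	for _, r := range strings.ToUpper(letters) {
		if r < 'A' || r > 'Z' {
			return 0
		}
		n = n*26 + int(r-'A') + 1
	}
	return n
}

// parseColumnCount returns the index of the last column in a dimension
// string such as "A1:J50". Unparseable input falls back to 1.
func parseColumnCount(dim string) int {
	last := dim
	if i := strings.LastIndex(dim, ":"); i >= 0 {
		last = dim[i+1:]
	}
	letters := strings.TrimRight(last, "0123456789")
	if n := columnNumber(letters); n > 0 {
		return n
	}
	return 1
}

// columnLetter converts a 1-based column index back to letters.
func columnLetter(n int) string {
	if n < 1 {
		return "A"
	}
	var b []byte
	for n > 0 {
		n-- // shift to 0-based so 26 maps to "Z" rather than "A@"
		b = append([]byte{byte('A' + n%26)}, b...)
		n /= 26
	}
	return string(b)
}

func main() {
	cols := parseColumnCount("A1:J50")
	fmt.Println(cols, columnLetter(cols))
}
```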
Performance:
- Small XLSX: ~20-50ms
- Large XLSX (10k+ rows): ~100-400ms
- Memory: ~2-4x file size
Caching: ❌ Never.
Edge Cases:
| Case | Handling |
|---|---|
| Password-protected XLSX | Reject. Cannot modify encrypted workbook. |
| Workbook with VBA macros (.xlsm) | Process same as .xlsx; macros preserved |
| Very wide sheets (1000+ columns) | Skip merge, add watermark to A1 only |
| Charts/pivot tables | Unaffected; watermark is in data area |
| XLS (legacy binary) | Reject or convert via LibreOffice. Same as DOC. |
3.4 Image Watermarking (JPG, PNG, WebP)
Library: Standard library image + golang.org/x/image + github.com/fogleman/gg (2D graphics)
Approach:
- Decode image
- Draw semi-transparent text overlay
- Encode to output format
Watermark Placement:
- Bottom-right corner: Primary watermark, semi-transparent white text with drop shadow
- Tiled diagonal (optional): For high-value images, repeated pattern across entire image
Algorithm:
func WatermarkImage(input io.Reader, output io.Writer, format string, wm WatermarkParams) error {
// 1. Decode image
img, _, err := image.Decode(input)
if err != nil {
return fmt.Errorf("image decode: %w", err)
}
bounds := img.Bounds()
width, height := bounds.Dx(), bounds.Dy()
// 2. Create drawing context
dc := gg.NewContextForImage(img)
// 3. Calculate font size based on image dimensions
fontSize := float64(width) / 50 // ~2% of width
if fontSize < 12 {
fontSize = 12
}
if fontSize > 48 {
fontSize = 48
}
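if err := dc.LoadFontFace("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", fontSize); err != nil {
return fmt.Errorf("load font: %w", err)
}
// 4. Position: bottom-right with padding
textWidth, _ := dc.MeasureString(wm.Text) // Height is not needed; an unused variable would not compile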
x := float64(width) - textWidth - 20
y := float64(height) - 20
// 5. Draw drop shadow
dc.SetRGBA(0, 0, 0, 0.5)
dc.DrawString(wm.Text, x+2, y+2)
// 6. Draw watermark text
dc.SetRGBA(1, 1, 1, 0.7)
dc.DrawString(wm.Text, x, y)
// 7. Optional: diagonal tiled pattern
if wm.ScreenDeterrent {
dc.SetRGBA(0.5, 0.5, 0.5, 0.15)
dc.LoadFontFace("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", fontSize*2)
for row := -height; row < height*2; row += int(fontSize * 4) {
for col := -width; col < width*2; col += int(textWidth * 1.5) {
dc.Push()
dc.RotateAbout(gg.Radians(45), float64(col), float64(row))
dc.DrawString(wm.Text, float64(col), float64(row))
dc.Pop()
}
}
}
// 8. Encode output
switch format {
case "jpeg", "jpg":
return jpeg.Encode(output, dc.Image(), &jpeg.Options{Quality: 90})
case "png":
return png.Encode(output, dc.Image())
case "webp":
return webp.Encode(output, dc.Image(), &webp.Options{Quality: 90})
default:
return png.Encode(output, dc.Image())
}
}
Performance:
- Small image (<1MB): ~10-30ms
- Large image (10MB+): ~100-300ms
- Memory: ~4 bytes per pixel (width × height × 4, RGBA in memory)
Caching: ❌ Never.
Edge Cases:
| Case | Handling |
|---|---|
| Animated GIF | Extract first frame, watermark, serve as static. Or reject. |
| Very small image (<200px) | Reduce font size; may become illegible — accept this |
| HEIC/HEIF | Convert to JPEG first (Apple format, limited Go support) |
| TIFF | Decode with golang.org/x/image/tiff; serve as PNG |
| RAW formats | Reject. Convert on upload. |
| SVG | Skip pixel watermarking; add text element to XML |
3.5 Video Watermarking (MP4, MOV)
Tool: FFmpeg (external binary) — no pure Go solution exists for video processing
Approach:
- Pipe original video to FFmpeg stdin
- FFmpeg overlays text watermark
- Stream FFmpeg stdout to HTTP response
Watermark Placement:
- Bottom-right corner: Semi-transparent text overlay, visible but not distracting
- Optional burn-in: More prominent for high-sensitivity content
Algorithm:
func WatermarkVideo(ctx context.Context, objectID string, w http.ResponseWriter, wm WatermarkParams) error {
// 1. Build FFmpeg command
// Text escape: replace special chars
escapedText := strings.ReplaceAll(wm.Text, ":", "\\:")
escapedText = strings.ReplaceAll(escapedText, "'", "\\'")
// drawtext filter
filter := fmt.Sprintf(
"drawtext=text='%s':fontsize=24:fontcolor=white@0.7:x=w-tw-20:y=h-th-20:shadowcolor=black@0.5:shadowx=2:shadowy=2",
escapedText,
)
cmd := exec.CommandContext(ctx, "ffmpeg",
"-i", "pipe:0", // Read from stdin
"-vf", filter, // Apply text filter
"-c:v", "libx264", // Re-encode video
"-preset", "fast", // Speed over compression
"-crf", "23", // Quality (lower = better)
"-c:a", "copy", // Copy audio unchanged
"-movflags", "+faststart+frag_keyframe+empty_moov", // Streaming-friendly
"-f", "mp4", // Output format
"pipe:1", // Write to stdout
)
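// 2. Set up pipes
stdin, err := cmd.StdinPipe()
if err != nil {
return fmt.Errorf("ffmpeg stdin: %w", err)
}
cmd.Stdout = w
cmd.Stderr = os.Stderr // Log errors
// 3. Load the source before starting FFmpeg so a storage error cannot leave a half-started process
obj, err := store.Read(objectID)
if err != nil {
return fmt.Errorf("read object: %w", err)
}
// 4. Start FFmpeg
if err := cmd.Start(); err != nil {
return fmt.Errorf("ffmpeg start: %w", err)
}
// 5. Stream input file to FFmpeg
go func() {
defer stdin.Close()
io.Copy(stdin, bytes.NewReader(obj)) // A failed write shows up as an FFmpeg error in Wait
}()
// 6. Wait for completion
return cmd.Wait()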
}
Performance:
- 1-minute video: ~5-15 seconds (re-encoding required)
- 10-minute video: ~30-90 seconds
- Recommendation: For videos >5 minutes, use async processing + notification when ready
Caching: ⚠️ Consider selective caching for large videos.
- Risk: Cached version has wrong timestamp for subsequent views
- Mitigation: Cache key includes {object_id}:{user_id}:{date}, so the same user on the same day gets the cached copy
- Invalidate cache at midnight or on project config change
Edge Cases:
| Case | Handling |
|---|---|
| Very long video (>1hr) | Async processing; return 202 with job ID; poll for completion |
| Corrupted video | FFmpeg will error; return 500 with audit log |
| Unsupported codec | FFmpeg handles most; truly exotic formats: reject |
| Audio-only file | No video stream to watermark; add metadata comment instead |
| MKV, AVI, WMV | Convert to MP4 on serve (FFmpeg handles this) |
3.6 Other File Types
Strategy: Encrypted download only. No preview, no watermarking.
Affected Types:
- ZIP, TAR, 7Z (archives)
- CAD files (DWG, DXF)
- Database exports (SQL, CSV with sensitive data)
- Executables (rare but possible)
- Unknown/binary files
Watermark Alternative:
- Filename includes minimal watermark: report_{user_id}_{timestamp}.zip
- Audit log captures full context
- "CONFIDENTIAL" wrapper: Serve inside a new ZIP containing the file + a NOTICE.txt
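func ServeEncryptedDownload(w http.ResponseWriter, objectID string, wm WatermarkParams) error {
// Read the original first so a storage failure returns before anything is written
obj, err := store.Read(objectID)
if err != nil {
return fmt.Errorf("read object: %w", err)
}
// Create wrapper ZIP with notice
buf := new(bytes.Buffer)
zw := zip.NewWriter(buf)
// Add notice file
notice, err := zw.Create("NOTICE.txt")
if err != nil {
return fmt.Errorf("zip notice: %w", err)
}
fmt.Fprintf(notice, "CONFIDENTIAL\n\nDownloaded by: %s\nOrganization: %s\nTimestamp: %s\n\nUnauthorized distribution is prohibited.",
wm.UserName, wm.OrgName, wm.Timestamp.Format(time.RFC3339))
// Add original file
original, err := zw.Create(wm.OriginalFilename)
if err != nil {
return fmt.Errorf("zip original: %w", err)
}
if _, err := original.Write(obj); err != nil {
return fmt.Errorf("zip write: %w", err)
}
if err := zw.Close(); err != nil {
return fmt.Errorf("zip close: %w", err)
}
// Set download filename with watermark info
shortID := wm.UserID
if len(shortID) > 8 {
shortID = shortID[:8] // Guard: slicing a shorter ID would panic
}
filename := fmt.Sprintf("%s_%s_%s.zip",
strings.TrimSuffix(wm.OriginalFilename, filepath.Ext(wm.OriginalFilename)),
shortID,
time.Now().Format("20060102"))
w.Header().Set("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, filename))
w.Header().Set("Content-Type", "application/zip")
_, err = w.Write(buf.Bytes())
return err
}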
4. Screen Capture Protection
Reality Check: True screen capture protection is impossible. Any DRM can be defeated by pointing a camera at a screen. Our goal is deterrence and traceability, not prevention.
4.1 Visual Deterrent Strategy
For PDFs served in-browser:
- Apply diagonal tiled watermark pattern (45°, repeated every 200px)
- Use user-specific text in the pattern
- Opacity 0.15 — visible in screenshots but doesn't obstruct reading
For images:
- Same diagonal tiled pattern
- Consider more aggressive opacity (0.25) for high-sensitivity images
For video:
- Persistent corner watermark (already implemented)
- Optional: periodic full-screen flash of watermark text (every 60s, 2s duration, 0.3 opacity)
4.2 Additional Deterrents
| Technique | Effectiveness | Implementation |
|---|---|---|
| Diagonal tiled watermark | High | Built into watermark functions |
| Random position micro-watermarks | Medium | Add 5-10 tiny (8px) watermarks at random positions |
| Invisible watermarks (steganography) | Low (easily stripped) | Not recommended — complexity vs. value |
| JavaScript screenshot detection | Low (easily bypassed) | Not recommended |
| CSS -webkit-user-select: none | Cosmetic only | Add to viewer CSS |
4.3 Recommended Approach
type ScreenDeterrentLevel int
const (
DeterrentNone ScreenDeterrentLevel = 0
DeterrentStandard ScreenDeterrentLevel = 1 // Footer + diagonal tiled
DeterrentHigh ScreenDeterrentLevel = 2 // + random micro-watermarks
)
Default to DeterrentStandard for all data room documents. Project admins can escalate to DeterrentHigh for specific folders.
5. Audit Trail
Every file serve MUST be logged. No exceptions.
5.1 Audit Entry Structure
type FileServeAudit struct {
ID string `json:"id"` // UUID
ProjectID string `json:"project_id"`
ObjectID string `json:"object_id"` // File being served
EntryID string `json:"entry_id"` // Parent answer entry
ActorID string `json:"actor_id"` // User requesting
ActorOrg string `json:"actor_org"` // Organization
Action string `json:"action"` // "view" | "download" | "print"
IP string `json:"ip"` // Client IP (X-Forwarded-For aware)
UserAgent string `json:"user_agent"`
Timestamp time.Time `json:"timestamp"`
WatermarkID string `json:"watermark_id"` // Unique ID embedded in watermark
FileType string `json:"file_type"` // "pdf", "docx", etc.
FileSize int64 `json:"file_size"`
Success bool `json:"success"`
ErrorMsg string `json:"error_msg,omitempty"`
}
5.2 Watermark ID
Each watermark includes a unique, traceable ID:
func GenerateWatermarkID(actorID, objectID string, timestamp time.Time) string {
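// Short and human-readable; 64 bits of HMAC output make collisions unlikely, not impossible
h := hmac.New(sha256.New, watermarkSecret)
// NUL separators keep ("ab", "c") and ("a", "bc") from producing the same MAC input
h.Write([]byte(actorID + "\x00" + objectID + "\x00" + timestamp.Format(time.RFC3339)))
sum := h.Sum(nil)
return base32.StdEncoding.EncodeToString(sum[:8])[:13] // e.g., "JBSWY3DPEHPK3"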
}
This ID appears in the watermark text and audit log. If a document leaks, grep for the ID → instant attribution.
5.3 Audit Table
CREATE TABLE file_serves (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
object_id TEXT NOT NULL,
entry_id TEXT,
actor_id TEXT NOT NULL,
actor_org TEXT,
action TEXT NOT NULL,
ip TEXT NOT NULL,
user_agent TEXT,
ts INTEGER NOT NULL,
watermark_id TEXT NOT NULL,
file_type TEXT NOT NULL,
file_size INTEGER,
success INTEGER NOT NULL,
error_msg TEXT
);
CREATE INDEX idx_serves_project ON file_serves(project_id);
CREATE INDEX idx_serves_actor ON file_serves(actor_id);
CREATE INDEX idx_serves_object ON file_serves(object_id);
CREATE INDEX idx_serves_watermark ON file_serves(watermark_id);
CREATE INDEX idx_serves_ts ON file_serves(ts);
5.4 Audit Logging Function
func LogFileServe(ctx context.Context, audit FileServeAudit) error {
audit.ID = uuid.NewString()
if audit.Timestamp.IsZero() {
audit.Timestamp = time.Now()
}
// Pack sensitive fields (action, user_agent, error_msg)
packed, err := Pack(audit)
if err != nil {
return err
}
_, err = db.ExecContext(ctx, `
INSERT INTO file_serves
(id, project_id, object_id, entry_id, actor_id, actor_org, action, ip, user_agent, ts, watermark_id, file_type, file_size, success, error_msg)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
audit.ID, audit.ProjectID, audit.ObjectID, audit.EntryID,
audit.ActorID, audit.ActorOrg, packed.Action, audit.IP, packed.UserAgent,
audit.Timestamp.UnixMilli(), audit.WatermarkID, audit.FileType, audit.FileSize,
boolToInt(audit.Success), packed.ErrorMsg,
)
return err
}
6. Burn After Reading Mode
Optional per-file setting: file can only be downloaded N times total, or N times per user.
6.1 Configuration
type BurnConfig struct {
Enabled bool `json:"enabled"`
MaxDownloads int `json:"max_downloads"` // Total across all users (0 = unlimited)
MaxPerUser int `json:"max_per_user"` // Per individual user (0 = unlimited)
ExpiresAt *int64 `json:"expires_at"` // Unix ms timestamp (optional)
NotifyOnBurn bool `json:"notify_on_burn"` // Alert admins when limit reached
}
6.2 Tracking Table
CREATE TABLE burn_tracking (
object_id TEXT NOT NULL,
actor_id TEXT NOT NULL,
download_count INTEGER NOT NULL DEFAULT 0,
last_download INTEGER,
PRIMARY KEY (object_id, actor_id)
);
CREATE TABLE burn_totals (
object_id TEXT PRIMARY KEY,
total_downloads INTEGER NOT NULL DEFAULT 0,
burned_at INTEGER -- When limit was hit
);
6.3 Burn Check Function
func CheckBurnLimit(ctx context.Context, objectID, actorID string) (allowed bool, remaining int, err error) {
// 1. Load burn config from entry Data
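config, err := GetBurnConfig(ctx, objectID)
if err != nil {
return false, 0, fmt.Errorf("load burn config: %w", err) // Fail closed: never serve on a config error
}
if !config.Enabled {
return true, -1, nil // No burn limit
}
// 2. Check expiration
if config.ExpiresAt != nil && time.Now().UnixMilli() > *config.ExpiresAt {
return false, 0, nil
}
// 3. Check total downloads (no row yet means zero downloads)
var total int
err = db.QueryRowContext(ctx, "SELECT total_downloads FROM burn_totals WHERE object_id = ?", objectID).Scan(&total)
if err != nil && !errors.Is(err, sql.ErrNoRows) {
return false, 0, fmt.Errorf("read burn total: %w", err)
}
if config.MaxDownloads > 0 && total >= config.MaxDownloads {
return false, 0, nil
}
// 4. Check per-user downloads (no row yet means zero downloads)
var userCount int
err = db.QueryRowContext(ctx,
"SELECT download_count FROM burn_tracking WHERE object_id = ? AND actor_id = ?",
objectID, actorID).Scan(&userCount)
if err != nil && !errors.Is(err, sql.ErrNoRows) {
return false, 0, fmt.Errorf("read user burn count: %w", err)
}
if config.MaxPerUser > 0 && userCount >= config.MaxPerUser {
return false, 0, nil
}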
// 5. Calculate remaining
remaining = -1 // Unlimited
if config.MaxDownloads > 0 {
remaining = config.MaxDownloads - total
}
if config.MaxPerUser > 0 {
userRemaining := config.MaxPerUser - userCount
if remaining < 0 || userRemaining < remaining {
remaining = userRemaining
}
}
return true, remaining, nil
}
func IncrementBurnCount(ctx context.Context, objectID, actorID string) error {
tx, err := db.BeginTx(ctx, nil)
if err != nil {
return fmt.Errorf("begin burn tx: %w", err)
}
defer tx.Rollback() // Has no effect once Commit has succeeded
now := time.Now().UnixMilli()
// Upsert user tracking
if _, err := tx.ExecContext(ctx, `
INSERT INTO burn_tracking (object_id, actor_id, download_count, last_download)
VALUES (?, ?, 1, ?)
ON CONFLICT (object_id, actor_id) DO UPDATE SET
download_count = download_count + 1,
last_download = ?`,
objectID, actorID, now, now); err != nil {
return fmt.Errorf("update burn_tracking: %w", err)
}
// Upsert total
if _, err := tx.ExecContext(ctx, `
INSERT INTO burn_totals (object_id, total_downloads)
VALUES (?, 1)
ON CONFLICT (object_id) DO UPDATE SET
total_downloads = total_downloads + 1`,
objectID); err != nil {
return fmt.Errorf("update burn_totals: %w", err)
}
return tx.Commit()
}
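The limit arithmetic in CheckBurnLimit is easy to get subtly wrong, so it may help to isolate it from the database calls and test it on plain integers. The sketch below mirrors steps 3 to 5 under that assumption; `burnDecision` is a hypothetical name, and -1 means "unlimited" as in the function above.

```go
package main

import "fmt"

// burnDecision applies the total and per-user limits to already-loaded
// counters. A limit of 0 means unlimited; remaining == -1 means no limit
// applies at all.
func burnDecision(maxTotal, total, maxPerUser, userCount int) (allowed bool, remaining int) {
	if maxTotal > 0 && total >= maxTotal {
		return false, 0
	}
	if maxPerUser > 0 && userCount >= maxPerUser {
		return false, 0
	}
	remaining = -1
	if maxTotal > 0 {
		remaining = maxTotal - total
	}
	if maxPerUser > 0 {
		// The tighter of the two limits decides what is left.
		if left := maxPerUser - userCount; remaining < 0 || left < remaining {
			remaining = left
		}
	}
	return true, remaining
}

func main() {
	allowed, remaining := burnDecision(10, 9, 3, 1)
	fmt.Println(allowed, remaining)
}
```

Note that checking and incrementing are separate steps in the pipeline, so two concurrent requests can both pass the check before either increments; this sketch does not address that.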
6.4 Burn Notification
When a file hits its limit:
func NotifyBurn(ctx context.Context, objectID string, config BurnConfig) {
if !config.NotifyOnBurn {
return
}
// Update burned_at timestamp
db.ExecContext(ctx, "UPDATE burn_totals SET burned_at = ? WHERE object_id = ?",
time.Now().UnixMilli(), objectID)
// Notify project admins (via existing notification system)
entry, err := GetEntryByObjectID(ctx, objectID)
if err != nil {
return // Without the entry there is no project to notify
}
NotifyProjectAdmins(ctx, entry.ProjectID, NotificationBurnLimitReached, map[string]any{
"object_id": objectID,
"filename": entry.Data["filename"],
})
}
7. Go Implementation Design
7.1 lib/watermark.go — Function Signatures
package lib
import (
"context"
"io"
"time"
)
// WatermarkParams contains all info needed to generate a watermark
type WatermarkParams struct {
UserID string
UserName string
OrgID string
OrgName string
Timestamp time.Time
WatermarkID string // Unique traceable ID
// Styling (from project config)
Config WatermarkConfig
// Original file info
OriginalFilename string
FileType string // "pdf", "docx", "xlsx", "jpg", etc.
// Options
ScreenDeterrent bool // Add aggressive visual deterrent
}
// WatermarkConfig is project-level styling configuration
type WatermarkConfig struct {
Text string // Template with placeholders
FontFamily string
FontSize int
Color string // RGBA hex
Position string // "footer", "header", "diagonal", "tiled"
Opacity float64
DeterrentLevel ScreenDeterrentLevel
}
// ScreenDeterrentLevel controls anti-screenshot measures
type ScreenDeterrentLevel int
const (
DeterrentNone ScreenDeterrentLevel = 0
DeterrentStandard ScreenDeterrentLevel = 1
DeterrentHigh ScreenDeterrentLevel = 2
)
// BuildWatermarkText generates the actual watermark string from params
func BuildWatermarkText(params WatermarkParams) string
// GenerateWatermarkID creates a unique, traceable watermark identifier
func GenerateWatermarkID(actorID, objectID string, timestamp time.Time) string
// ---- Per-Type Watermarking Functions ----
// WatermarkPDF adds watermark to PDF, writing result to output
func WatermarkPDF(input io.Reader, output io.Writer, params WatermarkParams) error
// WatermarkDOCX adds watermark to Word document
func WatermarkDOCX(input io.Reader, output io.Writer, params WatermarkParams) error
// WatermarkXLSX adds watermark to Excel spreadsheet
func WatermarkXLSX(input io.Reader, output io.Writer, params WatermarkParams) error
// WatermarkImage adds watermark to image (JPG, PNG, WebP)
// Returns the output format used (may differ from input for unsupported formats)
func WatermarkImage(input io.Reader, output io.Writer, inputFormat string, params WatermarkParams) (outputFormat string, err error)
// WatermarkVideo streams video with watermark overlay
// This is special: it streams FFmpeg output to w (typically the http.ResponseWriter) as it is produced
func WatermarkVideo(ctx context.Context, objectReader io.Reader, w io.Writer, params WatermarkParams) error
// ServeProtectedFile is the unified entry point for the protection pipeline
// It detects file type and applies appropriate watermarking
func ServeProtectedFile(ctx context.Context, input io.Reader, w io.Writer, params WatermarkParams) error
// ServeEncryptedDownload wraps non-watermarkable files with NOTICE.txt
func ServeEncryptedDownload(ctx context.Context, objectID string, w io.Writer, params WatermarkParams) error
// ---- Audit Functions ----
// LogFileServe records every file access to the audit table
func LogFileServe(ctx context.Context, audit FileServeAudit) error
// FileServeAudit contains all info about a file serve event
type FileServeAudit struct {
ID string
ProjectID string
ObjectID string
EntryID string
ActorID string
ActorOrg string
Action string // "view", "download", "print"
IP string
UserAgent string
Timestamp time.Time
WatermarkID string
FileType string
FileSize int64
Success bool
ErrorMsg string
}
// ---- Burn After Reading ----
// BurnConfig defines download limits for a file
type BurnConfig struct {
Enabled bool
MaxDownloads int // Total across all users
MaxPerUser int // Per individual user
ExpiresAt *int64
NotifyOnBurn bool
}
// CheckBurnLimit verifies if download is allowed and returns remaining count
func CheckBurnLimit(ctx context.Context, objectID, actorID string) (allowed bool, remaining int, err error)
// IncrementBurnCount records a successful download
func IncrementBurnCount(ctx context.Context, objectID, actorID string) error
// GetBurnConfig retrieves burn settings for an object
func GetBurnConfig(ctx context.Context, objectID string) (BurnConfig, error)
// SetBurnConfig updates burn settings for an object
func SetBurnConfig(ctx context.Context, objectID string, config BurnConfig) error
7.2 Dependencies (go.mod additions)
// PDF processing
require github.com/pdfcpu/pdfcpu v0.8.0
// Office documents
require github.com/unidoc/unioffice v1.33.0
require github.com/xuri/excelize/v2 v2.8.1
// Image processing
require github.com/fogleman/gg v1.3.0
require golang.org/x/image v0.18.0
// WebP support (for image encoding)
require github.com/chai2010/webp v1.1.1
// Video (FFmpeg is external binary, no Go dep needed)
// Ensure ffmpeg is installed: apt install ffmpeg
// Crypto (FIPS 140-3 compliance)
// Use standard library crypto/aes, crypto/hmac, crypto/sha256
// These are FIPS-approved when built with GOEXPERIMENT=boringcrypto
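FIPS 140-3 Note: Build with GOEXPERIMENT=boringcrypto to route crypto operations through the BoringCrypto module. Go 1.24 and later also provide a native FIPS 140-3 mode (the GOFIPS140 build setting and GODEBUG=fips140=on); check which module's validation status meets the compliance requirement before choosing.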
7.3 Serve Pipeline
Complete request flow from HTTP request to watermarked response:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ FILE SERVE PIPELINE │
└─────────────────────────────────────────────────────────────────────────────────┘
HTTP Request: GET /api/files/{object_id}
│
▼
┌──────────────────────┐
│ 1. Auth Middleware │ Extract JWT, validate session
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 2. CheckAccess() │ Verify actor can access this object
│ (lib/rbac.go) │ Walk to parent entry → workstream → project
└──────────────────────┘ Return 403 if denied
│
▼
┌──────────────────────┐
│ 3. CheckBurnLimit() │ Verify download quota not exceeded
│ (lib/watermark.go)│ Return 410 Gone if burned
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 4. ObjectRead() │ Fetch encrypted object from store
│ (lib/store.go) │ Decrypt with project key
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 5. Build Params │ Construct WatermarkParams from:
│ │ - Actor (user_id, name, org)
│ │ - Project config (styling)
│ │ - Timestamp + generated watermark ID
│ │ - File metadata (type, name)
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 6. Route by Type │ Detect file type from extension/magic bytes
│ │
│ PDF ──────────► │ WatermarkPDF()
│ DOCX ─────────► │ WatermarkDOCX()
│ XLSX ─────────► │ WatermarkXLSX()
│ Image ────────► │ WatermarkImage()
│ Video ────────► │ WatermarkVideo()
│ Other ────────► │ ServeEncryptedDownload()
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 7. Stream to Client │ Set Content-Type, Content-Disposition
│ │ Write watermarked content to ResponseWriter
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 8. LogFileServe() │ Record to audit table (async, don't block)
│ (lib/watermark.go)│
└──────────────────────┘
│
▼
┌──────────────────────┐
│ 9. IncrementBurn() │ Update download counters if burn enabled
│ (lib/watermark.go)│ Check if limit now hit → notify admins
└──────────────────────┘
│
▼
HTTP Response: Watermarked file stream
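Step 6 of the pipeline routes on extension and magic bytes. A minimal sketch using the standard library's content sniffing is shown below. The function name `routeFor` and the returned labels are hypothetical; the real router would call the watermark functions directly rather than return their names.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// routeFor picks a pipeline branch from the first bytes of the file plus
// its extension. Magic bytes win; the extension only disambiguates ZIP
// containers, since DOCX and XLSX both sniff as application/zip.
func routeFor(filename string, head []byte) string {
	sniffed := http.DetectContentType(head) // inspects at most 512 bytes
	ext := strings.ToLower(filepath.Ext(filename))
	switch {
	case sniffed == "application/pdf":
		return "WatermarkPDF"
	case sniffed == "application/zip" && ext == ".docx":
		return "WatermarkDOCX"
	case sniffed == "application/zip" && ext == ".xlsx":
		return "WatermarkXLSX"
	case sniffed == "image/jpeg", sniffed == "image/png", sniffed == "image/webp":
		return "WatermarkImage"
	case strings.HasPrefix(sniffed, "video/"):
		return "WatermarkVideo"
	default:
		return "ServeEncryptedDownload"
	}
}

func main() {
	fmt.Println(routeFor("deck.pdf", []byte("%PDF-1.7\n")))
	fmt.Println(routeFor("renamed.pdf", []byte("plain text, not a PDF")))
}
```

The built-in sniffer does not recognise every video container (QuickTime .mov files, for example, may come back as application/octet-stream), so a production router would likely need additional signatures or a probe step for video.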
7.4 Handler Implementation
// api/handlers.go
func (h *Handler) ServeFile(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
objectID := chi.URLParam(r, "objectID")
actor := auth.ActorFromContext(ctx)
// 1. Access check
entry, err := lib.GetEntryByObjectID(ctx, h.db, objectID)
if err != nil {
http.Error(w, "not found", 404)
return
}
if !lib.CheckAccess(ctx, h.db, actor.ID, entry.ProjectID, entry.ID, "read") {
http.Error(w, "forbidden", 403)
return
}
// 2. Burn limit check
allowed, remaining, err := lib.CheckBurnLimit(ctx, h.db, objectID, actor.ID)
if err != nil {
http.Error(w, "internal error", 500)
return
}
if !allowed {
http.Error(w, "download limit exceeded", 410) // Gone
return
}
// 3. Read object
data, err := lib.ObjectRead(ctx, h.store, h.projectKey, objectID)
if err != nil {
http.Error(w, "not found", 404)
return
}
// 4. Build watermark params
fileInfo := entry.Data["files"].([]any)[0].(map[string]any) // Simplified
projectConfig, _ := lib.GetProjectWatermarkConfig(ctx, h.db, entry.ProjectID)
now := time.Now().UTC() // One instant for watermark text, watermark ID and audit row, so the ID can be recomputed later
params := lib.WatermarkParams{
UserID: actor.ID,
UserName: actor.Name,
OrgID: actor.OrgID,
OrgName: actor.OrgName,
Timestamp: now,
WatermarkID: lib.GenerateWatermarkID(actor.ID, objectID, now),
Config: projectConfig,
OriginalFilename: fileInfo["name"].(string),
FileType: fileInfo["type"].(string),
ScreenDeterrent: projectConfig.DeterrentLevel >= lib.DeterrentStandard,
}
// 5. Prepare audit entry (log after response)
audit := lib.FileServeAudit{
ProjectID: entry.ProjectID,
ObjectID: objectID,
EntryID: entry.ID,
ActorID: actor.ID,
ActorOrg: actor.OrgName,
Action: "download",
IP: realIP(r),
UserAgent: r.UserAgent(),
Timestamp: now,
WatermarkID: params.WatermarkID,
FileType: params.FileType,
FileSize: int64(len(data)),
Success: true,
}
// 6. Set response headers
contentType := mimeTypeFromExt(params.FileType)
w.Header().Set("Content-Type", contentType)
w.Header().Set("Content-Disposition",
fmt.Sprintf(`attachment; filename="%s"`, params.OriginalFilename))
if remaining > 0 {
w.Header().Set("X-Downloads-Remaining", fmt.Sprintf("%d", remaining))
}
// 7. Apply watermark and stream
err = lib.ServeProtectedFile(ctx, bytes.NewReader(data), w, params)
if err != nil {
audit.Success = false
audit.ErrorMsg = err.Error()
// Still log the attempt
}
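// 8. Log and update burn count (async).
// Caveat: the count is incremented after the check in step 2, so concurrent
// requests can all pass the check before any increment lands. Strict limits
// need the check and the increment to happen as one atomic operation.
go func() {
lib.LogFileServe(context.Background(), h.db, audit)
if audit.Success {
lib.IncrementBurnCount(context.Background(), h.db, objectID, actor.ID)
}
}()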
}
8. Performance Considerations
8.1 Processing Time Budgets
| File Type | Target | Acceptable | Timeout |
|---|---|---|---|
| PDF (≤10pg) | <100ms | <500ms | 10s |
| PDF (>100pg) | <1s | <5s | 30s |
| DOCX | <100ms | <500ms | 10s |
| XLSX | <100ms | <500ms | 10s |
| Image | <50ms | <200ms | 5s |
| Video (1min) | <10s | <30s | 120s |
| Video (10min) | async | async | 300s |
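A minimal sketch of how the Timeout column could be enforced with a context deadline. The class names in `budgets` and the `runWithTimeout` and `demo` helpers are hypothetical, not part of the spec; the durations are copied from the table above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// budgets copies the Timeout column from the table above.
// The class names are hypothetical labels chosen for this sketch.
var budgets = map[string]time.Duration{
	"pdf_small":   10 * time.Second,
	"pdf_large":   30 * time.Second,
	"docx":        10 * time.Second,
	"xlsx":        10 * time.Second,
	"image":       5 * time.Second,
	"video_short": 120 * time.Second,
	"video_long":  300 * time.Second,
}

// runWithTimeout runs fn under a deadline and reports whether the deadline
// is what stopped it. fn must honor ctx cancellation to actually stop work.
func runWithTimeout(parent context.Context, d time.Duration, fn func(context.Context) error) (timedOut bool, err error) {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	done := make(chan error, 1) // buffered so fn's goroutine never blocks on send
	go func() { done <- fn(ctx) }()

	select {
	case err = <-done:
		return errors.Is(err, context.DeadlineExceeded), err
	case <-ctx.Done():
		return errors.Is(ctx.Err(), context.DeadlineExceeded), ctx.Err()
	}
}

// demo runs a fast job with a generous deadline and a slow job with a short one.
func demo() (fastTimedOut, slowTimedOut bool) {
	fast := func(ctx context.Context) error { return nil }
	slow := func(ctx context.Context) error {
		select {
		case <-time.After(2 * time.Second): // simulated long watermark job
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	fastTimedOut, _ = runWithTimeout(context.Background(), time.Second, fast)
	slowTimedOut, _ = runWithTimeout(context.Background(), 20*time.Millisecond, slow)
	return
}

func main() {
	fmt.Println("image budget:", budgets["image"])
	fmt.Println(demo())
}
```

When `timedOut` is true, the caller would take the "Timeout exceeded" fallback path rather than return an error to the user.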
8.2 Memory Management
- PDF: Stream pages when possible; full memory load for complex watermarks
- DOCX/XLSX: Full memory load required (ZIP structure)
- Image: Decode → process → encode; a decoded bitmap needs roughly width × height × 4 bytes of RAM
- Video: Stream through FFmpeg; no full file in memory
Memory limits per request:
- PDF: 256MB max
- Office: 128MB max
- Image: 512MB max (decoded size at 4 bytes/pixel; a 3840×2160 image is about 33MB, so the cap covers roughly 130 megapixels)
- Video: Streaming, no limit
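The image limit can be checked before decoding, because the header alone gives the dimensions. The sketch below is one way to do that with the standard library's `image.DecodeConfig`; `checkImageBudget`, `decodedSize`, and the sample helpers are hypothetical names, and the 4 bytes/pixel figure is the one used above.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png" // importing png also registers the PNG format with package image
	"io"
)

const (
	bytesPerPixel = 4         // RGBA, 8 bits per channel
	maxImageBytes = 512 << 20 // the 512MB limit above, read here as 512 MiB
)

// decodedSize estimates the RAM needed for one decoded bitmap.
func decodedSize(width, height int) int64 {
	return int64(width) * int64(height) * bytesPerPixel
}

// checkImageBudget reads only the image header and rejects images whose
// decoded size would exceed limit. It does not decode pixel data.
func checkImageBudget(r io.Reader, limit int64) (int64, error) {
	cfg, _, err := image.DecodeConfig(r)
	if err != nil {
		return 0, fmt.Errorf("read image header: %w", err)
	}
	size := decodedSize(cfg.Width, cfg.Height)
	if size > limit {
		return size, fmt.Errorf("decoded image needs %d bytes, limit is %d", size, limit)
	}
	return size, nil
}

// samplePNG builds a blank PNG in memory for the demonstration.
func samplePNG(width, height int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, width, height))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// sampleBudget runs the check against a generated PNG.
func sampleBudget(width, height int, limit int64) (int64, error) {
	return checkImageBudget(bytes.NewReader(samplePNG(width, height)), limit)
}

func main() {
	fmt.Println(decodedSize(3840, 2160)) // 33177600 bytes for a 4K frame
	fmt.Println(sampleBudget(64, 32, maxImageBytes))
}
```

The estimate covers a single bitmap; a pipeline that holds both the source and the watermarked copy at once would need about twice that.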
8.3 Caching Strategy
| What | Cache? | Rationale |
|---|---|---|
| Watermarked files | ❌ Never | Timestamp and user-specific content |
| Project watermark config | ✅ 5min TTL | Rarely changes |
| User/org lookups | ✅ 5min TTL | Rarely changes |
| Object decryption | ⚠️ Per-request | Could cache decrypted bytes briefly |
| Burn counts | ❌ Never | Must be accurate |
8.4 Async Processing for Large Files
Videos >5 minutes and PDFs >500 pages should use async processing:
func (h *Handler) ServeFileLarge(w http.ResponseWriter, r *http.Request) {
// ... access checks ...
if isLargeFile(params.FileType, fileSize) {
// Queue for background processing
jobID := lib.QueueWatermarkJob(ctx, h.queue, objectID, params)
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(202) // Accepted
json.NewEncoder(w).Encode(map[string]string{
"status": "processing",
"job_id": jobID,
"poll_url": "/api/files/jobs/" + jobID,
})
return
}
// Normal sync processing...
}
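The handler above gates on `isLargeFile(params.FileType, fileSize)`, while the thresholds in the text are stated in minutes and pages. A sketch of a check keyed on those thresholds might look like this, assuming page count and duration are available as metadata; `FileMeta` and `needsAsync` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// FileMeta is a hypothetical metadata record; the spec does not define one.
type FileMeta struct {
	Type     string        // "pdf", "video", ...
	Pages    int           // meaningful for PDFs only
	Duration time.Duration // meaningful for videos only
}

// Thresholds from section 8.4.
const (
	maxSyncPDFPages      = 500
	maxSyncVideoDuration = 5 * time.Minute
)

// needsAsync reports whether a file should be queued instead of
// watermarked inside the request.
func needsAsync(m FileMeta) bool {
	switch m.Type {
	case "pdf":
		return m.Pages > maxSyncPDFPages
	case "video":
		return m.Duration > maxSyncVideoDuration
	default:
		return false
	}
}

func main() {
	fmt.Println(needsAsync(FileMeta{Type: "pdf", Pages: 501}))
	fmt.Println(needsAsync(FileMeta{Type: "video", Duration: 3 * time.Minute}))
}
```

If only the byte size is known at request time, size can serve as a proxy, but the cutoffs would then need to be calibrated per file type.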
9. Error Handling & Fallbacks
9.1 Graceful Degradation
| Error | Fallback | User Message |
|---|---|---|
| PDF parse failure | Serve original with audit + filename watermark | "Processing unavailable" |
| Office parse failure | Encrypt + download only | "Preview unavailable" |
| Image decode failure | Serve original with audit | "Processing unavailable" |
| Video FFmpeg failure | Encrypt + download only | "Streaming unavailable" |
| Timeout exceeded | Serve original with audit | "Processing timeout" |
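The table above can be expressed as a small lookup. The sketch below assumes timeouts surface as `context.DeadlineExceeded` and that file types are short lowercase strings; the `Fallback` type, `chooseFallback`, and the sample errors are hypothetical, and treating unrecognized types like images is an assumption of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Fallback pairs the action taken with the message shown to the user.
type Fallback struct {
	Action      string
	UserMessage string
}

// chooseFallback maps a watermarking failure to a row of the table above.
// A timeout takes precedence over the file type.
func chooseFallback(fileType string, err error) Fallback {
	if errors.Is(err, context.DeadlineExceeded) {
		return Fallback{"serve original with audit", "Processing timeout"}
	}
	switch fileType {
	case "pdf":
		return Fallback{"serve original with audit + filename watermark", "Processing unavailable"}
	case "docx", "xlsx":
		return Fallback{"encrypt + download only", "Preview unavailable"}
	case "video":
		return Fallback{"encrypt + download only", "Streaming unavailable"}
	default: // images; unrecognized types are an assumption of this sketch
		return Fallback{"serve original with audit", "Processing unavailable"}
	}
}

// Sample errors for the demonstration.
var (
	errParse = errors.New("parse failure")
	errSlow  = fmt.Errorf("watermark pdf: %w", context.DeadlineExceeded)
)

func main() {
	fmt.Println(chooseFallback("pdf", errParse))
	fmt.Println(chooseFallback("pdf", errSlow))
	fmt.Println(chooseFallback("xlsx", errParse))
}
```

The chosen `Action` is also what would go into the `Fallback` field of the error log entry.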
9.2 Error Logging
All watermark failures are logged with full context:
type WatermarkError struct {
ObjectID string
FileType string
Error string
Stack string
Fallback string // What we did instead
Timestamp time.Time
}
10. Security Considerations
10.1 FIPS 140-3 Compliance
All crypto operations use FIPS-approved algorithms:
- AES-256-GCM for encryption
- SHA-256 for hashing
- HMAC-SHA256 for watermark ID generation
- Build with GOEXPERIMENT=boringcrypto
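As an illustration of the HMAC-SHA256 watermark ID, here is a minimal sketch using the standard library. The key parameter, the NUL-separated input layout, and the truncation to 16 bytes are assumptions of this sketch; the spec's `lib.GenerateWatermarkID` takes no key argument, so the real function presumably obtains its key elsewhere.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// generateWatermarkID derives an ID from user, object, and timestamp.
// Without the key, the ID cannot be forged or linked back to its inputs.
func generateWatermarkID(key []byte, userID, objectID string, ts time.Time) string {
	mac := hmac.New(sha256.New, key)
	// NUL separators keep ("ab","c") and ("a","bc") from colliding,
	// assuming IDs never contain NUL bytes.
	fmt.Fprintf(mac, "%s\x00%s\x00%d", userID, objectID, ts.UnixNano())
	return hex.EncodeToString(mac.Sum(nil)[:16]) // truncated to 128 bits (assumption)
}

// verifyWatermarkID recomputes the ID and compares in constant time.
func verifyWatermarkID(key []byte, id, userID, objectID string, ts time.Time) bool {
	want := generateWatermarkID(key, userID, objectID, ts)
	return hmac.Equal([]byte(id), []byte(want))
}

// Demonstration inputs; a real key would come from key management.
var (
	demoKey  = []byte("demo-key-not-for-production")
	demoTime = time.Date(2026, 2, 28, 14, 32, 17, 0, time.UTC)
)

func main() {
	id := generateWatermarkID(demoKey, "user-1", "obj-1", demoTime)
	fmt.Println(len(id), verifyWatermarkID(demoKey, id, "user-1", "obj-1", demoTime))
}
```

Because the ID is deterministic, the audit row for a leaked file can be confirmed by recomputing the ID from the logged user, object, and timestamp.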
10.2 Watermark Tampering Resistance
Watermarks are applied at serve time, so tampering with stored files doesn't help. However:
- Digital signatures on PDFs are invalidated by watermarking (expected)
- Office documents can have the watermark row deleted (accepted risk; audit trail remains)
- Image watermarks can be cropped (tiled pattern mitigates)
- Video watermarks can be cropped (corner + periodic full-screen mitigates)
10.3 Audit Integrity
Audit logs should be:
- Append-only (no DELETE endpoint)
- Integrity-checked (HMAC of each row, chain hash)
- Replicated off-server for SOX/compliance
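One way to read "HMAC of each row, chain hash" is that each row's MAC covers the previous row's MAC as well as its own payload, so deleting or editing any row breaks every later link. The sketch below shows that idea in memory; `AuditRow`, the payload serialization, and the key handling are assumptions, not the spec's schema.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// AuditRow is a hypothetical row layout for this sketch.
type AuditRow struct {
	Payload  string // serialized audit fields
	PrevHash []byte // Hash of the previous row; empty for the first row
	Hash     []byte // HMAC-SHA256(key, PrevHash || Payload)
}

func rowMAC(key, prev []byte, payload string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(prev) // prev is either empty or exactly 32 bytes
	mac.Write([]byte(payload))
	return mac.Sum(nil)
}

// appendRow links a new row to the current tail of the chain.
func appendRow(key []byte, chain []AuditRow, payload string) []AuditRow {
	var prev []byte
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	return append(chain, AuditRow{Payload: payload, PrevHash: prev, Hash: rowMAC(key, prev, payload)})
}

// verifyChain returns the index of the first row that fails, or -1.
func verifyChain(key []byte, chain []AuditRow) int {
	var prev []byte
	for i, row := range chain {
		if !hmac.Equal(row.PrevHash, prev) || !hmac.Equal(row.Hash, rowMAC(key, prev, row.Payload)) {
			return i
		}
		prev = row.Hash
	}
	return -1
}

// buildDemoChain creates three linked rows with a throwaway key.
func buildDemoChain(key []byte) []AuditRow {
	var chain []AuditRow
	for _, p := range []string{"download obj-1 user-1", "download obj-1 user-2", "download obj-2 user-1"} {
		chain = appendRow(key, chain, p)
	}
	return chain
}

func main() {
	key := []byte("demo-key-not-for-production")
	chain := buildDemoChain(key)
	fmt.Println(verifyChain(key, chain)) // intact chain
	chain[1].Payload = "download obj-1 user-9"
	fmt.Println(verifyChain(key, chain)) // reports the tampered row
}
```

Truncation of the newest rows is not detectable from the chain alone; the off-server replica is what would catch that.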
11. Testing Strategy
11.1 Unit Tests
func TestWatermarkPDF(t *testing.T)
func TestWatermarkPDFPassword(t *testing.T) // Should fail gracefully
func TestWatermarkPDFCorrupt(t *testing.T) // Should fallback
func TestWatermarkDOCX(t *testing.T)
func TestWatermarkXLSX(t *testing.T)
func TestWatermarkImage(t *testing.T)
func TestBurnLimitEnforced(t *testing.T)
func TestBurnLimitPerUser(t *testing.T)
func TestAuditLogCreated(t *testing.T)
func TestWatermarkIDUnique(t *testing.T)
11.2 Integration Tests
- Upload file → download → verify watermark present
- Exceed burn limit → verify 410 response
- Concurrent downloads → verify accurate burn counting
- Large video → verify async handling
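The concurrent-download case is the one most likely to expose a gap between checking the limit and incrementing the count. The sketch below uses an in-memory counter as a stand-in for the database to show the property the test should assert: with the check and increment in one critical section, exactly `limit` of N simultaneous requests succeed. `burnCounter` and `tryConsume` are hypothetical; a real implementation would more likely use a single conditional update in the database.

```go
package main

import (
	"fmt"
	"sync"
)

// burnCounter is an in-memory stand-in for the burn-count table.
type burnCounter struct {
	mu    sync.Mutex
	limit int
	used  map[string]int // key: objectID + "/" + userID
}

func newBurnCounter(limit int) *burnCounter {
	return &burnCounter{limit: limit, used: make(map[string]int)}
}

// tryConsume checks the limit and increments the count under one lock,
// so two requests can never both take the last remaining download.
func (b *burnCounter) tryConsume(objectID, userID string) (allowed bool, remaining int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	key := objectID + "/" + userID
	if b.used[key] >= b.limit {
		return false, 0
	}
	b.used[key]++
	return true, b.limit - b.used[key]
}

// concurrentDownloads fires n simultaneous requests for one user and
// object, and returns how many were allowed.
func concurrentDownloads(b *burnCounter, n int) int {
	var wg sync.WaitGroup
	results := make(chan bool, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ok, _ := b.tryConsume("obj-1", "user-1")
			results <- ok
		}()
	}
	wg.Wait()
	close(results)

	allowed := 0
	for ok := range results {
		if ok {
			allowed++
		}
	}
	return allowed
}

func main() {
	b := newBurnCounter(3)
	fmt.Println(concurrentDownloads(b, 50)) // 3
	fmt.Println(b.tryConsume("obj-1", "user-2"))
}
```

The same assertion, run against the real handler with the real store, would make a suitable body for the concurrent-download integration test.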
11.3 Sample Files
Maintain a test corpus of:
- Normal files (all types)
- Password-protected files
- Corrupted files
- Maximum-size files (stress test)
- Edge cases per type (see sections 3.x)
12. Open Questions / Future Work
-
PDF/A compliance: Verify pdfcpu maintains PDF/A compliance after watermarking. May need explicit flag.
-
Office 365 online preview: When files are previewed in Office Online, watermarks must persist. May need server-side rendering instead.
-
Mobile app considerations: Native mobile viewers may strip/ignore some watermarks. Test thoroughly.
-
Print watermarks: Physical prints should show watermark. PDF print header/footer may be more robust than visual overlay.
-
Invisible forensic watermarks: Steganographic watermarks that survive screenshots/prints. Complex, may add later.
-
Video DRM: HLS with encryption + Widevine. Overkill for MVP, but worth considering for future.
This specification is complete and ready for implementation. Questions → Johan.