# import-genome

A fast genetic data importer that writes entries directly to the database via `lib.Save()`.

## Performance

~1.5 seconds to:
- Read 18MB file
- Parse 674,160 variants
- Sort by rsid
- Match against 9,403 SNPedia rsids
- Insert 5,382 entries (1 parent + 5,381 matched variants) via `lib.Save()`

## Installation

```bash
cd ~/dev/inou
make import-genome
```

## Usage

```bash
import-genome <plain-file> <dossier-id>

# Help
import-genome --help
```

## Supported Formats

| Format      | Delimiter | Columns | Alleles    |
|-------------|-----------|---------|------------|
| AncestryDNA | Tab       | 5       | Split      |
| 23andMe     | Tab       | 4       | Combined   |
| MyHeritage  | CSV+Quotes| 4       | Combined   |
| FTDNA       | CSV       | 4       | Combined   |

The format is auto-detected from the structure of the first data line.
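
As a rough illustration of how detection from a single data line could work, the sketch below classifies a line using only the delimiter, column count, and quoting listed in the table. The function name `detectFormat` and the labels other than `ancestry` are assumptions made for this example, not the tool's actual identifiers, and the sample line is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// detectFormat guesses the vendor from one data line, using only the
// delimiter, column count, and quoting from the Supported Formats table.
// Hypothetical helper: the real detection logic may differ.
func detectFormat(line string) (string, error) {
	line = strings.TrimRight(line, "\r\n")

	// Tab-separated: the column count distinguishes the two vendors.
	if strings.Contains(line, "\t") {
		switch len(strings.Split(line, "\t")) {
		case 5:
			return "ancestry", nil // split alleles
		case 4:
			return "23andme", nil // combined genotype
		}
		return "", fmt.Errorf("unexpected tab-separated column count")
	}

	// Comma-separated: quoting distinguishes the two vendors.
	if len(strings.Split(line, ",")) == 4 {
		if strings.HasPrefix(line, "\"") {
			return "myheritage", nil
		}
		return "ftdna", nil
	}
	return "", fmt.Errorf("unrecognized format")
}

func main() {
	format, err := detectFormat("rs1801133\t1\t11856378\tT\tT")
	fmt.Println(format, err) // prints: ancestry <nil>
}
```

Checking the delimiter before the column count keeps the two tab-separated and two comma-separated layouts from being confused with each other.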

## Data Model

Creates hierarchical entries:

```
Parent (genome/extraction):
  id: c286564f3195445a
  data: {"source": "ancestry", "variants": 5381}

Children (genome/variant):
  parent_id: c286564f3195445a
  type: rs1801133 (rsid)
  value: TT (genotype)
```
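
The parent/child layout above might be assembled along these lines. This is a minimal sketch: `Entry` is a hypothetical stand-in for `lib.Entry`, whose real fields are not documented here, and `buildEntries`, `Variant`, and the placeholder parent ID are likewise assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical stand-in for lib.Entry; the real struct may differ.
type Entry struct {
	ID       string
	ParentID string
	Kind     string // "genome/extraction" or "genome/variant"
	Type     string // rsid, for variant entries
	Value    string // genotype, for variant entries
	Data     string // JSON metadata, for the extraction entry
}

// Variant is one matched rsid/genotype pair.
type Variant struct{ RSID, Genotype string }

// buildEntries returns the parent entry followed by one child per variant,
// so the slice always holds len(matched)+1 entries.
func buildEntries(parentID, source string, matched []Variant) ([]Entry, error) {
	meta, err := json.Marshal(map[string]interface{}{
		"source":   source,
		"variants": len(matched),
	})
	if err != nil {
		return nil, err
	}

	entries := make([]Entry, 0, len(matched)+1)
	entries = append(entries, Entry{ID: parentID, Kind: "genome/extraction", Data: string(meta)})
	for _, v := range matched {
		entries = append(entries, Entry{
			ParentID: parentID,
			Kind:     "genome/variant",
			Type:     v.RSID,
			Value:    v.Genotype,
		})
	}
	return entries, nil
}

func main() {
	entries, err := buildEntries("example-parent-id", "ancestry", []Variant{{"rs1801133", "TT"}})
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%+v\n", e)
	}
}
```

The extra parent entry is why an import with 5,381 matched variants produces 5,382 entries.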

## Databases

- **SNPedia reference**: `~/dev/inou/snpedia-genotypes/genotypes.db` (read-only, direct SQL)
- **Entries**: via `lib.Save()` to `/tank/inou/data/inou.db` (single transaction)

## Algorithm

1. Read plain-text genome file
2. Auto-detect format from first data line
3. Parse all variants (rsid + genotype)
4. Sort by rsid
5. Load SNPedia rsid set into memory
6. Match user variants against SNPedia (O(1) lookup)
7. Delete existing genome entries for dossier
8. Build []lib.Entry slice
9. Call `lib.Save()`, which writes all entries in a single transaction using prepared statements
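
Steps 3 to 6 can be sketched as follows for the tab-separated formats. This is an illustration under stated assumptions, not the tool's code: `parseLine` and `matchKnown` are hypothetical names, the sort is a plain string sort on the rsid (the actual ordering is not specified), the header row is assumed to start with `rsid`, and the SNPedia set is a hard-coded map rather than a database read.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Variant is one rsid/genotype pair.
type Variant struct{ RSID, Genotype string }

// parseLine extracts rsid and genotype from a tab-separated data line.
// Five columns means split alleles (joined here); four means a combined
// genotype. Comments, header rows, and malformed lines return ok=false.
func parseLine(line string) (Variant, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return Variant{}, false
	}
	f := strings.Split(line, "\t")
	if f[0] == "rsid" { // assumed header row
		return Variant{}, false
	}
	switch len(f) {
	case 5:
		return Variant{RSID: f[0], Genotype: f[3] + f[4]}, true
	case 4:
		return Variant{RSID: f[0], Genotype: f[3]}, true
	}
	return Variant{}, false
}

// matchKnown sorts variants by rsid in place and keeps those present in
// the known set. Map membership gives the O(1) lookup per variant.
func matchKnown(variants []Variant, known map[string]struct{}) []Variant {
	sort.Slice(variants, func(i, j int) bool { return variants[i].RSID < variants[j].RSID })
	matched := make([]Variant, 0, len(known))
	for _, v := range variants {
		if _, ok := known[v.RSID]; ok {
			matched = append(matched, v)
		}
	}
	return matched
}

func main() {
	input := "# sample export\n" +
		"rsid\tchromosome\tposition\tallele1\tallele2\n" +
		"rs4680\t22\t19951271\tA\tG\n" +
		"rs1801133\t1\t11856378\tT\tT\n" +
		"rs0000001\t1\t1\tC\tC\n"

	var variants []Variant
	for _, line := range strings.Split(input, "\n") {
		if v, ok := parseLine(line); ok {
			variants = append(variants, v)
		}
	}

	known := map[string]struct{}{"rs1801133": {}, "rs4680": {}}
	for _, v := range matchKnown(variants, known) {
		fmt.Println(v.RSID, v.Genotype)
	}
}
```

Holding the reference rsids in a map keeps the match phase linear in the number of user variants, which fits the small share of total runtime that matching takes in the timings below.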

## Example

```bash
./bin/import-genome /path/to/ancestry.txt 3b38234f2b0f7ee6

# Output:
# Phase 1 - Read: 24ms (18320431 bytes)
# Detected format: ancestry
# Phase 2 - Parse: 162ms (674160 variants)
# Phase 3 - Sort: 306ms
# Phase 4 - Load SNPedia: 47ms (9403 rsids)
# Phase 5 - Match & normalize: 40ms (5381 matched)
# Phase 6 - Init & delete existing: 15ms
# Phase 7 - Build entries: 8ms (5382 entries)
# Phase 8 - lib.Save: 850ms (5382 entries saved)
#
# TOTAL: 1.5s
# Parent ID: c286564f3195445a
```