# import-genome

Fast genetic data importer using lib.Save() for direct database access.

## Performance

~1.5 seconds to:
- Read 18MB file
- Parse 674,160 variants
- Sort by rsid
- Match against 9,403 SNPedia rsids
- Insert 5,382 entries via lib.Save()

## Installation

```bash
cd ~/dev/inou
make import-genome
```

## Usage

```bash
import-genome

# Help
import-genome --help
```

## Supported Formats

| Format      | Delimiter  | Columns | Alleles  |
|-------------|------------|---------|----------|
| AncestryDNA | Tab        | 5       | Split    |
| 23andMe     | Tab        | 4       | Combined |
| MyHeritage  | CSV+Quotes | 4       | Combined |
| FTDNA       | CSV        | 4       | Combined |

Auto-detected from file structure.

## Data Model

Creates hierarchical entries:

```
Parent (genome/extraction):
  id: 3b38234f2b0f7ee6
  data: {"source": "ancestry", "variants": 5381}

Children (genome/variant):
  parent_id: 3b38234f2b0f7ee6
  type: rs1801133 (rsid)
  value: TT (genotype)
```

## Databases

- **SNPedia reference**: `~/dev/inou/snpedia-genotypes/genotypes.db` (read-only, direct SQL)
- **Entries**: via `lib.Save()` to `/tank/inou/data/inou.db` (single transaction)

## Algorithm

1. Read plain-text genome file
2. Auto-detect format from first data line
3. Parse all variants (rsid + genotype)
4. Sort by rsid
5. Load SNPedia rsid set into memory
6. Match user variants against SNPedia (O(1) lookup)
7. Delete existing genome entries for dossier
8. Build []lib.Entry slice
9. lib.Save() - single transaction with prepared statements

## Example

```bash
./bin/import-genome /path/to/ancestry.txt 3b38234f2b0f7ee6

# Output:
# Phase 1 - Read: 24ms (18320431 bytes)
# Detected format: ancestry
# Phase 2 - Parse: 162ms (674160 variants)
# Phase 3 - Sort: 306ms
# Phase 4 - Load SNPedia: 47ms (9403 rsids)
# Phase 5 - Match & normalize: 40ms (5381 matched)
# Phase 6 - Init & delete existing: 15ms
# Phase 7 - Build entries: 8ms (5382 entries)
# Phase 8 - lib.Save: 850ms (5382 entries saved)
#
# TOTAL: 1.5s
# Parent ID: c286564f3195445a
```
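
## Sketch: detect, parse, match

Steps 2 to 6 of the algorithm (format detection, parsing, sorting, and the O(1) SNPedia lookup) can be illustrated with a minimal, self-contained Go sketch. It is not the tool's actual code. The `Variant` type and the functions `detectFormat`, `parseLine`, and `matchVariants` are hypothetical names. The detection rules are inferred only from the delimiter and column counts in the Supported Formats table, and the column order (rsid first, alleles last) and the plain string sort are assumptions. The sample rows and the one-element `known` set are illustrative stand-ins for a real genome file and the SNPedia rsid set.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Variant is a hypothetical stand-in for one parsed row (rsid + genotype).
type Variant struct {
	RSID     string
	Genotype string
}

// detectFormat guesses the source from the first data line, using only the
// delimiter and column counts listed in the Supported Formats table.
func detectFormat(line string) string {
	if strings.Contains(line, "\t") {
		if len(strings.Split(line, "\t")) == 5 {
			return "ancestry"
		}
		return "23andme"
	}
	if strings.Contains(line, "\"") {
		return "myheritage"
	}
	return "ftdna"
}

// parseLine extracts rsid and genotype from one data line.
// Assumption: rsid is the first column and the allele column(s) come last.
func parseLine(format, line string) (Variant, bool) {
	sep, want := ",", 4
	if format == "ancestry" || format == "23andme" {
		sep = "\t"
	}
	if format == "ancestry" {
		want = 5
	}
	f := strings.Split(line, sep)
	for i := range f {
		f[i] = strings.Trim(strings.TrimSpace(f[i]), "\"")
	}
	if len(f) != want || !strings.HasPrefix(f[0], "rs") {
		return Variant{}, false // header, comment, or non-rsid row
	}
	g := f[3]
	if format == "ancestry" {
		g = f[3] + f[4] // join the split alleles into one genotype
	}
	return Variant{RSID: f[0], Genotype: g}, true
}

// matchVariants sorts vs in place by rsid (string order) and returns the
// variants whose rsid is present in the known set (map lookup, O(1) each).
func matchVariants(vs []Variant, known map[string]struct{}) []Variant {
	sort.Slice(vs, func(i, j int) bool { return vs[i].RSID < vs[j].RSID })
	var out []Variant
	for _, v := range vs {
		if _, ok := known[v.RSID]; ok {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Illustrative AncestryDNA-style rows: rsid, chromosome, position, allele1, allele2.
	lines := []string{
		"rs4477212\t1\t82154\tA\tA",
		"rs1801133\t1\t11856378\tT\tT",
	}
	format := detectFormat(lines[0])
	var vs []Variant
	for _, l := range lines {
		if v, ok := parseLine(format, l); ok {
			vs = append(vs, v)
		}
	}
	known := map[string]struct{}{"rs1801133": {}} // stand-in for the SNPedia rsid set
	for _, v := range matchVariants(vs, known) {
		fmt.Println(format, v.RSID, v.Genotype) // prints "ancestry rs1801133 TT"
	}
}
```

Holding the reference rsids in a `map[string]struct{}` keeps each membership test constant-time, which is consistent with the short match phase reported in the example output. The matched variants would then be turned into the `[]lib.Entry` slice and passed to `lib.Save()`; that part is omitted here because the `lib` API is not shown in this document.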