Gene2AI API v5.1

API Documentation

Gene2AI Genomics Analysis API v5.1 — AI-powered, population-aware genetic analysis with 122K+ SNP markers, 14 PGx genes, LD proxy fallback, HLA allele typing, CYP450 metabolizer phenotyping (incl. CYP1A2), NAT2 acetylator typing, and nutrition-intervention-oriented output.

122K+ SNP Markers · 24K+ Genes · 14 PGx Genes · 4 CYP450 Genes · 1,222 LD Proxies · 9 Categories · 9 HLA Alleles · 5 Populations

What's New in v5.1

v5.1 extends the analysis engine with nutrition-intervention-oriented output, enabling downstream AI agents and supplement formulation systems to consume structured ingredient guidance directly from genomic findings.

nutrition

New fields: snpDetails, ingredientGuidance, relatedLabIndicators, pathway, risk, risk_level

health_risks

New field: nutritionRelevance with relevant ingredients and monitoring indicators

cyp450

New gene: CYP1A2 (caffeine metabolism). New field: supplementRelevance on all 4 CYP genes

meta

New field: snpCoverage — per-category SNP coverage breakdown

Backward compatible: The need field in nutrition findings is preserved. New risk and risk_level fields mirror the same value. Existing integrations require no changes.

Overview

The Gene2AI Genomics Analysis API processes raw genetic data files (23andMe V3/V4/V5, AncestryDNA V1/V2, WeGene) and returns AI-enriched health insights in structured JSON format. The API supports format auto-detection. The analysis pipeline includes:

122,896 SNP markers across 24,812 genes, sourced from GWAS Catalog, CPIC, SNPedia, PharmGKB, and manual curation
1,222 LD proxy variants (R² ≥ 0.7) covering 124 target SNPs, sourced from LDlink/1000 Genomes Phase 3 and used as automatic fallback when target SNPs are missing
9 analysis categories: health risks (52K+), drug response (2.3K+, incl. PharmGKB + CPIC), traits (60K+), nutrition (6.9K+), ancestry (97), APOE genotyping (ε2/ε3/ε4), HLA allele typing (9 alleles via tag SNP inference), CYP450 metabolizer phenotyping (CYP2C19/CYP2D6/CYP2C9/CYP1A2 with CPIC star allele definitions), and NAT2 acetylator typing (18 alleles, 7-SNP panel, CPIC 2025 guidelines)
Nutrition-intervention output (v5.1): ingredient guidance, metabolic pathway classification, lab indicator mapping, and supplement relevance for CYP450 phenotypes
Population-aware analysis with notes when variant frequencies differ significantly across EUR, EAS, AFR, SAS, AMR groups
LLM enrichment for user-friendly descriptions and actionable recommendations
Coverage statistics reporting which markers were found, which used LD proxies, and which were missing — both overall and per-category

All output is in English. All confidence values are numeric (0.0–1.0).

Base URL

https://api.gene2.ai

All REST endpoints are accessible at this base URL. Use /api/upload-url and /api/query-result for production.

Supported Data Formats & Chip Versions

23andMe

V3 (Illumina OmniExpress) — ~960K SNPs

V4 (Illumina HumanOmniExpress) — ~570K SNPs

V5 / V5.1 (Illumina GSA) — ~640K SNPs

Format: Tab-separated, lines starting with # are comments. Columns: rsid, chromosome, position, genotype.

AncestryDNA

V1 (Illumina OmniExpress) — ~700K SNPs

V2 (Illumina GSA) — ~670K SNPs

Format: Tab-separated, lines starting with # are comments. Columns: rsid, chromosome, position, allele1, allele2.

WeGene

Standard chip — ~1.35M SNPs (GRCh37, plus strand)

Format: Tab-separated, lines starting with # are comments. Columns: rsid, chromosome, position, genotype. Same layout as 23andMe but includes WeGene internal IDs (ws/wi/w0/w1/w2 prefixes) and indel genotypes (DD/II/DI).

KB coverage: high direct rsID overlap with 122K+ markers. Indel genotypes are skipped as they are not applicable to SNP-based analysis.
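The layouts above differ mainly in column count, so a client can sanity-check a file before uploading it. The following is a minimal sketch, not the API's own parser: `parse_genotype_lines` is a hypothetical helper that assumes four columns mean a combined genotype (23andMe, WeGene) and five columns mean separate alleles (AncestryDNA). It skips comments, non-rs identifiers, and indel genotypes, in line with the notes above.

```python
from typing import Dict, Iterable

# "ID" is included defensively; the documentation only lists DD/II/DI.
INDEL_GENOTYPES = {"DD", "II", "DI", "ID"}


def parse_genotype_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Return {rsid: genotype} for SNP rows (hypothetical client-side helper)."""
    genotypes: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or comment
        cols = line.split("\t")
        if len(cols) == 4:
            # rsid, chromosome, position, genotype
            rsid, genotype = cols[0], cols[3]
        elif len(cols) == 5:
            # rsid, chromosome, position, allele1, allele2
            rsid, genotype = cols[0], cols[3] + cols[4]
        else:
            continue  # unrecognized row layout
        # Keep only true rsIDs; this also drops a literal "rsid" header row
        # and vendor-internal identifiers such as WeGene's ws/wi/w0/w1/w2.
        if not (rsid.startswith("rs") and rsid[2:].isdigit()):
            continue
        if genotype in INDEL_GENOTYPES:
            continue  # indels are not used in SNP-based analysis
        genotypes[rsid] = genotype
    return genotypes
```

A quick count of the returned dictionary gives a rough idea of how many rsID rows a file contains before it is sent for analysis.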

Authentication

All API calls require HMAC-MD5 authentication via HTTP headers. Three headers are required:

id: Verify ID, "UP02" for upload, "GE01" for query

token: HMAC-MD5 token (16 hex chars)

time: UTC time string in format "YYYY-MM-DD HH:MM"

Token Generation Algorithm

# Token = md5(verify_id + secret + time)[4:20]
# Time format: "YYYY-MM-DD HH:MM" (UTC)

import hashlib
from datetime import datetime, timezone

verify_id = "UP02"
secret = "your_secret_here"
time_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")

raw = f"{verify_id}{secret}{time_str}"
full_hash = hashlib.md5(raw.encode()).hexdigest()
token = full_hash[4:20]  # 16 hex chars

Tokens are valid for a 5-minute window around the specified time. Ensure your system clock is synchronized with UTC.
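Because the token is derived from the minute-level time string, the `time` header has to carry exactly the string that went into the hash. A minimal sketch of a helper that keeps the two in sync follows; `build_auth_headers` and its optional `now` parameter are hypothetical names introduced here for illustration and testing, not part of the API.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional


def build_auth_headers(verify_id: str, secret: str,
                       now: Optional[datetime] = None) -> Dict[str, str]:
    """Build the id/token/time headers from one shared timestamp.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    time_str = now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")
    digest = hashlib.md5(f"{verify_id}{secret}{time_str}".encode()).hexdigest()
    # Characters 4..19 of the hex digest form the 16-character token.
    return {"id": verify_id, "token": digest[4:20], "time": time_str}
```

Generating fresh headers immediately before each request, rather than caching them, keeps the token inside the validity window during long polling loops.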

POST

/api/upload-url

Creates an analysis job and returns an upload URL for the genetic data file.

Request Headers

id: UP02
token: a1b2c3d4e5f60718
time: 2026-03-02 14:30
Content-Type: application/json

Request Body

{
  "key": "gene2ai_user12345_1709389800",
  "type": "txt",           // "txt" | "zip"
  "format": "23andme"      // "23andme" | "ancestry" | "wegene" | "auto"
}

Response (Success)

{
  "code": 0,
  "msg": "success",
  "data": {
    "key": "gene2ai_user12345_1709389800",
    "url": "https://api.gene2.ai/api/genomics/upload/gene2ai_user12345_..."
  }
}

Error Codes

code 0: Job created successfully

code 1: Invalid token (HMAC validation failed)

code 2: Missing required fields (key, type, or format)

code 3: Invalid format or type value
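Client code usually wants these codes turned into exceptions instead of checked inline. The sketch below shows one way to do that; `UploadUrlError` and `unwrap_upload_response` are hypothetical names, and the fallback to the response's `msg` field for unlisted codes is an assumption.

```python
from typing import Any, Dict

# Messages copied from the error code list for /api/upload-url.
UPLOAD_ERRORS = {
    1: "Invalid token (HMAC validation failed)",
    2: "Missing required fields (key, type, or format)",
    3: "Invalid format or type value",
}


class UploadUrlError(RuntimeError):
    """Raised when /api/upload-url returns a non-zero code."""

    def __init__(self, code: Any, message: str) -> None:
        super().__init__(f"upload-url failed (code {code}): {message}")
        self.code = code


def unwrap_upload_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the `data` object on success, raise UploadUrlError otherwise."""
    code = payload.get("code")
    if code == 0:
        return payload["data"]
    # Unknown codes fall back to the server-provided msg (assumed to be present).
    message = UPLOAD_ERRORS.get(code, payload.get("msg", "unknown error"))
    raise UploadUrlError(code, message)
```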

cURL Example

curl -X POST 'https://api.gene2.ai/api/upload-url' \
  -H 'Content-Type: application/json' \
  -H 'id: UP02' \
  -H 'token: YOUR_TOKEN' \
  -H 'time: 2026-03-02 14:30' \
  -d '{
    "key": "gene2ai_user12345_1709389800",
    "type": "txt",
    "format": "23andme"
  }'

Result Schema — Field Reference

The analysis result contains 9 categories of findings. Fields marked with v5.1 are new in this version. All new fields are optional and only present when relevant data exists.

nutrition[] (expanded in v5.1)

Field	Type	Description
nutrient	string	Nutrient name, e.g. "Folate (MTHFR)"
need	string	"decreased" | "normal" | "increased". Preserved for backward compatibility
risk v5.1	string	Same value as `need`. Alias for clearer semantics in risk-oriented contexts
risk_level v5.1	string	Same value as `need`. Alias for structured consumption
confidence	number	0.0–1.0
gene	string	Gene symbol, e.g. "MTHFR"
snps	string[]	rsID list
snpDetails v5.1	object[]	Per-SNP detail: rsid, gene, genotype, effect description
ingredientGuidance v5.1	object	primaryIngredients[], supportingIngredients[], avoidIngredients[], doseModifier, rationale
relatedLabIndicators v5.1	string[]	Lab tests to monitor, e.g. ["Serum Folate", "Homocysteine"]
pathway v5.1	string	Metabolic pathway enum (see below)
populationNote?	string	Population-specific context

ingredientGuidance.doseModifier enum: standard | increased | high | reduced | avoid
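Since every v5.1 field is optional, a consumer has to tolerate findings that carry only the v5.0 fields. The following sketch reads one `nutrition[]` entry defensively; `summarize_nutrition` and the shape of its return value are hypothetical, and treating an unrecognized `doseModifier` as absent is an assumption rather than documented behavior.

```python
from typing import Any, Dict

# Enum values documented for ingredientGuidance.doseModifier.
DOSE_MODIFIERS = {"standard", "increased", "high", "reduced", "avoid"}


def summarize_nutrition(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one nutrition finding, tolerating missing v5.1 fields."""
    # risk_level and risk mirror need, so any of the three can be used.
    level = finding.get("risk_level") or finding.get("risk") or finding.get("need")
    guidance = finding.get("ingredientGuidance") or {}
    dose = guidance.get("doseModifier")
    if dose not in DOSE_MODIFIERS:
        dose = None  # absent or unexpected value
    return {
        "nutrient": finding["nutrient"],
        "level": level,
        "doseModifier": dose,
        "primaryIngredients": guidance.get("primaryIngredients", []),
        "labs": finding.get("relatedLabIndicators", []),
    }
```

The same function works unchanged on v5.0 results, where it returns `None` for the dose modifier and empty lists for ingredients and lab indicators.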

health_risks[] (expanded in v5.1)

Field	Type	Description
condition	string	Condition name
risk	string	"low" | "average" | "slightly_elevated" | "elevated" | "high"
confidence	number	0.0–1.0
snps	string[]	rsID list
description	string	LLM-enriched description
nutritionRelevance v5.1	object	relevantIngredients[] and monitorIndicators[] for this condition
populationNote?	string	Population-specific context

nutritionRelevance is available for 13 condition categories: Type 2 Diabetes, Coronary Artery Disease, Obesity, Osteoporosis, Alzheimer's Disease, Age-related Macular Degeneration, Autoimmune Disease, Celiac Disease, Gout/Hyperuricemia, Prostate Cancer, Breast Cancer, Colorectal Cancer, Venous Thromboembolism.
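One possible use of `nutritionRelevance` is to build a de-duplicated list of lab indicators worth tracking. The sketch below is illustrative only: `indicators_to_monitor` is a hypothetical helper, and the choice to include only findings at "slightly_elevated" or above is an assumption, not an API rule.

```python
from typing import Any, Dict, Iterable, List

# Risk levels treated as worth monitoring (an assumption for this example).
ELEVATED_LEVELS = {"slightly_elevated", "elevated", "high"}


def indicators_to_monitor(health_risks: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect monitorIndicators from elevated findings, preserving first-seen order."""
    seen: List[str] = []
    for item in health_risks:
        if item.get("risk") not in ELEVATED_LEVELS:
            continue
        # nutritionRelevance is optional, so fall back to an empty object.
        relevance = item.get("nutritionRelevance") or {}
        for indicator in relevance.get("monitorIndicators", []):
            if indicator not in seen:
                seen.append(indicator)
    return seen
```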

cyp450.genes[] (expanded in v5.1)

Field	Type	Description
gene	string	"CYP2C19" | "CYP2D6" | "CYP2C9" | "CYP1A2" (new)
diplotype	string	Star allele diplotype, e.g. "*1/*2", "*1F/*1F"
activityScore	number	Combined activity score
phenotype	string	Metabolizer phenotype
drugRecommendations	object[]	Drug-specific recommendations
supplementRelevance v5.1	object	affectedSupplements[], guidanceNotes, monitorIndicators[]
confidence	number	0.0–1.0
confidenceTier	string	"high" | "moderate" | "low" | "insufficient"
coverage	object	totalDefiningSNPs, foundInData, missing, missingRsids[]

CYP1A2 star alleles: *1 (reference), *1C (rs2069514), *1F (rs762551), *1K (rs12720461). Drug recommendations: Caffeine, Clozapine, Theophylline, Melatonin.
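The `confidenceTier` and `coverage` fields make it possible to flag gene calls that rest on incomplete data before acting on a phenotype. A minimal sketch follows; `genes_needing_review` is a hypothetical helper, and the policy it encodes (flag "low" or "insufficient" tiers, or any missing defining SNP) is an assumption chosen for illustration.

```python
from typing import Any, Dict, List

LOW_TIERS = {"low", "insufficient"}


def genes_needing_review(cyp450: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return CYP450 gene calls with a low tier or missing defining SNPs."""
    flagged: List[Dict[str, Any]] = []
    for gene in cyp450.get("genes", []):
        coverage = gene.get("coverage") or {}
        missing = coverage.get("missingRsids", [])
        if gene.get("confidenceTier") in LOW_TIERS or missing:
            flagged.append({
                "gene": gene.get("gene"),
                "phenotype": gene.get("phenotype"),
                "confidenceTier": gene.get("confidenceTier"),
                "missingRsids": list(missing),
            })
    return flagged
```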

drug_response

Fields: drug, sensitivity, gene, recommendation, snps, populationNote?

Values: sensitivity: normal | increased | reduced

traits

Fields: trait, value, confidence, gene, snps

Values: confidence: 0.0–1.0

ancestry

Fields: regions[].region, regions[].percentage

Values: percentage: 0–100

apoe

Fields: genotype, alleles, alzheimerRisk, cardiovascularNote, confidence, description, snps

Values: alzheimerRisk: reduced | average | slightly_elevated | elevated | high

hla

Fields: alleles[].allele, carrier, confidence, drugAssociations[]

Values: 9 HLA alleles via tag SNP inference

nat2

Fields: diplotype, phenotype, acetylatorStatus, drugRecommendations[]

Values: phenotype: Rapid | Intermediate | Slow Acetylator

coverage

Fields: total_snps_in_kb, found_in_data, found_via_proxy, missing, coverage_pct

Values: coverage_pct: 0.0–100.0

snpCoverage[] (new in v5.1)

Fields: category, totalSnps, directHits, proxyHits, normalDefaults, misses, coveragePct

Values: Per-category breakdown (v5.1)

Backward Compatibility

v5.1 is fully backward compatible with v5.0 integrations. All existing fields retain their names, types, and value ranges. The following design decisions ensure zero-disruption upgrades:

Concern	Resolution
nutrition.need vs risk	`need` is preserved. `risk` and `risk_level` are additive aliases with the same value. Consumers can use either.
New optional fields	All v5.1 fields (`snpDetails`, `ingredientGuidance`, `nutritionRelevance`, `supplementRelevance`, `snpCoverage`) are optional. Absent when no matching knowledge data exists.
CYP1A2 addition	`cyp450.genes[]` now contains 4 entries instead of 3. Consumers iterating over the array will automatically include CYP1A2.
meta.version	Changed from "5.0" to "5.1". `meta.kbVersion` remains "v5.0-gwas-cpic".
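Code that addressed `cyp450.genes[]` by position under v5.0 is the one pattern that the fourth entry could disturb. Looking genes up by symbol avoids that, as in this sketch; `cyp_gene` is a hypothetical helper, and it assumes the result object has the `cyp450.genes[]` shape described in the field reference.

```python
from typing import Any, Dict, Optional


def cyp_gene(result: Dict[str, Any], symbol: str) -> Optional[Dict[str, Any]]:
    """Find a CYP450 gene entry by symbol; returns None when it is absent."""
    for gene in (result.get("cyp450") or {}).get("genes", []):
        if gene.get("gene") == symbol:
            return gene
    return None
```

With this lookup, a v5.0 result simply yields `None` for "CYP1A2", and the same consumer code handles both versions.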

LD Proxy Fallback Mechanism

Different DTC chip versions (23andMe V3/V4/V5, AncestryDNA V1/V2) cover different sets of SNPs. When a target marker is not present in the user's data, the system automatically looks up linkage disequilibrium (LD) proxy variants — nearby SNPs that are strongly correlated (R² ≥ 0.7) with the target.

The LD proxy database contains 1,222 pre-computed proxy entries for 124 target SNPs, sourced from LDlink using the 1000 Genomes Phase 3 reference panel (GRCh37). Proxies are population-specific, with R² values recorded for EUR, EAS, AFR, SAS, and AMR super-populations.

When a proxy is used, the confidence score is automatically adjusted by multiplying it by the R² value. The coverage object in the result reports how many markers were direct hits, how many were resolved through proxy fallback, and how many were missing entirely.
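The adjustment described above is a single multiplication. As a worked illustration, a finding with base confidence 0.9 resolved through a proxy with R² = 0.8 would end up near 0.72. The function below is a sketch of that arithmetic only; `proxy_adjusted_confidence` is a hypothetical name, and the input checks are assumptions based on the stated 0.0–1.0 confidence range and the R² ≥ 0.7 threshold.

```python
def proxy_adjusted_confidence(base_confidence: float, r_squared: float,
                              min_r_squared: float = 0.7) -> float:
    """Scale a confidence value by the proxy's R² (confidence x R²)."""
    if not 0.0 <= base_confidence <= 1.0:
        raise ValueError("base_confidence must be within 0.0-1.0")
    if not min_r_squared <= r_squared <= 1.0:
        raise ValueError("r_squared must be between the proxy threshold and 1.0")
    return base_confidence * r_squared
```

Because R² never exceeds 1, a proxy-derived finding can only match or fall below the confidence of the same finding from a direct hit.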

Full Integration Example (Python)

import hashlib, time, requests
from datetime import datetime, timezone

BASE_URL = "https://api.gene2.ai"

def make_token(verify_id, secret):
    t = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    raw = f"{verify_id}{secret}{t}"
    h = hashlib.md5(raw.encode()).hexdigest()
    return h[4:20], t

# Step 1: Request upload URL
token, time_str = make_token("UP02", "your_upload_secret")
resp = requests.post(f"{BASE_URL}/api/upload-url",
    headers={
        "Content-Type": "application/json",
        "id": "UP02",
        "token": token,
        "time": time_str
    },
    json={
        "key": "gene2ai_user001_" + str(int(time.time())),
        "type": "txt",
        "format": "23andme"
    }
)
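data = resp.json()
# Fail with a readable message instead of a bare assert (asserts vanish under -O)
if data["code"] != 0:
    raise RuntimeError(f"upload-url failed: code={data['code']} msg={data.get('msg')}")
job_key = data["data"]["key"]
upload_url = data["data"]["url"]

# Step 2: Upload the file (streamed from disk rather than read fully into memory)
with open("genome_data.txt", "rb") as f:
    put_resp = requests.put(upload_url, data=f,
                            headers={"Content-Type": "text/plain"})
put_resp.raise_for_status()  # stop here if the upload itself was rejected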

# Step 3: Poll for results
while True:
    token, time_str = make_token("GE01", "your_query_secret")
    resp = requests.post(f"{BASE_URL}/api/query-result",
        headers={
            "Content-Type": "application/json",
            "id": "GE01",
            "token": token,
            "time": time_str
        },
        json={"key": job_key}
    )
    result = resp.json()
    status = result["data"]["status"]
    if status in ("succeeded", "failed"):
        if status == "succeeded":
            r = result["data"]["result"]
            # v5.1: Access new nutrition fields
            for n in r["nutrition"]:
                if n.get("ingredientGuidance"):
                    print(f"{n['nutrient']}: {n['ingredientGuidance']['doseModifier']}")
                    print(f"  Primary: {n['ingredientGuidance']['primaryIngredients']}")
                    print(f"  Lab: {n.get('relatedLabIndicators', [])}")
            # v5.1: Access CYP1A2 supplement relevance
            for g in r["cyp450"]["genes"]:
                if g.get("supplementRelevance"):
                    print(f"{g['gene']} ({g['phenotype']}): {g['supplementRelevance']['guidanceNotes']}")
        break
    time.sleep(5)

Integration Example (Node.js)

import crypto from 'crypto';
import fs from 'fs';

const BASE_URL = 'https://api.gene2.ai';

function makeToken(verifyId, secret) {
  const now = new Date();
  const time = now.toISOString().slice(0, 16).replace('T', ' ');
  const raw = verifyId + secret + time;
  const hash = crypto.createHash('md5').update(raw).digest('hex');
  return { token: hash.slice(4, 20), time };
}

async function analyzeGenome(filePath, format = '23andme') {
  // Step 1: Request upload URL
  const { token: upToken, time: upTime } = makeToken('UP02', 'your_upload_secret');
  const key = 'gene2ai_' + Date.now();
  
  const uploadResp = await fetch(BASE_URL + '/api/upload-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', id: 'UP02', token: upToken, time: upTime },
    body: JSON.stringify({ key, type: 'txt', format })
  });
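  const uploadJson = await uploadResp.json();
  // A non-zero code means no upload URL was issued (see Error Codes)
  if (uploadJson.code !== 0) {
    throw new Error(`upload-url failed: code=${uploadJson.code} msg=${uploadJson.msg}`);
  }
  const { data } = uploadJson;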

  // Step 2: Upload file
  const fileData = fs.readFileSync(filePath);
  await fetch(data.url, {
    method: 'PUT',
    headers: { 'Content-Type': 'text/plain' },
    body: fileData
  });

  // Step 3: Poll for results
  while (true) {
    const { token: qToken, time: qTime } = makeToken('GE01', 'your_query_secret');
    const resp = await fetch(BASE_URL + '/api/query-result', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', id: 'GE01', token: qToken, time: qTime },
      body: JSON.stringify({ key })
    });
    const result = await resp.json();
    if (['succeeded', 'failed'].includes(result.data.status)) {
      return result.data;
    }
    await new Promise(r => setTimeout(r, 5000));
  }
}

// Usage
analyzeGenome('./genome_data.txt', '23andme').then(console.log);

Version History

v5.1 (Current)

Nutrition-intervention-oriented output: ingredientGuidance, snpDetails, pathway, relatedLabIndicators for 18 P0 genes. nutritionRelevance for 13 health_risk categories. CYP1A2 star allele typing (4 alleles, 4 drugs). supplementRelevance for all 4 CYP genes. Per-category snpCoverage metadata.

v5.0

GWAS Catalog + CPIC expansion to 122K+ SNP markers, 24K+ genes. 14 PGx genes (10 new CPIC genes). LD proxy fallback (1,222 proxies). HLA allele typing (9 alleles). NAT2 acetylator typing (18 alleles, CPIC 2025). Population-aware analysis (5 super-populations).

v3.5

Initial release with CYP2C19/CYP2D6/CYP2C9 typing, 5 analysis categories, basic SNP knowledge base.