# papers-extraction API

Read-only HTTP access to the *Vibrio fischeri* gene-extraction database:
paper metadata, per-gene extracted attributes, and aliases. Responses
always reflect the latest extraction run for each (paper, gene) pair.

Interactive docs: [/docs](/docs)  |  [/redoc](/redoc)  |  [OpenAPI](/openapi.json)

## Auth

If the server has `API_KEY` configured, every endpoint except `/` and
`/health` requires the header `X-API-Key: <key>`. Otherwise the API is open.

    curl -H "X-API-Key: $API_KEY" <url>
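The same header can be attached from Python. This is a minimal sketch using only the standard library; `BASE_URL` and the helper name `build_request` are assumptions for illustration, not part of the API.

```python
import urllib.request

# Assumption: placeholder host; replace with the server that hosts this API.
BASE_URL = "http://localhost:8000"


def build_request(path, api_key=None):
    """Return a urllib Request for `path`, adding X-API-Key only if a key is given."""
    headers = {}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(BASE_URL + path, headers=headers)
```

Passing the result to `urllib.request.urlopen` sends the request. On an open server the key can simply be omitted.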

## Endpoints

All example URLs below are relative to this host.

### GET /papers/{pmid}

Returns paper metadata plus the genes extracted in the latest run (with
aliases). Responds with 404 if the PMID is unknown.

    curl /papers/15901683

### GET /papers?doi=…  |  GET /papers?pmc_id=…

Same payload as `/papers/{pmid}`, looked up by DOI or PMC ID. Provide
exactly one of `doi` or `pmc_id`. Matching is case-insensitive; `pmc_id`
accepts either `PMC1698219` or `1698219`.

    curl "/papers?doi=10.1128/JB.01214-06"
    curl "/papers?pmc_id=PMC1698219"
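A client can enforce the "exactly one of `doi` or `pmc_id`" rule before sending anything. The sketch below is one way to do that; `paper_lookup_path` is a hypothetical helper name, and the server's own validation remains authoritative.

```python
from urllib.parse import urlencode


def paper_lookup_path(doi=None, pmc_id=None):
    """Build the relative /papers URL; exactly one identifier must be supplied."""
    if (doi is None) == (pmc_id is None):
        raise ValueError("provide exactly one of doi or pmc_id")
    params = {"doi": doi} if doi is not None else {"pmc_id": pmc_id}
    # urlencode percent-encodes reserved characters such as the "/" in a DOI.
    return "/papers?" + urlencode(params)
```

The DOI's slash is percent-encoded here, which differs cosmetically from the curl example above but denotes the same query value.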

### GET /genes

Gene rows across all papers (latest run only). Paged response:

    {"items": [...], "total": 799, "next": "/genes?limit=100&offset=100"}

`next` is a relative URL preserving all query params, or `null` on the
last page. Follow it until it's `null` to stream the full result set.
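Following `next` until it is `null` might look like the sketch below. `BASE_URL`, `fetch_json`, and `iter_items` are illustrative names, and the example assumes an open server (no `X-API-Key` header).

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the server that hosts this API.
BASE_URL = "http://localhost:8000"


def fetch_json(path):
    """GET a relative URL and decode the JSON body."""
    with urllib.request.urlopen(BASE_URL + path) as resp:
        return json.load(resp)


def iter_items(first_path, fetch=fetch_json):
    """Yield every item, following each page's `next` until it is null (None)."""
    path = first_path
    while path is not None:
        page = fetch(path)
        yield from page["items"]
        path = page["next"]
```

Because `iter_items` is a generator, rows are processed page by page rather than held in memory all at once. The `fetch` parameter exists so that a stub can be substituted in tests.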

Query params:

- `name`: substring match on `gene_name` or any alias
- `exact=true`: treat `name` as an exact match (case-insensitive)
- `limit` (1-5000, default 100), `offset` (default 0)
- `has_<flag>=true|false`: filter by a precomputed flag; any of
  `has_function_role`, `has_pathway_regulation`, `has_localization_context`,
  `has_phenotype`, `has_biofilm`, `has_relationships`, `has_aliases`, `has_notes`,
  `has_transcriptional_regulators`, `has_post_transcriptional_regulators`,
  `has_regulated_targets`, `has_physical_interactions`,
  `has_functional_interactions`, `has_operon_members`, `has_chromosomal_locus`

    curl "/genes?name=luxR&exact=true"
    curl "/genes?has_relationships=true&limit=20"
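For programmatic use, the query string can be assembled and checked on the client side. The following is a sketch only: `genes_path` is a hypothetical helper, and its checks mirror the limits and flag names listed above rather than any server-side code.

```python
from urllib.parse import urlencode

# Flag names copied from the parameter list above.
KNOWN_FLAGS = {
    "has_function_role", "has_pathway_regulation", "has_localization_context",
    "has_phenotype", "has_biofilm", "has_relationships", "has_aliases",
    "has_notes", "has_transcriptional_regulators",
    "has_post_transcriptional_regulators", "has_regulated_targets",
    "has_physical_interactions", "has_functional_interactions",
    "has_operon_members", "has_chromosomal_locus",
}


def genes_path(name=None, exact=False, limit=100, offset=0, **flags):
    """Build a relative /genes URL from the documented query params."""
    if not 1 <= limit <= 5000:
        raise ValueError("limit must be between 1 and 5000")
    params = {}
    if name is not None:
        params["name"] = name
        if exact:
            params["exact"] = "true"
    for flag, value in flags.items():
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown flag: {flag}")
        params[flag] = "true" if value else "false"
    params["limit"] = limit
    params["offset"] = offset
    return "/genes?" + urlencode(params)
```

For example, `genes_path(has_relationships=True, limit=20)` builds the same query as the second curl call above, with an explicit `offset=0` appended.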

### GET /search

Case-insensitive substring search across paper title, abstract, keywords,
and authors, and across every text field on gene records (including
aliases). Only the latest run is searched. Each list is paged independently:

    {
      "papers": {"items": [...], "total": N, "next": "/search?...&paper_offset=50"},
      "genes":  {"items": [...], "total": M, "next": "/search?...&gene_offset=50"}
    }

Query params: `q` (required), `limit` (1-500, default 50), `paper_offset`,
`gene_offset`.

    curl "/search?q=quorum&limit=10"
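Because the two lists page independently, a client that wants everything has to follow each `next` link separately. The sketch below assumes that a `next` URL returns the same two-list shape and reads only the list being advanced; `search_all` is a hypothetical helper, and `fetch` is any callable that maps a relative URL to decoded JSON.

```python
from urllib.parse import urlencode


def search_all(q, fetch, limit=50):
    """Collect all matching papers and genes by following each list's own `next`."""
    first = fetch("/search?" + urlencode({"q": q, "limit": limit}))
    results = {}
    for key in ("papers", "genes"):
        section = first[key]
        items = list(section["items"])
        while section["next"] is not None:
            # Assumption: the follow-up response has the same two-list shape.
            section = fetch(section["next"])[key]
            items.extend(section["items"])
        results[key] = items
    return results
```

This collects both result sets in memory, so it suits modest result counts. For broad queries a generator per list would be the safer design.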

### GET /health

Liveness probe (no auth). Returns `{"status": "ok"}`.

### GET /llms.txt

Same information as this page, formatted for LLM agents per
[llmstxt.org](https://llmstxt.org/).