Lab PDF Extraction

The Lab PDF Extraction Service processes clinical laboratory reports (PDFs) and returns structured, standardized results with PHI redaction, LOINC code mapping, and UCUM unit standardization — ready for downstream health scoring or storage.

Base URL: https://api.voloridgehealth.com/extraction/

Currently supports lab reports from Quest Diagnostics and LabCorp. Documents must be in English and represent a single US lab visit.

Authentication

All requests must include your API key in the x-api-key header. Contact [email protected] to obtain a key.

x-api-key: your-api-key-here

Webhook deliveries also include an X-API-Key header so you can verify the source before processing incoming events.

Processing Pipeline

Every uploaded PDF passes through an automated 2–5 minute pipeline:

Upload — Submit via base64 or presigned S3 URL
Language Detection — English validation
PHI Detection & Redaction — Protected health information is identified and removed
Lab Result Extraction — AI vision parses the document
LOINC Mapping — Each biomarker is mapped to a standardized LOINC code
UCUM Standardization — Units are normalized to UCUM codes

Async Processing & Webhooks

The upload endpoint returns immediately with a jobId. Because processing takes 2–5 minutes, you have two options to retrieve results:

Option A — Webhook (recommended)

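Provide a webhookUrl on upload. The API POSTs an event to your endpoint when the job reaches a terminal state: completed, failed, quarantined, rejected for PHI, or rejected as an invalid document (see Webhook Event Types). For completed jobs, the payload contains the full results. Deliveries include an X-API-Key header for authentication and are retried up to 5 times with exponential backoff.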
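
Because deliveries are retried, a receiver may see the same event more than once, for example if it processed a delivery but responded too slowly. The following is a minimal sketch of deduplication keyed on the event type and jobId. The names handle_event and process are hypothetical, and the in-memory set is only a stand-in for a durable store.

```python
# In-memory stand-in for a durable store (database table, Redis set, ...).
# Assumption: (event, jobId) uniquely identifies a delivery worth processing.
_seen = set()


def handle_event(body, process):
    """Run `process(body)` once per (event, jobId); return False for duplicates."""
    key = (body["event"], body["jobId"])
    if key in _seen:
        return False  # already handled; the caller should still answer 200
    process(body)
    _seen.add(key)  # recorded only after success, so a failed attempt can be retried
    return True
```

A webhook route would call handle_event(request_body, your_processing_function) after authenticating the request and return 200 in both cases, so that duplicates are acknowledged rather than retried again.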

Option B — Polling

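Periodically call GET /lab/status/{jobId} until status reaches a terminal value (completed, failed, quarantined, rejected_phi, or invalid_document). When status is completed, the full results are included in that response; they can also be fetched separately from GET /lab/results/{jobId}.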

Webhook Event Types

Event	Description
`job.completed`	Processing succeeded. `data` contains full extraction results.
`job.failed`	Unrecoverable processing error.
`job.quarantined`	Non-English document detected early in pipeline.
`job.rejected_phi`	PHI found and `rejectIfPHI=true` was set on upload.
`job.invalid_document`	Document is not a supported single-visit US lab report.

Endpoints

Lab Upload

POST /lab/upload

Upload a lab result PDF for processing. Returns immediately with a jobId. Supports two modes depending on file size:

Base64 upload (≤10 MB): Include pdfFile in the request body. Returns 202 Accepted.
Presigned URL upload (>10 MB, up to 50 MB): Omit pdfFile. Returns 200 OK with an uploadUrl. PUT your PDF directly to that URL within 5 minutes — processing starts automatically.
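
The choice between the two modes can be made from the file size before calling the API. The sketch below is illustrative: choose_upload_mode and mode_for_file are hypothetical helpers, and it assumes that "MB" means 1024 × 1024 bytes and that the limits apply to the raw PDF size rather than the base64-encoded length. Neither assumption is confirmed by this page.

```python
import os

MB = 1024 * 1024           # assumption: binary megabytes
BASE64_LIMIT = 10 * MB     # documented limit for the base64 flow
PRESIGNED_LIMIT = 50 * MB  # documented limit for the presigned URL flow


def choose_upload_mode(size_bytes):
    """Return 'base64' or 'presigned' for a PDF of the given size in bytes."""
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes <= BASE64_LIMIT:
        return "base64"
    if size_bytes <= PRESIGNED_LIMIT:
        return "presigned"
    raise ValueError("file exceeds the 50 MB presigned upload limit")


def mode_for_file(path):
    """Convenience wrapper that reads the size from disk."""
    return choose_upload_mode(os.path.getsize(path))
```

If the API still answers 413 for a file near the 10 MB boundary, falling back to the presigned URL flow is the documented remedy.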

Request Body

Field	Type	Required	Description
`pdfFile`	string (base64)	No	Base64-encoded PDF. Max 10 MB. Omit to receive a presigned S3 URL instead.
`filename`	string	No	Original filename for display purposes.
`dateOfBirth`	string (YYYY-MM-DD)	No	Patient DOB for verification. A boolean match flag is returned — no PII is stored or returned.
`webhookUrl`	string (URI)	No	Public HTTPS endpoint to receive push notifications when processing completes.
`rejectIfPHI`	boolean	No	Default `false`. If `true`, immediately reject and delete the document if PHI is detected instead of redacting it.

Responses

Status	Description
`202 Accepted`	PDF queued. Returns `{ jobId, status: "queued", message }`.
`200 OK`	Presigned URL generated (when `pdfFile` is omitted).
`400 Bad Request`	Invalid request parameters.
`413 Content Too Large`	File exceeds 10 MB base64 limit — use the presigned URL flow.
`429 Too Many Requests`	Rate limit exceeded (100 uploads/hour).

Get Lab Status

GET /lab/status/{jobId}

Check the processing status of a job. When status is completed, the response includes the full extraction results — no separate results call is needed.

Path Parameters

Parameter	Required	Description
`jobId`	Yes	Job ID returned from the upload endpoint.

Get Lab Results

GET /lab/results/{jobId}

Retrieve full extraction results for a completed job. Useful if webhook delivery failed or if you prefer polling. Returns 425 Too Early if processing is not yet complete.
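
A caller of this endpoint has to distinguish "not ready yet" (425) from a real failure. The sketch below shows one way to do that using only the Python standard library. The names classify and fetch_results and the state labels are hypothetical; the status codes are the ones listed on this page.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.voloridgehealth.com/extraction"


def classify(status_code):
    """Map an HTTP status from GET /lab/results/{jobId} to a caller-facing state."""
    states = {200: "ready", 425: "not_ready", 404: "unknown_job"}
    return states.get(status_code, "error")


def fetch_results(job_id, api_key):
    """Return (state, body), where body is the parsed JSON response or error object."""
    request = urllib.request.Request(
        f"{BASE_URL}/lab/results/{job_id}",
        headers={"x-api-key": api_key},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return classify(response.status), json.load(response)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the error body is still readable.
        return classify(err.code), json.loads(err.read() or b"{}")
```

On "not_ready" the caller would wait and try again; on "error" the body should carry the requestId described under Errors.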

Path Parameters

Parameter	Required	Description
`jobId`	Yes	Job ID returned from the upload endpoint.

Code Examples

Base64 Upload

const fs = require('fs');

// Read a PDF from disk, base64-encode it, and submit it for processing.
async function uploadLabPDF(filePath) {
  const pdfBase64 = fs.readFileSync(filePath).toString('base64');

  const response = await fetch(
    'https://api.voloridgehealth.com/extraction/lab/upload',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': 'YOUR_API_KEY'
      },
      body: JSON.stringify({
        pdfFile: pdfBase64,
        filename: 'lab_results_2024.pdf',
        dateOfBirth: '1985-03-15',
        webhookUrl: 'https://yourapp.com/api/webhooks/lab-results'
      })
    }
  );

  // Surface 4xx/5xx responses (e.g. 413, 429) instead of reading jobId from an error body.
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }

  const { jobId, status } = await response.json();
  console.log(`Job created: ${jobId} (status: ${status})`);
  return jobId;
}

import base64

import requests

def upload_lab_pdf(file_path):
    # Read the PDF and base64-encode it for the JSON request body.
    with open(file_path, 'rb') as f:
        pdf_base64 = base64.b64encode(f.read()).decode('utf-8')

    response = requests.post(
        'https://api.voloridgehealth.com/extraction/lab/upload',
        headers={
            'Content-Type': 'application/json',
            'x-api-key': 'YOUR_API_KEY'
        },
        json={
            'pdfFile': pdf_base64,
            'filename': 'lab_results_2024.pdf',
            'dateOfBirth': '1985-03-15',
            'webhookUrl': 'https://yourapp.com/api/webhooks/lab-results'
        },
        timeout=30
    )
    response.raise_for_status()  # fail fast on 4xx/5xx (e.g. 413, 429)
    data = response.json()
    print(f"Job created: {data['jobId']} (status: {data['status']})")
    return data['jobId']

Sample response (202):

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
  "status": "queued",
  "message": "PDF uploaded successfully. Processing started."
}

Large File Upload (>10 MB)

Omit pdfFile to receive a presigned S3 URL, then PUT your file directly to S3.

const fs = require('fs');

async function uploadLargePDF(filePath) {
  // Step 1: Request a presigned URL (omit pdfFile)
  const initRes = await fetch(
    'https://api.voloridgehealth.com/extraction/lab/upload',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': 'YOUR_API_KEY' },
      body: JSON.stringify({
        filename: 'large_lab_report.pdf',
        dateOfBirth: '1985-03-15',
        webhookUrl: 'https://yourapp.com/api/webhooks/lab-results'
      })
    }
  );
  if (!initRes.ok) {
    throw new Error(`Upload request failed with HTTP ${initRes.status}`);
  }
  const { jobId, uploadUrl } = await initRes.json();

  // Step 2: PUT the PDF directly to S3 (must happen within 5 minutes)
  const putRes = await fetch(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/pdf' },
    body: fs.readFileSync(filePath)
  });
  if (!putRes.ok) {
    throw new Error(`S3 upload failed with HTTP ${putRes.status}`);
  }

  console.log('Upload complete. Processing will start automatically.');
  return jobId;
}

import requests

def upload_large_pdf(file_path):
    # Step 1: Request a presigned URL (no pdfFile)
    init = requests.post(
        'https://api.voloridgehealth.com/extraction/lab/upload',
        headers={'Content-Type': 'application/json', 'x-api-key': 'YOUR_API_KEY'},
        json={
            'filename': 'large_lab_report.pdf',
            'dateOfBirth': '1985-03-15',
            'webhookUrl': 'https://yourapp.com/api/webhooks/lab-results'
        },
        timeout=30
    )
    init.raise_for_status()  # stop here on 4xx/5xx
    data = init.json()

    # Step 2: PUT the PDF directly to S3 (must happen within 5 minutes)
    with open(file_path, 'rb') as f:
        put = requests.put(data['uploadUrl'], data=f, headers={'Content-Type': 'application/pdf'})
    put.raise_for_status()  # a failed PUT means processing will not start

    print('Upload complete. Processing starts automatically.')
    return data['jobId']

Polling for Results

Poll GET /lab/status/{jobId} until the job reaches a terminal status.

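async function waitForResults(jobId, intervalMs = 15000, maxWaitMs = 600000) {
  const url = `https://api.voloridgehealth.com/extraction/lab/status/${jobId}`;
  const headers = { 'x-api-key': 'YOUR_API_KEY' };
  // Give up after maxWaitMs (10 minutes by default) rather than polling forever.
  const deadline = Date.now() + maxWaitMs;

  while (Date.now() < deadline) {
    const res = await fetch(url, { headers });
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`); // e.g. 404 for an unknown job ID
    }
    const data = await res.json();
    console.log(`Status: ${data.status}`);

    if (data.status === 'completed') {
      return data; // full results included in status response
    }
    if (['failed', 'quarantined', 'rejected_phi', 'invalid_document'].includes(data.status)) {
      throw new Error(`Job ended with status: ${data.status}`);
    }

    await new Promise(r => setTimeout(r, intervalMs));
  }

  throw new Error(`Job ${jobId} did not finish within ${maxWaitMs} ms`);
}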

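import time

import requests

TERMINAL = {'completed', 'failed', 'quarantined', 'rejected_phi', 'invalid_document'}

def wait_for_results(job_id, interval=15, max_wait=600):
    url = f'https://api.voloridgehealth.com/extraction/lab/status/{job_id}'
    headers = {'x-api-key': 'YOUR_API_KEY'}
    # Give up after max_wait seconds rather than polling forever.
    deadline = time.monotonic() + max_wait

    while time.monotonic() < deadline:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # e.g. 404 for an unknown job ID
        data = response.json()
        print(f"Status: {data['status']}")

        if data['status'] == 'completed':
            return data  # full results included
        if data['status'] in TERMINAL:
            raise RuntimeError(f"Job ended: {data['status']}")

        time.sleep(interval)

    raise TimeoutError(f"Job {job_id} did not finish within {max_wait} seconds")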

Webhook Handler

Always verify the X-API-Key header before processing incoming webhook events.

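const crypto = require('crypto');
const express = require('express');
const app = express();
app.use(express.json());

// Constant-time comparison; also rejects a missing header or an unset key.
function keyMatches(received, expected) {
  const a = Buffer.from(received || '');
  const b = Buffer.from(expected || '');
  return b.length > 0 && a.length === b.length && crypto.timingSafeEqual(a, b);
}

app.post('/api/webhooks/lab-results', (req, res) => {
  // 1. Authenticate
  if (!keyMatches(req.headers['x-api-key'], process.env.VOLORIDGE_WEBHOOK_KEY)) {
    return res.status(401).send('Unauthorized');
  }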

  const { event, jobId, data } = req.body;

  if (event === 'job.completed') {
    const { labName, dateCollected, panels, verification } = data;

    panels.forEach(panel => {
      console.log(`Panel: ${panel.panelName}`);
      panel.panelResults.forEach(r => {
        console.log(`  ${r.biomarker}: ${r.resultValue} ${r.resultUnit}`
          + ` (LOINC: ${r.loincCode}, UCUM: ${r.ucumCode})`);
      });
    });

    if (verification?.dobMatches === false) {
      console.warn('DOB mismatch — flag for review');
    }

  } else if (event === 'job.failed') {
    console.error(`Job failed: ${data.error.message}`);

  } else if (event === 'job.invalid_document') {
    console.warn(`Invalid document: ${data.error.rejection_reason}`);
  }

  res.status(200).send('OK');
});

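import hmac
import os

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api/webhooks/lab-results', methods=['POST'])
def lab_webhook():
    # 1. Authenticate (constant-time comparison avoids leaking the key via timing)
    received = request.headers.get('X-API-Key', '')
    expected = os.environ['VOLORIDGE_WEBHOOK_KEY']
    if not hmac.compare_digest(received.encode(), expected.encode()):
        return jsonify({'error': 'Unauthorized'}), 401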

    body = request.json
    event, data = body['event'], body['data']

    if event == 'job.completed':
        for panel in data['panels']:
            print(f"Panel: {panel['panelName']}")
            for r in panel['panelResults']:
                print(f"  {r['biomarker']}: {r['resultValue']} "
                      f"{r['resultUnit']} (LOINC: {r['loincCode']})")

    elif event == 'job.invalid_document':
        print(f"Rejected: {data['error']['rejection_reason']}")

    return jsonify({'received': True}), 200

Sample completed webhook payload:

{
  "event": "job.completed",
  "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
  "timestamp": "2025-11-15T10:35:18.415821+00:00",
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
    "status": "completed",
    "labName": "Quest Diagnostics",
    "dateCollected": "2024-04-11",
    "verification": { "dobFound": true, "dobMatches": true },
    "panels": [
      {
        "panelName": "Complete Blood Count",
        "panelCode": "CBC",
        "dateCollected": "2024-04-11",
        "pageNumber": 1,
        "panelResults": [
          {
            "biomarker": "White Blood Cell Count",
            "resultValue": "4.9",
            "resultType": "numeric",
            "resultUnit": "Thousand/uL",
            "loincCode": "6690-2",
            "loincMatchType": "exact",
            "ucumCode": "10*3/uL",
            "ucumMatchMethod": "static_map",
            "minRefRangeValue": 3.8,
            "maxRefRangeValue": 10.8,
            "referenceRangeText": "3.8-10.8",
            "pageNumber": 1
          }
        ]
      }
    ]
  }
}

Job Statuses

Status	Description
`pending_upload`	Presigned URL issued; awaiting file upload to S3.
`queued`	File received; waiting for a processing slot.
`processing_language`	Detecting document language.
`processing_phi`	Detecting and redacting PHI.
`processing_extraction`	Extracting lab results with AI vision.
`completed`	Results ready. Full data included in the status response.
`failed`	Unrecoverable error during processing.
`quarantined`	Non-English document detected by language classifier.
`rejected_phi`	PHI detected and `rejectIfPHI=true`; all files deleted.
`invalid_document`	Document cannot be processed — see `rejection_reason`.

Invalid Document Rejection Reasons

`rejection_reason`	Description
`non_supported_lab`	Lab is not Quest Diagnostics or LabCorp. `detected_lab` will be `null` if no lab name was found.
`non_english_document`	Document is not predominantly in English.
`multiple_collection_dates`	Multiple specimen dates detected — likely a concatenated report.
`non_us_document`	Non-US date formatting (e.g., DD/MM/YYYY) that cannot be disambiguated.
`non_lab_document`	Document is not a clinical lab report (e.g., correspondence, insurance form).
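
When surfacing a rejection to end users, the reason codes above can be translated into readable text. The sketch below is illustrative only: the message wording is invented for the example, describe_rejection is a hypothetical helper, and it assumes detected_lab sits next to rejection_reason in the same error object, which this page does not confirm.

```python
# Illustrative wording; adjust to your product's voice.
REJECTION_MESSAGES = {
    "non_supported_lab": "Only Quest Diagnostics and LabCorp reports are supported.",
    "non_english_document": "The document must be in English.",
    "multiple_collection_dates": "The document appears to combine several lab visits.",
    "non_us_document": "The document does not look like a US lab report.",
    "non_lab_document": "The document is not a clinical lab report.",
}


def describe_rejection(error):
    """Build a readable message from an invalid_document error object."""
    reason = error.get("rejection_reason")
    # Unknown codes fall through to a generic message that still names the code.
    message = REJECTION_MESSAGES.get(reason, f"Document rejected ({reason}).")
    # Assumption: `detected_lab` is a sibling of `rejection_reason`.
    if reason == "non_supported_lab" and error.get("detected_lab"):
        message += f" Detected lab: {error['detected_lab']}."
    return message
```

Keeping a fallback for unknown codes means a newly introduced rejection reason degrades to a generic message instead of raising an error.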

Result Schema

ExtractionResults

Field	Type	Description
`jobId`	string	Job identifier.
`status`	string	Always `"completed"`.
`labName`	string | null	Laboratory name extracted from the document.
`dateCollected`	date | null	Specimen collection date (YYYY-MM-DD).
`verification`	object	`dobFound` (bool) and `dobMatches` (bool | null). No PII is returned.
`panels`	LabPanel[]	Array of test panels, each containing `panelResults`.
`mergeActions`	array	Actions taken to merge multi-page test results.
`validationFlags`	array	Validation issues detected during extraction.
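
Since dobMatches can be true, false, or null, consumers need to treat it as a three-way value rather than a plain boolean. The sketch below collapses the verification object into one of three outcomes. The helper name dob_check and the outcome labels are hypothetical, and it assumes null means no comparison was possible (no dateOfBirth supplied on upload, or no DOB found in the document); this page does not spell that out.

```python
def dob_check(verification):
    """Collapse the `verification` object into 'match', 'mismatch', or 'not_verified'."""
    matches = (verification or {}).get("dobMatches")
    if matches is True:
        return "match"
    if matches is False:
        return "mismatch"
    # null or missing: assumed to mean there was nothing to compare
    return "not_verified"
```

Comparing with `is True` and `is False` avoids treating null as a mismatch, which a simple truthiness check would do.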

PanelResult

Field	Type	Description
`biomarker`	string	Name of the test (e.g., "White Blood Cell Count").
`resultValue`	string | null	Measured value.
`resultType`	enum	`numeric` | `string` | `range`
`resultUnit`	string | null	Original unit from the report.
`loincCode`	string | null	Standardized LOINC identifier.
`loincMatchType`	enum | null	`exact` | `biomarker_only` | `llm_fuzzy` | `not_found`
`ucumCode`	string | null	Standardized UCUM unit code.
`ucumMatchMethod`	enum | null	`static_map` | `pattern_rule` | `prefix_map` | `llm` | `not_found`
`minRefRangeValue`	float | null	Lower reference range bound.
`maxRefRangeValue`	float | null	Upper reference range bound.
`referenceRangeText`	string | null	Reference range as printed on the report.
`pageNumber`	integer | null	Page in the PDF where this result appears (1-based).
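
Note that resultValue is a string while the reference bounds are floats, so a comparison needs an explicit conversion. The sketch below flags a single PanelResult as low, high, or normal. The helper name range_flag is hypothetical, and it assumes the bounds are inclusive and that a missing bound means that side is unbounded.

```python
def range_flag(result):
    """Return 'low', 'high', 'normal', or None when no comparison is possible."""
    if result.get("resultType") != "numeric":
        return None
    try:
        value = float(result["resultValue"])  # resultValue arrives as a string
    except (KeyError, TypeError, ValueError):
        return None  # missing, null, or not parseable (e.g. "<0.5")
    low = result.get("minRefRangeValue")
    high = result.get("maxRefRangeValue")
    if low is None and high is None:
        return None  # no reference range to compare against
    if low is not None and value < low:
        return "low"
    if high is not None and value > high:
        return "high"
    return "normal"  # assumption: bounds are inclusive
```

Applied to the white blood cell result in the sample payload (4.9 against a range of 3.8 to 10.8), this returns "normal".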

Errors

All error responses share a common structure:

{
  "error": "invalid_pdf",
  "message": "PDF file is corrupted or unreadable",
  "timestamp": "2025-11-15T10:30:22Z",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

HTTP Status	`error` code	Description
400	`invalid_request`	Missing or malformed request parameters.
404	`not_found`	Job ID does not exist.
413	`file_too_large`	Base64 PDF exceeds 10 MB — use the presigned URL flow.
425	`too_early`	Results requested before processing is complete.
429	`rate_limit_exceeded`	Over 100 uploads/hour. Check `details.retryAfterSeconds`.
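
A client that hits the upload rate limit can wait for the interval reported in details.retryAfterSeconds before trying again. The sketch below wraps any request in such a retry loop. The name call_with_rate_limit_retry, the send callable (expected to return a status code and parsed body), and the 60-second fallback are assumptions made for the example.

```python
import time


def call_with_rate_limit_retry(send, max_attempts=3, default_wait=60, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` must return a (status_code, body_dict) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        # Prefer the server's hint; fall back to a fixed wait if it is absent.
        details = body.get("details") or {}
        sleep(details.get("retryAfterSeconds", default_wait))
```

Passing sleep as a parameter keeps the helper testable without real delays; in production the default time.sleep is used.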

Include the requestId from error responses when contacting [email protected].