Lab PDF Extraction

The Lab PDF Extraction Service processes clinical laboratory reports (PDFs) and returns structured, standardized results with PHI redaction, LOINC code mapping, and UCUM unit standardization — ready for downstream health scoring or storage.

Base URL: https://api.voloridgehealth.com/extraction/

Currently supports lab reports from Quest Diagnostics and LabCorp. Documents must be in English and represent a single US lab visit.


Authentication

All requests must include your API key in the x-api-key header. Contact [email protected] to obtain a key.

x-api-key: your-api-key-here

Webhook deliveries also include an X-API-Key header so you can verify the source before processing incoming events.
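
A minimal sketch of that source check, assuming the expected key is kept in an environment variable named VOLORIDGE_WEBHOOK_KEY (the name used in the handler examples further down) and that headers arrive as a plain mapping. The helper name is hypothetical. hmac.compare_digest is a general hardening choice so that comparison time does not reveal how much of the key matched; the API itself does not require it.

```python
import hmac
import os


def is_trusted_webhook(headers, expected_key=None):
    """Return True if the X-API-Key header matches the expected key."""
    if expected_key is None:
        # Assumption: the key is stored in this environment variable.
        expected_key = os.environ.get("VOLORIDGE_WEBHOOK_KEY", "")
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-api-key", "")
    if not expected_key or not received:
        return False
    # Constant-time comparison of the two keys.
    return hmac.compare_digest(received.encode(), expected_key.encode())
```

Rejecting the request when either value is empty avoids accepting events if the environment variable was never set.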


Processing Pipeline

Every uploaded PDF passes through an automated 2–5 minute pipeline:

  1. Upload — Submit via base64 or presigned S3 URL
  2. Language Detection — English validation
  3. PHI Detection & Redaction — Protected health information is identified and removed
  4. Lab Result Extraction — AI vision parses the document
  5. LOINC Mapping — Each biomarker is mapped to a standardized LOINC code
  6. UCUM Standardization — Units are normalized to UCUM codes

Async Processing & Webhooks

The upload endpoint returns immediately with a jobId. Because processing takes 2–5 minutes, you have two options to retrieve results:

Option A — Webhook (recommended)

Provide a webhookUrl on upload. The API will POST the full results payload to your endpoint when processing completes, fails, or is quarantined. Deliveries include an X-API-Key header for authentication and are retried up to 5 times with exponential backoff.
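
Because deliveries are retried, the same event may reach your endpoint more than once. The sketch below shows one way to keep handling idempotent, assuming the event and jobId fields of the payload together identify a delivery. The function name is hypothetical, and the in-memory set stands in for whatever durable store your application uses.

```python
_seen_deliveries = set()


def should_process(payload, seen=_seen_deliveries):
    """Return True the first time an (event, jobId) pair is seen."""
    key = (payload.get("event"), payload.get("jobId"))
    if key in seen:
        return False  # duplicate delivery: acknowledge it but skip the work
    seen.add(key)
    return True
```

A handler would call should_process(body) after authenticating the request and return 200 in both cases, doing the actual work only when it returns True.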

Option B — Polling

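Periodically call GET /lab/status/{jobId} until the job reaches a terminal status (completed, failed, quarantined, rejected_phi, or invalid_document). When status is completed, the full results are included in that response; you can also fetch them separately from GET /lab/results/{jobId}.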

Webhook Event Types

Event                 Description
job.completed         Processing succeeded. data contains full extraction results.
job.failed            Unrecoverable processing error.
job.quarantined       Non-English document detected early in pipeline.
job.rejected_phi      PHI found and rejectIfPHI=true was set on upload.
job.invalid_document  Document is not a supported single-visit US lab report.

Endpoints

Lab Upload

POST /lab/upload

Upload a lab result PDF for processing. Returns immediately with a jobId. Supports two modes depending on file size:

  • Base64 upload (≤10 MB): Include pdfFile in the request body. Returns 202 Accepted.
  • Presigned URL upload (>10 MB, up to 50 MB): Omit pdfFile. Returns 200 OK with an uploadUrl. PUT your PDF directly to that URL within 5 minutes — processing starts automatically.
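
The choice between the two modes can be sketched as a small size check. This assumes the limits apply to the size of the PDF file itself and that 1 MB means 1024 × 1024 bytes; the documentation does not state either, so treat both as assumptions. The function name is hypothetical.

```python
MB = 1024 * 1024  # assumption: the documented limits are binary megabytes
BASE64_LIMIT = 10 * MB
PRESIGNED_LIMIT = 50 * MB


def choose_upload_mode(size_bytes):
    """Pick 'base64' or 'presigned' from the PDF size in bytes."""
    if size_bytes <= 0:
        raise ValueError("PDF is empty")
    if size_bytes <= BASE64_LIMIT:
        return "base64"      # send pdfFile in the JSON body
    if size_bytes <= PRESIGNED_LIMIT:
        return "presigned"   # omit pdfFile, then PUT to the returned uploadUrl
    raise ValueError("PDF exceeds the 50 MB presigned upload limit")
```

For a file on disk, call it as choose_upload_mode(os.path.getsize(path)).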

Request Body

Field        Type                 Required  Description
pdfFile      string (base64)      No        Base64-encoded PDF. Max 10 MB. Omit to receive a presigned S3 URL instead.
filename     string               No        Original filename for display purposes.
dateOfBirth  string (YYYY-MM-DD)  No        Patient DOB for verification. A boolean match flag is returned; no PII is stored or returned.
webhookUrl   string (URI)         No        Public HTTPS endpoint to receive push notifications when processing completes.
rejectIfPHI  boolean              No        Default false. If true, immediately reject and delete the document if PHI is detected instead of redacting it.
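
As an illustration of how these fields fit together, the sketch below assembles a request body and leaves out anything that was not supplied. The function name and the client-side checks (date format, HTTPS scheme) are assumptions made for the example; the service may validate differently.

```python
import base64
import re
from datetime import date
from urllib.parse import urlparse


def build_upload_body(pdf_bytes=None, filename=None, date_of_birth=None,
                      webhook_url=None, reject_if_phi=False):
    """Assemble the JSON body for POST /lab/upload, omitting unset fields."""
    body = {}
    if pdf_bytes is not None:
        # Leaving pdfFile out requests a presigned URL instead.
        body["pdfFile"] = base64.b64encode(pdf_bytes).decode("ascii")
    if filename:
        body["filename"] = filename
    if date_of_birth:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_of_birth):
            raise ValueError("dateOfBirth must be formatted as YYYY-MM-DD")
        date.fromisoformat(date_of_birth)  # rejects impossible dates
        body["dateOfBirth"] = date_of_birth
    if webhook_url:
        if urlparse(webhook_url).scheme != "https":
            raise ValueError("webhookUrl must be an HTTPS URL")
        body["webhookUrl"] = webhook_url
    if reject_if_phi:
        body["rejectIfPHI"] = True  # the default is false, so only send true
    return body
```

The returned dictionary can be passed directly as the json argument of an HTTP client call.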

Responses

Status                 Description
202 Accepted           PDF queued. Returns { jobId, status: "queued", message }.
200 OK                 Presigned URL generated (when pdfFile is omitted).
400 Bad Request        Invalid request parameters.
413 Content Too Large  File exceeds 10 MB base64 limit; use the presigned URL flow.
429 Too Many Requests  Rate limit exceeded (100 uploads/hour).

Get Lab Status

GET /lab/status/{jobId}

Check the processing status of a job. When status is completed, the response includes the full extraction results — no separate results call is needed.

Path Parameters

Parameter  Required  Description
jobId      Yes       Job ID returned from the upload endpoint.

Get Lab Results

GET /lab/results/{jobId}

Retrieve full extraction results for a completed job. Useful if webhook delivery failed or if you prefer polling. Returns 425 Too Early if processing is not yet complete.

Path Parameters

Parameter  Required  Description
jobId      Yes       Job ID returned from the upload endpoint.


Code Examples

Base64 Upload

JavaScript:

// Requires Node.js 18+ for the built-in fetch API.
const fs = require('fs');

async function uploadLabPDF(filePath) {
  // Base64-encode the PDF for the JSON request body (base64 mode, max 10 MB).
  const pdfBase64 = fs.readFileSync(filePath).toString('base64');

  const response = await fetch(
    'https://api.voloridgehealth.com/extraction/lab/upload',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': 'YOUR_API_KEY'
      },
      body: JSON.stringify({
        pdfFile: pdfBase64,
        filename: 'lab_results_2024.pdf',
        dateOfBirth: '1985-03-15',
        webhookUrl: 'https://yourapp.com/api/webhooks/lab-results'
      })
    }
  );

  // Fail loudly on 4xx/5xx (for example 413 or 429) instead of reading an undefined jobId.
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }

  const { jobId, status } = await response.json();
  console.log(`Job created: ${jobId}, status: ${status}`);
  return jobId;
}

Python:

import base64
import requests

def upload_lab_pdf(file_path):
    # Base64-encode the PDF for the JSON request body (base64 mode, max 10 MB).
    with open(file_path, 'rb') as f:
        pdf_base64 = base64.b64encode(f.read()).decode('utf-8')

    response = requests.post(
        'https://api.voloridgehealth.com/extraction/lab/upload',
        headers={
            'Content-Type': 'application/json',
            'x-api-key': 'YOUR_API_KEY'
        },
        json={
            'pdfFile': pdf_base64,
            'filename': 'lab_results_2024.pdf',
            'dateOfBirth': '1985-03-15',
            'webhookUrl': 'https://yourapp.com/api/webhooks/lab-results'
        }
    )
    response.raise_for_status()  # raise on 4xx/5xx such as 413 or 429
    data = response.json()
    print(f"Job created: {data['jobId']}, status: {data['status']}")
    return data['jobId']

Sample response (202):

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
  "status": "queued",
  "message": "PDF uploaded successfully. Processing started."
}

Large File Upload (>10 MB)

Omit pdfFile to receive a presigned S3 URL, then PUT your file directly to S3.

JavaScript:

// Requires Node.js 18+ for the built-in fetch API.
const fs = require('fs');

async function uploadLargePDF(filePath) {
  // Step 1: Request a presigned URL (omit pdfFile)
  const initRes = await fetch(
    'https://api.voloridgehealth.com/extraction/lab/upload',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': 'YOUR_API_KEY' },
      body: JSON.stringify({
        filename: 'large_lab_report.pdf',
        dateOfBirth: '1985-03-15',
        webhookUrl: 'https://yourapp.com/api/webhooks/lab-results'
      })
    }
  );
  if (!initRes.ok) {
    throw new Error(`Presigned URL request failed with HTTP ${initRes.status}`);
  }
  const { jobId, uploadUrl } = await initRes.json();

  // Step 2: PUT the PDF directly to S3 (the URL is valid for 5 minutes)
  const putRes = await fetch(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/pdf' },
    body: fs.readFileSync(filePath)
  });
  if (!putRes.ok) {
    throw new Error(`S3 upload failed with HTTP ${putRes.status}`);
  }

  console.log('Upload complete. Processing will start automatically.');
  return jobId;
}

Python:

import requests

def upload_large_pdf(file_path):
    # Step 1: Request a presigned URL (no pdfFile)
    response = requests.post(
        'https://api.voloridgehealth.com/extraction/lab/upload',
        headers={'Content-Type': 'application/json', 'x-api-key': 'YOUR_API_KEY'},
        json={
            'filename': 'large_lab_report.pdf',
            'dateOfBirth': '1985-03-15',
            'webhookUrl': 'https://yourapp.com/api/webhooks/lab-results'
        }
    )
    response.raise_for_status()
    data = response.json()

    # Step 2: PUT the PDF directly to S3 (the URL is valid for 5 minutes)
    with open(file_path, 'rb') as f:
        put_response = requests.put(
            data['uploadUrl'], data=f, headers={'Content-Type': 'application/pdf'}
        )
    put_response.raise_for_status()

    print('Upload complete. Processing starts automatically.')
    return data['jobId']

Polling for Results

Poll GET /lab/status/{jobId} until the job reaches a terminal status.

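JavaScript:

// timeoutMs is an example cap (10 minutes) so the loop cannot run forever.
async function waitForResults(jobId, intervalMs = 15000, timeoutMs = 600000) {
  const url = `https://api.voloridgehealth.com/extraction/lab/status/${jobId}`;
  const headers = { 'x-api-key': 'YOUR_API_KEY' };
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const res = await fetch(url, { headers });
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`);
    }
    const data = await res.json();
    console.log(`Status: ${data.status}`);

    if (data.status === 'completed') {
      return data; // full results included in status response
    }
    if (['failed', 'quarantined', 'rejected_phi', 'invalid_document'].includes(data.status)) {
      throw new Error(`Job ended with status: ${data.status}`);
    }

    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error(`Timed out waiting for job ${jobId}`);
}

Python:

import time
import requests

TERMINAL = {'completed', 'failed', 'quarantined', 'rejected_phi', 'invalid_document'}

# timeout is an example cap (10 minutes) so the loop cannot run forever.
def wait_for_results(job_id, interval=15, timeout=600):
    url = f'https://api.voloridgehealth.com/extraction/lab/status/{job_id}'
    headers = {'x-api-key': 'YOUR_API_KEY'}
    deadline = time.monotonic() + timeout

    while time.monotonic() < deadline:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        print(f"Status: {data['status']}")

        if data['status'] == 'completed':
            return data  # full results included
        if data['status'] in TERMINAL:
            raise RuntimeError(f"Job ended: {data['status']}")

        time.sleep(interval)

    raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")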

Webhook Handler

Always verify the X-API-Key header before processing incoming webhook events.

JavaScript:

const express = require('express');
const app = express();
app.use(express.json());

app.post('/api/webhooks/lab-results', (req, res) => {
  // 1. Authenticate (Node lowercases incoming header names)
  if (req.headers['x-api-key'] !== process.env.VOLORIDGE_WEBHOOK_KEY) {
    return res.status(401).send('Unauthorized');
  }

  // 2. Handle the event
  const { event, jobId, data } = req.body;

  if (event === 'job.completed') {
    const { panels, verification } = data;

    panels.forEach(panel => {
      console.log(`Panel: ${panel.panelName}`);
      panel.panelResults.forEach(r => {
        console.log(`  ${r.biomarker}: ${r.resultValue} ${r.resultUnit}`
          + ` (LOINC: ${r.loincCode}, UCUM: ${r.ucumCode})`);
      });
    });

    if (verification?.dobMatches === false) {
      console.warn('DOB mismatch: flag for review');
    }

  } else if (event === 'job.failed') {
    console.error(`Job ${jobId} failed: ${data.error.message}`);

  } else if (event === 'job.invalid_document') {
    console.warn(`Invalid document: ${data.error.rejection_reason}`);
  }

  // 3. Acknowledge receipt
  res.status(200).send('OK');
});

Python:

import os
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api/webhooks/lab-results', methods=['POST'])
def lab_webhook():
    # 1. Authenticate (Flask header lookups are case-insensitive)
    if request.headers.get('X-API-Key') != os.environ['VOLORIDGE_WEBHOOK_KEY']:
        return jsonify({'error': 'Unauthorized'}), 401

    # 2. Handle the event
    body = request.get_json()
    event, data = body['event'], body['data']

    if event == 'job.completed':
        for panel in data['panels']:
            print(f"Panel: {panel['panelName']}")
            for r in panel['panelResults']:
                print(f"  {r['biomarker']}: {r['resultValue']} "
                      f"{r['resultUnit']} (LOINC: {r['loincCode']})")

    elif event == 'job.invalid_document':
        print(f"Rejected: {data['error']['rejection_reason']}")

    # 3. Acknowledge receipt
    return jsonify({'received': True}), 200

Sample completed webhook payload:

{
  "event": "job.completed",
  "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
  "timestamp": "2025-11-15T10:35:18.415821+00:00",
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
    "status": "completed",
    "labName": "Quest Diagnostics",
    "dateCollected": "2024-04-11",
    "verification": { "dobFound": true, "dobMatches": true },
    "panels": [
      {
        "panelName": "Complete Blood Count",
        "panelCode": "CBC",
        "dateCollected": "2024-04-11",
        "pageNumber": 1,
        "panelResults": [
          {
            "biomarker": "White Blood Cell Count",
            "resultValue": "4.9",
            "resultType": "numeric",
            "resultUnit": "Thousand/uL",
            "loincCode": "6690-2",
            "loincMatchType": "exact",
            "ucumCode": "10*3/uL",
            "ucumMatchMethod": "static_map",
            "minRefRangeValue": 3.8,
            "maxRefRangeValue": 10.8,
            "referenceRangeText": "3.8-10.8",
            "pageNumber": 1
          }
        ]
      }
    ]
  }
}
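
Since resultValue arrives as a string while the reference bounds are numbers, a consumer usually needs a small conversion step before comparing them. The sketch below walks the data object of a completed payload and yields numeric results that fall outside their reference range. The function name is hypothetical, and treating both bounds as inclusive is an assumption; it illustrates the payload shape and is not a clinical rule.

```python
def out_of_range_results(data):
    """Yield (panelName, biomarker, value) for numeric results outside range."""
    for panel in data.get("panels", []):
        for result in panel.get("panelResults", []):
            if result.get("resultType") != "numeric":
                continue
            try:
                value = float(result["resultValue"])
            except (KeyError, TypeError, ValueError):
                continue  # missing or non-parseable value
            low = result.get("minRefRangeValue")
            high = result.get("maxRefRangeValue")
            # Assumption: a value equal to either bound counts as in range.
            below = low is not None and value < low
            above = high is not None and value > high
            if below or above:
                yield panel.get("panelName"), result.get("biomarker"), value
```

Results with a null bound are only checked against the bound that is present.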

Job Statuses

Status                 Description
pending_upload         Presigned URL issued; awaiting file upload to S3.
queued                 File received; waiting for a processing slot.
processing_language    Detecting document language.
processing_phi         Detecting and redacting PHI.
processing_extraction  Extracting lab results with AI vision.
completed              Results ready. Full data included in the status response.
failed                 Unrecoverable error during processing.
quarantined            Non-English document detected by language classifier.
rejected_phi           PHI detected and rejectIfPHI=true; all files deleted.
invalid_document       Document cannot be processed; see rejection_reason.
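
Client code often only needs to know whether a job is still moving, finished with results, or finished without them. The sketch below groups the statuses that way. The grouping is inferred from the descriptions above and from the polling examples, and the function name and return labels are hypothetical.

```python
IN_PROGRESS = {"pending_upload", "queued", "processing_language",
               "processing_phi", "processing_extraction"}
TERMINAL = {"completed", "failed", "quarantined", "rejected_phi",
            "invalid_document"}


def classify_status(status):
    """Return 'in_progress', 'succeeded', or 'ended_without_results'."""
    if status in IN_PROGRESS:
        return "in_progress"
    if status == "completed":
        return "succeeded"
    if status in TERMINAL:
        return "ended_without_results"
    # Surface unrecognized values instead of polling on them forever.
    raise ValueError(f"Unknown job status: {status!r}")
```

Raising on an unrecognized status is a deliberate choice for the sketch; a production client might prefer to log it and keep polling until a timeout.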

Invalid Document Rejection Reasons

rejection_reason           Description
non_supported_lab          Lab is not Quest Diagnostics or LabCorp. detected_lab will be null if no lab name was found.
non_english_document       Document is not predominantly in English.
multiple_collection_dates  Multiple specimen dates detected, likely a concatenated report.
non_us_document            Non-US date formatting (e.g., DD/MM/YYYY) that cannot be disambiguated.
non_lab_document           Document is not a clinical lab report (e.g., correspondence, insurance form).

Result Schema

ExtractionResults

Field            Type           Description
jobId            string         Job identifier.
status           string         Always "completed".
labName          string | null  Laboratory name extracted from the document.
dateCollected    date | null    Specimen collection date (YYYY-MM-DD).
verification     object         dobFound (bool) and dobMatches (bool | null). No PII is returned.
panels           LabPanel[]     Array of test panels, each containing panelResults.
mergeActions     array          Actions taken to merge multi-page test results.
validationFlags  array          Validation issues detected during extraction.

PanelResult

Field               Type            Description
biomarker           string          Name of the test (e.g., "White Blood Cell Count").
resultValue         string | null   Measured value.
resultType          enum            numeric | string | range
resultUnit          string | null   Original unit from the report.
loincCode           string | null   Standardized LOINC identifier.
loincMatchType      enum | null     exact | biomarker_only | llm_fuzzy | not_found
ucumCode            string | null   Standardized UCUM unit code.
ucumMatchMethod     enum | null     static_map | pattern_rule | prefix_map | llm | not_found
minRefRangeValue    float | null    Lower reference range bound.
maxRefRangeValue    float | null    Upper reference range bound.
referenceRangeText  string | null   Reference range as printed on the report.
pageNumber          integer | null  Page in the PDF where this result appears (1-based).
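
For typed access on the client side, the table above can be mirrored in a small data class. This is one possible modelling, not an official SDK type: the class and its from_dict helper are hypothetical, field names copy the JSON keys as-is, and unknown keys are ignored so that fields added later do not break parsing.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PanelResult:
    """One entry of panelResults; field names mirror the JSON keys."""
    biomarker: str
    resultType: str  # numeric | string | range
    resultValue: Optional[str] = None
    resultUnit: Optional[str] = None
    loincCode: Optional[str] = None
    loincMatchType: Optional[str] = None
    ucumCode: Optional[str] = None
    ucumMatchMethod: Optional[str] = None
    minRefRangeValue: Optional[float] = None
    maxRefRangeValue: Optional[float] = None
    referenceRangeText: Optional[str] = None
    pageNumber: Optional[int] = None

    @classmethod
    def from_dict(cls, raw):
        """Build a PanelResult, ignoring keys this sketch does not model."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})
```

Each element of a panel's panelResults array can then be converted with PanelResult.from_dict(item).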

Errors

All error responses share a common structure:

{
  "error": "invalid_pdf",
  "message": "PDF file is corrupted or unreadable",
  "timestamp": "2025-11-15T10:30:22Z",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

HTTP Status  Error code           Description
400          invalid_request      Missing or malformed request parameters.
404          not_found            Job ID does not exist.
413          file_too_large       Base64 PDF exceeds 10 MB; use the presigned URL flow.
425          too_early            Results requested before processing is complete.
429          rate_limit_exceeded  Over 100 uploads/hour. Check details.retryAfterSeconds.
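
A sketch of how a client might act on a rate-limit response. It assumes the error body carries a details object with a numeric retryAfterSeconds, as the 429 row suggests; the sample error above does not show that object, so its exact shape is an assumption. The function name and the 60-second fallback are hypothetical.

```python
def retry_delay_seconds(status_code, body, default=60):
    """Return seconds to wait before retrying, or None if not rate limited."""
    if status_code != 429:
        return None
    details = body.get("details") or {}
    delay = details.get("retryAfterSeconds")
    if isinstance(delay, (int, float)) and delay >= 0:
        return delay
    # Assumption: fall back to a fixed wait when the field is absent or invalid.
    return default
```

A caller would sleep for the returned number of seconds before retrying the upload, and treat None as a sign that the error is not related to rate limiting.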

Include the requestId from error responses when contacting [email protected].