Lab PDF Extraction
Lab PDF processing API - upload PDFs, extract structured lab results with PHI redaction, LOINC mapping, and UCUM unit standardization.
The Lab PDF Extraction Service processes clinical laboratory reports (PDFs) and returns structured, standardized results with PHI redaction, LOINC code mapping, and UCUM unit standardization — ready for downstream health scoring or storage.
Base URL: https://api.voloridgehealth.com/extraction/
Currently supports lab reports from Quest Diagnostics and LabCorp. Documents must be in English and represent a single US lab visit.
Authentication
All requests must include your API key in the x-api-key header. Contact [email protected] to obtain a key.
x-api-key: your-api-key-here
Webhook deliveries also include an X-API-Key header so you can verify the source before processing incoming events.
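A constant-time comparison is a common precaution when checking a shared key like this. The sketch below assumes the webhook key is available to your application as a plain string; the function name is_trusted_webhook is illustrative and not part of the API.

```python
import hmac


def is_trusted_webhook(headers: dict, expected_key: str) -> bool:
    """Return True when the X-API-Key header matches the expected key."""
    if not expected_key:
        # Refuse to authenticate anything if no key has been configured.
        return False
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-api-key", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(received.encode(), expected_key.encode())
```

The same check can be dropped into any web framework by passing its request headers as a dict.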
Processing Pipeline
Every uploaded PDF passes through an automated 2–5 minute pipeline:
- Upload — Submit via base64 or presigned S3 URL
- Language Detection — English validation
- PHI Detection & Redaction — Protected health information is identified and removed
- Lab Result Extraction — AI vision parses the document
- LOINC Mapping — Each biomarker is mapped to a standardized LOINC code
- UCUM Standardization — Units are normalized to UCUM codes
Async Processing & Webhooks
The upload endpoint returns immediately with a jobId. Because processing takes 2–5 minutes, you have two options to retrieve results:
Option A — Webhook (recommended)
Provide a webhookUrl on upload. The API will POST the full results payload to your endpoint when processing completes, fails, or is quarantined. Deliveries include an X-API-Key header for authentication and are retried up to 5 times with exponential backoff.
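Because deliveries are retried, a receiver may plausibly see the same event more than once, for example when an earlier acknowledgement was slow or lost. That is an assumption about retry behavior, not something this page states. A minimal sketch of de-duplicating on the jobId and event fields of the payload follows; the DeliveryLog class is hypothetical, and a real service would likely keep this record in a database rather than in memory.

```python
class DeliveryLog:
    """In-memory record of webhook deliveries that were already handled."""

    def __init__(self):
        self._seen = set()

    def should_process(self, payload: dict) -> bool:
        """Return True the first time a (jobId, event) pair is seen."""
        key = (payload.get("jobId"), payload.get("event"))
        if key in self._seen:
            return False  # repeat delivery: acknowledge it, but skip the work
        self._seen.add(key)
        return True
```

A handler would still answer a repeat delivery with a 200 so that no further retries are attempted.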
Option B — Polling
Periodically call GET /lab/status/{jobId} until status is completed, then read the full results directly from that response (or fetch them separately from GET /lab/results/{jobId}).
Webhook Event Types
| Event | Description |
|---|---|
| job.completed | Processing succeeded. data contains full extraction results. |
| job.failed | Unrecoverable processing error. |
| job.quarantined | Non-English document detected early in pipeline. |
| job.rejected_phi | PHI found and rejectIfPHI=true was set on upload. |
| job.invalid_document | Document is not a supported single-visit US lab report. |
Endpoints
Lab Upload
POST /lab/upload
Upload a lab result PDF for processing. Returns immediately with a jobId. Supports two modes depending on file size:
- Base64 upload (≤10 MB): Include pdfFile in the request body. Returns 202 Accepted.
- Presigned URL upload (>10 MB, up to 50 MB): Omit pdfFile. Returns 200 OK with an uploadUrl. PUT your PDF directly to that URL within 5 minutes; processing starts automatically.
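The choice between the two modes can be made from the file size before any request is sent. This is a small sketch under two assumptions that the page does not confirm: 1 MB is taken as 1024 × 1024 bytes, and the limits are applied to the raw PDF size rather than the base64-encoded size. The names choose_upload_mode, BASE64_LIMIT, and PRESIGNED_LIMIT are illustrative.

```python
MB = 1024 * 1024           # assumption: binary megabytes
BASE64_LIMIT = 10 * MB     # upper bound for the base64 flow
PRESIGNED_LIMIT = 50 * MB  # upper bound for the presigned URL flow


def choose_upload_mode(size_bytes: int) -> str:
    """Return 'base64' or 'presigned' for a PDF of the given size."""
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes <= BASE64_LIMIT:
        return "base64"
    if size_bytes <= PRESIGNED_LIMIT:
        return "presigned"
    raise ValueError("file exceeds the 50 MB upload limit")
```

In practice the size would come from something like os.path.getsize(path).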
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| pdfFile | string (base64) | No | Base64-encoded PDF. Max 10 MB. Omit to receive a presigned S3 URL instead. |
| filename | string | No | Original filename for display purposes. |
| dateOfBirth | string (YYYY-MM-DD) | No | Patient DOB for verification. A boolean match flag is returned; no PII is stored or returned. |
| webhookUrl | string (URI) | No | Public HTTPS endpoint to receive push notifications when processing completes. |
| rejectIfPHI | boolean | No | Default false. If true, immediately reject and delete the document if PHI is detected instead of redacting it. |
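Validating these fields on the client can catch malformed requests before they reach the API. The helper below is a hypothetical sketch: build_upload_body is not part of the API, and the checks (strict YYYY-MM-DD format, HTTPS scheme) are inferred from the field descriptions above rather than from documented server behavior.

```python
import re
from datetime import date
from urllib.parse import urlparse


def build_upload_body(pdf_base64=None, filename=None, date_of_birth=None,
                      webhook_url=None, reject_if_phi=False) -> dict:
    """Assemble the JSON body for POST /lab/upload, skipping unset fields."""
    body = {}
    if pdf_base64 is not None:
        body["pdfFile"] = pdf_base64  # omit to request a presigned URL
    if filename:
        body["filename"] = filename
    if date_of_birth is not None:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_of_birth):
            raise ValueError("dateOfBirth must be YYYY-MM-DD")
        date.fromisoformat(date_of_birth)  # rejects dates like 2024-02-31
        body["dateOfBirth"] = date_of_birth
    if webhook_url is not None:
        parsed = urlparse(webhook_url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("webhookUrl must be an HTTPS URL")
        body["webhookUrl"] = webhook_url
    if reject_if_phi:
        body["rejectIfPHI"] = True
    return body
```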
Responses
| Status | Description |
|---|---|
| 202 Accepted | PDF queued. Returns { jobId, status: "queued", message }. |
| 200 OK | Presigned URL generated (when pdfFile is omitted). |
| 400 Bad Request | Invalid request parameters. |
| 413 Content Too Large | File exceeds 10 MB base64 limit; use the presigned URL flow. |
| 429 Too Many Requests | Rate limit exceeded (100 uploads/hour). |
Get Lab Status
GET /lab/status/{jobId}
Check the processing status of a job. When status is completed, the response includes the full extraction results — no separate results call is needed.
Path Parameters
| Parameter | Required | Description |
|---|---|---|
| jobId | Yes | Job ID returned from the upload endpoint. |
Get Lab Results
GET /lab/results/{jobId}
Retrieve full extraction results for a completed job. Useful if webhook delivery failed or if you prefer polling. Returns 425 Too Early if processing is not yet complete.
Path Parameters
| Parameter | Required | Description |
|---|---|---|
| jobId | Yes | Job ID returned from the upload endpoint. |
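A caller of this endpoint needs to tell "not ready yet" apart from a real failure. The sketch below uses only the Python standard library and treats 425 Too Early as a signal to try again later. The function name fetch_results and the injectable opener parameter (there only to make the function easy to test) are assumptions of this example.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.voloridgehealth.com/extraction"


def fetch_results(job_id, api_key, opener=urllib.request.urlopen):
    """Return the results dict, or None if the API answers 425 Too Early."""
    request = urllib.request.Request(
        f"{BASE_URL}/lab/results/{job_id}",
        headers={"x-api-key": api_key},
    )
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 425:
            return None  # processing has not finished yet
        raise  # 404 and other errors are left to the caller
```

A return value of None means the caller should wait and retry; any other HTTP error propagates as an exception.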
Code Examples
Base64 Upload
const fs = require('fs');

async function uploadLabPDF(filePath) {
  // Read the PDF from disk and base64-encode it for the JSON body
  const pdfBase64 = fs.readFileSync(filePath).toString('base64');
  const response = await fetch(
    'https://api.voloridgehealth.com/extraction/lab/upload',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': 'YOUR_API_KEY'
      },
      body: JSON.stringify({
        pdfFile: pdfBase64,
        filename: 'lab_results_2024.pdf',
        dateOfBirth: '1985-03-15',
        webhookUrl: 'https://yourapp.com/api/webhooks/lab-results'
      })
    }
  );
  const { jobId, status } = await response.json();
  console.log(`Job created: ${jobId}, status: ${status}`);
  return jobId;
}

import base64, requests
def upload_lab_pdf(file_path):
    # Read the PDF and base64-encode it for the JSON body
    with open(file_path, 'rb') as f:
        pdf_base64 = base64.b64encode(f.read()).decode('utf-8')
    response = requests.post(
        'https://api.voloridgehealth.com/extraction/lab/upload',
        headers={
            'Content-Type': 'application/json',
            'x-api-key': 'YOUR_API_KEY'
        },
        json={
            'pdfFile': pdf_base64,
            'filename': 'lab_results_2024.pdf',
            'dateOfBirth': '1985-03-15',
            'webhookUrl': 'https://yourapp.com/api/webhooks/lab-results'
        }
    )
    data = response.json()
    print(f"Job created: {data['jobId']}, status: {data['status']}")
    return data['jobId']

Sample response (202):
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
  "status": "queued",
  "message": "PDF uploaded successfully. Processing started."
}

Large File Upload (>10 MB)
Omit pdfFile to receive a presigned S3 URL, then PUT your file directly to S3.
async function uploadLargePDF(filePath) {
  // Step 1: Request a presigned URL (omit pdfFile)
  const initRes = await fetch(
    'https://api.voloridgehealth.com/extraction/lab/upload',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': 'YOUR_API_KEY' },
      body: JSON.stringify({
        filename: 'large_lab_report.pdf',
        dateOfBirth: '1985-03-15',
        webhookUrl: 'https://yourapp.com/api/webhooks/lab-results'
      })
    }
  );
  const { jobId, uploadUrl } = await initRes.json();
  // Step 2: PUT the PDF directly to S3
  const fs = require('fs');
  await fetch(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/pdf' },
    body: fs.readFileSync(filePath)
  });
  console.log('Upload complete. Processing will start automatically.');
  return jobId;
}

import requests
def upload_large_pdf(file_path):
    # Step 1: Request a presigned URL (no pdfFile)
    data = requests.post(
        'https://api.voloridgehealth.com/extraction/lab/upload',
        headers={'Content-Type': 'application/json', 'x-api-key': 'YOUR_API_KEY'},
        json={
            'filename': 'large_lab_report.pdf',
            'dateOfBirth': '1985-03-15',
            'webhookUrl': 'https://yourapp.com/api/webhooks/lab-results'
        }
    ).json()
    # Step 2: PUT the PDF directly to S3
    with open(file_path, 'rb') as f:
        requests.put(data['uploadUrl'], data=f, headers={'Content-Type': 'application/pdf'})
    print('Upload complete. Processing starts automatically.')
    return data['jobId']

Polling for Results
Poll GET /lab/status/{jobId} until the job reaches a terminal status.
async function waitForResults(jobId, intervalMs = 15000) {
  const url = `https://api.voloridgehealth.com/extraction/lab/status/${jobId}`;
  const headers = { 'x-api-key': 'YOUR_API_KEY' };
  while (true) {
    const data = await fetch(url, { headers }).then(r => r.json());
    console.log(`Status: ${data.status}`);
    if (data.status === 'completed') {
      return data; // full results included in status response
    }
    // Any other terminal status means the job will not produce results
    if (['failed', 'quarantined', 'rejected_phi', 'invalid_document'].includes(data.status)) {
      throw new Error(`Job ended with status: ${data.status}`);
    }
    await new Promise(r => setTimeout(r, intervalMs));
  }
}

import time, requests
TERMINAL = {'completed', 'failed', 'quarantined', 'rejected_phi', 'invalid_document'}

def wait_for_results(job_id, interval=15):
    url = f'https://api.voloridgehealth.com/extraction/lab/status/{job_id}'
    headers = {'x-api-key': 'YOUR_API_KEY'}
    while True:
        data = requests.get(url, headers=headers).json()
        print(f"Status: {data['status']}")
        if data['status'] == 'completed':
            return data  # full results included
        # Any other terminal status means the job will not produce results
        if data['status'] in TERMINAL:
            raise Exception(f"Job ended: {data['status']}")
        time.sleep(interval)

Webhook Handler
Always verify the X-API-Key header before processing incoming webhook events.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/api/webhooks/lab-results', (req, res) => {
  // 1. Authenticate
  if (req.headers['x-api-key'] !== process.env.VOLORIDGE_WEBHOOK_KEY) {
    return res.status(401).send('Unauthorized');
  }
  // 2. Handle the event
  const { event, jobId, data } = req.body;
  if (event === 'job.completed') {
    const { labName, dateCollected, panels, verification } = data;
    panels.forEach(panel => {
      console.log(`Panel: ${panel.panelName}`);
      panel.panelResults.forEach(r => {
        console.log(`  ${r.biomarker}: ${r.resultValue} ${r.resultUnit}`
          + ` (LOINC: ${r.loincCode}, UCUM: ${r.ucumCode})`);
      });
    });
    if (verification?.dobMatches === false) {
      console.warn('DOB mismatch: flag for review');
    }
  } else if (event === 'job.failed') {
    console.error(`Job failed: ${data.error.message}`);
  } else if (event === 'job.invalid_document') {
    console.warn(`Invalid document: ${data.error.rejection_reason}`);
  }
  res.status(200).send('OK');
});

from flask import Flask, request, jsonify
import os

app = Flask(__name__)

@app.route('/api/webhooks/lab-results', methods=['POST'])
def lab_webhook():
    # 1. Authenticate
    if request.headers.get('X-API-Key') != os.environ['VOLORIDGE_WEBHOOK_KEY']:
        return jsonify({'error': 'Unauthorized'}), 401
    # 2. Handle the event
    body = request.json
    event, data = body['event'], body['data']
    if event == 'job.completed':
        for panel in data['panels']:
            print(f"Panel: {panel['panelName']}")
            for r in panel['panelResults']:
                print(f"  {r['biomarker']}: {r['resultValue']} "
                      f"{r['resultUnit']} (LOINC: {r['loincCode']})")
    elif event == 'job.invalid_document':
        print(f"Rejected: {data['error']['rejection_reason']}")
    return jsonify({'received': True}), 200

Sample completed webhook payload:
{
  "event": "job.completed",
  "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
  "timestamp": "2025-11-15T10:35:18.415821+00:00",
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000_1699564800",
    "status": "completed",
    "labName": "Quest Diagnostics",
    "dateCollected": "2024-04-11",
    "verification": { "dobFound": true, "dobMatches": true },
    "panels": [
      {
        "panelName": "Complete Blood Count",
        "panelCode": "CBC",
        "dateCollected": "2024-04-11",
        "pageNumber": 1,
        "panelResults": [
          {
            "biomarker": "White Blood Cell Count",
            "resultValue": "4.9",
            "resultType": "numeric",
            "resultUnit": "Thousand/uL",
            "loincCode": "6690-2",
            "loincMatchType": "exact",
            "ucumCode": "10*3/uL",
            "ucumMatchMethod": "static_map",
            "minRefRangeValue": 3.8,
            "maxRefRangeValue": 10.8,
            "referenceRangeText": "3.8-10.8",
            "pageNumber": 1
          }
        ]
      }
    ]
  }
}

Job Statuses
| Status | Description |
|---|---|
| pending_upload | Presigned URL issued; awaiting file upload to S3. |
| queued | File received; waiting for a processing slot. |
| processing_language | Detecting document language. |
| processing_phi | Detecting and redacting PHI. |
| processing_extraction | Extracting lab results with AI vision. |
| completed | Results ready. Full data included in the status response. |
| failed | Unrecoverable error during processing. |
| quarantined | Non-English document detected by language classifier. |
| rejected_phi | PHI detected and rejectIfPHI=true; all files deleted. |
| invalid_document | Document cannot be processed; see rejection_reason. |
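Client code usually only needs to know whether a job is still moving, has succeeded, or has stopped for good. The grouping below is a sketch that follows the polling examples on this page, which treat completed, failed, quarantined, rejected_phi, and invalid_document as terminal. The function name and the returned labels are illustrative.

```python
TERMINAL_STATUSES = frozenset(
    {"completed", "failed", "quarantined", "rejected_phi", "invalid_document"}
)
IN_PROGRESS_STATUSES = frozenset(
    {"pending_upload", "queued", "processing_language",
     "processing_phi", "processing_extraction"}
)


def classify_status(status: str) -> str:
    """Map a job status to 'success', 'stopped', 'in_progress', or 'unknown'."""
    if status == "completed":
        return "success"
    if status in TERMINAL_STATUSES:
        return "stopped"  # the job will never produce results
    if status in IN_PROGRESS_STATUSES:
        return "in_progress"
    return "unknown"  # a status this client does not know about
```

Returning "unknown" instead of raising keeps a poller from crashing if a status appears that is not in this table.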
Invalid Document Rejection Reasons
| rejection_reason | Description |
|---|---|
| non_supported_lab | Lab is not Quest Diagnostics or LabCorp. detected_lab will be null if no lab name was found. |
| non_english_document | Document is not predominantly in English. |
| multiple_collection_dates | Multiple specimen dates detected; likely a concatenated report. |
| non_us_document | Non-US date formatting (e.g., DD/MM/YYYY) that cannot be disambiguated. |
| non_lab_document | Document is not a clinical lab report (e.g., correspondence, insurance form). |
Result Schema
ExtractionResults
| Field | Type | Description |
|---|---|---|
| jobId | string | Job identifier. |
| status | string | Always "completed". |
| labName | string \| null | Laboratory name extracted from the document. |
| dateCollected | date \| null | Specimen collection date (YYYY-MM-DD). |
| verification | object | dobFound (bool) and dobMatches (bool \| null). No PII is returned. |
| panels | LabPanel[] | Array of test panels, each containing panelResults. |
| mergeActions | array | Actions taken to merge multi-page test results. |
| validationFlags | array | Validation issues detected during extraction. |
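For storage or tabular analysis, the nested panels structure is often easier to work with as one row per result. The sketch below relies only on field names shown in the sample payload. The function name flatten_results, the output keys, and the choice to prefer ucumCode over resultUnit are assumptions of this example.

```python
def flatten_results(results: dict) -> list:
    """Turn an ExtractionResults dict into one flat row per panel result."""
    rows = []
    for panel in results.get("panels") or []:
        for result in panel.get("panelResults") or []:
            rows.append({
                "panel": panel.get("panelName"),
                "biomarker": result.get("biomarker"),
                "value": result.get("resultValue"),
                # Prefer the standardized unit, fall back to the printed one.
                "unit": result.get("ucumCode") or result.get("resultUnit"),
                "loinc": result.get("loincCode"),
            })
    return rows
```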
PanelResult
| Field | Type | Description |
|---|---|---|
| biomarker | string | Name of the test (e.g., "White Blood Cell Count"). |
| resultValue | string \| null | Measured value. |
| resultType | enum | numeric \| string \| range |
| resultUnit | string \| null | Original unit from the report. |
| loincCode | string \| null | Standardized LOINC identifier. |
| loincMatchType | enum \| null | exact \| biomarker_only \| llm_fuzzy \| not_found |
| ucumCode | string \| null | Standardized UCUM unit code. |
| ucumMatchMethod | enum \| null | static_map \| pattern_rule \| prefix_map \| llm \| not_found |
| minRefRangeValue | float \| null | Lower reference range bound. |
| maxRefRangeValue | float \| null | Upper reference range bound. |
| referenceRangeText | string \| null | Reference range as printed on the report. |
| pageNumber | integer \| null | Page in the PDF where this result appears (1-based). |
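Since resultValue is a string while the reference bounds are floats, comparing them takes a little care. The sketch below flags numeric results against their reference range, under two assumptions: that numeric values parse with float(), and that both bounds are inclusive. Neither is stated in the schema. The function name range_flag is illustrative, and this is not a clinical interpretation.

```python
from typing import Optional


def range_flag(result: dict) -> Optional[str]:
    """Return 'low', 'high', or 'in_range', or None when no comparison is possible."""
    if result.get("resultType") != "numeric":
        return None
    try:
        value = float(result["resultValue"])
    except (KeyError, TypeError, ValueError):
        return None  # missing value, or one that is not a plain number
    low = result.get("minRefRangeValue")
    high = result.get("maxRefRangeValue")
    if low is None and high is None:
        return None  # the report gave no numeric reference range
    if low is not None and value < low:
        return "low"
    if high is not None and value > high:
        return "high"
    return "in_range"
```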
Errors
All error responses share a common structure:
{
  "error": "invalid_pdf",
  "message": "PDF file is corrupted or unreadable",
  "timestamp": "2025-11-15T10:30:22Z",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

| HTTP Status | error code | Description |
|---|---|---|
| 400 | invalid_request | Missing or malformed request parameters. |
| 404 | not_found | Job ID does not exist. |
| 413 | file_too_large | Base64 PDF exceeds 10 MB — use the presigned URL flow. |
| 425 | too_early | Results requested before processing is complete. |
| 429 | rate_limit_exceeded | Over 100 uploads/hour. Check details.retryAfterSeconds. |
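The 429 row refers to details.retryAfterSeconds, which suggests a details object nested in the error body. Its exact shape is not shown on this page, so the sketch below treats it as an assumption. The function name retry_delay_seconds and the 60-second fallback are illustrative choices.

```python
from typing import Optional


def retry_delay_seconds(status_code: int, error_body: dict,
                        default: float = 60.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if not rate limited."""
    if status_code != 429:
        return None
    # Assumed shape: {"error": "...", "details": {"retryAfterSeconds": 120}}
    details = error_body.get("details") or {}
    delay = details.get("retryAfterSeconds")
    is_number = isinstance(delay, (int, float)) and not isinstance(delay, bool)
    if is_number and delay >= 0:
        return float(delay)
    return default  # fall back when the hint is missing or malformed
```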
Include the requestId from error responses when contacting [email protected].
