PDF Extraction Quick Start Guide

Get pricing data from supplier catalogs automatically using Claude AI.


🚀 Quick Setup (5 minutes)

1. Get Anthropic API Key

# Visit https://console.anthropic.com/
# Sign up for free tier (includes $5 credit)
# Navigate to: Settings → API Keys → Create Key

2. Add to Environment

Create a .env file in the project root:

VITE_ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

3. Restart Dev Server

npm run dev

That's it! PDF extraction is now active.
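As a rough illustration of the setup above, the sketch below shows one way an app could check for the key at startup and fail with the same message listed under Troubleshooting. The helper name `requireAnthropicKey` is hypothetical and not taken from this project; the only Vite behavior assumed is that variables prefixed with `VITE_` are exposed on `import.meta.env`.

```typescript
// Hypothetical helper: the project's actual startup code may differ.
interface KeyEnv {
  VITE_ANTHROPIC_API_KEY?: string;
}

function requireAnthropicKey(env: KeyEnv): string {
  // Treat a missing, empty, or whitespace-only value as "not configured".
  const key = env.VITE_ANTHROPIC_API_KEY?.trim();
  if (!key) {
    throw new Error("Extraction failed: API key not found");
  }
  return key;
}

// In a Vite app this would typically be called as:
//   const apiKey = requireAnthropicKey(import.meta.env);
```

If the check throws after you edit .env, restart the dev server so the new value is picked up.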


📤 How to Use (Shepherd View)

Step 1: Navigate to Service Chat

  1. Log in as Shepherd

  2. Click any service card (e.g., "Tires")

  3. Chat interface opens

Step 2: Upload Supplier's PDF Catalog

  1. Click 📎 button

  2. Select supplier's PDF pricing catalog

  3. Wait for the extraction to finish (typically 5-10 seconds)

Step 3: View Extracted Data

  • Extracted data stored in conversation

  • Available for quote comparison

  • Can re-upload if extraction fails


📥 How to Use (Supplier View)

Step 1: Navigate to RFQ

  1. Log in as Supplier

  2. View open RFQs

  3. Click RFQ to open chat

Step 2: Upload Your Catalog

  1. Click 📎 button

  2. Select your pricing PDF

  3. Wait while the AI extracts your services and pricing

Step 3: Send Quote

  • Use extracted data to respond to RFQ

  • AI can auto-generate quote message

  • Pricing data saved for future RFQs


📄 Supported File Formats

  • ✅ Text-based PDFs (best results)

  • ✅ Scanned PDFs with OCR (good results)

  • ✅ Multi-page catalogs

  • ✅ Tables and structured pricing

  • ⚠️ Image-heavy PDFs (variable results)

  • ❌ Password-protected PDFs

CSVs (Alternative)

  • ✅ Simple pricing spreadsheets

  • ✅ Headers: name, price, description

  • ✅ Faster processing than PDF

  • ⚠️ Less metadata extracted
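To make the expected CSV shape concrete, here is a minimal parsing sketch for a file with the name, price, and description headers listed above. It is an illustration only: `parsePricingCsv` and `PriceRow` are hypothetical names, and the parser assumes no quoted fields or embedded commas, which a real CSV library would handle.

```typescript
// Hypothetical row shape based on the three headers listed above.
interface PriceRow {
  name: string;
  price: number;
  description: string;
}

// Assumption: simple CSV with no quoted fields or commas inside values.
function parsePricingCsv(csv: string): PriceRow[] {
  const lines = csv
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const [headerLine, ...dataLines] = lines;
  if (headerLine === undefined) return [];

  const headers = headerLine.split(",").map((h) => h.trim().toLowerCase());
  const columnIndex = (header: string): number => {
    const index = headers.indexOf(header);
    if (index === -1) throw new Error(`Missing required column: ${header}`);
    return index;
  };
  const nameIdx = columnIndex("name");
  const priceIdx = columnIndex("price");
  const descIdx = columnIndex("description");

  return dataLines.map((line, i) => {
    const cells = line.split(",").map((c) => c.trim());
    // Accept an optional leading "$" on the price.
    const rawPrice = (cells[priceIdx] ?? "").replace(/^\$/, "");
    const price = Number(rawPrice);
    if (rawPrice === "" || !Number.isFinite(price)) {
      // i + 2: one for the header line, one for 1-based numbering.
      throw new Error(`Row ${i + 2}: invalid price`);
    }
    return {
      name: cells[nameIdx] ?? "",
      price,
      description: cells[descIdx] ?? "",
    };
  });
}
```

Because the columns are looked up by header name, their order in the file does not matter in this sketch.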

Size Limits

  • Max file size: 10MB

  • Max pages: ~50 pages (varies by complexity)

  • Average processing time: 5-10 seconds
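The limits above could be enforced before any upload with a check like the following. This is a sketch, not the app's actual validation: `validateUpload` and `UploadLike` are hypothetical names, and 10MB is interpreted here as 10 × 1024 × 1024 bytes, which is an assumption.

```typescript
// Assumption: "10MB" means 10 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 10 * 1024 * 1024;

// A browser File object satisfies this shape (it has name and size).
interface UploadLike {
  name: string;
  size: number;
}

// Returns an error message, or null when the file passes the checks.
function validateUpload(file: UploadLike): string | null {
  const extension = file.name.toLowerCase().split(".").pop();
  if (extension !== "pdf" && extension !== "csv") {
    return "Unsupported file type: use PDF or CSV";
  }
  if (file.size > MAX_FILE_BYTES) {
    return "File exceeds the 10MB limit";
  }
  return null;
}
```

The page limit is not checked here, since page count is only known after the PDF has been opened.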


💰 Cost Tracking

Pricing

  • Input tokens: $3 per million

  • Output tokens: $15 per million

Typical Costs

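| Document Type   | Input Tokens | Output Tokens | Cost   |
|-----------------|--------------|---------------|--------|
| 1-page PDF      | 500          | 200           | $0.005 |
| 5-page PDF      | 2000         | 500           | $0.014 |
| 10-page PDF     | 4000         | 800           | $0.024 |
| 20-page catalog | 8000         | 1000          | $0.039 |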
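The typical costs above follow directly from the per-token prices. A minimal sketch of that arithmetic, using the $3 and $15 per million rates quoted in this guide (the function name `estimateCostUsd` is hypothetical):

```typescript
// Rates as quoted in this guide; check current pricing before relying on them.
const INPUT_USD_PER_MILLION = 3;
const OUTPUT_USD_PER_MILLION = 15;

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MILLION +
      outputTokens * OUTPUT_USD_PER_MILLION) /
    1_000_000
  );
}

// Example: a 5-page PDF with 2000 input and 500 output tokens.
const fivePageCost = estimateCostUsd(2000, 500);
```

For the 5-page example this gives (2000 × 3 + 500 × 15) / 1,000,000 = 0.0135, which the guide rounds to $0.014.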

Free Tier

  • Anthropic gives $5 credit for new accounts

  • That's ~200-500 PDF extractions

  • Perfect for testing and MVP

Monitoring Costs

  • Each extraction logs cost to browser console

  • Check Anthropic dashboard for usage

  • Set billing alerts at $10, $50, $100


🎯 What Gets Extracted

Service Information

Optional Features

  • features: ["24/7 availability", "Mobile service"]

  • includesTPMS: true (tire pressure monitoring)

  • warrantyMonths: 12

  • responseTimeMinutes: 60

  • coverageZones: ["San Francisco", "Oakland"]
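The optional fields listed above could be typed and defensively read as in the sketch below. Only the five field names come from this guide; the interface name, the helper names, and the choice to silently drop wrongly typed values are assumptions made for illustration.

```typescript
// Field names from the list above; every field is optional.
interface OptionalServiceFeatures {
  features?: string[];
  includesTPMS?: boolean;
  warrantyMonths?: number;
  responseTimeMinutes?: number;
  coverageZones?: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical helper: keeps only fields whose values have the expected type.
function parseOptionalFeatures(raw: unknown): OptionalServiceFeatures {
  if (typeof raw !== "object" || raw === null) return {};
  const { features, includesTPMS, warrantyMonths, responseTimeMinutes, coverageZones } =
    raw as Record<string, unknown>;

  const result: OptionalServiceFeatures = {};
  if (isStringArray(features)) result.features = features;
  if (typeof includesTPMS === "boolean") result.includesTPMS = includesTPMS;
  if (typeof warrantyMonths === "number") result.warrantyMonths = warrantyMonths;
  if (typeof responseTimeMinutes === "number") result.responseTimeMinutes = responseTimeMinutes;
  if (isStringArray(coverageZones)) result.coverageZones = coverageZones;
  return result;
}
```

Dropping malformed values rather than throwing keeps a partially correct extraction usable, at the cost of hiding model mistakes; a stricter variant could report them instead.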

Volume Discounts

Contact Info


🔧 Troubleshooting

"Extraction failed: API key not found"

Fix: Add VITE_ANTHROPIC_API_KEY to .env file

"Extraction failed: Invalid PDF format"

Fix:

  • Ensure PDF is not password-protected

  • Try re-saving PDF from original source

  • Try converting to CSV instead

"Confidence: low" results

Reasons:

  • PDF is image-only (no text layer)

  • Unusual formatting/layout

  • Missing pricing information

  • Handwritten pricing

Fix:

  • Ask supplier for text-based PDF

  • Convert to CSV manually

  • Edit PDF to add text layer with OCR tool

High token costs

Reasons:

  • Very large multi-page catalogs

  • Repeated extractions of same file

  • Image-heavy PDFs

Fix:

  • Extract pricing pages only

  • Cache extraction results

  • Use CSV for simple price lists
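For the "cache extraction results" fix, one possible approach is to memoize the extraction call by a key that identifies the file. The sketch below is an assumption about how this could look, not the app's implementation: `cacheExtractions` is a hypothetical wrapper, and the caller is expected to supply a stable key such as a content hash.

```typescript
// Hypothetical wrapper: repeated calls with the same fileKey reuse one result.
function cacheExtractions<T>(
  extract: (fileKey: string) => Promise<T>,
): (fileKey: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (fileKey: string): Promise<T> => {
    const cached = cache.get(fileKey);
    if (cached) return cached;

    const pending = extract(fileKey);
    cache.set(fileKey, pending);
    // Drop failed attempts so a re-upload can retry the extraction.
    pending.catch(() => {
      cache.delete(fileKey);
    });
    return pending;
  };
}
```

Caching the promise rather than the resolved value means two uploads of the same file in quick succession trigger only one API call.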


🎓 Best Practices

For Shepherds

  1. Ask suppliers for digital catalogs (not scanned)

  2. Request CSV exports for simpler price lists

  3. Upload once per supplier (results cached in conversation)

  4. Verify extracted prices before finalizing contracts

For Suppliers

  1. Provide clean, text-based PDFs

  2. Include all pricing tiers in one document

  3. Clearly label volume discounts

  4. Update catalog regularly (upload new version when prices change)

General

  • Always review confidence score

  • Cross-check critical prices manually

  • Use extraction as starting point, not final source of truth

  • Report extraction errors to improve prompt engineering


🚨 Security Notes

Development Mode

  • API key in .env file (local only)

  • dangerouslyAllowBrowser: true is enabled, so the API key is exposed to the browser

  • ⚠️ DO NOT COMMIT .env TO GIT

Production Mode (TODO)

  • Move API calls to backend

  • Create /api/extract-pdf endpoint

  • API key stored server-side only

  • Rate limiting per user

  • Cost monitoring and alerts
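As one illustration of the "rate limiting per user" item, the sketch below implements a simple fixed-window limiter that a future backend endpoint could consult before calling the API. The class name, the in-memory Map, and the window semantics are all assumptions; a production setup would more likely use shared storage.

```typescript
// Hypothetical in-memory limiter: at most maxRequests per user per window.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and records it.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    // Start a fresh window on first use or once the old window has elapsed.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

For example, `new FixedWindowRateLimiter(20, 60_000)` would allow each user 20 extractions per minute; those numbers are placeholders, not recommendations from this guide.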


📊 Example Extraction

Input PDF

Output JSON

Token Usage

  • Input: 1,234 tokens

  • Output: 456 tokens

  • Cost: $0.0105


🎉 Success Stories (Expected)

Scenario 1: Tire Services

  • Before: Shepherd calls 10 tire shops and manually records pricing

  • After: 10 suppliers upload PDFs, AI extracts all pricing in 2 minutes

  • Time saved: 3 hours → 2 minutes

Scenario 2: Fleet Parking

  • Before: Receive 5 email quotes, copy prices to spreadsheet

  • After: Ask suppliers to upload catalogs in chat, instant comparison

  • Time saved: 30 minutes → instant

Scenario 3: Multi-Service Comparison

  • Before: Manually build comparison table across 8 categories

  • After: Suppliers upload master catalogs once, all services extracted

  • Time saved: 8 hours → 5 minutes


📞 Support

Questions?

Issues?

  • Low confidence extractions → Contact Claude to improve prompt

  • API errors → Check Anthropic status page

  • Cost concerns → Monitor usage in dashboard


Ready to extract pricing? Upload your first PDF and watch the magic! 🪄
