This document outlines the data capture and persistence strategy for Robo-Hub's negotiation system, focusing on leveraging Claude Code's built-in PDF extraction capabilities for automated data structuring.
1. Available Claude Code Skills for Data Extraction
PDF Skill (Built-in to Claude Code)
Claude Code has native PDF processing capabilities that can:
✅ Build DocumentExtractionService with Claude integration
✅ Add file upload UI to chat interfaces
⏳ Test PDF extraction with real tire catalogs (Phase 2)
⏳ Build pricing comparison UI (Phase 3)
8. Questions for Decision
Storage preference: S3 (AWS), R2 (Cloudflare), or IPFS (decentralized)?
Immediate implementation: Should we start with Phase 1 (basic persistence) now?
PDF extraction priority: Is automated extraction critical for the MVP, or can it wait?
Privacy level: How anonymous should cross-shepherd analytics be?
Recommendation: Start with Phase 1 (basic chat persistence) this week, add PDF extraction (Phase 2) when you have 5+ pilot suppliers uploading catalogs.
-- Privacy-preserving view for cross-shepherd analytics
CREATE VIEW anonymous_pricing_data AS
SELECT
    service_category,
    zone,
    -- Upper bounds are inclusive so each value lands in the bucket its label names
    CASE
        WHEN fleet_size <= 25 THEN '1-25'
        WHEN fleet_size <= 50 THEN '26-50'
        WHEN fleet_size <= 100 THEN '51-100'
        ELSE '100+'  -- also catches a NULL fleet_size
    END AS fleet_size_range,
    final_agreed_price,
    -- NO shepherd_id or supplier_id exposed
    key_factors->>'includes_tpms' AS includes_tpms
FROM negotiation_outcomes
WHERE outcome = 'agreed'
  AND created_at > NOW() - INTERVAL '90 days';
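
The range labels in the view read as inclusive bounds: a fleet of exactly 25 vehicles belongs in '1-25', and one of exactly 100 in '51-100'. The following is a minimal Python sketch of that bucketing, which could serve as a reference when checking the view's output at the boundaries. The function name `fleet_size_range` is hypothetical and not part of the existing codebase, and the sketch assumes `fleet_size` is always a non-null integer.

```python
def fleet_size_range(fleet_size: int) -> str:
    """Map an exact fleet size to the coarse range label used in the view.

    Hypothetical helper for illustration; assumes a non-null integer input.
    """
    if fleet_size <= 25:
        return "1-25"
    if fleet_size <= 50:
        return "26-50"
    if fleet_size <= 100:
        return "51-100"
    return "100+"


# Boundary values are where off-by-one errors in the bucketing show up.
for size in (25, 26, 50, 51, 100, 101):
    print(size, fleet_size_range(size))
```

Comparing these labels against the `fleet_size_range` column for a handful of known rows is a cheap way to confirm that the view and its labels agree.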