OCR and document extraction
BiVelio includes OCR (Optical Character Recognition) capabilities that let you digitize physical documents, extract text from images, and obtain structured data from the document types listed below.
OCR process
Automatic detection
When a document is uploaded to BiVelio (manually or via ChannelHub), the system automatically detects whether it requires OCR processing:
- PDFs with embedded text — indexed directly without OCR
- Scanned PDFs (image) — processed with OCR to extract text
- Images (JPG, PNG, TIFF) — processed with OCR
- Text documents — indexed directly
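The routing rule above can be sketched as a small function. This is only an illustration, not BiVelio's actual implementation: the `needs_ocr` name, the extension sets, and the `has_embedded_text` flag (assumed to be produced by a PDF inspection step that runs beforehand) are all hypothetical.

```python
from pathlib import Path

# Hypothetical extension sets chosen to mirror the list above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
# Assumption: the source does not say which formats count as "text documents".
TEXT_EXTENSIONS = {".txt", ".md", ".csv"}


def needs_ocr(filename: str, has_embedded_text: bool = False) -> bool:
    """Return True if the document should go through OCR before indexing.

    `has_embedded_text` only matters for PDFs. It is assumed to come from a
    prior inspection step that this sketch does not implement.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return True  # images are always processed with OCR
    if suffix == ".pdf":
        return not has_embedded_text  # only scanned PDFs need OCR
    if suffix in TEXT_EXTENSIONS:
        return False  # text documents are indexed directly
    raise ValueError(f"Unsupported document type: {filename}")
```

For instance, `needs_ocr("scan.pdf")` routes the file to OCR, while `needs_ocr("report.pdf", has_embedded_text=True)` sends it straight to indexing.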
Processing pipeline
Document received
→ Type detection (text vs image)
→ If image: OCR → text extraction
→ Indexing for full-text search
→ (Optional) AI agent → structured data extraction
→ Data available in the system
Intelligent data extraction
The AI layer on top of OCR allows you to go beyond text extraction: it interprets the content and generates structured data usable by the system.
Supported document types
| Document type | Extracted data |
|---|---|
| Invoices | Issuer, tax ID, date, line items, amounts, VAT, total |
| Delivery notes | Supplier, products, quantities, delivery date |
| ID / Passport | First name, last name, document number, date of birth, nationality |
| Contracts | Parties, subject matter, main conditions, dates, key clauses |
| Forms | Filled fields with their values (adaptive structure) |
| Receipts | Establishment, date, amounts, payment method |
| Certificates | Issuing entity, beneficiary, certified data, validity period |
Confidence level
Each extracted field includes a confidence level between 0 and 1 that indicates how reliable the extraction is:
- 0.90 - 1.00 — high confidence, reliable data
- 0.70 - 0.89 — medium confidence, review recommended
- < 0.70 — low confidence, manual review required
BiVelio automatically flags fields with confidence below 0.70 so that an operator can review them before using the data.
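The thresholds above translate into a simple triage step. The sketch below is a hypothetical illustration of that logic, not BiVelio's API: `ExtractedField`, `confidence_band`, and `fields_requiring_review` are invented names, and the sample values are made up. It also assumes that scores between 0.89 and 0.90 belong to the medium band, which the list above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical container for one extracted value."""
    name: str
    value: str
    confidence: float  # expected to lie between 0.0 and 1.0


def confidence_band(confidence: float) -> str:
    """Map a confidence score to the band described above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.90:
        return "high"    # reliable data
    if confidence >= 0.70:
        return "medium"  # review recommended
    return "low"         # manual review required


def fields_requiring_review(fields: list[ExtractedField]) -> list[ExtractedField]:
    """Return the fields below 0.70, which an operator must check."""
    return [f for f in fields if confidence_band(f.confidence) == "low"]


# Made-up sample data, for illustration only.
invoice_fields = [
    ExtractedField("issuer", "Example Supplier S.L.", 0.97),
    ExtractedField("date", "2024-03-15", 0.82),
    ExtractedField("total", "1210.00", 0.55),
]
flagged = [f.name for f in fields_requiring_review(invoice_fields)]
```

With this sample, only the `total` field ends up in `flagged`. The `date` field falls in the medium band, so it is not flagged but a review is still recommended.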
Integration with the system
Data extracted by OCR is automatically integrated with BiVelio modules:
With cases
- Documents attached to cases are processed automatically
- Extracted data is associated with the case as metadata
- OCR-processed documents can be searched by full text from within the case
With CRM
- Contact data extracted from business cards or documents is suggested for creating contacts
- Identity documents linked to contact records
With billing
- Supplier invoices processed automatically
- Tax data extracted and verified
- Suggested accounting entry based on extracted data
With workflows
- The Data Extractor invocable agent can be used in workflow nodes
- Automated batch document processing
- Validation of extracted data via Decision nodes
Credit consumption
| Operation | Credits |
|---|---|
| Basic OCR (text extraction) | 1 credit |
| OCR + structured extraction (AI agent) | 2-5 credits |
| Batch processing (>10 documents) | 20% credit discount |
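As a worked example of the table above, the following sketch estimates the credit range for a job. It rests on assumptions the table does not confirm: that the 20% discount applies to the whole batch once it exceeds 10 documents, and that no rounding is applied. The `estimate_credits` function is hypothetical and not part of BiVelio.

```python
def estimate_credits(documents: int, structured: bool = False) -> tuple[float, float]:
    """Return the (minimum, maximum) credits for processing `documents` files.

    Basic OCR costs 1 credit per document; OCR plus structured extraction
    costs between 2 and 5 credits per document.
    """
    if documents < 0:
        raise ValueError("documents must not be negative")
    low_rate, high_rate = (2, 5) if structured else (1, 1)
    low, high = documents * low_rate, documents * high_rate
    if documents > 10:
        # Assumption: the 20% batch discount applies to the full total.
        low, high = low * 80 / 100, high * 80 / 100
    return float(low), float(high)
```

Under these assumptions, 12 documents with basic OCR cost 12 credits before the discount and 9.6 after it, while 12 documents with structured extraction cost between 19.2 and 48 credits. A batch of exactly 10 documents gets no discount, because the threshold is more than 10.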
Supported languages
The OCR engine supports multiple languages for text extraction:
- Spanish, Catalan, English, French, Portuguese, Italian, German
- Automatic language detection from the document