OCR and document extraction

BiVelio integrates OCR (Optical Character Recognition) capabilities that let you digitize physical documents, extract text from images and obtain structured data from a wide range of documents.

OCR process

Automatic detection

When a document is uploaded to BiVelio (manually or via ChannelHub), the system automatically detects whether it requires OCR processing:

PDFs with embedded text — indexed directly without OCR
Scanned PDFs (image) — processed with OCR to extract text
Images (JPG, PNG, TIFF) — processed with OCR
Text documents — indexed directly
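The detection rules above can be sketched as a small routing function. This is a minimal sketch, assuming the caller already knows whether a PDF carries an embedded text layer (in practice a PDF library would determine this); `route_document`, `Route` and the exact extension sets are hypothetical, and the source only names JPG, PNG and TIFF for images.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    INDEX_DIRECTLY = "index directly"
    OCR = "OCR"


# Hypothetical extension sets: the source names JPG, PNG and TIFF for images
# but does not list which "text documents" are accepted.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
TEXT_EXTENSIONS = {".txt", ".csv", ".md"}


def route_document(filename: str, pdf_has_text_layer: bool = False) -> Route:
    """Decide whether a document must go through OCR before indexing."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        # PDFs with embedded text are indexed directly; scanned PDFs need OCR.
        return Route.INDEX_DIRECTLY if pdf_has_text_layer else Route.OCR
    if ext in IMAGE_EXTENSIONS:
        return Route.OCR
    if ext in TEXT_EXTENSIONS:
        return Route.INDEX_DIRECTLY
    raise ValueError(f"unsupported document type: {filename!r}")
```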

Processing pipeline


Document received
  → Type detection (text vs image)
  → If image: OCR → text extraction
  → Indexing for full-text search
  → (Optional) AI agent → structured data extraction
  → Data available in the system
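The pipeline can be sketched as plain function composition. Every stage here is a stand-in supplied by the caller (`ocr`, `index` and the optional `extract`), and `process_document` and `ProcessedDocument` are hypothetical names; the sketch only shows the order of the steps, not how BiVelio implements them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProcessedDocument:
    name: str
    text: str
    used_ocr: bool
    structured: Optional[dict] = None


def process_document(
    name: str,
    raw_text: Optional[str],
    ocr: Callable[[str], str],
    index: Callable[[str, str], None],
    extract: Optional[Callable[[str], dict]] = None,
) -> ProcessedDocument:
    # Type detection: treat a document without usable embedded text as an image.
    used_ocr = not raw_text
    # If image: OCR -> text extraction (ocr is a stand-in for a real engine).
    text = ocr(name) if used_ocr else raw_text
    # Indexing for full-text search.
    index(name, text)
    # Optional AI agent step: structured data extraction.
    structured = extract(text) if extract is not None else None
    # Data available in the system.
    return ProcessedDocument(name, text, used_ocr, structured)


# Usage with stand-in stages: the "OCR engine" returns fixed text.
search_index: dict[str, str] = {}
doc = process_document(
    "scan.png",
    raw_text=None,
    ocr=lambda name: "INVOICE 2024-001 TOTAL 121.00",
    index=search_index.__setitem__,
)
```

Passing the stages in as callables keeps the ordering logic separate from any particular OCR engine or search backend.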

Intelligent data extraction

The AI layer on top of OCR allows you to go beyond text extraction: it interprets the content and generates structured data usable by the system.

Supported document types

Document type	Extracted data
Invoices	Issuer, tax ID, date, line items, amounts, VAT, total
Delivery notes	Supplier, products, quantities, delivery date
ID / Passport	First name, last name, document number, date of birth, nationality
Contracts	Parties, subject matter, main conditions, dates, key clauses
Forms	Filled fields with their values (adaptive structure)
Receipts	Establishment, date, amounts, payment method
Certificates	Issuing entity, beneficiary, certified data, validity period
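A minimal sketch of how an extraction result could be represented and checked for completeness against the table, assuming hypothetical snake_case field names and a hypothetical `ExtractedField` record; it is not BiVelio's internal data model. Contracts, certificates and forms (which have an adaptive structure) are left out to keep it short.

```python
from dataclasses import dataclass

# Hypothetical snake_case names for the fields listed in the table above.
# Contracts, certificates and forms (adaptive structure) are omitted for brevity.
EXPECTED_FIELDS: dict[str, set[str]] = {
    "invoice": {"issuer", "tax_id", "date", "line_items", "amounts", "vat", "total"},
    "delivery_note": {"supplier", "products", "quantities", "delivery_date"},
    "id_document": {"first_name", "last_name", "document_number",
                    "date_of_birth", "nationality"},
    "receipt": {"establishment", "date", "amounts", "payment_method"},
}


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: object
    confidence: float  # between 0 and 1, as described under "Confidence level"

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence for {self.name!r} must be in [0, 1]")


def missing_fields(doc_type: str, fields: list[ExtractedField]) -> set[str]:
    """Expected fields for doc_type that the extraction did not return."""
    return EXPECTED_FIELDS[doc_type] - {f.name for f in fields}
```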

Confidence level

Each extracted field includes a confidence score between 0 and 1 indicating how reliable the extraction is:

0.90 to 1.00: high confidence, reliable data
0.70 to below 0.90: medium confidence, review recommended
Below 0.70: low confidence, manual review required

BiVelio automatically flags fields with confidence below 0.70 so that an operator can review them before using the data.
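The bands and the flagging rule can be sketched as below, treating each band as a half-open interval (0.90 and above is high, 0.70 up to but not including 0.90 is medium) so that every score maps to exactly one band. `band` and `fields_to_flag` are hypothetical helpers, not part of a documented BiVelio API.

```python
from enum import Enum


class ConfidenceBand(Enum):
    HIGH = "high confidence, reliable data"
    MEDIUM = "medium confidence, review recommended"
    LOW = "low confidence, manual review required"


HIGH_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.70  # fields below this are flagged for an operator


def band(confidence: float) -> ConfidenceBand:
    """Map a confidence score in [0, 1] to its band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= HIGH_THRESHOLD:
        return ConfidenceBand.HIGH
    if confidence >= REVIEW_THRESHOLD:
        return ConfidenceBand.MEDIUM
    return ConfidenceBand.LOW


def fields_to_flag(confidences: dict[str, float]) -> list[str]:
    """Names of fields an operator must review before the data is used."""
    return sorted(name for name, c in confidences.items()
                  if band(c) is ConfidenceBand.LOW)
```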

Integration with the system

Data extracted by OCR is automatically integrated with BiVelio modules:

With cases

Documents attached to cases are processed automatically
Extracted data is associated with the case as metadata
OCR-processed documents are searchable in full text from within the case
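Full-text search over OCR output is commonly built on an inverted index that maps each word to the documents containing it. The sketch below is a generic illustration of that technique using a hypothetical `CaseSearchIndex` class; it is not BiVelio's search engine, whose internals this page does not describe.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; \w is Unicode-aware, so accented letters are kept.
    return re.findall(r"\w+", text.lower())


class CaseSearchIndex:
    """Minimal inverted index over the OCR text of one case's documents."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, ocr_text: str) -> None:
        for token in tokenize(ocr_text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        """IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so the index itself is never mutated.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```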

With CRM

Contact data extracted from business cards or other documents is suggested when creating contacts
Identity documents are linked to the corresponding contact records

With billing

Supplier invoices are processed automatically
Tax data is extracted and verified
An accounting entry is suggested based on the extracted data
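A minimal sketch of the billing steps, assuming a single VAT rate per invoice and hypothetical account names: it checks that the extracted VAT and total agree with the net amount, then proposes a standard double-entry supplier-invoice entry. BiVelio's actual verification rules and chart of accounts are not specified in this section.

```python
from dataclasses import dataclass
from decimal import Decimal

CENT = Decimal("0.01")


@dataclass(frozen=True)
class JournalLine:
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


def suggest_entry(net: Decimal, vat_rate: Decimal, vat: Decimal,
                  total: Decimal) -> list[JournalLine]:
    """Verify the extracted tax amounts, then propose a supplier-invoice entry."""
    # Tax check: VAT must match net x rate (rounded to cents), and net + VAT must equal total.
    expected_vat = (net * vat_rate).quantize(CENT)
    if abs(expected_vat - vat) > CENT or abs(net + vat - total) > CENT:
        raise ValueError("inconsistent amounts: send the invoice to manual review")
    # Hypothetical account names; expense and input VAT are debited,
    # the amount owed to the supplier is credited.
    return [
        JournalLine("Purchases", debit=net),
        JournalLine("Input VAT", debit=vat),
        JournalLine("Suppliers payable", credit=total),
    ]
```

Using `Decimal` rather than `float` keeps currency amounts exact, so the debit and credit sides of the suggested entry balance to the cent.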

With workflows

The Data Extractor invocable agent can be used in workflow nodes
Documents can be processed automatically in batches
Extracted data can be validated with Decision nodes
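One way a Decision node could validate a batch of extracted data is sketched below, assuming each document's extraction arrives as a mapping from field name to `(value, confidence)`. `decide`, `route_batch` and the branch names `approve` and `review` are hypothetical; the 0.70 cut-off mirrors the low-confidence threshold described under "Confidence level".

```python
from collections.abc import Iterable

REVIEW_THRESHOLD = 0.70  # same cut-off as the low-confidence band

Extraction = dict[str, tuple[object, float]]  # field -> (value, confidence)


def decide(extracted: Extraction, required: Iterable[str]) -> str:
    """Return the Decision-node branch for one document: 'approve' or 'review'."""
    for name in required:
        if name not in extracted or extracted[name][0] in (None, ""):
            return "review"  # a required field is missing or empty
    if any(conf < REVIEW_THRESHOLD for _, conf in extracted.values()):
        return "review"  # at least one field needs an operator
    return "approve"


def route_batch(batch: dict[str, Extraction],
                required: Iterable[str]) -> dict[str, list[str]]:
    """Split a batch of documents into the two Decision-node branches."""
    required = list(required)  # allow a one-shot iterable to be reused per document
    branches: dict[str, list[str]] = {"approve": [], "review": []}
    for doc_id, extracted in batch.items():
        branches[decide(extracted, required)].append(doc_id)
    return branches
```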

Credit consumption

Operation	Credits
Basic OCR (text extraction)	1 credit
OCR + structured extraction (AI agent)	2-5 credits
Batch processing (>10 documents)	20% credit discount
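The credit arithmetic can be sketched as follows. The table does not say whether the 20% discount applies to the batch total or per document, nor how fractional credits are rounded; this sketch assumes a discount on the whole batch total with no rounding, and `batch_credits` is a hypothetical helper.

```python
from decimal import Decimal

BATCH_THRESHOLD = 10             # discount applies to batches of more than 10 documents
BATCH_DISCOUNT = Decimal("0.20")
MIN_CREDITS, MAX_CREDITS = 1, 5  # basic OCR = 1; OCR + structured extraction = 2-5


def batch_credits(per_document_credits: list[int]) -> Decimal:
    """Total credits for a batch, given each document's undiscounted cost."""
    for cost in per_document_credits:
        if not MIN_CREDITS <= cost <= MAX_CREDITS:
            raise ValueError(f"per-document cost out of range: {cost}")
    total = Decimal(sum(per_document_credits))
    if len(per_document_credits) > BATCH_THRESHOLD:
        # Assumption: the 20% discount applies to the whole batch, without rounding.
        total *= 1 - BATCH_DISCOUNT
    return total
```

Under these assumptions, twelve documents with basic OCR come to 9.6 credits, while a batch of exactly ten documents receives no discount.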

Supported languages

The OCR engine supports multiple languages for text extraction:

Spanish, Catalan, English, French, Portuguese, Italian, German
The language of each document is detected automatically