
OCR and document extraction

OCR processing pipeline

BiVelio integrates OCR (Optical Character Recognition) capabilities that allow you to digitize physical documents, extract text from images and obtain structured data from any type of document.

OCR process

Automatic detection

When a document is uploaded to BiVelio (manually or via ChannelHub), the system automatically detects whether it requires OCR processing:

  • PDFs with embedded text — indexed directly without OCR
  • Scanned PDFs (image) — processed with OCR to extract text
  • Images (JPG, PNG, TIFF) — processed with OCR
  • Text documents — indexed directly
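
The routing rules above can be sketched as a small function. This is a minimal illustration, not BiVelio's implementation: `Route`, `choose_route`, and the `has_embedded_text` flag are hypothetical names, and how the platform actually inspects a PDF for embedded text is not documented here, so the caller supplies that answer.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    INDEX_DIRECTLY = "index_directly"
    OCR = "ocr"


# Image formats listed in the documentation (common extension spellings).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def choose_route(filename: str, has_embedded_text: bool = False) -> Route:
    """Decide whether a document needs OCR (hypothetical helper).

    `has_embedded_text` is only consulted for PDFs.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return Route.OCR
    if suffix == ".pdf":
        return Route.INDEX_DIRECTLY if has_embedded_text else Route.OCR
    # Anything else is treated as a text document in this sketch.
    return Route.INDEX_DIRECTLY
```

For example, `choose_route("scan.pdf")` selects OCR, while `choose_route("report.pdf", has_embedded_text=True)` indexes the file directly.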

Processing pipeline

Document received
→ Type detection (text vs. image)
→ If image: OCR text extraction
→ Indexing for full-text search
→ Optional: structured data extraction by an AI agent
→ Data available in the system
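
The flow above could be modelled roughly as follows. This is a sketch under stated assumptions: `process_document`, `ProcessedDocument`, and the `run_ocr`, `index`, and `extract` callables are all hypothetical stand-ins for the OCR engine, the search index, and the AI agent, none of which are exposed under these names in the documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProcessedDocument:
    text: str
    used_ocr: bool
    structured: dict = field(default_factory=dict)


def process_document(
    raw: bytes,
    is_image: bool,
    run_ocr: Callable[[bytes], str],
    index: Callable[[str], None],
    extract: Optional[Callable[[str], dict]] = None,
) -> ProcessedDocument:
    # Type detection is assumed to have happened already (is_image).
    text = run_ocr(raw) if is_image else raw.decode("utf-8")
    # Every document is indexed for full-text search, OCR or not.
    index(text)
    # Structured extraction only runs when an extractor is supplied.
    structured = extract(text) if extract is not None else {}
    return ProcessedDocument(text=text, used_ocr=is_image, structured=structured)
```

Passing the stages in as callables keeps the optional AI step explicit: omit `extract` and the document is still indexed, but no structured data is produced.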

Intelligent data extraction

The AI layer on top of OCR allows you to go beyond text extraction: it interprets the content and generates structured data usable by the system.

Supported document types

| Document type | Extracted data |
| --- | --- |
| Invoices | Issuer, tax ID, date, line items, amounts, VAT, total |
| Delivery notes | Supplier, products, quantities, delivery date |
| ID / Passport | First name, last name, document number, date of birth, nationality |
| Contracts | Parties, subject matter, main conditions, dates, key clauses |
| Forms | Filled fields with their values (adaptive structure) |
| Receipts | Establishment, date, amounts, payment method |
| Certificates | Issuing entity, beneficiary, certified data, validity period |
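
To make "structured data" concrete, the invoice fields listed above might map onto a record like the one below. The class and field names are assumptions chosen for illustration; the actual schema BiVelio returns is not specified here. The `totals_match` check is likewise an illustrative consistency test, not a documented feature.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    amount: Decimal


@dataclass
class ExtractedInvoice:
    issuer: str
    tax_id: str
    issue_date: date
    line_items: list[LineItem]
    vat: Decimal
    total: Decimal

    def totals_match(self) -> bool:
        """True when line-item amounts plus VAT equal the stated total."""
        net = sum((item.amount for item in self.line_items), Decimal("0"))
        return net + self.vat == self.total
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing monetary amounts.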

Confidence level

Each extracted piece of data includes a confidence level (0-1) indicating the reliability of the extraction:

  • 0.90 - 1.00 — high confidence, reliable data
  • 0.70 - 0.89 — medium confidence, review recommended
  • < 0.70 — low confidence, manual review required

BiVelio automatically flags fields with confidence below 0.70 so that an operator can review them before using the data.
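
The thresholds above translate directly into a small classifier. This is a sketch, not the platform's code: `review_level` and `fields_needing_review` are hypothetical names, and treating values between 0.89 and 0.90 as medium is an assumption, since the documented bands do not cover that gap.

```python
def review_level(confidence: float) -> str:
    """Map a confidence score in [0, 1] to the documented review band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence >= 0.90:
        return "high"
    if confidence >= 0.70:
        # Assumption: scores in the undocumented 0.89-0.90 gap count as medium.
        return "medium"
    return "low"


def fields_needing_review(fields: dict[str, tuple[str, float]]) -> list[str]:
    """Return names of fields whose confidence is below 0.70.

    `fields` maps a field name to a (value, confidence) pair.
    """
    return [name for name, (_, conf) in fields.items() if conf < 0.70]
```

An operator queue could then be built from `fields_needing_review(...)`, leaving high-confidence fields untouched.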

Integration with the system

Data extracted by OCR is automatically integrated with BiVelio modules:

With cases

  • Documents attached to cases are processed automatically
  • Extracted data is associated with the case as metadata
  • Full-text search across OCR documents from within the case

With CRM

  • Contact data extracted from business cards or other documents is offered as a suggestion when creating contacts
  • Identity documents linked to contact records

With billing

  • Supplier invoices processed automatically
  • Tax data extracted and verified
  • Suggested accounting entry based on extracted data

With workflows

  • The Data Extractor invocable agent can be used in workflow nodes
  • Automated batch document processing
  • Validation of extracted data via Decision nodes

Credit consumption

| Operation | Credits |
| --- | --- |
| Basic OCR (text extraction) | 1 credit |
| OCR + structured extraction (AI agent) | 2-5 credits |
| Batch processing (>10 documents) | 20% credit discount |
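
The rates above can be turned into a rough cost estimate. The sketch below rests on several assumptions that the documentation does not confirm: that credits are charged per document, that the 20% discount applies to the whole batch once it exceeds 10 documents, and that fractional credits are not rounded. `estimate_credits` is a hypothetical helper; it returns a (low, high) range because structured extraction is listed as 2-5 credits.

```python
from decimal import Decimal

BASIC_OCR = (Decimal(1), Decimal(1))    # 1 credit
STRUCTURED = (Decimal(2), Decimal(5))   # 2-5 credits
BATCH_THRESHOLD = 10                    # discount applies above this count
BATCH_DISCOUNT = Decimal("0.20")


def estimate_credits(documents: int, structured: bool = False) -> tuple[Decimal, Decimal]:
    """Return the (low, high) credit estimate for a batch of documents."""
    if documents < 0:
        raise ValueError("documents must not be negative")
    low, high = STRUCTURED if structured else BASIC_OCR
    # Assumption: the discount covers every document in a qualifying batch.
    factor = Decimal(1) - BATCH_DISCOUNT if documents > BATCH_THRESHOLD else Decimal(1)
    return (low * documents * factor, high * documents * factor)
```

Under these assumptions, 20 documents with structured extraction would cost between 32 and 80 credits, while a batch of exactly 10 receives no discount.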

Supported languages

The OCR engine supports multiple languages for text extraction:

  • Spanish, Catalan, English, French, Portuguese, Italian, German
  • Automatic language detection from the document