Skip to main content
Back
API Integrations · Dev Tools

Zoho CRM & Books Migration Pipeline

Automated pipeline migrating thousands of historical invoices from Excel into Zoho Books and Zoho CRM — with LLM validation and 95%+ time saving.

PythonZoho APISQLiteLLMAutomationopenpyxl

End-to-end automation pipeline that parses years of historical Excel invoices and migrates them into Zoho Books (accounting) and Zoho CRM (sales) — including LLM-powered validation, smart VAT classification, and a full 7-sheet audit report.

Tech Stack

LayerTools
LanguagePython 3.10+
APIsZoho Books API · Zoho CRM API v3 (OAuth 2.0)
StorageSQLite — persistent state, resume after crash
Parsingopenpyxl · pandas · nameparser · thefuzz · phonenumbers · pgeocode
LLM ValidationOllama (local) · Qwen 2.5 14B
Output7-sheet Excel audit report

Scale

ThousandsInvoices Migrated
Multi-yearData History
95%+Time Saved vs Manual
7Report Sheets

Data Flow

Raw Company Folders (Excel + PDFs)
        │
        ▼
  excel_parser.py ──── smart_validators.py
  (dynamic table        (nameparser, thefuzz,
   detection)            phonenumbers, pgeocode)
        │
        ▼
  SQLite State DB  ◄──  llm_validator.py (optional)
        │                (Ollama · Qwen 2.5 14B)
        ├──▶ zoho_books_api.py ──▶ Customers + Invoices + Payments
        └──▶ zoho_crm_api.py  ──▶ Accounts + Contacts + Deals + PDFs

Pipeline Phases

  1. 01Extract — scan company folders, parse invoices dynamically (no fixed row numbers), detect payment status from file presence
  2. 02Validate — nameparser (0.70 threshold) for contact detection, phonenumbers for cleanup, pgeocode for postcode cross-check, thefuzz for company dedup flagging
  3. 03LLM Review (optional) — Qwen 2.5 14B reviews address classification, contact vs business name, missing fields, and billing sponsor detection
  4. 04Zoho Books Upload — create customers, line-item invoices with per-item VAT classification (Adult 20% · Youth zero-rated), record payments
  5. 05Zoho CRM Upload — create accounts, contacts, deals with stage logic (Paid→Closed Won · Unpaid >30d→Closed Lost), attach PDFs
  6. 06Report — export 7-sheet Excel workbook: overview, all invoices, flags, LLM results, financials math check, billing sponsors, dry-run plan

API Client — Rate Limiting & Token Auto-Refresh

The CRM client handles OAuth token expiry mid-run automatically and enforces a rate limit to stay within Zoho's API quota.

class RateLimiter:
    def __init__(self, max_per_minute: int = 80):
        self.interval = 60.0 / max_per_minute
        self.last_call = 0.0

    def wait(self):
        elapsed = time.time() - self.last_call
        if (gap := self.interval - elapsed) > 0:
            time.sleep(gap)
        self.last_call = time.time()


class ZohoCRMAPI:
    def __init__(self):
        self.access_token  = os.getenv('ZOHO_CRM_ACCESS_TOKEN')
        self._refresh_token = os.getenv('ZOHO_CRM_REFRESH_TOKEN')
        self._client_id    = os.getenv('ZOHO_CRM_CLIENT_ID') or os.getenv('ZOHO_CLIENT_ID')
        self._client_secret = os.getenv('ZOHO_CRM_CLIENT_SECRET') or os.getenv('ZOHO_CLIENT_SECRET')
        self.base_url      = f"https://www.zohoapis.{os.getenv('ZOHO_CRM_DOMAIN', 'com')}/crm/v3"
        self.rate_limiter  = RateLimiter(max_per_minute=80)

    def _request(self, method, endpoint, **kwargs):
        self.rate_limiter.wait()
        response = requests.request(method, f"{self.base_url}/{endpoint}", **kwargs)
        if response.status_code == 401:       # token expired mid-run
            self._do_token_refresh()
            response = requests.request(method, f"{self.base_url}/{endpoint}", **kwargs)
        return response

LLM Validation Design

The LLM reviews parser output for language and classification tasks only — it never touches prices, totals, or company names (those are deterministic).

class LLMValidator:
    """
    Design: LLM REVIEWS parser output, doesn't replace it.
    - Handles: address classification, contact vs business name,
               missing phone/email, billing sponsor detection
    - Never touches: company_name, prices, quantities, totals
    - Returns corrections + confidence; never auto-overwrites
    - Degrades gracefully if Ollama is unreachable
    """

    def validate_invoice(self, parsed_data: dict, raw_texts: list) -> dict:
        checks = [
            self._check_address_fields,
            self._check_contact_vs_business,
            self._check_missing_contact_info,
            self._check_billing_sponsor,
        ]
        results = {}
        for check in checks:
            results.update(check(parsed_data, raw_texts))
        return results

Resilience