API Integrations · Dev Tools
Zoho CRM & Books Migration Pipeline
Automated pipeline migrating thousands of historical invoices from Excel into Zoho Books and Zoho CRM — with LLM validation and 95%+ time saving.
PythonZoho APISQLiteLLMAutomationopenpyxl
End-to-end automation pipeline that parses years of historical Excel invoices and migrates them into Zoho Books (accounting) and Zoho CRM (sales) — including LLM-powered validation, smart VAT classification, and a full 7-sheet audit report.
Tech Stack
| Layer | Tools |
|---|---|
| Language | Python 3.10+ |
| APIs | Zoho Books API · Zoho CRM API v3 (OAuth 2.0) |
| Storage | SQLite — persistent state, resume after crash |
| Parsing | openpyxl · pandas · nameparser · thefuzz · phonenumbers · pgeocode |
| LLM Validation | Ollama (local) · Qwen 2.5 14B |
| Output | 7-sheet Excel audit report |
Scale
ThousandsInvoices Migrated
Multi-yearData History
95%+Time Saved vs Manual
7Report Sheets
Data Flow
Raw Company Folders (Excel + PDFs)
│
▼
excel_parser.py ──── smart_validators.py
(dynamic table (nameparser, thefuzz,
detection) phonenumbers, pgeocode)
│
▼
SQLite State DB ◄── llm_validator.py (optional)
│ (Ollama · Qwen 2.5 14B)
├──▶ zoho_books_api.py ──▶ Customers + Invoices + Payments
└──▶ zoho_crm_api.py ──▶ Accounts + Contacts + Deals + PDFs
Pipeline Phases
- 01Extract — scan company folders, parse invoices dynamically (no fixed row numbers), detect payment status from file presence
- 02Validate — nameparser (0.70 threshold) for contact detection, phonenumbers for cleanup, pgeocode for postcode cross-check, thefuzz for company dedup flagging
- 03LLM Review (optional) — Qwen 2.5 14B reviews address classification, contact vs business name, missing fields, and billing sponsor detection
- 04Zoho Books Upload — create customers, line-item invoices with per-item VAT classification (Adult 20% · Youth zero-rated), record payments
- 05Zoho CRM Upload — create accounts, contacts, deals with stage logic (Paid→Closed Won · Unpaid >30d→Closed Lost), attach PDFs
- 06Report — export 7-sheet Excel workbook: overview, all invoices, flags, LLM results, financials math check, billing sponsors, dry-run plan
API Client — Rate Limiting & Token Auto-Refresh
The CRM client handles OAuth token expiry mid-run automatically and enforces a rate limit to stay within Zoho's API quota.
class RateLimiter:
def __init__(self, max_per_minute: int = 80):
self.interval = 60.0 / max_per_minute
self.last_call = 0.0
def wait(self):
elapsed = time.time() - self.last_call
if (gap := self.interval - elapsed) > 0:
time.sleep(gap)
self.last_call = time.time()
class ZohoCRMAPI:
def __init__(self):
self.access_token = os.getenv('ZOHO_CRM_ACCESS_TOKEN')
self._refresh_token = os.getenv('ZOHO_CRM_REFRESH_TOKEN')
self._client_id = os.getenv('ZOHO_CRM_CLIENT_ID') or os.getenv('ZOHO_CLIENT_ID')
self._client_secret = os.getenv('ZOHO_CRM_CLIENT_SECRET') or os.getenv('ZOHO_CLIENT_SECRET')
self.base_url = f"https://www.zohoapis.{os.getenv('ZOHO_CRM_DOMAIN', 'com')}/crm/v3"
self.rate_limiter = RateLimiter(max_per_minute=80)
def _request(self, method, endpoint, **kwargs):
self.rate_limiter.wait()
response = requests.request(method, f"{self.base_url}/{endpoint}", **kwargs)
if response.status_code == 401: # token expired mid-run
self._do_token_refresh()
response = requests.request(method, f"{self.base_url}/{endpoint}", **kwargs)
return response
LLM Validation Design
The LLM reviews parser output for language and classification tasks only — it never touches prices, totals, or company names (those are deterministic).
class LLMValidator:
"""
Design: LLM REVIEWS parser output, doesn't replace it.
- Handles: address classification, contact vs business name,
missing phone/email, billing sponsor detection
- Never touches: company_name, prices, quantities, totals
- Returns corrections + confidence; never auto-overwrites
- Degrades gracefully if Ollama is unreachable
"""
def validate_invoice(self, parsed_data: dict, raw_texts: list) -> dict:
checks = [
self._check_address_fields,
self._check_contact_vs_business,
self._check_missing_contact_info,
self._check_billing_sponsor,
]
results = {}
for check in checks:
results.update(check(parsed_data, raw_texts))
return results
Resilience
- Crash recovery — SQLite tracks every item; re-runs skip completed records
- Retry logic — 429 waits for
Retry-After; 5xx uses exponential backoff (2s → 4s → 8s) - Dry-run mode — simulates all API calls, generates the full plan report without touching Zoho
- 97–98% parse accuracy — English and French invoices handled; LLM fills gaps on failed extractions