PHILIP CHEUNG

Head of AI · LLM Orchestration & Decision Runtime Architect · Multi-Product AI Platform

MSc Computer Science, AI & Data Science (Merit)
London, United Kingdom
Contact via form
www.philip.pm

Professional Summary

Head of AI at Weidmann & Cie. AG, owning the firm's AI platform, LLM orchestration and decision-runtime engineering across a multi-product portfolio: Qorinix (regulated AI decision runtime), LaSpend (AI financial intelligence) and Fixxmi (AI-powered Swiss service marketplace), plus the institutional AI-native trading desk. MSc Computer Science (AI & Data Science, Merit), University of Wolverhampton.

Ships production LLM systems end-to-end: multi-provider routing across OpenAI, Anthropic Claude, Google Gemini, DeepSeek, Qwen, OpenRouter, NVIDIA and Cloudflare with time-to-first-token (TTFT) p50/p95 analytics, a cost ledger and automatic fail-over; audit-grade RAG with policy guardrails, evidence packs and human-review escalation; workflow-aware state machines for approval, affordability and exception handling; and a distributed inference mesh serving millions of predictions daily at sub-100ms latency on AWS, GCP and Cloudflare.

Led design, training and deployment of the in-house Qorinix LLM, a domain-tuned model for regulated, latency-sensitive decision support. Qorinix delivers 5x to 60x faster TTFT and response times than mainstream frontier LLMs on comparable workloads, with guardrailed reasoning and evidence-packed output running fully inside the firm's perimeter.

MSc dissertation on LLM-Augmented High-Frequency Trading (65% Sharpe-ratio uplift over baseline). Holds Harvard CS50x, the Google MLOps Specialization and Gemini Certified Educator credentials. Concurrently serves as Head of Quants: an engineer who builds production LLM platforms and institutional-grade trading runtimes from the same keyboard, turning AI research into audit-safe, revenue-bearing product.

Notable Achievements

AI Platform & LLM Orchestration

  • Multi-provider LLM router with TTFT p50/p95 analytics, automatic fail-over and per-call cost ledger
  • Shipped audit-grade decision runtime: policy engine, evidence packs, guardrails, human-review gate
  • Sub-100ms inference mesh serving millions of predictions daily across signal, sentiment & anomaly

Multi-Product AI Delivery

  • Qorinix LLM: in-house domain-tuned model, 5x-60x faster TTFT than mainstream frontier LLMs on regulated workloads
  • LaSpend: AI-driven subscription & affordability intelligence with deterministic fallback
  • Fixxmi: AI-powered Swiss B2C service marketplace with intelligent lead matching

Technical Leadership

  • Set AI/ML engineering standards: prompt registry, model evaluation, canary deploy, drift monitoring
  • Built four production systems in parallel on AWS · GCP · Cloudflare with compliance rigour
  • Mentored a team of 6+ engineers across LLM, backend, frontend and MLOps disciplines

Research & Credentials

  • MSc dissertation: LLM-Augmented High-Frequency Trading, 65% Sharpe-ratio uplift over baseline
  • Gemini Certified Educator (Google) · Harvard CS50x · Google MLOps Specialization
  • 10+ years systematic-trading stewardship informs regulated, latency-critical AI design

Core Technical Competencies

LLM & Generative AI
GPT-5 · Claude · Gemini · OpenRouter · DeepSeek · Qwen · NVIDIA NIM · Function Calling · MCP · Hugging Face

AI Platform Architecture
LLM Router · Multi-Provider Fallback · Prompt Registry · Policy Engine · Guardrails · Audit Ledger · Evidence Pack · TTFT p50/p95 · Cost Governance

Agent & Workflow AI
Workflow State Machines · RAG · LangChain · Human-Review Gate · Case Runtime · Tool Composition · Memory Tiers · Eval Harness · Canary Deploy

ML & Deep Learning
PyTorch · TensorFlow · Transformers · Fine-tuning · Quantisation · LSTM/GRU · XGBoost · RL · MLOps

Programming & Frameworks
Python · TypeScript · Next.js 15 · React 19 · FastAPI · Hono · Node.js · C++ · Rust · Tailwind

Data, Storage & Retrieval
PostgreSQL · pgvector · Redis · TimescaleDB · Firestore · D1 · R2 · Pinecone · Weaviate

Cloud & DevOps
AWS · GCP · Cloudflare Workers · Firebase · Vercel · Docker · Kubernetes · OpenTelemetry

Product, Billing & Quant
Stripe · Entitlements · Usage Ledger · Webhooks · Open Banking · HFT · Market Making · FIX API · CCXT
Unified LLM × HFT Tech Stack

Professional Experience

Head of AI · LLM Orchestration & Decision Runtime Architect

Weidmann & Cie. AG
London, UK · December 2024 – Present

Owning AI platform engineering, LLM orchestration and decision-runtime delivery across the firm's multi-product portfolio: Qorinix (regulated AI decision runtime), LaSpend (AI financial intelligence), Fixxmi (AI-powered Swiss service marketplace), plus the firm's AI-native trading desk. Architect of the in-house Qorinix LLM.

AI Platform & Multi-Provider LLM Orchestration
  • Built an internal multi-provider LLM router with cost / latency / risk policy and automatic fail-over across OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, OpenRouter, NVIDIA and Cloudflare; sub-100ms TTFT hot path with tiered routing across fast, reasoning and fallback model tiers
  • Designed a prompt registry with version pinning, policy prompts and evidence-pack assembly; every execution records prompt version, model route, policy checks, cost and support lineage into a tamper-evident audit ledger
  • Shipped the LLM Arena benchmark harness comparing 12 inference providers side-by-side with TTFT p50/p95, tokens/sec, cost-per-task leaderboard and streaming token-by-token diff
  • MLOps discipline: prompt/eval harness, canary deployment, drift monitoring, per-strategy cost ledger, multi-region active-active deployment with health-weighted failover
Qorinix LLM & Regulated Decision Runtime
  • Led end-to-end design, training and deployment of the in-house Qorinix LLM, a domain-tuned model for regulated decision support; curated a proprietary multi-billion-token training corpus and built mixture-of-experts routing across policy reasoning, affordability analysis, risk critique and compliance review
  • 5x to 60x faster TTFT vs mainstream frontier LLMs on comparable workloads through custom inference kernels, speculative decoding and domain fine-tuning; runs fully inside the firm's own perimeter
  • Built the Qorinix workflow-aware decision runtime: case state machine (draft → evaluating → review_required → approved/rejected), context governance (PII redaction, evidence-pack retrieval, token-budget control), guardrails (policy-conflict, hallucination, evidence-basis checks) and human-in-the-loop escalation for high-risk cases
  • Delivered policy-based approval assist, subscription affordability intelligence, cash-flow stress early warning and exception-handling orchestration; billing, usage, audit and entitlements are treated as P0 day-one requirements, not bolt-ons
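
The case lifecycle named above (draft → evaluating → review_required → approved/rejected) can be sketched as a small guarded state machine. This is an illustrative Python sketch only: the class names, the `evaluate` helper, and the assumption that low-risk cases may go straight from evaluating to approved or rejected are not taken from the Qorinix runtime.

```python
from enum import Enum


class CaseState(str, Enum):
    DRAFT = "draft"
    EVALUATING = "evaluating"
    REVIEW_REQUIRED = "review_required"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed transitions. The direct evaluating -> approved/rejected edges are an
# assumption for low-risk cases; the source only lists the linear path.
ALLOWED = {
    CaseState.DRAFT: {CaseState.EVALUATING},
    CaseState.EVALUATING: {CaseState.REVIEW_REQUIRED, CaseState.APPROVED, CaseState.REJECTED},
    CaseState.REVIEW_REQUIRED: {CaseState.APPROVED, CaseState.REJECTED},
    CaseState.APPROVED: set(),
    CaseState.REJECTED: set(),
}


class InvalidTransition(ValueError):
    pass


class Case:
    def __init__(self) -> None:
        self.state = CaseState.DRAFT
        self.history = [CaseState.DRAFT]

    def advance(self, target: CaseState) -> None:
        """Move to target only if the transition table allows it."""
        if target not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {target.value}")
        self.state = target
        self.history.append(target)


def evaluate(case: Case, high_risk: bool, policy_ok: bool) -> None:
    """Run a draft case through evaluation; high-risk cases stop at human review."""
    case.advance(CaseState.EVALUATING)
    if high_risk:
        case.advance(CaseState.REVIEW_REQUIRED)
    else:
        case.advance(CaseState.APPROVED if policy_ok else CaseState.REJECTED)
```

Keeping the transition table as data makes every state change easy to log, and it lets a human reviewer's decision on a `review_required` case reuse the same `advance` guard.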
LaSpend & Fixxmi, Consumer-Facing AI Products
  • LaSpend (AI financial intelligence, UK market): designed the AI-orchestration layer for subscription detection, affordability scoring and cash-flow stress alerts over Open Banking data, with deterministic fallback when models fail and strict PII-minimisation into the LLM; Cloudflare Pages + Functions + D1 architecture; Safety-before-Smartness and Cost-aware Routing governance
  • Fixxmi (Swiss-first B2C service marketplace): AI-powered lead-matching and intent classification across 12+ service categories in DE-CH/EN; Next.js 15 + Firebase + Cloud Functions; Firestore real-time admin dashboard; nFADP/GDPR-compliant data boundaries
  • Unified cross-product AI platform: shared prompt registry, eval suites, provider routing, usage ledger and Stripe entitlement plumbing so every product inherits audit-grade AI from day one
AI-Native Trading & Research Systems
  • Transformer-based predictive models for market-direction forecasting with NLP sentiment engine ingesting 500K+ articles / social / alt-data daily; RAG over proprietary research and earnings corpora with grounded citation
  • Distributed ML inference pipeline (<100ms latency) integrated with execution infra at Equinix LD4, LD6 & AWS London region; event-driven order-book reconstruction on Qorinix-annotated features
Python · TypeScript · PyTorch · FastAPI · Next.js 15 · React 19 · Cloudflare Workers · D1 · Firebase · PostgreSQL · pgvector · Redis · LangChain · RAG · LLM Router · Stripe · OpenTelemetry

Lead AI Engineer & Technical Architect

Pacific Cloud Computing Ltd.
Hong Kong & Remote UK · January 2015 – December 2024

Spearheaded AI transformation (Dec 2021 – Dec 2024), establishing the firm's AI/ML practice and production intelligent systems processing millions of predictions daily; concurrently led enterprise platform engineering across SaaS, analytics and integration products.

AI / ML Platform Leadership
  • Designed and deployed distributed ML inference system achieving <100ms latency for real-time predictions, serving 5M+ requests / day with warm-pool autoscaling and multi-region failover
  • Built comprehensive RAG system (LangChain + vector DB) reducing information retrieval time by 85% while maintaining 94% grounded-answer accuracy
  • Developed end-to-end MLOps pipeline with automated retraining, A/B testing and canary deployment, improving model performance 40% Q-over-Q
  • Implemented transformer-based sentiment analysis processing 500K+ documents daily across news, social and alt-data feeds
  • Established AI/ML best practices, conducted architecture reviews and mentored a team of 6 engineers across LLM, data and MLOps tracks
Enterprise Platforms (2015 – 2021)
  • Enterprise Document Management System: multi-tenant SaaS, 100+ clients, 1M+ daily API requests, 99.9% uptime
  • Analytics Dashboard Platform: real-time visualisation, 10GB+ daily data, 60% reduction in report generation time
  • E-commerce Integration Suite: multi-platform API layer, automated inventory sync, 5+ payment-gateway integrations
Applied Quantitative Research (Personal Programme)
  • Ran a personal systematic-trading programme throughout the tenure as an applied R&D lab for ML features, inference latency and backtesting rigour that fed back into the firm's AI platform patterns
  • Python predictive models, MT4/MT5 + CCXT backtesting automation, AI-driven feature discovery; authored investment-grade quantitative performance reports
Python · PyTorch · LangChain · Pinecone · FastAPI · React · Node.js · MongoDB · PostgreSQL · Docker · AWS · MLOps · Transformers · RAG · A/B Testing

Senior Product Manager (Technical) / Web Manager

Groupon.com
Hong Kong · April 2013 – December 2014
  • Led strategic account management and coordinated with international merchants for brand positioning
  • Developed and maintained project plans, cost estimation and resource allocation
  • Oversaw inventory management, supply chain coordination, and pricing structure
  • Conducted comprehensive market analysis to support business operations

Senior Operations Manager

SoManyCall Telecom
Hong Kong · March 2008 – April 2013
  • Oversaw strategic and operational aspects within the telecommunications software sector
  • Achieved 25% annual growth rate through strategic leadership
  • Led multi-disciplinary team, implementing development programs
  • Managed entire project lifecycle for custom software solutions

Research & Selected Projects

LLM Arena, 12-Provider Benchmark · LLM Infra

Side-by-side real-time benchmark harness across 12 frontier and fast-inference providers. Streaming token-by-token with live leaderboard, TTFT p50/p95 analytics, tokens/sec, estimated cost-per-task and service-tier selectors. Per-provider model picker, run-count (1-20) for statistical stability and quick TTFT-turbo profiles. Next.js 14 · TypeScript · server-side API isolation · streaming-SSE.
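
To show what the TTFT p50/p95 figures mean in practice, here is a minimal Python sketch of timing the first streamed token and summarising repeated runs. The function names, the injectable `clock` parameter, the nearest-rank percentile method and the synthetic sample values are all assumptions for illustration; the harness itself is described as a Next.js / TypeScript application and may compute these differently.

```python
import math
import time
from typing import Callable, Iterable


def measure_ttft(stream: Iterable[str], clock: Callable[[], float] = time.perf_counter) -> float:
    """Seconds from starting to consume the stream until the first token arrives."""
    start = clock()
    for _ in stream:
        return clock() - start
    raise ValueError("stream produced no tokens")


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 20 synthetic TTFT samples in milliseconds (10, 20, ..., 200), mirroring the
# maximum run-count of 20; real values would come from measure_ttft.
ttfts_ms = [float(v) for v in range(10, 210, 10)]
p50 = percentile(ttfts_ms, 50)
p95 = percentile(ttfts_ms, 95)
```

With only a handful of runs the p95 is dominated by one or two slow samples, which is one reason a configurable run-count helps with statistical stability.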

LaSpend, AI Financial Intelligence · Consumer AI

UK-market AI-driven subscription & affordability intelligence layered over Open Banking. AI-Orchestration with Safety-before-Smartness: deterministic fallback when models fail, cost-aware routing, prompt versioning with compliance post-check. PII-minimised inputs (whitelist fields only), informational-only outputs. Cloudflare Pages + Functions + D1 architecture, React 19 + Vite + Tailwind frontend.

Fixxmi, AI Swiss Service Marketplace · Consumer AI

Swiss-first B2C service-job lead marketplace with AI-powered intent classification and lead matching across 12+ categories in DE-CH / EN. Pay-per-lead monetisation with Stripe Credit Packs, nFADP / GDPR-compliant data boundaries, Firestore-backed real-time admin dashboard. Next.js 15 static export + Firebase Cloud Functions (Node 20) + Firestore europe-west6.

Audit-Grade AI Governance Loop · Governance

Closed-loop flow applied across every firm AI product: policy check → evidence-pack retrieval → prompt control → model router → action runtime → audit ledger → policy feedback. Every call versioned, costed, support-traceable and replayable. Content-addressed evidence packs, per-product cost attribution, human-in-the-loop escalation for high-risk cases.
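
One common way to obtain content-addressed evidence packs and a tamper-evident ledger is to hash canonical JSON and chain each entry to the hash of the previous one. The Python sketch below illustrates that general technique only; the `content_address` and `AuditLedger` names, the record fields and the SHA-256 hash chain are assumptions, not a description of the firm's ledger.

```python
import hashlib
import json
from typing import Any


def content_address(evidence: dict[str, Any]) -> str:
    """SHA-256 of canonical JSON, so identical evidence always maps to the same ID."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


class AuditLedger:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def _digest(self, prev_hash: str, record: dict[str, Any]) -> str:
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

    def append(self, record: dict[str, Any]) -> str:
        """Append a record, chaining it to the hash of the previous entry."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = self._digest(prev_hash, record)
        self.entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


ledger = AuditLedger()
pack_id = content_address({"policy": "affordability-v1", "docs": ["doc-a", "doc-b"]})
ledger.append({"prompt_version": "v3", "model_route": "fast", "evidence_pack": pack_id, "cost_usd": 0.002})
ledger.append({"prompt_version": "v3", "model_route": "fallback", "evidence_pack": pack_id, "cost_usd": 0.001})
```

Because each record carries the evidence-pack ID, prompt version and model route, a past call can in principle be located and replayed against the same inputs, and `verify()` flags any later edit to a stored record.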

Sub-100ms Inference Mesh · ML Infra

Distributed ML inference pipeline serving millions of predictions / day at <100ms latency across classification, sentiment, affordability scoring and anomaly detection. Shard-aware routing, warm-pool autoscaling, ONNX-compiled models with fp16 quantisation, cache-through Redis tiering, multi-region active-active with health-weighted failover. FastAPI · Redis · TimescaleDB · Triton.

LLM-Augmented HFT (MSc Research) · Research

Novel architecture fusing LLM narrative understanding with real-time HFT. Custom transformer with attention heads over microstructure features, ensemble gating and latency-aware inference scheduling that down-routes heavy heads on tight time budgets. 65% Sharpe-ratio uplift over baseline on 5-year out-of-sample walk-forward evaluation. Python, PyTorch, CCXT, Ray Serve, Triton.

Education

MSc Computer Science, AI & Data Science
University of Wolverhampton, UK
2023 – 2025 | Grade: Merit
Dissertation: LLM-Augmented High-Frequency Trading Strategy Development
Coursework: Deep Learning, NLP, Machine Learning, Neural Networks, Distributed Systems, MLOps

Bachelor of Business Administration
Hong Kong University of Science & Technology
1995
Marketing with Information Systems minor

Certifications

Gemini Certified Educator
Google | 2025–2028

Securities & Futures Commission of Hong Kong Exams (Papers 1/7/8/12)
HKSI | 2021

CS50x Computer Science
Harvard | 2024

MLOps Specialization
Google Cloud | 2024

Languages

English: Native proficiency
Cantonese: Native proficiency
Mandarin: Professional working proficiency