By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — March 7, 2026, 07:57
Lablup makes AI infrastructure accessible. Its platform, Backend.AI, was the first NVIDIA DGX-Ready software in Asia Pacific: a container-based MLOps platform with patented fractional GPU virtualization that lets organizations slice a single GPU into multiple isolated workloads without code changes. Founded in 2015 in Seoul by a team of physicists and computer scientists, the company now serves 100+ organizations across telecommunications, finance, healthcare, manufacturing, government, and education, including Samsung Electronics, Samsung Medical Center, KT Cloud, LG Electronics, Bank of Korea, and Hyundai Mobis.
In January 2026, Lablup was selected as part of the only all-startup consortium for Korea’s Sovereign AI Foundation Model project, deploying Backend.AI on SK Telecom’s “Haein” cluster of 504 NVIDIA B200 GPUs. There it cut downtime by 47% and completed a 102-billion-parameter model’s 20-trillion-token pre-training in 66 days, 40% faster than projected. At CES 2026, Lablup unveiled Backend.AI:GO, a desktop application that runs small language models offline on consumer PCs.
This is a company building the GPU infrastructure layer for the AI economy. WorkingAgents operates at the operational layer above it. The two connect in ways that matter.
What Lablup Does
Lablup builds the platform that sits between raw GPU hardware and the people who need to use it. You have GPUs — a DGX cluster, a rack of H100s, a mix of consumer RTX cards and datacenter accelerators — and you need multiple teams, projects, and workloads sharing them efficiently. Backend.AI manages that.
The Core Technology: fGPU
This is Lablup’s differentiator. fGPU (fractional GPU virtualization) is a patented CUDA virtualization layer that partitions GPUs at the container level. An H100 with 80GB of VRAM becomes ten 8GB virtual GPUs, each isolated and allocated to different users or workloads.
Key details:
- Container-level partitioning — No code changes required. Applications use CUDA normally; fGPU intercepts and regulates at the API level
- Patented in Korea, US, and Japan
- CUDA 8-12 support — Works across generations
- Consumer to datacenter — RTX desktop GPUs through DGX systems
- First released September 2018 — “global-first fractional GPU”
- First commercial inference October 2023 — “world-first fractional GPU-based commercial AI inference”
This solves a real problem. A researcher training a small model does not need an entire H100. Without fractional GPU, that H100 sits mostly idle during their session. With fGPU, five researchers share it simultaneously, each getting the memory and compute they need, isolated from each other.
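The arithmetic behind that sharing is easy to sketch. The snippet below is a purely illustrative model of fractional slicing, not Lablup's fGPU implementation or API; the helper names are invented for this example:

```python
# Hypothetical sketch of fractional GPU slicing -- NOT Lablup's fGPU API.
# An 80 GB device is divided into fixed-size virtual GPUs; each request
# greedily claims enough slices to cover its memory need.

def slice_gpu(total_mem_gb: int, slice_gb: int) -> list[int]:
    """Partition a physical GPU into equal virtual-GPU memory slices."""
    return [slice_gb] * (total_mem_gb // slice_gb)

def allocate(slices: list[int], request_gb: int) -> list[int]:
    """Claim slices until the request is satisfied, or fail."""
    claimed, acc = [], 0
    for s in slices:
        if acc >= request_gb:
            break
        claimed.append(s)
        acc += s
    if acc < request_gb:
        raise RuntimeError("insufficient GPU capacity")
    return claimed

vgpus = slice_gpu(80, 8)    # an 80 GB H100 as ten 8 GB virtual GPUs
job = allocate(vgpus, 16)   # a 16 GB job claims two slices
```

The point of the sketch is the shape of the problem: once a device is expressed as fungible slices, five researchers can each hold a few slices of the same card instead of one researcher idling a whole H100.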
The Platform
Backend.AI — The core infrastructure platform. Container-based session management with pluggable accelerator support:
| Component | Function |
|---|---|
| Manager | Central orchestrator — session scheduling, REST/GraphQL API routing |
| Agent | Per-node controller — container lifecycles, resource monitoring |
| Storage Proxy | Unified virtual folder interface with real-time metrics |
| Web Server | SPA hosting and session management |
| App Proxy | Service routing for in-container apps (Jupyter, VSCode, terminals) |
Sokovan Scheduler — Multi-tier orchestration with NUMA-aware resource mapping. Extended Dominant Resource Fairness (DRF) algorithms distribute compute across mixed GPU clusters. Supports 11+ accelerator vendors: NVIDIA CUDA, AMD ROCm, Intel Gaudi, Google TPU, Graphcore IPU, FuriosaAI NPU, Rebellions, Tenstorrent, HyperAccel.
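Dominant Resource Fairness is worth a quick illustration. The toy scheduler below is the textbook DRF loop, not Sokovan's extended implementation: each round it grants one task bundle to the user whose dominant share (their largest fraction of any single resource) is smallest:

```python
# Minimal Dominant Resource Fairness (DRF) sketch -- illustrative only,
# not Sokovan's implementation. `capacity` is the cluster total,
# `demands` is each user's per-task resource bundle.

def drf(capacity: dict, demands: dict, rounds: int) -> dict:
    alloc = {u: {r: 0.0 for r in capacity} for u in demands}
    for _ in range(rounds):
        def dominant(u):
            # A user's dominant share: largest fraction of any resource.
            return max(alloc[u][r] / capacity[r] for r in capacity)
        user = min(demands, key=dominant)
        need = demands[user]
        used = {r: sum(alloc[u][r] for u in alloc) for r in capacity}
        # Stop once the chosen user's bundle no longer fits.
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            break
        for r in capacity:
            alloc[user][r] += need[r]
    return alloc

# Classic example: 9 CPUs and 18 GPUs; user A wants <1 CPU, 4 GPU>
# per task, user B wants <3 CPU, 1 GPU> per task.
out = drf({"cpu": 9, "gpu": 18},
          {"A": {"cpu": 1, "gpu": 4}, "B": {"cpu": 3, "gpu": 1}}, 20)
```

With these numbers the loop converges to A holding 3 tasks and B holding 2, equalizing their dominant shares at two thirds, which is exactly the fairness property that makes DRF attractive for mixed GPU clusters.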
Backend.AI FastTrack 3 — Enterprise MLOps pipeline platform. DAG-based workflows for data preprocessing, model training, validation, deployment, and monitoring. Drag-and-drop GUI plus CLI. Airflow integration for existing users.
PALI Service Suite — Model inference launcher supporting Llama, Gemma, Qwen with card-based UI. Includes PALANG for fine-tuning and chatbot comparison.
Backend.AI Continuum — Distributed fault tolerance. Seamlessly bridges cloud and on-premises environments, automatically switching to local resources during cloud outages.
Backend.AI:GO — Desktop application running small language models offline on consumer PCs. Document analysis, image understanding, code review — all without an internet connection.
Deployment Options
| Mode | Description |
|---|---|
| On-Premises | Full platform on your hardware, including air-gapped environments |
| Public Cloud | AWS, Azure, GCP, Naver Cloud |
| Hybrid | Cloud + on-prem with automatic failover (Continuum) |
| Air-gapped | Offline installer packages, private package repository (Reservoir) |
Accelerator Support
Backend.AI is hardware-agnostic in a way most competitors are not:
| Vendor | Accelerator |
|---|---|
| NVIDIA | CUDA (desktop RTX through DGX) |
| AMD | ROCm |
| Intel | Gaudi |
| Google | TPU |
| Graphcore | IPU |
| FuriosaAI | NPU |
| Rebellions | NPU |
| Tenstorrent | NPU |
| HyperAccel | NPU |
Eleven accelerator vendors on one platform. Most MLOps competitors support NVIDIA only.
The Numbers
| Lablup | Value |
|---|---|
| Founded | April 2015 |
| Headquarters | Seoul, South Korea (Gangnam-gu) |
| US Office | San Jose, California |
| Total funding | $9.64M |
| Series A | $7.89M (April 2023) |
| Lead investors | K2 Investment Partners, LB Investment |
| Revenue (2024) | $1.5M |
| Profitable since | 2020 |
| Organizations served | 100+ |
| Team size | ~40 |
| GitHub stars | 623 |
| GPU patents | Korea, US, Japan |
| Accelerator vendors | 11+ |
| DGX-Ready | First in APAC (August 2021) |
| Sovereign AI cluster | 504 NVIDIA B200 GPUs |
| Model trained | 102B parameters, 20T tokens, 66 days |
The Sovereign AI Story
In January 2026, Lablup joined the Upstage-led consortium — the only all-startup team — selected for Phase 1 of Korea’s Sovereign AI Foundation Model project. The consortium includes Upstage (lead, model development), Lablup (GPU infrastructure), Flitto (data preprocessing), and Nota (model compression). Academic partners span Sogang University, KAIST, Stanford, and NYU.
Lablup deployed Backend.AI on SK Telecom’s “Haein” GPU cluster:
- 504 NVIDIA B200 GPUs — 480 for pre-training, 24 backup
- 47% reduction in downtime through automated failure detection and recovery
- 20 trillion tokens pre-trained in 66 days (projected: 120 days — 40% faster)
- 102B-parameter Mixture of Experts model (12B activated during inference)
- Open license nearly equivalent to Apache for commercial use
This is Backend.AI operating at national scale on current-generation hardware, managing the most demanding workloads in AI — large-scale pre-training — and delivering ahead of schedule.
Competitive Position
Backend.AI’s direct competitors include Run:ai (acquired by NVIDIA in 2024), Determined AI (acquired by HPE), ClearML, and Domino Data Lab. Two of the four have been acquired, which narrows the independent options.
Lablup’s differentiators:
- Only APAC-native DGX-Ready Software — First-mover in Korean, Japanese, and Southeast Asian enterprise markets
- Patented fractional GPU — Container-level virtualization, not hardware-dependent partitioning (MIG/MPS). Works on consumer GPUs, not just datacenter SKUs
- 11+ accelerator vendors — Broadest hardware support. Critical for organizations with mixed GPU fleets or planning multi-vendor strategies
- Air-gapped deployment — Offline installer packages for classified and regulated environments
- Open-source core — LGPL-3.0 server, MIT clients. Enterprise features layered on top
- Profitable since 2020 — Sustainable business at 40 people, not a cash-burning scale-up
Why This Matters for WorkingAgents
Lablup manages GPU infrastructure. WorkingAgents manages operational workflows. Lablup decides which GPU runs which workload. WorkingAgents decides which task runs when, who gets notified, and what happens if something fails.
The gap between “your model is trained” and “the right person knows, the next step is scheduled, and the audit trail is complete” — that gap is where WorkingAgents lives.
The Synergy Map
1. Operational Orchestration for ML Pipelines
Backend.AI FastTrack 3 manages the computational side of ML pipelines — DAGs for preprocessing, training, validation, deployment. But ML pipelines generate operational needs that go beyond compute orchestration:
- Training completes → Who gets notified? WorkingAgents’ Pushover integration alerts the ML engineer on their phone
- Validation fails → Create a task. WorkingAgents’ task manager assigns the investigation, sets a deadline, tracks resolution
- Model deployed → Schedule a performance review in 7 days. WorkingAgents’ alarm system fires and triggers an evaluation workflow
- GPU quota exceeded → Escalate. WorkingAgents creates an escalation chain: notify the team lead → if no response in 2 hours, notify the manager → log the entire chain
Backend.AI handles: “run this training job on these GPUs with this data.” WorkingAgents handles: “when it finishes, tell these people, schedule these follow-ups, and escalate if nobody responds.”
FastTrack manages the ML lifecycle. WorkingAgents manages the human lifecycle around it.
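The division of labor above amounts to event routing. The sketch below shows the shape of it with invented event names and handlers; neither the events nor the `OpsRouter` class are real Backend.AI or WorkingAgents APIs:

```python
# Hypothetical event router mapping pipeline events to operational
# actions. Event names and handlers are illustrative only.

from dataclasses import dataclass, field

@dataclass
class OpsRouter:
    handlers: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def on(self, event: str, action):
        """Register an operational action for a pipeline event."""
        self.handlers.setdefault(event, []).append(action)

    def emit(self, event: str, payload: dict):
        """Fire every registered action and keep an audit record."""
        for action in self.handlers.get(event, []):
            self.log.append(action(payload))

router = OpsRouter()
router.on("training.done", lambda p: f"notify {p['owner']} via Pushover")
router.on("validation.failed", lambda p: f"open task for {p['owner']}")
router.emit("training.done", {"owner": "ml-eng"})
```

The audit log accumulating alongside the handlers is the key design point: every operational consequence of a compute event leaves a record.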
2. GPU Resource Management + Scheduling
Backend.AI’s Sokovan scheduler allocates GPUs in real time based on demand. But GPU resource management has temporal dimensions that a real-time scheduler does not handle:
- Scheduled training windows — “Run this training job every night at 2 AM when the cluster is least loaded.” WorkingAgents’ alarm system schedules the trigger. Backend.AI’s scheduler allocates the GPUs.
- Recurring evaluations — “Re-evaluate model performance against production data every Monday.” WorkingAgents schedules. Backend.AI executes.
- Resource recovery follow-ups — Backend.AI automatically reclaims idle GPUs (0% utilization, low CPU, no network). WorkingAgents can schedule a check: “If this user’s session was reclaimed 3 times this week, notify the admin to discuss resource allocation.”
- Quota management — WorkingAgents tracks GPU usage per team per month in its per-user databases. Alarm fires when a team approaches their quota limit.
Backend.AI schedules compute. WorkingAgents schedules everything else.
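The "2 AM nightly window" case reduces to simple next-occurrence arithmetic. This is a generic sketch of that calculation, not WorkingAgents' alarm implementation (which adds persistence and crash recovery on top):

```python
# Sketch of a recurring trigger: compute the next daily firing time for
# a nightly training window. Illustrative scheduling arithmetic only.

from datetime import datetime, timedelta

def next_fire(now: datetime, hour: int = 2) -> datetime:
    """Next daily occurrence of `hour`:00, strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

# At 09:30 the next 2 AM window is tomorrow; at 01:00 it is today.
t = next_fire(datetime(2026, 3, 7, 9, 30))
```

In the combined stack, this timestamp is what the alarm system persists; when it fires, the actual GPU placement is still entirely Sokovan's decision.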
3. Air-Gapped and Sovereign Deployment Alignment
Both platforms deploy in environments where no data leaves the perimeter:
| Requirement | Backend.AI | WorkingAgents |
|---|---|---|
| Air-gapped deployment | Offline installer packages, Reservoir | Elixir/BEAM, zero cloud dependencies |
| On-premises | Full platform on customer hardware | Per-user SQLite on local storage |
| Data isolation | Container-level isolation, fGPU | Per-user databases, encrypted keys |
| No external calls | Private package repository | No external API requirements |
| Classified environments | DGX-Ready for government systems | Access control with audit trails |
Lablup is already in Korea’s Sovereign AI project, deploying on classified infrastructure with 504 B200 GPUs. WorkingAgents was designed for the same constraints — per-user data isolation, access control at every tool call, crash-recoverable scheduling, all running on Elixir/BEAM with zero cloud dependencies.
A sovereign AI deployment needs both: Backend.AI to manage the GPU infrastructure, WorkingAgents to manage the operational workflows — training schedules, access permissions, escalation chains, audit trails — all inside the air gap.
4. Multi-Team Access Control
Backend.AI manages compute access — which users get which GPUs, how much memory, for how long. WorkingAgents manages operational access — which users can trigger which tools, who can see which data, who can escalate to whom.
In a multi-team AI research environment:
- Backend.AI controls: Team A gets 4 H100s for training. Team B gets 2 A100s for inference. The intern gets 8GB of a shared T4.
- WorkingAgents controls: Team A’s lead can schedule overnight training runs. Team B can query model performance dashboards. The intern can view task boards but cannot trigger deployments.
Both platforms enforce permissions, but at different layers. Backend.AI gates compute resources. WorkingAgents gates operational actions. Combined, every resource and every action is permission-gated.
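A minimal model of the two layers makes the separation concrete. Role names, quota strings, and action names below are hypothetical, not either product's real ACL schema:

```python
# Illustrative two-layer access model -- hypothetical roles and actions.

# Layer 1 (compute, Backend.AI analog): who gets which resources.
COMPUTE_QUOTA = {"team-a-lead": "4x H100", "intern": "8 GB T4 slice"}

# Layer 2 (operational, WorkingAgents analog): who may do what.
OPS_PERMS = {
    "team-a-lead": {"schedule_training", "view_tasks"},
    "intern": {"view_tasks"},
}

def can(user: str, action: str) -> bool:
    """Gate an operational action independently of compute quota."""
    return action in OPS_PERMS.get(user, set())

allowed = can("team-a-lead", "schedule_training")
denied = can("intern", "trigger_deployment")
```

Note that the intern still holds a compute quota; holding GPU access says nothing about operational rights, which is why the two gates are checked separately.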
5. Model Deployment Lifecycle Orchestration
Backend.AI FastTrack handles the technical deployment — containerize, serve, monitor. WorkingAgents handles the business process around deployment:
```
Model training completes on Backend.AI
  → WorkingAgents creates review task for ML lead
  → If not reviewed in 24 hours → Pushover notification
  → ML lead approves → WorkingAgents triggers deployment via Backend.AI API
  → Alarm schedules performance check in 48 hours
  → Performance check runs → if degradation detected
      → WorkingAgents creates rollback task
      → Escalates to team lead
      → Logs entire chain with timestamps
```
Every step is persistent. Every alarm survives restarts. Every escalation has an audit trail. This is the operational governance that turns model deployment from a technical event into a managed business process.
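That flow is a small state machine. The sketch below mirrors the narrative with invented state and event names; it is not a real WorkingAgents API:

```python
# Hypothetical state machine for the deployment lifecycle above.
# States, events, and transitions are illustrative only.

TRANSITIONS = {
    ("trained", "approve"): "deployed",
    ("trained", "timeout"): "escalated",      # 24 h with no review
    ("escalated", "approve"): "deployed",
    ("deployed", "degraded"): "rollback_task",
}

def step(state: str, event: str) -> str:
    """Advance the lifecycle; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

# Walk one unhappy path: review times out, lead approves late,
# then degradation triggers a rollback task. Each hop is logged.
trail, state = [], "trained"
for event in ("timeout", "approve", "degraded"):
    state = step(state, event)
    trail.append(state)
```

Modeling the lifecycle as explicit transitions is what makes the audit trail cheap: the trail is just the sequence of states visited, with timestamps attached.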
6. CRM and Contact Management for AI Teams
Lablup serves 100+ organizations. Each organization has ML engineers, data scientists, team leads, administrators, and executives who interact with the GPU platform. WorkingAgents’ NIS (CRM) module manages these relationships:
- Contact tracking — Who are the key stakeholders at each customer? When was the last interaction?
- Follow-up scheduling — “Check in with Samsung Medical Center about their GPU utilization in 2 weeks”
- Pipeline management — Track prospects from initial demo to deployment to expansion
- Interaction logging — Record every meeting, email, and support call with timestamps
For Lablup’s sales and customer success teams, WorkingAgents provides the operational CRM that keeps customer relationships active and accountable — with per-user data isolation ensuring each team member’s contacts and notes are private.
7. Backend.AI:GO + WorkingAgents: The Edge AI Stack
Backend.AI:GO runs small language models offline on consumer PCs. WorkingAgents runs on Elixir/BEAM anywhere. Together, they create an edge AI stack:
- GO provides the inference — document analysis, code review, image understanding, all offline
- WorkingAgents provides the orchestration — schedule recurring analysis tasks, track results, send notifications when anomalies are detected, maintain audit logs
- Both work offline — No cloud dependency for either component
For regulated industries where data cannot leave the premises — healthcare facilities, financial institutions, government agencies — this combined edge stack provides AI capabilities with full operational control.
8. NVIDIA Ecosystem Connection
Both companies operate deep in the NVIDIA ecosystem:
- Lablup: DGX-Ready Software, GTC multi-year sponsor (Silver Sponsor at GTC 2025, Booth #547), Inception Program member, B200 cluster deployments
- WorkingAgents: Designed for deployment alongside NVIDIA infrastructure, compatible with NVIDIA NIM microservices, positioned for sovereign AI deployments on NVIDIA hardware
Lablup and WorkingAgents at GTC represent complementary layers: Lablup manages NVIDIA’s GPUs, WorkingAgents manages the operational workflows that surround GPU workloads. A joint demo — “fractional GPU training with automated scheduling, notifications, and escalation” — speaks directly to GTC’s enterprise audience.
The Partnership Path
Phase 1: API Integration
Connect WorkingAgents to Backend.AI’s REST/GraphQL API. WorkingAgents’ alarm system triggers training jobs, monitors completion, and manages follow-up workflows. Backend.AI events feed into WorkingAgents’ task manager for human-in-the-loop tracking.
Phase 2: Operational Layer for FastTrack
Position WorkingAgents as the business process layer around FastTrack 3’s ML pipelines. Training completion → notification → review → deployment approval → monitoring — all orchestrated by WorkingAgents with audit trails and access control.
Phase 3: Sovereign AI Reference Stack
Combine Backend.AI’s air-gapped GPU management with WorkingAgents’ air-gapped operational orchestration for Korea’s sovereign AI infrastructure. Both platforms deploy on-premises with zero cloud dependencies. Co-publish the reference architecture for government and regulated industry deployments.
Phase 4: APAC Market Entry
Lablup has deep enterprise relationships across Korea, Japan, and Southeast Asia — Samsung, KT, LG, Hyundai, Bank of Korea. WorkingAgents enters these markets as the operational companion to Backend.AI deployments. Same customers, different layer, one integrated stack.
The Bottom Line
Lablup built the GPU management layer — fractional virtualization that turns expensive hardware into shared, efficient infrastructure. WorkingAgents built the operational management layer — scheduling, task tracking, notifications, and escalation that turns compute events into managed business processes.
Backend.AI tells you which GPU is running which workload. WorkingAgents tells you who needs to know, what should happen next, and what to do when something goes wrong. Backend.AI’s fGPU slices a GPU into ten workloads. WorkingAgents’ alarm system ensures all ten workloads have follow-up schedules, completion notifications, and escalation chains.
Both platforms deploy on-premises, both work air-gapped, both enforce access control, both maintain audit trails. The convergence is not just technical — it is philosophical. Lablup’s motto is “Make AI Accessible.” WorkingAgents’ design principle is “Make AI Operational.” Accessible compute plus operational orchestration equals AI that works in production, not just in notebooks.
They built the GPU layer. We built the workflow layer. Both run inside the same locked room.
Sources: