Lablup: The Fractional GPU Platform and Where WorkingAgents Fits

By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — March 7, 2026, 07:57


Lablup makes AI infrastructure accessible. Their platform, Backend.AI, was the first NVIDIA DGX-Ready Software in Asia Pacific — a container-based MLOps platform with patented fractional GPU virtualization that lets organizations slice a single GPU into multiple isolated workloads without code changes. Founded in 2015 in Seoul by a team of physicists and computer scientists, the company now serves 100+ organizations across telecommunications, finance, healthcare, manufacturing, government, and education — including Samsung Electronics, Samsung Medical Center, KT Cloud, LG Electronics, Bank of Korea, and Hyundai Mobis.

In January 2026, Lablup was selected as part of the only all-startup consortium for Korea’s Sovereign AI Foundation Model project, deploying Backend.AI on SK Telecom’s “Haein” cluster with 504 NVIDIA B200 GPUs — where it reduced downtime by 47% and completed a 102-billion-parameter model’s 20 trillion token pre-training in 66 days, 40% faster than projected. They unveiled Backend.AI:GO at CES 2026 — a desktop application running small language models offline on consumer PCs.

This is a company building the GPU infrastructure layer for the AI economy. WorkingAgents operates at the operational layer above it. The two connect in ways that matter.

What Lablup Does

Lablup builds the platform that sits between raw GPU hardware and the people who need to use it. You have GPUs — a DGX cluster, a rack of H100s, a mix of consumer RTX cards and datacenter accelerators — and you need multiple teams, projects, and workloads sharing them efficiently. Backend.AI manages that.

The Core Technology: fGPU

This is Lablup’s differentiator. fGPU (fractional GPU virtualization) is a patented CUDA virtualization layer that partitions GPUs at the container level. An H100 with 80GB of VRAM becomes ten 8GB virtual GPUs, each isolated and allocated to different users or workloads.

This solves a real problem. A researcher training a small model does not need an entire H100. Without fractional GPU, that H100 sits mostly idle during their session. With fGPU, five researchers share it simultaneously, each getting the memory and compute they need, isolated from each other.
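The accounting behind this kind of sharing can be illustrated with a toy model. This is a sketch of fractional allocation as bookkeeping only, not Lablup's patented fGPU implementation:

```python
# Toy model of fractional GPU memory accounting -- illustrative only,
# not Lablup's patented fGPU implementation.
from dataclasses import dataclass, field

@dataclass
class FractionalGPU:
    total_mem_gb: int = 80            # e.g. one H100
    allocations: dict = field(default_factory=dict)

    def allocate(self, workload: str, mem_gb: int) -> bool:
        """Grant an isolated memory slice if capacity remains."""
        used = sum(self.allocations.values())
        if used + mem_gb > self.total_mem_gb:
            return False              # would oversubscribe the card
        self.allocations[workload] = mem_gb
        return True

gpu = FractionalGPU()
for user in ["a", "b", "c", "d", "e"]:
    gpu.allocate(user, 8)             # five researchers, 8 GB each
print(sum(gpu.allocations.values()))  # prints 40
```

The real system adds compute-share isolation and container-level enforcement; the sketch only shows why five 8GB slices leave half an H100 free for more work.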

The Platform

Backend.AI — The core infrastructure platform. Container-based session management with pluggable accelerator support:

| Component | Function |
| --- | --- |
| Manager | Central orchestrator — session scheduling, REST/GraphQL API routing |
| Agent | Per-node controller — container lifecycles, resource monitoring |
| Storage Proxy | Unified virtual folder interface with real-time metrics |
| Web Server | SPA hosting and session management |
| App Proxy | Service routing for in-container apps (Jupyter, VSCode, terminals) |

Sokovan Scheduler — Multi-tier orchestration with NUMA-aware resource mapping. Extended Dominant Resource Fairness (DRF) algorithms distribute compute across mixed GPU clusters. Supports 11+ accelerator vendors, including NVIDIA CUDA, AMD ROCm, Intel Gaudi, Google TPU, Graphcore IPU, FuriosaAI NPU, Rebellions, Tenstorrent, and HyperAccel.
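The base DRF idea that Sokovan extends is compact: a user's dominant share is their largest usage fraction across resources, and the next allocation goes to whoever's dominant share is smallest. A sketch of the generic algorithm (not Sokovan's extended version; capacities and team names are made up):

```python
# Generic Dominant Resource Fairness step -- a sketch of the base
# algorithm that Sokovan extends, not Lablup's implementation.
CAPACITY = {"gpu": 8.0, "cpu": 64.0}   # hypothetical cluster totals

def dominant_share(usage: dict) -> float:
    """A user's dominant share is their largest usage fraction."""
    return max(usage[r] / CAPACITY[r] for r in CAPACITY)

def next_user(usages: dict) -> str:
    """DRF serves the user whose dominant share is smallest."""
    return min(usages, key=lambda u: dominant_share(usages[u]))

usages = {
    "team-a": {"gpu": 2.0, "cpu": 8.0},   # dominant share 0.25 (gpu)
    "team-b": {"gpu": 1.0, "cpu": 32.0},  # dominant share 0.50 (cpu)
}
print(next_user(usages))  # prints team-a
```

Even though team-b holds fewer GPUs, its CPU footprint makes its dominant share larger, so fairness favors team-a for the next grant.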

Backend.AI FastTrack 3 — Enterprise MLOps pipeline platform. DAG-based workflows for data preprocessing, model training, validation, deployment, and monitoring. Drag-and-drop GUI plus CLI. Airflow integration for existing users.
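The DAG-based workflow idea can be shown with a minimal topological-order runner. The stage names mirror the lifecycle described above but are illustrative; this is not FastTrack's actual API:

```python
# Minimal DAG pipeline ordering -- illustrative of DAG-based workflows,
# not FastTrack 3's actual API. Stage names are hypothetical.
from graphlib import TopologicalSorter

# Stage -> set of stages it depends on.
pipeline = {
    "preprocess": set(),
    "train":      {"preprocess"},
    "validate":   {"train"},
    "deploy":     {"validate"},
    "monitor":    {"deploy"},
}

# A valid execution order respects every dependency edge.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # prints ['preprocess', 'train', 'validate', 'deploy', 'monitor']
```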

PALI Service Suite — Model inference launcher supporting Llama, Gemma, Qwen with card-based UI. Includes PALANG for fine-tuning and chatbot comparison.

Backend.AI Continuum — Distributed fault tolerance. Seamlessly bridges cloud and on-premises environments, automatically switching to local resources during cloud outages.

Backend.AI:GO — Desktop application running small language models offline on consumer PCs. Document analysis, image understanding, code review — all without an internet connection.

Deployment Options

| Mode | Description |
| --- | --- |
| On-Premises | Full platform on your hardware, including air-gapped environments |
| Public Cloud | AWS, Azure, GCP, Naver Cloud |
| Hybrid | Cloud + on-prem with automatic failover (Continuum) |
| Air-gapped | Offline installer packages, private package repository (Reservoir) |

Accelerator Support

Backend.AI is hardware-agnostic in a way most competitors are not:

| Vendor | Accelerator |
| --- | --- |
| NVIDIA | CUDA (desktop RTX through DGX) |
| AMD | ROCm |
| Intel | Gaudi |
| Google | TPU |
| Graphcore | IPU |
| FuriosaAI | NPU |
| Rebellions | NPU |
| Tenstorrent | NPU |
| HyperAccel | NPU |

Eleven accelerator vendors on one platform. Most MLOps competitors support NVIDIA only.

The Numbers

| Lablup | Value |
| --- | --- |
| Founded | April 2015 |
| Headquarters | Seoul, South Korea (Gangnam-gu) |
| US office | San Jose, California |
| Total funding | $9.64M |
| Series A | $7.89M (April 2023) |
| Lead investors | K2 Investment Partners, LB Investment |
| Revenue (2024) | $1.5M |
| Profitable since | 2020 |
| Organizations served | 100+ |
| Team size | ~40 |
| GitHub stars | 623 |
| GPU patents | Korea, US, Japan |
| Accelerator vendors | 11+ |
| DGX-Ready | First in APAC (August 2021) |
| Sovereign AI cluster | 504 NVIDIA B200 GPUs |
| Model trained | 102B parameters, 20T tokens, 66 days |

The Sovereign AI Story

In January 2026, Lablup joined the Upstage-led consortium — the only all-startup team — selected for Phase 1 of Korea’s Sovereign AI Foundation Model project. The consortium includes Upstage (lead, model development), Lablup (GPU infrastructure), Flitto (data preprocessing), and Nota (model compression). Academic partners span Sogang University, KAIST, Stanford, and NYU.

Lablup deployed Backend.AI on SK Telecom’s “Haein” GPU cluster:

- 504 NVIDIA B200 GPUs
- 47% reduction in downtime
- A 102-billion-parameter model pre-trained on 20 trillion tokens in 66 days, 40% faster than projected

This is Backend.AI operating at national scale on current-generation hardware, managing the most demanding workloads in AI — large-scale pre-training — and delivering ahead of schedule.

Competitive Position

Backend.AI’s direct competitors include Run:ai (acquired by NVIDIA in 2024), Determined AI (acquired by HPE), ClearML, and Domino Data Lab. Two of the four have been acquired, which narrows the independent options.

Lablup’s differentiators:

- Patented fGPU fractional virtualization (Korea, US, Japan)
- Support for 11+ accelerator vendors, where most competitors are NVIDIA-only
- On-premises and air-gapped deployment, through to classified environments
- Independence: two of its four direct competitors have been acquired, and Lablup has been profitable since 2020
- Proven at national scale on the 504-GPU “Haein” cluster

Why This Matters for WorkingAgents

Lablup manages GPU infrastructure. WorkingAgents manages operational workflows. Lablup decides which GPU runs which workload. WorkingAgents decides which task runs when, who gets notified, and what happens if something fails.

The gap between “your model is trained” and “the right person knows, the next step is scheduled, and the audit trail is complete” — that gap is where WorkingAgents lives.

The Synergy Map

1. Operational Orchestration for ML Pipelines

Backend.AI FastTrack 3 manages the computational side of ML pipelines — DAGs for preprocessing, training, validation, deployment. But ML pipelines generate operational needs that go beyond compute orchestration: completion notifications, scheduled follow-ups, and escalation when nobody responds.

Backend.AI handles: “run this training job on these GPUs with this data.” WorkingAgents handles: “when it finishes, tell these people, schedule these follow-ups, and escalate if nobody responds.”

FastTrack manages the ML lifecycle. WorkingAgents manages the human lifecycle around it.
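The "notify, schedule, escalate" hand-off can be sketched as a pure function over job events. The event shape and routing rules here are hypothetical, not WorkingAgents' API:

```python
# Sketch of the operational hand-off after a compute event completes.
# The event shape and routing rules are hypothetical, not
# WorkingAgents' actual API.
def follow_ups(event: dict) -> list[dict]:
    """Map a finished training job to the human-facing actions it needs."""
    actions = [{"kind": "notify", "to": event["owner"],
                "msg": f"job {event['job_id']} {event['status']}"}]
    if event["status"] == "failed":
        # Escalate to the team lead if nobody responds in time.
        actions.append({"kind": "escalate", "to": event["team_lead"],
                        "after_hours": 24})
    else:
        # Success still needs a follow-up: review the metrics later.
        actions.append({"kind": "schedule", "task": "review-metrics",
                        "after_hours": 48})
    return actions

acts = follow_ups({"job_id": "train-42", "status": "failed",
                   "owner": "alice", "team_lead": "bob"})
print([a["kind"] for a in acts])  # prints ['notify', 'escalate']
```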

2. GPU Resource Management + Scheduling

Backend.AI’s Sokovan scheduler allocates GPUs in real time based on demand. But GPU resource management has temporal dimensions that a real-time scheduler does not handle: recurring utilization reviews, scheduled maintenance follow-ups, and time-based escalation when requests go unanswered.

Backend.AI schedules compute. WorkingAgents schedules everything else.

3. Air-Gapped and Sovereign Deployment Alignment

Both platforms deploy in environments where no data leaves the perimeter:

| Requirement | Backend.AI | WorkingAgents |
| --- | --- | --- |
| Air-gapped deployment | Offline installer packages, Reservoir | Elixir/BEAM, zero cloud dependencies |
| On-premises | Full platform on customer hardware | Per-user SQLite on local storage |
| Data isolation | Container-level isolation, fGPU | Per-user databases, encrypted keys |
| No external calls | Private package repository | No external API requirements |
| Classified environments | DGX-Ready for government systems | Access control with audit trails |

Lablup is already in Korea’s Sovereign AI project, deploying on classified infrastructure with 504 B200 GPUs. WorkingAgents was designed for the same constraints — per-user data isolation, access control at every tool call, crash-recoverable scheduling, all running on Elixir/BEAM with zero cloud dependencies.

A sovereign AI deployment needs both: Backend.AI to manage the GPU infrastructure, WorkingAgents to manage the operational workflows — training schedules, access permissions, escalation chains, audit trails — all inside the air gap.
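Per-user data isolation of the kind described above can be sketched with one SQLite file per user, using only the standard library. The layout and schema are illustrative, not WorkingAgents' actual design:

```python
# Toy per-user SQLite isolation: each user gets a separate database
# file, so one user's data never shares a storage unit with another's.
# Layout and schema are illustrative, not WorkingAgents' actual design.
import sqlite3
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())       # stands in for local app storage

def user_db(user: str) -> sqlite3.Connection:
    """Open (creating if needed) the database owned by one user."""
    conn = sqlite3.connect(root / f"{user}.db")
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    return conn

alice = user_db("alice")
alice.execute("INSERT INTO notes VALUES ('customer call at 3pm')")
alice.commit()

bob = user_db("bob")                  # separate file, separate data
print(bob.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # prints 0
```

Isolation by file, rather than by row-level filters in a shared database, is what makes the model easy to audit inside an air gap: each user's data is a single artifact on local disk.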

4. Multi-Team Access Control

Backend.AI manages compute access — which users get which GPUs, how much memory, for how long. WorkingAgents manages operational access — which users can trigger which tools, who can see which data, who can escalate to whom.

In a multi-team AI research environment, both platforms enforce permissions, but at different layers. Backend.AI gates compute resources. WorkingAgents gates operational actions. Combined, every resource and every action is permission-gated.
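Gating every tool call on a permission check, with an audit entry per invocation, can be sketched as a decorator. The permission model and tool names are hypothetical, not WorkingAgents' implementation:

```python
# Sketch of access control at the tool-call layer: every invocation is
# checked against the caller's grants and logged. The permission model
# and tool names are hypothetical, not WorkingAgents' implementation.
import functools

GRANTS = {"alice": {"deploy_model"}, "bob": set()}
AUDIT: list[tuple] = []

def gated(tool_name: str):
    """Wrap a tool so every call is permission-checked and audited."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user, *args, **kwargs):
            allowed = tool_name in GRANTS.get(user, set())
            AUDIT.append((user, tool_name, allowed))   # audit trail
            if not allowed:
                raise PermissionError(f"{user} may not call {tool_name}")
            return fn(user, *args, **kwargs)
        return inner
    return wrap

@gated("deploy_model")
def deploy_model(user: str, model_id: str) -> str:
    return f"{model_id} deployed by {user}"

print(deploy_model("alice", "m-7"))   # prints m-7 deployed by alice
```

Note that denied calls are logged too: the audit trail records attempts, not just successes.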

5. Model Deployment Lifecycle Orchestration

Backend.AI FastTrack handles the technical deployment — containerize, serve, monitor. WorkingAgents handles the business process around deployment:

Model training completes on Backend.AI
  → WorkingAgents creates review task for ML lead
  → If not reviewed in 24 hours → Pushover notification
  → ML lead approves → WorkingAgents triggers deployment via Backend.AI API
  → Alarm schedules performance check in 48 hours
  → Performance check runs → if degradation detected
    → WorkingAgents creates rollback task
    → Escalates to team lead
    → Logs entire chain with timestamps

Every step is persistent. Every alarm survives restarts. Every escalation has an audit trail. This is the operational governance that turns model deployment from a technical event into a managed business process.
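The chain above can be modeled as explicit, timestamped state transitions. This is a sketch: the states mirror the flow, while persistence and the alarm mechanics are omitted:

```python
# Sketch of the deployment-governance chain as explicit, logged state
# transitions. States mirror the flow above; persistence and alarms
# are omitted for brevity.
import time

TRANSITIONS = {
    "trained":   ["in_review"],
    "in_review": ["approved", "escalated"],
    "approved":  ["deployed"],
    "deployed":  ["checked"],
    "checked":   ["ok", "rollback"],
}

class Deployment:
    def __init__(self):
        self.state = "trained"
        self.log = [("trained", time.time())]   # audit trail

    def advance(self, new_state: str) -> None:
        """Allow only transitions named in the governance chain."""
        if new_state not in TRANSITIONS.get(self.state, []):
            raise ValueError(f"{self.state} -> {new_state} not allowed")
        self.state = new_state
        self.log.append((new_state, time.time()))

d = Deployment()
for step in ["in_review", "approved", "deployed", "checked", "ok"]:
    d.advance(step)
print(len(d.log))  # prints 6
```

Making the transition table explicit is what turns a deployment from an ad-hoc event into a checkable process: any step outside the chain is rejected and every accepted step is timestamped.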

6. CRM and Contact Management for AI Teams

Lablup serves 100+ organizations. Each organization has ML engineers, data scientists, team leads, administrators, and executives who interact with the GPU platform. WorkingAgents’ NIS (CRM) module manages these relationships: the contacts, notes, and follow-ups around each account.

For Lablup’s sales and customer success teams, WorkingAgents provides the operational CRM that keeps customer relationships active and accountable — with per-user data isolation ensuring each team member’s contacts and notes are private.

7. Backend.AI:GO + WorkingAgents: The Edge AI Stack

Backend.AI:GO runs small language models offline on consumer PCs. WorkingAgents runs on Elixir/BEAM anywhere. Together, they create an edge AI stack: local inference from Backend.AI:GO, local scheduling, tasks, and notifications from WorkingAgents.

For regulated industries where data cannot leave the premises — healthcare facilities, financial institutions, government agencies — this combined edge stack provides AI capabilities with full operational control.

8. NVIDIA Ecosystem Connection

Both companies operate deep in the NVIDIA ecosystem: Backend.AI was the first NVIDIA DGX-Ready software in Asia Pacific and now runs Korea’s sovereign training cluster of 504 B200 GPUs, while WorkingAgents orchestrates the workflows around NVIDIA-powered workloads.

Lablup and WorkingAgents at GTC represent complementary layers: Lablup manages NVIDIA’s GPUs, WorkingAgents manages the operational workflows that surround GPU workloads. A joint demo — “fractional GPU training with automated scheduling, notifications, and escalation” — speaks directly to GTC’s enterprise audience.

The Partnership Path

Phase 1: API Integration

Connect WorkingAgents to Backend.AI’s REST/GraphQL API. WorkingAgents’ alarm system triggers training jobs, monitors completion, and manages follow-up workflows. Backend.AI events feed into WorkingAgents’ task manager for human-in-the-loop tracking.

Phase 2: Operational Layer for FastTrack

Position WorkingAgents as the business process layer around FastTrack 3’s ML pipelines. Training completion → notification → review → deployment approval → monitoring — all orchestrated by WorkingAgents with audit trails and access control.

Phase 3: Sovereign AI Reference Stack

Combine Backend.AI’s air-gapped GPU management with WorkingAgents’ air-gapped operational orchestration for Korea’s sovereign AI infrastructure. Both platforms deploy on-premises with zero cloud dependencies. Co-publish the reference architecture for government and regulated industry deployments.

Phase 4: APAC Market Entry

Lablup has deep enterprise relationships across Korea, Japan, and Southeast Asia — Samsung, KT, LG, Hyundai, Bank of Korea. WorkingAgents enters these markets as the operational companion to Backend.AI deployments. Same customers, different layer, one integrated stack.

The Bottom Line

Lablup built the GPU management layer — fractional virtualization that turns expensive hardware into shared, efficient infrastructure. WorkingAgents built the operational management layer — scheduling, task tracking, notifications, and escalation that turns compute events into managed business processes.

Backend.AI tells you which GPU is running which workload. WorkingAgents tells you who needs to know, what should happen next, and what to do when something goes wrong. Backend.AI’s fGPU slices a GPU into ten workloads. WorkingAgents’ alarm system ensures all ten workloads have follow-up schedules, completion notifications, and escalation chains.

Both platforms deploy on-premises, both work air-gapped, both enforce access control, both maintain audit trails. The convergence is not just technical — it is philosophical. Lablup’s motto is “Make AI Accessible.” WorkingAgents’ design principle is “Make AI Operational.” Accessible compute plus operational orchestration equals AI that works in production, not just in notebooks.

They built the GPU layer. We built the workflow layer. Both run inside the same locked room.
