The hardest question in industrial AI in 2026 is not which model to use — it's what hardware to put under it. The wrong rack-and-cooling decision can stall a plant AI program for a year, lock you into vendor decisions that don't age well, and double your TCO before the first model trains. The right one delivers plant-floor inference in milliseconds, keeps regulated data inside your perimeter, and scales from one production line to a multi-site fleet without re-architecting. This guide walks through the full on-prem AI hardware stack for industrial enterprises — NVIDIA DGX systems for training and inference, Jetson Thor and IGX Thor for the edge, cooling and power strategies that actually work in plant environments, and the sizing and deployment framework iFactory uses to deliver turnkey AI infrastructure in 6 to 12 weeks. Because iFactory ships both on-prem appliances and fully managed cloud, this isn't a partisan argument — it's the playbook for picking what fits.
On-Prem AI Server Hardware Guide for Industrial Enterprises
NVIDIA Blackwell-class DGX systems, GB200 NVL72 rack-scale infrastructure, Jetson Thor for the plant edge, and the cooling, networking, and power architecture that holds it all together. iFactory delivers the full stack as a pre-configured turnkey appliance — or as fully managed cloud, your choice.
What You Get with iFactory's On-Prem Hardware Stack
Hardware is a kit problem only if you build it that way. iFactory ships the complete on-prem AI infrastructure as a single integrated delivery — racked, cabled, cooled, networked, and tested before it ever rolls onto your dock. The same bundle is also available as fully managed cloud if you'd rather skip the rack altogether.
The full hardware bundle
- Pre-configured NVIDIA AI server — DGX B200, DGX B300, or GB200 NVL72 depending on plant scale, software pre-loaded, drivers tested.
- Edge AI devices — Jetson Thor or IGX Thor at the plant floor for sub-50ms inference and functional safety.
- Cooling infrastructure — air-cooled for H100/H200 deployments, direct-to-chip liquid cooling for Blackwell B200/B300/GB200.
- Network fabric — Quantum InfiniBand for multi-node, Spectrum-X Ethernet for hybrid clusters, NVLink 5.0 for intra-node.
- Power and rack engineering — PDU sizing, redundancy, IT/OT segmentation built into the design.
- Storage tier — NVMe local for hot data, parallel filesystem for training, S3-compatible object for cold and compliance.
- Field deployment — cabling, network, PLC integration, operator training, 24×7 remote monitoring.
- Cloud alternative — same models, same software, fully managed in iFactory Cloud if on-prem isn't the right fit.
Why On-Prem Hardware Still Matters in 2026
Cloud is excellent. Most workloads run there. So why does on-prem hardware still command serious capex in industrial enterprises? Three reasons, all sharper in 2026 than they were three years ago — and none of them about ideological purity.
Data sovereignty & security
Regulated chemicals, defense manufacturing, pharma APIs, sensitive IP — plants where production data legally or contractually cannot leave the perimeter. On-prem isn't a preference; it's a compliance requirement.
Latency for control loops
Real-time RL schedulers, vision inspection at line speed, closed-loop quality decisions — all need sub-50ms inference. Cloud round-trips through plant DMZs can't deliver. Local hardware can.
Continuity during outages
Plants don't stop when the cloud link does. On-prem hardware keeps inference running through internet outages, ISP failures, and DMZ misconfigurations — the AI doesn't go dark when the WAN does.
The fourth reason — quietly the strongest — is total cost of ownership at sustained utilization. Cloud GPU rentals are excellent for spiky workloads and experimentation. For 24×7 production inference workloads running at 70%+ utilization, on-prem hardware amortizes faster than most CFOs expect. iFactory's standard sizing exercise quantifies the crossover point for your specific workload before you commit to either deployment. Book a sizing session to see the math on your plant.
The Three-Tier Hardware Architecture
iFactory's on-prem hardware stack is built in three tiers. Each tier has a specific job, a specific hardware class, and a specific deployment pattern. You don't pick one — you compose the right mix for your plant.
Tier 1: Data Center / TrainingData Center — Training & Centralized Inference
Primary hardware — NVIDIA DGX B200, DGX B300, or GB200 NVL72. The DGX B200 is the unified AI platform for develop-to-deploy pipelines, configured as a single 8x Blackwell GPU node delivering Blackwell-class training and inference for enterprises at any AI maturity. For trillion-parameter foundation model work, the GB200 NVL72 is a liquid-cooled rack with 36 Grace Blackwell Superchips — 36 Grace CPUs and 72 Blackwell GPUs — connected as one with NVLink, scaling up to hundreds of thousands of Superchips with NVIDIA Quantum InfiniBand.
Plant Appliance — Local Inference & Operator Copilot
Primary hardware — NVIDIA DGX Station (GB200/GB300 tower) or a mid-range HGX-based server. The DGX Station brings data-center-class AI compute to a tower form factor — designed for office and plant-IT environments, with Blackwell-class compute on the same Grace CPU + Blackwell GPU architecture as the rack-scale systems but without the data center cooling requirements. For plants that need more than a single tower but less than a full DGX, iFactory pairs an HGX-class server with NVIDIA L40S or H200 GPUs (141 GB HBM3e, 4.8 TB/s bandwidth) — enough horsepower for the full 9-model portfolio at plant scale.
Plant Edge — Vision, Robotics, Closed-Loop Control
Primary hardware — NVIDIA Jetson Thor and NVIDIA IGX Thor. Jetson Thor is an AI edge supercomputer delivering 2,070 FP4 teraflops — a 7.5x increase over Jetson Orin, with 3x CPU performance and double the memory, vital for plant environments where robots and vision systems require rich sensor data and low-latency AI processing for concurrent streams. NVIDIA IGX Thor extends this to the industrial edge with enterprise software support and functional safety — built specifically for industrial automation, healthcare, and large-scale manufacturers adopting AI for productivity, safety, and cost reduction. The new Jetson T4000 module provides a cost-effective upgrade path bringing Blackwell architecture to autonomous machines at $1,999 per 1,000-unit volume, with 1,200 FP4 TFLOPS and 64 GB of memory in a configurable 70-watt envelope.
NVIDIA Data Center GPU Comparison — 2026 Lineup
The decision between H100, H200, B200, and B300 isn't religious — it's about matching memory, bandwidth, and compute precision to the model class. Here's the practical breakdown iFactory uses for industrial deployments.
| GPU | Memory | Bandwidth | NVLink | TDP | Best fit for industrial AI |
|---|---|---|---|---|---|
| H100 SXM5 | 80 GB HBM3 | 3.35 TB/s | 900 GB/s | 700 W | Existing H100 clusters, models under 30B params, mature air-cooled rooms |
| H200 | 141 GB HBM3e | 4.8 TB/s | 900 GB/s | 700 W | Plant-scale inference, moderate workloads, air-cooled facilities, value pick |
| L40S | 48 GB GDDR6 | 864 GB/s | N/A | 350 W | Vision inference, smaller models, mixed compute/render workloads, low power |
| B200 SXM | 180 GB HBM3e | 8 TB/s | 1.8 TB/s | 1000 W | Frontier training, 70B+ parameter models, FP4 inference, requires liquid cooling |
| B300 SXM | 192 GB HBM3e | 12 TB/s | 1.8 TB/s | 1200 W | Latest Blackwell Ultra, highest throughput inference, requires liquid cooling |
| GB200 NVL72 | 13.4 TB HBM3e (rack) | 576 TB/s aggregate | NVLink Switch | ~120 kW/rack | Trillion-parameter training, fleet-wide foundation models, liquid-cooled rack |
Practical guidance — most industrial plants don't need B300 or GB200. For the standard 9-model portfolio running at plant scale, an H200-based DGX-class node delivers excellent performance with no liquid cooling requirement, lower power, and significantly easier deployment. iFactory specs the right GPU for the workload during the sizing session, not by default. Get a sized quote for your plant.
Sizing Decision Matrix — What Hardware Fits Which Plant?
The hardware sizing question reduces to four variables — plant size, model portfolio scope, real-time inference requirements, and growth plans. Here's the framework iFactory applies in every sizing session.
| Plant profile | Tag count | Recommended tier | Hardware example | Typical CapEx range |
|---|---|---|---|---|
| Small / Single line pilot | 5,000–20,000 | Tier 2 (compact) + edge | DGX Spark workstation + Jetson Thor edge | $15K–50K |
| Mid-size plant, single site | 20,000–100,000 | Tier 2 appliance + edge | DGX Station GB300 (tower) + Jetson Thor | $80K–300K |
| Large plant, regulated | 100,000–300,000 | Tier 1 + Tier 3 (full stack) | DGX B200 (8x B200) + IGX Thor edge fleet | $350K–600K |
| Multi-site enterprise | 300,000+ across sites | Centralized Tier 1 + per-site Tier 2/3 | GB200 NVL72 (HQ) + DGX Station per plant | $1.5M–6M (multi-site) |
| Frontier / R&D | Any | Tier 1 (high-end) | DGX B300 or GB200 NVL72 rack | $300K–4M per rack |
| Cloud-first / no on-site | Any | iFactory Cloud | Fully managed — no on-prem hardware | OpEx — $2K–50K/mo |
The cloud row at the bottom isn't an afterthought — for many customers it's the right answer. iFactory delivers the same data engineering, the same 9-model portfolio, and the same Operational Intelligence Score on cloud as on-prem. The hardware question becomes "where does the compute live?" not "what software runs?"
Cooling — Air, Liquid, or Hybrid?
The Blackwell generation changed the cooling calculus. B200 draws 1000W per GPU, 43% more than the H100/H200 at 700W — and B200 deployments require direct-to-chip liquid cooling, which is mandatory rather than optional. If your facility isn't already liquid-cooled, that means new infrastructure. Here's the practical decision framework iFactory uses for industrial plants.
Air-cooled — the default
H100, H200, L40S, and Jetson Thor deployments run on standard air cooling. CRAC units, hot/cold aisle separation, and properly sized fans handle 700W-per-GPU densities. For most industrial plants installing AI for the first time, this is the right starting point — no facility renovation required.
Liquid-cooled — for Blackwell
B200, B300, and GB200 systems require direct-to-chip liquid cooling. A Coolant Distribution Unit (CDU) provides facility water, in-rack manifolds distribute coolant to each GPU's cold plate, and a heat exchanger returns it. Adds infrastructure cost but delivers 3–4x the inference throughput per dollar versus Hopper at sustained utilization.
Hybrid — most plants land here
Tier 1 training in a liquid-cooled colocation or central data center, Tier 2 plant appliance air-cooled in the IT/OT room, Tier 3 edge in industrial enclosures with passive or filtered cooling. This is what most multi-site industrial customers actually deploy — and what iFactory's standard configurations target.
What iFactory handles for you
Cooling decisions you don't have to make
- Right cooling class chosen for the GPU tier — no over-engineering a small plant, no under-cooling a Blackwell deployment.
- CDU sizing, redundancy planning, and integration with existing facility water (if liquid-cooled).
- Plenum design, hot/cold aisle layout, and CFM calculations for air-cooled deployments.
- Industrial enclosure rating (IP54 / NEMA 12) for edge devices in dusty or high-vibration plant environments.
- Filter and fan maintenance plans built into the 24×7 monitoring contract.
- Coolant chemistry, leak detection, and emergency response procedures documented and trained.
Network Architecture — InfiniBand, NVLink, and the Plant Network
AI hardware is only as fast as the network connecting it. The Blackwell-class systems use multiple network fabrics — each for a specific job — and the plant network on the other side has its own constraints. Getting this right is the difference between a model that runs and a model that runs at full utilization.
Inside the box — NVLink & NVSwitch
NVLink 5.0 delivers 1.8 TB/s of bidirectional bandwidth per GPU, double the H100's 900 GB/s. Inside a DGX B200, all 8 GPUs talk to each other via NVSwitch at full NVLink speed — no bottleneck. This is what makes 70B+ parameter models fit in a single node without tensor parallelism across the network.
Between racks — Quantum InfiniBand or Spectrum-X
Multi-node training and large inference clusters need a high-bandwidth, low-latency fabric. NVIDIA Quantum InfiniBand at 800 Gb/s per port is the gold standard for training. Spectrum-X is the Ethernet-based alternative, deployed where existing infrastructure favors Ethernet. iFactory picks the right one based on your data center.
To the plant — IT/OT segmentation
The AI server connects to the plant's unified namespace through a DMZ — never directly to control networks. iFactory's standard architecture has the AI layer on a dedicated VLAN, with read-only or carefully scoped write paths back to MES/SCADA. Plant-floor PLCs don't see the AI server; they see the broker.
To the edge — TSN & deterministic Ethernet
Jetson Thor and IGX Thor edge devices connect via Time-Sensitive Networking (TSN) Ethernet for deterministic latency. Critical for vision inspection at line speed and closed-loop control where any jitter ruins the decision. Standard Ethernet works for most copilot and analytics use cases.
Power, Rack, & Facility Planning
The boring part of the deployment that quietly determines whether everything works. iFactory's facility checklist runs to about 80 line items — here are the highlights that matter most for industrial customers planning their first on-prem AI deployment.
Facility checklist — what you provide
What the plant supplies
- Rack space — typically 8–48U depending on tier and capacity planning.
- Line power — 208V or 415V three-phase for DGX-class systems; standard 120V/240V for edge.
- Network drop — fiber recommended for Tier 1/2, copper sufficient for Tier 3 edge.
- Facility water if liquid-cooled (Blackwell deployments) — flow, return, chemistry within NVIDIA spec.
- Environmental — temperature controlled IT/OT room for Tier 2 (typically 20–25°C, <60% RH).
- Physical security — access control, camera coverage, audit logging on entry to AI rack.
What iFactory delivers
Pre-configured and ready to plug in
- NVIDIA AI server racked, GPUs installed, drivers tested, software pre-loaded.
- Cabling — power, network, NVLink/InfiniBand cables, fiber for inter-rack as needed.
- Cooling infrastructure — CDU and manifolds for liquid systems, fan/plenum design for air.
- PDU sizing and redundancy — A/B feed, switched or metered as required.
- Network configuration — VLANs, DMZ, IT/OT segmentation, TLS, certificate management.
- Storage tier — NVMe local, parallel filesystem (training), S3 object (cold).
- Edge device commissioning — Jetson Thor or IGX Thor onboarded to plant unified namespace.
- Field tech dispatch for cabling, PLC integration, and on-site verification.
- Operator training, runbook delivery, 24×7 remote monitoring on day one.
On-Prem or Cloud — Same Stack, Your Call
The same 9-model AI portfolio runs identically on iFactory's on-prem appliance and in iFactory Cloud. What changes is where the compute lives and who manages it. Many customers run a hybrid — on-prem for the regulated production site, cloud for satellite plants and fleet benchmarking, edge devices at the plant floor regardless.
iFactory On-Prem Appliance NVIDIA Blackwell-class hardware in your perimeter
- Pre-configured NVIDIA AI server — DGX B200/B300, DGX Station, or HGX H200 sized to plant.
- Edge AI — Jetson Thor and IGX Thor at the plant floor for sub-50ms inference and functional safety.
- Local inference, local control — no cloud round-trip for real-time models.
- Works without internet — pipeline buffers, inference continues during WAN outages.
- Best fit — regulated industries, defense, pharma, sensitive IP, air-gapped sites.
- Deployment — 6–12 weeks turnkey, hardware + cabling + training all included.
iFactory Cloud Same models, no hardware to ship
- Fully managed — iFactory operates the NVIDIA hardware in our cloud, you consume the AI.
- Fastest deployment — first pipeline live in 2–4 weeks, full rollout in 6–8 weeks.
- Elastic scale — model training and large historical backfills don't require new on-site capacity.
- Fleet benchmarking — multi-plant model comparison without per-site infrastructure.
- Best fit — multi-site CPG, food & bev, discrete manufacturing, greenfield plants.
- Compliance — SOC 2 Type II, ISO 27001, region-locked data residency.
Not sure which hardware tier fits your plant?
The 60-minute hardware sizing session covers your tag inventory, model portfolio scope, real-time latency requirements, growth plans, and facility constraints. Output is a sized hardware quote with CapEx, OpEx, and a 12-week deployment plan — or a cloud-only alternative with TCO comparison.
The 3-Phase Turnkey Hardware Delivery — Live in 6 to 12 Weeks
The full hardware deployment — DGX system + edge devices + cooling + network + commissioning — runs on a 6 to 12 week timeline. Identical to the cloud roadmap except for the hardware shipment step in Phase 1. iFactory's standard process — no surprises, no scope creep.
Ship + Network + Power
NVIDIA hardware arrives racked, software-loaded, ready to plug in. Field techs handle cabling, network, power. Cooling commissioned (CDU and water flow if liquid-cooled, fan plenum if air-cooled). Plant network DMZ configured.
- DGX system racked and powered on-site
- Cooling commissioned and verified
- Network DMZ live, IT/OT segmentation in place
- Edge devices (Jetson Thor / IGX Thor) commissioned
- Hardware acceptance test signed off
Model Train + Pilot
Pipeline running end-to-end, GPU utilization climbing as models train on plant historical data. Vector index built. Pilot copilot live for a small group of operators. Edge devices serving live inference at the plant floor.
- LSTM, Autoencoder, GNN, RL Scheduler trained
- Vector index built from plant docs
- Edge inference live on Jetson Thor
- Pilot OIS in operator hands
- GPU utilization baseline captured
Go-Live + Training
Full plant rollout. Operators, technicians, supervisors trained. SAP work-order integration live. 24×7 hardware monitoring active. ROI baseline captured. Handover to ongoing managed service contract.
- Full plant model coverage live
- Operator and supervisor training delivered
- SAP integration live, work orders flowing
- Hardware monitoring + SLA active
- ROI baseline captured for tracking
What an Operator Actually Sees — Hardware in Action
The whole point of the hardware stack is that operators never have to think about it. The GPU, the cooling, the network — all invisible to the people doing the work. Here's a typical interaction from a customer running the full three-tier deployment.
Recommended action — divert coil 2241 to grade-B inventory before downstream rolling. I've drafted SAP work order WO-44128 for inspection at the next stop point.
Sources — IGX Thor vision (real-time), Plant DGX autoencoder (cross-check), SAP IH08 equipment history.
This single interaction touched all three hardware tiers — Tier 3 IGX Thor at the furnace did the real-time vision inference, Tier 2 plant DGX did the cross-check, and the SAP integration wrote back to the system of record. None of it required cloud round-trips. None of it stalled when the WAN got slow. That's what on-prem hardware buys you.
Frequently Asked Questions
Do I have to buy NVIDIA servers separately?
No. iFactory's on-prem appliance ships fully loaded — NVIDIA hardware (DGX B200, DGX Station, HGX H200, or Jetson Thor depending on your tier), software pre-installed, network gear, cabling, and cooling infrastructure where needed. You provide rack space, line power, and Ethernet. iFactory provides everything else. No separate hardware procurement, no firmware tuning, no driver compatibility issues. The cloud deployment option has no hardware at all.
What's the difference between H100, H200, B200, and B300 — and which one do I need?
H100 (80 GB) is the proven workhorse, fine for existing clusters and models under 30B parameters. H200 (141 GB, 4.8 TB/s) is the value pick for plant-scale inference with air cooling. B200 (180 GB, 8 TB/s) delivers Blackwell-class performance with FP4 support but requires liquid cooling at 1000W per GPU. B300 (192 GB, 12 TB/s) is the latest Blackwell Ultra for highest-throughput inference. Most plants don't need B300 — H200-based deployments handle the full iFactory model portfolio without the cooling complexity. iFactory's sizing session matches the GPU to your workload, not by default.
Do we need liquid cooling for our plant?
Only if you choose Blackwell-class GPUs (B200, B300, or GB200). H100, H200, L40S, and all Jetson Thor edge devices run on standard air cooling, which works in existing IT/OT rooms without facility renovation. Liquid cooling becomes mandatory at the B200's 1000W per GPU — direct-to-chip cold plates plus a Coolant Distribution Unit (CDU). For most industrial plants installing AI for the first time, iFactory recommends an air-cooled H200 or DGX Station deployment to avoid the facility renovation. Upgrade to Blackwell when the workload and budget justify it.
How long does it take to deploy the hardware?
Standard turnkey delivery is 6–12 weeks from contract signature to operator-facing copilot. Phase 1 (weeks 1–4) covers hardware shipment, racking, cabling, cooling, network, and DMZ configuration. Greenfield cloud deployments can be live in 4 weeks. Complex multi-site on-prem rollouts with legacy SCADA integration typically land at the 12-week end. Lead times on B200/B300 systems can extend the start date — supply has been constrained through mid-2026 with an approximately 3.6 million unit backlog — but iFactory pre-allocates inventory under our purchase agreement so customer-facing lead times stay predictable.
Can we start with a smaller deployment and scale up?
Yes — this is the most common path. Start with a DGX Station or HGX H200 Tier 2 appliance plus one or two Jetson Thor edge devices for a single production line. Run for 60–90 days. If the ROI justifies expansion, add a Tier 1 DGX B200 for centralized training and additional edge devices for plant-wide coverage. iFactory's architecture is designed for this scale-out pattern — no rip-and-replace at any growth step.
What about edge AI specifically — Jetson Thor versus IGX Thor?
Jetson Thor is the general-purpose edge AI platform delivering 2,070 FP4 teraflops with a 70W envelope — excellent for robotics, vision inspection, and operator copilot at the plant floor. IGX Thor extends Jetson Thor to the industrial edge with enterprise software support and functional safety, specifically designed for industrial automation, healthcare, and manufacturing applications where safety certifications matter. Use Jetson Thor for general plant edge inference. Use IGX Thor where functional safety or regulatory certification is required.
How does the cloud option compare on cost?
Cloud is faster to start and cheaper for spiky workloads. On-prem amortizes faster for sustained workloads above 70% GPU utilization. Practical numbers — cloud B200 instances run roughly $3.79–6/hour on demand, $2.25/hour on long-term commit. At sustained utilization, that's $20K–50K per GPU per year in cloud OpEx. An H200 GPU on-prem costs roughly $30–40K CapEx with 4–5 year useful life — under $10K/year amortized. iFactory's sizing session computes the crossover point for your specific workload before you commit.
What's the difference between iFactory's on-prem and cloud deployments?
Data engineering pipeline, model portfolio, Operational Intelligence Score, operator interface — all identical. What differs is where the compute lives. On-prem keeps everything inside the plant perimeter with sub-50ms inference and continuity during WAN outages. Cloud is fully managed by iFactory with faster initial deployment and elastic scale. Many customers run a hybrid — sensitive production sites on-prem, satellite plants on cloud, fleet benchmarking in the cloud layer, edge devices at the plant floor regardless.
What network bandwidth do we need at the plant?
For the on-prem appliance — minimal. The pipeline runs inside the plant, so only management traffic and optional cloud sync traverse the WAN, typically <100 Mbps sustained. For the cloud deployment, plan for 100 Mbps to 1 Gbps depending on data volume and edge device count. iFactory's edge gateways buffer with store-and-forward, so transient network issues don't lose data. Inside the plant, the AI server connects to the OPC UA layer over Gigabit or 10G Ethernet.
How do you handle hardware warranty and replacement?
NVIDIA hardware ships under standard manufacturer warranty (typically 3 years on DGX systems). iFactory's managed service contract wraps that with 24×7 monitoring, predictive component health tracking, and same-day or next-business-day replacement of failed components. For Blackwell-class deployments, the warranty includes on-site service from NVIDIA-certified technicians. iFactory holds spare components for common failure modes to minimize downtime.
What's the upgrade path when newer GPUs come out?
NVIDIA's roadmap is public — Vera Rubin in H2 2026 with HBM4 memory at 288 GB per GPU and 13 TB/s bandwidth, then Rubin Ultra in 2027 and Feynman in 2028. iFactory's architecture is designed so models port between hardware generations — your trained LSTM, autoencoder, and GNN models on H200 today run on Rubin tomorrow with no re-training. The hardware refresh is a swap, not a rebuild. Most customers refresh on a 4–5 year cycle, often skipping a generation.
Can we integrate iFactory's hardware with existing AI infrastructure we already have?
Yes. If you already have NVIDIA hardware (H100 cluster, existing DGX, custom HGX build), iFactory's software stack runs on top — no hardware replacement required. The 9-model portfolio, data engineering pipeline, and Operational Intelligence Score deploy as a software layer on your existing infrastructure. Conversely, iFactory's hardware integrates with your existing MES, CMMS, SAP, and historians via standard connectors. Nothing about the stack assumes greenfield.
Pick the right hardware for your plant — or let us pick it for you
NVIDIA Blackwell-class data center systems, plant-floor appliances, edge AI devices, cooling, networking, and rack engineering — all delivered turnkey by iFactory in 6 to 12 weeks. Or skip the rack and run the same models on iFactory Cloud. Either way, the architecture is yours from day one.







