Edge AI Compute Architecture for Factory Vision Systems

By Jacob Bethell on March 13, 2026


Your AI vision cameras generate terabytes daily. A single 12 MP camera at 60 fps produces about 5.7 Gbps of raw image data at 8 bits per pixel. Multiply that across 10, 50, or 200 inspection stations and you need infrastructure that can ingest, process, and respond in under 50 milliseconds, not the 200 ms or more that a cloud round-trip requires.

Edge AI solves this, but only with correctly planned server rooms, precision cooling, dual-feed power, and a dedicated network backbone. Most factories undersize GPU compute by 40-60% because they calculate for average load rather than peak burst, or they provision servers without adequate cooling and watch GPUs throttle to 60% capacity in the first summer. We size GPU servers, design cooling, and plan power and network topology before construction begins, so your vision system runs at full performance from day one.

Size Your Edge Infrastructure: we'll calculate the exact GPU count, rack space, power, and cooling for your factory.

Data Flow: Camera to Decision
Cameras: 2.4–5.7 Gbps raw data rate per station
Fiber Network: 10GbE dedicated vision backbone
GPU Inference: 120–1,979 TOPS in the edge server room
Decision: <50ms pass/fail + reject gate
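
The per-station figures above follow from simple arithmetic on resolution, frame rate, and bit depth. The sketch below assumes uncompressed output. The bit depths are assumptions, chosen because they reproduce the 2.4 and 5.7 Gbps endpoints; the function names are illustrative.

```python
def raw_rate_gbps(megapixels: float, fps: float, bits_per_pixel: int = 8) -> float:
    """Uncompressed camera data rate in Gbit/s (1 Gbit = 10**9 bits)."""
    return megapixels * 1e6 * fps * bits_per_pixel / 1e9


def terabytes_per_day(rate_gbps: float) -> float:
    """Raw volume for 24 hours of continuous capture, in TB (10**12 bytes)."""
    return rate_gbps * 1e9 / 8 * 86_400 / 1e12


# Assumption: 12 MP at 60 fps with 8 bits per pixel (monochrome).
high = raw_rate_gbps(12, 60, bits_per_pixel=8)
# Assumption: 5 MP at 30 fps with 16 bits per pixel.
low = raw_rate_gbps(5, 30, bits_per_pixel=16)

print(f"per-station raw rate: {low:.2f} to {high:.2f} Gbps")
print(f"one high-end camera over 24 h: {terabytes_per_day(high):.1f} TB")
```

Bit depth matters as much as resolution here: the same 5 MP sensor at 8 bits per pixel would produce half the rate.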

Why Edge — Not Cloud — for Factory Vision

Criterion | Cloud Inference | Edge Inference
Latency | 200-500ms round-trip | 10-50ms end-to-end
Bandwidth Cost | $50K-$200K/yr for 10+ cameras | $0 (all processing local)
Data Sovereignty | Production images leave factory | Zero data leaves factory floor
Uptime | Depends on internet connection | 99.99% with local redundancy
At-Scale TCO (100+ cameras) | About 67% higher than edge at 3-year horizon | 40% lower than cloud at 3-year horizon

Need to justify edge over cloud for your factory vision project? Size Your Edge Infrastructure — we'll provide a detailed TCO comparison for your specific camera count and throughput requirements.

Vision Workload Sizing: Cameras to GPU TOPS

GPU sizing starts with total data throughput — not camera count. A single high-speed line scan camera can demand more compute than ten low-resolution area scan cameras. The formula: cameras × resolution × frame rate × model complexity = required inference TOPS, plus 30% headroom for future expansion and peak burst handling.
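
The formula above can be sketched in a few lines of Python. This is an illustration, not a vendor sizing tool: `gops_per_megapixel` is a hypothetical stand-in for "model complexity" (giga-operations the model spends per megapixel of input), and real deployments rarely sustain a GPU's full rated TOPS, so treat the result as a lower bound.

```python
from dataclasses import dataclass


@dataclass
class CameraGroup:
    count: int
    megapixels: float
    fps: float
    gops_per_megapixel: float  # hypothetical model cost, measured per model


def required_tops(groups: list[CameraGroup], headroom: float = 0.30) -> float:
    """cameras x resolution x frame rate x model complexity, plus headroom."""
    ops_per_second = sum(
        g.count * g.megapixels * g.fps * g.gops_per_megapixel * 1e9
        for g in groups
    )
    return ops_per_second / 1e12 * (1 + headroom)


# Illustrative input only: 10 cameras, 5 MP, 30 fps, 100 GOPs per megapixel.
demand = required_tops([CameraGroup(10, 5, 30, 100)])
print(f"required inference capacity: {demand:.0f} TOPS")
```

Because each camera group carries its own complexity figure, a single line scan camera running a heavy model can outweigh many area scan cameras in the sum.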

Factory Scale | Camera Config | Total Data Rate | Inference Demand | Recommended GPU | Rack Space
Small (5-10 stations) | 10× 5 MP area scan @ 30 fps | ~24 Gbps total | ~200-400 TOPS | 2× NVIDIA L40S (or 1× A100) | 2U in single rack
Medium (20-50 stations) | 30× area scan + 10× line scan + 5× 3D | ~120 Gbps total | ~1,500-3,000 TOPS | 4× NVIDIA A100 (or 2× H100) | 4-8U, dedicated rack
Large (50-100 stations) | 60× area scan + 20× line scan + 10× 3D + 5× hyperspectral | ~350 Gbps total | ~5,000-8,000 TOPS | 4× NVIDIA H100 (or DGX station) | Full rack, N+1 redundancy
Enterprise (100-500+ stations) | Multi-line, multi-building deployment | 500+ Gbps total | 10,000+ TOPS | Multiple DGX or HGX pods; distributed architecture | Dedicated server room; multiple racks

Server Room Physical Design

A factory edge server room is not an IT closet. GPU servers generate 2-5x the heat density of standard IT equipment. Without purpose-designed cooling, power, and physical layout, GPUs will thermal-throttle within minutes — dropping from rated performance to 50-60% capacity. In greenfield, the server room is designed as a critical facility room with the same engineering rigor as an electrical switchgear room.

Location: Within 50m of Inspection Points

Fiber latency adds only ~5 ns/meter, which is negligible against a 50 ms decision budget. The stronger reasons to keep runs short are that longer cables add failure points and installation cost. The server room should be centrally located relative to inspection stations. In greenfield, this is specified in the facility layout during schematic design, not discovered during commissioning.
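
To put the propagation delay in proportion, here is a small sketch using only the two figures from the text (about 5 ns per metre and a 50 ms decision budget). The constant and function names are illustrative.

```python
FIBER_DELAY_NS_PER_M = 5.0   # from the text: roughly 5 ns per metre of fiber
DECISION_BUDGET_MS = 50.0    # from the text: end-to-end decision target


def fiber_delay_ns(run_m: float) -> float:
    """One-way propagation delay over a fiber run of the given length."""
    return run_m * FIBER_DELAY_NS_PER_M


def share_of_budget(run_m: float) -> float:
    """Fraction of the decision budget consumed by one-way fiber delay."""
    budget_ns = DECISION_BUDGET_MS * 1e6
    return fiber_delay_ns(run_m) / budget_ns


for run in (50, 300):
    print(f"{run} m: {fiber_delay_ns(run):.0f} ns, "
          f"{share_of_budget(run) * 100:.4f}% of budget")
```

Even a run several times longer than 50 m stays far below one percent of the budget, which is why the siting argument rests on reliability and cost, not latency.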

Raised Floor / Overhead Cable Tray

Under-floor air distribution supplies cold air to the rack fronts for front-to-back cooling airflow. Overhead cable trays carry fiber and power in separate trays; the separation matters for copper data cabling, since fiber itself is immune to EMI. Floor loading capacity: 1,500-2,000 kg/m² minimum for GPU-dense racks (standard office floor is 250-500 kg/m²).

Fire Suppression

Clean agent (FM-200 or Novec 1230) fire suppression, not water sprinklers. Aspirating smoke detection (VESDA) for the earliest possible alert. Design it into the room from construction: retrofitting fire suppression around installed equipment is expensive and disruptive.

Physical Security & Access Control

Card/biometric access control. Environmental monitoring (temperature, humidity, water leak detection) with alerts to maintenance and IT teams. CCTV monitoring. Designed as a secure, controlled-access room in the facility security plan.

Precision Cooling & Power Engineering

Design Parameter | Specification | Why It Matters | Greenfield Advantage
Cooling Capacity | Plan 1.5-2× total GPU TDP; N+1 CRAC/CRAH redundancy | GPUs throttle at 80-85°C junction temperature. Without adequate cooling, rated TOPS drops to 50-60% | HVAC sized for GPU heat load from day one; chilled water piped before walls go up
Temperature Range | 20-25°C inlet air; max 27°C (upper limit of the ASHRAE recommended range) | As a rule of thumb, every 10°C above optimal reduces GPU lifespan by ~50% (Arrhenius acceleration) | Precision CRAC units specified in MEP design; not retrofitted window AC
Airflow | Front-to-back (cold aisle/hot aisle containment) | Prevents hot exhaust recirculating to GPU intake. Mixing hot/cold air wastes 30-40% cooling capacity | Aisle containment designed into rack layout; raised floor plenum sized for airflow
Power: Primary | Dual-feed utility power; automatic transfer switch (ATS) | Single-feed power = single point of failure. ATS switches to backup feed in <10 seconds | Dual feeds from different utility transformers; specified in electrical design
Power: UPS | Online double-conversion UPS; 15-30 min runtime | Bridges gap between power loss and generator start. Protects against sags, surges, harmonics | UPS room adjacent to server room; battery weight factored into structural design
Power: Generator | Diesel generator with auto-start; N+1 if critical | Extended outage protection. Generator fuel supply sized for 24-72 hours | Generator pad, fuel storage, and exhaust routing designed into site plan
Power Density | 10-30 kW per rack for GPU servers (vs. 5-8 kW for standard IT) | Undersized PDUs and breakers trip under GPU load; standard racks can't handle the heat | Power distribution designed for GPU density from the start; bus bars, PDUs, and breakers all rated for it
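
The cooling rows above reduce to a few standard conversions. The sketch below applies the table's 1.5-2× GPU TDP rule, converts heat load to BTU/hr (1 W = 3.412 BTU/hr), and estimates airflow with the common sensible-heat approximation for sea-level air (BTU/hr = 1.08 × CFM × ΔT in °F). The 12°C inlet-to-exhaust temperature rise and the example loads are assumptions for illustration.

```python
W_TO_BTU_PER_HR = 3.412        # 1 W of heat = 3.412 BTU/hr
SENSIBLE_HEAT_FACTOR = 1.08    # BTU/hr per (CFM x degF), standard sea-level air


def design_cooling_kw(gpu_tdp_w: float, gpu_count: int, factor: float = 1.5) -> float:
    """Cooling capacity from the table's 1.5-2x total GPU TDP rule."""
    return gpu_tdp_w * gpu_count * factor / 1000


def heat_btu_per_hr(load_kw: float) -> float:
    """Heat rejected by equipment drawing load_kw, in BTU/hr."""
    return load_kw * 1000 * W_TO_BTU_PER_HR


def airflow_cfm(load_kw: float, delta_t_c: float) -> float:
    """Airflow needed to remove load_kw at a given inlet-to-exhaust rise."""
    delta_t_f = delta_t_c * 9 / 5
    return heat_btu_per_hr(load_kw) / (SENSIBLE_HEAT_FACTOR * delta_t_f)


# Illustrative inputs: 8 GPUs at 700 W each; a 20 kW rack with a 12 C rise.
print(f"cooling for GPUs alone: {design_cooling_kw(700, 8):.1f} kW")
print(f"rack heat: {heat_btu_per_hr(20):,.0f} BTU/hr")
print(f"rack airflow: {airflow_cfm(20, delta_t_c=12):,.0f} CFM")
```

A smaller temperature rise across the rack demands proportionally more airflow, which is one reason aisle containment and plenum sizing appear in the same table.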

Not sure what cooling capacity your GPU deployment needs? Size Your Edge Infrastructure — we'll calculate BTU/hr, airflow CFM, power draw, and UPS runtime for your exact GPU configuration.

Network Topology Design

Vision data is the highest-bandwidth traffic on a factory network. A single 12 MP camera at 60 fps generates more data than an entire production floor of PLCs and SCADA nodes. Vision traffic must be isolated from OT and IT networks on a dedicated backbone — otherwise, inspection latency becomes unpredictable and production network performance degrades.

Tier 1
Camera to Access Switch

1 GbE or 10 GbE per camera (depending on resolution/fps). PoE+ for camera power where applicable. CAT6A shielded for GigE Vision; OM4 fiber for 10 GbE. Max 100 m horizontal run on copper. Dedicated vision VLAN with no ports shared with OT devices.
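
Choosing between the two per-camera link speeds comes down to comparing each camera's raw rate against usable link capacity. A minimal sketch follows; the 90% usable-capacity figure is an assumption to leave room for protocol overhead, and the function name is illustrative.

```python
from typing import Optional

LINK_TIERS_GBPS = (1, 10)  # the per-camera options named in the text


def pick_camera_link(raw_gbps: float, max_utilization: float = 0.9) -> Optional[int]:
    """Return the smallest link tier (in Gbps) that carries the stream.

    max_utilization is an assumed ceiling on usable link capacity.
    Returns None when no single link suffices, which points to multiple
    links or on-camera compression.
    """
    for tier in LINK_TIERS_GBPS:
        if raw_gbps <= tier * max_utilization:
            return tier
    return None


for rate in (0.6, 2.4, 12.0):
    print(f"{rate} Gbps stream -> {pick_camera_link(rate)} GbE")
```

The `None` case is worth handling explicitly in a real design tool, since it flags cameras that need a different interface altogether.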

Tier 2
Access Switch to Aggregation

25-100 GbE uplinks from access switches to aggregation layer. Fiber trunk cables: OM4 multimode for short runs, OS2 singlemode beyond OM4 reach (roughly 100 m at 25-100 GbE). Redundant paths (LAG/MLAG) so a link failure does not cause downtime. Managed switches with QoS for vision traffic prioritization.

Tier 3
Aggregation to GPU Server

100 GbE or 200 GbE into GPU server NICs (ConnectX-7 or equivalent). Low-latency switching (<1 μs port-to-port). Direct fiber connections to server room. This is the bottleneck point — aggregation bandwidth must exceed total camera throughput with 30% headroom.
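
The headroom rule for this tier is easy to check numerically. The sketch below takes a total camera throughput, applies the 30% headroom from the text, and counts how many uplinks of a given speed are needed. The function name is illustrative, and the example totals are the ones quoted in the sizing table.

```python
import math


def uplinks_needed(total_camera_gbps: float, link_gbps: int = 100,
                   headroom: float = 0.30) -> tuple[float, int]:
    """Return (required bandwidth in Gbps, number of uplinks of link_gbps)."""
    required = total_camera_gbps * (1 + headroom)
    return required, math.ceil(required / link_gbps)


for total in (120, 350):
    need, n100 = uplinks_needed(total, link_gbps=100)
    _, n200 = uplinks_needed(total, link_gbps=200)
    print(f"{total} Gbps cameras -> {need:.0f} Gbps required: "
          f"{n100}x 100 GbE or {n200}x 200 GbE")
```

This count covers capacity only; redundant paths are added on top of it.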

Tier 4
GPU Server to MES/SCADA

Separate interface from vision ingress. 1-10 GbE for inspection results, reject gate signals (EtherNet/IP, PROFINET), and dashboard data. Firewall/DMZ between vision network and OT/IT networks. Only metadata and decisions cross this boundary; raw images stay on the vision network.

Failover & Redundancy Architecture

GPU Failover (N+1)

Provision one additional GPU server beyond minimum required capacity. If any single server fails, workload automatically redistributes to remaining servers. Orchestration software (Kubernetes or custom load balancer) manages camera-to-GPU assignment dynamically. Mean time to failover: <5 seconds.
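
The redistribution step can be illustrated with a least-loaded reassignment. This is a sketch of the idea, not the behavior of Kubernetes or of any particular load balancer; the server names, camera IDs, and per-server camera capacity are hypothetical.

```python
def reassign_cameras(assignments: dict[str, list[str]], failed: str,
                     capacity_per_server: int) -> dict[str, list[str]]:
    """Move the failed server's cameras onto the least-loaded survivors.

    Raises RuntimeError if survivors lack spare capacity, which means
    the deployment was not actually sized N+1.
    """
    survivors = {s: list(cams) for s, cams in assignments.items() if s != failed}
    if not survivors:
        raise RuntimeError("no surviving servers")
    for cam in assignments.get(failed, []):
        target = min(survivors, key=lambda s: len(survivors[s]))
        if len(survivors[target]) >= capacity_per_server:
            raise RuntimeError("no spare capacity: N+1 sizing violated")
        survivors[target].append(cam)
    return survivors


# Hypothetical layout: 12 cameras need 2 servers of 6; a third is the +1.
layout = {
    "gpu-a": ["c1", "c2", "c3", "c4"],
    "gpu-b": ["c5", "c6", "c7", "c8"],
    "gpu-c": ["c9", "c10", "c11", "c12"],
}
after = reassign_cameras(layout, failed="gpu-b", capacity_per_server=6)
print({server: len(cams) for server, cams in after.items()})
```

Running the same check for every possible single failure at design time is a cheap way to confirm that the spare capacity is real.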

Network Path Redundancy

Dual fiber paths from every access switch to aggregation layer (LAG/MLAG). If any single fiber or switch fails, traffic fails over to the redundant path, typically within a sub-second convergence window. MLAG or an EVPN-VXLAN fabric keeps both paths active without loops; plain Spanning Tree also prevents loops, but it does so by blocking the redundant link until a failure occurs.

Power Redundancy

Dual-corded servers with A+B power feeds from separate UPS systems. Each UPS sized to carry full server room load independently. Generator auto-starts within 10 seconds of utility loss. UPS bridges the gap. Result: no single power failure stops vision inspection.
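
Two quick checks follow from this design: how much battery energy the UPS runtime implies, and whether either UPS can carry the whole room alone. The sketch below is a first-order estimate. The 95% inverter efficiency and the 80 kW example load are assumptions, and battery aging and derating are ignored.

```python
def ups_battery_kwh(load_kw: float, runtime_min: float,
                    inverter_efficiency: float = 0.95) -> float:
    """Battery energy needed to hold load_kw for runtime_min minutes."""
    return load_kw * runtime_min / 60 / inverter_efficiency


def per_ups_utilization(load_kw: float, ups_rating_kw: float) -> float:
    """Load fraction on each UPS while A and B feeds share evenly."""
    return (load_kw / 2) / ups_rating_kw


def survives_single_ups_loss(load_kw: float, ups_rating_kw: float) -> bool:
    """True if one UPS alone can carry the full server room load."""
    return load_kw <= ups_rating_kw


load = 80.0  # kW, hypothetical server room load
print(f"battery for 15 min: {ups_battery_kwh(load, 15):.1f} kWh")
print(f"normal load per UPS: {per_ups_utilization(load, 100):.0%}")
print(f"one UPS can carry the room: {survives_single_ups_loss(load, 100)}")
```

In a correctly sized A+B design each UPS runs at or below 50% in normal operation, which is the price of being able to lose one.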

Graceful Degradation

If capacity drops below full coverage, AI prioritizes critical inspection stations over non-critical ones. Operator dashboard shows degraded status with estimated time to recovery. No silent failures — every camera and GPU reports health status continuously. CMMS auto-generates maintenance tickets for failed components.
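
Priority-based shedding can be sketched as a greedy allocation: serve stations in priority order until the remaining capacity runs out, and report everything that was shed. The station names, priorities, and TOPS figures below are hypothetical, and a production scheduler would likely add hysteresis and operator overrides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    name: str
    priority: int        # lower number = more critical (assumed convention)
    tops_needed: float   # hypothetical per-station inference demand


def plan_degraded(stations: list[Station],
                  available_tops: float) -> tuple[list[str], list[str]]:
    """Return (active, shed) station names for the capacity that remains.

    Stations are considered in priority order. One that does not fit is
    shed, and smaller lower-priority stations may still be served.
    """
    active, shed = [], []
    remaining = available_tops
    for st in sorted(stations, key=lambda s: s.priority):
        if st.tops_needed <= remaining:
            active.append(st.name)
            remaining -= st.tops_needed
        else:
            shed.append(st.name)
    return active, shed


stations = [
    Station("press-line", 1, 300),
    Station("weld-cell", 1, 250),
    Station("packaging", 3, 200),
    Station("label-check", 2, 100),
]
active, shed = plan_degraded(stations, available_tops=600)
print("active:", active)
print("shed (report on dashboard):", shed)
```

Returning the shed list explicitly supports the no-silent-failures requirement: every station that loses coverage is named.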

Key Benefits & ROI

10x: Faster inference vs. cloud (sub-50ms vs. 200-500ms round-trip)
Zero: Production data leaving the factory, for complete data sovereignty
99.99%: Uptime with N+1 GPU, dual power, and redundant network paths
10–500+: Cameras scalable on same architecture; add GPUs as you grow
40%: Lower 3-year TCO vs. cloud at 100+ camera scale

Your Cameras Are Only as Good as the Compute Behind Them

iFactory designs complete edge AI infrastructure for factory vision — GPU sizing, server room layout, precision cooling, dual-feed power, fiber backbone, and failover architecture — delivered as construction-ready specifications.

Frequently Asked Questions

How many GPU servers do I need for 100 cameras?
It depends on camera resolution, frame rate, and model complexity — not just camera count. For 100 standard area scan cameras (5 MP @ 30 fps) running CNN defect detection models, you need approximately 4,000-6,000 TOPS of inference capacity. That's roughly 4× NVIDIA A100 or 2-3× H100 GPUs, plus N+1 redundancy. For mixed deployments with line scan and 3D cameras, compute requirements increase significantly. We calculate the exact GPU count from your specific camera matrix — resolution × fps × model FLOPS = required TOPS, plus 30% headroom for burst and expansion.
What cooling capacity does a GPU server room need?
Plan for 1.5-2× the total GPU TDP (thermal design power). An NVIDIA A100 draws up to 300 W (PCIe) or 400 W (SXM); an H100 SXM draws up to 700 W. A server with 4× H100 SXM GPUs generates 2,800 W of GPU heat alone. Add CPU, memory, and fans, and with several such servers in a rack, total rack power can reach 15-25 kW. Cooling must maintain 20-25°C inlet air temperature with N+1 redundancy. For a 4-rack GPU deployment, expect 60-100 kW of cooling capacity. In greenfield, this is specified as a chilled water loop or precision CRAC unit in the MEP design, not a portable AC unit bolted on after GPUs start throttling.
What's the TCO comparison between edge and cloud at scale?
At small scale (5-10 cameras), cloud can be cost-competitive due to zero upfront hardware investment. At 50+ cameras, edge wins decisively. A 100-camera factory processing 5 MP @ 30 fps generates ~240 Gbps — uploading this to cloud costs $50K-$200K annually in bandwidth alone, plus $100K-$300K in cloud GPU inference costs. Edge infrastructure (GPUs, servers, networking, cooling) costs $200K-$500K upfront but runs for 5+ years with electricity as the primary ongoing cost. Three-year TCO: edge is typically 40% lower at 100+ cameras, and the gap widens with scale.
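
The comparison can be made concrete with a few lines of arithmetic. The sketch below uses the midpoints of the ranges quoted above for cloud bandwidth, cloud inference, and edge upfront cost. The edge operating cost (electricity and maintenance) is a hypothetical placeholder, so the resulting percentage illustrates the method and does not reproduce the 40% figure.

```python
def cloud_tco(bandwidth_per_year: float, inference_per_year: float,
              years: int = 3) -> float:
    """Cloud cost over the horizon: recurring bandwidth plus inference."""
    return years * (bandwidth_per_year + inference_per_year)


def edge_tco(capex: float, opex_per_year: float, years: int = 3) -> float:
    """Edge cost over the horizon: upfront hardware plus running cost."""
    return capex + years * opex_per_year


def edge_saving(cloud: float, edge: float) -> float:
    """Fraction by which edge undercuts cloud (negative if edge costs more)."""
    return 1 - edge / cloud


# Midpoints of the quoted ranges; the opex value is an assumption.
cloud = cloud_tco(bandwidth_per_year=125_000, inference_per_year=200_000)
edge = edge_tco(capex=350_000, opex_per_year=50_000)
print(f"3-year cloud: ${cloud:,.0f}")
print(f"3-year edge:  ${edge:,.0f}")
print(f"edge saving:  {edge_saving(cloud, edge):.0%}")
```

Because edge cost is mostly upfront and cloud cost is mostly recurring, extending `years` beyond three widens the gap in this model.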
What happens when a GPU fails?
With N+1 redundancy, the orchestration layer automatically redistributes camera workloads to remaining GPUs within 5 seconds. No inspection station goes offline. The failed GPU generates a CMMS work order automatically with diagnostic data. For critical production lines, we design N+2 redundancy — two simultaneous GPU failures covered. Hot-swappable server designs allow GPU replacement during production without powering down the rack. The key: redundancy must be designed into the infrastructure from the start, not added as an afterthought.
What bandwidth do I need between cameras and GPU servers?
Per camera: 1 GbE is enough only for low-resolution or compressed streams, because a raw 5 MP @ 30 fps feed is already 1.2 Gbps at 8 bits per pixel; plan 10 GbE for uncompressed standard area scan and for high-resolution or high-speed cameras. Total backbone: aggregate all camera feeds plus 30% headroom. For 50 cameras at mixed resolutions, expect 50-150 Gbps aggregate, requiring 100 GbE or 200 GbE uplinks to the GPU server room. The critical design principle: vision traffic on a dedicated fiber backbone, isolated from OT production and IT corporate networks. Star topology from server room to each inspection zone for fault isolation. Get your infrastructure blueprint with exact bandwidth calculations per station.

Undersized Compute = Missed Defects at Production Speed

GPU thermal throttling, network congestion, and power brownouts don't show up in pilot projects — they show up at full production speed on hot summer days. Design for reality, not for demos.

