Pixel-precise. Not bounding-box.

YOLO and its descendants draw rectangles. SpiraVision 3D draws masks. The difference matters when two avocados are touching, when a bell pepper's true centroid is offset from the center of its bounding box, or when orientation determines whether the gripper closes on fruit or on stem. Our proprietary CNN, licensed from Spira Vision Systems, segments every produce class at the pixel level, fuses mask-averaged depth into a 3D centroid, and fits a PCA principal axis for orientation. Built on commodity hardware, streaming 6-DoF pick poses today.

Fig. 01 · Detection Output, Top-Down

Per-pixel masks, centroid + axis fusion, and 6-DoF pose in a single forward pass — outputs that bounding-box detectors cannot produce.

The Pipeline · Four Stages, One Camera, One GPU

CAPTURE

Dz · Crgb · Ir

Active IR stereo at 90 fps, color-agnostic depth, global shutter for moving conveyors.

SEGMENT

Mp · Mc

Dilated CNN delivers per-pixel multiclass masks in a single forward pass — not bounding boxes.

LOCALIZE

Lx · Ly · Lz

Mask-averaged depth fusion gives a 3D centroid in the robot frame, calibrated by Kabsch SVD.

OUTPUT

Oθ · Otcp

PCA principal axis plus 6-DoF pose streamed via TCP, ROS 2, or JSON to the robot controller.
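For readers who want to see the LOCALIZE and OUTPUT math, here is a minimal sketch of mask-averaged depth fusion and a PCA principal axis, assuming a boolean mask, a depth image in meters, and pinhole intrinsics. The function names and the parameters `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not the SpiraVision 3D API, and the axis here is fitted in the image plane only.

```python
import numpy as np


def mask_centroid_3d(mask, depth_m, fx, fy, cx, cy):
    """Mean 3D point (camera frame, meters) over the valid depth pixels of a mask.

    Hypothetical helper: assumes a pinhole camera with focal lengths fx, fy and
    principal point (cx, cy), all in pixels. Zero depth means "no reading".
    """
    valid = mask.astype(bool) & (depth_m > 0)
    v, u = np.nonzero(valid)          # v = row index, u = column index
    z = depth_m[v, u]
    x = (u - cx) * z / fx             # back-project each masked pixel
    y = (v - cy) * z / fy
    return np.array([x.mean(), y.mean(), z.mean()])


def principal_axis_2d(mask):
    """Unit vector and angle (degrees) of the mask's dominant image-plane axis."""
    v, u = np.nonzero(mask.astype(bool))
    pts = np.stack([u, v], axis=1).astype(float)
    pts -= pts.mean(axis=0)           # center the pixel coordinates
    cov = pts.T @ pts / len(pts)      # 2x2 covariance of the mask pixels
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    axis = eigvecs[:, -1]             # direction of largest variance
    if axis[0] < 0:                   # PCA sign is arbitrary; pick +x
        axis = -axis
    angle = np.degrees(np.arctan2(axis[1], axis[0]))
    return axis, angle
```

Averaging over the whole mask, rather than sampling depth at a single pixel, is what makes the centroid tolerant of individual missing or noisy depth readings.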

>90%

mAP detection accuracy

3+ produce classes, IoU @ 0.5

±3 mm

3D centroid accuracy

at 0.5 m standoff

30+

fps inference

at 640×480, mid-range GPU

The System

i. Active-stereo 3D capture

An Intel RealSense D435 projects a Class 1 IR speckle pattern at 850 nm and computes per-pixel depth from active stereo at up to 90 fps. The IR pattern is color-agnostic, so touching fruit are separated by physical geometry rather than RGB contrast.
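As a rough illustration of how a stereo pair turns disparity into depth, the standard relation is z = f · B / d. The sketch below assumes a rectified stereo pair; the function name and the numbers in the usage line are hypothetical and are not taken from the camera's datasheet.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: z = f * B / d.

    disparity_px: horizontal pixel shift between left and right IR images
    focal_px:     focal length in pixels (assumed value, illustrative)
    baseline_m:   distance between the two imagers in meters (assumed value)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid depth")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only: 640 px focal length, 50 mm baseline, 64 px disparity.
z = depth_from_disparity(64.0, 640.0, 0.05)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy the farther the fruit is from the camera, which is why accuracy figures are quoted at a specific standoff.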

ii. SpiraNet Lite inference

A pixel-to-pixel dilated CNN runs in a GPU-accelerated Docker container, outputting multiclass masks in a single forward pass. The accompanying VCC tool lets non-experts add new SKUs.

iii. Robot-frame output

Sensate's calibration pipeline registers the depth frame into the robot's coordinate system via Kabsch SVD, fits PCA principal axes, and streams 6-DoF pick poses over TCP, ROS 2, or JSON.
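The Kabsch step can be sketched in a few lines of NumPy. This is a generic textbook version, assuming matched 3D point pairs measured in both the camera frame and the robot frame (for example, calibration targets touched by the tool tip); it is not Sensate's calibration code, and the name `kabsch` is chosen here for illustration.

```python
import numpy as np


def kabsch(cam_pts, robot_pts):
    """Rigid transform (R, t) that best maps camera points onto robot points.

    Minimizes sum ||R @ cam_i + t - robot_i||^2 over matched rows of the two
    (N, 3) arrays. Needs at least three non-collinear point pairs.
    """
    cam_pts = np.asarray(cam_pts, dtype=float)
    robot_pts = np.asarray(robot_pts, dtype=float)
    cam_c = cam_pts.mean(axis=0)
    robot_c = robot_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (cam_pts - cam_c).T @ (robot_pts - robot_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = robot_c - R @ cam_c
    return R, t


def to_robot_frame(R, t, cam_point):
    """Apply the calibration to one camera-frame point, such as a mask centroid."""
    return R @ np.asarray(cam_point, dtype=float) + t
```

Once R and t are known, every camera-frame centroid and axis direction can be mapped into robot coordinates with one matrix multiply, so the calibration cost is paid once at setup rather than per frame.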