Every swing we analyze in BRD Labs starts as a grid of pixels. But coaches don't care about pixels—they care about where the club actually was in space, how much the hips shifted, and how the shaft was oriented relative to the player's posture.
Bridging that gap—from pixels to meaningful geometry—is where the pinhole camera model comes in. It's a simple mathematical model of how cameras see the world, and it's the foundation for everything else in our vision stack: perspective correction, stereo depth, even future simulator work.
In this article, we'll walk through what the pinhole model actually is, why it matters for golf, and how it helps us get the most out of normal phones instead of custom hardware.
From “taking a video” to “casting rays into space”
Conceptually, the pinhole camera model is simple: imagine a tiny hole in a wall and a flat screen behind it. Light from the world passes through the hole and lands on the screen, forming an image.
In math terms, that process does one key thing: it maps every 3D point in the world to a 2D point on the image. You can think of each pixel as a ray shooting out from the camera into the world—if you follow that ray far enough, you'll eventually hit the club, the ball, or the player's hip.
BRD's job is to understand those rays. When we say "the club moved this direction" or "the hands shifted this much," we're really saying: given the camera, these pixels correspond to these directions in 3D space.
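To make that concrete, here is the forward half of the mapping as a few lines of Python. It's a minimal sketch, not BRD code; the focal length and image center it assumes are exactly the "intrinsics" we introduce next.

```python
import numpy as np

def project_point(point_cam, f, cx, cy):
    """Project a 3D point (in camera coordinates, metres) to a pixel.

    f is a focal length expressed in pixels; (cx, cy) is where the
    optical axis meets the image. Pure pinhole: scale by focal length,
    divide by depth.
    """
    X, Y, Z = point_cam
    return f * X / Z + cx, f * Y / Z + cy

# A club head 0.3 m right of the optical axis, 2 m from the camera:
print(project_point(np.array([0.3, 0.0, 2.0]), f=1500.0, cx=960.0, cy=540.0))
# -> (1185.0, 540.0): 225 px right of the image center
```

Following a pixel back out along its ray is just this computation run in reverse.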
The intrinsics: how a specific phone sees the world
Not every camera maps rays the same way. A wide-angle lens on an iPhone sees differently than a telephoto lens on a tablet. The intrinsic parameters of a camera describe how that specific device projects 3D points onto its sensor.
In the pinhole model, the intrinsics boil down to a few ideas:
- Focal length: how “zoomed in” the camera is. A longer focal length makes the scene look tighter and flatter; a shorter one looks wider and more distorted.
- Principal point: the pixel coordinate that represents the optical center of the image (not always the exact center of the frame).
- Pixel scale: how many pixels correspond to a unit of length on the sensor, horizontally and vertically. In practice this gets folded into the focal length, which is why calibration tools report two pixel-valued focal lengths, fx and fy.
Together, these define a matrix (often called K, the intrinsics matrix) that lets us convert back and forth between pixel coordinates and rays. In code, it's what turns "pixel (x, y)" into "ray pointing slightly left and up from the camera."
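Here's what that conversion looks like in code. The numbers below are a made-up but plausible calibration for a 1080p phone camera, not any specific device:

```python
import numpy as np

# Hypothetical intrinsics for a 1920x1080 video:
# fx, fy = focal lengths in pixels, (cx, cy) = principal point.
K = np.array([
    [1500.0,    0.0, 960.0],
    [   0.0, 1500.0, 540.0],
    [   0.0,    0.0,   1.0],
])

def pixel_to_ray(u, v, K):
    """Back-project a pixel to a unit direction in camera coordinates."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # homogeneous pixel
    return ray / np.linalg.norm(ray)

# A pixel 100 px left of and 100 px above the principal point becomes
# a ray pointing slightly left and up (image y points down):
print(pixel_to_ray(860.0, 440.0, K))
```

Going the other way, multiplying a 3D point by K and dividing by its depth is the projection from the earlier sketch, written as a matrix product.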
Why focal length and zoom matter for golf angles
For coaches, focal length shows up as the difference between "that looks about right" and "why does everything look warped?"
Wide lenses (like the default phone camera filming the player up close) can make:
- closer objects (hands, club head) look disproportionately large and stretched,
- straight lines near the edge of the frame appear bent,
- and depth relationships (how far in front or behind something is) harder to interpret.
A more "normal" focal length—what we'll guide coaches toward—keeps geometry more faithful. But even when the lens isn't perfect, the pinhole model lets us understand how that particular setup will distort the scene, so we can compensate or at least warn when certain angles might be unreliable.
Principal point: where “straight ahead” actually lives
In a perfect world, the camera's optical center would land exactly at the center of the image. In reality, it might be shifted slightly up, down, left, or right.
That offset matters because it defines which direction is truly "straight ahead" for the camera. If we ignore it, then:
- verticals may lean even when the camera is level, and
- our estimates of tilt and rotation can slowly drift.
By modeling the principal point, we anchor our rays correctly. When we say "this line is vertical" or "the camera is tilted a few degrees," that statement rests on this calibrated understanding of how the image is centered.
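To put a rough number on that drift, here's a back-of-the-envelope check (the offset and focal length are invented for illustration):

```python
import math

# If the principal point sits 20 px left of the frame center and the
# focal length is 1500 px, treating the frame center as "straight
# ahead" biases every direction estimate by:
offset_px = 20.0
focal_px = 1500.0
print(f"{math.degrees(math.atan2(offset_px, focal_px)):.2f} deg")  # ~0.76 deg
```

Three-quarters of a degree sounds small, but it's a systematic error: it leans every vertical and tilts every measured angle by the same amount, in every frame.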
Dealing with lens distortion: the part the pinhole model ignores
Real phone cameras aren't perfect pinholes. Lenses introduce radial distortion (straight lines bowing outward or inward) and sometimes tangential distortion (a slight smearing that comes from the lens not sitting perfectly parallel to the sensor).
The clean way to handle this is a two-step process:
- Undistort the image: use a distortion model to map the original pixels to an "ideal" pinhole view where straight lines in the world look straight in the image.
- Apply the pinhole model: once we're in this idealized space, we can safely use the intrinsics to turn pixels into rays.
In BRD, whenever we care about precise angles or paths, we lean on this flow. You may not see it as a user, but under the hood, we're constantly nudging the video toward a cleaner geometric representation before we measure anything.
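As a sketch of that two-step flow using OpenCV (the intrinsics and distortion coefficients below are placeholders; real values come out of a calibration step such as cv2.calibrateCamera):

```python
import cv2
import numpy as np

# Placeholder calibration: intrinsics plus distortion coefficients
# (k1, k2, p1, p2, k3) in OpenCV's standard distortion model.
K = np.array([[1500.0,    0.0, 960.0],
              [   0.0, 1500.0, 540.0],
              [   0.0,    0.0,   1.0]])
dist = np.array([-0.28, 0.10, 0.0, 0.0, 0.0])

# Step 1: undistort a tracked pixel (say, a detected club head) into
# the ideal pinhole image. P=K keeps the result in pixel units.
pts = np.array([[[1700.0, 950.0]]], dtype=np.float64)
ideal = cv2.undistortPoints(pts, K, dist, P=K)

# Step 2: in the idealized image, the pinhole model applies, so the
# usual pixel -> ray conversion is safe.
u, v = ideal[0, 0]
ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
print(ray / np.linalg.norm(ray))
```

Undistorting individual tracked points like this is also much cheaper than remapping every pixel of every frame, which matters when you're processing video on a phone.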
How this helps us interpret swings from many different devices
In the real world, coaches and players bring whatever phone they have. That means:
- different brands,
- different lens configurations,
- different sensor sizes and focal lengths.
The pinhole camera model gives us a common language for all of them. As long as we know (or can estimate) each device's intrinsics and distortion, we can:
- interpret a swing recorded on one phone in the same geometric framework as a swing recorded on another,
- compare angles and patterns across devices and sessions, and
- build tools (like perspective correction and future stereo vision) on top of that shared foundation.
It's the difference between "this looks steeper on my new phone" and "this actually is steeper than last month."
Zoom, crop, and framing: not all close-ups are equal
One subtle place the pinhole model shows up is in the difference between moving the camera closer and using digital zoom.
From a geometric standpoint:
- Moving the camera changes the rays themselves—everything about perspective shifts. You're literally viewing the swing from a new vantage point.
- Digital zoom just crops and resizes the existing image. The rays are the same; you're just looking at a smaller patch of the sensor.
Because we model the camera explicitly, we can handle both cases sensibly. Cropping doesn't confuse our geometry; moving the camera does, and we treat that as a change in viewpoint, not just a "closer" version of the same setup.
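Concretely, a digital zoom is just bookkeeping on the intrinsics. A sketch with invented numbers, assuming a 1920x1080 frame:

```python
import numpy as np

K = np.array([[1500.0,    0.0, 960.0],
              [   0.0, 1500.0, 540.0],
              [   0.0,    0.0,   1.0]])

def zoomed_intrinsics(K, crop_x, crop_y, scale):
    """Intrinsics after cropping at (crop_x, crop_y) and upscaling by
    `scale`. Each pixel's ray is unchanged; only its label moves."""
    K2 = K.copy()
    K2[0, 2] -= crop_x   # the principal point shifts with the crop...
    K2[1, 2] -= crop_y
    K2[:2] *= scale      # ...and focal length + center rescale together
    return K2

# 2x digital zoom centered on a 1920x1080 frame:
print(zoomed_intrinsics(K, crop_x=480, crop_y=270, scale=2.0))
# Moving the camera, by contrast, leaves K untouched and changes the
# extrinsics (the camera's pose), so the rays themselves change.
```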
Why the pinhole model matters for everything that comes next
On its own, the pinhole model might feel like just another math abstraction. But it underpins a lot of what makes BRD different:
- Perspective geometry: Our work on correcting angles and understanding camera placement (from the previous article) assumes we know how each camera projects the world. That knowledge comes directly from the pinhole model.
- Stereo vision: When we combine two phones for 3D reconstruction, we're effectively intersecting rays from two calibrated pinhole cameras to recover depth.
- Future simulators: To simulate ball flight or build 3D experiences from 2D video, we need reliable mappings from pixels to rays and rays to motion. Again, it all starts here.
The better our pinhole models, the more "3D" intelligence we can extract from very normal, very 2D videos.
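As a taste of where this goes, here's a minimal triangulation sketch: two calibrated pinhole cameras, two pixel observations of the same club head, one 3D point recovered. The poses and pixel values are fabricated for illustration:

```python
import cv2
import numpy as np

K = np.array([[1500.0,    0.0, 960.0],
              [   0.0, 1500.0, 540.0],
              [   0.0,    0.0,   1.0]])

# Camera 1 at the world origin; camera 2 sits 0.5 m to its right
# (with R = I, the translation column is minus the camera center).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# The same club-head point as seen by each camera (2x1 pixel arrays):
pt1 = np.array([[1185.0], [540.0]])
pt2 = np.array([[810.0], [540.0]])

# Intersect the two rays (in the least-squares sense) to recover depth.
X_h = cv2.triangulatePoints(P1, P2, pt1, pt2)
print((X_h[:3] / X_h[3]).ravel())  # -> approx [0.3, 0.0, 2.0] metres
```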
From phone cameras to coaching instruments
The core philosophy behind BRD's vision stack is simple: treat everyday devices like serious measurement tools. That doesn't mean pretending they're perfect—it means modeling their imperfections and incorporating them into the way we measure, visualize, and coach.
The pinhole camera model is the foundation for that approach. It tells us how each device is seeing the swing, so every line, angle, and overlay we draw is grounded in a physical model of the world, not just a screenshot.
As we keep building out stereo capture, 3D projections, and simulator-like experiences, this model will stay at the core. It’s the quiet piece of math that lets a coach pull out a phone on the range and trust that what they’re seeing on screen actually reflects what the golfer just did in space.