AR-driven 3D rendering

Developing AR applications requires solving a fundamental problem: rendering AR content. This article will use planar image tracking as an example to describe the basic modules, processes, and rendering implementation of AR applications.

Typical AR application process

A typical AR application usually involves recognizing specific images, objects, or scenes from camera images, tracking their positions and poses, and rendering virtual content (3D models) accordingly.

(Figure: planar image tracking)

For example, the image above shows an AR application performing planar image tracking.

Below is a schematic diagram of the application process.

```mermaid
flowchart TD
    CameraDevice[Camera Device]
    Tracker[Tracker]
    Renderer[Renderer]

    CameraDevice -->|Image Frame| Tracker
    Tracker -->|Image Frame + Tracked Pose| Renderer
```

The process includes the following modules.

| Module | Function |
| --- | --- |
| Physical camera | Provides a sequence of input image frames. Each frame includes the image, a timestamp of when it was captured, and sometimes the camera's position and pose in space |
| Tracker | Computes the position and pose of the tracking target from image frames. Depending on the tracking target, various trackers exist, such as planar image trackers and 3D object trackers |
| Renderer | Renders the camera image and the 3D models corresponding to tracked objects onto the screen. On some AR glasses, only the 3D models are rendered, not the camera image |
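
The flow between these three modules can be sketched as a simple per-frame loop. The types and method names below are hypothetical stand-ins for whatever a real AR SDK provides; the point is only the data flow from camera to tracker to renderer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    image: bytes                                # raw pixel data
    timestamp: float                            # capture time in seconds
    camera_pose: Optional[List[float]] = None   # optional 4x4 pose, row-major


@dataclass
class TrackedPose:
    target_id: str
    pose: List[float]                           # 4x4 pose of the target, row-major


def run_ar_loop(camera, tracker, renderer, num_frames=1):
    """One camera frame flows through the tracker and renderer per iteration."""
    for _ in range(num_frames):
        frame = camera.capture()          # Camera Device -> image frame
        poses = tracker.track(frame)      # Tracker -> image frame + tracked poses
        renderer.render(frame, poses)     # Renderer -> screen
```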

Rendering on mobile phones

Rendering on mobile phones is divided into two parts: rendering camera images and rendering virtual objects.

Rendering camera images

(Figure: camera image)

When rendering camera images, some parameters need attention.

  • Scaling mode

    Usually, the camera image needs to fill the entire screen or a window, which may lead to aspect ratio mismatches between the camera image and the screen/window.

    Assuming we require the center of the camera image to align with the center of the screen/window while maintaining the aspect ratio, there are two common scaling modes: scale to fit and scale to fill.

    | Scaling mode | Effect |
    | --- | --- |
    | Scale to fit | Displays all content on the screen, but may leave black bars on the sides or top/bottom |
    | Scale to fill | No black bars, but crops part of the image on the sides or top/bottom |
  • Camera image rotation

    On mobile phones, images captured by the physical camera are typically fixed relative to the device body and do not change with the screen orientation. However, changes in the device's orientation affect our definition of the image's up, down, left, and right directions. During rendering, the current screen display orientation also affects the direction of the displayed image.

    Usually, during rendering, the rotation angle of the camera image relative to the screen display direction needs to be determined.

  • Camera image flip

    When the front camera is used, the image usually needs to be flipped horizontally so that it appears mirrored, as users expect.
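
The three adjustments above (scaling mode, rotation, and mirroring) can be combined into a small sketch that computes how large the camera image appears in a view. The function names and parameters are illustrative, not taken from any particular SDK.

```python
def camera_image_display_size(image_w, image_h, view_w, view_h,
                              rotation_deg=0, fill=False):
    """Compute the displayed size of a camera image centered in a view.

    rotation_deg: rotation of the camera image relative to the screen,
                  assumed to be a multiple of 90 degrees (0, 90, 180, 270).
    fill: False -> scale to fit (black bars possible),
          True  -> scale to fill (cropping possible).
    """
    # A 90- or 270-degree rotation swaps the image's effective width/height.
    if rotation_deg % 180 == 90:
        image_w, image_h = image_h, image_w

    scale_x = view_w / image_w
    scale_y = view_h / image_h
    # Scale to fit keeps everything visible (smaller scale);
    # scale to fill covers the whole view (larger scale).
    scale = max(scale_x, scale_y) if fill else min(scale_x, scale_y)
    return image_w * scale, image_h * scale


def mirror_u(u):
    """Horizontally flip a texture coordinate for front-camera mirroring."""
    return 1.0 - u
```

For instance, a 1280x720 landscape frame shown un-rotated in a 1080x1920 portrait view fits at 1080x607.5, leaving black bars above and below; rotating it 90 degrees first makes its effective aspect match the view.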

Rendering virtual objects

(Figure: virtual object)

Rendering virtual objects on mobile phones requires aligning them with the camera image. Both the rendering camera and the virtual objects must be placed in a virtual space that corresponds exactly to real space, and rendered with the same field of view and aspect ratio as the physical camera. The perspective projection applied to the camera image and to the virtual objects is identical; the difference is that for the camera image the projection happens optically inside the physical camera, while for virtual objects it is entirely a computational process.
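
One common way to match the physical camera is to build an OpenGL-style perspective projection matrix from the camera's intrinsics (focal lengths and principal point in pixels). This is a sketch under the standard pinhole model; the signs of the principal-point terms depend on the image-origin convention, and a real SDK may supply this matrix directly.

```python
import math


def projection_from_intrinsics(fx, fy, cx, cy, width, height,
                               near=0.1, far=100.0):
    """OpenGL-style projection matrix (camera looking down -z) built from
    pinhole intrinsics, returned row-major as a list of 4 rows."""
    return [
        [2 * fx / width, 0.0, 1.0 - 2 * cx / width, 0.0],
        [0.0, 2 * fy / height, 2 * cy / height - 1.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near),
         -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def vertical_fov_deg(fy, height):
    """Vertical field of view implied by the intrinsics, in degrees."""
    return math.degrees(2 * math.atan(height / (2 * fy)))
```

Because both the camera image and the virtual objects are drawn with this same matrix, a virtual object placed at the tracked pose lines up with its real-world counterpart on screen.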

Rendering on head-mounted displays

Rendering on head-mounted displays differs somewhat from mobile phones and can be divided into two scenarios.

  • VST

    Video See-Through refers to AR technology in which the headset captures images with a physical camera and displays both the camera image and virtual content on the headset's screen; a typical example is Vision Pro. The perspective projection matrices for the camera image and virtual content are usually provided by the headset's SDK, and the application only needs to supply the position and pose of the virtual content. The physical camera used for tracking and the virtual camera that renders to the screen may sit at different positions, so coordinate transformations are needed during rendering.

  • OST

    Optical See-Through refers to AR technology in which the headset's display is transparent and only virtual content is drawn on it; a typical example is HoloLens. The perspective projection matrix for virtual content is usually provided by the headset's SDK, and the application only needs to supply the position and pose of the virtual content. As with VST, the physical camera used for tracking and the virtual camera that renders to the screen may sit at different positions, so coordinate transformations are needed during rendering.
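
Both cases need the same coordinate transformation: a target pose expressed in the tracking camera's frame must be re-expressed in the rendering camera's frame. A minimal sketch with 4x4 row-major matrices follows; the extrinsic transform between the two cameras is an assumed input that a real headset SDK would supply.

```python
def mat4_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose_in_render_camera(render_from_track, target_in_track):
    """Re-express a tracked target's pose in the rendering camera's frame.

    render_from_track: 4x4 extrinsic transform from the tracking camera's
                       frame to the rendering camera's frame (from the SDK).
    target_in_track:   4x4 pose of the tracked target in the tracking
                       camera's frame (output of the tracker).
    """
    return mat4_mul(render_from_track, target_in_track)
```

For example, if the rendering camera sits 5 cm to the side of the tracking camera, the extrinsic transform is a 5 cm translation, and applying it shifts every tracked pose accordingly before rendering.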

Platform-specific guides

AR-driven 3D rendering is closely tied to the platform. Please refer to the guide for your target platform: