Cameras and input extensions

This article introduces the camera model, parameters, and some other usage notes for physical cameras, as well as how to use custom cameras for input extensions.

camera

Input frame

An input frame is the fundamental data unit in AR. It holds all the relevant information captured from a camera or another data source for a single frame. An input frame typically includes:

  • Raw image data (camera image)
  • Camera parameters (e.g., intrinsic parameters)
  • Timestamp
  • The transformation matrix of the camera in world coordinates
  • Tracking status

This information provides the spatiotemporal context data required for AR algorithms such as positioning, tracking, and rendering.
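To make the structure concrete, the fields listed above can be sketched as a small data type. This is only an illustration: the names `InputFrame`, `CameraIntrinsics`, and `TrackingStatus`, the field layout, and the tracking states are assumptions made for this example and are not the EasyAR API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class TrackingStatus(Enum):
    # Hypothetical states; a real SDK may define different ones.
    NOT_TRACKING = 0
    LIMITED = 1
    TRACKING = 2


@dataclass(frozen=True)
class CameraIntrinsics:
    width: int    # image width in pixels
    height: int   # image height in pixels
    fx: float     # focal length along x, in pixels
    fy: float     # focal length along y, in pixels
    cx: float     # principal point x, in pixels
    cy: float     # principal point y, in pixels


@dataclass(frozen=True)
class InputFrame:
    image: bytes                                # raw image data
    intrinsics: CameraIntrinsics                # camera parameters
    timestamp: float                            # capture time, in seconds
    camera_to_world: Sequence[Sequence[float]]  # 4x4 transform, row-major
    tracking_status: TrackingStatus


IDENTITY_4X4 = (
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
)

# One grayscale 640x480 frame with the camera placed at the world origin.
frame = InputFrame(
    image=bytes(640 * 480),
    intrinsics=CameraIntrinsics(640, 480, 500.0, 500.0, 320.0, 240.0),
    timestamp=0.033,
    camera_to_world=IDENTITY_4X4,
    tracking_status=TrackingStatus.TRACKING,
)
```

Keeping the pose and the timestamp in the same object as the image means a tracker never has to guess which pose belongs to which picture.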

Physical camera

The cameras used on electronic devices today typically consist of multiple lenses and mirrors. However, actual optical structures are generally not used to build camera models; instead, some simplified models are employed.

Pinhole camera model

pinhole camera

This is the simplest commonly used model: light passes through a small hole and forms an image rotated by 180 degrees, which the camera then flips upright in its output data. Six parameters describe this model: the image width and height in pixels \(w, h\), the focal lengths in pixels \(f_x, f_y\), and the principal point position in pixels \(c_x, c_y\). Note that when the image width and height are scaled, the focal lengths and the principal point position scale by the same factor, so the relative position of each image point remains unchanged.
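The projection and the scaling rule can be sketched in a few lines. The class name `Pinhole` and its methods are hypothetical names chosen for this example, and the sketch assumes continuous pixel coordinates with the origin at the image corner, so that scaling the image simply multiplies every pixel quantity by the same factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pinhole:
    w: int      # image width in pixels
    h: int      # image height in pixels
    fx: float   # focal length along x, in pixels
    fy: float   # focal length along y, in pixels
    cx: float   # principal point x, in pixels
    cy: float   # principal point y, in pixels

    def project(self, x: float, y: float, z: float) -> tuple:
        """Project a camera-space point with z > 0 to pixel coordinates."""
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)

    def scaled(self, s: float) -> "Pinhole":
        """Return the model for the same camera after resizing the image by s."""
        return Pinhole(round(self.w * s), round(self.h * s),
                       self.fx * s, self.fy * s, self.cx * s, self.cy * s)


full = Pinhole(1280, 720, 1000.0, 1000.0, 640.0, 360.0)
half = full.scaled(0.5)

u_full, v_full = full.project(0.25, -0.125, 1.0)
u_half, v_half = half.project(0.25, -0.125, 1.0)

# The point lands at the same relative position in both images.
same_position = (u_full / full.w == u_half / half.w
                 and v_full / full.h == v_half / half.h)
```

Here the point projects to (890, 235) in the full-size image and (445, 117.5) in the half-size image, which is the same fraction of the width and height in both cases.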

OpenCV camera model

Some cameras may exhibit significant radial and tangential distortion. The OpenCV camera model extends the pinhole camera model by adding higher-order parameters to describe radial and tangential distortion. Radial distortion is described using \(k_1, k_2, k_3, \cdots\), while tangential distortion is described using \(p_1, p_2\).
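As an illustration of how these coefficients act, the sketch below applies the radial and tangential terms to a point in normalized image coordinates (the camera-space point divided by its depth), following the distortion formula given in the OpenCV calibration documentation. The function name `distort` is hypothetical, and only the first three radial coefficients are included.

```python
def distort(x, y, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion.

    x, y are normalized image coordinates; the result is still normalized
    and must be mapped to pixels with fx, fy, cx, cy afterwards.
    """
    k1, k2, k3 = k
    p1, p2 = p
    r2 = x * x + y * y                                  # squared radius
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


undistorted = distort(0.5, 0.0)                       # all coefficients zero
radial_only = distort(0.5, 0.0, k=(0.1, 0.0, 0.0))    # pushed outward
tangential_only = distort(0.5, 0.0, p=(0.0, 0.01))    # shifted along x
```

With all coefficients set to zero the function returns its input, which is the plain pinhole case.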

Note

Some trackers do not support the OpenCV camera model.

OpenCV fisheye camera model

A fisheye camera compresses a large field of view into a smaller imaging area, which a plain perspective projection cannot do. The OpenCV fisheye camera model does not include tangential distortion. It extends the 6 parameters of the pinhole camera model with the coefficients \(k_1, k_2, k_3, k_4, \cdots\).
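The sketch below shows how the \(k\) coefficients enter the projection, following the formula given in the OpenCV fisheye module documentation: the angle \(\theta\) between the incoming ray and the optical axis is turned into a distorted angle \(\theta_d\), which then sets the image radius. The function name `fisheye_distort` is hypothetical, and exactly four coefficients are assumed.

```python
import math


def fisheye_distort(x, y, k=(0.0, 0.0, 0.0, 0.0)):
    """Map normalized pinhole coordinates (x, y) to fisheye coordinates."""
    k1, k2, k3, k4 = k
    r = math.hypot(x, y)               # radius under the pinhole model
    if r < 1e-12:
        return (x, y)                  # on the optical axis: nothing to do
    theta = math.atan(r)               # angle from the optical axis
    t2 = theta * theta
    theta_d = theta * (1.0 + k1 * t2 + k2 * t2 ** 2
                       + k3 * t2 ** 3 + k4 * t2 ** 4)
    scale = theta_d / r
    return (x * scale, y * scale)


# A ray 45 degrees off axis has pinhole radius 1.0; with zero coefficients
# the fisheye radius equals the angle itself, pi / 4.
x45, y45 = fisheye_distort(1.0, 0.0)

# A ray about 84 degrees off axis has pinhole radius 10 but stays below pi / 2.
x84, y84 = fisheye_distort(10.0, 0.0)
```

The second call shows the compression: the pinhole radius grows without bound as the ray approaches 90 degrees, while the fisheye radius stays bounded when the coefficients are zero.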

Note

Some trackers do not support the OpenCV fisheye camera model.

fisheye camera

Camera orientation and image orientation

On mobile phones, when the device is held horizontally (rotated 90 degrees counterclockwise from the normal vertical position) and the screen display orientation is also horizontal, the image output by the rear camera matches the real scene when displayed on the screen without rotation. Changing only the screen display orientation, without physically rotating the device, does not change the orientation of the image output by the physical camera. When the device is held vertically in the normal position and the screen display orientation is also vertical, the image output by the rear camera must be rotated 90 degrees clockwise before being displayed on the screen to match the real scene. Whenever the screen display orientation rotates, the rendered camera image needs a compensating rotation in the opposite direction to keep matching the real scene.

Camera orientation and image orientation are usually defined relative to the device's natural orientation:

  • Mobile phones

    • Android

      Android defines a natural orientation, which refers to the normal vertical holding position of the phone. The inertial measurement unit (IMU) also uses this orientation as a reference. The rotation angle of the camera output image relative to this orientation can be obtained as a camera parameter.

    • iOS

      On iOS, although the natural orientation is not explicitly defined, the inertial measurement unit uses the same reference as Android.

  • Tablets

    The natural orientation of some tablets is the horizontal holding position, while for others it is the normal vertical holding position, the same as mobile phones.

  • Glasses

    The natural orientation of glasses is usually the horizontal holding position.

When rendering the camera image, the camera orientation and the screen orientation are combined to determine the rotation applied to the image.
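For a rear camera, combining the two orientations amounts to a subtraction modulo 360. The sketch below is an illustration under stated assumptions, not an EasyAR function: `camera_orientation_deg` is the clockwise rotation the image needs when the device is in its natural orientation, and `screen_rotation_deg` is the clockwise rotation of the displayed content relative to the natural orientation (the convention Android uses for its display rotation). Front cameras are deliberately left out.

```python
def image_rotation_cw(camera_orientation_deg: int, screen_rotation_deg: int) -> int:
    """Clockwise rotation to apply to a rear camera image before display."""
    for name, value in (("camera_orientation_deg", camera_orientation_deg),
                        ("screen_rotation_deg", screen_rotation_deg)):
        if value % 90 != 0:
            raise ValueError(f"{name} must be a multiple of 90, got {value}")
    # The screen rotation is compensated in the opposite direction.
    return (camera_orientation_deg - screen_rotation_deg) % 360


# Phone whose rear camera image needs 90 degrees clockwise in portrait.
portrait = image_rotation_cw(90, 0)
# Same phone turned 90 degrees counterclockwise; the screen content is
# drawn rotated 90 degrees clockwise to compensate.
landscape = image_rotation_cw(90, 90)
```

For this phone the portrait case needs a 90 degree clockwise rotation and the landscape case needs none, matching the two situations described earlier in this section.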

Camera types and camera flipping

Mobile phones generally have rear cameras and front cameras. The image output from the front camera needs to be flipped horizontally before being displayed on the screen to simulate a mirror. If not flipped horizontally, it will look very unnatural.
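The flip itself is a simple per-row reversal. The sketch below uses a list of pixel rows as a stand-in for a real image buffer; the function name `flip_horizontal` is hypothetical, and in practice the flip is usually applied on the GPU when the image is rendered.

```python
def flip_horizontal(image):
    """Return a mirrored copy of an image given as a list of pixel rows."""
    return [row[::-1] for row in image]


front_camera_image = [
    [1, 2, 3],
    [4, 5, 6],
]
mirrored = flip_horizontal(front_camera_image)
```

Each row is reversed while the row order stays the same, so the image is mirrored left to right but not turned upside down.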

Input extension

EasyAR supports input extension through custom cameras. A custom camera obtains input frames from an external source and passes them to the AR system for use by trackers. You implement the custom camera yourself to acquire the image data.
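The general shape of a custom camera can be sketched as a loop that wraps each external image in a frame and hands it to a consumer. Everything in this example is hypothetical and chosen for illustration: the `Frame` and `CustomCamera` names, the `sink` callback standing in for the AR system, and the use of a monotonic clock for timestamps. The actual EasyAR interfaces are described in the platform-specific guides.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Frame:
    image: bytes             # raw image data from the external source
    size: Tuple[int, int]    # (width, height) in pixels
    timestamp: float         # capture time, in seconds


class CustomCamera:
    """Pulls images from an external source and forwards them as frames."""

    def __init__(self, source: Iterable[bytes], size: Tuple[int, int],
                 sink: Callable[[Frame], None],
                 clock: Callable[[], float] = time.monotonic):
        self._source = source
        self._size = size
        self._sink = sink      # stand-in for the AR system's frame input
        self._clock = clock

    def run(self) -> int:
        """Forward every available image; return how many frames were sent."""
        count = 0
        for image in self._source:
            self._sink(Frame(image, self._size, self._clock()))
            count += 1
        return count


received: List[Frame] = []
camera = CustomCamera([b"\x00" * 4, b"\xff" * 4], (2, 2), received.append)
sent = camera.run()
```

Passing the consumer in as a callback keeps the acquisition code independent of whatever tracker eventually uses the frames.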

Platform-specific guides

The use of cameras and input extensions is closely related to the platform. Please refer to the following guides for development based on your target platform: