Introduction to planar image tracking

Planar Image Tracking is used to detect and track textured planar objects in everyday life. These "planar" objects range from small items such as a book, a business card, or a poster to large targets such as a graffiti wall. What they have in common is a flat surface with rich, non-repeating texture.

This article outlines the basic principles, expected effects, and platform adaptation of planar image detection and tracking, so you can quickly understand what the feature can and cannot do and where to focus during development.

Basic principles

Understanding these principles helps developers optimize recognition performance and avoid common issues.

Core process

Loading and preprocessing stage:
- The system loads the target image, extracts a large number of visual feature points from it, generates a feature description of the image, and inserts it into the feature library.
- Images with richer textures are easier to identify and track. You can use the target image detection tool to check the recognizability of your target image in advance.
Reference image on the left: richly textured and easy to recognize (5 stars). Reference image on the right: simple elements, little texture, hard to recognize (1 star).
We recommend using images rated 4–5 stars as your target images.
Real-time detection and tracking stage:
- After the camera captures the frame, the system analyzes the feature points of the current frame and performs feature matching with the feature library of the target image.
- The PnP (Perspective-n-Point) algorithm uses the matched points to compute the pose (position and rotation) of the image in 3D space.
- Once the target is detected, the system switches to tracking mode, in which it estimates the motion between consecutive frames to follow the target in real time.
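The matching step can be illustrated with a small, self-contained sketch. The article does not say which descriptor or matching strategy the SDK uses; this example assumes binary descriptors (stored here as plain integers) compared by Hamming distance, with Lowe's ratio test to discard ambiguous matches. That is a common approach, not necessarily the one used internally, and the function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def ratio_test_match(frame_desc, library_desc, ratio=0.8):
    """Brute-force match frame descriptors against the target's feature library.

    A match is kept only if the nearest library descriptor is clearly closer
    than the second nearest (Lowe's ratio test), which rejects ambiguous points.
    Returns a list of (frame_index, library_index, distance) tuples.
    """
    matches = []
    if len(library_desc) < 2:
        return matches  # The ratio test needs at least two candidates.
    for fi, f in enumerate(frame_desc):
        # Rank every library descriptor by its distance to this frame descriptor.
        ranked = sorted(range(len(library_desc)),
                        key=lambda li: hamming(f, library_desc[li]))
        d1 = hamming(f, library_desc[ranked[0]])
        d2 = hamming(f, library_desc[ranked[1]])
        if d1 < ratio * d2:
            matches.append((fi, ranked[0], d1))
    return matches


# Toy 8-bit descriptors; real binary descriptors such as ORB use 256 bits.
library = [0b11110000, 0b00001111, 0b10101010]
frame = [0b11110001, 0b11001100]
print(ratio_test_match(frame, library))  # [(0, 0, 1)]
```

The second frame descriptor is 4 bits away from every library entry, so the ratio test drops it. Repetitive textures produce exactly this kind of ambiguity, which is one reason they hurt recognition.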
Optimization mechanism:
- Tracking loss recovery: The system automatically re-detects the target after brief occlusion or motion blur caused by fast movement.
- Multi-target tracking: The `Simultaneous Number` parameter sets how many targets a single Tracker can track at the same time, so one Tracker can follow several targets at once.
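The interplay between detection, tracking, loss recovery, and the `Simultaneous Number` cap can be sketched as a per-frame update. This is a simplified, hypothetical model (the class and method names are not the SDK's API): targets whose frame-to-frame tracking fails are dropped and become eligible for re-detection, and newly detected targets are admitted only while free slots remain.

```python
class MultiTargetTracker:
    """Hypothetical model of one Tracker with a simultaneous-target cap."""

    def __init__(self, simultaneous_number: int):
        self.simultaneous_number = simultaneous_number
        self.tracked = set()  # Names of targets currently in tracking mode.

    def update(self, detected, still_tracked):
        """Advance one frame.

        detected: targets found by feature matching in this frame.
        still_tracked: targets whose frame-to-frame tracking succeeded.
        Returns the set of targets tracked after this frame.
        """
        # Targets lost to occlusion or motion blur leave tracking mode;
        # they are not blacklisted, so detection can recover them later.
        self.tracked &= set(still_tracked)
        free_slots = self.simultaneous_number - len(self.tracked)
        # Admit newly detected targets in a stable order until the cap is reached.
        for name in sorted(set(detected) - self.tracked):
            if free_slots <= 0:
                break
            self.tracked.add(name)
            free_slots -= 1
        return set(self.tracked)


tracker = MultiTargetTracker(simultaneous_number=2)
print(sorted(tracker.update({"book", "card", "poster"}, set())))  # ['book', 'card']
print(sorted(tracker.update({"poster"}, {"card"})))  # ['card', 'poster']
```

In the second frame "book" is lost, which frees a slot for "poster". A larger cap admits more targets, but on a real device each tracked target costs processing time, which is why the cap should match what the hardware can sustain.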

Technical limitations

Only flat images are supported; 3D objects and dynamic content are not.
Depends on ambient lighting; scenes that are too dark or overexposed may cause detection failures or tracking loss.
During detection, keep the camera close enough that the target image occupies at least 30% of the frame.
Multi-target tracking is limited by device performance. Typically, PCs can track over 10 targets simultaneously, while mobile devices can track 4–6 flat targets.
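To check the 30% guideline for a given viewing distance, you can project the target's corners into the camera image and compare the projected area with the frame area. The sketch below assumes an ideal pinhole camera with no lens distortion, a target centered on its own origin in the z = 0 plane, and a known pose (R, t). It runs the projection in the forward direction, whereas PnP solves the inverse problem of recovering R and t from matched points. All names and numbers are illustrative, and the projected quad is not clipped to the frame.

```python
def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D point in target coordinates to pixel coordinates."""
    # Transform into camera coordinates: Xc = R * X + t.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Pinhole projection (assumes the point is in front of the camera, z > 0).
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def frame_coverage(width_m, height_m, R, t, intrinsics, image_size):
    """Fraction of the camera frame covered by the projected target."""
    fx, fy, cx, cy = intrinsics
    hw, hh = width_m / 2, height_m / 2
    corners = [(-hw, -hh, 0.0), (hw, -hh, 0.0), (hw, hh, 0.0), (-hw, hh, 0.0)]
    quad = [project(c, R, t, fx, fy, cx, cy) for c in corners]
    img_w, img_h = image_size
    return polygon_area(quad) / (img_w * img_h)


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # Camera looking straight at the target.
K = (500.0, 500.0, 320.0, 240.0)  # Assumed intrinsics for a 640x480 camera.
for distance in (0.5, 0.3):
    cov = frame_coverage(0.2, 0.2, IDENTITY, (0.0, 0.0, distance), K, (640, 480))
    print(f"{distance} m: {cov:.0%}")
```

With these assumed values, a 20 cm square target covers about 13% of the frame at 0.5 m and about 36% at 0.3 m, so only the closer distance meets the guideline.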

Effects and expected results

With the working mechanism and technical limitations in mind, you should also know what results this feature can achieve. Knowing the expected results helps you set reasonable acceptance criteria for testing during development.

Ideal effect

Accurate overlay: Virtual objects stay aligned with the edges of the image.
Fast detection: Low latency from app launch to successful detection.
Stable tracking: Tracking is maintained even when the image is rotated, moved, or partially occluded.

Non-ideal situations and countermeasures

Phenomenon	Cause	User perception	Solution preview (see details in later sections)
Unable to recognize	Target image lacks texture or is too small	Virtual object does not appear	Optimize the target image and check its recognizability with the detection tool
Tracking jitter	Target occupies too small a portion of the frame, leaving too few trackable points	Virtual object shakes noticeably	Move closer to the target image; set the tracking mode to `PreferQuality`
Frequent tracking loss	Rapid movement or complete occlusion of the image	Virtual object flickers or disappears	Keep the device and target image steady, or use a larger target image
Some targets missed in multi-target tracking	Limited by device performance	Some target images are not tracked	Set `Simultaneous Number` to a value the device can sustain

Expected result verification method

Development phase: Preview with a PC webcam in Unity Editor Play mode.
Testing phase: Use the official Sample scene or self-built test images to cover different lighting/angle/distance conditions.
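For the testing phase, enumerating every combination of conditions makes it easier to see which ones have been covered. The condition values below are placeholders chosen for illustration; substitute the lighting, angle, and distance levels that matter for your app.

```python
from itertools import product

# Placeholder condition levels; adjust to your own test plan.
LIGHTING = ["dim", "indoor", "daylight", "overexposed"]
ANGLES_DEG = [0, 30, 60]  # Camera tilt relative to the target's surface normal.
DISTANCES = ["near", "mid", "far"]


def build_test_matrix():
    """Return one test case per lighting/angle/distance combination."""
    return [
        {"lighting": light, "angle_deg": angle, "distance": dist}
        for light, angle, dist in product(LIGHTING, ANGLES_DEG, DISTANCES)
    ]


cases = build_test_matrix()
print(len(cases))  # 36
print(cases[0])  # {'lighting': 'dim', 'angle_deg': 0, 'distance': 'near'}
```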

Best practices for target images

The effectiveness of planar image tracking depends heavily on the quality of the target image. To maximize the chance of successful recognition, follow the guidelines below when preparing target images.

Depending on the usage scenario, you can prepare target images in different ways: photograph the target object head-on with a camera, or design the pattern first and then print it. Either a photo or a design file can serve as the target image.

Basic requirements

Image format: JPG or PNG is recommended.
Transparency handling: If the image has an alpha (transparency) channel, the system fills transparent areas with white. To avoid unexpected results, do not use images with transparency.
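The white-background handling can be pictured as standard "over" compositing onto white. The article only states that a white background is used; the straight (non-premultiplied) 8-bit RGBA model below is an assumption, and the function name is hypothetical. It also shows why transparent areas are risky: they become flat white, which contributes no texture.

```python
def composite_on_white(rgba):
    """Blend one straight-alpha 8-bit RGBA pixel onto a white background."""
    r, g, b, a = rgba
    alpha = a / 255.0
    # out = alpha * color + (1 - alpha) * white, per channel.
    return tuple(round(c * alpha + 255 * (1 - alpha)) for c in (r, g, b))


print(composite_on_white((0, 0, 0, 0)))       # (255, 255, 255): transparent becomes white
print(composite_on_white((0, 0, 0, 255)))     # (0, 0, 0): opaque pixels are unchanged
print(composite_on_white((100, 50, 0, 128)))  # (177, 152, 127)
```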

Core optimization points

Ensure rich texture details
The target image should have plenty of detail and edge variation; avoid solid colors and simple graphics.

Reference left: images with rich textures can be detected; reference right: solid-color images cannot be detected
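A rough way to spot texture-poor images before running the official detection tool is to measure how much neighboring pixel values change. The mean gradient below is a hypothetical proxy, not the metric the detection tool uses, and a high score does not rule out repetitive patterns; it only flags flat, solid-color images.

```python
def mean_gradient(gray):
    """Mean absolute horizontal + vertical difference of a grayscale image.

    `gray` is a list of equal-length rows of 0-255 values. Flat images score 0.
    """
    h, w = len(gray), len(gray[0])
    total, count = 0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            total += abs(gray[y][x + 1] - gray[y][x])  # Horizontal change.
            total += abs(gray[y + 1][x] - gray[y][x])  # Vertical change.
            count += 1
    return total / count if count else 0.0


solid = [[128] * 4 for _ in range(4)]
textured = [[10, 200, 40, 90], [90, 30, 250, 15], [160, 70, 120, 220], [5, 180, 60, 140]]
print(mean_gradient(solid))               # 0.0
print(round(mean_gradient(textured), 1))  # 242.2
```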
Avoid repetitive patterns
Patterns with regular repetition (such as checkerboards or stripes) reduce the uniqueness of feature points.

Reference: repetitive pattern images cannot be tracked
Fill the frame with content
The subject should fill as much of the image as possible, with minimal blank areas.

Reference: the left image with a full subject is easier to detect and track than the right image with excessive blank space
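To quantify blank space, you can find the bounding box of non-background pixels and compare it with the whole image; a low fill ratio suggests cropping. The near-white threshold of 245 and the function names are illustrative assumptions, and the check assumes a light background.

```python
def content_bbox(gray, threshold=245):
    """Bounding box (left, top, right, bottom), right/bottom exclusive, of
    pixels darker than `threshold`; None if the image is entirely blank."""
    rows = [y for y, row in enumerate(gray) if any(p < threshold for p in row)]
    cols = [x for x in range(len(gray[0])) if any(row[x] < threshold for row in gray)]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def fill_ratio(gray, threshold=245):
    """Share of the image area taken up by the content bounding box."""
    box = content_bbox(gray, threshold)
    if box is None:
        return 0.0
    left, top, right, bottom = box
    return (right - left) * (bottom - top) / (len(gray) * len(gray[0]))


# A 4x4 white image with a 2x2 dark subject in the middle.
img = [[255] * 4, [255, 0, 0, 255], [255, 0, 0, 255], [255] * 4]
print(content_bbox(img))  # (1, 1, 3, 3)
print(fill_ratio(img))    # 0.25
```

The article gives no numeric fill threshold, so treat the ratio as a relative indicator and crop to the returned box when the border is mostly empty.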
Control the aspect ratio
The image should not be overly elongated, with the shorter side being at least 20% of the longer side.

Reference: elongated images are difficult to track
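The 20% rule translates directly into a ratio check. The function name is hypothetical.

```python
def aspect_ratio_ok(width, height, min_ratio=0.2):
    """True if the shorter side is at least `min_ratio` of the longer side."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return min(width, height) / max(width, height) >= min_ratio


print(aspect_ratio_ok(640, 480))   # True  (ratio 0.75)
print(aspect_ratio_ok(1000, 200))  # True  (ratio 0.2, exactly at the limit)
print(aspect_ratio_ok(1000, 150))  # False (ratio 0.15, too elongated)
```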
Choose an appropriate resolution
Recommended range: between SQCIF (128×96) and 1280×960.
Too small: too few feature points, which lowers the recognition rate.
Too large: needlessly increases memory usage and computation time when generating Target data.
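A preflight check for the recommended resolution range might look like the following. The article gives the range as two sizes; this sketch assumes the bounds apply to the longer and shorter sides respectively (128×96 minimum, 1280×960 maximum) and, for oversized images, computes a uniform downscale factor that preserves the aspect ratio. Both the interpretation and the function name are assumptions.

```python
MIN_SIZE = (128, 96)    # SQCIF: (longer side, shorter side)
MAX_SIZE = (1280, 960)  # Upper bound of the recommended range


def check_resolution(width, height):
    """Return (status, scale), where scale is a suggested uniform resize factor."""
    long_side, short_side = max(width, height), min(width, height)
    if long_side < MIN_SIZE[0] or short_side < MIN_SIZE[1]:
        # Upscaling cannot add real detail, so a new, larger source image is needed.
        return "too small", 1.0
    scale = min(1.0, MAX_SIZE[0] / long_side, MAX_SIZE[1] / short_side)
    return ("too large", scale) if scale < 1.0 else ("ok", 1.0)


status, scale = check_resolution(4032, 3024)  # e.g. a 12 MP phone photo
print(status, round(4032 * scale), round(3024 * scale))  # too large 1280 960
print(check_resolution(800, 600))  # ('ok', 1.0)
print(check_resolution(100, 80))   # ('too small', 1.0)
```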
