Input frame data requirements for external frame data sources
For an external frame data source to function properly, the most critical and most challenging task is guaranteeing data accuracy. This document describes the input frame data requirements for external frame data sources.
Before you begin
- Understand fundamental concepts such as cameras and input frames.
- Grasp the basic concepts and common types of external frame data sources.
Input frame data types
In Unity, external frame data sources typically need to receive different data at two distinct times. Based on when the external data is input and on its characteristics, these two sets of data are categorized as:
- Camera frame data
- Rendering frame data
Different types of external frame data sources have varying requirements for these two data sets:
- Image and device motion data input extension: Requires both camera frame data and rendering frame data.
- Image input extension: Only requires camera frame data.
Camera frame data
Data requirements:
- Timestamp
- Raw physical camera image data
- Intrinsics (including image size, focal length, principal point. Distortion model and parameters are also needed if distortion exists)
- Extrinsics (Tcw or Twc, calibrated matrix expressing the physical offset of the physical camera relative to the device/head pose origin)
- Tracking status
- Device pose
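The requirements above can be sketched as a single data bundle. The type and field names below are illustrative only, not part of the EasyAR API, and `System.Numerics` types stand in for Unity's:

```csharp
using System;
using System.Numerics;

// Illustrative bundle of the camera frame data listed above;
// all names are hypothetical, not part of the EasyAR API.
public struct CameraFrameData
{
    public double Timestamp;            // seconds, at the midpoint of exposure
    public IntPtr ImageData;            // raw physical camera image data
    public int ImageWidth, ImageHeight; // intrinsics: image size
    public double Fx, Fy, Cx, Cy;       // intrinsics: focal length and principal point (pixels)
    public double[] Distortion;         // distortion parameters, if distortion exists
    public Matrix4x4 Extrinsics;        // Tcw or Twc: physical camera vs. device/head pose origin
    public int TrackingStatus;          // device-defined tracking state
    public Vector3 DevicePosition;      // device pose at the timestamp
    public Quaternion DeviceRotation;
}
```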
Data timing:
- Midpoint of the physical camera exposure.
Data usage:
- API call timing: Can vary based on external code design. A common approach, used by most devices, is to query during the 3D engine's render update, then decide based on the device data's timestamp whether to proceed with further processing.
- API call thread: 3D engine's game thread or any other thread (if all external APIs used are thread-safe).
A Unity API call example follows:
void TryInputCameraFrameData()
{
    // Query the latest frame from the device SDK (device-specific);
    // skip processing if no new frame has arrived since the last call.
    double timestamp = 0; // frame timestamp from the device, in seconds
    if (timestamp == curTimestamp) { return; }
    curTimestamp = timestamp;

    // Frame description, filled from the device SDK.
    PixelFormat format = default;
    Vector2Int size = default;
    Vector2Int pixelSize = default;
    int bufferSize = 0;

    var bufferO = TryAcquireBuffer(bufferSize);
    if (bufferO.OnNone) { return; }
    var buffer = bufferO.Value;

    IntPtr imageData = IntPtr.Zero; // pointer to the raw image data from the device
    buffer.tryCopyFrom(imageData, 0, 0, bufferSize);

    var historicalHeadPose = new Pose(); // device pose at the frame timestamp
    MotionTrackingStatus trackingStatus = (MotionTrackingStatus)(-1); // map from the device tracking state

    using (buffer)
    using (var image = Image.create(buffer, format, size.x, size.y, pixelSize.x, pixelSize.y))
    {
        // deviceCamera and cameraParameters are created elsewhere to match the physical camera.
        HandleCameraFrameData(deviceCamera, timestamp, image, cameraParameters, historicalHeadPose, trackingStatus);
    }
}
Rendering frame data
Data requirements:
- Timestamp
- Tracking status
- Device pose
Data timing:
- Display time. TimeWarp is not accounted for. The device pose data for the same moment will be used by the external system (e.g., device SDK) to set the virtual camera's transform for rendering the current frame.
Note
TimeWarp (sometimes called Reprojection or ATW/PTW) is a common latency-reduction technique in VR/AR headsets. It warps the image after rendering is complete, based on the latest head pose, to compensate for head movement during rendering. EasyAR requires the pose data corresponding to the moment when the virtual camera is set at the start of rendering, not the actual display time after TimeWarp.
Data usage:
- API call timing: Every render frame of the 3D engine.
- API call thread: 3D engine's game thread.
A Unity API call example follows:
private void InputRenderFrameMotionData()
{
    // Query pose data for the display time of the current frame (device-specific).
    double timestamp = 0; // display-time timestamp from the device, in seconds
    var headPose = new Pose(); // device pose at the display time
    MotionTrackingStatus trackingStatus = (MotionTrackingStatus)(-1); // map from the device tracking state
    HandleRenderFrameData(timestamp, headPose, trackingStatus);
}
Data requirement details
Physical camera image data:
- Image coordinate system: Data acquired when the sensor is level should also be level. Data should be stored with the top-left corner as the origin, in row-major order. Images should not be flipped or inverted.
- Image FPS: Normal 30 or 60 fps data is acceptable. If a high fps has a notable performance impact, the minimum frame rate at which the algorithms still perform reasonably is 2 fps. Using an fps higher than 2 is recommended; the raw data frame rate is typically sufficient.
- Image size: For better calculation results, the maximum side should be 960 or larger. Performing time-consuming image scaling in the data pipeline is discouraged; use raw data directly unless copying full-size data takes unacceptably long. Image resolution must not be smaller than 640x480.
- Pixel format: Prioritizing tracking performance while considering overall performance, the typical order of preference is YUV > RGB > RGBA > Gray (the Y channel of YUV). When using YUV data, a complete data definition is required, including packing and padding details. Color images generally yield better Mega results than single-channel images; other features are less affected.
- Data access: a data pointer or an equivalent implementation. Eliminate all avoidable copies in the data pipeline. In HandleCameraFrameData, EasyAR makes a copy of the data for asynchronous use; the image data is no longer used after this synchronous call completes. Pay attention to data ownership.
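The size and packing constraints above can be checked before handing data to EasyAR. A minimal sketch; the NV12 layout used here (a full-resolution Y plane followed by a half-height interleaved UV plane, with optional per-row padding) is just one common YUV packing, chosen as an example:

```csharp
using System;

public static class FrameChecks
{
    // Limits taken from this document: minimum 640x480, max side of 960 or larger recommended.
    public const int MinWidth = 640, MinHeight = 480, RecommendedMaxSide = 960;

    public static bool MeetsMinimumResolution(int width, int height)
        => width >= MinWidth && height >= MinHeight;

    public static bool MeetsRecommendedSize(int width, int height)
        => Math.Max(width, height) >= RecommendedMaxSide;

    // Buffer size for NV12 with per-row padding (rowStride >= width):
    // full-resolution Y plane plus a half-height interleaved UV plane.
    public static int Nv12BufferSize(int width, int height, int rowStride)
    {
        if (rowStride < width) throw new ArgumentException("rowStride < width");
        return rowStride * height + rowStride * (height / 2);
    }
}
```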
Timestamps:
- All timestamps must be clock-synchronized, preferably in hardware. Timestamps are expressed in seconds, but their precision should reach nanoseconds, or be as high as possible.
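For example, a device clock that reports integer nanoseconds can be converted to the required double seconds. Rebasing to a local epoch is a sketch of one way to keep nanosecond resolution, since a double carries only about 15-16 significant digits:

```csharp
public static class Timestamps
{
    // Rebase device timestamps to the first value seen so that a double in
    // seconds still resolves nanoseconds over a long session.
    static long epochNanoseconds = -1;

    public static double ToSeconds(long deviceNanoseconds)
    {
        if (epochNanoseconds < 0) { epochNanoseconds = deviceNanoseconds; }
        return (deviceNanoseconds - epochNanoseconds) * 1e-9;
    }
}
```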
Tracking state:
- The tracking state is defined by the device and must include a state for tracking loss (VIO unavailable). More granularity is better if available.
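For example, a device SDK that distinguishes more states than needed can be mapped down to a few levels that still preserve the required tracking-loss state. All state names here are hypothetical, not taken from any real SDK:

```csharp
// Three-level status; "Lost" covers the required VIO-unavailable case.
public enum TrackingLevel { Lost, Limited, Normal }

// Hypothetical device-side states; real names depend on the device SDK.
public enum DeviceState { Initializing, Relocalizing, TrackingPoor, TrackingGood }

public static class TrackingMap
{
    public static TrackingLevel Map(DeviceState s) => s switch
    {
        DeviceState.Initializing => TrackingLevel.Lost,   // VIO not yet available
        DeviceState.Relocalizing => TrackingLevel.Lost,   // tracking lost
        DeviceState.TrackingPoor => TrackingLevel.Limited,
        _ => TrackingLevel.Normal,
    };
}
```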
Device pose:
- All poses (including the transform of the virtual camera in the 3D engine) should use the same origin.
- All poses and extrinsic parameters should use the same coordinate system.
- In Unity, the coordinate system type for pose data should be either the Unity coordinate system or the EasyAR coordinate system. If the input extension is implemented by EasyAR and uses other coordinate system definitions, a clear coordinate system definition must be provided, or a method for converting to the Unity or EasyAR coordinate system should be given.
- In Unity, if using the Unity XR framework, only compatibility with <xref:Unity.XR.CoreUtils.XROrigin.TrackingOriginMode.Device?displayProperty=nameWithType> mode is required.
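To illustrate the kind of coordinate system conversion mentioned above, here is a sketch that assumes the external SDK reports poses in an OpenGL-style right-handed system (X right, Y up, Z backward) and converts them to a Unity-style left-handed system (Z forward) by flipping the Z axis. The actual conversion depends entirely on the SDK's coordinate system definition; `System.Numerics` types stand in for Unity's:

```csharp
using System.Numerics;

public static class PoseConvert
{
    // Conjugating with a Z-axis flip converts a right-handed (X right, Y up,
    // Z backward) pose to a left-handed Unity-style pose:
    // p' = (x, y, -z), q' = (-qx, -qy, qz, qw).
    // The source convention is an assumption, not an EasyAR requirement.
    public static (Vector3, Quaternion) RightHandedToLeftHanded(Vector3 p, Quaternion q)
        => (new Vector3(p.X, p.Y, -p.Z), new Quaternion(-q.X, -q.Y, q.Z, q.W));
}
```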
Intrinsic parameters:
- All values must match the image data. Scaling of intrinsic parameters should be performed before inputting to EasyAR if necessary.
- If the input extension is implemented by EasyAR, it should be specified whether the intrinsic parameters change per frame (indicating whether the corresponding API should be called once or every frame).
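For example, if the image is downscaled before input, the pinhole intrinsics scale with it. A minimal sketch (the half-pixel center-of-pixel adjustment is deliberately ignored here for simplicity):

```csharp
public readonly struct Intrinsics
{
    public readonly double Fx, Fy, Cx, Cy; // focal length and principal point, in pixels
    public readonly int Width, Height;     // image size the values refer to

    public Intrinsics(double fx, double fy, double cx, double cy, int w, int h)
        { Fx = fx; Fy = fy; Cx = cx; Cy = cy; Width = w; Height = h; }

    // Scale intrinsics to match a resized image; per this document, any such
    // scaling must happen before the data is input to EasyAR.
    public Intrinsics ScaledTo(int newWidth, int newHeight)
    {
        double sx = (double)newWidth / Width, sy = (double)newHeight / Height;
        return new Intrinsics(Fx * sx, Fy * sy, Cx * sx, Cy * sy, newWidth, newHeight);
    }
}
```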
Extrinsic parameters:
- Real data must be provided on head-mounted displays.
- It is a calibration matrix expressing the physical offset of the physical camera relative to the device/head pose origin. If the device's pose and the physical camera pose are the same, it should be an identity matrix.
- For Apple Vision Pro, the corresponding interface is: CameraFrame.Sample.Parameters.extrinsics. Note that its data definition differs from the required interface data; EasyAR internally converts it before use.
- In Unity, the coordinate system type for extrinsic parameters should be either the Unity coordinate system or the EasyAR coordinate system. If the input extension is implemented by EasyAR and uses other coordinate system definitions, a clear coordinate system definition must be provided, or a method for converting to the Unity or EasyAR coordinate system should be given.
- In head-mounted devices, multiple coordinate systems with different definitions usually exist. These differences may include origin, orientation, left/right-handed expression, etc. Extrinsic parameters should be calculated within the same coordinate system. The interface data requires coordinate transformation within the same coordinate system, not a transformation matrix between two differently defined coordinate systems.
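To make the "coordinate transformation within the same coordinate system" point concrete: given the device pose in world coordinates and a device-from-camera extrinsic expressed in that same system, the world pose of the physical camera is a plain composition, and an identity extrinsic leaves the camera pose equal to the device pose. Names are illustrative; `System.Numerics` types stand in for Unity's:

```csharp
using System.Numerics;

public static class ExtrinsicCompose
{
    // World pose of the physical camera from the device pose and the
    // device-from-camera extrinsic (offset expressed in the device frame).
    public static (Vector3 pos, Quaternion rot) CameraWorldPose(
        Vector3 devicePos, Quaternion deviceRot,
        Vector3 extrinsicPos, Quaternion extrinsicRot)
    {
        // Rotate the calibrated offset into world space, then translate.
        var pos = devicePos + Vector3.Transform(extrinsicPos, deviceRot);
        // Concatenate(a, b): rotation a followed by rotation b.
        var rot = Quaternion.Concatenate(extrinsicRot, deviceRot);
        return (pos, rot);
    }
}
```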
Performance:
- Data should be provided as efficiently as possible. In most implementations, the API calls occur during rendering, so it is recommended that these calls not block even when the underlying operations are time-consuming, or that the APIs be used in a way that tolerates the cost.
- If the input extension is implemented by EasyAR, all time-consuming API calls must be documented.
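One way to keep per-frame API calls non-blocking, as recommended above, is to poll the device on a worker thread and let the game thread only dequeue the latest result. A generic sketch, not tied to any particular SDK:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Worker thread runs the (potentially slow) device call; the game thread's
// per-frame call only dequeues, so it never blocks on the device.
public sealed class FramePump<T> : IDisposable
{
    readonly ConcurrentQueue<T> queue = new ConcurrentQueue<T>();
    readonly Thread worker;
    volatile bool running = true;

    public FramePump(Func<T> slowAcquire, int intervalMs)
    {
        worker = new Thread(() =>
        {
            while (running)
            {
                queue.Enqueue(slowAcquire()); // time-consuming device call
                Thread.Sleep(intervalMs);
            }
        }) { IsBackground = true };
        worker.Start();
    }

    // Called every render frame on the game thread; returns false if no new data.
    public bool TryGetLatest(out T frame)
    {
        bool got = false;
        frame = default;
        while (queue.TryDequeue(out var f)) { frame = f; got = true; } // keep only the newest
        return got;
    }

    public void Dispose() { running = false; worker.Join(); }
}
```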
Multi-camera:
- Data from at least one camera is required. This camera can be an RGB camera, a VST camera, a positioning camera, or any other type. On a head-mounted display, if only one camera's data is input, it is generally recommended to use an RGB or VST camera located centrally or near the eye.
- Using multiple cameras can enhance the performance of EasyAR algorithms. Camera frame data from all available cameras for a given moment should be input together at the same point in time.
Multi-camera support is not yet fully implemented; contact EasyAR for more details.
Next steps
- Create an image and device motion data input extension
- Create an image input extension
- Create a headset extension package
Related topics
- EasyAR coordinate system
- Image input extension sample: Workflow_FrameSource_ExternalImageStream