Under Review 2026

Learning Agile and Robust Omnidirectional Aerial Motion on Overactuated Tiltable-Quadrotors

Wentao Zhang1, Zhaoqi Ma1, Jinjie Li1, Huayi Wang2, Haokun Liu1, Junichiro Sugihara1, Chen Chen1, Yicheng Chen1, Moju Zhao1†

1The University of Tokyo  ·  2Shanghai Jiao Tong University    †Corresponding Author

Our RL policy enables agile and robust full 6-DoF flight across diverse real-world scenarios: (a) waypoints hovering, (b) trajectory tracking, (c) external force recovery, (d) wind disturbance rejection, and (e) payload variation.

Abstract

Tilt-rotor aerial robots enable omnidirectional maneuvering through thrust vectoring, but introduce significant control challenges due to strong coupling between joint and rotor dynamics. While model-based controllers achieve high accuracy under nominal conditions, their robustness degrades in the presence of disturbances and modeling uncertainties. We present a reinforcement learning (RL) framework for agile and robust omnidirectional aerial motion control on an overactuated tiltable quadrotor. A single learned policy outputs coordinated joint position commands and rotor thrust commands to reach arbitrary target poses in SE(3). To achieve reliable sim-to-real transfer while preserving accuracy, we integrate system identification with minimal, physically consistent domain randomization. Compared with a state-of-the-art NMPC controller, the proposed method achieves comparable 6-DoF pose tracking accuracy while demonstrating superior robustness, agility, and 30× faster inference across diverse tasks, enabling zero-shot deployment on real hardware.


Robot Platform

The Beetle tiltable quadrotor is equipped with four single-DoF tilt joints (DYNAMIXEL XC330 servomotors) and four T-Motor AT2814 brushless rotors. By actively rotating the joint angles, the thrust direction of each rotor can be independently redirected, enabling full 6-DoF actuation in 3D space.
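
To illustrate how tilting redirects thrust, the sketch below computes the net body-frame force and torque from four rotor thrusts and four tilt angles. The geometry is an assumption made for illustration and is not taken from the Beetle platform: rotors sit on an X layout at a hypothetical arm length, each tilt axis lies along its arm, and rotor drag torque is ignored.

```python
import math

ARM_LENGTH = 0.2  # hypothetical arm length in metres (not from the paper)
ARM_YAWS = [math.radians(a) for a in (45.0, 135.0, 225.0, 315.0)]  # assumed X layout


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def body_wrench(thrusts, tilts):
    """Return (force, torque) in the body frame for 4 thrusts [N] and tilts [rad]."""
    force = [0.0, 0.0, 0.0]
    torque = [0.0, 0.0, 0.0]
    for f, alpha, psi in zip(thrusts, tilts, ARM_YAWS):
        # Thrust direction: the body z-axis rotated by alpha about the arm axis.
        direction = (math.sin(alpha) * math.sin(psi),
                     -math.sin(alpha) * math.cos(psi),
                     math.cos(alpha))
        position = (ARM_LENGTH * math.cos(psi), ARM_LENGTH * math.sin(psi), 0.0)
        rotor_force = tuple(f * d for d in direction)
        rotor_torque = cross(position, rotor_force)
        for k in range(3):
            force[k] += rotor_force[k]
            torque[k] += rotor_torque[k]
    return tuple(force), tuple(torque)


# Zero tilt: thrust is purely vertical, as on a conventional quadrotor.
hover_force, hover_torque = body_wrench([5.0] * 4, [0.0] * 4)

# Opposite tilt on the first two and last two rotors: a lateral force appears
# along body x while the net torque cancels, so the body can stay level.
tilt = math.radians(20.0)
side_force, side_torque = body_wrench([5.0] * 4, [tilt, tilt, -tilt, -tilt])
```

Under these assumptions the second call produces a horizontal force without any attitude change, which is the property that separates an overactuated tiltable quadrotor from a fixed-rotor one.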


Method

Reinforcement Learning Framework

We formulate the control problem as a Partially Observable MDP (POMDP) and train an actor-critic policy using PPO (Proximal Policy Optimization) in Isaac Sim / Isaac Lab.
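
For readers unfamiliar with PPO, the following is a minimal sketch of its clipped surrogate objective in plain Python. It is the standard textbook form rather than the training code used in this work; the function name and the clip range of 0.2 are illustrative assumptions.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximised) over a batch.

    ratios:     pi_new(a|o) / pi_old(a|o) for each sample
    advantages: advantage estimate for each sample
    clip_eps:   clip range; 0.2 is a common default, assumed here
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes the incentive to move the policy
        # further than the clip range in a single update.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)
```

When the probability ratio leaves the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective, the clipped term takes over and the gradient for that sample vanishes.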

RL framework overview

System Identification for Sim-to-Real

A key contribution is physically consistent simulation via system identification:

| Module | Model | Identified From |
| --- | --- | --- |
| Rotor thrust & torque | Polynomial map: $(f, \tau) = g(\text{cmd}, V)$ | 6-axis F/T sensor measurements |
| Joint motor | 2nd-order transfer function $\frac{\omega_n^2}{s^2+2\zeta\omega_n s+\omega_n^2}$ | Sine-wave response experiments |
| System latency | Communication + estimation delays (3–5 ms) | Timing measurements |
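
As a rough illustration of how the joint motor and latency models could be combined in a simulator, the sketch below integrates the second-order transfer function in the time domain and delays the command by a fixed number of steps. The values of `omega_n`, `zeta`, and the time step are hypothetical placeholders, not the identified parameters; the 4 ms delay is simply a value inside the 3–5 ms range above.

```python
from collections import deque


def simulate_joint(commands, omega_n=30.0, zeta=0.7, dt=0.001, delay_s=0.004):
    """Simulate joint angle response to a command sequence.

    Model: angle'' = omega_n^2 * (cmd_delayed - angle) - 2 * zeta * omega_n * angle'
    which is the time-domain form of omega_n^2 / (s^2 + 2 zeta omega_n s + omega_n^2).
    All parameter defaults are illustrative assumptions.
    """
    delay_steps = round(delay_s / dt)
    buffer = deque([0.0] * delay_steps)  # commands still "in flight"
    angle, rate = 0.0, 0.0
    angles = []
    for cmd in commands:
        buffer.append(cmd)
        delayed_cmd = buffer.popleft()  # command issued delay_steps ago
        accel = omega_n ** 2 * (delayed_cmd - angle) - 2.0 * zeta * omega_n * rate
        # Semi-implicit Euler: update the rate first, then the angle.
        rate += accel * dt
        angle += rate * dt
        angles.append(angle)
    return angles


# Unit step command held for one second.
angles = simulate_joint([1.0] * 1000)
```

With these placeholder values the joint does not move for the first four steps, then settles at the commanded angle well within the simulated second.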

Rather than relying on heavy domain randomization, we randomize only body mass, inertia, and joint offsets, preserving model fidelity while improving robustness.
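
A minimal sketch of what such a narrow randomization could look like per training episode is given below. Only the three randomized quantities (body mass, inertia, joint offsets) come from the text; the nominal values, the ranges, and the function name are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical nominal values and ranges (not taken from the paper).
NOMINAL_MASS = 2.0                     # kg
NOMINAL_INERTIA = (0.02, 0.02, 0.03)   # kg m^2, diagonal entries
MASS_SCALE = (0.9, 1.1)                # multiplicative range
INERTIA_SCALE = (0.9, 1.1)             # multiplicative range
JOINT_OFFSET_RANGE = (-0.02, 0.02)     # rad, additive per tilt joint


def sample_episode_params(rng):
    """Draw one set of randomized physical parameters for an episode."""
    return {
        "mass": NOMINAL_MASS * rng.uniform(*MASS_SCALE),
        "inertia": tuple(i * rng.uniform(*INERTIA_SCALE) for i in NOMINAL_INERTIA),
        "joint_offsets": tuple(rng.uniform(*JOINT_OFFSET_RANGE) for _ in range(4)),
    }


rng = random.Random(0)  # seeded for reproducibility
params = sample_episode_params(rng)
```

Rotor and joint dynamics are deliberately left out of the sampler, since in this approach they are fixed by system identification rather than randomized.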


Experiments

All policies are deployed zero-shot on the real hardware without any fine-tuning. Comparisons are made against an NMPC controller implemented following [Li et al., 2024].

Waypoints Hovering

The robot sequentially reaches 5 predefined 6-DoF target poses (repeated 3 times). Compared with the NMPC baseline, the RL policy achieves comparable position accuracy and better orientation accuracy. State trajectories show that RL reaches higher peak velocities (> 2 m/s linear, ~5 rad/s angular), confirming more agile transient behavior.
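
The page does not state which error metrics are used. One common choice for 6-DoF pose accuracy, shown here only as an assumed example, is the Euclidean distance for position and the geodesic angle between unit quaternions for orientation.

```python
import math


def position_error(p, p_ref):
    """Euclidean distance between two 3D positions [m]."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_ref)))


def orientation_error(q, q_ref):
    """Geodesic angle [rad] between two unit quaternions given as (w, x, y, z)."""
    # abs() handles the double cover: q and -q represent the same rotation.
    dot = abs(sum(a * b for a, b in zip(q, q_ref)))
    # Clamp to guard against rounding slightly above 1.
    return 2.0 * math.acos(min(1.0, dot))


# Example: a 90 degree yaw offset relative to the identity orientation.
yaw_90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
angle = orientation_error(yaw_90, (1.0, 0.0, 0.0, 0.0))
```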

Disturbance Rejection

The robot hovers stably while subjected to (1) continuous wind from a fan and (2) impulsive stick pushes.

Payload Adaptation

The robot maintains stable hovering under additional payloads up to 0.25 kg × 4 without re-tuning. Increasing payload primarily increases vertical ($z$-axis) position offset while orientation accuracy remains stable. This trend is consistent for both RL and NMPC.

Trajectory Tracking

Although trained only for pose reaching, the RL policy generalizes to tracking a 3D lemniscate trajectory with simultaneous roll, pitch, and yaw modulation — zero additional training required.
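
A reference of this kind can be fed to a pose-reaching policy by updating the target pose at every control step. The sketch below generates one possible 3D figure-eight with time-varying attitude; the exact parametrization, amplitudes, and period used in the experiments are not given here, so every number and the function name are assumptions.

```python
import math


def lemniscate_reference(t, period=20.0, half_width=1.0, height=1.0,
                         z_amplitude=0.2, attitude_amplitude=0.3):
    """Return (x, y, z, roll, pitch, yaw) of the reference pose at time t [s].

    Position follows a Gerono-style figure-eight in x-y with a vertical
    oscillation; attitude angles [rad] are modulated sinusoidally.
    All default parameters are illustrative assumptions.
    """
    phase = 2.0 * math.pi * t / period
    x = half_width * math.sin(phase)
    y = half_width * math.sin(phase) * math.cos(phase)
    z = height + z_amplitude * math.cos(phase)
    roll = attitude_amplitude * math.sin(phase)
    pitch = attitude_amplitude * math.cos(phase)
    yaw = attitude_amplitude * math.sin(2.0 * phase)
    return (x, y, z, roll, pitch, yaw)


# Sample the reference at 100 Hz over one period.
reference = [lemniscate_reference(0.01 * k) for k in range(2000)]
```

Each sampled pose would then replace the fixed waypoint in the policy's target input, with no change to the policy itself.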


BibTeX

@article{zhang2026omnirl,
  title   = {Learning Agile and Robust Omnidirectional Aerial Motion
             on Overactuated Tiltable-Quadrotors},
  author  = {Zhang, Wentao and Ma, Zhaoqi and Li, Jinjie and Wang, Huayi
             and Liu, Haokun and Sugihara, Junichiro and Chen, Chen
             and Chen, Yicheng and Zhao, Moju},
  journal = {arXiv preprint},
  year    = {2026}
}