Learning Agile and Robust Omnidirectional Aerial Motion on Overactuated Tiltable-Quadrotors

Abstract

Tilt-rotor aerial robots enable omnidirectional maneuvering through thrust vectoring, but introduce significant control challenges due to strong coupling between joint and rotor dynamics. While model-based controllers achieve high accuracy under nominal conditions, their robustness degrades in the presence of disturbances and modeling uncertainties. We present a reinforcement learning (RL) framework for agile and robust omnidirectional aerial motion control on an overactuated tiltable quadrotor. A single learned policy outputs coordinated joint position commands and rotor thrust commands to reach arbitrary target poses in the SE(3)space. To achieve reliable sim-to-real transfer while preserving accuracy, we integrate system identification with minimal, physically consistent domain randomization. Compared with a state-of-the-art NMPC controller, the proposed method achieves comparable 6-DoF pose tracking accuracy while demonstrating superior robustness, agility, and 30× faster inference across diverse tasks, enabling zero-shot deployment on real hardware.

Robot Platform

The Beetle tiltable quadrotor is equipped with four single-DOF tilt joints (DYNAMIXEL XC330 servomotors) and four T-Motor AT2814 brushless rotors. By actively rotating the joint angles, the thrust direction of each rotor can be independently redirected, enabling full 6-DoF actuation in 3D space.

Method

Reinforcement Learning Framework

We formulate the control problem as a Partially Observable MDP (POMDP) and train an actor-critic policy using PPO (Proximal Policy Optimization) in Isaac Sim / Isaac Lab.

Action space (8-dim): joint position targets $\boldsymbol{q}^{\ast}_j \in \mathbb{R}^4$ + rotor thrust targets $\boldsymbol{f}^{\ast}_r \in \mathbb{R}^4$
Observation (33-dim): body velocities, relative target pose, current orientation, joint positions, previous actions
Asymmetric actor-critic: the critic receives privileged rotor thrust and torque signals during training only

RL framework overview

System Identification for Sim-to-Real

A key contribution is physically consistent simulation via system identification:

Module	Model	Identified From
Rotor thrust & torque	Polynomial map: $(f, \tau) = g(\text{cmd}, V)$	6-axis F/T sensor measurements
Joint motor	2nd-order transfer function $\frac{\omega_n^2}{s^2+2\zeta\omega_n s+\omega_n^2}$	Sin Wave-response experiments
System latency	Communication + estimation delays (3–5 ms)	Timing measurements

Rather than relying on heavy domain randomization, we randomize only body mass, inertia, and joint offsets, preserving model fidelity while improving robustness.

Experiments

All policies are deployed zero-shot on the real hardware without any fine-tuning. Comparisons are made against an NMPC controller implemented following [Li et al., 2024].

Waypoints Hovering

The robot sequentially reaches 5 predefined 6-DoF target poses (repeated 3 times). The RL policy achieves comparable position accuracy and better orientation accuracy. State trajectories show RL reaches higher peak velocities (> 2 m/s linear, ~5 rad/s angular), confirming more agile transient behavior.

Disturbance Rejection

The robot hovers stably while subjected to (1) continuous wind from a fan and (2) impulsive stick pushes.

Wind: RL shows lower orientation fluctuation (std 0.63° vs 1.32°) with tighter velocity response.
Stick push: RL recovers to target pose faster with sharper, shorter velocity peaks.

Payload Adaptation

The robot maintains stable hovering under additional payloads up to 0.25 kg × 4 without re-tuning. Increasing payload primarily increases vertical ($z$-axis) position offset while orientation accuracy remains stable — consistent for both RL and NMPC.

Trajectory Tracking

Although trained only for pose reaching, the RL policy generalizes to tracking a 3D lemniscate trajectory with simultaneous roll, pitch, and yaw modulation — zero additional training required.