If you read my last post on drone technology, I ended it with a promise — I was going to stop theorizing and start building. This is that post.

The goal is an end-to-end autonomous drone test environment where an LLM controls real drone flight software through natural language. Not a toy demo — a full simulation stack that can scale to fleet operations, computer vision, and eventually real hardware. The target applications are infrastructure inspection, wildfire detection, and search and rescue. They all share a common pattern — autonomous search over a defined area with adaptive re-tasking based on what the drone finds.

Architecture

I landed on four layers, each with a clear responsibility:

Layer Role Technology Runs On
Operator Natural language commands Claude Code + MCP client MacBook (local)
Orchestration Mission control, drone API droneserver (MCP server) + MAVSDK AWS EC2
Flight Controller Autopilot firmware PX4 SITL (Software In The Loop) AWS EC2 (Docker)
Physics Simulation Real-world physics Gazebo Harmonic AWS EC2 (Docker)

Communication flows down: Claude speaks MCP (Model Context Protocol) over HTTP/SSE to the orchestration layer. The orchestration layer speaks MAVLink (the standard drone communication protocol) over UDP to PX4. PX4 talks to Gazebo over a shared memory bridge for physics simulation. Each layer is independently replaceable — that’s the key. If I want to swap Gazebo for a higher fidelity sim later, the layers above don’t care.

What Gazebo Actually Does

This one took me a bit to wrap my head around. Gazebo is the physics engine. While PX4 runs the actual flight controller firmware (the same code that runs on real drone hardware), Gazebo simulates the physical world that firmware thinks it’s operating in. That means:

  • Gravity, thrust, and aerodynamics — the drone has mass, motors produce force, wind exists
  • Sensors — accelerometers, gyroscopes, barometers, GPS, magnetometers all produce simulated data with realistic noise
  • Environment — a 3D world the drone exists in (for now a default flat world — later, detailed terrain)

PX4 and Gazebo communicate through gz_bridge — PX4 sends motor commands, Gazebo computes the physics and sends back sensor data. PX4 doesn’t know it’s in a simulation. That’s the whole point of SITL: the firmware is identical to what flies on real hardware. So when this thing eventually controls a physical drone, the software is already proven.

Phase 1: First LLM-Controlled Flight (Feb 19)

The Stack

The foundation is droneserver, an MCP server built by Peter Burke at UC Irvine that bridges LLM tool calls to MAVLink drone commands via MAVSDK Python. I forked it and started extending it. The full path for a single command looks like this:

Claude Code (MacBook)
  → SSH tunnel (port 8080)
    → droneserver MCP server (EC2)
      → MAVSDK Python
        → MAVLink UDP (port 14540)
          → PX4 SITL autopilot (EC2)
            → Gazebo physics sim (EC2)

That’s a lot of layers for “make the drone go up.” But each one exists for a reason and they’ll matter a lot more when this scales to multiple drones with vision.

Docker Attempt #1: Failed

My first attempt was to containerize PX4 in Docker on EC2. It failed. MAVLink heartbeats — the UDP packets PX4 sends every second to announce “I’m alive” — were going to Docker’s default gateway instead of reaching droneserver on the host. Without heartbeats, MAVSDK never connects. I spent an hour debugging application-level code before a raw UDP socket test (socket.recvfrom) on the host instantly confirmed the real problem: no packets arriving on port 14540. The network layer was broken. Lesson learned — test the lowest layer first.

Pivot to Native Install

I abandoned Docker temporarily and installed PX4 natively on a t3.large EC2 instance (Ubuntu 24.04, 2 vCPU, 8GB RAM, 50GB disk). PX4’s ubuntu.sh setup script installed all dependencies. Built from source. Took about 40 minutes, but it worked immediately — MAVLink heartbeats flowing, MAVSDK connected, droneserver came online.

PX4 SITL booting natively on EC2 — Gazebo world loading, x500 drone model spawning

The Flight

With PX4 running natively, droneserver started up and connected — MAVSDK found the drone on UDP port 14540, GPS lock acquired, ready for commands.

droneserver MCP server starting on EC2 — Uvicorn on port 8080, initializing the MAVLink connection

droneserver connected — "SUCCESS: Connected to drone at :14540!", GPS lock acquired, drone READY for commands

I connected Claude via SSH tunnel and it immediately discovered the full MCP toolset — 41 drone control tools spanning flight basics, navigation, missions, telemetry, status, control, and safety. Seeing all those tools populate in the terminal was a moment. This thing has a lot of capability.

Claude Code discovering 41+ MCP drone control tools — flight, navigation, missions, telemetry, and safety

First command: “get the drone’s position.” The drone reported its location on the ground at the default PX4 spawn point near Zurich, Switzerland — altitude ~0m, sitting on the ground, ready to fly.

First position check — drone on the ground at 47.39797°N, 8.54616°E, altitude 0.17m

Then the real test: “Arm the drone, take off to 20 meters, then get the position again.” Claude called arm_drone (success), then takeoff with altitude 20 (takeoff complete at 19.7m AGL), then get_position — hovering at 19.95m relative altitude. Armed, airborne, and holding position.

Arm, takeoff to 20m, position confirmed — the full flight sequence through MCP tool calls

Post-takeoff position — hovering at 19.95m, "armed, airborne, and hovering"

The split-screen view tells the whole story. On the left: droneserver logs showing altitude climbing second by second (3.6m → 5.0m → 6.4m → … → 19.7m, “Takeoff complete”). On the right: Claude Code receiving the MCP responses and interpreting the telemetry. Two systems talking to each other, one of them an LLM.

Split screen — droneserver altitude logs on the left, Claude Code MCP responses on the right

Split screen — PX4 boot and flight logs on the left, Claude's position readout on the right

That was the Phase 1 milestone: first LLM-controlled drone flight. Natural language in, drone commands out, real telemetry back. It works.

Phase 1.5: Docker Containerization (Feb 20)

Native installs work but they don’t scale. If I want multiple drones, I need each one in its own container. Docker gives me reproducible builds, easy scaling (one container per drone for fleet operations), and clean teardown. But I had to go back and solve that networking problem from attempt #1.

The MAVLink Networking Fix

The root cause was straightforward once I understood it: PX4 broadcasts MAVLink UDP to localhost by default. In a Docker bridge network, localhost is the container itself — not the host, not other containers. So the heartbeats were going nowhere. The fix:

  1. Put PX4 and droneserver in the same Docker bridge network (drone-net)
  2. PX4’s entrypoint script resolves droneserver hostname via Docker DNS (getent hosts droneserver172.18.0.3)
  3. sed patches PX4’s MAVLink startup config to inject -t 172.18.0.3, directing heartbeats to the droneserver container specifically
  4. droneserver listens with MAVSDK in listen mode (udp://:14540) — accepting incoming heartbeats rather than connecting outbound

This required patching droneserver’s MAVSDK connection logic. The original code raised an error on empty addresses. My fork now treats 0.0.0.0 or empty as “listen mode” — sit on the port and wait for PX4 to send heartbeats to you.

Building the PX4 Docker Image: Six Iterations

I built a custom PX4 SITL Docker image from scratch (no third-party images) which meant compiling PX4-Autopilot v1.16.1 from source inside a container. This took six iterations to get right. Each build on a t3.large takes about 30 minutes so this was a long day of waiting:

  1. Missing OpenCV — PX4’s cmake couldn’t find OpenCV. Added libopencv-dev.
  2. Missing cppzmq — Another missing library. I abandoned hand-picking dependencies and switched to PX4’s own Tools/setup/ubuntu.sh --no-nuttx script, which installs everything reliably.
  3. Missing git tags — Shallow clone didn’t include tags that PX4’s build system needs for version headers. Added git fetch --tags --depth 1.
  4. Build command hungDONT_RUN=1 make px4_sitl gz_x500 launched the full simulation anyway despite the flag. The gz_x500 target includes simulation launch regardless.
  5. DDS submodule missing — Tried cmake --build directly to avoid the simulation launch, but PX4’s make targets handle submodule initialization that raw cmake doesn’t.
  6. Successmake px4_sitl_default builds only the firmware without launching any simulation. 5.64GB image.

Gazebo Environment: Three More Iterations

With the image built, the entrypoint had to launch Gazebo and PX4 correctly. PX4 and Gazebo kept disagreeing on the drone specification — this was a recurring theme:

  1. Model not found — Gazebo couldn’t find the x500 drone model. Needed GZ_SIM_RESOURCE_PATH pointing to PX4’s model directories.
  2. Sensors missing — Gazebo loaded the model but PX4 reported no accelerometer, gyroscope, barometer, or compass data. The Gazebo system plugins path wasn’t set, so the physics plugins weren’t loaded.
  3. Unbound variable crash — PX4 generates a gz_env.sh script during build that sets all the correct Gazebo paths. Sourcing it in the entrypoint was the right call, but it appends to $GZ_SIM_RESOURCE_PATH which didn’t exist yet, and bash’s strict mode killed the script. Fixed by initializing the variables to empty strings before sourcing.

The Final Docker Stack

┌─────────────────────────────────────────────┐
│  EC2 (t3.large, Ubuntu 24.04)               │
│                                              │
│  ┌──────────────┐    ┌───────────────────┐  │
│  │  px4-sitl-0   │───▶│  droneserver-0    │  │
│  │  (PX4 + Gz)   │UDP │  (MCP server)     │  │
│  │  Port 14580   │14540│  Port 8080        │  │
│  └──────────────┘    └───────────────────┘  │
│         drone-net (bridge)                   │
│                         │ :8080              │
└─────────────────────────┼────────────────────┘
                          │ SSH tunnel
                    ┌─────┴─────┐
                    │  Operator  │
                    │  (MacBook) │
                    └───────────┘

docker compose up -d and both containers start. PX4 boots, Gazebo loads the x500 drone model with full sensor simulation, MAVLink heartbeats flow across the Docker bridge to droneserver, MAVSDK connects, GPS locks, and the drone reports ready for commands. The whole thing is reproducible and teardown is one command.

Lessons Learned

Don’t containerize what you don’t understand. The Docker attempt in Phase 1 failed because I didn’t understand MAVLink’s UDP networking behavior inside bridge networks. Once the native install proved the full stack worked, containerizing was a matter of reproducing a known-good configuration — not debugging two unknowns at once.

Test the lowest layer first. A raw UDP socket test instantly diagnosed the networking failure after an hour of debugging higher-level code. Always isolate the problem layer before looking higher in the stack.

Use the project’s own setup tools. Hand-picking PX4’s C++ dependencies was a losing game. Their ubuntu.sh script handles dozens of packages and version-specific requirements. Same goes for Gazebo environment setup — PX4 generates gz_env.sh at build time with all the correct paths. Don’t reinvent what the project already provides.

What’s Next

The stack works. A drone flies on command. But it’s still one drone, no eyes, no intelligence. Here’s where this is going:

  • Phase 2: Vision — Add Cosys-AirSim (the maintained fork of Microsoft AirSim) on a GPU instance for camera simulation. Feed frames to Claude’s vision API for real-time scene analysis.
  • Phase 3: Fleet — Scale to multiple drones. Each drone is a PX4 + droneserver container pair. A fleet coordinator service handles multi-drone mission planning and deconfliction.
  • Phase 4: Intelligence — Adaptive re-tasking. Drone spots something interesting, autonomously adjusts its search pattern, coordinates with other drones to investigate.

The FAA’s proposed Part 108 rule for Beyond Visual Line of Sight (BVLOS) operations is expected to be finalized in 2026. The framework emphasizes autonomous operations with human oversight — which is exactly what this system is designed for. The regulatory window is opening at the same time the technology is becoming viable. I plan to be ready.

-Jake