
Lane Meshes or Textures?

Fri Nov 24 2023

A recent paper by Inverted AI researchers poses an interesting business question: lane meshes or textures?

The following picture sums it up:

Five-by-two panel of lane-mesh versus texture images for pedestrian prediction.

First, many of our customers are AV/ADAS developers, and many of those rely on structured map representations to describe drivable surfaces. This mostly makes sense: drivable affordances are generally well represented and covered by road maps. Mining and military AV applications can't rely on structured road map representations, of course, but these markets are relatively small, so this inconvenience is often glossed over.

Our first DRIVE and INITIALIZE releases used such a structured map representation exclusively, and with these tools our users have seen substantially improved simulation in terms of vehicle NPC realism, reactivity, and diversity.

A more interesting issue is that road maps are generally terrible at representing walkability affordances. This can't be so easily glossed over, particularly in settings where AV/ADAS systems interact with pedestrians and other actors: cities, suburbs, and, well, basically everywhere except highways.

Our recent paper addresses this issue head-on. Our ITRA foundation model of behavior is structured so that each agent "sees" (in the literal perception sense) a local, differentiably rendered environment around itself. This means we can easily swap out our structured map representation (which is converted into a lane mesh and rendered before perception anyway) for a visual, texture representation: a top-down view of the real world.
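To make the swap concrete, here is a minimal sketch, not our actual rendering code, and with all function names, shapes, and parameters as illustrative assumptions, of how a lane mesh and an overhead texture can both be reduced to the same kind of agent-centric top-down image, so the behavior model downstream is agnostic to which representation it receives.

```python
# A minimal sketch, NOT the actual ITRA rendering code; names, shapes, and
# parameters are illustrative assumptions. The point is that a lane mesh and an
# overhead texture can both be reduced to the same agent-centric image tensor.
import numpy as np


def rasterize_lane_mesh(lane_polylines, resolution, extent_m, origin_xy):
    """Rasterize lane-center polylines into a single-channel top-down image.

    lane_polylines: list of (N_i, 2) arrays of world-frame (x, y) points.
    resolution: output image side length in pixels.
    extent_m: width/height in meters covered by the image.
    origin_xy: world-frame coordinate at the image center.
    """
    img = np.zeros((resolution, resolution), dtype=np.float32)
    scale = resolution / extent_m
    for poly in lane_polylines:
        for (x0, y0), (x1, y1) in zip(poly[:-1], poly[1:]):
            # Densely sample each segment and mark the pixels it crosses.
            for t in np.linspace(0.0, 1.0, num=32):
                px = int((x0 + t * (x1 - x0) - origin_xy[0]) * scale + resolution / 2)
                py = int((y0 + t * (y1 - y0) - origin_xy[1]) * scale + resolution / 2)
                if 0 <= px < resolution and 0 <= py < resolution:
                    img[py, px] = 1.0
    return img


def agent_centric_crop(top_down_img, agent_xy, origin_xy, extent_m, crop_px):
    """Crop a square window centered on an agent from any top-down image,
    whether it was rasterized from a lane mesh or taken from overhead imagery.
    (Rotation into the agent's heading frame is omitted for brevity.)"""
    resolution = top_down_img.shape[0]
    scale = resolution / extent_m
    cx = int((agent_xy[0] - origin_xy[0]) * scale + resolution / 2)
    cy = int((agent_xy[1] - origin_xy[1]) * scale + resolution / 2)
    half = crop_px // 2
    pad = [(half, half), (half, half)] + [(0, 0)] * (top_down_img.ndim - 2)
    padded = np.pad(top_down_img, pad, mode="constant")
    return padded[cy:cy + crop_px, cx:cx + crop_px]


# Either representation yields an observation of the same spatial size:
# obs_mesh = agent_centric_crop(rasterize_lane_mesh(lanes, 1024, 200.0, (0, 0)),
#                               agent_xy=(12.0, -4.5), origin_xy=(0, 0),
#                               extent_m=200.0, crop_px=128)
# obs_texture = agent_centric_crop(overhead_rgb, agent_xy=(12.0, -4.5),
#                                  origin_xy=(0, 0), extent_m=200.0, crop_px=128)
```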

We've done this and it works, for both DRIVE and INITIALIZE. The most significant upshot is that we can initialize pedestrians and "walk" them in and amongst vehicles with significantly enhanced realism, and with all the realistic, diverse, and reactive interactions that occur between them. Conveniently, this also means we will soon release a version of DRIVE that works anywhere in the world, given just overhead imagery.

Now to the business (and community) question: which representation should we prioritize? Maps, to accommodate current usage, or textures, to spearhead innovation? Extending maps to represent walkable areas is expensive, prone to semantic errors, and tends to produce poorly modeled pedestrian traps. Textures remain problematic in simulation because designers rarely put serious effort into the tops of assets, which opens a sim-to-real gap for NPCs that perceive in texture space.

Our answer, coming soon, is to support both.  And to do better when both representations are provided.  Stay tuned for corresponding upcoming DRIVE and INITIALIZE API releases.
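As a hedged illustration of what "doing better when both are provided" could look like, a sketch under our own assumptions rather than a description of the upcoming releases: the two representations can simply be stacked as channels of a single observation, so the model can lean on whichever signal is more reliable in a given scene.

```python
# A hedged sketch only, not the upcoming API: one plausible way to feed both
# representations to the model is to stack them as channels of one observation.
import numpy as np


def fuse_observations(mesh_crop=None, texture_crop=None):
    """Combine an agent-centric lane-mesh raster and texture crop into one tensor.

    mesh_crop:    (H, W)    float32 in [0, 1], rendered from the structured map.
    texture_crop: (H, W, 3) float32 in [0, 1], cropped from overhead imagery.
    Missing inputs are zero-filled, so the same (H, W, 4) model input works
    with maps only, textures only, or both.
    """
    if mesh_crop is None and texture_crop is None:
        raise ValueError("at least one representation must be provided")
    h, w = mesh_crop.shape if mesh_crop is not None else texture_crop.shape[:2]
    if mesh_crop is None:
        mesh_crop = np.zeros((h, w), dtype=np.float32)
    if texture_crop is None:
        texture_crop = np.zeros((h, w, 3), dtype=np.float32)
    return np.concatenate([texture_crop, mesh_crop[..., None]], axis=-1)
```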