Winning the Waymo Sim Agents Challenge
Fri Aug 09 2024
We love open science and leaderboards. Congratulations to Wei Wu, Xiaoxin Feng, Ziyan Gao, and Yuheng Kan of SenseTime Research and Tsinghua University, the "winners" of the 2024 Waymo Sim Agents Challenge.
We at inverted.ai would have loved to participate, but Waymo's license terms continue to be a problem. Here is the critical passage from the Sim Agents Challenge license terms:
License to Submissions and Technical Reports: Upon participating in the WOD Challenges, submitting anything to the leaderboard, or submitting (or being a member of a Team whose Team Leader (defined below) submits) anything to the leaderboard, You grant to Sponsor a perpetual, irrevocable, royalty-free, worldwide, nonexclusive license under Your intellectual property rights to make, use, import, publish, reproduce, display, perform, distribute, adapt, edit, modify, translate, and create derivative works based upon Your Submission (including any docker image and supporting code, e.g., to enable automated evaluation) and Technical Report, or any portion thereof (including Your name and likeness as shown and conveyed in the Submission or Technical Report), and any works, products and services that incorporate the foregoing or combine the foregoing with other Submissions and Technical Reports, or portions thereof, in any manner, in connection with the WOD Challenges and for other advertising, marketing, promotional, commercial, and business, and educational purposes. For the avoidance of doubt, the license above includes the right for Sponsor to sell, offer for sale, or sublicense Sponsor's works, products or services, even though such works, products, or services may combine, incorporate, or otherwise process Your Submission, Technical Report, and the intellectual property rights therein, in connection with the activities above.
These terms make it suicidal for companies like ours to participate (and far from ideal for academic teams hoping to commercialize their work). As a result, people can be strongly misled about the current state of AV sim agent progress and misallocate resources accordingly.
In various discussions, we have been asked to show how the models behind our DRIVE API stack up on the Sim Agents Challenge metrics. So, to avoid license issues, we reimplemented the metrics, and even the then-winning SMART model, ourselves, as faithfully as possible.
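The challenge's realism metrics are, broadly, likelihood scores: the distribution of each feature (speed, acceleration, distance to other objects, and so on) is estimated from many simulated rollouts, and the logged value is scored under that estimate. The sketch below illustrates that idea for a single feature, assuming a fixed value range, one pooled histogram, and additive smoothing. `histogram_likelihood_score` and its parameters are hypothetical; this is neither Waymo's reference code nor necessarily how inverted.ai implemented it.

```python
import numpy as np


def histogram_likelihood_score(
    simulated: np.ndarray,
    logged: np.ndarray,
    min_val: float,
    max_val: float,
    num_bins: int = 10,
    smoothing: float = 0.1,
) -> float:
    """Score logged feature values against a histogram of simulated ones.

    simulated: (num_rollouts, num_steps) feature values from sim rollouts
    logged:    (num_steps,) feature values from the logged trajectory
    Returns exp(mean log-likelihood), a number in (0, 1].
    """
    edges = np.linspace(min_val, max_val, num_bins + 1)
    # Pool all rollouts and steps into one histogram (a simplification).
    counts, _ = np.histogram(np.clip(simulated, min_val, max_val), bins=edges)
    # Additive smoothing so bins with no simulated samples never get zero probability.
    probs = (counts + smoothing) / (counts.sum() + smoothing * num_bins)
    # Find each logged value's bin; out-of-range values land in the edge bins.
    idx = np.clip(np.digitize(logged, edges) - 1, 0, num_bins - 1)
    return float(np.exp(np.mean(np.log(probs[idx]))))
```

A logged trajectory that sits where the simulated rollouts put most of their mass scores close to the bin probability there, while one the simulator never reproduces is pushed toward zero.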
Here is what the leaderboard would have looked like if we had participated:
Method | Realism Meta Metric | Kinematic Metrics | Interactive Metrics | Map-based Metrics | minADE (m) |
inverted.ai | 0.7687 (0.80) | 0.6029 | 0.8461 | 0.7638 | 1.4678 |
Fdriver-tint | 0.7584 | 0.4614 | 0.8069 | 0.8658 | 1.4475 |
SMART-large | 0.7564 | 0.4769 | 0.7986 | 0.8618 | 1.5501 |
SMART | 0.7511 | 0.4445 | 0.8050 | 0.8571 | 1.5447 |
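The minADE column is the one metric in the table that compares positions directly rather than distributions. A minimal sketch, assuming positions in meters and an error averaged over all valid agent steps before taking the minimum over rollouts (per-agent variants also exist); `min_ade` is a hypothetical helper, not the challenge's reference code:

```python
import numpy as np


def min_ade(simulated: np.ndarray, logged: np.ndarray, valid: np.ndarray) -> float:
    """Minimum-over-rollouts average displacement error.

    simulated: (num_rollouts, num_agents, num_steps, 2) simulated x/y positions
    logged:    (num_agents, num_steps, 2) logged x/y positions
    valid:     (num_agents, num_steps) boolean mask of valid logged steps
               (assumed to contain at least one True)
    """
    # Euclidean error per rollout, agent, and step; logged broadcasts over rollouts.
    err = np.linalg.norm(simulated - logged[None], axis=-1)
    # Average only over valid entries, separately for each rollout.
    mask = valid[None].astype(float)
    ade_per_rollout = (err * mask).sum(axis=(1, 2)) / mask.sum(axis=(1, 2))
    # Keep the best rollout.
    return float(ade_per_rollout.min())
```

Because only the best rollout counts, minADE rewards covering the logged behavior with at least one sample, which is why it can disagree with the distribution-level realism scores.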
To be transparent, this isn't an apples-to-apples comparison. We ran our own implementation of the Waymo metrics and evaluated it on our own data. Our implementation is as faithful as possible but could have bugs. It also differs intentionally in one way: it computes distances to lane centerlines rather than to road edges. In this table, our numbers are on our data, while the others are on Waymo data. License issues all over the place.
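The geometric core of a centerline-based map metric is the distance from an agent's position to the nearest lane centerline, with each centerline treated as a 2D polyline. A minimal sketch, assuming centerlines are given as arrays of vertices; the helper names are hypothetical. A road-edge variant would additionally need a sign (inside vs. outside the drivable area), which this sketch does not attempt.

```python
import numpy as np


def distance_to_polyline(point: np.ndarray, polyline: np.ndarray) -> float:
    """Unsigned distance from a 2D point to a 2D polyline.

    point:    shape (2,)
    polyline: shape (num_vertices, 2), at least two vertices
    """
    starts, ends = polyline[:-1], polyline[1:]
    seg = ends - starts
    seg_len_sq = np.sum(seg * seg, axis=1)
    # Projection parameter of the point onto each segment;
    # zero-length segments fall back to their start vertex.
    safe_len_sq = np.where(seg_len_sq > 0, seg_len_sq, 1.0)
    t = np.where(seg_len_sq > 0, np.sum((point - starts) * seg, axis=1) / safe_len_sq, 0.0)
    # Clamp so the closest point stays on the segment.
    t = np.clip(t, 0.0, 1.0)
    closest = starts + t[:, None] * seg
    return float(np.min(np.linalg.norm(closest - point, axis=1)))


def distance_to_nearest_centerline(point: np.ndarray, centerlines: list) -> float:
    """Smallest distance from the point to any centerline polyline."""
    return min(distance_to_polyline(point, c) for c in centerlines)
```

For example, a point at (5, 4) between centerlines along y = 0 and y = 5 is 1 m from the nearer one.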
It must be noted that our data is substantially more diverse than the Waymo data, having been intentionally sourced from all around the world. If we restrict the Realism Meta Metric evaluation to North American data, our score actually goes up to 0.80 (the value in parentheses in the table)!
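The Realism Meta Metric is a weighted average of per-feature component scores, and restricting it to North American data amounts to averaging over a subset of scenarios. A sketch under those assumptions; the weights, the `region` tag, and the helper names are hypothetical and are not the challenge's official values:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScenarioScores:
    region: str                   # hypothetical tag, e.g. "north_america"
    components: dict[str, float]  # per-feature likelihood scores in [0, 1]


def meta_metric(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of component scores for one scenario."""
    total = sum(weights.values())
    return sum(weights[name] * components[name] for name in weights) / total


def mean_meta_metric(scenarios: list[ScenarioScores], weights: dict[str, float],
                     region: str | None = None) -> float:
    """Average meta metric over scenarios, optionally for one region only."""
    selected = [s for s in scenarios if region is None or s.region == region]
    if not selected:
        raise ValueError("no scenarios match the requested region")
    return sum(meta_metric(s.components, weights) for s in selected) / len(selected)
```

Filtering by region changes only which scenarios enter the average, so a model can score higher on a regional subset without any change to the metric itself.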
So, did we win or not?
After implementing SMART, we did run an apples-to-apples comparison on our data, and we see much the same trend, with raw values fairly well calibrated to the evaluations on Waymo data.
Model | Realism Meta Metric | Kinematic Metric | Interactive Metric | Map-based Metric | Collision Rate |
inverted.ai | 0.80 | 0.64 | 0.87 | 0.81 | 0.003 |
SMART | 0.76 | 0.58 | 0.79 | 0.83 | 0.098 |
So it looks like we would have won handily, and, notably, with a collision rate more than an order of magnitude lower (0.003 vs. 0.098). Of course, our implementation of SMART might not be perfect, but we tried hard to get it right. It is filled with ideas that we have tried, considered, and gone beyond in our own work. Still, if it had worked crazily well, we would happily have switched over. It didn't. Maybe someday.
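The collision rate depends on how a collision is defined, and the post does not spell that out. One common choice, assumed here, is the fraction of agents whose bounding box overlaps another agent's box at any time step, with overlap tested via the separating axis theorem on oriented rectangles. The state layout and helper names are hypothetical:

```python
import numpy as np


def box_corners(x, y, heading, length, width):
    """Corners of an oriented rectangle centred at (x, y), in order around the box."""
    c, s = np.cos(heading), np.sin(heading)
    half = np.array([[length, width], [length, -width],
                     [-length, -width], [-length, width]]) / 2.0
    rot = np.array([[c, -s], [s, c]])
    return half @ rot.T + np.array([x, y])


def boxes_overlap(a, b):
    """Separating axis test for two convex quadrilaterals given as (4, 2) corners.

    Touching boxes count as overlapping.
    """
    for corners in (a, b):
        for i in range(4):
            edge = corners[(i + 1) % 4] - corners[i]
            axis = np.array([-edge[1], edge[0]])  # normal to this edge
            pa, pb = a @ axis, b @ axis
            if pa.max() < pb.min() or pb.max() < pa.min():
                return False  # found a separating axis
    return True


def collision_rate(states):
    """Fraction of agents that overlap another agent at any time step.

    states: (num_agents, num_steps, 5) array of x, y, heading, length, width
    """
    num_agents, num_steps, _ = states.shape
    collided = np.zeros(num_agents, dtype=bool)
    for t in range(num_steps):
        corners = [box_corners(*states[i, t]) for i in range(num_agents)]
        for i in range(num_agents):
            for j in range(i + 1, num_agents):
                if boxes_overlap(corners[i], corners[j]):
                    collided[i] = collided[j] = True
    return float(collided.mean())
```

With the values in the second table, 0.098 / 0.003 ≈ 33, i.e. more than an order of magnitude.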
Here are a couple of example GIFs on CARLA maps for comparison:
Academic users get free access to our models through our API and can participate in a grant program for access to large numbers of calls. Sign-up and a small amount of usage are free for commercial users; beyond that, terms are simple and cost-efficient. Get in touch if you want your simulation environments to actually work for you.