AI: Training data to get more training data

Ensign! Prepare the Venn diagram! On the left: “virtuous circle.” On the right: “ML training data.” And put Tesla in the middle.

We need more Venn diagrams right meow

In all honesty, I thought there’d be funny Venn diagrams easily available to fill the blank above after I wrote the intro sentence. Turns out there’s probably a Venn diagram in the works with the words “funny” and “Venn diagram” in two completely isolated circles.

OK, some terminology:

Webster’s dictionary defines (yes, I’m doing that thing!) a virtuous circle as “a chain of events in which one desirable occurrence leads to another which further promotes the first occurrence and so on resulting in a continuous process of improvement.” In business land it often gets blended with the flywheel metaphor, usually in connection with Amazon’s practices.

ML training data, on the other hand, is a core concept in machine learning. Learning algorithms generally work best with lots of examples to learn from, so the more examples you have, the more likely you are to succeed. Most ML courses have a section on stretching your training set by making small changes to existing data, such as flipping a picture horizontally.
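For the curious, here is roughly what that flipping trick looks like. This is a minimal sketch in plain Python, assuming an image is just a list of pixel rows; the names `flip_horizontal` and `augment` are made up for illustration, and real projects would lean on an image library instead.

```python
def flip_horizontal(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus a mirrored copy of each."""
    augmented = list(dataset)
    for image, label in dataset:
        # Assumption: the label is still correct after mirroring.
        augmented.append((flip_horizontal(image), label))
    return augmented


# A tiny 2x3 "image" with made-up pixel values.
tiny_image = [[0, 1, 2],
              [3, 4, 5]]
dataset = [(tiny_image, "stop_sign")]
bigger = augment(dataset)  # twice as many examples, no new photos needed
```

The catch is the assumption in the comment: mirroring only helps when the label survives the flip, which is not a given for anything with text or a left/right meaning.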

Now the diagram reveal, courtesy of The Batch:

Engineers understood that AutoPilot was having trouble recognizing occluded stop signs because, among other things, the bounding boxes around them flickered. Using images from the existing dataset, they trained a model to detect occluded stop signs. They sent this model to the fleet with instructions to send back similar images. This gave them tens of thousands of new examples.
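To make the shape of that loop concrete, here is a toy sketch. It is emphatically not Tesla's pipeline: the "model" is just an average of made-up feature scores, and `train`, `looks_similar`, `collect_from_fleet`, and every number in it are hypothetical stand-ins for the train, deploy, collect, retrain cycle described in the quote.

```python
def train(examples):
    """Toy 'model': the mean feature score of the known examples."""
    return sum(examples) / len(examples)


def looks_similar(model, sample, tolerance=0.1):
    """Flag a sample whose score falls close to what the model has seen."""
    return abs(sample - model) <= tolerance


def collect_from_fleet(model, fleet_observations):
    """Each car keeps only the observations the deployed model flags."""
    return [s for s in fleet_observations if looks_similar(model, s)]


# Made-up scores standing in for the few occluded stop signs already on hand.
seed_examples = [0.78, 0.80, 0.82]
# Made-up scores for everything the fleet happens to drive past.
fleet_observations = [0.10, 0.75, 0.79, 0.85, 0.40, 0.88, 0.95]

dataset = list(seed_examples)
model = train(dataset)                                        # 1. train on what you have
new_examples = collect_from_fleet(model, fleet_observations)  # 2. ship it, ask for look-alikes
dataset.extend(new_examples)                                  # 3. fold the haul back in
model = train(dataset)                                        # 4. retrain and go again
```

In this toy run the three seed examples pull in four more from the fleet, and the retrained model is what goes out for the next round. That is the circle: the data you have is the tool you use to get more data.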

Obviously it helps to have a fleet of networked cars out there roaming the earth, but this particular feedback loop has haunted my imagination for a week or so now, less from a “good going, Tesla” view than for what it implies about mobile networked cameras and incentives. What happens when this gets linked into navigation software? Will minor route deviations to optimize for data collection start to be a thing? Will Tesla owners start a side hustle running Uber Data runs? How fast could maps, street view, etc. shift if it turned out Waze was only the bottom-left edge of the hockey stick?