Early leaks and reports on Uber weren't too long on the truth

With the story of the Uber fatality now behind us, I thought I would do a review of the various leaks and early releases that we saw about the incident, and how well they scored once the final NTSB report came out. The score is not at all good.

Read my report on Forbes.com at Early leaks and reports on Uber weren't too long on the truth

Comments

"Like other autonomous vehicle systems, Uber’s software has the ability to ignore “false positives,” or objects in its path that wouldn’t actually be a problem for the vehicle, such as a plastic bag floating over a road. In this case, Uber executives believe the company’s system was tuned so that it reacted less to such objects. But the tuning went too far, and the car didn’t react fast enough..." https://www.theinformation.com/articles/uber-finds-deadly-accident-likely-caused-by-software-set-to-ignore-objects-on-road

Seems entirely correct. They added a delay to braking in order to protect against false positives, and because of the delay, the car didn't react fast enough. A very short delay is probably a good idea. On several occasions as a human driver I've found myself perceiving an object in the road, moving my foot to the brake as a reaction, and then catching myself before fully slamming on the brakes after realizing it was something innocuous (like a plastic bag). A robocar has no such inherent lag between perception and physically applying the brakes, so the lack of one, without explicitly coding it in, might be a problem. But a one second delay is way too much.

This was further compounded by emergency braking being disabled altogether, which was also reported. And this was particularly idiotic. If safety requires you to limit emergency braking, okay, but just saying "if you want to brake more than 0.7g, don't brake at all"? Very dumb.
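To make the distinction concrete, here is a minimal sketch (Python; the function names are mine and the 0.7g figure comes from the discussion above, not from Uber's actual code) of the difference between capping emergency deceleration and the reported behavior of suppressing it entirely above a threshold:

```python
# Hypothetical illustration only; names and structure are assumptions.
EMERGENCY_DECEL_G = 0.7  # threshold for "emergency" braking, per the discussion

def capped_braking(requested_g: float) -> float:
    """Sane approach: brake as hard as the plan asks, up to the cap."""
    return min(requested_g, EMERGENCY_DECEL_G)

def suppressed_braking(requested_g: float) -> float:
    """Reported behavior: if the plan calls for braking above the
    threshold, don't brake at all and leave it to the safety driver."""
    return 0.0 if requested_g > EMERGENCY_DECEL_G else requested_g

print(capped_braking(1.0))      # 0.7 -- still sheds a lot of speed
print(suppressed_braking(1.0))  # 0.0 -- no braking at all
```

The capped version still removes most of the kinetic energy before impact; the suppressed version removes none, which is the "if you want to brake more than 0.7g, don't brake at all" absurdity.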

The earliest reports were bad, but within a few weeks we knew pretty much what caused the crash. Inattentive safety driver, terrible classification, and intentional disabling of emergency braking.

The NTSB report gave some details about specifics. The one second delay was news to me, and I guess should be added to the list (inattentive safety driver, terrible classification, intentionally delayed braking, and intentional disablement of emergency braking). Overall I think the report was much too kind to Uber, though. There are lots of reasons that might be, and the allegations being untrue is only one of the possibilities. Between the New York Times and the NTSB, you seem to trust the NTSB a lot more. I'm not sure that's a good assumption.

The difference is that Herzberg was not treated as a "false positive" which was then ignored. Rather, the system treated any obstacle which called for emergency braking as a potential false positive. It didn't handle this very well, though, because if you are going to do that, you should at least be sounding an alarm for the safety driver.

The leak to The Information implied that some sort of specific false positive nullification applied to her, but it was just the general coding of the system. An example of what I mean would be software that decides not to brake for plastic bags, or birds, or car exhaust (which does show up on LIDAR), or radar reflections from overhead signs. These are all specific false positives which you then make decisions to ignore.

The more correct description of what happened was that they did not do emergency braking beyond a certain threshold, and the reason for that was that they didn't want brake jabs from false positives of any kind.

I don't know anything about the leak except what was reported publicly. I was commenting on the quote from the article, which didn't say she was treated as a false positive. It said that the system was tuned to reduce false positives, and that tuning went too far, which is exactly what happened, referring to the one second braking delay.

The disablement of emergency braking beyond a certain threshold, which was also reported fairly early, I believe, was also true, but the part about tuning the system to reduce false positives was probably referring to the one second delay.

The decision to delay braking for one second will reduce false positives, as it gives the system additional data to determine if it's a bag, or bird, or car exhaust, or a person (although Uber's system apparently not having enough of a sense of object permanence would have complicated that). And it probably makes sense to have some delay for that reason. But probably more like 100-300 milliseconds, and not a whole second. (You probably shouldn't delay at all if it's something expected, but a pedestrian crossing in a seemingly random spot on a high speed road is probably unexpected enough to benefit from delaying for a small fraction of a second to double check.) Note that this is something that you probably want to do regardless of whether there's a safety driver, and you probably don't want to sound an alarm. But one second is almost surely too long. If you have to delay that long, your system is too crappy to be on the road (which, incidentally, was true of the Uber system).
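The short confirmation delay described above could be sketched roughly like this (Python; the class name, structure, and 200 ms window are my assumptions for illustration, not Uber's design):

```python
CONFIRM_WINDOW_S = 0.2  # 100-300 ms is plausible; Uber's one second is far too long

class BrakeGate:
    """Briefly delay hard braking on a new, unexpected positive,
    giving the perception system time to double-check it."""

    def __init__(self):
        self.first_seen = None  # timestamp when the positive first appeared

    def should_brake(self, obstacle_present: bool, now: float) -> bool:
        if not obstacle_present:
            # Positive vanished within the window: likely a bag, bird, or exhaust.
            self.first_seen = None
            return False
        if self.first_seen is None:
            self.first_seen = now  # start the confirmation window
        return now - self.first_seen >= CONFIRM_WINDOW_S
```

A blowing bag detected at t=0 that disappears by t=0.1 never triggers braking; a pedestrian still present at t=0.25 does. The flaw in this whole approach, as noted, is that a 200 ms double-check is tolerable while a full second is not.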

Disabling emergency braking, on the other hand, was just plain stupid (and/or a conscious decision to trade away safety). I'm not sure there's any legitimate justification for that, and from what I've heard Uber no longer does that.

The problem wasn't that there was tuning to eliminate false positives (i.e. to identify them as likely false) and that it went too far. The article says this was the case:

Uber’s software has the ability to ignore “false positives,” or objects in its path that wouldn’t actually be a problem for the vehicle, such as a plastic bag floating over a road. In this case, Uber executives believe the company’s system was tuned so that it reacted less to such objects. But the tuning went too far, and the car didn’t react fast enough, one of these people said.

At least as far as the NTSB report is concerned, there was no system attempting to classify Herzberg as a false positive. Instead she was misclassified as things which do not present a problem (like a vehicle or bicycle in another lane.)

The problem really resulted because the system did not track a trajectory for Herzberg. Had it tracked a trajectory, regardless of how it classified her, it would have noticed that something, no matter what it is, was on a collision course, and it would have done so with plenty of time to brake before its anti-brake-jab heuristic kicked in. The only reason the anti-brake-jab heuristic played a role was because it did not identify her as on an intersecting course until she was actually in the lane.
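A minimal one-dimensional sketch (Python; hypothetical numbers) of why tracking a trajectory matters regardless of classification: anything with a tracked position and velocity that puts it on a collision course warrants a response, no matter what label the classifier assigns.

```python
def time_to_collision(rel_pos_m: float, rel_vel_mps: float) -> float:
    """Seconds until a tracked object crosses our path, given its
    position and velocity relative to the car. The classification
    label is irrelevant: a collision course is a collision course."""
    if rel_vel_mps >= 0:
        # Moving away (or holding position) relative to us: no collision.
        return float("inf")
    return rel_pos_m / -rel_vel_mps

# A pedestrian walking a bike across the road at ~1.4 m/s, starting
# about 10 m laterally from our lane, gives roughly 7 s of warning
# if her trajectory is tracked -- ample time to brake.
print(time_to_collision(10.0, -1.4))
```

Without a persistent track, the system only "discovers" the intersecting course at the last moment, which is when the anti-brake-jab heuristic ends up mattering.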

The anti-brake-jab system meant the car didn't brake for her until long after it should have. It is the proximate cause of the severity of the impact, but not of the impact itself. (In theory, with full braking force at the moment of final detection, the car might have only hit her gently, but that presumes instant decision making and brake application, which it doesn't have.)

It doesn't say tuning to "eliminate" false positives, and it doesn't say tuning to "identify false positives as likely false." It doesn't say she was "treated as a false positive." It doesn't say she was "classified as a false positive." You are adding words that aren't there.

According to NTSB: "When the system detects an emergency situation, it initiates action suppression. This is a one-second period during which the ADS suppresses planned braking while the (1) system verifies the nature of the detected hazard and calculates an alternative path, or (2) vehicle operator takes control of the vehicle. ATG stated that it implemented action suppression process due to the concerns of the developmental ADS identifying false alarms—detection of a hazardous situation when none exists—causing the vehicle to engage in unnecessary extreme maneuvers."

The system was tuned to ignore hazardous situations for one second in order to reduce false positives. The tuning went too far, and the car didn't react fast enough to avoid a crash.

Properly tracking the trajectory may have helped, though that would be hard to do without a good classification system: to track a trajectory you have to recognize the bulk of the object, and have some way to figure out that the object at time X is the same object you recognized at time Y. Without a good classification system that can be impossible, especially with a bicycle oriented perpendicular to the road. And in the end it might very well be the wrong approach; a good AI will be able to predict a path better than manually coded path prediction based on calculated trajectory, though again you have to recognize that there's a single object, in space and in time, in order to calculate a trajectory or predict a path, and a bicycle oriented perpendicular to the road can make that especially difficult.

Another problem was not slowing down for a non-moving vehicle/bicycle in the adjacent lane. You shouldn't do 40 past a stopped car in the travel lane, and it's illegal under Arizona law (and the laws of many states) to pass within 3 feet of a bicycle in an adjacent lane (https://www.azleg.gov/ars/28/00735.htm).
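Those two constraints can be expressed as a simple planner check (Python sketch; the 3-foot clearance comes from the Arizona statute cited above, while the speed cap for passing a stopped vehicle is my own assumption, not a legal figure):

```python
MIN_BICYCLE_CLEARANCE_M = 0.914    # 3 ft, per ARS 28-735 (cited above)
MAX_PASS_SPEED_STOPPED_MPS = 11.0  # ~25 mph; an assumed safety margin, not law

def safe_to_pass(clearance_m: float, speed_mps: float,
                 object_stopped: bool) -> bool:
    """Refuse to pass a cyclist or 'unknown' without legal lateral
    clearance, and refuse to pass a stopped vehicle in an adjacent
    travel lane at full road speed."""
    if clearance_m < MIN_BICYCLE_CLEARANCE_M:
        return False
    if object_stopped and speed_mps > MAX_PASS_SPEED_STOPPED_MPS:
        return False
    return True
```

At 40 mph (~17.9 m/s) past a stopped bicycle, both checks argue for slowing down or changing lanes rather than proceeding.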

There are two ways you can deal with false positives. You can have algorithms of various sorts that try to identify them (thus turning them into non-positives). Those algorithms can make yes/no decisions or just affect scores which decide whether something is a positive (i.e. an obstacle to be avoided). This is the sort of thing you "tune," in my vocabulary.

Uber had another, more general approach, which was simply to say that any positive, if it would trigger emergency braking, was to be left to the safety driver for one second. That is indeed a way to deal with false positives, but it deals with true positives just as much, so I don't think it is properly classified as a false positive mitigation, even though the reason you do it is that you are getting too many false positives. It is not a reasonable approach to deal with false positives by turning off positives!

I guess the amount of time you leave it to the safety driver and the amount of acceleration classified as emergency braking can be tuned. But only to a modest degree, and Herzberg didn't die because of this tuning or because she was a false positive or viewed as one at any time.

The section of the article I quoted talks about the first (and more expected) type of false positive mitigation tuning, which turned out not to apply here. The delay is not an example of what that would normally mean, such as a system that examines a LIDAR point cloud and decides it is the reflections from a cloud of exhaust, or a blowing trash bag.

I think you've misread the section of the article you quoted. I don't think that "ignore 'false positives'" means to produce fewer false positives. I think "ignore 'false positives'" means to ignore certain positives because they are too likely to be false ones.

I also think you're misconstruing what Uber's approach was. It's not a reasonable approach to deal with false positives by turning off positives. But it is a reasonable approach to deal with false positives by briefly delaying your reaction to unexpected positives while you gather more data (in the NTSB quote, while the "system verifies the nature of the detected hazard and calculates an alternative path"). Instantly panicking and slamming on the brakes whenever you think you perceive something in your path isn't always the best solution. Sometimes it's better to stay calm and double-check that you are perceiving things correctly first.

And yes, the amount of time you delay reacting to the positives is something that can be tuned. And Uber tuned it way too long. 100-300 milliseconds (which would still beat humans) might have been acceptable, even in a system without a safety driver, especially if the delay were only done when unexpected positives appear. If your other systems are working, unexpected positives that require an instant response are almost always going to be false positives. (If your other systems suck, like Uber's, then maybe not. But then you shouldn't be testing on the road in the first place.)

Herzberg probably didn't die because of this tuning, because by the time the one-second delay kicked in, she was already so close that the other rule (the rule that did kill her; the rule that disabled emergency braking) kicked in.

I am not commenting so much on whether Uber's decision to delay action on emergency braking decisions due to surprise obstacles is a reasonable one. I think it would be more reasonable if they had triggered an audible alarm and mild braking in those situations, to prompt the safety driver to then do full braking. Hell, almost any nice car today works this way -- first the audible alarm of FCW, and a short time later, if the obstacle is still there and the driver does nothing, AEB. The Volvo had that, but as discussed, it was turned off because, among other reasons, it used the same radar frequency.

I understand that the reason for this delay is to stop false positives from causing a poor quality ride. That is not the same as hitting an obstacle because you treated it like a false positive, which is what the leak implied. That's, at best, hitting an obstacle because you treat every positive of its class as a potential false positive.

My point is that the leaker, if they understood what the NTSB has reported, would not have said it as was reported. Either the leaker was wrong, or there is more there that the NTSB did not report on. The leak misled us, it did not point us in the right direction. The NTSB report, in fact, barely listed the AEB delay as a problem, saying it "added to the risk" and it was not one of the direct causes of the accident. (Of course, neither was the bad trajectory calculation, though that was more important.)

But my point stands: the leak talked about false positive mitigations of the sort that might arise in detecting blowing trash as contributing to the accident. They did not contribute.

First of all, my understanding is that the one-second delay probably did not cause the crash. I say this because my understanding is that by the time the one-second delay kicked in, the crash was already imminent. So to the extent the article implied that this was the cause of the crash, it was probably wrong. I think at that stage in the investigation that the one-second delay was probably a good candidate for being a cause of the crash, but as it turned out, it was probably already too late by that point.

You say "the leak talked about false positive mitigations." If by "the leak" you're referring to something other than the article I quoted, then I don't know what you're talking about.

The article says, "Like other autonomous vehicle systems, Uber’s software has the ability to ignore “false positives,”". I don't see any reasonable way to interpret that other than that Uber's software, like other autonomous vehicle systems, treats every positive of a certain class as a potential false positive, and ignores it. The article goes on, "or objects in its path that wouldn’t actually be a problem for the vehicle, such as a plastic bag floating over a road." That's a description of what false positives are. It's there because many, if not most, readers of that article don't know what false positives are and why you want to ignore them.

NTSB says this: "When the system detects an emergency situation, it initiates action suppression." "ATG stated that it implemented action suppression process due to the concerns of the developmental ADS identifying false alarms—detection of a hazardous situation when none exists...." So yes, Uber ignores certain classes of positives in order to ignore false positives.

I can understand how you could misread the article to think that it meant something else.

You say the car didn't treat Herzberg like a false positive. I'm not sure how it didn't. It ignored her, exactly the same way it would ignore a false positive.

Maybe there was some "leaker" that talked to you specifically and misled you. If so, I hope you'll rely less on leakers in general, and this leaker in particular, in the future. I know I'll be much more skeptical of things you attribute to anonymous leaks in the future. I remember at the time you were making a lot of claims without divulging your sources, and I was skeptical of them, but I'll be even more skeptical in the future.

I don't think that article was particularly inaccurate, though. It wasn't incredibly clear. I think the author probably tried to condense an accurate description of what someone thought at the time was a contributing cause of the crash into too few lines, and thus conflated things a bit. But I also think that's pretty clear from just reading the article and understanding that tuning a system to ignore false positives necessarily involves the risk of also ignoring true positives. "But the tuning went too far, and the car didn’t react fast enough...." Which is true, but as it turned out, it was probably already too late by the time the one-second delay kicked in. To put it another way, the one-second delay would have caused the crash if Herzberg had been detected between zero and one second prior to the crash becoming inevitable. Even NTSB probably didn't know for sure whether that was the case at the time. Now they seem to be implying that the crash was already inevitable by the time the one-second delay kicked in.

Finally, you suggest that Uber should have "triggered an audible alarm and a mild braking" while suppressing action. A mild braking would probably be good. But I don't think the action suppression time should last long enough to expect a human to take over (someone isn't going to go from not paying attention to being able to react to an emergency in one second, let alone the 100-300 milliseconds I suggested would be more appropriate), so I think an audible alarm would just be annoying in the 99% of the time that it is a false alarm (if your other systems are working). The report says there are two things that can happen during the action suppression delay: The positive can be confirmed to be a false positive, or the safety driver can take over. You seem to focus on the latter, but I think the former is the much more reasonable possibility. As I've said above, I think a very short action suppression period makes sense even when you don't have a safety driver, when the situation suddenly and unexpectedly goes from "everything is normal" to "OMG impact is imminent." When that happens, if your system is designed correctly, chances are it's just a trash bag or puff of smoke and not a human suddenly appearing in the middle of the road.

I think action suppression (maybe for 100-300 milliseconds) makes sense, though not in the situation that the Uber was in. If you have a bicycle (or an "unknown") in the lane right next to you, then you should be in a heightened state of alertness, and you wouldn't want to suppress action for even a millisecond if it suddenly swerves into your lane. You shouldn't need to suppress action while calculating an alternative path, either, as you should already be calculating them ahead of time when passing a bicycle (or an "unknown") at 40+ miles per hour. But if you're driving down the road and all is clear and then suddenly you see a human pop up in your lane and there's no reasonable explanation for how there could be a human there when there wasn't one anywhere in your sights a few milliseconds ago, then you're probably misperceiving something, and delaying a few hundred milliseconds before slamming on the brakes probably makes sense. (Sounding an alarm? Nah.)

In obstacle detection, a false positive is an obstacle that appears in the perception pipeline but in reality is not actually an obstacle -- in particular because it isn't actually there, or in some cases because it is something like blowing trash or birds. And yes, all perception systems have to have algorithms that look at the initially detected obstacles and figure out which ones should be disregarded. Sometimes they might be sensor errors or flaws, like multipath radar returns, or pictures of cars on the sides of trucks (a notorious false positive for early vision systems).

The Information article made specific mention of this sort of anti-false-positive detection in the context of Herzberg. It implied that the car got sensor returns from her (which it did) but then it decided "this is a false positive, so disregard."

Now what it actually did is similar to what all systems do. It did identify her as an obstacle, but put her in the class of "obstacles in other lanes we track but don't worry about." That is normal; you don't brake for cars or bikes in left turn lanes or moving in the lane next to you.

The person who leaked to the Information was confused about the no AEB heuristic. It is there because their system is poor and returns too many false positives, but it is not a false positive eliminator. It delays emergency braking on all obstacles, not just false positives.

And, as we now know, was not the cause of the crash, but once the crash was inevitable, it made it much worse.

It implied that the car got sensor returns from her (which it did) but then it decided "this is a false positive, so disregard."

The car got sensor returns from her, in the path of the car, and decided, "this is probably a false positive, so disregard for one second while checking to see if it's really a false positive and while preparing a response."

Sort of. Cars don't have thoughts. They just do what they are programmed to do. That one second delay was put in, according to the NTSB, due to "concerns of the developmental ADS identifying false alarms." In other words, the delay was put in because the programmers thought it would only be triggered on likely false positives.

The person who leaked to the Information was confused about the no AEB heuristic.

You don't know that (unless you were the leaker). Maybe you have a source that is telling you that, but you obviously shouldn't trust your sources as much as you have been.

What the car did, in the final situation once it had identified she was something the car would hit, was say, "this is a positive." Not a false one. Not probably a false one. No determination of whether it was false or a real obstacle at all. It said, "we don't brake >0.6g" for one second.

That's what it was programmed to do.

The programmers put it in not because it would only be triggered on false positives. They put it in because there were too many false positives, and they decided to leave the decision to the safety driver if it required braking that was too hard.

Had the safety driver been paying attention, the system would have worked. She would have been on the brakes and disengaging seconds prior to all of this.

What I mean is that the information quoted from the leaker does not match what went on. The leaker would not have talked about the actual false positive mitigators (like detecting blowing trash) if they knew -- from what we are told now -- what went on. Those systems played no role. Perhaps The Information misunderstood what they were leaked. I challenged the reporter on that, and he disputed that they misunderstood.

What the car did, in the final situation once it had identified she was something the car would hit, was say, "this is a positive"

Yes. I'm not sure what your point is, though.

The programmers put it in not because it would only be triggered on false positives. They put it in because there were too many false positives, and they decided to leave the decision to the safety driver if it required braking that was too hard.

I think you're leaving out something there. Namely, "while the (1) system verifies the nature of the detected hazard and calculates an alternative path." The theory, it seems, was that many false positives would go away within one second. The ones that stayed would then cause an action (except that emergency braking was disabled; it's likely the interaction between those two rules wasn't considered; probably the one-second rule came first, and maybe it was a much shorter time at first, because the rule makes sense for shorter times and when emergency braking isn't disabled).

Had the safety driver been paying attention, the system would have worked.

Had the safety driver been paying attention, the one second delay would have never kicked in, because the safety driver would have taken over before then.

If the safety driver is paying attention, this situation will only kick in for false positives and people doing things like suddenly walking in front of a car just seconds before impact, without any indication that they were going to do so.

If the car is otherwise programmed correctly (for instance to get path prediction right), this situation will only kick in for false positives and people doing things like suddenly walking in front of a car just seconds before impact, without any indication that they were going to do so.

The only real problem with this delay is that you should always assume that everything else is going wrong. You should assume that cars get predictions wrong, and you should assume that safety drivers sometimes don't pay attention. (Even if you're doing eye tracking, you should still assume this, I think. Because maybe the eye tracking won't work right, or maybe the safety driver is daydreaming in a way that the eye tracker really can't detect.)

The leaker would not have talked about the actual false positive mitigators (like detecting blowing trash) if they knew -- from what we are told now -- what went on.

I think the leaker probably explained what types of false positives had occurred in the past, that were the reason for adding the one second delay. Blowing trash is an excellent example of a type of false positive that the one second delay works well for, because one second later the trash is probably either blown somewhere else or is detected as something like blowing trash. Birds and smoke too. They often appear suddenly and then disappear just as suddenly. I just think one second is too long.

Perhaps The Information misunderstood what they were leaked.

Probably. I'd say there's evidence that they misunderstood what they were talking about just from what they wrote. They said the car was programmed to "ignore false positives." You seem to take that literally, as though the car was somehow magically able to distinguish between false positives and real positives. But obviously that's not possible. The nature of a false positive is that it can't be distinguished from a real positive. So I assume that whoever they were paraphrasing meant "ignore positives that are assumed to be false."

Obviously it's not true that the 1 second AEB delay only applied to false positives, so I can't imagine why you would say that. Of course it caused a problem with a slow identification in this case. As you note it could also occur with a sudden incursion into the road. It could also happen in any other situation which requires emergency braking, such as a crash taking place in front of you, a surprise reveal of a stopped vehicle (which has caused a few Tesla crashes) and all the other things that cause emergency braking.

They decided emergency braking was the job of the safety driver. Yes, a driving force for that was that they had problems with false positives, but if you want to call that a false positive remover, I think that's a stretch. And the leak to The Information suggested that EH was treated like a false positive. She was not. She was a true positive, and was treated the way their system treated true positives when they needed emergency braking for whatever reason.

Obviously it's not true that the 1 second AEB delay only applied to false positives, so I can't imagine why you would say that.

I didn't say that.

They decided emergency braking was the job of the safety driver.

Yes, that's why they disabled emergency braking.

And the leak to the information suggested that EH was treated like a false positive.

She was treated like a false positive!

She was treated like a positive, a detected obstacle needing action. She was not treated like a false positive. What is confusing the issue is that, independent of this, Uber delayed emergency braking because they had not gotten their false positives down low enough. The approach they took to get down the brake jabs from false positives did not care if the obstacle was false or not. It did not reduce false positives, it eliminated brake jabs. That's a different thing.

She was treated like a positive, a detected obstacle needing action. She was not treated like a false positive.

I'm not sure why you even say that. She was treated like both. She was ignored.

Are you saying the car would have treated a false positive differently?

She was treated the same way a false positive would have been treated, right? Is that not what you mean by "treated like a false positive"?

The nature of a false positive is that it can't be distinguished from a real positive. A false positive is something that the car thinks is a positive but in reality is not. You can't program a car to ignore false positives of a certain class without telling it to ignore all positives of that class. You can, of course, program a car to produce fewer false positives. But there are probably always going to be false positives. Humans have them too. So you either accept the reactions to them, or you mitigate them. Depending on the class of false positives, I think you have to do both.

In my opinion, delaying hard braking for 100-300 milliseconds when an object suddenly and unexpectedly appears in the road is a good way to mitigate one class of false positives. (A car swerving out of the way and revealing a stopped obstacle wouldn't qualify as unexpected, by the way. You'll find that's unacceptably common once you gather the billions of miles of real-world driving data you need to produce a self-driving car. I'm not sure that a 300 millisecond delay is going to be a major problem if you're at a safe distance for the driving conditions and are paying attention to traffic, but it's probably common enough that you need to expect it.)

So in addition to reducing the reaction time from one second to 100-300 milliseconds, there should be a heightened awareness state where reaction times are as fast as possible. During these times you calculate your escape route ahead of time, in case the expected worst-case thing happens: driving in traffic at high speeds, seeing yellow, blue, or red flashing lights, school zones, thinking you see a bicyclist or other vehicle with no headlight or taillight on the road at night.... It probably makes sense to build a neural net to identify these types of situations.
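That heightened awareness idea could look something like this (Python sketch; the states and delay values are my assumptions, chosen to match the 100-300 ms range discussed above):

```python
from enum import Enum, auto

class Awareness(Enum):
    NORMAL = auto()      # clear road, nothing of concern tracked nearby
    HEIGHTENED = auto()  # cyclist/"unknown" adjacent, flashing lights,
                         # school zone, dense high-speed traffic, etc.

def reaction_delay_s(state: Awareness) -> float:
    """Confirmation delay before hard braking on an unexpected positive.
    In a heightened state, react immediately (escape routes already
    computed); otherwise allow a brief double-check window -- far
    shorter than Uber's one second."""
    return 0.0 if state is Awareness.HEIGHTENED else 0.2
```

The point of the design is that the delay is contextual: a positive that appears next to an already-tracked cyclist gets zero suppression, while a positive that appears from nowhere on an empty road gets a short sanity-check window.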

It did not reduce false positives

Nor does the article say it did.

In the broad sense, a false positive can be viewed as something that was not really there which the system treats as there and reacts to. She certainly wasn't that.

Generally, though, when you talk about false positive mitigation, as the Information article did, you talk about positives detected by some level of the system, and then another part of the system is able to determine that they are not really there. As such the subsystem has a false positive but the whole system is able to avoid it.

Again, the emergency brake delay affects all positives, it is not there to mitigate specific false positives.

An example of a false positive mitigation system might include the example given in the article -- doing motion-based analysis to tell blowing plastic from other obstacles, or identifying types of birds you should not brake for. Identifying clouds of exhaust which otherwise might appear as an obstacle. Disregarding radar returns from signs and bridges because they appear in your map. Disregarding pictures of cars on billboards through either better networks or a map indicating the billboard is there.

Forbidding emergency braking for one second is forbidding emergency braking, not eliminating false positives.
