By Dave Gershgorn, Luke Oakden-Rayner, Thomas Wilburn, Leon Chen

The artificial intelligence being developed for medical use is typically complicated pattern matching: an algorithm is shown many medical scans of organs with tumors, as well as tumor-free images, and tasked with learning the patterns that distinguish the two categories.
We showed our algorithms nearly 200,000 images of malignant, benign, and tumor-free CT scans, in both 2D and 3D. As the nodule detection algorithm learns to find these tumors, we measure its accuracy the same way it would be judged in a specialist's office, with a metric called "recall." Recall is the percentage of nodules the algorithm catches, given a set number of allowed false alarms. For instance, a Recall@1 of 60% means the algorithm catches 60% of tumors while being allowed one false alarm per scan. For the malignancy algorithm, the measure is simply the percentage of nodules it classifies correctly.
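A minimal Python sketch of how a Recall@k figure like this might be computed. It assumes each detection has already been matched to a ground-truth nodule (or marked as a false alarm), and it reads "k false alarms per scan" as an average over the whole test set under a single score threshold, as in FROC-style evaluations. The `Detection` class, its fields, and the toy data are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    scan_id: str
    score: float              # model confidence that this spot is a nodule
    nodule_id: Optional[str]  # matched ground-truth nodule, or None for a false alarm


def recall_at_k(detections: List[Detection], total_nodules: int,
                num_scans: int, k: float) -> float:
    """Fraction of nodules found while average false alarms per scan stay <= k.

    Assumption: one global score threshold, lowered one detection at a time.
    """
    budget = k * num_scans  # total false alarms allowed across the test set
    false_alarms = 0
    found = set()
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.nodule_id is None:
            false_alarms += 1
            if false_alarms > budget:
                break  # lowering the threshold further would exceed the budget
        else:
            found.add(det.nodule_id)  # count each nodule once
    return len(found) / total_nodules


def malignancy_accuracy(predicted: List[str], actual: List[str]) -> float:
    """Share of nodules whose malignant/benign label was predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: 2 scans containing 5 nodules (n1..n5) in total.
detections = [
    Detection("scan1", 0.95, "n1"),
    Detection("scan1", 0.90, None),
    Detection("scan2", 0.85, "n3"),
    Detection("scan2", 0.80, None),
    Detection("scan1", 0.70, "n2"),
    Detection("scan2", 0.60, None),
    Detection("scan2", 0.55, "n4"),
    Detection("scan1", 0.40, None),
    Detection("scan2", 0.30, "n5"),
]

print(recall_at_k(detections, total_nodules=5, num_scans=2, k=1))  # 0.6
```

On this toy data, allowing one false alarm per scan catches three of the five nodules, a Recall@1 of 60%, mirroring the example in the text.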
In theory, the false-alarm threshold could be set higher or lower, but doing so changes the percentage of nodules caught. If four false alarms were allowed per scan, for example, the percentage would go up. In real-world settings, more false alarms mean unnecessary tests for patients. But each doctor could be comfortable with a different level of algorithmic sensitivity, prioritizing either catching more nodules or raising fewer false alarms, depending on their own workflow.
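The trade-off can be sketched by sweeping the score threshold from strict to lenient and recording, at each step, the false alarms per scan and the share of nodules caught. A doctor's preferred false-alarm budget then picks the operating point. This is a minimal illustration under the same assumptions as a standard FROC curve (one global threshold, detections pre-matched to nodules); the function names and toy data are hypothetical.

```python
from typing import List, Optional, Tuple

# (score, matched nodule id or None for a false alarm) -- hypothetical format
Detection = Tuple[float, Optional[str]]
# (threshold, false alarms per scan, recall)
Point = Tuple[float, float, float]


def froc_points(detections: List[Detection], total_nodules: int,
                num_scans: int) -> List[Point]:
    """Recall and false alarms per scan at every candidate score threshold."""
    points = []
    found = set()
    false_alarms = 0
    for score, nodule_id in sorted(detections, key=lambda d: d[0], reverse=True):
        if nodule_id is None:
            false_alarms += 1
        else:
            found.add(nodule_id)
        points.append((score, false_alarms / num_scans, len(found) / total_nodules))
    return points


def pick_operating_point(points: List[Point], max_fp_per_scan: float) -> Optional[Point]:
    """Highest-recall threshold whose false-alarm rate fits the doctor's budget."""
    allowed = [p for p in points if p[1] <= max_fp_per_scan]
    return max(allowed, key=lambda p: p[2], default=None)


# Hypothetical toy data: 2 scans, 5 nodules (n1..n5).
detections = [
    (0.95, "n1"), (0.90, None), (0.85, "n3"), (0.80, None), (0.70, "n2"),
    (0.60, None), (0.55, "n4"), (0.40, None), (0.30, "n5"),
]
points = froc_points(detections, total_nodules=5, num_scans=2)

for budget in (0.5, 1.0, 2.0):
    print(budget, pick_operating_point(points, budget))
```

Loosening the budget from 0.5 to 1 to 2 false alarms per scan raises recall from 40% to 60% to 100% on this toy set, which is the trade-off each doctor would weigh against unnecessary follow-up tests.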