Mammograms acquired through population-based breast cancer screening programs produce a significant workload for radiologists. AI has been proposed as an automated second reader for mammograms that could help reduce this workload. The technology has shown encouraging results for cancer detection, but evidence related to its use in real screening settings is limited.
In the new study, the largest of its kind to date, Norwegian researchers led by Solveig Hofvind, Ph.D., from the Section for Breast Cancer Screening, Cancer Registry of Norway in Oslo, compared the performance of a commercially available AI system with routine independent double reading as performed in a population-based screening program. The study drew from almost 123,000 examinations performed on more than 47,000 women at four facilities in BreastScreen Norway, the nation’s population-based screening program.
The dataset included 752 cancers detected at screening and 205 interval cancers, or cancers detected between screening rounds. The AI system predicted the risk of cancer on a scale from 1 to 10, with 1 representing the lowest risk and 10 the highest risk. A total of 86.8% (653 of 752) of screen-detected cancers and 44.9% (92 of 205) of interval cancers had the highest AI score of 10.
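For readers who want to check the proportions, the shares can be recomputed directly from the counts quoted above. The snippet below is only an illustrative sketch: the helper name `share_percent` is hypothetical and is not part of the study or the AI system.

```python
def share_percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(count / total * 100, 1)


# Counts as quoted in the article: cancers with the highest AI score of 10.
screen_detected = share_percent(653, 752)  # screen-detected cancers
interval = share_percent(92, 205)          # interval cancers

print(f"Screen-detected cancers with AI score 10: {screen_detected}%")
print(f"Interval cancers with AI score 10: {interval}%")
```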
The researchers created three thresholds to assess the performance of the AI system as a decision-making tool. Using a threshold that mirrors the average individual radiologist rate of positive interpretation, the proportion of screen-detected cancers not selected by the AI system was less than 20%. While the AI system performed well, the study’s reliance on retrospective data means that more research is needed.
"In our study, we assumed that all cancer cases selected by the AI system were detected," Dr. Hofvind said. "This might not be true in a real screening setting. However, given that assumption, AI will probably be of great value in interpretation of screening mammograms in the future."
Among screen-detected cancers, those with low AI scores showed more favorable histopathologic characteristics, which are associated with a better prognosis, than those with high AI scores. The opposite was observed for interval cancers. This may indicate that interval cancers with low AI scores are true interval cancers that were not visible on the screening mammograms.
The high percentage of true negative examinations classified with a low AI score could substantially reduce the interpretive volume while allowing only a small proportion of cancers to go undetected. If AI were used as one of the two readers in a double reading setting, the radiologist could still identify these cancers, the researchers said.
"Based on our results, we expect AI to be of great value in the interpretation of screening mammograms in the future," Dr. Hofvind said. "We expect the greatest potential to be in reducing the reading volume by selecting negative examinations."
Although more study is needed before clinical implementation of AI in breast cancer screening, the results of the study help establish a basis for future research, including prospective studies, Dr. Hofvind said.
"We are looking forward to testing out different scenarios for AI using retrospective data and then running a prospective trial," she said.
Larsen M, Aglen CF, Lee CI, Hoff SR, Lund-Hanssen H, Lång K, Nygård JF, Ursin G, Hofvind S.
Artificial Intelligence Evaluation of 122 969 Mammography Examinations from a Population-based Screening Program.
Radiology. 2022 Mar 29:212381. doi: 10.1148/radiol.212381