TRANSFORMER VS. MAMBA AS SKIN CANCER CLASSIFIER: PRELIMINARY RESULTS

Authors

  • Vladyslav Nikitin National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, IASA, Department of Artificial Intelligence (AI), Ukraine https://orcid.org/0009-0001-9921-0213
  • Valery Danilov National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, IASA, Department of Artificial Intelligence (AI), Ukraine https://orcid.org/0000-0003-3389-3661

DOI:

https://doi.org/10.20535/kpisn.2024.1-4.301028

Abstract

Background: Skin cancer is a deadly disease that claims tens of thousands of lives every year. The key to successful treatment is early detection, yet invasive detection methods are not always feasible. Meanwhile, Transformers, the most renowned and widely researched models, remain computationally heavy. In this paper, we investigate the Mamba model for this classification problem and compare it to Transformers.
Objective: This paper compares the effectiveness of two machine learning architectures, the Vision Transformer (ViT) and Mamba, for skin cancer classification on dermoscopy images. The goal is to determine whether Mamba can provide a computationally efficient alternative to ViT without a decrease in diagnostic accuracy.
Methods: We used the HAM10000 dataset, a well-known benchmark for skin cancer classification containing 10,015 dermoscopic images. We preprocessed the data to address issues such as class imbalance and normalized the images. Both the ViT and Mamba models were pretrained on the ImageNet dataset and fine-tuned for skin cancer classification (an illustrative sketch of this pipeline follows the abstract). We evaluated the models on overall accuracy and on F1 scores for individual skin cancer classes.
Results: The results show that the ViT and Mamba models achieve similar overall accuracy, with the Mamba models performing slightly better on underrepresented classes such as Bowen's Disease and Dermatofibroma. Both models demonstrated high F1 scores for Melanoma, indicating their effectiveness in identifying this severe form of skin cancer.
Conclusions: Our findings suggest that Mamba is a viable alternative to ViT for skin cancer classification, offering similar accuracy while potentially reducing computational costs. This could make non-invasive skin cancer diagnostics more accessible and affordable. Further research is needed to explore other variations of the Mamba model and to fine-tune its performance on larger datasets.
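The Methods paragraph summarizes the pipeline at a high level; the sketch below illustrates one way such a fine-tuning and per-class F1 evaluation could look in PyTorch. It is not the authors' code: the timm model name, directory layout, epoch count, and the weighted-sampling strategy for class imbalance are illustrative assumptions, and a Vision Mamba / VMamba backbone would be fine-tuned in the same manner.

```python
# Minimal fine-tuning sketch (illustrative, not the authors' code).
# Assumes HAM10000 images are arranged in class subfolders under
# ham10000/{train,val}; paths and hyperparameters are assumptions.
import torch
import timm
from torch import nn
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms
from sklearn.metrics import f1_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageNet normalization, since both backbones are ImageNet-pretrained.
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder("ham10000/train", transform=tf)
val_ds = datasets.ImageFolder("ham10000/val", transform=tf)

# Oversample rare classes (e.g. Dermatofibroma) to counter class imbalance.
counts = torch.bincount(torch.tensor(train_ds.targets))
weights = (1.0 / counts.float())[torch.tensor(train_ds.targets)]
sampler = WeightedRandomSampler(weights, num_samples=len(train_ds))
train_dl = DataLoader(train_ds, batch_size=32, sampler=sampler)
val_dl = DataLoader(val_ds, batch_size=32)

# ImageNet-pretrained ViT with a new head for the 7 HAM10000 classes.
# A Vision Mamba / VMamba backbone would be swapped in here analogously.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=len(train_ds.classes)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # illustrative epoch count
    model.train()
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Per-class F1 scores, mirroring the paper's evaluation protocol.
model.eval()
preds, labels = [], []
with torch.no_grad():
    for x, y in val_dl:
        preds += model(x.to(device)).argmax(1).cpu().tolist()
        labels += y.tolist()
print(dict(zip(train_ds.classes, f1_score(labels, preds, average=None))))
```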

References

World Cancer Research Fund International, "Skin cancer statistics," 2022. [Online]. Available: https://www.wcrf.org/cancer-trends/skin-cancer-statistics/. [Accessed: 28-Mar-2024].

A. F. Jerant, J. T. Johnson, C. D. Sheridan, and T. J. Caffrey, "Early detection and treatment of skin cancer," Am. Fam. Physician, vol. 62, no. 2, pp. 357-368, Jul. 2000. [Online]. Available: https://www.aafp.org/pubs/afp/issues/2000/0715/p357.html. [Accessed: 28-Mar-2024].

A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929v2, 2020. [Online]. Available: https://doi.org/10.48550/arXiv.2010.11929. [Accessed: 28-Mar-2024].

V. Nikitin and N. Shapoval, "Vision Transformer for Skin Cancer Classification," Scientific Collection «InterConf+», no. 33(155), May 2023, pp. 449-460. [Online]. Available: https://doi.org/10.51582/interconf.19-20.05.2023.039. [Accessed: 28-Mar-2024].

G. Yang, S. Luo, and P. A. Greer, "A Novel Vision Transformer Model for Skin Cancer Classification," Neural Process. Lett., vol. 55, pp. 9335-9351, 2023. [Online]. Available: https://doi.org/10.1007/s11063-023-11204-5. [Accessed: 28-Mar-2024].

C. Xin et al., "An improved transformer network for skin cancer classification," Comput. Biol. Med., vol. 149, p. 105939, 2022. [Online]. Available: https://doi.org/10.1016/j.compbiomed.2022.105939. [Accessed: 28-Mar-2024].

A. Gu and T. Dao, "Mamba: Linear-time sequence modeling with selective state spaces," arXiv preprint arXiv:2312.00752, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2312.00752. [Accessed: 28-Mar-2024].

A. Vaswani et al., "Attention is all you need," arXiv preprint arXiv:1706.03762, 2017. [Online]. Available: https://doi.org/10.48550/arXiv.1706.03762. [Accessed: 28-Mar-2024].

F. D. Keles, P. M. Wijewardena, and C. Hegde, "On the computational complexity of self-attention," arXiv preprint arXiv:2209.04881, 2022. [Online]. Available: https://doi.org/10.48550/arXiv.2209.04881. [Accessed: 28-Mar-2024].

A. Gu, K. Goel, and C. Ré, "Efficiently modeling long sequences with structured state spaces," arXiv preprint arXiv:2111.00396, 2022. [Online]. Available: https://doi.org/10.48550/arXiv.2111.00396. [Accessed: 28-Mar-2024].

Y. Liu et al., "Vmamba: Visual state space model," arXiv preprint arXiv:2401.10166, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2401.10166. [Accessed: 28-Mar-2024].

L. Zhu et al., "Vision mamba: Efficient visual representation learning with bidirectional state space model," arXiv preprint arXiv:2401.09417, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2401.09417. [Accessed: 28-Mar-2024].

P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Sci. Data, vol. 5, 180161, 2018. [Online]. Available: https://doi.org/10.1038/sdata.2018.161. [Accessed: 28-Mar-2024].

Published

2024-12-31