TRANSFORMER VS. MAMBA AS SKIN CANCER CLASSIFIER: PRELIMINARY RESULTS
DOI: https://doi.org/10.20535/kpisn.2024.1-4.301028
Abstract
Background: Skin cancer is a deadly disease that claims tens of thousands of lives every year. Early detection is the key to successful treatment, yet invasive detection methods are not always feasible. Meanwhile, Transformers, the most renowned and widely researched models, remain computationally heavy. In this paper we investigate the Mamba model for this classification problem and compare it with Transformers.
Objective: This paper compares the effectiveness of two machine learning architectures, the Vision Transformer (ViT) and Mamba, for skin cancer classification from dermoscopy images. The goal is to determine whether Mamba can provide a computationally efficient alternative to ViT without a decrease in diagnostic accuracy.
Methods: We used the HAM10000 dataset, a well-known benchmark for skin cancer classification, comprising 10,015 dermoscopic images. The data were preprocessed to address class imbalance, and the images were normalized. Both the ViT and Mamba models were pretrained on the ImageNet dataset and fine-tuned for skin cancer classification. We evaluated the models by overall accuracy and by F1 scores for specific classes of skin cancer.
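For illustration, a minimal fine-tuning sketch in Python (PyTorch and timm) is given below. The specific ViT variant, hyperparameters, and inverse-frequency class weighting are assumptions made for this example, not the exact configuration used in the study.

# Minimal fine-tuning sketch (assumed PyTorch + timm setup; illustrative only).
import torch
import torch.nn as nn
import timm

NUM_CLASSES = 7  # HAM10000 lesion categories: akiec, bcc, bkl, df, mel, nv, vasc

# ImageNet-pretrained ViT with a new 7-class classification head.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=NUM_CLASSES)

# Inverse-frequency class weights are one common way to counter the heavy
# imbalance in HAM10000, where the nv class alone covers roughly two thirds
# of the 10,015 images (class counts: 327, 514, 1099, 115, 1113, 6705, 142).
counts = torch.tensor([327.0, 514.0, 1099.0, 115.0, 1113.0, 6705.0, 142.0])
weights = counts.sum() / (NUM_CLASSES * counts)
criterion = nn.CrossEntropyLoss(weight=weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of normalized 224x224 dermoscopy images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

Here class-weighted cross-entropy stands in for the imbalance handling described above; oversampling or augmentation of rare classes would serve the same purpose, and an analogous setup applies to the Mamba-based vision models.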
Results: The results show that both the ViT and Mamba models achieve similar overall accuracy, with the Mamba models performing slightly better on under-represented classes such as Bowen's disease and dermatofibroma. Both models demonstrated high F1 scores for melanoma, indicating their effectiveness in identifying this severe form of skin cancer.
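As an illustration of the evaluation protocol, the sketch below computes overall accuracy and per-class F1 with scikit-learn; the class ordering and the dummy prediction arrays are placeholders, not results reported in the paper.

# Evaluation sketch (assumed scikit-learn; dummy data for demonstration).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

def report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Overall accuracy plus per-class F1, e.g. for melanoma ('mel') or dermatofibroma ('df')."""
    per_class_f1 = f1_score(y_true, y_pred, average=None, labels=list(range(len(CLASSES))))
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        **{f"f1_{name}": score for name, score in zip(CLASSES, per_class_f1)},
    }

# Usage with random placeholder predictions:
rng = np.random.default_rng(0)
y_true = rng.integers(0, len(CLASSES), size=100)
y_pred = rng.integers(0, len(CLASSES), size=100)
print(report(y_true, y_pred))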
Conclusions: Our findings suggest that Mamba is a viable alternative to ViT for skin cancer classification, offering similar accuracy while potentially reducing computational costs. This could make non-invasive skin cancer diagnostics more accessible and affordable. Further research is needed to explore other variations of the Mamba model and to fine-tune its performance on larger datasets.
License
Copyright (c) 2024 Vladyslav Nikitin, Valery Danilov
This work is licensed under a Creative Commons Attribution 4.0 International License.