Can Large Language Models Save Vulnerability Detection in Low-Resource Languages?

As Kotlin, Swift, and Rust grow in popularity for mobile and systems programming, a critical challenge emerges: vulnerability detection models struggle because labeled datasets for these languages are scarce, only 0.2–0.8% of the size of those available for C/C++. Traditional code-embedding models such as CodeBERT falter in this low-data regime, with performance dropping sharply even under aggressive sampling strategies. In this study, we benchmark CodeBERT against fine-tuned and few-shot ChatGPT on real-world vulnerability data from these languages. The results show that LLMs significantly outperform the baselines, boosting function-level F1 by up to 34% and improving line-level localization accuracy by over 50%, even with minimal training data. These findings suggest that large language models, aided by broad pre-training, can bridge the data-scarcity gap and offer practical vulnerability detection for emerging languages. This work points to a new direction for automated vulnerability detection in an era of language diversity and limited labeled code.
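To illustrate the few-shot setting mentioned above, the sketch below shows one plausible way such a prompt could be assembled: a handful of labeled functions in the target low-resource language followed by the function to classify. The Kotlin snippets, labels, and the `build_prompt` helper are illustrative assumptions, not examples or code from the study itself.

```python
# Hypothetical few-shot prompt construction for vulnerability detection.
# The example functions and labels below are invented for illustration;
# the study's actual prompts and dataset may differ.

FEW_SHOT_EXAMPLES = [
    # (Kotlin function, label) pairs acting as in-context demonstrations
    ("fun readFile(path: String) = File(path).readText()",
     "vulnerable: unvalidated path may allow path traversal"),
    ("fun add(a: Int, b: Int) = a + b",
     "not vulnerable"),
]

def build_prompt(target_function: str) -> str:
    """Concatenate an instruction, labeled examples, and the query function."""
    parts = [
        "You are a security auditor. Classify each function as "
        "'vulnerable' or 'not vulnerable' and identify the faulty line."
    ]
    for code, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Function:\n{code}\nAnswer: {label}")
    # The model is expected to complete the final, unlabeled "Answer:".
    parts.append(f"Function:\n{target_function}\nAnswer:")
    return "\n\n".join(parts)

prompt = build_prompt("fun greet(name: String) = println(name)")
```

The resulting string would then be sent to the LLM; because the demonstrations fit in the prompt itself, no gradient updates or labeled training corpus are needed, which is precisely what makes the approach attractive when only a few hundred labeled samples exist.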

Ali Babar