Synthetic Swarm Mosquito Dataset for Acoustic Classification: A Proof of Concept

Vietnamese-German University
RIVF International Conference on Computing and Communication Technologies, 2025

Abstract

Mosquito-borne diseases pose a serious global health threat, causing over 700,000 deaths annually. This work introduces a proof-of-concept Synthetic Swarm Mosquito Dataset for Acoustic Classification, created to simulate realistic multi-species and noisy swarm conditions. Unlike conventional datasets that require labor-intensive recording of individual mosquitoes, the synthetic approach enables scalable data generation while reducing human resource demands.

Synthetic swarm audio generation is the core novelty of this work. This approach facilitates the development of realistic, scalable multi-species datasets that would be impractical to collect through fieldwork. Using log-mel spectrograms, we evaluated lightweight deep learning architectures for the classification of mosquito species.

Experiments show that these models can effectively identify six major mosquito vectors and are suitable for deployment on low-power embedded devices. The study demonstrates the potential of synthetic swarm audio datasets to accelerate acoustic mosquito research and enable scalable, real-time surveillance solutions. The dataset used in this study is publicly available.

Index Terms—Synthetic swarm audio, mosquito audio classification, log-mel spectrogram, convolutional neural networks, vector surveillance

Methodology

A. Audio Preprocessing and Feature Extraction

Raw audio signals are sampled at 16 kHz and normalized. To convert 1D waveforms into deep learning-compatible formats, we use Short-Time Fourier Transform (STFT) to handle non-stationary wingbeat signals:

X(t, ω) = ∫ x(τ) w(τ − t) e^(−jωτ) dτ

In this study, log-mel spectrograms computed via the STFT are used. Mel spectrograms are computed using a 512-point FFT with a 25 ms Hann window and a 10 ms hop size. Each spectrogram contains 128 mel bands and is rendered as a 224 × 224 pixel RGB image for compatibility with visual backbones.
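
For concreteness, the sketch below (Python with librosa; the paper does not give an implementation, so the function and variable names are ours) derives a 128-band log-mel image from a 16 kHz waveform using the stated 512-point FFT, 25 ms Hann window, and 10 ms hop:

```python
import numpy as np
import cv2       # used here only for the 224 x 224 resize
import librosa

SR = 16_000                # sampling rate stated in the text
N_FFT = 512                # 512-point FFT
WIN = int(0.025 * SR)      # 25 ms Hann window (400 samples)
HOP = int(0.010 * SR)      # 10 ms hop (160 samples)
N_MELS = 128               # mel bands

def logmel_image(wav: np.ndarray) -> np.ndarray:
    """Waveform -> log-mel spectrogram rendered as a 224 x 224 3-channel image."""
    mel = librosa.feature.melspectrogram(
        y=wav, sr=SR, n_fft=N_FFT, win_length=WIN, hop_length=HOP,
        n_mels=N_MELS, window="hann")
    logmel = librosa.power_to_db(mel, ref=np.max)               # log compression
    logmel = (logmel - logmel.min()) / (np.ptp(logmel) + 1e-8)  # scale to [0, 1]
    img = cv2.resize(logmel, (224, 224))                        # fixed input size
    return np.stack([img] * 3, axis=-1)                         # replicate to RGB
```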

Fig. 2: Mel-spectrogram of a synthetic mosquito swarm. Vertical lines indicate wingbeat frequencies.
Fig. 3: Example input with corresponding species label metadata, used for generating multi-label targets.

B. Synthetic Swarm Generation

For each synthetic sample, n ~ U(1, 10) mosquitoes are selected from species set S = {s₁, ..., sₖ}. The synthetic swarm is generated as:

X(t) = Σᵢ₌₁ⁿ gᵢ · xᵢ(t - τᵢ) · 1[0,T](t - τᵢ)

where gᵢ ~ U(0.2, 1.0) is gain variation to simulate distance, and τᵢ ~ U(0, 3.0) is time offset. Segments xᵢ(t) are randomly chunked (Δt ∈ [0.3, 0.6]s). White Gaussian noise is added at SNR levels of 20-40 dB to simulate realistic trap environments. Multi-label ground truth vectors y ∈ {0, 1}ᵏ indicate species presence.
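
A minimal Python sketch of this mixing process is given below. The gain, offset, chunk-length, and SNR ranges follow the equation and text above; the 5 s clip length and the `recordings` data structure (a list of waveform/species-index pairs) are our assumptions:

```python
import numpy as np

SR = 16_000
T = 5.0  # assumed swarm clip length in seconds (not stated in the text)

def make_swarm(recordings, rng=np.random.default_rng()):
    """Mix n ~ U(1, 10) mosquito chunks into one clip, per the equation above."""
    k = max(label for _, label in recordings) + 1
    mix = np.zeros(int(T * SR), dtype=np.float32)
    y = np.zeros(k, dtype=np.int64)                    # multi-label target vector
    for _ in range(rng.integers(1, 11)):               # n ~ U(1, 10)
        wav, label = recordings[rng.integers(len(recordings))]
        dur = rng.uniform(0.3, 0.6)                    # chunk length Δt (s)
        start = rng.integers(0, max(1, len(wav) - int(dur * SR)))
        chunk = wav[start:start + int(dur * SR)]
        g = rng.uniform(0.2, 1.0)                      # gain g_i ~ U(0.2, 1.0)
        tau = int(rng.uniform(0.0, 3.0) * SR)          # offset τ_i ~ U(0, 3.0) s
        end = min(tau + len(chunk), len(mix))          # truncate at clip end,
        mix[tau:end] += g * chunk[:end - tau]          # i.e. the 1_[0,T] indicator
        y[label] = 1
    snr_db = rng.uniform(20.0, 40.0)                   # SNR in [20, 40] dB
    sig_pow = np.mean(mix**2) + 1e-12
    noise = rng.normal(0.0, np.sqrt(sig_pow / 10**(snr_db / 10)), mix.shape)
    return mix + noise.astype(np.float32), y
```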

C. Dataset

We use the Abuzz dataset containing labeled mosquito wingbeat recordings across 20 species. For this study, we focus on key disease vector species categorized into four groups:

  • Aedes: A. aegypti, A. albopictus
  • Anopheles: A. arabiensis, A. gambiae
  • Culex: C. quinquefasciatus, C. pipiens
  • Noise: Non-mosquito background sounds

D. Model Architectures

Three neural network architectures are evaluated for multi-label classification:

1) CNN (ResNet-18)

Baseline architecture using ResNet-18 to process 224 × 224 RGB spectrograms:

ŷ = σ(Wᵀf + b),  f = GAP(CNN(X))

Pros: Efficient and edge-friendly. Cons: No temporal modeling.
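
A minimal PyTorch sketch of this baseline follows. We assume six species labels (whether background noise forms a seventh label is not specified in the text):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

K = 6  # assumed number of output labels

class ResNetBaseline(nn.Module):
    """ResNet-18 backbone + multi-label head: y_hat = sigma(W^T f + b)."""
    def __init__(self, num_labels: int = K):
        super().__init__()
        self.backbone = resnet18(weights=None)
        # Replace the 1000-way ImageNet head with a K-way linear layer;
        # global average pooling (GAP) is already built into ResNet-18.
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)  # raw logits; apply sigmoid at inference

logits = ResNetBaseline()(torch.randn(2, 3, 224, 224))
probs = torch.sigmoid(logits)    # y_hat in [0, 1]^K
```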

2) CNN + RNN

CNN features are reshaped and passed to an RNN for temporal awareness:

S = reshape(F) ∈ ℝ^(B×T×C),  R = RNN(S),  ŷ = σ(WᵀR + b)

3) CNN + LSTM

Replacing RNN with LSTM improves long-range dependency modeling:

(hₜ, cₜ) = LSTM(S),  ŷ = σ(Wᵀh_T + b)

This hybrid achieves stronger modeling of long-range wingbeat patterns.
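
The PyTorch sketch below illustrates one plausible realization of the hybrid. The hidden size, the frequency-axis pooling, and the choice of time axis are our assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CNNLSTM(nn.Module):
    """CNN feature map -> time-major sequence -> LSTM -> multi-label head."""
    def __init__(self, num_labels: int = 6, hidden: int = 128):
        super().__init__()
        cnn = resnet18(weights=None)
        self.features = nn.Sequential(*list(cnn.children())[:-2])  # drop GAP + fc
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, x):                  # x: (B, 3, 224, 224)
        f = self.features(x)               # feature map F: (B, 512, 7, 7)
        f = f.mean(dim=2)                  # pool one spatial axis -> (B, 512, 7)
        s = f.permute(0, 2, 1)             # S: (B, T=7, C=512), time-major
        out, (h_t, c_t) = self.lstm(s)     # (h_t, c_t) = LSTM(S)
        return self.head(h_t[-1])          # logits from last hidden state h_T
```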

E. Training Pipeline

The training procedure includes spectrogram conversion, label encoding, stratified data splitting, and end-to-end model optimization with the following configuration:

  • Loss Function: Binary Cross-Entropy (BCE) with sigmoid activation
  • Optimizer: Adam optimizer with learning rate α = 10⁻⁴
  • Early Stopping: Based on validation loss
  • Data Split: Stratified splitting based on label cardinality

Predictions ŷ ∈ [0, 1]ᵏ are thresholded at different τ values for evaluation.
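
A condensed sketch of this loop is shown below; the epoch budget and patience are assumed, while the loss, optimizer, and learning rate follow the configuration above:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=100, patience=5):
    """BCE-with-sigmoid training with Adam (lr = 1e-4) and early stopping
    on validation loss, mirroring the configuration listed above."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()       # sigmoid + BCE, numerically stable
    best, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y.float()).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y.float()).item() for x, y in val_loader)
        if val < best:
            best, stale = val, 0
            torch.save(model.state_dict(), "best.pt")
        else:
            stale += 1
            if stale >= patience:          # early stopping trigger
                break
```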

F. Evaluation Metrics

We use the following multi-label classification metrics to evaluate model performance:

  • Multi-label Accuracy: Average proportion of correctly predicted labels per sample across all classes: (1/C) Σⱼ I(yⱼ = ŷⱼ)
  • Macro Precision, Recall, F1-score: Highlight per-class performance and robustness across imbalanced data

These metrics ensure comprehensive evaluation of the model's ability to handle multi-species classification in realistic mosquito swarm scenarios.
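
One possible implementation of this evaluation, sweeping the decision threshold τ used in the experiments below (the scikit-learn usage is our choice, not the paper's), is:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def evaluate(y_true, y_prob, thresholds=(0.3, 0.5, 0.7)):
    """Multi-label accuracy plus macro precision/recall/F1 at each threshold.

    y_true: (N, C) binary indicator matrix; y_prob: (N, C) sigmoid outputs.
    """
    for tau in thresholds:
        y_pred = (y_prob >= tau).astype(int)
        acc = (y_pred == y_true).mean()    # (1/C) sum_j I(y_j == y_hat_j), averaged
        p = precision_score(y_true, y_pred, average="macro", zero_division=0)
        r = recall_score(y_true, y_pred, average="macro", zero_division=0)
        f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
        print(f"tau={tau}: acc={acc:.3f} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```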

Experimental Results

This section presents the results of experiments conducted on the multi-label mosquito species classification task using CNN+LSTM, CNN+RNN, and CNN models trained on swarm audio spectrograms. We evaluate performance under different decision thresholds (0.3, 0.5, and 0.7), focusing on the trade-off between sensitivity and precision for embedded real-time applications.

A. Impact of Threshold on Model Performance

Figure 9 compares the performance of CNN+LSTM, CNN+RNN, and CNN across three thresholds using four metrics: Accuracy, Macro F1, Precision, and Recall. The CNN+LSTM consistently outperforms the others, particularly in F1 and recall, confirming its ability to capture long-term audio patterns.

Key Findings:

  • Threshold 0.3: CNN+LSTM achieves highest F1 and recall, indicating strong sensitivity to mosquito species presence
  • Threshold 0.5-0.7: All models show decreased recall and F1 with minimal precision gains
  • Trade-off: Lower thresholds provide better sensitivity but reduced precision due to more liberal predictions

Figure 9: Performance comparison of CNN+LSTM, CNN+RNN, and CNN models across thresholds 0.3, 0.5, and 0.7.

B. Learning Curve Analysis

Threshold 0.3 – Stable Generalization

Training and validation losses show smooth and consistent convergence, suggesting good generalization. The model avoids overfitting, and validation accuracy continues to rise after training accuracy saturates, which indicates well-balanced learning.

Recommendation: Threshold 0.3 is optimal for applications where missing a mosquito is more costly than a false positive, such as in early-warning traps.

Threshold 0.5 – Overconfidence and Reduced Sensitivity

Early convergence in training combined with unstable validation loss indicates reduced generalization. Validation accuracy plateaus, and final test performance drops compared to the 0.3 threshold case. This suggests the model becomes overconfident yet less sensitive to minority classes, which is undesirable for imbalanced multi-label detection tasks.

Figure 10(a): Training and validation loss curves of the CNN+LSTM model at threshold 0.3, showing stable generalization.
Figure 10(b): Training and validation loss curves of the CNN+LSTM model at threshold 0.5, showing reduced generalization.
Figure 11(a): Training and validation accuracy of CNN+LSTM at threshold 0.3, with well-balanced learning.
Figure 11(b): Training and validation accuracy of CNN+LSTM at threshold 0.5, showing plateaued performance.

C. Field Validation in Vung Tau, Vietnam

To evaluate real-world performance, we conducted field validation using a baited mosquito trap equipped with CO₂ as an attractant to simulate human presence. A high-sensitivity microphone captured continuous flight-tone recordings of approaching mosquitoes. Real swarm audio samples were processed with our trained CNN+LSTM model at a detection threshold of 0.3, chosen for sensitivity.

Species | Positive Detections (out of 12) | Detection Rate | Biological Relevance
Aedes aegypti | 10 | 83.3% | ✓ Common in southern Vietnam
Anopheles arabiensis | 8 | 66.7% | ✗ False positive (African species)
Aedes albopictus | 6 | 50.0% | ✓ Present in Vietnam
Culex quinquefasciatus | 1 | 8.3% | △ Under-detected (common species)
Anopheles gambiae | 0 | 0% | ✓ Correctly absent (African species)
Culex pipiens | 1 | 8.3% | ✓ Low significance in region

Field Validation Summary:

  • Strong Performance: Ae. aegypti (83.3%) and moderate detection of Ae. albopictus (50%) align with known distribution in southern Vietnam
  • False Positives: An. arabiensis detections are biologically incorrect as this African vector doesn't occur in Vietnam
  • Under-sensitivity: C. quinquefasciatus low detection rate (8.3%) indicates model limitations for this common Vietnamese species
  • Geographic Accuracy: An. gambiae absence correctly reflects its non-occurrence in Vietnam

Related Work and Comparison

Recent advances in mosquito acoustic classification have leveraged various machine learning approaches. This section compares our synthetic swarm approach with existing methods in the field.

Traditional Acoustic Classification Approaches

Bale et al. (2019) provided a comprehensive survey of acoustic-based mosquito classification using flight tones, establishing the foundation for frequency-domain analysis. Yang & Zhang (2019) utilized Continuous Wavelet Transform (CWT) for insect wingbeat classification, demonstrating the effectiveness of time-frequency representations.

Fernandes & Batista (2020) developed CNN-based mosquito detection for smartphone audio, while Ramos et al. (2023) proposed an acoustic sensor module for real-time detection and classification in embedded systems.

Deep Learning for Mosquito Classification

Kiskin et al. (2019) pioneered bioacoustic detection with deep learning, establishing CNN architectures for species identification. Wang et al. (2020) introduced CRNN (CNN+RNN) architectures specifically for mosquito species identification, which influenced our hybrid approach.

Toledo et al. (2021) developed LSTM-based mosquito genus classification using wingbeat sounds, demonstrating the effectiveness of sequential modeling for temporal acoustic patterns—a key inspiration for our CNN+LSTM architecture.

Noise-Robust and Real-World Applications

Supratak et al. (2024) addressed noise robustness with MosquitoSong+, focusing on classification in real-world environments. Their work highlighted the challenges of field deployment that motivated our synthetic swarm approach for data augmentation.

The HumBug Project (MindFoundry, 2022) and Abuzz Project (Stanford, 2018) demonstrated citizen science approaches to mosquito tracking using smartphones, providing valuable datasets including the Abuzz dataset used in our study.

Method/Study | Architecture | Dataset Type | Key Innovation | Limitation
Bale et al. (2019) | Survey/Traditional | Real recordings | Flight tone analysis | Limited scalability
Yang & Zhang (2019) | CWT-based | Individual insects | Time-frequency analysis | Single-species focus
Wang et al. (2020) | CRNN | Real recordings | CNN+RNN hybrid | Limited dataset size
Toledo et al. (2021) | LSTM | Wingbeat sounds | Temporal modeling | Genus-level only
Fernandes & Batista (2020) | CNN | Smartphone audio | Mobile deployment | Single-mosquito detection
Supratak et al. (2024) | MosquitoSong+ | Real-world noisy | Noise robustness | Complex preprocessing
Our Method (2025) | CNN+LSTM | Synthetic swarms | Multi-label swarm classification | Synthetic data dependency

Our Contributions vs. Existing Work:

  • Novel Synthetic Swarm Approach: First work to generate realistic mosquito swarm audio for multi-label classification
  • Multi-Species Detection: Unlike single-species approaches, we handle complex swarm scenarios with multiple species simultaneously
  • Scalable Data Generation: Addresses dataset scarcity through controlled synthetic generation with biological realism
  • Field Validation: Real-world testing in Vietnam demonstrates practical applicability beyond laboratory settings
  • Threshold Optimization: Systematic analysis of decision thresholds for deployment flexibility

Public Health Impact

As highlighted by Breedlove (2022) and Boyer et al. (2018), mosquitoes remain critical vectors for deadly diseases including Zika, dengue, and malaria. Our acoustic classification system contributes to the broader effort of vector surveillance and control, offering a scalable, cost-effective solution for real-time mosquito monitoring in endemic regions.

Citation

@inproceedings{dinh2025synthetic,
  title={Synthetic Swarm Mosquito Dataset for Acoustic Classification: A Proof of Concept},
  author={Dinh, Thai Duy and Vo, Minh Luan},
  supervisor={Nguyen, Cuong Tuan and Vo, Hien Bich},
  booktitle={RIVF International Conference on Computing and Communication Technologies},
  year={2025},
  pages={1--6},
  organization={IEEE}
}

References

[1] B. Breedlove, "Deadly, dangerous, and decorative creatures," Emerging Infectious Diseases, vol. 28, no. 2, pp. 495–496, 2022.

[2] S. Boyer, E. Calvez, T. Chouin-Carneiro, D. Diallo, and A.-B. Failloux, "An overview of mosquito vectors of Zika virus," Microbes and Infection, vol. 20, no. 11-12, pp. 646–660, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S128645791830039X

[3] T. Bale, S. Belew, N. Williams, B. Johnson, T. Hall, N. Buckerfield, A. Kaul, M. H. Imtiaz, and B. Taylor, "Acoustic-based classification of mosquitoes using flight tones: A survey," Acoustics Australia, vol. 47, no. 2, pp. 191–199, 2019.

[4] I. Kiskin, V. Kindratenko, and S. J. Cox, "Bioacoustic detection with deep learning," Interspeech, 2019.

[5] MindFoundry, "The humbug project," 2022, https://www.mindfoundry.ai/blog/humbug-2022.

[6] H. Fernandes and P. Batista, "Mosquito detection using CNN on smartphone audio," arXiv preprint arXiv:2008.09024, 2020.

[7] K. Ramos, M. L. C. Guico, and J. K. A. Galicia, "Acoustic sensor module for mosquito detection and classification," in 2023 9th International Conference on Computer and Communication Engineering (ICCCE). IEEE, 2023, pp. 126–131.

[8] M. Yang and L. Zhang, "Insect wingbeat classification using CWT," Sensors, vol. 19, no. 5, p. 1123, 2019.

[9] Y. Wang, T. Li, and X. Chen, "CRNN for mosquito species identification," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 900–904.

[10] R. Toledo, R. da Silva, and J. Souza, "LSTM-based mosquito genus classification using their wingbeat sound," Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 5, no. 3, pp. 1–20, 2021.

[11] A. Supratak, P. Rattanatamrong, and S. Chomphan, "Mosquitosong+: Noise-robust mosquito classification in real-world environments," PLOS ONE, vol. 19, no. 2, p. e0310121, 2024.

[12] Abuzz Project, Stanford University, "Citizen science to track mosquitoes using smartphones," 2018, https://abuzz.stanford.edu/.