Few-Shot Open-Set Audio Classification Using Attention Information-Fused Prototypes

Most existing audio classification methods assume that each query (testing) sample belongs to one of the classes of the support (training) samples. As a result, they cannot reject samples of unseen classes and instead misrecognize them as seen classes. In this study, we propose a method for Few-shot Open-set Audio Classification (FOAC) that recognizes query samples of seen classes after the model is updated with a few support samples, while rejecting query samples of unseen classes. The model consists of an encoder and a classifier. The encoder is a ResNet backbone used to extract embeddings. The classifier consists of prototype generators for the few-shot classes and the open-set classes. Prototypes of the few-shot classes are obtained by fusing the class-discriminative information of the support and query embeddings and by assigning larger weighting coefficients to the representative parts of the support embeddings. A single prototype is generated for the open-set classes using the proposed prototype generator. The encoder is trained in a supervised manner with abundant samples of the base classes, and the prototypes of the base classes are then generated under the supervision of a joint loss. The classifier is trained in a meta-training manner using a few samples of the few-shot classes. Three public datasets (LS-100, NSynth-100, and FSC-89) are used to assess the performance of our method. Experiments show that our method outperforms prior methods in AUROC and accuracy, and that this advantage is statistically significant relative to most prior methods. Our method also has lower computational complexity than most prior methods. The code is available at https://github.com/Jessytan/FOAC-AIFP.
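To make the prototype-based idea concrete, the following is a minimal sketch of few-shot open-set classification with weighted prototypes. It is not the authors' implementation. The function names (`weighted_prototypes`, `classify_open_set`), the cosine-similarity weighting, the squared Euclidean distance, and the toy 2-D embeddings are all assumptions made for illustration. The sketch omits the fusion of query information into the prototypes, and it takes the open-set prototype as a given vector (here, a placeholder equal to the mean of the class prototypes) rather than producing it with a learned generator.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_prototypes(support, labels, n_classes):
    """Build one prototype per class from support embeddings.

    support: (N, D) array of embeddings; labels: (N,) ints in [0, n_classes).
    Assumption: each support embedding is weighted by a softmax over its
    cosine similarity to the class mean, so more representative embeddings
    contribute more. This stands in for a learned attention module.
    """
    protos = np.zeros((n_classes, support.shape[1]))
    for c in range(n_classes):
        emb = support[labels == c]                      # (K, D)
        mean = emb.mean(axis=0)                         # (D,)
        norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(mean) + 1e-8
        weights = softmax(emb @ mean / norms)           # (K,), sums to 1
        protos[c] = weights @ emb                       # weighted average
    return protos

def classify_open_set(query, protos, open_proto):
    """Assign each query to the nearest prototype, or reject it.

    The open-set prototype is appended as an extra class; a query whose
    nearest prototype is the open-set one is labelled -1 (rejected).
    """
    all_protos = np.vstack([protos, open_proto[None, :]])            # (C+1, D)
    dist = ((query[:, None, :] - all_protos[None, :, :]) ** 2).sum(axis=-1)
    probs = softmax(-dist, axis=1)                                   # (Q, C+1)
    pred = probs.argmax(axis=1)
    pred = np.where(pred == protos.shape[0], -1, pred)
    return pred, probs

# Toy 2-way 2-shot episode with hypothetical 2-D embeddings.
support = np.array([[5.0, 0.0], [5.2, 0.1], [0.0, 5.0], [0.1, 4.8]])
labels = np.array([0, 0, 1, 1])
protos = weighted_prototypes(support, labels, n_classes=2)
open_proto = protos.mean(axis=0)   # placeholder for a learned open-set prototype
query = np.array([[5.1, 0.0], [0.0, 5.1], [2.5, 2.5]])
pred, probs = classify_open_set(query, protos, open_proto)
print(pred)
```

In this toy episode the first two queries lie close to the class prototypes, while the third lies closest to the placeholder open-set prototype and is therefore rejected. In a trained system the open-set prototype and the weighting would be learned rather than fixed by hand as they are here.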