I am trying to build a musical instrument recognition system using a CNN. The features I want to extract are the Mel spectrogram, MFCCs, spectral centroid, spectral flux, and zero-crossing rate. I want to compare which feature, when fed to the CNN, gives the best performance (accuracy, precision, recall, and F1). My question is: is it possible to feed the CNN the raw numerical values of the extracted features (as arrays) instead of a spectrogram image?
Most of the related studies I have read compute the spectrogram and feed the resulting image directly to a CNN.
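To make concrete what I mean by "raw values", here is a minimal sketch for one of the features, the zero-crossing rate. It is only an illustration under assumptions: the helper names `frame_signal` and `zero_crossing_rate`, the frame length of 512, the hop of 256, and the synthetic 440 Hz test tone are all hypothetical choices of mine, not taken from any of the studies. The point is that the result is one number per frame, so the feature is a 1-D sequence of values rather than an image.

```python
import math


def frame_signal(signal, frame_length, hop_length):
    """Split a list of samples into overlapping frames (no padding)."""
    frames = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frames.append(signal[start:start + frame_length])
    return frames


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Hypothetical input: one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)
]

frames = frame_signal(signal, frame_length=512, hop_length=256)

# One raw value per frame: a 1-D sequence of length len(frames).
zcr_sequence = [zero_crossing_rate(frame) for frame in frames]
```

What I am asking is whether a sequence like `zcr_sequence` (or the 2-D array of MFCC coefficients over frames) can be passed to the network as numbers, without first rendering it as a picture.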