I'm not sure if it is possible but anyway,
I use using System.Speech.Recognition;
in winform C# app.
I'm wondering if it is possible not only to recognize speech, but also recognize voice, somehow recognize difference between different voices
to get something near to reading of multiply content from each separate voice, for example from two simultaneously or separately speaking users as different two.
Or at least maybe some method to control background loudness, for example if AudioLevelUpdated
event allows me to see input volume, but maybe also exist some specific way to separate loud voice from extra noise or voices in background
I think you should take a look at CRIS which is part of Microsoft Cognitive Services, at least for you question about noise.
CRIS is a Custom Speech Service and its basic use is to improve the quality of Speech-to-text using custom acoustics models (like background noise) and learning vocabulary using samples.
You can import :
Acoustic Datasets
Language Datasets
Pronunciation Datasets
For example in acoustic models you have:
Microsoft Conversational Model for recognizing speech spoken in a conversational style (i.e. speech directed at another person).
Microsoft Search and Dictation Model for speech directed to an application, such as commands, search queries or dictation.
There is also a Speaker Recognition API available in preview