I'm currently studying the convergence properties of stochastic gradient descent (SGD) algorithms, particularly in the context of L-smooth non-convex functions. While there is substantial literature on the convergence of vanilla SGD under these conditions, I'm curious about the state of research concerning mini-batch SGD.
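To fix notation: by L-smooth I mean that $f$ is differentiable with an $L$-Lipschitz gradient,

$$\|\nabla f(x) - \nabla f(y)\| \;\le\; L\,\|x - y\| \qquad \text{for all } x, y,$$

and by "convergence" in the non-convex case I mean the usual stationarity guarantee rather than convergence to a global minimum. As I understand the standard argument (in the style of Ghadimi and Lan, 2013), for vanilla SGD with unbiased stochastic gradients of variance at most $\sigma^2$ and a constant step size $\eta \le 1/L$, the descent lemma gives

$$\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\big[\|\nabla f(x_t)\|^2\big] \;\le\; \frac{2\big(f(x_0) - f^*\big)}{\eta T} \;+\; \eta L \sigma^2,$$

so taking $\eta \asymp 1/\sqrt{T}$ yields the familiar $O(1/\sqrt{T})$ bound on the average squared gradient norm.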
Most of the studies I've come across analyze mini-batch SGD under convexity or strong convexity assumptions. However, I'm specifically interested in its convergence behavior on L-smooth non-convex functions, i.e., in stationarity guarantees of the type above rather than rates to a global optimum.
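For concreteness, below is a minimal sketch (in Python; the function names and the toy objective are purely illustrative, not taken from any paper) of the mini-batch variant I have in mind: at each iteration a batch of size $b$ is drawn uniformly with replacement, and the averaged per-sample gradients form the update direction. My possibly naive intuition is that this averaging shrinks the variance bound from $\sigma^2$ to $\sigma^2/b$, so the vanilla analysis above should carry over with improved constants; I am hoping for references that treat this carefully in the L-smooth non-convex setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_sgd(grad_i, x0, n_samples, eta=0.01, batch_size=32, n_steps=1000):
    """Mini-batch SGD on f(x) = (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component f_i at x.
    Illustrative sketch only; not taken from any specific paper.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        # Draw a mini-batch uniformly with replacement and average the
        # per-sample gradients: an unbiased estimator of grad f(x) whose
        # variance scales like sigma^2 / batch_size.
        batch = rng.integers(0, n_samples, size=batch_size)
        g = np.mean([grad_i(x, i) for i in batch], axis=0)
        x -= eta * g  # plain (constant step size, no momentum) SGD step
    return x

# Toy usage on a smooth non-convex objective, f_i(x) = -cos(a_i . x),
# whose gradient is sin(a_i . x) * a_i:
a = rng.normal(size=(100, 5))
x_out = minibatch_sgd(lambda x, i: np.sin(a[i] @ x) * a[i],
                      x0=np.zeros(5), n_samples=100)
```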
Could anyone familiar with this area point me to papers that analyze the convergence of mini-batch SGD in this specific setting? Any insights or references would be greatly appreciated.
Thank you!