Thread-local storage is a method to reduce synchronization overhead in multi-threaded applications where data is not shared between threads. That requires a protection mechanism around certain thread-local memory locations (such as TLS and the stack) so that only one thread can access that memory. Since all threads within a process share the same virtual address space, how are the thread-local storage and the stack of a thread protected from other threads of the same process?
I guess the OS should provide such a protection mechanism, and if so, how? ... The whole point of thread-local storage is to reduce overhead, so involving the OS means adding overhead. Is there runtime library or hardware support? Or maybe it is not protected at all and is left to the programmer...
You are correct in thinking that a programmer could access the thread-local storage of another thread within the same process. It wouldn't be trivial, since the programmer would have to either obtain the memory's address directly or use some undocumented APIs, but it can be done. But why would he (or she)?! The whole premise of TLS is to make it easy for programmers to store data in a place that is not shared with other threads within the process.
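To illustrate that nothing in hardware enforces this separation, here is a minimal sketch assuming a POSIX system with pthreads and C11 _Thread_local (the function names are hypothetical, chosen for illustration). Once a thread's thread-local variable, or a local on its stack, has its address handed to another thread, that other thread can write through the pointer like any other memory in the shared address space:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy of this variable (C11 _Thread_local). */
static _Thread_local int tls_counter = 0;

/* Hypothetical helper: receives a plain pointer to the creating thread's
   copy of tls_counter and writes through it. Nothing stops this, because
   every thread shares the same virtual address space. */
static void *overwrite_foreign_tls(void *arg)
{
    int *other_threads_copy = arg;
    tls_counter = 1;          /* writes the worker's own copy */
    *other_threads_copy = 42; /* writes the creating thread's copy */
    return NULL;
}

/* Hypothetical helper: writes into another thread's stack variable. */
static void *overwrite_foreign_stack(void *arg)
{
    *(int *)arg = 7;
    return NULL;
}

/* Returns the calling thread's tls_counter after a second thread has
   written to it through a pointer. */
int tls_value_after_foreign_write(void)
{
    pthread_t worker;
    tls_counter = 0;
    int rc = pthread_create(&worker, NULL, overwrite_foreign_tls, &tls_counter);
    assert(rc == 0);
    rc = pthread_join(worker, NULL);
    assert(rc == 0);
    return tls_counter; /* the worker changed this thread's "private" copy */
}

/* Returns a local (stack) variable after a second thread wrote to it. */
int stack_value_after_foreign_write(void)
{
    int local = 0; /* lives on the calling thread's stack */
    pthread_t worker;
    int rc = pthread_create(&worker, NULL, overwrite_foreign_stack, &local);
    assert(rc == 0);
    rc = pthread_join(worker, NULL);
    assert(rc == 0);
    return local;
}
```

Called from the main thread, tls_value_after_foreign_write() returns 42 and stack_value_after_foreign_write() returns 7. The worker's own write of 1 went to its separate copy, which is the normal TLS behavior; only the explicit pointer writes reached the other thread's memory.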
The fact that thread-local storage is managed by the OS means that the actual location of a thread's TLS in the process's memory is not advertised directly. Reading and writing TLS is "managed" by the operating system with relatively low overhead (an ordinary function call, not a system call) through a simple Get/Set API. The protection here is mostly a convenience: it makes it difficult for somebody to accidentally access data that belongs to (or is also accessed by) a different thread.
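The answer does not name a platform; on Windows, the Get/Set API it describes is TlsAlloc, TlsGetValue and TlsSetValue. Below is a minimal sketch of the same pattern using the POSIX equivalents (pthread_key_create, pthread_setspecific, pthread_getspecific). The helpers slot_set, slot_get and slot_demo are hypothetical names, and storing small integers directly in the void* slot is an illustrative shortcut:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One process-wide key; each thread associates its own value with it. */
static pthread_key_t slot_key;
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* No destructor: the stored values are plain integers, not heap pointers. */
    int rc = pthread_key_create(&slot_key, NULL);
    assert(rc == 0);
}

/* "Set": store a per-thread value under the shared key. */
static void slot_set(intptr_t value)
{
    pthread_once(&slot_once, make_key);
    int rc = pthread_setspecific(slot_key, (void *)value);
    assert(rc == 0);
}

/* "Get": read back the calling thread's value. pthread_getspecific
   returns NULL (0 here) for a thread that has not set one. */
static intptr_t slot_get(void)
{
    pthread_once(&slot_once, make_key);
    return (intptr_t)pthread_getspecific(slot_key);
}

static void *worker(void *arg)
{
    intptr_t *seen = arg;
    seen[0] = slot_get(); /* 0: the worker has not stored anything yet */
    slot_set(2);
    seen[1] = slot_get(); /* 2: the worker's own value */
    return NULL;
}

/* The main thread sets 1 and a worker sets 2; each sees only its own value. */
int slot_demo(intptr_t out[3])
{
    pthread_t t;
    slot_set(1);
    int rc = pthread_create(&t, NULL, worker, out);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    out[2] = slot_get(); /* still 1 in the main thread */
    return 0;
}
```

After intptr_t v[3]; slot_demo(v); the array holds 0, 2 and 1. Only the key is shared; which value comes back depends on the calling thread, so no thread needs to know where another thread's values are stored.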