Although this optimization is useful in system programming, test-and-set should be avoided in high-level concurrent programming: spinning in applications deprives the operating system scheduler of the knowledge of which thread is blocked on what. Consequently, the scheduler has to guess how to allocate CPU time among the threads, typically just letting each thread use up its time slice. Threads end up spinning unproductively, waiting for threads that are not scheduled.
When applications instead use lock objects provided by the operating system, such as mutexes, the OS knows which threads are blocked and can schedule exactly the unblocked ones.
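The contrast can be sketched in C++ as follows. This is a minimal illustration, not a recommended implementation: `SpinLock` and `count_with_lock` are hypothetical names introduced here, and the sketch assumes that `std::mutex` is backed by an operating-system primitive that can put waiting threads to sleep, which is typical on mainstream platforms but not guaranteed by the language standard.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock built on test-and-set. A waiting thread stays
// runnable and burns CPU; the scheduler cannot tell that it is blocked.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value: true means the lock
        // is already held, so keep retrying.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: several threads increment a shared counter under
// the given lock type. Works with any type providing lock() and unlock().
template <typename Lock>
long count_with_lock(int num_threads, int increments_per_thread) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<Lock> guard(lock);  // released at end of scope
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `count_with_lock<SpinLock>(4, 1000)` and `count_with_lock<std::mutex>(4, 1000)` should both yield the same total, since both locks provide mutual exclusion. The difference lies in how waiting is done: with `SpinLock` a waiting thread consumes its time slice in the retry loop, whereas a thread waiting on the mutex can be descheduled until the lock becomes available.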