# Basic

## 1. **Why do we generally use the Softmax non-linearity as the last operation in a network?**

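Because it takes a vector of real numbers and returns a probability distribution, so the network's raw output scores can be read as probabilities (for example, over the classes in a classification problem). It is defined as follows. Let *x* be a vector of real numbers (positive, negative, whatever; there are no constraints).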

Then the *i*-th component of Softmax(*x*) is:

![](https://54486267-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MJ5sRvnbi6ybKs7Ho3A%2F-MJMQYXppML-OY7oJV0L%2F-MJMUy_u4S2ihbpwr7AV%2Fimage.png?alt=media\&token=00456e60-f15a-45af-a6ae-aff38c8f6db3)
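The same definition written out in LaTeX, with $n$ the number of components of *x*:

$$
\text{Softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}, \qquad i = 1, \dots, n
$$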

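The output is a probability distribution: every exponential is strictly positive, so each component is positive (and in particular non-negative), and because every numerator also appears once in the shared denominator, the components sum to 1.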
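A minimal Python sketch of the definition, using plain lists and the standard `math` module (the function name `softmax` and the list-based interface are illustrative choices, not taken from the source). It subtracts the largest component before exponentiating: multiplying numerator and denominator by $e^{-\max_j x_j}$ leaves every ratio unchanged, but it keeps `math.exp` from overflowing when the inputs are large.

```python
import math
from typing import List, Sequence


def softmax(x: Sequence[float]) -> List[float]:
    """Return the softmax of a vector of real numbers as a list of probabilities."""
    if len(x) == 0:
        raise ValueError("softmax is undefined for an empty vector")
    # Subtract the maximum before exponentiating: softmax(x) == softmax(x - c)
    # for any constant c, and this keeps exp() from overflowing on large inputs.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    # Normalize so the components are positive and sum to 1.
    return [e / total for e in exps]


if __name__ == "__main__":
    probs = softmax([1.0, 2.0, 3.0])
    print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
    print(round(sum(probs), 10))  # 1.0
    # Very large logits do not overflow thanks to the max subtraction.
    print(softmax([1000.0, -5.0, 1000.0]))
```

Note that the largest input always receives the largest probability, and inputs of any sign are accepted, matching the "no constraints" remark above.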

