Refers to standardizing, in each layer of a neural network, the data within the same mini-batch (batch), so that the distribution is more regular: the mean becomes 0 and the variance becomes 1.

x' = (x - mean) / sd

where mean and sd are the mean and standard deviation computed over the mini-batch.

- speeds up training and convergence
- has a regularizing effect, which helps reduce overfitting
- helps prevent exploding and vanishing gradients
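
The standardization formula above can be sketched in plain Python as follows. This is a minimal illustration, not a full layer: the function name `batch_standardize` and the small constant `eps` (added under the square root to avoid division by zero) are assumptions for the example, and the learnable scale and shift that real batch normalization layers usually apply afterward are left out.

```python
import math


def batch_standardize(batch, eps=1e-5):
    """Standardize each feature (column) across the mini-batch (rows).

    `batch` is a list of samples, each a list of feature values.
    `eps` is an assumed small constant that avoids division by zero.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        # Variance over the mini-batch (divide by n, not n - 1).
        var = sum((v - mean) ** 2 for v in column) / n
        sd = math.sqrt(var + eps)
        for i in range(n):
            # x' = (x - mean) / sd
            out[i][j] = (batch[i][j] - mean) / sd
    return out


# A mini-batch of 4 samples with 2 features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_standardize(batch)
```

After the call, each column of `normalized` has a mean of approximately 0 and a variance of approximately 1, even though the two input features started on different scales.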