Ben-Gurion - Deep Learning Quiz


You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

tanh becomes flat for large input values, so its gradient there is close to zero. This slows down the optimization algorithm.
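The saturation effect can be illustrated with a short sketch. The layer sizes (`n_in`, `n_hidden`) and the small comparison scale of 0.01 are illustrative assumptions; only the `* 1000` scale comes from the question. With the large initialization, the pre-activations are huge, tanh saturates at ±1, and its derivative 1 - tanh(z)^2 collapses to roughly zero:

```python
import numpy as np

np.random.seed(0)
n_in, n_hidden = 100, 50  # illustrative layer sizes
x = np.random.randn(n_in)

# Large-scale initialization, as in the question
W_large = np.random.randn(n_hidden, n_in) * 1000
# Modest initialization for comparison (scale chosen for illustration)
W_small = np.random.randn(n_hidden, n_in) * 0.01

# tanh'(z) = 1 - tanh(z)^2, which is near zero once |z| is large (saturation)
grad_large = 1 - np.tanh(W_large @ x) ** 2
grad_small = 1 - np.tanh(W_small @ x) ** 2

print("mean tanh gradient, large init:", grad_large.mean())
print("mean tanh gradient, small init:", grad_small.mean())
```

With the large initialization the mean gradient is essentially zero, so backpropagated updates through these units are tiny and learning stalls; with the modest initialization the gradient stays close to 1.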

* Question added on: 22-06-2018