If the first few iterations of gradient descent increase the loss rather than decrease it, the most likely cause is that the learning rate has been set too large.
(Moed B, 2022)
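The claim can be illustrated with a minimal sketch (not from the source): gradient descent on f(x) = x², whose gradient is 2x. The update x ← x − lr·2x scales x by (1 − 2·lr), so for lr > 1 the factor has magnitude greater than 1 and the loss grows every step, while a small lr shrinks it.

```python
def gd_losses(lr, x0=1.0, steps=5):
    """Run gradient descent on f(x) = x**2 and record the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        x -= lr * 2 * x          # gradient step: f'(x) = 2x
        losses.append(x * x)     # loss after the update
    return losses

diverging = gd_losses(lr=1.1)    # too large: loss increases every iteration
converging = gd_losses(lr=0.1)   # small enough: loss decreases every iteration
```

With lr = 1.1 each update multiplies x by −1.2, so the loss grows by a factor of 1.44 per step; with lr = 0.1 it shrinks by a factor of 0.64 per step.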
* Question added on: 14-07-2025