Discuss, Learn and be Happy: Question Discussion


Suppose batch gradient descent in a deep network is taking excessively long to find a value of the parameters that achieves a small value for the cost function J(W[1], b[1], ..., W[L], b[L]). Which of the following techniques could help find parameter values that attain a small value for J? (Check all that apply)

If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?

Note: Try random values rather than a grid search, because you don't know in advance which hyperparameters matter more than others.
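
The point of the note can be illustrated with a minimal sketch, assuming two hyperparameters that each range over the unit interval (the function names and ranges here are hypothetical, chosen only for illustration). With the same budget of 25 trials, a 5 x 5 grid tests only 5 distinct values of each hyperparameter, while random sampling tests 25.

```python
import random

def grid_points(n_per_axis):
    """Grid over two hyperparameters: n_per_axis evenly spaced values per axis."""
    values = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    return [(a, b) for a in values for b in values]

def random_points(n_total, seed=0):
    """Random search: sample each hyperparameter independently and uniformly."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_total)]

grid = grid_points(5)      # 25 trials
rand = random_points(25)   # 25 trials

# Count how many distinct values of the first hyperparameter each strategy tried.
distinct_grid = len({a for a, _ in grid})
distinct_rand = len({a for a, _ in rand})
print(distinct_grid, distinct_rand)
```

If only the first hyperparameter turns out to matter, the grid has effectively spent 25 trials to explore 5 settings of it, whereas the random samples explored 25.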

Every hyperparameter, if set poorly, can have a huge negative impact on training, and so all hyperparameters are about equally important to tune well. True or False?

During hyperparameter search, whether you try to babysit one model (“Panda” strategy) or train a lot of models in parallel (“Caviar”) is largely determined by:

Which of these statements about deep learning programming frameworks are true? (Check all that apply)

In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
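
The effect described in this statement can be sketched with a small example. The token probabilities below are invented for illustration, and the `alpha` exponent is an assumed parameter of this sketch, not something taken from the question. Because every token probability is below 1, each extra token lowers the summed log-probability, so an unnormalized score favors the shorter candidate; dividing by the length removes that bias.

```python
import math

def log_prob(token_probs):
    """Unnormalized score: sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)

def normalized_log_prob(token_probs, alpha=1.0):
    """Length-normalized score: divide by (number of tokens) ** alpha.
    alpha is an assumed softening exponent; alpha=1.0 is a plain average."""
    return log_prob(token_probs) / (len(token_probs) ** alpha)

# Hypothetical candidates: a short one with mediocre tokens,
# and a longer one whose individual tokens are each more likely.
short_candidate = [0.5, 0.5]
long_candidate = [0.7] * 6

print(log_prob(short_candidate), log_prob(long_candidate))
print(normalized_log_prob(short_candidate), normalized_log_prob(long_candidate))
```

In this sketch the raw score ranks the short candidate higher, while the normalized score ranks the long candidate higher.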

Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000 dimensional, so as to capture the full range of variation and meaning in those words.

What is t-SNE?

To which of these tasks would you apply a many-to-one RNN architecture?
