Discuss, Learn and be Happy: Discussion of Questions

You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

Note: The output of a sigmoid can be naturally interpreted as a probability. Sigmoid outputs a value between 0 and 1, which makes it a very good choice for binary classification: classify as 0 if the output is less than 0.5 and as 1 if it is greater than 0.5. This could also be done with tanh, but it is less convenient because the output lies between -1 and 1.
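
A minimal sketch of this thresholding rule, using hypothetical pre-activation values for the output unit:

import numpy as np

def sigmoid(z):
    # Squash any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical output-layer pre-activations for a batch of 4 examples.
z_out = np.array([-2.3, 0.4, 1.7, -0.1])

probs = sigmoid(z_out)              # interpretable as P(y = 1 | x)
preds = (probs > 0.5).astype(int)   # 1 = cucumber, 0 = watermelon

print(probs)   # roughly [0.09 0.60 0.85 0.48]
print(preds)   # [0 1 1 0]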

Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are True? (Check all that apply)
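
A minimal sketch of the "symmetry" problem (hypothetical data, one hidden layer with sigmoid units): with all weights and biases initialized to zero, every hidden unit computes the same value and receives the same gradient, so the rows of W1 stay identical no matter how many gradient steps you take:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))        # 3 features, 5 examples (hypothetical data)
Y = rng.integers(0, 2, size=(1, 5))    # binary labels
m = X.shape[1]

# All-zero initialization for a 3 -> 4 -> 1 network with sigmoid hidden units.
W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

for _ in range(100):
    A1 = sigmoid(W1 @ X + b1)           # forward pass
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y                        # backward pass (cross-entropy loss)
    dW2, db2 = dZ2 @ A1.T / m, dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1, db1 = dZ1 @ X.T / m, dZ1.sum(axis=1, keepdims=True) / m
    W1, b1, W2, b2 = W1 - 0.5 * dW1, b1 - 0.5 * db1, W2 - 0.5 * dW2, b2 - 0.5 * db2

print(W1)   # every row is identical: the hidden units never differentiate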

Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?

Logistic regression doesn't have a hidden layer. If you initialize the weights to zero, the first example x fed into logistic regression will produce an output of zero, but the derivatives of logistic regression depend directly on the input x (because there is no hidden layer), and x is not zero. So at the second iteration, the weight values follow x's distribution and differ from each other, as long as x is not a constant vector.
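
The gradient of logistic regression depends directly on the input x, so even an all-zero start produces different updates for different weights after the first step. A minimal sketch with one hypothetical training example:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 1))    # one example with 4 features (hypothetical data)
y = 1

w = np.zeros((4, 1))               # all-zero initialization
b = 0.0

# One step of gradient descent.
a = sigmoid(w.T @ x + b)           # prediction (exactly 0.5 with zero weights)
dz = a - y                         # derivative of the cross-entropy loss w.r.t. z
dw = x * dz                        # the gradient is proportional to x itself
w -= 0.1 * dw

print(w.ravel())                   # the weights already differ from one another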

You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

tanh saturates (becomes flat) for large values, so its gradient is close to zero. This slows down the optimization algorithm.
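
A quick illustration of the saturation: the derivative of tanh is 1 - tanh(z)^2, which collapses toward zero once |z| is large, and multiplying the weights by 1000 makes the pre-activations huge almost surely:

import numpy as np

# Gradient of tanh vanishes once |z| is large.
for z in [0.5, 2.0, 10.0, 100.0]:
    grad = 1.0 - np.tanh(z) ** 2
    print(f"z = {z:6.1f}  ->  tanh'(z) = {grad:.2e}")

# With weights scaled by 1000, the pre-activations W @ x are enormous,
# so nearly every hidden unit starts out in the flat (saturated) region.
W = np.random.randn(4, 3) * 1000
x = np.random.randn(3, 1)
print(np.tanh(W @ x))   # values pinned to -1 or +1, gradients close to 0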

What is the "cache" used for in our implementation of forward propagation and backward propagation?

the "cache" records values from the forward propagation units and sends it to the backward propagation units because it is needed to compute the chain rule derivatives.

Which of the following statements is true?

Note: You can check the lecture videos. I think Andrew used a CNN example to explain this.

Vectorization allows you to compute forward propagation in an L-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers l=1, 2, …,L. True/False?

Note: We cannot avoid the explicit for-loop over the layers: each layer's computation takes the previous layer's activations as input, so the layers must be processed in sequence. Vectorization removes the loop over the training examples, not the loop over the layers.
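
A minimal sketch of L-layer forward propagation (hypothetical parameter dictionary keyed as W1, b1, ..., WL, bL): each matrix product is vectorized across all training examples at once, but the loop over the layers remains:

import numpy as np

def l_layer_forward(X, params, L):
    # ReLU for the hidden layers, sigmoid for the output layer.
    A = X
    for l in range(1, L + 1):                         # explicit loop over layers: unavoidable
        Z = params[f"W{l}"] @ A + params[f"b{l}"]     # vectorized over all examples
        A = 1 / (1 + np.exp(-Z)) if l == L else np.maximum(0, Z)
    return A

# Hypothetical 2-layer example: 3 -> 4 -> 1, 10 examples processed in one pass.
rng = np.random.default_rng(0)
params = {"W1": rng.standard_normal((4, 3)) * 0.01, "b1": np.zeros((4, 1)),
          "W2": rng.standard_normal((1, 4)) * 0.01, "b2": np.zeros((1, 1))}
X = rng.standard_normal((3, 10))
print(l_layer_forward(X, params, L=2).shape)          # (1, 10)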

During forward propagation, in the forward function for a layer l you need to know what is the activation function in a layer (Sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what is the activation function for layer l, since the gradient depends on it. True/False?

During backpropagation, you need to know which activation function was used in forward propagation in order to compute the correct derivative.
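
A minimal sketch of an activation-aware backward step, assuming the pre-activation Z was cached during forward propagation:

import numpy as np

def activation_backward(dA, Z, activation):
    # Convert dL/dA into dL/dZ; the formula depends on which activation was used forward.
    if activation == "relu":
        return dA * (Z > 0)
    if activation == "sigmoid":
        s = 1 / (1 + np.exp(-Z))
        return dA * s * (1 - s)
    if activation == "tanh":
        return dA * (1 - np.tanh(Z) ** 2)
    raise ValueError(f"unknown activation: {activation}")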

There are certain functions with the following properties: (i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) To compute it using a deep network circuit, you need only an exponentially smaller network. True/False?
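
The classic example here is the parity (XOR) of n input bits: a deep circuit computes it with a tree of n - 1 pairwise XOR gates and depth of about log2(n), whereas a circuit restricted to small depth needs a number of gates that grows exponentially with n. A rough sketch of the deep version, assuming pairwise XOR gates:

def deep_xor(bits):
    # Combine pairs layer by layer: depth ~ log2(n), n - 1 gates in total.
    gates = 0
    layer = list(bits)
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(layer[i] ^ layer[i + 1])
            gates += 1
        if len(layer) % 2:              # carry an unpaired element to the next layer
            nxt.append(layer[-1])
        layer = nxt
    return layer[0], gates

value, gates = deep_xor([1, 0, 1, 1, 0, 0, 1, 0])
print(value, gates)                     # parity 0, using only 7 gates for n = 8
# A depth-2 (shallow) circuit for the same function needs on the order of
# 2**(n - 1) terms, which is exponentially larger.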

Which of these statements about mini-batch gradient descent do you agree with?

Note: Vectorization does not let you compute several mini-batches at the same time; it only speeds up the computation within a single mini-batch.
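
A minimal sketch of what vectorization does and does not buy you here (hypothetical shapes): the examples inside one mini-batch are processed with a single vectorized matrix product, but the mini-batches themselves are still visited one after another in a loop:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 1000))                # 3 features, 1000 examples
W, b = rng.standard_normal((1, 3)) * 0.01, np.zeros((1, 1))

batch_size = 64
for start in range(0, X.shape[1], batch_size):    # sequential loop over mini-batches
    X_batch = X[:, start:start + batch_size]
    Z = W @ X_batch + b                           # vectorized over the examples
    A = 1 / (1 + np.exp(-Z))                      # ... of this one mini-batch only
    # (the gradient step for this mini-batch would go here)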