If alpha is small enough, gradient descent should always take a small step downhill and decrease f(theta0, theta1) at least a little bit. If gradient descent instead increases the objective value, that means alpha is too large (or you have a bug in your code!)
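
To make this concrete, here is a minimal sketch that runs gradient descent with two different values of alpha and records the objective after every step. The objective f(theta0, theta1) = theta0^2 + theta1^2 is only an assumed stand-in chosen because its gradient is easy to write down, and the helper names (`run_gradient_descent`, `is_strictly_decreasing`), the starting point, and the two alpha values are hypothetical choices for illustration.

```python
def f(theta0, theta1):
    # Assumed toy objective: a simple bowl with its minimum at (0, 0).
    return theta0 ** 2 + theta1 ** 2


def grad_f(theta0, theta1):
    # Partial derivatives of the toy objective above.
    return 2.0 * theta0, 2.0 * theta1


def run_gradient_descent(alpha, steps=5, start=(1.0, 1.0)):
    """Run a few gradient descent steps and return the objective after each one."""
    theta0, theta1 = start
    history = [f(theta0, theta1)]
    for _ in range(steps):
        g0, g1 = grad_f(theta0, theta1)
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1
        history.append(f(theta0, theta1))
    return history


def is_strictly_decreasing(values):
    # True if every value is smaller than the one before it.
    return all(b < a for a, b in zip(values, values[1:]))


small = run_gradient_descent(alpha=0.1)  # small step size
large = run_gradient_descent(alpha=1.5)  # step size too large for this objective

print("alpha=0.1:", small, "decreasing:", is_strictly_decreasing(small))
print("alpha=1.5:", large, "decreasing:", is_strictly_decreasing(large))
```

With the smaller alpha the recorded objective shrinks on every step, while with the larger alpha each step overshoots the minimum and the objective grows. Logging the objective per iteration like this is a simple way to tell the two cases apart; if the objective still rises after alpha has been made very small, a bug in the gradient or the update is the more likely cause.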