Mitchell, Ch. 4

  1. What are the characteristics of a neuron?

    A biological entity that gathers different signals from other neurons, has the ability to process those signals into new signals, and transmits them to still other neurons.

  2. What are the characteristics of a synapse?

    It is the biological entity through which one neuron passes its signal to one or more other neurons.

  3. What are the characteristics of a node?

    It gathers numerical inputs and transforms them into a single numerical output.

  4. Explain the term linearly separable.

    A binary classification problem is linearly separable if there is a linear function f(x) = a0 + a1x1 + ... + anxn such that x = (x1, ..., xn) is in class 1 if and only if f(x) >= 0, and otherwise x is in class 2.

  5. The weight vector w = (w0, w1, w2, w3, w4) = (-5,3,2,7, 6) determines a hyperplane that divides 4-space into two portions. Give a mathematical description of those two portions.

    The first portion consists of those x = (x1, x2, x3, x4) that satisfy -5 + 3x1 + 2x2 + 7x3 + 6x4 >= 0, and the second of those that satisfy -5 + 3x1 + 2x2 + 7x3 + 6x4 < 0.
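    A minimal sketch of this half-space test in Python (the function name and the sample points are illustrative choices, not from the text):

    ```python
    def hyperplane_side(w, x):
        """Return 1 if x lies in the non-negative half-space of the hyperplane
        defined by weight vector w (w[0] is the bias term w0), else 2."""
        net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        return 1 if net >= 0 else 2

    w = (-5, 3, 2, 7, 6)
    hyperplane_side(w, (1, 1, 0, 0))  # -5 + 3 + 2 = 0 >= 0, so portion 1
    hyperplane_side(w, (0, 0, 0, 0))  # -5 < 0, so portion 2
    ```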

  6. Know to explain and apply the perceptron training rule.

    Δwi = η(td − od)xid
    For training example d, td is the target output, od the perceptron's actual output, xid the i-th input, and η the learning rate. Each weight is updated as wi ← wi + Δwi, and the rule is applied to the training examples repeatedly until the perceptron classifies all of them correctly.
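    The rule can be sketched as follows (a minimal implementation assuming 0/1 outputs and a constant input of 1 for the bias weight; the AND data set is an illustrative example, not from the text):

    ```python
    def perceptron_output(w, x):
        """Step-function output: 1 if w·(1, x) >= 0, else 0 (w[0] is the bias)."""
        net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        return 1 if net >= 0 else 0

    def perceptron_train(examples, eta=0.1, epochs=100):
        """Apply the perceptron rule w_i <- w_i + eta*(t - o)*x_i for each
        misclassified example until all examples are classified correctly
        (or the epoch budget runs out)."""
        w = [0.0] * (len(examples[0][0]) + 1)
        for _ in range(epochs):
            errors = 0
            for x, t in examples:
                o = perceptron_output(w, x)
                if o != t:
                    errors += 1
                    xs = (1.0,) + tuple(x)  # constant input 1 for the bias weight
                    w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
            if errors == 0:
                break
        return w

    # AND is linearly separable, so the rule converges to an exact hypothesis.
    and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = perceptron_train(and_data)
    ```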

  7. If vectors w and x respectively define the weights and inputs of a neural node, describe its net input.

    The net input is w·x + b, where b is a number (the bias) and w·x is the dot product of w and x.

  8. In terms of the net input, what is the transfer function of a neural node?

    The function whose input is the net input and whose output is the output of the node.

  9. What is the transfer function of a perceptron?

    The step (threshold) function, which has value 0 for x < 0 and value 1 for all other x. (Mitchell writes this as the sign function sgn, whose values are -1 and +1.)

  10. To what sort of problems might one apply gradient descent?

    Optimization problems where the function to be optimized is differentiable.

  11. Define the concept of sum-squared error.

    If oi is the estimate of ti, then the sum-squared error is
    E = Σi (ti − oi)²
    where the sum runs over all training examples. (Mitchell includes a convenience factor of ½, writing E = ½ Σd (td − od)², so that the 2 produced by differentiation cancels.)
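    A one-line sketch of this sum (the function name and sample values are illustrative):

    ```python
    def sum_squared_error(targets, outputs):
        """Sum-squared error: sum_i (t_i - o_i)**2 over paired targets/outputs."""
        return sum((t - o) ** 2 for t, o in zip(targets, outputs))

    sum_squared_error([1, 0, 1], [0.8, 0.2, 0.6])  # 0.04 + 0.04 + 0.16 ≈ 0.24
    ```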

  12. If E=(17 -3w1 +5w2)2, calculate the gradient of E when w1= 7 and w2= -2.

    ∂E/∂w1 = 2(17 − 3w1 + 5w2)(−3) = 2(17 − 3(7) + 5(−2))(−3) = 2(−14)(−3) = 84,
    ∂E/∂w2 = 2(17 − 3w1 + 5w2)(5) = 2(17 − 3(7) + 5(−2))(5) = 2(−14)(5) = −140,
    so the gradient of E is (84, −140).
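    The hand-computed gradient can be checked numerically with central differences (a standard sanity check, not from the text):

    ```python
    def E(w1, w2):
        return (17 - 3 * w1 + 5 * w2) ** 2

    def numeric_grad(f, w1, w2, h=1e-6):
        """Central-difference approximation of the gradient of f at (w1, w2)."""
        dw1 = (f(w1 + h, w2) - f(w1 - h, w2)) / (2 * h)
        dw2 = (f(w1, w2 + h) - f(w1, w2 - h)) / (2 * h)
        return dw1, dw2

    numeric_grad(E, 7, -2)  # approximately (84, -140)
    ```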

  13. Describe when the perceptron rule converges and to what.

    The perceptron rule converges in a finite number of iterations to an exact hypothesis (one that classifies every training example correctly), provided the training data is linearly separable and the learning rate η is sufficiently small.

  14. Describe when the Gradient Descent algorithm (also known as Widrow-Hoff rule) converges and to what.

    The Widrow-Hoff rule converges, possibly only asymptotically (in unbounded time), to the hypothesis with minimum squared error, and does so whether or not the data is linearly separable, provided the learning rate is sufficiently small. It therefore tolerates noisy training data, but correspondingly may fail to classify every training example correctly even when an exact hypothesis exists.
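    A minimal sketch of incremental gradient descent (the delta / Widrow-Hoff rule) for a single linear unit; the line-fitting data set is an illustrative choice, not from the text:

    ```python
    def delta_rule_train(examples, eta=0.05, epochs=2000):
        """Delta rule for a linear unit: w_i <- w_i + eta*(t - o)*x_i,
        with linear output o = w·(1, x). Unlike the perceptron rule, it
        moves toward the minimum squared-error weights whether or not
        the data is linearly separable."""
        w = [0.0] * (len(examples[0][0]) + 1)
        for _ in range(epochs):
            for x, t in examples:
                xs = (1.0,) + tuple(x)                       # constant bias input
                o = sum(wi * xi for wi, xi in zip(w, xs))    # linear output (no threshold)
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
        return w

    # Fit the line t = 2x + 1 from four exact samples
    data = [((x,), 2 * x + 1) for x in [0, 1, 2, 3]]
    w = delta_rule_train(data)  # w approaches [1.0, 2.0]
    ```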

  15. Give the mathematical formula for the logistic function σ.

    σ(y) = 1/(1 + e^(−y)).

  16. Give a formula that describes the derivative of σ in terms of σ.

    σ′(y) = σ(y)(1 − σ(y)).
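    Both formulas (questions 15 and 16) can be checked in a few lines:

    ```python
    import math

    def sigmoid(y):
        """Logistic function: sigma(y) = 1 / (1 + e^(-y))."""
        return 1.0 / (1.0 + math.exp(-y))

    def sigmoid_prime(y):
        """Derivative expressed in terms of sigma: sigma(y) * (1 - sigma(y))."""
        s = sigmoid(y)
        return s * (1.0 - s)

    sigmoid(0)        # 0.5
    sigmoid_prime(0)  # 0.25
    ```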

  17. What is a feedforward artificial NN?

    An ANN in which every input node feeds its values only to the first hidden layer, nodes in all but the last hidden layer feed their outputs only to the next hidden layer, and the last hidden layer feeds its values only to the output layer.
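    A sketch of one forward pass through such a network, using sigmoid nodes as in question 18 (the 2-2-1 topology and the weights are arbitrary illustrative values, not from the text):

    ```python
    import math

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    def layer(weights, inputs):
        """One sigmoid layer. weights has one row per node; each row is
        [bias, w1, ..., wn]."""
        return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
                for row in weights]

    def feedforward(layers, x):
        """Propagate x through each layer in order; no connection skips a layer."""
        for weights in layers:
            x = layer(weights, x)
        return x

    # Hypothetical 2-2-1 network: 2 inputs, 2 hidden nodes, 1 output node
    net = [
        [[0.1, 0.4, -0.2], [-0.3, 0.25, 0.5]],  # hidden layer
        [[0.2, 1.0, -1.0]],                     # output layer
    ]
    out = feedforward(net, [1.0, 0.5])  # a single value in (0, 1)
    ```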

  18. What is a backpropagation network?

    A backpropagation network is a feedforward network that uses sigmoid nodes in its output and hidden layers and propagates back the error (to update its weights) during training.

  19. Discuss the relationships between network topology, inductive bias, and network generalization.

    Choosing a very powerful network topology, with many hidden layers and a large number of hidden nodes, creates a weak inductive bias, since the hypothesis space is then likely to contain many hypotheses that fit the training data. This tends to weaken the ability of the trained net to generalize to unseen data.

  20. Describe 1-of-n output coding.

    In a classification problem with n classes, one maintains a separate output variable for each class. For any input, exactly one of these variables has the value 1, and all others have the value 0.
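    A one-line sketch of this coding (the function name and labels are illustrative):

    ```python
    def one_of_n(label, classes):
        """1-of-n (one-hot) coding: one output variable per class; exactly one
        is 1 for a given label, the rest are 0."""
        return [1 if c == label else 0 for c in classes]

    one_of_n("cat", ["dog", "cat", "bird"])  # [0, 1, 0]
    ```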