This should look familiar: we are just replacing limits, expectations, etc. with their empirical counterparts.
For example, the estimator with a box-shaped weight function can be written as
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n w \left(\frac{x-X_i}{h}\right)
w(x) = \left\{ \begin{array}{lr} 1/2 & \text{if } |x| < 1 \\ 0 & \text{otherwise} \end{array}\right.
In general you can write a kernel smoother as:
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x-X_i}{h}\right)
where \int K(x) dx =1 (this guarantees that \int \hat{f}(x) dx = 1 ) and h is the bandwidth.
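The estimator above is short enough to write out directly. Here is a minimal sketch in plain Python; the function names, the toy sample, and the bandwidth value are made up for illustration and are not part of the notes.

```python
import math

def box_kernel(u):
    """Box weight function w(u): 1/2 on |u| < 1, 0 elsewhere (integrates to 1)."""
    return 0.5 if abs(u) < 1 else 0.0

def gaussian_kernel(u):
    """Standard normal density, another kernel that integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

# Hypothetical toy sample, chosen only for illustration.
sample = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]

# With the box kernel and h = 0.5, only the points within 0.5 of x contribute.
density_at_zero = kde(0.0, sample, h=0.5, kernel=box_kernel)
```

Swapping `box_kernel` for `gaussian_kernel` changes only how the estimate is smoothed; since both kernels integrate to 1, so does the resulting estimate.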
We often care about things like MSE:
MSE(x) = \mathbb{E}\left[\left(\hat{f}(x) - f(x)\right)^2\right]
=\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2 + {\rm Var}(\hat{f}(x))
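The bias/variance split follows from adding and subtracting \mathbb{E}[\hat{f}(x)] inside the square. A short sketch of the step, using only the quantities defined above:

```latex
\begin{aligned}
MSE(x) &= \mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)] + \mathbb{E}[\hat{f}(x)] - f(x)\right)^2\right] \\
&= \mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]
 + 2\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)\mathbb{E}\left[\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right]
 + \left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2 \\
&= {\rm Var}(\hat{f}(x)) + \left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2
\end{aligned}
```

The cross term vanishes because \mathbb{E}[\hat{f}(x) - \mathbb{E}[\hat{f}(x)]] = 0 and f(x) is not random.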
E_{\hat{F}}[Y|X=x_0] = {\rm a.v.e.} \{ y_i; x_i = x_0\}
If the values of x_i are categorical we can estimate this directly.
If not, we need to "borrow strength" from nearby observations.
You've seen this before for linear regression.
Define \{W_i(x)\}_{i=1}^{n} for each x and let
s(x) = \sum_{i=1}^n W_i(x) y_i
E[ Y | X = x ] = \int y f_{X,Y}(x,y) \, dy / f_X(x)
s(x) = \frac{ n^{-1}\sum_{i=1}^n K\left( \frac{x - x_i}{h} \right) y_i } { n^{-1}\sum_{i=1}^n K\left ( \frac{ x - x_i }{h} \right)}
Again we are basically just taking integrals and replacing them with sums. Noticing a theme here? Write down the theoretical parameter you are trying to estimate and then substitute empirical analogs.
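The kernel-weighted average s(x) can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian kernel; the function names and the error handling are my own choices and are not taken from the notes.

```python
import math

def gaussian_kernel(u):
    """Standard normal density used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_smoother(x, xs, ys, h, kernel=gaussian_kernel):
    """Kernel-weighted average s(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)."""
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights vanished (x is far from every x_i relative to h).
        raise ValueError("no kernel weight at x; try a larger bandwidth h")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Two observations placed symmetrically around x = 0 get equal weight,
# so the smoother returns their plain average.
midpoint_value = kernel_smoother(0.0, [-1.0, 1.0], [0.0, 2.0], h=1.0)
```

The n^{-1} factors in the numerator and denominator cancel, which is why they do not appear in the code.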
Bias(x) = \int K(z) (f(x-hz) - f(x))dz
Var(x) = n^{-1} \int \frac{1}{h^2} K\left(\frac{x-y}{h}\right)^2 f(y)dy - n^{-1} \left(\int \frac{1}{h}K\left(\frac{x-y}{h}\right)f(y)dy \right)^2
Assume h = h_n \rightarrow 0 with nh_n \rightarrow \infty . If this is true then both the bias and the variance go to zero as n\rightarrow \infty .
You can asymptotically minimize MSE(x) with respect to h by solving \frac{\partial}{\partial h} MSE(x) = 0 .
You get something like this:
h_{opt} = n^{-1/5} \left(\frac{f(x)\int K^2(z)dz}{f''(x)^2 \left(\int z^2 K(z)dz\right)^2}\right)^{1/5}
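Here is a sketch of where the n^{-1/5} rate comes from. It rests on assumptions that are not stated explicitly above: K is symmetric (so \int z K(z)dz = 0 ), f is twice continuously differentiable with f''(x) \neq 0 , and h \rightarrow 0 with nh \rightarrow \infty . Expanding f(x - hz) around x in the bias integral and keeping only the leading terms gives

```latex
\begin{aligned}
Bias(x) &\approx \frac{h^2}{2} f''(x) \int z^2 K(z)dz \\
Var(x) &\approx \frac{1}{nh} f(x) \int K^2(z)dz \\
MSE(x) &\approx \frac{h^4}{4} f''(x)^2 \left(\int z^2 K(z)dz\right)^2 + \frac{1}{nh} f(x) \int K^2(z)dz \\
\frac{\partial}{\partial h} MSE(x) &\approx h^3 f''(x)^2 \left(\int z^2 K(z)dz\right)^2 - \frac{1}{nh^2} f(x) \int K^2(z)dz = 0 \\
h^5 &= \frac{f(x)\int K^2(z)dz}{n f''(x)^2 \left(\int z^2 K(z)dz\right)^2}
\end{aligned}
```

Taking the fifth root recovers the expression for h_{opt} . The squared bias grows like h^4 while the variance shrinks like 1/(nh) , and balancing the two is what produces the n^{-1/5} rate.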
X_1, \ldots, X_n \sim f(x_1,\ldots,x_d)
We can estimate a multivariate smoother
\hat{f}(x) = \frac{1}{nh^d} \sum_{i=1}^n K\left(\frac{x-X_i}{h}\right)
where the kernel K(\cdot) is now a function on a d-dimensional vector satisfying
K(u) \geq 0 , \int_{\mathbb{R}^d} K(u)du = 1 , \int_{\mathbb{R}^d}uK(u)du = 0 and \int_{\mathbb{R}^d} uu^T K(u)du = I_d
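One kernel that meets all four conditions is the standard d-dimensional Gaussian density. A minimal sketch of the multivariate estimator with that kernel follows; the choice of kernel and the function names are assumptions made for illustration.

```python
import math

def gaussian_kernel_d(u):
    """Standard d-dimensional normal density: nonnegative, integrates to 1,
    has mean zero and identity second-moment matrix."""
    d = len(u)
    sq_norm = sum(c * c for c in u)
    return math.exp(-0.5 * sq_norm) / (2 * math.pi) ** (d / 2)

def kde_multivariate(x, data, h):
    """f_hat(x) = (1 / (n h^d)) * sum_i K((x - X_i) / h) for d-dimensional points."""
    n = len(data)
    d = len(x)
    total = 0.0
    for xi in data:
        # Scaled difference vector (x - X_i) / h.
        u = [(a - b) / h for a, b in zip(x, xi)]
        total += gaussian_kernel_d(u)
    return total / (n * h ** d)

# A single observation at the origin in d = 2: the estimate at the origin
# is the kernel's peak value 1 / (2 pi), rescaled by 1 / h^2.
peak = kde_multivariate([0.0, 0.0], [[0.0, 0.0]], h=1.0)
```

The h^d in the denominator is what keeps the estimate integrating to 1 after the d coordinates are each rescaled by h.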
If we consider spike firing as a stochastic process, we can also think of the firing rate as the probability density of finding a spike at a certain instant of time. In this picture, the firing rate is the rate of the underlying Poisson process that generates the spikes; cf. Section 5.2.3. Stochastic rate models are therefore on the borderline between analog rate models and noisy spiking neuron models. The main difference is that stochastic spiking neuron models such as the Spike Response Model with escape noise (cf. Section ) allow us to include refractoriness, whereas a Poisson model does not ( ).
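To make the Poisson picture concrete, here is a minimal sketch that draws spikes in small time steps, emitting a spike in each step with probability equal to the rate times the step length. The function name, the constant 20 Hz rate, and the step size are hypothetical choices for illustration, and the per-step approximation is only reasonable while that probability is much smaller than 1.

```python
import random

def poisson_spike_train(rate, t_max, dt, rng):
    """Approximate an inhomogeneous Poisson process on [0, t_max).

    In each step of length dt a spike is emitted with probability rate(t) * dt,
    which is a good approximation only while rate(t) * dt << 1.
    """
    spikes = []
    n_steps = int(round(t_max / dt))
    for k in range(n_steps):
        t = k * dt
        p = rate(t) * dt
        if not 0.0 <= p <= 1.0:
            raise ValueError("rate(t) * dt must lie in [0, 1]")
        if rng.random() < p:
            spikes.append(t)
    return spikes

# Hypothetical example: constant rate of 20 spikes per second for 100 seconds.
rng = random.Random(0)  # fixed seed so the run is reproducible
spikes = poisson_spike_train(lambda t: 20.0, t_max=100.0, dt=0.001, rng=rng)
```

Because each step is drawn independently of the past, this process has no refractoriness, which is exactly the limitation of the Poisson model noted above.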
A stochastic rate model in continuous time is defined by an inhomogeneous Poisson process. Spikes are formal events characterized by their firing time t_j^{(f)} , where j is the index of the neuron and f counts the spikes. At each moment of time, spikes are generated with a time-dependent rate which depends on the input. It is no longer possible to calculate the input from a rate equation as in Eq.( ), since the input now consists of spikes, which are point events in time. We set
In order to illustrate the relation with the deterministic rate model of Eq.( ), we discretize time in steps of length \Delta t equal to the inverse of the maximum firing rate. In each time step the stochastic neuron is either active ( S_i = +1 ) or quiescent ( S_i = 0 ). The two states are taken stochastically with a probability which depends continuously upon the input h_i . The probability that a neuron is active at time t + \Delta t given an input h_i at time t is
Closely related to the stochastic point of view is the notion of the rate as the average activity of a population of equivalent neurons. "Equivalent" means that all neurons have identical connectivity and receive the same type of input. Noise, however, is considered to be independent for each pair of neurons so that their response to the input can be different. We have seen in Section 1.5 that we can define a "rate" if we take a short time window \Delta t , count the number of spikes (summed over all neurons in the group) that occur in an interval t \ldots t + \Delta t , and divide by the number of neurons and \Delta t . In the limit of N \rightarrow \infty and \Delta t \rightarrow 0 (in this order), the activity A is an analog variable which varies in continuous time,
Let us assume that we have several groups of neurons. Each group l contains a large number of neurons and can be described by its activity A_l . A simple phenomenological model for the interaction between different groups is
We will see later in Chapter , that Eq.( ) is indeed a correct description of the fixed point of interacting populations of neurons, that is, if all activity values A_k are, apart from fluctuations, constant. As mentioned in Chapter 1.4, the interpretation of the rate as a population activity is not without problems. There are hardly any ensembles that are large enough to allow sensible averaging and, at the same time, consist of neurons which are strictly equivalent in the sense that the internal parameters and the input are identical for all the neurons belonging to the same ensemble. On the other hand, neurons in the cortex are often arranged in groups (columns) that deal with roughly the same type of signal and have similar response properties. We will come back to the interpretation of Eq.(5.137) as a population activity in Chapter 6.
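The counting definition of the population activity given earlier (count the spikes of all neurons in a short window, then divide by the number of neurons and the window length) can be sketched at finite N and finite window length as follows. The function name and the toy spike times are hypothetical and chosen only for illustration.

```python
def population_activity(spike_trains, t, dt):
    """Empirical population activity in the window [t, t + dt).

    spike_trains is a list with one list of spike times per neuron.
    The result is (number of spikes in the window) / (N * dt).
    """
    n_neurons = len(spike_trains)
    count = sum(1 for train in spike_trains for s in train if t <= s < t + dt)
    return count / (n_neurons * dt)

# Hypothetical spike times (in seconds) for a group of three neurons.
trains = [[0.01, 0.5], [0.02], []]
activity = population_activity(trains, t=0.0, dt=0.1)  # 2 spikes / (3 * 0.1 s)
```

With only a handful of neurons the estimate is dominated by fluctuations, which illustrates the averaging problem discussed above.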