BU-EC526-Note
Finite Differences
$$f(x)=\sum_i a_ix^i$$
Taylor expansion at $x=x_0$ ($n=0,1,2,\dots$):
$$ f(x)=\sum_{n=0}^{\infty}\frac{(x-x_0)^n}{n!}\frac{d^n}{dx^n}f(x_0)$$
Forward, Backward and Central Difference
$$
\begin{align}
\Delta_hf(x)&=\frac{f(x+h)-f(x)}{h}\\
\widetilde{\Delta}_{h}f(x)&=\frac{f(x)-f(x-h)}{h}\equiv-\Delta_{-h}f(x)\\
\Delta_{2h}f(x)&=\frac{f(x+h)-f(x-h)}{2h}\equiv \frac{1}{2}\left[\Delta_hf(x)+\widetilde\Delta_hf(x)\right]
\end{align}
$$
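As a quick sanity check of these formulas (a minimal sketch of my own; the test function $f=\sin$ and the step sizes are arbitrary choices), the forward and backward errors shrink like $O(h)$ while the central error shrinks like $O(h^2)$:

```cpp
#include <cstdio>
#include <cmath>

double f(double x) { return std::sin(x); }   // exact derivative: cos(x)

// Forward, backward, and central differences of f at x with step h.
double fwd(double (*g)(double), double x, double h) { return (g(x + h) - g(x)) / h; }
double bwd(double (*g)(double), double x, double h) { return (g(x) - g(x - h)) / h; }
double ctr(double (*g)(double), double x, double h) { return (g(x + h) - g(x - h)) / (2 * h); }

int main() {
  double x = 1.0;
  for (double h = 1e-1; h >= 1e-5; h /= 10)
    std::printf("h=%.0e  fwd err=%.2e  bwd err=%.2e  ctr err=%.2e\n", h,
                std::fabs(fwd(f, x, h) - std::cos(x)),
                std::fabs(bwd(f, x, h) - std::cos(x)),
                std::fabs(ctr(f, x, h) - std::cos(x)));
  return 0;
}
```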
Higher Order and Undetermined Coefficients
$$\Delta_{h}x^{N}=\sum_{i=1}^{N}C[N][i]\,x^{N-i}h^{i-1}=Nx^{N-1}+\frac{N(N-1)}{2!}x^{N-2}h+\cdots+h^{N-1}$$
where $C[N][i]=\binom{N}{i}$ are the binomial coefficients.
Computing C[n][i] recursively (Pascal's triangle):
```cpp
C[n][0] = 1;                         // boundary: C(n,0) = 1
C[n][i] = C[n-1][i-1] + C[n-1][i];   // Pascal's rule for 1 <= i <= n
```
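A runnable version of the recursion (a sketch; the table size `NMAX` is my own choice):

```cpp
#include <cstdio>

// Fill C[n][i] = binomial(n, i) using Pascal's triangle.
int main() {
  const int NMAX = 10;
  static int C[NMAX][NMAX] = {};   // zero-initialized: entries with i > n stay 0
  for (int n = 0; n < NMAX; n++) {
    C[n][0] = 1;                               // boundary: C(n,0) = 1
    for (int i = 1; i <= n; i++)
      C[n][i] = C[n-1][i-1] + C[n-1][i];       // Pascal's rule
  }
  for (int i = 0; i <= 4; i++) std::printf("%d ", C[4][i]);  // prints: 1 4 6 4 1
  std::printf("\n");
  return 0;
}
```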
For the cancellation of higher-order $h^n$ terms in central differences, see the homework.
Integrals to Sums
$$\Delta_hy(x)=f(x)\quad\Longrightarrow\quad y(Nh)=\sum_{i=0}^{N-1}f(ih)h+c;$$
See the homework for the different integration methods.
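The left-Riemann form of that sum codes up directly (a sketch; the integrand and $N$ are my own placeholders). Its error is $O(h)$; the higher-order rules referred to above improve on it:

```cpp
#include <cstdio>
#include <cmath>

// Left-Riemann sum y(Nh) = sum_{i=0}^{N-1} f(ih) h, here for f = exp on [0,1].
int main() {
  const int N = 100000;
  const double h = 1.0 / N;
  double sum = 0.0;
  for (int i = 0; i < N; i++)
    sum += std::exp(i * h) * h;
  std::printf("sum = %.8f, exact = %.8f\n", sum, std::exp(1.0) - 1.0);
  return 0;
}
```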
Gaussian Integration
$$\int_{-1}^1f(x)dx\simeq\sum_{i=1}^Nw_if(x_i)$$
where the $x_i$ are the $n$ roots of the Legendre polynomial $P_n(x)$ and the weights $w_i$ are
$$w_i=\frac{2(1-x_i^2)}{[(n+1)P_{n+1}(x_i)]^2}$$
For an integral over $[a,b]$, rescale $[a,b]\rightarrow [-1,1]$:
$$\int_a^bf(x')\,dx'=\frac{(b-a)}{2}\int_{-1}^1f\left(\frac{b-a}{2}x+\frac{b+a}{2}\right)dx$$
The weights $w_i$ can be computed recursively; see the homework.
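A concrete instance: the 3-point rule, with the nodes and weights hardcoded (a sketch of my own; the integrand is a placeholder). It is exact for polynomials up to degree 5:

```cpp
#include <cstdio>
#include <cmath>

// 3-point Gauss-Legendre rule: x_i are the roots of P_3(x) = (5x^3 - 3x)/2,
// with the standard weights 5/9, 8/9, 5/9 from the w_i formula above.
double gauss3(double (*f)(double), double a, double b) {
  const double x[3] = { -std::sqrt(3.0/5.0), 0.0, std::sqrt(3.0/5.0) };
  const double w[3] = { 5.0/9.0, 8.0/9.0, 5.0/9.0 };
  double sum = 0.0;
  for (int i = 0; i < 3; i++)                     // rescale nodes from [-1,1] to [a,b]
    sum += w[i] * f(0.5*(b-a)*x[i] + 0.5*(b+a));
  return 0.5*(b-a) * sum;
}

double f(double x) { return std::sin(x); }

int main() {
  const double PI = std::acos(-1.0);
  std::printf("%.8f (exact: 2)\n", gauss3(f, 0.0, PI));
  return 0;
}
```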
Parallelization Methods
- Threads
- OpenMP
- OpenACC
- MPI
- CUDA
Root Finding
Bisection
Keep an interval $[a,b]$ with $f(a)f(b)<0$ and repeatedly halve it, keeping whichever half still brackets the sign change; each step halves the error.
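A minimal sketch (my own code; the test function $x^2-2$ is a placeholder):

```cpp
#include <cstdio>
#include <cmath>

// Bisection: needs a bracket [a,b] with f(a) f(b) < 0; the bracket width
// halves every iteration, i.e. one bit of accuracy per step.
double bisect(double (*f)(double), double a, double b, double tol) {
  while (b - a > tol) {
    double m = 0.5 * (a + b);
    if (f(a) * f(m) <= 0.0) b = m;   // sign change in [a, m]
    else                    a = m;   // sign change in [m, b]
  }
  return 0.5 * (a + b);
}

double g(double x) { return x*x - 2.0; }

int main() {
  std::printf("%.12f\n", bisect(g, 1.0, 2.0, 1e-12));  // sqrt(2)
  return 0;
}
```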
Newton
$$x\leftarrow x-f(x)/f'(x) \quad\text{until } |f(x)/f'(x)|<\mathrm{Error}$$
Time cost
For accuracy $\epsilon = 1/N$:
- Bisection: $O(\log_2 N)$
- Newton: $O(\log_2(\log_2 N))$
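A matching Newton sketch (same placeholder function as above; the quadratic convergence shows up in how few iterations it needs):

```cpp
#include <cstdio>
#include <cmath>

// Newton iteration: x <- x - f(x)/f'(x) until the step is below tol.
// The iteration cap is a safety choice of my own.
double newton(double (*f)(double), double (*df)(double), double x, double tol) {
  for (int it = 0; it < 50; it++) {
    double dx = f(x) / df(x);
    x -= dx;
    if (std::fabs(dx) < tol) break;   // quadratic convergence: digits roughly double per step
  }
  return x;
}

double g (double x) { return x*x - 2.0; }
double dg(double x) { return 2.0*x; }

int main() {
  std::printf("%.12f\n", newton(g, dg, 1.0, 1e-12));  // sqrt(2) in ~5 steps
  return 0;
}
```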
Gaussian Elimination
Reduce the augmented system $(A\,|\,b)$ to upper-triangular form by row operations, then back-substitute; the cost is $O(N^3)$.
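A minimal sketch with partial pivoting (my own code; the $2\times 2$ test system is a placeholder):

```cpp
#include <cstdio>
#include <cmath>
#include <utility>
#include <vector>

// Gaussian elimination with partial pivoting; solves Ax = b, O(N^3).
std::vector<double> solve(std::vector<std::vector<double>> A, std::vector<double> b) {
  int N = A.size();
  for (int k = 0; k < N; k++) {
    int p = k;                                    // pick the largest pivot in column k
    for (int i = k+1; i < N; i++)
      if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
    std::swap(A[k], A[p]); std::swap(b[k], b[p]);
    for (int i = k+1; i < N; i++) {               // eliminate below the pivot
      double m = A[i][k] / A[k][k];
      for (int j = k; j < N; j++) A[i][j] -= m * A[k][j];
      b[i] -= m * b[k];
    }
  }
  std::vector<double> x(N);
  for (int i = N-1; i >= 0; i--) {                // back-substitution
    x[i] = b[i];
    for (int j = i+1; j < N; j++) x[i] -= A[i][j] * x[j];
    x[i] /= A[i][i];
  }
  return x;
}

int main() {
  auto x = solve({{2,1},{1,3}}, {3,5});           // expects x = (0.8, 1.4)
  std::printf("%g %g\n", x[0], x[1]);
  return 0;
}
```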
Introduction to Matrices
For $$S[x_1,\dots,x_N]=\frac{1}{2}\sum_i\Big(\sum_ja_{ij}x_j-b_i\Big)^2$$
what is the minimum of $S$?
$$\frac{\partial S}{\partial x_k}=\sum_ia_{ik}(\sum_ja_{ij}x_j-b_i)=0$$
Let's define $A^{\intercal}=||a_{ji}||$; this implies
$$a_{ik}(a_{ij}x_j-b_i)=0\Leftrightarrow A^{\intercal}(Ax-b)=0$$
This means that if we have $N$ variables and $N$ equations we can get the minimum of $S$ by
solving
$$Ax=b$$
If we have fewer variables than equations (an overdetermined system), we instead need to solve
$$A^{\intercal}Ax=A^{\intercal}b$$
Notice that [see Homework 4], for a polynomial model $f(x)=\sum_n c_nx^n$,
$$\chi^2=\sum_{i=0}^{N_0}\frac{(y_i-f(x_i))^2}{\sigma_i^2}=\sum_i\Big(y_i/\sigma_i-\sum_n(x_i^n/\sigma_i)\,c_n\Big)^2$$
so we can let $b_i=y_i/\sigma_i$ and $a_{in}=x_i^n/\sigma_i$ to get the equation
$$A^{\intercal}Ac=A^{\intercal}b\Longleftrightarrow\sum_n\Big(\sum_ix_i^{m+n}/\sigma_i^2\Big)c_n=\sum_ix_i^my_i/\sigma_i^2$$
So the new matrix is $A^{\intercal}A=||(A^{\intercal}A)_{mn}||$ with $(A^{\intercal}A)_{mn}=\sum_i x_i^{m+n}/\sigma_i^2$, and the new right-hand side is $(A^{\intercal}b)_m=\sum_ix_i^my_i/\sigma_i^2$.
Solving $A^{\intercal}Ax=A^{\intercal}b$ is known as the "pseudo-inverse" problem: it minimizes the residuals of the original overdetermined least-squares problem.
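A sketch of how the normal equations could be assembled for a degree-$d$ polynomial fit (it reuses the `solve` routine from the Gaussian-elimination sketch above; `polyfit` and its signature are my own):

```cpp
#include <cmath>
#include <vector>

// Assemble the normal equations for a fit f(x) = sum_n c_n x^n:
// (A^T A)_{mn} = sum_i x_i^{m+n}/sigma_i^2,  (A^T b)_m = sum_i x_i^m y_i/sigma_i^2.
std::vector<double> polyfit(const std::vector<double>& x, const std::vector<double>& y,
                            const std::vector<double>& sigma, int d) {
  std::vector<std::vector<double>> AtA(d + 1, std::vector<double>(d + 1, 0.0));
  std::vector<double> Atb(d + 1, 0.0);
  for (std::size_t i = 0; i < x.size(); i++) {
    double w = 1.0 / (sigma[i] * sigma[i]);      // per-point weight 1/sigma^2
    for (int m = 0; m <= d; m++) {
      Atb[m] += std::pow(x[i], m) * y[i] * w;
      for (int n = 0; n <= d; n++)
        AtA[m][n] += std::pow(x[i], m + n) * w;
    }
  }
  return solve(AtA, Atb);   // Gaussian elimination; returns c_0 .. c_d
}
```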
Solving the Matrix Problem
Let's start with a question about blurring: blur one pixel px[x][y] with its 4 neighbors.
```cpp
px[x][y] = 0.25*(px[x+1][y] + px[x-1][y] + px[x][y+1] + px[x][y-1]);
```
where $x,y \in [0,L-1]$. To make px work with periodic boundaries (coordinates that would reach $L$ or $-1$ wrap around), take each index modulo $L$:
```cpp
px[x][y] = 0.25*(px[(x+1)%L][y] + px[(x-1+L)%L][y] + px[x][(y+1)%L] + px[x][(y-1+L)%L]);
```
Sometimes the pixel itself is important too, so we add a parameter $r$ with $0\leq r\leq 1$ giving the proportion shifted from the central pixel to its neighbors:
```cpp
px[x][y] = (1-r)*px[x][y] + r/4*(px[(x+1)%L][y] + px[(x-1+L)%L][y]
                               + px[x][(y+1)%L] + px[x][(y-1+L)%L]);
```
The operation above can be written as the stencil below.
$$\left(\begin{array}{ccc}0&r/4&0\\r/4&1-r&r/4\\0&r/4&0\end{array}\right)$$
Treating px as a function $f$ and letting $t$ index the iterations, we have
$$f(x,y,t+1)=(1-r)f(x,y,t)+\frac{r}{4}\left[f(x+1,y,t)+f(x-1,y,t)+f(x,y+1,t)+f(x,y-1,t)\right]$$
Now let one iteration correspond to a time step $\delta t$ and one pixel to a lattice spacing $a$. Taylor expanding each neighbor term,
$$\begin{aligned}
f(x,y,t+\delta t)&=(1-r)f(x,y,t)+\frac{r}{4}\left[f(x+a,y,t)+f(x-a,y,t)+f(x,y+a,t)+f(x,y-a,t)\right] \\
&\approx(1-r)f(x,y,t)+\frac{r}{4}\bigg[f(x,y,t)+a\frac{df}{dx}(x,y,t)+\frac{a^2}{2}\frac{d^2f}{dx^2}(x,y,t) \\
&\quad+f(x,y,t)-a\frac{df}{dx}(x,y,t)+\frac{a^2}{2}\frac{d^2f}{dx^2}(x,y,t) \\
&\quad+f(x,y,t)+a\frac{df}{dy}(x,y,t)+\frac{a^{2}}{2}\frac{d^{2}f}{dy^{2}}(x,y,t) \\
&\quad+f(x,y,t)-a\frac{df}{dy}(x,y,t)+\frac{a^2}{2}\frac{d^2f}{dy^2}(x,y,t)\bigg] \\
&=(1-r)f(x,y,t)+\frac{r}{4}\left[4f(x,y,t)+2\left(\frac{a^2}{2}\frac{d^2f}{dx^2}(x,y,t)+\frac{a^2}{2}\frac{d^2f}{dy^2}(x,y,t)\right)\right] \\
&=f(x,y,t)+\frac{ra^{2}}{4}\nabla^{2}f(x,y,t),
\end{aligned}$$
where $\nabla^2$ is the Laplacian. Expanding the left-hand side to first order in $\delta t$ gives
$$f(x,y,t)+\delta t\frac{df}{dt}(x,y,t) =f(x,y,t)+\frac{ra^{2}}4\nabla^{2}f(x,y,t) $$
So we get an important equation in physics, the Diffusion Equation, where $f$ would more appropriately be labeled $T$, the temperature:
$$
\frac{df}{dt}(x,y,t) =\frac{ra^2}{4\delta t}\nabla^2f(x,y,t)
$$
[Slide P22 has some code for Jacobi and Gauss–Seidel iteration; remember to add the algorithms themselves.]
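In the meantime, a minimal sketch of both iterations for this stencil with $r=1$ (my own code, solving a Poisson problem on a periodic $L\times L$ grid; the source array `rho`, with the $a^2$ factor absorbed into it, is a placeholder):

```cpp
#include <vector>

using Grid = std::vector<std::vector<double>>;

// One Jacobi sweep: every site is updated from the *previous* iterate.
void jacobi(Grid& f, const Grid& rho) {
  int L = f.size();
  Grid old = f;                              // copy: reads come from old values only
  for (int x = 0; x < L; x++)
    for (int y = 0; y < L; y++)
      f[x][y] = 0.25 * (old[(x+1)%L][y] + old[(x-1+L)%L][y]
                      + old[x][(y+1)%L] + old[x][(y-1+L)%L] + rho[x][y]);
}

// One Gauss-Seidel sweep: same stencil, but updates in place, so sites
// visited earlier in the sweep already contribute their new values.
void gauss_seidel(Grid& f, const Grid& rho) {
  int L = f.size();
  for (int x = 0; x < L; x++)
    for (int y = 0; y < L; y++)
      f[x][y] = 0.25 * (f[(x+1)%L][y] + f[(x-1+L)%L][y]
                      + f[x][(y+1)%L] + f[x][(y-1+L)%L] + rho[x][y]);
}
```

Gauss–Seidel typically converges in roughly half as many sweeps as Jacobi and needs no second copy of the grid, at the cost of being harder to parallelize.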
Curve fitting & Error analysis
This chapter mainly covers the Lagrange interpolating polynomial.
THEOREM:
Given $k+1$ points $(x_i,y_i)$, $i\in [0,k]$, there is exactly one Lagrange polynomial $L(x)$ such that:
- $L(x)$ has degree $\leq k$
- $L(x)$ goes through every node, which means $\forall i\in [0,k],\ L(x_i)=y_i$.
$L(x)$ is represented as
$$L(x)=\sum_{j=0}^ky_j\ell_j(x)$$
where $\ell_j(x)$ equals
$$\begin{aligned}
\ell_{j}(x)&=\frac{(x-x_0)}{(x_j-x_0)}\cdots\frac{(x-x_{j-1})}{(x_j-x_{j-1})}\frac{(x-x_{j+1})}{(x_j-x_{j+1})}\cdots\frac{(x-x_k)}{(x_j-x_k)} \\
&=\prod_{\substack{0\leq m\leq k\\ m\neq j}}\frac{x-x_m}{x_j-x_m}.
\end{aligned}$$
and the generation of $L(x)$ can be written as a polynomial-interpolation algorithm:
```cpp
L = 0;                                            // accumulate L(x) = sum_j y_j * l_j(x)
for (j = 0; j <= k; j++) L += y[j] * ell(j, x);   // ell(j, x) evaluates l_j(x)
```
To see why this works:
- each term $y_i\ell_i(x)$ goes through $(x_i,y_i)$ and through $(x_j,0)$ for every $j\neq i$;
- adding all the terms together gives a curve that goes through all of the points.
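A runnable version of the algorithm (a sketch; the sample points on $y=x^2$ are my own placeholder):

```cpp
#include <cstdio>

// Evaluate the Lagrange polynomial through (xs[j], ys[j]), j = 0..k, at x.
double lagrange(const double* xs, const double* ys, int k, double x) {
  double L = 0.0;
  for (int j = 0; j <= k; j++) {
    double ell = 1.0;                  // l_j(x) = prod_{m != j} (x-x_m)/(x_j-x_m)
    for (int m = 0; m <= k; m++)
      if (m != j) ell *= (x - xs[m]) / (xs[j] - xs[m]);
    L += ys[j] * ell;
  }
  return L;
}

int main() {
  // Three points on y = x^2; the unique degree-<=2 interpolant reproduces it.
  double xs[] = {0, 1, 2}, ys[] = {0, 1, 4};
  std::printf("%g\n", lagrange(xs, ys, 2, 1.5));   // prints 2.25
  return 0;
}
```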
FFT as recursion for divide and conquer
From continuous to discrete
Suppose we sample a periodic signal (with period $T$) at $N$ times, $y_n=f(t_n)$ for $n=0,\dots,N-1$, at equal time intervals $t_n=n\Delta t$. The Fourier series for a discrete sample of a "time" sequence with $N$ terms can be expanded like this:
$$y_n=a_0+\sum_{k=1}^{N_0}[a_k\cos(2\pi kn/N)+b_k\sin(2\pi kn/N)]$$
where $2N_0+1\geq N$. The frequencies are $\omega_k=2\pi k /N$.
Using Euler's formula
$$\cos(\theta)+i\sin(\theta)\equiv e^{i\theta}$$
we can rewrite the expansion as
$$y_n=\sum_{k=0}^{N-1}c_ke^{2\pi ikn/N}$$
This is a linear system, and solving for the $c_k$ given the $y_n$ by Gaussian elimination is an $O(N^3)$ approach,
but there's a better trick. Notice the orthogonality relation
$$\sum_{n=0}^{N-1}e^{2\pi ikn/N}e^{-2\pi ik'n/N}=N\,\delta_{kk'}$$
we have
$$c_k=\frac{1}{N}\sum_{n=0}^{N-1}y_ne^{-i2\pi nk/N}$$
Let's define $\omega_N^n\equiv e^{i2\pi n/N}$, the "$N$th roots of unity". You can picture them as vectors on the unit circle dividing it into $N$ equal parts, or, loosely, as a set of basis vectors for the space of sampled sequences.
FFT details
Writing $a_n$ for the coefficients $c_n$ above,
$$y_k\equiv\mathcal{FT}_N[a_n]=\sum_{n=0}^{N-1}e^{i2\pi nk/N}a_n=\sum_{n=0}^{N-1}(\omega_N^n)^ka_n$$
Use the binary representation of $n$: $n=\sum_p 2^pn_p$, $n_p\in\{0,1\}$. Then
$$\omega_{N}^{n}=\omega_{N}^{n_{0}}\,\omega_{N/2}^{n_{1}}\,\omega_{N/4}^{n_{2}}\cdots\omega_{2}^{n_{p}}$$
$$\sum_{n}\omega_{N}^{nk}=\sum_{n_{0}=0,1}\omega_{N}^{n_{0}k}\sum_{n_{1}=0,1}\omega_{N/2}^{n_{1}k}\cdots\sum_{n_{p}=0,1}\omega_{2}^{n_{p}k}$$
Split the sum into even- and odd-index parts; this gives two $N/2$-point FTs.

If we do the low bit first (decimation in time),
$$y_k=\sum_{n=0}^{N/2-1}e^{i2\pi nk/(N/2)}a_{2n}+\omega_N^k\sum_{n=0}^{N/2-1}e^{i2\pi nk/(N/2)}a_{2n+1}$$
For the lower half of the outputs, $0\leq k<N/2$:
$$y_k=\sum_{n=0}^{N/2-1}e^{i2\pi nk/(N/2)}[a_{2n}+\omega_N^ka_{2n+1}]$$
and for the upper half, using $\omega_N^{k+N/2}=-\omega_N^k$:
$$y_{k+N/2}=\sum_{n=0}^{N/2-1}e^{i2\pi nk/(N/2)}[a_{2n}-\omega_N^ka_{2n+1}]$$
If we do the high bit first (decimation in frequency),
$$y_k=\sum_{n=0}^{N/2-1}e^{i2\pi nk/N}a_n+e^{i\pi k}\sum_{n=0}^{N/2-1}e^{i2\pi nk/N}a_{n+N/2}$$
For even $k=2\tilde{k}$:
$$y_{2\tilde{k}}=\sum_{n=0}^{N/2-1}e^{i2\pi n\tilde{k}/(N/2)}[a_n+a_{n+N/2}]$$
For odd $k=2\tilde{k}+1$:
$$y_{2\tilde{k}+1}=\sum_{n=0}^{N/2-1}e^{i2\pi n\tilde{k}/(N/2)}\omega_N^n[a_n-a_{n+N/2}]$$
This data-flow pattern is called the butterfly network.
In summary,
$$\begin{aligned}
y_{k}&=\mathcal{FT}_{N/2}\big[a_{2n}+\omega_{N}^{k}a_{2n+1}\big] \\
y_{k+N/2}&=\mathcal{FT}_{N/2}\big[a_{2n}-\omega_{N}^{k}a_{2n+1}\big] \\
y_{2k}&=\mathcal{FT}_{N/2}\big[a_{n}+a_{n+N/2}\big] \\
y_{2k+1}&=\mathcal{FT}_{N/2}\big[\omega_{N}^{n}\big(a_{n}-a_{n+N/2}\big)\big]
\end{aligned}$$
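A minimal recursive sketch of the low-bit-first (decimation-in-time) form (my own code, following the sign convention $e^{+i2\pi nk/N}$ used above; $N$ is assumed to be a power of two):

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT: y_k = sum_n a_n e^{+i 2 pi n k / N}.
std::vector<cd> fft(const std::vector<cd>& a) {
  const int N = a.size();
  if (N == 1) return a;
  std::vector<cd> even(N / 2), odd(N / 2);
  for (int n = 0; n < N / 2; n++) {              // split on the low bit of n
    even[n] = a[2 * n];
    odd[n]  = a[2 * n + 1];
  }
  std::vector<cd> E = fft(even), O = fft(odd);   // two N/2-point FTs
  const double PI = std::acos(-1.0);
  std::vector<cd> y(N);
  for (int k = 0; k < N / 2; k++) {
    cd w = std::polar(1.0, 2.0 * PI * k / N);    // omega_N^k
    y[k]         = E[k] + w * O[k];              // butterfly: lower half
    y[k + N / 2] = E[k] - w * O[k];              // upper half (sign flips)
  }
  return y;
}
```

Each level does $O(N)$ butterfly work across $\log_2 N$ levels, so the total cost is $O(N\log N)$ instead of $O(N^3)$ for Gaussian elimination or $O(N^2)$ for direct evaluation of the sums.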
Multigrid Recursion to speed up Matrix Solvers
Stationary iterations like Jacobi and Gauss–Seidel damp short-wavelength error quickly but long-wavelength error very slowly. The multigrid idea is to restrict the residual to a coarser grid, where long wavelengths look short, solve (or smooth) there recursively, and interpolate the correction back to the fine grid.