Sachin AbeywardanaJekyll2017-09-08T22:50:37+00:00https://sachinruk.github.io/Sachin Abeywardanahttps://sachinruk.github.io/<![CDATA[Docker for Data Science]]>https://sachinruk.github.io/blog/Docker-for-Data-Science2017-08-24T00:00:00+00:002017-08-24T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<h1 id="docker-for-data-science">Docker for Data Science</h1>
<p>Docker is a tool that simplifies the installation process for software engineers. Coming from a statistics background I used to care very little about how to install software and would occasionally spend a few days trying to resolve system configuration issues. Enter the god-send Docker almighty.</p>
<p>Think of Docker as a light virtual machine (I apologise to the Docker gurus for using that term). Its <strong>underlying philosophy is that if it works on my machine it will work on yours</strong>.</p>
<h2 id="whats-in-it-for-data-scientists">What’s in it for Data Scientists</h2>
<ol>
<li>Time: The amount of time that you save on not installing packages in itself makes this framework worth it.</li>
<li><strong>Reproducible Research</strong>: I think of Docker as akin to setting the seed in a report. This makes sure that the analysis that you are generating will run on any other analyst’s machine.</li>
</ol>
<h2 id="how-does-it-work">How Does it Work?</h2>
<p>Docker employs the concept of (reusable) layers. So whatever line that you write inside the <code class="highlighter-rouge">Dockerfile</code> is considered a layer. For example you would usually start with:</p>
<pre><code class="language-Dockerfile">FROM ubuntu
RUN apt-get update &amp;&amp; apt-get install -y python3
</code></pre>
<p>This Dockerfile would install <code class="highlighter-rouge">python3</code> (as a layer) on top of the <code class="highlighter-rouge">Ubuntu</code> layer.</p>
<p>What you essentially do is, for each project, write all the <code class="highlighter-rouge">apt-get install</code>, <code class="highlighter-rouge">pip install</code> etc. commands into your Dockerfile instead of executing them locally.</p>
<p>I recommend reading the tutorial on https://docs.docker.com/get-started/ to get started on Docker. The <strong>learning curve is minimal</strong> (2 days work at most) and the gains are enormous.</p>
<h2 id="dockerhub">Dockerhub</h2>
<p>Lastly, Dockerhub deserves a special mention. Personally, I think Dockerhub is what makes Docker truly powerful. It’s what GitHub is to git: an open platform to share your Docker images.</p>
<p>My Docker image for Machine Learning and data science is available here: https://hub.docker.com/r/sachinruk/ml_class/</p>
<p><a href="https://sachinruk.github.io/blog/Docker-for-Data-Science/">Docker for Data Science</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on August 24, 2017.</p>
<![CDATA[DeepSchool.io]]>https://sachinruk.github.io/blog/DeepSchool.io2017-07-04T00:00:00+00:002017-07-04T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<h1 id="deepschoolio"><a href="http://www.deepschool.io">DeepSchool.io</a></h1>
<p>NodeSchool is one of the most inclusive software communities that I have come across. What I liked about it the most is its emphasis on writing code. There are so many meetups that I have been to where I simply listen to talks and go home without much of a take-home message.</p>
<p><a href="http://www.deepschool.io"><code class="highlighter-rouge">www.DeepSchool.io</code></a> is an open-source, community based project to teach the A-Z of Deep Learning (DL). This project came out of a weekly class that I did at Arbor Networks where I work as a Data Scientist.</p>
<p>Personally I come from a background where I did a PhD in Machine Learning. However, with the development of tools such as Keras, DL has become a lot more accessible to the general community.</p>
<p>Even with these available tools, teaching Deep Learning can be quite difficult. The first lesson I did was a complete train wreck. I had forgotten where I started and jumped straight into a multi layered Deep Net. I took for granted that people would understand what a loss function is, and what regression vs logistic regression is. <img src="https://sachinruk.github.io/images/d6d.jpg" alt="owl" /></p>
<p>Conversely I did not want to spend too much time on the mathematics either. I wanted to create something that would get people <strong>tackling DL problems fast instead of diving too deep into the theory</strong>. I spent 6 months or so on Andrew Ng’s DL course that did go through the theory. This unfortunately did not equip me with the tools necessary to be comfortable using DL in any meaningful way. The goal is to focus on the bigger picture of what you can do with DL.</p>
<h2 id="goals">Goals</h2>
<ol>
<li>Make Deep Learning easier (minimal code).</li>
<li>Minimise required mathematics.</li>
<li>Make it practical (runs on laptops).</li>
<li>Open Source Deep Learning Learning.</li>
<li>Grow a <strong>collaborating practical community</strong> around DL.</li>
</ol>
<p>The assumed knowledge is that you are able to code in Python. I make all code available in Jupyter Notebooks for the sole reason that you can interact with it. Running a single Python script decreases this interactivity.</p>
<p>I also use Docker containers along with Docker-compose so that I don’t have to deal with installation issues. This tends to take up upwards of half an hour at some workshops. Mind you, the current container that I have put up uses 3GB of space.</p>
<h2 id="call-for-contributions">Call for Contributions</h2>
<p>There is still much to do with Deep School. These are some of the most important requirements in order of importance:</p>
<ol>
<li>Use the tutorials!</li>
<li>Help with documenting tutorials (there are parts I could have explained better).</li>
<li>Contribute tutorials. At the time of writing I am yet to do an LSTM tutorial. Furthermore I am yet to provide the more advanced tutorials such as Attention Networks, Generative Adversarial Networks, Reinforcement Learning etc.</li>
<li>Help me set up a website/forum. I have limited experience with websites. It would be good to provide a <code class="highlighter-rouge">NodeSchool.io</code> style webpage so that we could spread the message.</li>
</ol>
<p><a href="https://sachinruk.github.io/blog/DeepSchool.io/">DeepSchool.io</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on July 04, 2017.</p>
<![CDATA[Keras LSTMs]]>https://sachinruk.github.io/blog/Keras-LSTM2016-10-20T00:00:00+00:002016-10-20T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>Keras is one of the really powerful Deep Learning libraries that allow you to have a Deep Net running in a few lines of code. Best part: you don’t have to worry about the math. In the following videos you will find how to implement a popular Recurrent Neural Net (RNN) called the Long Short Term Memory (LSTM) network.</p>
<p>Note: You could easily replace the LSTM units with Gated Recurrent Units (GRU) with the same function call.</p>
<p>Source code: https://github.com/sachinruk/PyData_Keras_Talk/blob/master/cosine_LSTM.ipynb</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/ywinX5wgdEU" frameborder="0" allowfullscreen=""></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/e1pEIYVOtqc" frameborder="0" allowfullscreen=""></iframe>
<h3 id="faq">FAQ:</h3>
<ol>
<li>Why do we need a Dense Layer?
The output is still one dimensional (y) and therefore the 32 hidden units of the LSTM need to be projected down to one. Hence the dense layer is used.</li>
<li>How do you decide number of layers and number of nodes in each layer?
Personally, this is trial and error. Generally a larger number of layers (deeper) is better than going wide (more nodes). But I usually limit myself to 5 layers at most unless there is a truly large dataset (100MB+).</li>
</ol>
<h3 id="references">References</h3>
<ol>
<li>To understand the maths behind LSTM:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/</li>
<li>For another guide to Keras LSTMs:
http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/</li>
<li>If you are still confused (try my stackoverflow post):
http://stackoverflow.com/questions/38714959/understanding-keras-lstms</li>
</ol>
<p><a href="https://sachinruk.github.io/blog/Keras-LSTM/">Keras LSTMs</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on October 20, 2016.</p>
<![CDATA[Deep Learning Quantile Regression - Keras]]>https://sachinruk.github.io/blog/Quantile-Regression2016-10-16T00:00:00+00:002016-10-16T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>The loss function is as simple as the following, which is just the pinball loss function.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">keras</span> <span class="kn">import</span> <span class="n">backend</span> <span class="k">as</span> <span class="n">K</span>

<span class="k">def</span> <span class="nf">tilted_loss</span><span class="p">(</span><span class="n">q</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">f</span><span class="p">):</span>
    <span class="c"># residual between the target y and the prediction f</span>
    <span class="n">e</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span><span class="o">-</span><span class="n">f</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">K</span><span class="o">.</span><span class="n">maximum</span><span class="p">(</span><span class="n">q</span><span class="o">*</span><span class="n">e</span><span class="p">,</span> <span class="p">(</span><span class="n">q</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">*</span><span class="n">e</span><span class="p">),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
</code></pre>
</div>
<p>When it comes to compiling the neural network, just simply do:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">model</span><span class="o">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="k">lambda</span> <span class="n">y</span><span class="p">,</span><span class="n">f</span><span class="p">:</span> <span class="n">tilted_loss</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">f</span><span class="p">),</span> <span class="n">optimizer</span><span class="o">=</span><span class="s">'adagrad'</span><span class="p">)</span>
</code></pre>
</div>
<p>I chose 0.5, which is the median, but you can use whichever quantile you are after. A word of caution that applies to any quantile regression method: the predicted quantile may be extreme or unexpected when you target extreme quantiles (e.g. 0.001 or 0.999).</p>
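<p>To see why this loss targets a quantile, here is a small dependency-free sketch in plain Python (not Keras). The helper names <code class="highlighter-rouge">pinball_loss</code> and <code class="highlighter-rouge">best_constant</code> and the toy data are assumptions made for illustration only. It searches for the constant prediction with the smallest pinball loss.</p>

```python
def pinball_loss(q, y, f):
    """Mean pinball (tilted) loss of predictions f against targets y."""
    errors = [yi - fi for yi, fi in zip(y, f)]
    return sum(max(q * e, (q - 1) * e) for e in errors) / len(errors)


def best_constant(q, y, candidates):
    """Return the candidate constant prediction with the smallest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(q, y, [c] * len(y)))


y = list(range(1, 12))  # toy targets 1, 2, ..., 11 (illustrative data)
median_fit = best_constant(0.5, y, y)  # q = 0.5 targets the median
upper_fit = best_constant(0.9, y, y)   # q = 0.9 targets an upper quantile
print(median_fit, upper_fit)
```

<p>On this toy data the minimiser for q = 0.5 is 6 (the median) and for q = 0.9 it is 10, which is the behaviour the neural network loss relies on.</p>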
<p>A more complete working example can be found <a href="https://github.com/sachinruk/KerasQuantileModel/blob/master/Keras%20Quantile%20Model.ipynb">here</a>.</p>
<p><a href="https://sachinruk.github.io/blog/Quantile-Regression/">Deep Learning Quantile Regression - Keras</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on October 16, 2016.</p>
<![CDATA[XgBoost - Machine Learning made EASY!]]>https://sachinruk.github.io/blog/XgBoost2016-08-08T00:00:00+00:002016-08-08T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>One of the machine learning frameworks that has been exploding on the Kaggle scene has been Xgboost. In my personal experience it has been an extremely powerful machine learning algorithm, beating random forests on most problems I’ve played around with.</p>
<p>The following video is a quick introduction to XgBoost.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/87xRqEAx6CY" frameborder="0" allowfullscreen=""></iframe>
<p><a href="https://sachinruk.github.io/blog/XgBoost/">XgBoost - Machine Learning made EASY!</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on August 08, 2016.</p>
<![CDATA[Reversible jump MCMC]]>https://sachinruk.github.io/blog/Reversible-Jump-MCMC2015-10-20T00:00:00+00:002015-10-20T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<h1 id="reversible-jump-mcmc">Reversible jump MCMC</h1>
<p>Reversible jump MCMC is a Bayesian algorithm to infer the number of components/ clusters from a set of data. For this illustration we shall consider a two component model at most.</p>
<h2 id="model">Model</h2>
<p>The likelihoods can be represented as:
<script type="math/tex">% <![CDATA[
\begin{align}
p(y_i|\lambda_{11},k=1)=&\lambda_{11}\exp(-\lambda_{11}y_i)\\
p(y_i|\lambda_{12},\lambda_{22},k=2,z_i)=&\prod_j (\lambda_{j2}\exp(-\lambda_{j2}y_i))^{1(z_i=j)}
\end{align} %]]></script></p>
<p>The priors on the latent variables are:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
p(\lambda_{jk})\propto & \frac{1}{\lambda_{jk}}\qquad \lambda_{jk}\in[a,b]\\
p(z_i=1)=&\pi\\
p(\pi) = & \text{Dir}(\alpha)\\
p(k=j)= & 1/K
\end{align} %]]></script>
<h2 id="jumping-dimensions">Jumping dimensions</h2>
<p>We need to consider a Metropolis-Hastings (MH) step to consider going from one component to two components. The MH step in general is as follows:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
\alpha = & \frac{p(y,\theta_2^{t+1})}{p(y,\theta_1^t)}\frac{q(\theta_1^t|\theta_2^{t+1})}{q(\theta_2^{t+1}|\theta_1^{t})}\\
A = & \text{min}\left(1,\alpha\right)
\end{align} %]]></script>
<p>where,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
p(y,\theta_2)=& p(y|\lambda_{12},\lambda_{22},\pi)p(\lambda_{12})p(\lambda_{22})p(\pi)\\
p(y_i|\lambda_{12},\lambda_{22},\pi)=&\pi p(y_i|\lambda_{12})+(1-\pi) p(y_i|\lambda_{22})
\end{align} %]]></script>
<h3 id="jumping-from-1-dim-to-2">Jumping from 1 dim to 2</h3>
<p>In this case let the parameters be <script type="math/tex">\theta=\{\cup_j\lambda_{jk},k,\pi\}</script>. As the proposal distribution can be anything, we define <script type="math/tex">q(\theta_1\to\theta_2)</script> as follows:
<script type="math/tex">\begin{align}
q(\lambda_{j2},\pi,k=2|k=1,\lambda_{11})=q(\lambda_{j2}|k=2,\lambda_{11})q(\pi|k=2)q(k=2|k=1)
\end{align}</script></p>
<p>We let the <strong>proposal</strong> <script type="math/tex">q(k=2\vert k=1)=1</script>. We also have the following dimensional jump:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
\mu_1,\mu_2\sim & U(0,1)\\
\lambda_{12}=&\lambda_{11}\frac{\mu_1}{1-\mu_1}\\
\lambda_{22}=&\lambda_{11}\frac{1-\mu_1}{\mu_1}\\
\pi=&\mu_2
\end{align} %]]></script>
<p>Thus, in order to find the distribution <script type="math/tex">q(\lambda_{j2}\vert k=2,\lambda_{11})</script> we use the change of variable identity <script type="math/tex">q(\lambda_{j2}\vert k=2,\lambda_{11})=q(\mu_1)\vert J\vert</script>, where <script type="math/tex">J</script> is the Jacobian <script type="math/tex">\frac{\partial(\lambda_{11},\mu_1)}{\partial(\lambda_{12},\lambda_{22})}</script>. The Jacobian determinant is found to be <script type="math/tex">\frac{\mu_1(1-\mu_1)}{2\lambda_{11}}</script>, while <script type="math/tex">q(\mu_1)=q(\mu_2)=1</script> since they are sampled from standard uniform distributions. Also, <script type="math/tex">q(\mu_2)=q(\pi\vert k=2)</script>.</p>
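<p>The dimensional jump and its Jacobian can be checked numerically. The following is a minimal sketch in plain Python, not the full RJMCMC sampler. The function names (<code class="highlighter-rouge">split_move</code>, <code class="highlighter-rouge">merge_move</code>, <code class="highlighter-rouge">split_proposal_density</code>, <code class="highlighter-rouge">forward_jacobian_det</code>) and the starting value of the rate are hypothetical choices for illustration.</p>

```python
import math
import random


def split_move(lam11, mu1, mu2):
    """Map (lam11, mu1, mu2) to the two-component parameters (lam12, lam22, pi)."""
    lam12 = lam11 * mu1 / (1.0 - mu1)
    lam22 = lam11 * (1.0 - mu1) / mu1
    return lam12, lam22, mu2


def merge_move(lam12, lam22):
    """Deterministic reverse move back to the single rate lam11."""
    return math.sqrt(lam12 * lam22)


def split_proposal_density(lam11, mu1):
    """q(mu1) * |J| with q(mu1) = 1 and |J| = mu1 (1 - mu1) / (2 lam11)."""
    return mu1 * (1.0 - mu1) / (2.0 * lam11)


def forward_jacobian_det(lam11, mu1, h=1e-6):
    """Central-difference determinant of d(lam12, lam22) / d(lam11, mu1)."""
    def rates(lam, mu):
        return split_move(lam, mu, 0.5)[:2]

    a_hi, c_hi = rates(lam11 + h, mu1)
    a_lo, c_lo = rates(lam11 - h, mu1)
    b_hi, d_hi = rates(lam11, mu1 + h)
    b_lo, d_lo = rates(lam11, mu1 - h)
    d12_dl, d22_dl = (a_hi - a_lo) / (2 * h), (c_hi - c_lo) / (2 * h)
    d12_dm, d22_dm = (b_hi - b_lo) / (2 * h), (d_hi - d_lo) / (2 * h)
    return d12_dl * d22_dm - d12_dm * d22_dl


rng = random.Random(0)
lam11 = 2.0  # illustrative current rate
mu1, mu2 = rng.random(), rng.random()  # U(0, 1) draws; a draw of exactly 0 would need redrawing
lam12, lam22, pi = split_move(lam11, mu1, mu2)

# The reverse move recovers lam11, and |J| is the inverse of the forward determinant.
print(merge_move(lam12, lam22))
print(abs(forward_jacobian_det(lam11, mu1)) * split_proposal_density(lam11, mu1))
```

<p>The second printed value should be numerically close to 1, since the Jacobian used in the proposal density is the inverse of the forward map's Jacobian.</p>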
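<p>Since we need the ratio of proposal densities <script type="math/tex">\frac{q(\theta_1^t\vert\theta_2^{t+1})}{q(\theta_2^{t+1}\vert\theta_1^{t})}</script>, we are also required to find <script type="math/tex">q(\lambda_{11},k=1\vert\lambda_{j2},\pi,k=2) = q(\lambda_{11}\vert\lambda_{j2},k=2)\, q(k=1 \vert k=2)</script>. We again take <script type="math/tex">q(k=1\vert k=2)=1</script>. Because the jump above gives <script type="math/tex">\lambda_{12}\lambda_{22}=\lambda_{11}^2</script>, the reverse move is deterministic, so <script type="math/tex">q(\lambda_{11}=\sqrt{\lambda_{12}\lambda_{22}})=1</script>.</p>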
<h3 id="jumping-from-2-to-1">Jumping from 2 to 1</h3>
<p>The MH step is conducted using the reciprocal of <script type="math/tex">\alpha</script> in the equation above.</p>
<h2 id="rjmcmc-algorithm">RJMCMC Algorithm</h2>
<p><a href="https://sachinruk.github.io/blog/Reversible-Jump-MCMC/">Reversible jump MCMC</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on October 20, 2015.</p>
<![CDATA[Chinese Restaurant Process]]>https://sachinruk.github.io/blog/Chinese-Restaurant-Process2015-10-09T00:00:00+00:002015-10-09T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>In this instance we generate the parameters <script type="math/tex">\theta_k</script> from <script type="math/tex">\mathcal{N}(\mathbf{0},3\mathbf{I})</script>. The data is generated from <script type="math/tex">\mathcal{N}(\theta_k,0.1\mathbf{I})</script>, where <script type="math/tex">k</script> is the table. Table allocation is the main part of the CRP, and for the <script type="math/tex">n</script>-th customer it is determined by:
<script type="math/tex">% <![CDATA[
\begin{align}
k=\begin{cases}
\text{new table } & \text{with prob = } \frac{\alpha}{\alpha+n-1}\\
\text{table k } & \text{with prob = } \frac{n_k}{\alpha+n-1}
\end{cases}
\end{align} %]]></script>
where <script type="math/tex">n_k</script> is the number of customers at table <script type="math/tex">k</script>.</p>
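<p>The seating rule can be sketched in a few lines of plain Python. This is a simplified illustration rather than the code in the linked notebook: the function names, the two-dimensional data, the seed and the choice of <code class="highlighter-rouge">alpha</code> are all assumptions, and 3 and 0.1 are treated as variances.</p>

```python
import math
import random


def crp_assignments(n_customers, alpha, rng):
    """Seat customers one at a time; return (table index per customer, table sizes)."""
    counts = []       # counts[k] = n_k, the number of customers at table k
    assignments = []
    for _ in range(n_customers):
        # Unnormalised weights: n_k for each existing table, alpha for a new table.
        # The shared divisor alpha + n - 1 is applied implicitly by rng.choices.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join existing table k
        assignments.append(k)
    return assignments, counts


def sample_data(assignments, n_tables, dim, rng):
    """Draw theta_k ~ N(0, 3I) per table and one point ~ N(theta_k, 0.1I) per customer."""
    thetas = [[rng.gauss(0.0, math.sqrt(3.0)) for _ in range(dim)]
              for _ in range(n_tables)]
    return [[rng.gauss(t, math.sqrt(0.1)) for t in thetas[k]] for k in assignments]


rng = random.Random(0)
assignments, counts = crp_assignments(100, alpha=1.0, rng=rng)
data = sample_data(assignments, len(counts), dim=2, rng=rng)
print(len(counts), counts)
```

<p>Larger values of <code class="highlighter-rouge">alpha</code> make new tables more likely, so the number of clusters grows with both <code class="highlighter-rouge">alpha</code> and the number of customers.</p>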
<p>The associated ipython notebook is <a href="https://github.com/sachinruk/sachinruk.github.io/blob/master/_posts/Stats%20Blog/CRP.ipynb">located here</a>.</p>
<p><a href="https://sachinruk.github.io/blog/Chinese-Restaurant-Process/">Chinese Restaurant Process</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on October 09, 2015.</p>
<![CDATA[Free Education]]>https://sachinruk.github.io/thoughts/Free-Education2015-10-01T00:00:00+00:002015-10-01T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>It’s a given in most countries that primary and secondary education is compulsory and free of charge. However, not all countries offer free tertiary education, and it is highly competitive to get into the ones that do. It is a shame that students from richer backgrounds can simply travel to countries where university education can be accessed by paying for entry, while students who would have made better, more productive employees are trapped in a cycle of poverty.</p>
<p>I strongly believe that giving everyone an equal opportunity in this world would narrow the gap between the rich and the poor. Even though governments around the world will not act to give everyone who deserves a chance a fair go, there are plenty of other ways to reach the same goal.</p>
<p>The Internet is a powerful tool whose impact you perhaps do not quite realise. Today there are quite a few universities that offer free courses under the name OpenCourseWare. Look at the video below to see what is offered. Stanford, MIT, Yale and some of the other top universities in the world offer these courses, taught by top quality lecturers. There are other websites such as Codecademy, Khan Academy, YouTube etc. that will provide you with the answers that you may be looking for.</p>
<p>In France (or so I believe) it is a basic human right to be able to access broadband of at least 1Mbps. Taking my home country (Sri Lanka) as an example, we are beginning to see internet access becoming cheap and readily available even for disadvantaged communities.</p>
<p>I personally have learnt much of the material for my PhD through these free videos. When I watch some of these videos I think to myself, why did I bother to go to lectures when there were these wonderful teachers who knew HOW TO TEACH? It’s a shame that a lot of the lecturers I have met were good researchers yet terrible teachers.</p>
<p>In conclusion, humanity is built on the knowledge of others. So contribute your knowledge, pass on what you have.</p>
<p><a href="https://sachinruk.github.io/thoughts/Free-Education/">Free Education</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on October 01, 2015.</p>
<![CDATA[Stanford Deep Learning]]>https://sachinruk.github.io/blog/Deep-Learning2015-09-21T00:00:00+00:002015-09-21T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>One of the best Neural Networks/ Deep Learning tutorials can be found <a href="http://ufldl.stanford.edu/tutorial/">here</a>. It is written by Andrew Ng, or as I think of him the best Machine Learning Sensei.</p>
<p>I have been doing the tutorial recently and will be giving hints as I go. So far I am up to the Convolutional Neural Network part (as of Sep 21).</p>
<p>The code is available in my github repository: <a href="https://github.com/sachinruk/Standford_DL">https://github.com/sachinruk/Standford_DL</a></p>
<h2 id="important-hints">Important Hints</h2>
<ol>
<li>When doing the 1st Deep Neural Network exercise (<code class="highlighter-rouge">supervised_dnn_cost.m</code>) remember that the error component <script type="math/tex">\delta^{(l)}</script> is calculated for each individual example separately. When calculating the gradient for <script type="math/tex">W</script> we use <script type="math/tex">\delta^{(l+1)} {a^{(l)}}^T</script>. The transpose is important.</li>
</ol>
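<p>The role of the transpose is easiest to see from shapes. The sketch below uses plain Python lists rather than the MATLAB of the exercise, and the helper names and toy numbers are made up for illustration. For one example, the outer product of an error vector with <code class="highlighter-rouge">n_out</code> entries and an activation vector with <code class="highlighter-rouge">n_in</code> entries has the same shape as the weight matrix, and the per-example gradients are then summed.</p>

```python
def outer(delta, a):
    """Outer product delta * a^T: one row per output unit, one column per input unit."""
    return [[d * x for x in a] for d in delta]


def weight_gradient(deltas, activations):
    """Sum the per-example outer products delta^(l+1) (a^(l))^T over all examples."""
    n_out, n_in = len(deltas[0]), len(activations[0])
    grad = [[0.0] * n_in for _ in range(n_out)]
    for delta, a in zip(deltas, activations):
        for i, row in enumerate(outer(delta, a)):
            for j, value in enumerate(row):
                grad[i][j] += value
    return grad


# Two examples, 3 input activations, 2 output units (toy numbers).
activations = [[3.0, 4.0, 5.0], [1.0, 0.0, 2.0]]
deltas = [[1.0, 2.0], [0.5, -1.0]]
grad_W = weight_gradient(deltas, activations)
print(len(grad_W), len(grad_W[0]))  # rows = output units, columns = input units
```

<p>Without the transpose the product would either fail on shapes or collapse to a scalar, which is why the gradient would not match the shape of <script type="math/tex">W</script>. Whether the sum is then divided by the number of examples depends on how the cost is defined.</p>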
<p>This blog post will be edited in the coming days.</p>
<p><a href="https://sachinruk.github.io/blog/Deep-Learning/">Stanford Deep Learning</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on September 21, 2015.</p>
<![CDATA[Silence the Lambs]]>https://sachinruk.github.io/thoughts/Silence-the-lambs2015-08-22T00:00:00+00:002015-08-22T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>The past week in Australian politics has seen our fearless leader proclaim war on ‘environmentalism’. The proposed law would see activists banned from protesting projects unless they were directly affected. Sounds familiar? A few months back the same government made it a criminal offence for anyone to report on the state of refugees in detention camps.</p>
<p>Putting aside whatever your political views are on refugees and asylum seekers, the greater threat that should be obvious is the calculated retraction of freedom of speech. It is a human right to be able to communicate our desires whether they affect us directly or not.</p>
<p>The Carmichael Coal project that got canned cost approximately 1500 jobs, never mind the 1000s of renewable energy jobs that have been wiped out since Abbott came into power. Never mind the fact that burning coal will further the impact on climate change.</p>
<p>Staying silent on an issue that is affecting the entire planet is selfish. Being forced to stay silent is a crime. I will leave you with this:</p>
<div id="fb-root"></div>
<script>(function(d, s, id) { var js, fjs = d.getElementsByTagName(s)[0]; if (d.getElementById(id)) return; js = d.createElement(s); js.id = id; js.src = "//connect.facebook.net/en_GB/sdk.js#xfbml=1&version=v2.3"; fjs.parentNode.insertBefore(js, fjs);}(document, 'script', 'facebook-jssdk'));</script>
<div class="fb-video" data-allowfullscreen="1" data-href="/theprojecttv/videos/vb.107787018440/10152808607343441/?type=1"><div class="fb-xfbml-parse-ignore"><blockquote cite="https://www.facebook.com/theprojecttv/videos/10152808607343441/"><a href="https://www.facebook.com/theprojecttv/videos/10152808607343441/"></a><p>Waleed on Australia's Renewable Energy Target #TheProjectTV (written by Tom Whitty @twhittyer)</p>Posted by <a href="https://www.facebook.com/theprojecttv">The Project</a> on Thursday, 16 April 2015</blockquote></div></div>
<p><a href="https://sachinruk.github.io/thoughts/Silence-the-lambs/">Silence the Lambs</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on August 22, 2015.</p>
<![CDATA[von Mises-Fisher Distribution]]>https://sachinruk.github.io/blog/von-Mises-Fisher2015-08-10T00:00:00+00:002015-08-10T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>The von Mises-Fisher (vMF) distribution is a multivariate distribution on a hypersphere. I have decided to share the expectation and covariance of the vMF distribution. The Wikipedia page doesn’t give much info about this distribution.</p>
<h2 id="expectation-of-vmf-distribution">Expectation of vMF distribution</h2>
<p>Let \(C\) be the normalising constant.</p>
<script type="math/tex; mode=display">\int_{||\mathbf{x}||_2=1}\exp(\kappa\mathbf{\mu}^T\mathbf{x})\,d\mathbf{x} = \frac{(2\pi)^{d/2} I_{d/2-1}(\kappa)}{\kappa^{d/2-1}}=C</script>
<p>Let \(\mathbf{y}=\kappa\mathbf{\mu}\). Since \(\mathbf{\mu}\) is a unit vector, \(\kappa=\sqrt{\mathbf{y}^T\mathbf{y}}\).</p>
<script type="math/tex; mode=display">\begin{align}
\frac{d\kappa}{d\mathbf{y}}=\frac{1}{2}\frac{2\mathbf{y}}{\sqrt{\mathbf{y}^T\mathbf{y}}}=\frac{\kappa\mathbf{\mu}}{\kappa}=\mathbf{\mu}
\end{align}</script>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
\int \mathbf{x} \exp(\mathbf{y}^T \mathbf{x}) d\mathbf{x} =& \frac{d}{d\mathbf{y}} \int \exp(\mathbf{y}^T \mathbf{x}) d\mathbf{x}\\
=& \frac{d\kappa}{d\mathbf{y}} \frac{d}{d\kappa} \int \exp(\mathbf{y}^T \mathbf{x}) d\mathbf{x} \\
=& \mathbf{\mu} \frac{d}{d\kappa} \frac{(2\pi)^{d/2} I_{d/2-1}(\kappa)}{\kappa^{d/2-1}} \\
=& \mathbf{\mu} \left(\frac{I'_{d/2-1}(\kappa)}{I_{d/2-1}(\kappa)} - \frac{d/2-1}{\kappa}\right) \frac{(2\pi)^{d/2} I_{d/2-1}(\kappa)}{\kappa^{d/2-1}}\\
E(\mathbf{x}) =& \frac{\int \mathbf{x} \exp(\mathbf{y}^T \mathbf{x}) d\mathbf{x}}{\int \exp(\mathbf{y}^T \mathbf{x}) d\mathbf{x}} = \mathbf{\mu} \left(\frac{I'_{d/2-1}(\kappa)}{I_{d/2-1}(\kappa)} - \frac{d/2-1}{\kappa}\right)\\
E(\mathbf{x}) =& \frac{I_{d/2}(\kappa)}{I_{d/2-1}(\kappa)}\mathbf{\mu}
\end{align} %]]></script>
<p>The last line uses the Bessel function recurrence <script type="math/tex">I'_{\nu}(\kappa)=I_{\nu+1}(\kappa)+\frac{\nu}{\kappa}I_{\nu}(\kappa)</script> with <script type="math/tex">\nu=d/2-1</script>.</p>
<p>This is an interesting result because it says that the mean of a von Mises-Fisher distribution is NOT \(\mathbf{\mu}\). It is in fact \(\mathbf{\mu}\) multiplied by a constant <script type="math/tex">\frac{I_{d/2}(\kappa)}{I_{d/2-1}(\kappa)}</script>, which lies in \((0,1)\). This makes sense if you think about a uniformly distributed vMF (\(\kappa\to 0\)): averaging all those vectors pointing in different directions gives something very close to 0. This ‘averaging’ of unit vectors is what makes the expected value not equal to \(\mathbf{\mu}\), but rather a vector pointing in the same direction with a smaller length.</p>
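<p>The shrinkage factor can be sanity-checked numerically. The sketch below is a plain Python illustration under stated assumptions: the Bessel function is evaluated with a truncated power series (40 terms is an arbitrary choice that works for moderate \(\kappa\)), and the function names are hypothetical. It compares the ratio against direct integration on the circle for \(d=2\), and against the closed form \(\coth\kappa - 1/\kappa\) for \(d=3\).</p>

```python
import math


def bessel_i(nu, kappa, terms=40):
    """Modified Bessel function of the first kind, via its power series."""
    return sum(
        (kappa / 2.0) ** (2 * m + nu) / (math.factorial(m) * math.gamma(m + nu + 1))
        for m in range(terms)
    )


def mean_length(d, kappa):
    """Length of E(x) for a vMF distribution in d dimensions: I_{d/2} / I_{d/2-1}."""
    nu = d / 2.0 - 1.0
    return bessel_i(nu + 1.0, kappa) / bessel_i(nu, kappa)


def circle_mean_length(kappa, n=2000):
    """E[cos(theta)] under a density proportional to exp(kappa cos(theta)), i.e. d = 2."""
    thetas = [2.0 * math.pi * i / n for i in range(n)]
    weights = [math.exp(kappa * math.cos(t)) for t in thetas]
    return sum(w * math.cos(t) for w, t in zip(weights, thetas)) / sum(weights)


for kappa in (0.5, 2.0, 5.0):
    closed_form_3d = 1.0 / math.tanh(kappa) - 1.0 / kappa
    print(kappa, mean_length(2, kappa), circle_mean_length(kappa),
          mean_length(3, kappa), closed_form_3d)
```

<p>For each \(\kappa\) the two \(d=2\) values should agree, as should the two \(d=3\) values, and all of them lie strictly between 0 and 1, growing towards 1 as \(\kappa\) increases.</p>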
<h2 id="covariance-of-von-mises-fisher-distribution">Covariance of von Mises-Fisher Distribution</h2>
<p>Using the same differential approach we can find <script type="math/tex">E(\mathbf{xx}^T)</script> and hence the covariance by using the identity <script type="math/tex">cov(\mathbf{x},\mathbf{x})=E(\mathbf{xx}^T)-E(\mathbf{x})E(\mathbf{x})^T</script>. Hence the covariance is,</p>
<script type="math/tex; mode=display">\begin{align}
\frac{h(\kappa)}{\kappa}\mathbf{I}+\left(1-2\frac{\nu+1}{\kappa}h(\kappa)-h(\kappa)^2\right)\mathbf{\mu}\mathbf{\mu}^T
\end{align}</script>
<p>where <script type="math/tex">h(\kappa)=\frac{I_{\nu+1}(\kappa)}{I_{\nu}(\kappa)}</script> and <script type="math/tex">\nu=d/2-1</script>.</p>
<p><a href="https://sachinruk.github.io/blog/von-Mises-Fisher/">von Mises-Fisher Distribution</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on August 10, 2015.</p>
<![CDATA[Sample Variance]]>https://sachinruk.github.io/blog/Sample-Variance2015-08-06T00:00:00+00:002015-08-06T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>People often ask why there is an “n-1” term when calculating the variance. Why not divide through by “n”? Most stats courses dismiss this question by saying, “oh, that’s because you lose a degree of freedom”. But what is a degree of freedom? In the video below we ignore this notion of degrees of freedom and show where the “n-1” comes from when calculating the sample variance.</p>
<iframe width="420" height="315" src="https://www.youtube.com/embed/xG8DK45H-5U" frameborder="0" allowfullscreen=""></iframe>
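<p>The effect of the “n-1” can also be seen by simulation. The following is a small sketch in plain Python, separate from the derivation in the video. The function name, sample size, number of trials and seed are arbitrary choices for illustration. It averages both estimators over many samples drawn from a distribution with true variance 1.</p>

```python
import random


def average_variance_estimates(n, trials, rng):
    """Average the divide-by-n and divide-by-(n-1) variance estimates over many samples."""
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        biased_total += sum_sq / n          # divide by n
        unbiased_total += sum_sq / (n - 1)  # divide by n - 1
    return biased_total / trials, unbiased_total / trials


biased, unbiased = average_variance_estimates(n=5, trials=20000, rng=random.Random(0))
print(biased, unbiased)
```

<p>With samples of size 5, dividing by “n” averages out near 0.8, which is (n-1)/n times the true variance, while dividing by “n-1” averages out near 1.</p>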
<p><a href="https://sachinruk.github.io/blog/Sample-Variance/">Sample Variance</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on August 06, 2015.</p>
<![CDATA[Normal Distribution]]>https://sachinruk.github.io/blog/Normal-Distribution2015-08-02T00:00:00+00:002015-08-02T00:00:00+00:00Sachin Abeywardanahttps://sachinruk.github.io
<p>No stats blog would be complete without a discussion of the Gaussian distribution. In the following video I discuss how to obtain the mean and variance of a Gaussian. You do need some knowledge of integration.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/1Hj9YCz52u8" frameborder="0" allowfullscreen=""></iframe>
<p>Oh, and this is a statistician’s “hello world”.</p>
<p><a href="https://sachinruk.github.io/blog/Normal-Distribution/">Normal Distribution</a> was originally published by Sachin Abeywardana at <a href="https://sachinruk.github.io">Sachin Abeywardana</a> on August 02, 2015.</p>