6 files changed, 62 insertions, 7 deletions
diff --git a/content/bibliography.tex b/content/bibliography.tex
index 6b41fa9..1521ac2 100644
--- a/content/bibliography.tex
+++ b/content/bibliography.tex
@@ -4,6 +4,8 @@
 
 \begin{thebibliography}{99} % Bibliography - this is intentionally simple in this template
 
+\printbibliography
+
 \end{thebibliography}
 
 %----------------------------------------------------------------------------------------
diff --git a/content/cmix.tex b/content/cmix.tex
index 0fe3797..855f439 100644
--- a/content/cmix.tex
+++ b/content/cmix.tex
@@ -136,10 +136,4 @@ The fact that CMix minimizes the amount of public-key cryptographic operations f
 
 Another advantage is that in theory the latency of the actual messages during the real-time phase should be lower than other mix networks. Again due to the lack of public-key cryptography during this phase.
 
-\section{}
-
-
-
-
-
 
diff --git a/content/implementation.tex b/content/implementation.tex
new file mode 100644
index 0000000..f2ad497
--- /dev/null
+++ b/content/implementation.tex
@@ -0,0 +1,33 @@
+\section{Elgamal in Cyclic Group and Elliptic Curve}
+
+The goal of this research is to see how differently these two ways of using Elgamal in this crypto scheme effect things like latency and troughput in CMix. But also doing this in as reusable way possible so that the implementation is of use for further research.
+
+Unfortunately elgamal is not a very popular encryption algorithm in modern crypto libraries. and even if it is supported with a high level interface it is very rigid and wouldn't allow for any secret sharing as needed in CMix. So because of this I created my own CMix library in ``C'' which abstracts the lower lever crypto primitives for both ``cyclic group'' and ``elliptic curve'' elgamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used in the application level easier. For the application level code I used ``C++''
+
+I ended up implementing the cryptographic operations with libgcrypt. This library provides the primitives I needed for both cyclic group and elliptic curve variants of the algorithm. This did not mean that the library took care of everything and some creativity was required to implement CMix especially for the elliptic curve algorithm.
+
+
+\subsection{Wire format}
+So for cmix to work we need to send group elements of the cryptographic algorithms from one node to the next. For cyclic group this is easy. They are just big integers with a max size and can be sent as arrays of a fixed length. For elliptic curve however there is no standard for sending elliptic curve points. So the choice was made to send both coordinates.
+
+This choice was made because libgcrypt makes it hard to find, for instance in Ed25519, the x coordinate corresponding to the y coordinate you just received. The only way I found to do so was convert the y coordinate to a hexadecimal string. reverse the string to have the big-endian representation. Prepend the byte ``0x40'' to the string and import it with ``gcry\_mpi\_ec\_decode\_point'' I tried to avoid this scenario but the documentation on the subject of how ed25519 points are handled when missing their x coordinate, and how to properly initialize the point with only Y and Z coordinates made me take the safe route. So note that there is probably room for improvement in the Elliptic curve implementation.
+
+So as to not worry network level things I used Google protobuf to serialize these messages further. This also means that you could have mutliple nodes which are implemented in different languages interact with each other.
+
+\subsection{Mapping strings to Ed25519 points}
+
+There is however the Koblitz's method\cite{Koblitz} This is a probabilistic method of finding a curve point. it starts by choosing a number that will be used as a  ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride, the result also has to be a group element so your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding x coordinate. If it has one you are done, If it doesn't you add one and check again. Until you try to add stride to the original number if you haven't found a suitable y coordinate beforehand this mapping algorithm fails.
+
+You can map a point back to the original string by dividing the y coordinate with the ``stride''
+
+The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the max distance between two consecutive suitable y coordinates could be, half of the possible group elements would be suitable, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make educated guesses and stay on the safe side of those guesses.
+
+Now to address the concern that the you divide your message space by your stride. By doing this you also effect your throughput. Atleast it would if you would optimally pack your messages in the 252 bits you have when using Ed25519. If you only use the lower 248 bits however, which gives you 31 byte messages. you have 4 bits to use as a stride. Which, anecdotally, seems to be enough. Ofcourse you can never know if this stride will work for all possible messages. But for the purpose of this benchmark it seems to work well enough. Maybe it is possible to find out how stride influences the chance of not finding a suitable y coordinate. But that is outside of the scope of this research.
+
+A couple of small but nice benefits of using $16$ as stride. Multiplication and division are bit shifts as $16$ is a power of $2$. And all the output in hexadecimal numbers are shifted by 1 character. This makes some of the debugging process easier on the mind, but are minor considerations compared to making the stride large enough to actually make the chance of an unmapable input as small as possible.
+
+\subsection{Debugging the cmix operations.}
+
+Debugging the cmix operations is hard. Intermediate results produced by nodes are hard to validate for the real 2048 bit numbers used in the cyclic group and the 256 bit numbers used in the elgamal example. Fortunately it is possible to use smaller cyclic groups and smaller curves to debug structural issues in the algorithms. For the cyclic groups this works like a charm, as the operations are pretty easy to do by hand or calculator, The operations for elliptic curve are a lot less intuitive, even for small curves. Especially with no known good library that implements these operations. This makes debugging some of the Mix primitives tedious and time consuming.
+
+Some tools that have been a great help in creating the implementation are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check for any object and array access outside of the allowed bounds in the code paths that are executed. This is no guarantee the application is completely memory safe, but it does ensure that any out of bounds access is detected. Even if the access was still technically in valid memory just nog part of the object or array. The leak tool also helped to keep an eye on the tokens passed around by the CMix library. Because the implementation language of the library was $C$ and there are 2 different implementations for the 2 different elgamal backends. There is a lot of token passing. These tokens need to be destructed but have no clear overarching owner. This makes it harder to keep track of the lifetime. So LeakSanitizer helped out tracking these lost tokens. Which allowed me to eventually run the benchmark for a high number of clients on a system with relatively low specs.
diff --git a/content/introduction.tex b/content/introduction.tex
index e69de29..ac26833 100644
--- a/content/introduction.tex
+++ b/content/introduction.tex
@@ -0,0 +1 @@
+\section{Introduction}
+\ No newline at end of file
diff --git a/thesis.bib b/thesis.bib
index e69de29..8409e44 100644
--- a/thesis.bib
+++ b/thesis.bib
@@ -0,0 +1,23 @@
+@online{ASan,
+	title = {{AddressSanitizer} documentation},
+	author = {LLVM},
+	url = {http://clang.llvm.org/docs/AddressSanitizer.html},
+	urldate = {2017-04-16}
+}
+
+@online{LSan,
+	title = {{LeakSanitizer} documentation},
+	author = {LLVM},
+	url = {http://clang.llvm.org/docs/LeakSanitizer.html},
+	urldate = {2017-04-16}
+}
+
+@article{Koblitz,
+	title={Encoding and decoding of a message in the implementation of Elliptic Curve cryptography using Koblitz's method},
+	author={Bh, Padma and Chandravathi, D and Roja, P Prapoorna},
+	journal={International Journal of Computer Science and Engineering},
+	volume={2},
+	number={5},
+	pages={1904--1907},
+	fyear={2010}
+}
+\ No newline at end of file
diff --git a/thesis.tex b/thesis.tex
index a9f48b0..7665cb9 100644
--- a/thesis.tex
+++ b/thesis.tex
@@ -64,7 +64,7 @@
 
 \usepackage[backend=biber]{biblatex}
 
-\bibliography{thesis} 
+\addbibresource{thesis.bib} 
 
 \input{content/title}
 
@@ -77,6 +77,8 @@
 
 \input{content/cmix}
 
+\input{content/implementation}
+
 \input{content/bibliography}
 
 \end{document}
 \ No newline at end of file