Diffstat (limited to 'content')
-rw-r--r--  content/cmix.tex           | 58
-rw-r--r--  content/cmix_additions.tex |  8
-rw-r--r--  content/conclusion.tex     |  1
-rw-r--r--  content/discussion.tex     | 48
-rw-r--r--  content/implementation.tex | 38
-rw-r--r--  content/results.tex        | 46
6 files changed, 117 insertions, 82 deletions
diff --git a/content/cmix.tex b/content/cmix.tex
index 855f439..1597057 100644
--- a/content/cmix.tex
+++ b/content/cmix.tex
@@ -1,6 +1,6 @@
-\section{The CMix network}
+\section{The \cmix network}
-This section will explain how the CMix network works, it's starts by explaining what a mix network is. Then it explains what the CMix mix network does differently and why.
+This section explains how the \cmix network works. It starts by explaining what a mix network is, and then explains what the \cmix mix network does differently and why.
\newcommand{\NODE}[1]{
$node_{#1}$
@@ -8,25 +8,22 @@ $node_{#1}$
\subsection{Mix networks}
-%TODO3: Find and quote David Chaums original paper about mix networks.
-%TODO4: Find a paper about TOR.
+\cmix is a new anonymity mix network\cite{cMix}. Just like any other mix network, it aims to provide anonymity by hiding the timing information of messages.
-%TODO1: find out how to credit and who.
-CMix is a new anonymity mix network developed by *1*. Just like any other mix network it aims to provide anonymity by hiding timing information of messages.
+The first mix network was developed by David Chaum \cite{chaum1981untraceable}. This mix network consists of $N$ nodes, each of which has a public/private key pair. A user that wants to use the mix network has to package their message as follows: they prepend the identifier of the destination to the message and encrypt it with the public key of \NODE{N-1}. They then prepend the identifier of \NODE{N-1} and encrypt the result with the public key of \NODE{N-2}. The user does this for all the nodes in the network, working backwards, and sends the result to the first node.
-%TODO2: check if we HAVE to sign the message in the original mixnetwork.
-The first mix network was developed by David Chaum. this mix network consists of $N$ nodes. Each of these nodes have a public/private key pair. Users that want to use the mix network have to package their message as follows, it prepends the the identifier of the destination to the message and encrypts it with the public key of \NODE{N-1}. It then prepends the identifier of \NODE{N-1} and encrypts it with the public key of \NODE{N-2} He does this for all the nodes in the network working backwards and sends it to the first node.
-
-This first node can now unpack the message it receives and retreive an identifier for the next node and a encrypted message which only \NODE{N+1} can decrypt. The last node can decrypt the original message which contains its destination and sends it the end user. Up until this point this is roughly how the TOR anonymity network operates, but there is a difference. The first node in this network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is achieved it will decrypt all the messages and randomly shuffle the order they were in, otherwise known as mixing. It then sends them to the next node.
+This first node can now unpack the message it receives and retrieve an identifier for the next node and an encrypted message which only the next node can decrypt. The last node can decrypt the original message, which contains its destination, and sends it to the end user. Up until this point this is roughly how the Tor\cite{goldschlag1999onion} anonymity network operates, but there is a difference. The first node in this network does not immediately send out the messages it receives. The node first collects $P$ messages. When this threshold is reached it decrypts all the messages and randomly shuffles the order they were in, otherwise known as mixing. It then sends them to the next node.
This introduces an arbitrary amount of delay on client messages. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation, so it cannot keep track of a specific message. This is what grants anonymity in mix networks. The downside, however, is the number of public-key operations the client and nodes need to perform to send a single message. While this is not really an issue for modern desktop computers or low-volume traffic, it is an issue for mobile phones and low-power devices.
-\subsection{The CMix protocol}
-A CMix network is a fixed network consisting of $N$ nodes. This means there is a fixed network order and which computer represents which node is known in advance. It uses elgamal encryption as the homomorphic encryption property of elgamal is used extensively in the protocol.
+\subsection{The \cmix protocol}
+A \cmix network is a fixed network consisting of $N$ nodes: there is a fixed network order, and all clients know which computer represents each node in the network. It uses ElGamal encryption and relies heavily on the homomorphic property of ElGamal.
+
+The \cmix network operates in three phases: initialization, precomputation, and realtime. The initialization phase only performs some key setup. This is the only time clients need to do public-key operations, as they have to establish a shared key with every node using Diffie-Hellman. Because of this, all communication between the nodes, and between the clients and the nodes, has to be authenticated; one way to do this is to use SSL connections between the nodes and between the clients and the nodes.
-The CMix network operates in 3 phases. Initialization, precomputation and real-time. The Initialization phase is only to do some key setup. This is the only time clients need to do public key operations as they have to establish a shared key every node using Diffie Hellman. This is why all communcations between the nodes have to be authenticated, so one way to do this is having SSL connections between the nodes and between the clients and the nodes.
+\subsection{Initialization phase}
-During the initialization phase the first node sends his public key to the next node which multiplies his own public key with the value he received and sends it to the next. The last nodes sends it back to the first node which stores this final value and sends it on the next node which does the same. Now all nodes have the product of the all the nodes' public keys.
+During the initialization phase the first node sends its public key to the next node, which multiplies its own public key with the value it received and sends the result on to the next node. The last node sends the result back to the first node, which stores this final value and sends it on to the next node, which does the same. Now all nodes have the product of all the nodes' public keys.
$$ D = \prod_{i=0}^{N-1} d_i $$
\vspace{-1em}
@@ -36,7 +33,7 @@ where:
\item $E$ is the shared key between nodes
\end{itemize}
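+To make this ring pass concrete, the following toy sketch (with a small made-up prime modulus and made-up public keys, not the actual implementation) accumulates the running product of the public keys as it travels along the nodes, after which the final value $D$ is distributed to every node:
+
+\begin{verbatim}
+#include <stdio.h>
+#include <stdint.h>
+
+#define N_NODES 3
+#define P_MOD   1019u   /* toy prime modulus, stand-in for the real group */
+
+int main(void) {
+    uint64_t d[N_NODES] = {17, 389, 731};  /* hypothetical node public keys */
+    uint64_t D = 1;
+
+    /* first pass: each node multiplies its public key into the running
+       product and forwards the result to the next node */
+    for (int i = 0; i < N_NODES; i++)
+        D = (D * d[i]) % P_MOD;
+
+    /* second pass: the final product is sent around once more so that
+       every node can store D = prod(d_i) mod p */
+    for (int i = 0; i < N_NODES; i++)
+        printf("node %d stores D = %llu\n", i, (unsigned long long)D);
+    return 0;
+}
+\end{verbatim}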
-When a client connects, the client agrees upon a key using Diffie-hellman key-exchange with each node. These keys will be used to send messages into the network, which will be used during the realtime-phase
+When a client connects, it agrees upon a key with each node using a Diffie-Hellman key exchange. These keys will be used to send messages into the network during the realtime phase.
$$ K_c = \{k_0, k_1, \ldots, k_{N-1}\} $$
\vspace{-1em}
@@ -46,7 +43,7 @@ where:
\item $K_c$ is the vector of keys stored by the client
\end{itemize}
-During any part of the protocol a client may send a message into the network. When using multiplicative group elgamal, It does this by multiplying a plain-text message with the shared keys in $K_c$. Then it sends this result to the first node. When using elliptic curve however the group elements, such as messages and shared keys, and need to be added.
+During any part of the protocol a client may send a message into the network. When using multiplicative-group ElGamal, it does this by multiplying a plain-text message with the shared keys in $K_c$ and sending the result to the first node. When using elliptic curves, however, the group elements, such as messages and shared keys, need to be added instead.
\begin{equation}
Message = M * k_0 * k_1 * \cdots * k_{N-1} \label{form:message}
@@ -58,17 +55,19 @@ where:
\item $M$ is the plain-text message
\end{itemize}
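+To see why this blinding can later be removed without any public-key operations, it helps to look ahead to the realtime phase (section \ref{sec:realpre}). As a sketch for a single message slot, write $k_j$ for the key the client shares with node $j$ and $r_j$ for node $j$'s precomputed random value for that slot; each node multiplies the incoming value with $k_j^{-1} * r_j$, so the client keys cancel out and are replaced by the precomputed values:
+
+$$ (M * k_0 * \cdots * k_{N-1}) * \prod_{j=0}^{N-1} k_j^{-1} * r_j = M * r_0 * \cdots * r_{N-1} $$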
-The Precomputation phase can be started as soon $P$, the minimum number of messages to have accumulated in \NODE{0}, is known. The first node generates $P$ random values $R$ and $S$. $R$ is a vector of $P$ values as is $S$. When encrypting or multiplying these vectors the component-wise encryption or multiplication is implied. It also generates a random permutation function $\pi$ which can randomly shuffle $P$ messages.
+\subsection{Precomputation phase}
-The Precomputation can be split up into 3 phases.
+The precomputation phase can be started as soon as $P$, the minimum number of messages that have to accumulate in \NODE{0}, is known. The first node generates two vectors $R$ and $S$, each consisting of $P$ random values. When encrypting or multiplying these vectors, component-wise encryption or multiplication is implied. It also generates a random permutation function $\pi$ which can randomly shuffle $P$ messages.
-In the first phase the first node encrypts his $R$ with the key $E$ and sends it to the $node_{next}$. $Node_{next}$ also generates $P$ random values $R$ and $S$ and encrypts his $R$ with the key $E$. $Node_{next}$ multiplies it's encryption result with the values received by the first node, and by the homomorphic encryption property of elgamal the result of this multiplication is the encryption of the of the multiplication of the two original $R$ vectors. It then sends the result of the multiplication to the next node which also encrypts his $R$ and multiplies and sends in on.
+The precomputation can be split up into three steps.
+
+In the first step the first node encrypts its $R$ with the key $E$ and sends it to $node_{next}$. $Node_{next}$ also generates vectors $R$ and $S$ of $P$ random values and encrypts its $R$ with the key $E$. $Node_{next}$ multiplies its encryption result with the values received from the first node, and by the homomorphic property of ElGamal the result of this multiplication is the encryption of the product of the two original $R$ vectors. It then sends the result on to the next node, which also encrypts its $R$, multiplies it with its input, and sends it on.
$$ \mathcal{E}_E(R_0 * R_1) = \mathcal{E}_E(R_0) * \mathcal{E}_E(R_1) $$
\vspace{-1em}
where:
\begin{itemize}[label=]
-\item $\mathcal{E}_E$ is elgamal encryption under key $E$
+\item $\mathcal{E}_E$ is ElGamal encryption under key $E$
\item $R_i$ is the R vector of node $i$
\end{itemize}
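+The homomorphic property itself is easy to check with a toy example. The sketch below uses ElGamal over a small multiplicative group with made-up parameters and made-up $R$ values (so it illustrates the property only; it is not the real 2048 bit group nor the actual library code); multiplying two ciphertexts component-wise yields a valid encryption of the product of the two plaintexts:
+
+\begin{verbatim}
+#include <stdio.h>
+#include <stdint.h>
+
+/* toy parameters: small prime and generator; NOT secure, illustration only */
+#define MOD_P 467u
+#define GEN_G 2u
+
+static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
+    uint64_t r = 1; b %= m;
+    while (e) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
+    return r;
+}
+
+typedef struct { uint64_t c1, c2; } cipher;
+
+/* ElGamal encryption under public key h = g^x: (g^r, m * h^r) */
+static cipher enc(uint64_t m, uint64_t h, uint64_t r) {
+    cipher c = { powmod(GEN_G, r, MOD_P), m * powmod(h, r, MOD_P) % MOD_P };
+    return c;
+}
+
+/* decryption: m = c2 * c1^(-x) = c2 * c1^(p-1-x) mod p */
+static uint64_t dec(cipher c, uint64_t x) {
+    return c.c2 * powmod(c.c1, MOD_P - 1 - x, MOD_P) % MOD_P;
+}
+
+int main(void) {
+    uint64_t x = 127;                     /* made-up private key */
+    uint64_t h = powmod(GEN_G, x, MOD_P); /* corresponding public key */
+    uint64_t R0 = 11, R1 = 42;            /* two toy "R" values */
+
+    cipher e0 = enc(R0, h, 33);           /* fixed randomness for the example */
+    cipher e1 = enc(R1, h, 54);
+
+    /* component-wise product of the two ciphertexts */
+    cipher prod = { e0.c1 * e1.c1 % MOD_P, e0.c2 * e1.c2 % MOD_P };
+
+    /* decrypting the product yields R0 * R1 mod p */
+    printf("dec(E(R0)*E(R1)) = %llu, R0*R1 mod p = %llu\n",
+           (unsigned long long)dec(prod, x),
+           (unsigned long long)(R0 * R1 % MOD_P));
+    return 0;
+}
+\end{verbatim}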
@@ -86,15 +85,18 @@ Whenever the result reaches the first node again it uses its permutation functio
\vspace{-1em}
where:
\begin{itemize}[label=]
-\item $\mathcal{E}_E$ is elgamal encryption under key $E$
+\item $\mathcal{E}_E$ is ElGamal encryption under key $E$
\item $R_i$ is the R vector of node $i$
\item $S_i$ is the S vector of node $i$
\item $\pi_i$ is the permutation function of node $i$
\end{itemize}
-The third part of the precomputation is decrypting this value, and every node has a part of the decryption it can do. So each node calculates its decryption share and multiplies this with the value it receives. Which is elgamal decryption, but as the key is spread over all the nodes the result is still encrypted with keys of the other nodes. The result is passed over to the next node in the network. The last node in the network stores the value for use in the real-time phase. This value is the decrypted value of formula \eqref{form:EPiRS}
+The third step of the precomputation is decrypting this value, and every node holds a part of the decryption key. Each node calculates its decryption share and multiplies it with the value it receives. This is ElGamal decryption, but as the key is spread over all the nodes the intermediate result is still encrypted with the keys of the other nodes. The result is passed on to the next node in the network. The last node stores the final value for use in the realtime phase; this value is the decrypted value of formula \eqref{form:EPiRS}.
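+As a sketch of why these decryption shares combine correctly, assume the combined key is the product of the node keys, so the private keys add up to $x = \sum_i x_i$. For an ElGamal ciphertext $(c_1, c_2) = (g^r, m * g^{r x})$ node $i$ can take $c_1^{-x_i}$ as its decryption share, and multiplying all shares into $c_2$ removes the encryption:
+
+$$ c_2 * \prod_{i=0}^{N-1} c_1^{-x_i} = m * g^{r x} * g^{-r \sum_i x_i} = m $$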
+
+\subsection{Realtime phase}
+\label{sec:realpre}
-Whenever the first node received $P$ messages as described in formula \eqref{form:message}, and the precomputation has ended, the realtime-phase can begin. The realtime-phase can also be split into 3 different stages.
+Whenever the first node has received $P$ messages as described in formula \eqref{form:message}, and the precomputation has ended, the realtime phase can begin. The realtime phase can also be split into three stages.
In the first stage each node multiplies the message with the inverse of the key it agreed upon in the initialization phase. It then multiplies that result with $r_i$, where $i$ is the position of the message in the buffer, effectively replacing the keys in $K_c$ with one of the values in $R$. The result is not yet permuted, so the messages stay in the same place. The result gets passed to the next node, which does the same steps. So for all the messages in the input we do:
@@ -126,14 +128,14 @@ where:
\item $S_i$ is the S vector of node $i$
\end{itemize}
-The last phase is to remove the R and S vectors from the input. The last node stored decryption of formula \ref{form:EPiRS}. The last node calculates the inverse of this group element and multiplies it with the result of \ref{form:PiMRS}. This will cancel out all permuted R and S values, and the last node wil be left with $M$.
+The last phase is to remove the $R$ and $S$ vectors from the input. The last node stored the decryption of formula \eqref{form:EPiRS}. The last node calculates the inverse of this group element and multiplies it with the result of \eqref{form:PiMRS}. This will cancel out all permuted $R$ and $S$ values, and the last node will be left with the plain-text messages $M$.
-Very critical to see is that during the realtime phase no public-key crypto is being done, only some multiplications. And the clients, outside of the initialization phase, never have to do any public-key crypto. It is also possible for client and servers to update the shared key using the old key as a seed. So in theory a client only needs to do the key agreement once per network.
+It is critical to note that during the realtime phase no public-key cryptography is performed, only some multiplications. The clients, outside of the initialization phase, never have to do any public-key cryptography either. It is also possible for clients and servers to update the shared key using the old key as a seed, so in theory a client only needs to do the key agreement once per network.
-\subsection{Advantages of CMix}
+\subsection{Advantages of \cmix}
-The fact that CMix minimizes the amount of public-key cryptographic operations for the client, makes it more appealing for low power devices. So devices that cannot draw high amounts of power all the time due to battery constrains or NFC power limits. Also mobile phones of which you want to conserve battery power.
+The fact that \cmix minimizes the number of public-key cryptographic operations for the client makes it more appealing for low-power devices: devices that cannot draw high amounts of power all the time due to battery constraints or NFC power limits, and mobile phones whose battery power you want to conserve.
-Another advantage is that in theory the latency of the actual messages during the real-time phase should be lower than other mix networks. Again due to the lack of public-key cryptography during this phase.
+Another advantage is that in theory the latency of the actual messages during the realtime phase should be lower than in other mix networks, again due to the lack of public-key cryptography during this phase.
diff --git a/content/cmix_additions.tex b/content/cmix_additions.tex
index feb411e..2443d19 100644
--- a/content/cmix_additions.tex
+++ b/content/cmix_additions.tex
@@ -1,12 +1,12 @@
-\section{Cmix additions}
+\section{\cmix additions}
-So the base protocol still has some issues, thankfully these issues can be addressed at the cost of some speed and clarity. Because it would not be safe to use CMix in the wild without these attack mitigations we implemented the extra messages needed. This makes for a more realistic benchmark.
+The base protocol still has some issues; thankfully these can be addressed at the cost of some speed and clarity. Because it would not be safe to use \cmix in the wild without these attack mitigations, we implemented the extra messages needed. This makes for a more realistic benchmark.
\subsection{Tagging attack}
\label{sec:tagging}
In a tagging attack an adversary changes a message slightly and later detects this tag and removes it; otherwise the tampering would not go undetected. This leaks information about the origin of the message and about which slot it came from. The easiest variant of this would be if a malicious person had control over the last node.
-When you control the last node you can change the output of realtime procomputation phase slightly by when combining your nodes $r$ value for this slot with the input. you either combine the input with $r * i$, for cyclic group elgamal, or $r + p$, for elliptic curve implementations. After all computations are done you have the plaintexts that you want to send out. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. You now now the slot this value used to be in and you can reverse your tag by doing the reverse operation. This is undetectable in the network and thus compromise the network. Note that the last node is not special in CMix, if all but one Node is malicious the protocol should still be safe.
+When you control the last node you can change the output of the realtime precomputation phase slightly when combining your node's $r$ value for a slot with the input: you combine the input with $r * i$, for cyclic group ElGamal, or with $r + p$, for elliptic curve implementations. After all computations are done you have the plaintexts that you want to send out. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. You now know the slot this value used to be in and you can reverse your tag by doing the reverse operation. This is undetectable in the network and thus compromises the network. Note that the last node is not special in \cmix; if all but one node is malicious the protocol should still be safe.
-To stop this attack we need to change the protocol. First we need to change the third step of the precomputation phase. Instead of sending the decryption shares of each of the nodes to the next, we send a hash, a commitment to our decryption shares to the next node. The nodes keep the decryption shares to themselves, and will use them seperately in the realtime phase. The last node also includes a hash of the current mix result. So the hash of $\pi(R)*S$ for all slots. This makes that an adversary can no longer tamper with the $r$ values in the realtime phase, which caused the tagging attack to be possible in the first place.
+To stop this attack we need to change the protocol. First we change the third step of the precomputation phase. Instead of sending the decryption shares of each of the nodes to the next, we send a hash, a commitment to our decryption shares, to the next node. The nodes keep the decryption shares to themselves and will use them separately in the realtime phase. The last node also includes a hash of the current mix result, i.e. the hash of $\pi(R)*S$ for all slots. This means an adversary can no longer tamper with the $r$ values in the realtime phase, which is what made the tagging attack possible in the first place.
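+A minimal sketch of the commit-then-verify idea is shown below. It uses FNV-1a purely as a stand-in for a real cryptographic hash and a made-up share value, and it does not mirror the exact protocol messages; it only illustrates that a node which later uses a different value than it committed to can be caught:
+
+\begin{verbatim}
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+
+/* FNV-1a, used here ONLY as a placeholder for a cryptographic hash */
+static uint64_t hash64(const void *data, size_t len) {
+    const unsigned char *p = data;
+    uint64_t h = 1469598103934665603ull;
+    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 1099511628211ull; }
+    return h;
+}
+
+int main(void) {
+    /* precomputation: a node commits to its (made-up) decryption share */
+    uint64_t share = 123456789;
+    uint64_t commitment = hash64(&share, sizeof share);
+
+    /* realtime: the node reveals a share; the others recompute the hash */
+    uint64_t revealed = 123456789;          /* try tampering with this */
+    int ok = hash64(&revealed, sizeof revealed) == commitment;
+
+    printf("share accepted: %s\n", ok ? "yes" : "no");
+    return 0;
+}
+\end{verbatim}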
diff --git a/content/conclusion.tex b/content/conclusion.tex
new file mode 100644
index 0000000..94c6701
--- /dev/null
+++ b/content/conclusion.tex
@@ -0,0 +1 @@
+\section{Conclusion} \ No newline at end of file
diff --git a/content/discussion.tex b/content/discussion.tex
index 3de03e9..44a7377 100644
--- a/content/discussion.tex
+++ b/content/discussion.tex
@@ -6,19 +6,19 @@
In this section \ec and \mg will refer to the two different implementations that we compare. They stand for elliptic curve and multiplicative group respectively.
-So lets first talk about what our expectations are. The size of the messages being sent are different. The ed25519 implementation can send messages up to $31$ bytes. The 2048 bit multiplicative group can send up to $255$ bytes of data. This implementation uses an prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. this means that the ed25519 implementation only sends $11$ bytes of payload versus $235$ bytes of payload for the multiplicative group implementation.
+So let us first talk about what our expectations are. The size of the messages being sent is different. The ed25519 implementation can send messages of up to $31$ bytes. The 2048 bit multiplicative group can send up to $255$ bytes of data. Both implementations prefix the message data with the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. This means that the ed25519 implementation only sends $11$ bytes of payload versus $235$ bytes of payload for the multiplicative group implementation.
-However there are ways around this, by doing multiple mixes of the ed25519 ein one single cMix run, which in turn can be trivially parallelized. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes of the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to a factor of $8$.
+However there are ways around this. By doing 8 mixes of the ed25519 implementation in one single \cmix run, with the same permutation functions but with different $R$ and $S$, we can significantly boost the effective payload. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes for the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to be a factor of $8$.
-\subsection{precomputation}
+Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison.
-If this is the case we would hope that the ed25519 implementation is atleast 8 times faster than the multiplicative group. Unfortunately this is not the case for the precomputation steps of the algorithm.
+So by the above reasoning, we should hope to see that \ec is 8 times faster than \mg in all respects, just by virtue of only having to handle $\frac{1}{8}$ of the data. We hope to be even a bit faster than that.
-\subsubsection{prepre}
+\subsection{Precomputation precomputation}
First of all, this step does not seem to differ between nodes: all \ec nodes spend around $3.67s$ and the \mg nodes spend about $19.19s$, which is $5.23$ times faster, not the 8 times we would expect.
-This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, if that number is a valid $y$ coordinate. For ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid will be $.5$. Because of this you need to, on average, generate and check twice as many coordinates as the actual number of random curve points you want \ref{eq:prerpe}.
+This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, if that number is a valid $y$ coordinate. For Ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid will be $0.5$. Because of this you need to, on average, generate and check twice as many candidates as the actual number of random curve points you want; see equation \eqref{eq:prepre}.
\begin{equation}
\label{eq:prepre}
@@ -27,25 +27,45 @@ This can easily be explained by the way one generates random curve points on an
Therefore we only expect to be 4 times faster; compared to that, we are still slightly faster than the \mg counterpart, but \ec is slower on a fundamental level because of the extra calculations needed to generate random curve points.
-\subsubsection{premix}
+\subsection{Precomputation mixing}
+\label{sec:premix}
The values for the premix step are very close to our ideal 8 times ratio, at $6.79$ times faster. There is no fundamental explanation why it should be slower than expected, but there is one implementation detail that might have had its influence here.
-When serializing elliptic curve points, there are several choices you can make about the format. You can send both affine coordinates. Only the defining affine coordinate, or the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate. Which is relatively slow. The reason for this is that the raw affine $x$ and $y$ coordinate are easier to inspect and debug. Trying to be faster by only sending the defining affine coordinate would still require the inverse of $Z$ to be calculated so there is not much to gain there. On top of that there is no easy way to find the corresponding $x$ for any given $y$ in the \emph{libgcrypt} elliptic curve interface.
+When serializing elliptic curve points, there are several choices you can make about the format: you can send both affine coordinates, send only the $y$ coordinate (for ed25519) and derive the $x$ coordinate later, or send the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate, which is relatively slow. The reason for this choice is that the raw affine $x$ and $y$ coordinates are easier to inspect and debug. Trying to be faster by only sending the defining affine coordinate would still require the inverse of $Z$ to be calculated, so there is not much to gain there. On top of that there is no easy way to find the corresponding $x$ for any given $y$ in the \emph{libgcrypt} elliptic curve interface.
-There is time to gain if we switch from affine to projected coordinate system within the network format. it would mean sending more data but the conversions from network format to internal would be much faster. This would avoid the inversion of $Z$ and this would be the big time saver for many of the steps even the $prepre$ step we already discussed. My expectation is that it would make this step at least 8 times faster than the \mg implementation making it comparable in performance.
+There is time to gain if we switch from the affine to the projective coordinate system within the network format. It would mean sending more data, but the conversions from network format to internal format would be much faster. This would avoid the inversion of $Z$, which would be a big time saver for many of the steps, even the $prepre$ step we already discussed. My expectation is that it would make this step at least 8 times faster than the \mg implementation, making it comparable in performance.
-\subsubsection{prepost}
+\subsection{Precomputation postcomputation}
-So here in the precomputaion-postcompuation phase the elliptic curve algorithm pays off. with $1.17s$ vs $20.16s$ a $17.26$ times speed increase. This is probably because the inversions in \mg are much more computational intensive as inversions in \ec.
+So here, in the precomputation postcomputation phase, the elliptic curve algorithm pays off: with $1.17s$ vs $20.16s$, a $17.26$ times speed increase. This is probably because the inversions in \mg are much more computationally intensive than inversions in \ec. This does not mean that they are free in \ec, as discussed in section \ref{sec:premix}.
+
+\subsection{Realtime precomputation}
-\subsubsection{realpre}
+Here is a large time saver: \ec is $81.90$ times faster. Additionally, it is a saving in the realtime portion of the protocol, which will massively impact the latency perceived by the clients. So if that is your goal, then \ec seems to be the way to go.
+
+Why \ec is so much faster is uncertain. This step consists of removing the shared key the user and node agreed upon during initialization, and adding the $r$ value of the node for this client slot, see section \ref{sec:realpre}. It could be related to the fact that no encryption is applied at this stage of the protocol, and perhaps the elliptic curve group elements are so much smaller that the operations save disproportionately more time than in the \mg algorithm.
-\subsubsection{realmix and realpost}
+\subsection{Realtime mix and realtime postcomputation}
+The problem with these 2 phases is that their duration is relatively short, and therefore it is less reliable to derive any information from them. For instance, the realtime mix operation seems to be slower in \ec. This could be the case and could again be explained by the serialization choices made for elliptic curve points. However, the difference is so small with respect to the clock resolution that it is risky to claim this. It could be that these phases call more functions and we are just measuring the indirection overhead of the \cmix library implementation.
+
+It could also be due to memory fragmentation. The \ec implementation uses smaller elements; allocating an element-sized buffer has a bigger chance to succeed within just-freed memory, but subsequent buffers might not find free bytes in the neighborhood. I try to minimize this by allocating all the elements I need in one go. However, I don't have much control over, for instance, the protobuf library. I use the arena allocator from libprotobuf\cite{protobuf-arena} but have no idea how large these arenas are and what the cache locality is for the \cmix network messages.
+
+In short, there are not many conclusions we can derive from these results.
+
+These phases are similar between the 2 implementations. The one interesting thing here, though, is the realtime postcomputation in node 3 of both \ec and \mg. It takes quite a bit more time to remove the $R$ and $S$ than I expected. That being said, it seems to take about 3 times as long for \mg as for \ec, which gives us some insight into how long it takes to invert an element in both implementations.
+
+\subsection{Precomputation phase}
+
+The complete precomputation phase on average takes $22.98$ seconds for \ec versus $175.49$ seconds for \mg. So for elliptic curve to be worth it we would need to make it a bit faster than it now is, but I fully believe that is possible. Again, removing the unnecessary inversions would help. Using a different cryptographic library as backend for the ed25519 cryptography would have a much greater impact. However useful libgcrypt may have been for experimenting and debugging certain aspects of the application, its cryptographic routines might not be as optimized as those of, say, ed25519-donna. That being said, ed25519-donna does not implement all the primitives I need to implement all the \cmix primitives. But creating a custom ed25519 library, or using one that does support all the primitives \cmix needs, would make a huge impact on the performance.
+
+This also holds for the multiplicative group implementation in libgcrypt, but I expect that both algorithms have the same level of optimization within the library.
+
+\subsection{Realtime phase}
-These phases are similar. The \ec variant is relatively slow, but this might be related to the inversions that are incurred whenever the software serializes the elliptic curve points to the network format.
+The complete realtime phase on average takes $2.06$ seconds for \ec versus $75.81$ seconds for \mg. So in the realtime phase using ed25519 is really worth it.
diff --git a/content/implementation.tex b/content/implementation.tex
index 76b7275..3188d65 100644
--- a/content/implementation.tex
+++ b/content/implementation.tex
@@ -1,34 +1,46 @@
-\section{Elgamal in Cyclic Group and Elliptic Curve}
+\section{Implementation}
-The goal of this research is to see how differently these two ways of using Elgamal in this crypto scheme effect things like latency and troughput in CMix. But also doing this in as reusable way possible so that the implementation is of use for further research.
+A large part of this research is actually making an implementation of the protocol in such a way that it:
-This means I will be trying to find cryptographic libraries that do most of the work while I focus on getting a uniform API over different backends and the correct implementation of said backends. Unfortunately elgamal is not a very popular encryption algorithm in modern crypto libraries. and even if it is supported with a high level interface it is very rigid and wouldn't allow for any secret sharing as needed in CMix. Because of this I needed access to lower level cryptographic primitives. Which I found in the libgcrypt library. The next step is to create my own CMix library in ``C'' which abstracts the lower lever crypto primitives for both ``cyclic group'' and ``elliptic curve'' elgamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used in the application level easier. For the application level code I used ``C++'', but in theory this should be easily swapped for a different application language.
+\begin{itemize}
+ \item Supports multiple but subtly different cryptographic back-ends.
+ \item Is debuggable and reusable.
+ \item Allows back-ends to be comparably benchmarked.
+\end{itemize}
-However using libgcrypt makes it possible to implement CMix, it doesn't mean it was easily done. The library still lacks some convenience functions that needed workarounds in the eventual implementation. This is especially the case for the elliptic curve backend. Some specific problems will be discussed later.
+The following section discusses some implementation-specific matters, to explain how I achieved those goals or why something might need attention in future research. For more information on where to find the implementation see appendix \ref{app:impl}.
+
+\subsection{ElGamal in Cyclic Group and Elliptic Curve}
+
+The goal of this research is to see how these two ways of using ElGamal in this cryptographic scheme affect things like latency and throughput in \cmix, while also doing this in as reusable a way as possible so that the implementation is of use for further research.
+
+This means I will be trying to find cryptographic libraries that do most of the work while I focus on getting a uniform API over different back-ends and the correct implementation of said back-ends. Unfortunately ElGamal is not a very popular encryption algorithm in modern cryptographic libraries, and even if it is supported with a high level interface it is very rigid and wouldn't allow for the secret sharing needed in \cmix. Because of this I needed access to lower level cryptographic primitives, which I found in the libgcrypt library. The next step was to create my own \cmix library in ``C'' which abstracts the lower level cryptographic primitives for both ``cyclic group'' and ``elliptic curve'' ElGamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used at the application level easier. For the application level code I used ``C++'', but in theory this should be easily swapped for a different application language.
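+To illustrate what such a uniform API could look like, here is a hypothetical header sketch; the names and signatures are invented for illustration and are not the actual library interface. The application only ever sees opaque group-element tokens, and the back-end decides whether ``combine'' means a modular multiplication or a curve point addition:
+
+\begin{verbatim}
+/* hypothetical cmix back-end interface -- illustrative only */
+#include <stddef.h>
+
+typedef struct cmix_element cmix_element;   /* opaque group element token */
+
+typedef struct {
+    cmix_element *(*element_new)(void);
+    void          (*element_free)(cmix_element *e);
+    /* group operation: modular multiplication for the cyclic-group
+       back-end, point addition for the elliptic-curve back-end */
+    void          (*combine)(cmix_element *out,
+                             const cmix_element *a, const cmix_element *b);
+    void          (*invert)(cmix_element *out, const cmix_element *a);
+    size_t        (*serialize)(const cmix_element *e, unsigned char *buf,
+                               size_t buflen);
+} cmix_backend;
+
+/* the application would request one of the two back-ends at start-up */
+const cmix_backend *cmix_backend_cyclic_group(void);
+const cmix_backend *cmix_backend_ed25519(void);
+\end{verbatim}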
+
+However, while using libgcrypt makes it possible to implement \cmix, that doesn't mean it was easily done. The library still lacks some convenience functions that needed workarounds in the eventual implementation. This is especially the case for the elliptic curve back-end. Some specific problems will be discussed later.
\subsection{Wire format}
-For cmix to work, we need to send group elements of the cryptographic algorithms from one node to the next. For cyclic group this is easy. They are just big integers with a max size and can be sent as arrays of a fixed length.
+For \cmix to work, we need to send group elements of the cryptographic algorithms from one node to the next. For the cyclic group this is easy: they are just big integers with a maximum size and can be sent as arrays of fixed length.
When serializing elliptic curve points, there are several choices. You can send both affine coordinates. You can send only the $y$ coordinate, in the case of ed25519, which means you need to calculate one of the corresponding $x$ values when de-serializing. Or you can send the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate, which is relatively slow.
The main reason for this is that the raw affine $x$ and $y$ coordinates are easier to inspect and debug. On top of that, libgcrypt is not very forthcoming with documentation of its internal structures, especially in its elliptic curve library. This means I was not sure how to properly serialize the projective coordinates of a point. So for now keep in mind that the overall implementation could be a bit faster if these unnecessary inversions of $Z$ were eliminated.
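As a rough sketch of where this inversion comes from, assuming standard projective coordinates over a prime field of size $p$: a point stored internally as $(X : Y : Z)$ corresponds to the affine point

$$ x = X * Z^{-1} \bmod p, \qquad y = Y * Z^{-1} \bmod p $$

and computing $Z^{-1}$, for instance as $Z^{p-2} \bmod p$ via Fermat's little theorem, costs roughly a full modular exponentiation for every point that is serialized.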
-So as to not worry network level things I used Google protobuf to serialize these messages further. This also means that you could have mutliple nodes which are implemented in different languages interact with each other.
+So as to not have to worry about network level things, I used Google protobuf to serialize these messages further. This also means that you could have multiple nodes implemented in different languages interact with each other.
\subsection{Mapping strings to Ed25519 points}
-There is however the Koblitz's method\cite{Koblitz} This is a probabilistic method of finding a curve point. it starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result also has to be a group element so your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding x coordinate. If it has one you are done, if it doesn't you add $1$ to your current trial y coordinate and check again. This continues up until your trial y coordinate reaches the value of $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable y coordinate by now this mapping algorithm fails.
+There is no direct way to interpret an arbitrary string as a valid curve point. There is however Koblitz's method\cite{Koblitz}. This is a probabilistic way of finding a curve point for a random integer in such a way that it is reversible. It starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result of this multiplication must also be a group element, which means your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding $x$ coordinate. If it has one you are done; if it doesn't, you add $1$ to your current trial $y$ coordinate and check again. This continues until your trial $y$ coordinate reaches the value of $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable $y$ coordinate by then, this mapping algorithm fails.
You can map a point back to the original string by dividing the $y$ coordinate by the ``stride''. This works because of integer division: we only ever added less than the stride to the original value.
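The sketch below shows the stride logic with a stride of $32$. The curve-membership test is replaced by a stand-in predicate (a toy quadratic-residue check modulo a small prime), because the point of the example is only the encode/decode arithmetic, not real Ed25519 point validation:

\begin{verbatim}
#include <stdio.h>
#include <stdint.h>

#define STRIDE 32u
#define TOY_P  1009u  /* small prime for the stand-in validity check */

/* stand-in for "is this a valid y coordinate on the curve?":
   here we accept y iff y is a quadratic residue mod TOY_P */
static int is_valid_y(uint64_t y) {
    uint64_t r = 1, b = y % TOY_P, e = (TOY_P - 1) / 2;
    while (e) { if (e & 1) r = r * b % TOY_P; b = b * b % TOY_P; e >>= 1; }
    return r == 1;
}

/* encode: multiply by the stride, then walk forward looking for a valid y */
static int encode(uint64_t msg, uint64_t *y_out) {
    uint64_t base = msg * STRIDE;
    for (uint64_t y = base; y < base + STRIDE; y++) {
        if (is_valid_y(y)) { *y_out = y; return 1; }
    }
    return 0;  /* no valid y within the stride: the mapping fails */
}

/* decode: integer division undoes the offset added during encoding */
static uint64_t decode(uint64_t y) { return y / STRIDE; }

int main(void) {
    uint64_t msg = 12345, y;
    if (encode(msg, &y))
        printf("msg %llu -> y %llu -> msg %llu\n",
               (unsigned long long)msg, (unsigned long long)y,
               (unsigned long long)decode(y));
    else
        printf("no valid y within the stride for msg %llu\n",
               (unsigned long long)msg);
    return 0;
}
\end{verbatim}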
-The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the max distance between two consecutive suitable y coordinates could be, half of the possible group elements would be suitable, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make educated guesses and stay on the safe side that guess.
+The problem with this probabilistic Koblitz method is choosing your ``stride''. There is no way of knowing what the maximum distance between two consecutive suitable $y$ coordinates could be. We do know that about half of the possible group elements are suitable in Ed25519, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make an educated guess and stay on the safe side of that guess.
-Now to address the concern that you divide your message space by your stride. This theoretically also effects your throughput. This only affects you if you would optimally pack your messages in the 252 bit message space you have available in Ed25519. However if you only use the lower 248 bits, which gives you 31 byte message space which saves time and effort packing your messages optimally. you have 5 bits to use as a stride. Which, anecdotally, seems to be enough. Ofcourse you can never know if this stride will work for all possible messages. But for the purpose of this benchmark it seems to work well enough. Maybe it is possible to find out how stride influences the chance of not finding a suitable y coordinate. But that is outside of the scope of this research.
+Now to address the concern that you divide your message space by your stride: this theoretically affects your throughput, but only if you would otherwise optimally pack your messages into the 252 bit message space you have available in Ed25519. If you only use the lower 248 bits, which gives you a 31 byte message space, you have 5 bits to use as a stride, which, anecdotally, seems to be enough. Of course you can never know if this stride will work for all possible messages, but for the purpose of this benchmark it seems to work well enough. Maybe it is possible to find out how the stride influences the chance of not finding a suitable $y$ coordinate, but that is outside of the scope of this research.
-A couple of small but nice benefits of using $32$ as stride. Multiplication and division are bit shifts as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$. This makes any hexadecimal representation of a number just shift up one character. However in actual runs of the algorithm with a stride of $16$ some runs would fail because there was no suitable $y$ coordinate within $16$ values of the message value. This has not yet happened for $32$ yet. To reiterate this is no guarantee that it will never happen.
+There are a couple of small but nice benefits to using $2^5 = 32$ as the stride. Multiplication and division are bit shifts, as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$, as this makes any hexadecimal representation of a number just shift up by one character. However, in actual runs of the algorithm a stride of $16$ turned out to be insufficient: some of the runs would fail because there was no suitable $y$ coordinate within the $16$-value range. This has not happened for $32$ yet, though to reiterate, that is no guarantee that it will never happen.
-\subsection{Debugging the cmix operations.}
+\subsection{Debugging the \cmix operations.}
-Debugging the cmix operations is hard. Intermediate results produced by nodes are hard to validate for the real 2048 bit numbers used in the cyclic group and the 256 bit numbers used in the elgamal example. Fortunately it is possible to use smaller cyclic groups and smaller curves to debug structural issues in the algorithms. For the cyclic groups this works like a charm, as the operations are pretty easy to do by hand or calculator, The operations for elliptic curve are a lot less intuitive, even for small curves. Especially with no known good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming.
+Debugging the \cmix operations is hard. Intermediate results produced by the nodes are hard to validate for the real 2048 bit numbers used in the cyclic group case and the 256 bit numbers used in the elliptic curve case. Fortunately it is possible to use smaller cyclic groups and smaller curves to debug structural issues in the algorithms. For the cyclic groups this works like a charm, as the operations are pretty easy to do by hand or with a calculator. The operations for elliptic curves are a lot less intuitive, even for small curves, especially with no known good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming.
-Some tools that have been a great help in creating the implementation are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all the executed code paths for any object or array access outside of the allowed bounds. This is no guarantee the application is completely memory safe. It does ensure that any out of bounds access is detected, even if the access was still technically in valid memory just not part of the object or array. The leak tool also helped to keep an eye on the tokens passed around by the CMix library. Because the implementation language of the library was $C$ and there are 2 different implementations for the 2 different elgamal backends. There is a lot of token passing. These tokens need to be destructed but have no clear overarching owner. This makes it harder to keep track of object lifetime. So LeakSanitizer helped out tracking these lost tokens. Which allowed me to eventually eliminate memory leaks. Which in turn allowed me to run the benchmark for a high number of clients on a system with relatively low specs.
+Some tools that have been a great help in creating the implementation in general are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all the executed code paths for any object or array access outside of the allowed bounds. This is no guarantee the application is completely memory safe, but it does ensure that any out of bounds access is detected, even if the access was still technically in valid memory, just not part of the object or array. The leak tool also helped to keep an eye on the tokens passed around by the \cmix library. Because the implementation language of the library is ``C'' and there are 2 different implementations for the 2 different ElGamal back-ends, there is a lot of token passing. These tokens need to be destructed but have no clear overarching owner, which makes it harder to keep track of object lifetimes. LeakSanitizer helped track these lost tokens, which allowed me to eventually eliminate memory leaks, which in turn allowed me to run the benchmark for a high number of clients on a system with relatively low specs.
diff --git a/content/results.tex b/content/results.tex
index ab7ad28..abf8093 100644
--- a/content/results.tex
+++ b/content/results.tex
@@ -1,45 +1,45 @@
\section{Results}
-So the raw results presented in appendix \ref{app-tables} were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see a cMix setup. All connections, either from node to node or client to node, are TCP connections encrypted using TLS.
+The raw results presented in appendix \ref{app-tables} were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see in a \cmix setup. All connections, whether from node to node or from client to node, are TCP connections encrypted using TLS.
-Latency is off course negligible because all the participants are running on the same computer but the goal is not to measure network latency. Rather we want to know if there is a benefit in using elliptic curve as apposed to multiplicative group el-gamal.
+Latency is of course negligible because all the participants are running on the same computer, but the goal is not to measure network latency. Rather, we want to know if there is a benefit in using elliptic curve as opposed to multiplicative group ElGamal.
-The reason behind running 3 nodes is simple. There are subtle distinctions between what nodes do, depending on the position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally the last node needs to decrypt the final message en send it to its destination. So the minimal testcase should contain 3 nodes. 1 first, 1 middle and 1 last node. I don't expect to see much difference between these nodes with the exception of the ``RealPost'' step as the last node needs to decrypt the ciphertext, and prepare plaintext buffers to send out to the clients.
+The reason behind running 3 nodes is simple. There are subtle distinctions between what nodes do, depending on their position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally, the last node needs to decrypt the final message and send it to its destination. So the minimal test case should contain 3 nodes: one first, one middle, and one last node. I don't expect to see much difference between these nodes, with the exception of the ``RealPost'' step, as the last node needs to decrypt the ciphertext and prepare plaintext buffers to send out to the clients.
-The reasoning behind running 500 clients is 2-fold. In the original cMix paper \cite{TODO} The largest test was with 500 clients. So I wanted to mimic that. The second reason is that it still feasible to do 500 clients using a single pc with 12GB or RAM. We could still increase the number of clients by about 100 but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers. And not running the extra 100 clients just gives us some headroom.
+The reasoning behind running 500 clients is two-fold. In the original \cmix paper \cite{TODO} the largest test was with 500 clients, so I wanted to mimic that. The second reason is that it is still feasible to run 500 clients using a single PC with 12GB of RAM. We could still increase the number of clients by about 100, but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers, and not running the extra 100 clients just gives us some headroom.
For the timings I used the \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer}, which has a timer resolution of $10000000ns$ (i.e. $10ms$) for both user and system clocks in a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so-called ``user'' timings. This eliminates the time spent context switching, which gives us slightly more accurate results. The system and wall times are also recorded, but are filtered out in the results table as they are not relevant.
-So for gathering results I created a program called statsd, it is included in the repository. The program receives timer snapshots over TCP. So each node sends a snapshot just before they start working on a phase of the cMix algorithm. After we are done with computational work but before sending the data to the next node another snapshot of the clock state is send to the statsd. So the results are purely the computation of that cMix phase. With some additional conversions to the wireformat, but not the overhead of sending the message over the socket. This is done just after the cMIx operation complete courtesy of the implicit strand of the boost::asio async socket operations.
+For gathering results I created a program called statsd, which is included in the repository. The program receives timer snapshots over TCP. Each node sends a snapshot just before it starts working on a phase of the \cmix algorithm. After the computational work is done, but before sending the data to the next node, another snapshot of the clock state is sent to statsd. So the results are purely the computation of that \cmix phase, with some additional conversions to the wire format, but without the overhead of sending the message over the socket. This is done just after the \cmix operations complete, courtesy of the implicit strand of the boost::asio asynchronous socket operations.
-Gathering the results over TCP with a separate daemon enables people to run this same benchmark over seperate servers. Enabling some nice test vectors as you can control network congestion and packetloss.
+Gathering the results over TCP with a separate daemon enables people to run this same benchmark across separate servers, enabling some nice test scenarios as you can control network congestion and packet loss.
The following results were gathered with the PC specifications as listed in appendix \ref{app-specs}.
\begin{table}
-
-\resizebox{\columnwidth}{!}{
-\begin{tabular}{|c||c|c|c|c|c|c|}
+\begin{tiny}
+\begin{tabularx}{\columnwidth}{|X|X|X|X|X|X|X|}
\hline
Node & prepre (s (\textsigma)) & premix (s (\textsigma)) & prepost (s (\textsigma)) & realpre (s (\textsigma)) & realmix (s (\textsigma)) & realpost (s (\textsigma)) \\\hline\hline
-Node1 ec 500 & 3.662 (0.030) & 2.819 (0.015) & 1.163 (0.029) & 0.299 (0.004) & 0.122 (0.004) & 0.123 (0.004) \\\hline
-Node2 ec 500 & 3.676 (0.025) & 2.818 (0.020) & 1.170 (0.029) & 0.302 (0.005) & 0.124 (0.005) & 0.123 (0.005) \\\hline
-Node3 ec 500 & 3.680 (0.026) & 2.819 (0.018) & 1.169 (0.028) & 0.302 (0.004) & 0.212 (0.004) & 0.451 (0.020) \\\hline
-\end{tabular}
-}
-\caption{Node time average over runs with standard deviation in seconds using ed25519.}
+Node1 & 3.662 (0.030) & 2.819 (0.015) & 1.163 (0.029) & 0.299 (0.004) & 0.122 (0.004) & 0.123 (0.004) \\\hline
+Node2 & 3.676 (0.025) & 2.818 (0.020) & 1.170 (0.029) & 0.302 (0.005) & 0.124 (0.005) & 0.123 (0.005) \\\hline
+Node3 & 3.680 (0.026) & 2.819 (0.018) & 1.169 (0.028) & 0.302 (0.004) & 0.212 (0.004) & 0.451 (0.020) \\\hline
+\end{tabularx}
+\end{tiny}
+
+\caption{Node time average over 50 runs with standard deviation in seconds using ed25519 and running 500 clients.}
\end{table}
\begin{table}
-\resizebox{\columnwidth}{!} {
-\begin{tabular}{|c||c|c|c|c|c|c|}
+\begin{tiny}
+\begin{tabularx}{\columnwidth}{|X|X|X|X|X|X|X|}
\hline
Node & prepre (s (\textsigma)) & premix (s (\textsigma)) & prepost (s (\textsigma)) & realpre (s (\textsigma)) & realmix (s (\textsigma)) & realpost (s (\textsigma)) \\\hline\hline
-Node1 mg 500 & 19.145 (0.039) & 19.142 (0.035) & 20.125 (0.560) & 24.769 (1.373) & 0.074 (0.005) & 0.140 (0.005) \\\hline
-Node2 mg 500 & 19.215 (0.041) & 19.140 (0.035) & 20.114 (0.661) & 24.509 (2.063) & 0.072 (0.005) & 0.139 (0.006) \\\hline
-Node3 mg 500 & 19.219 (0.073) & 19.152 (0.066) & 20.235 (1.183) & 24.560 (2.845) & 0.074 (0.006) & 1.475 (0.018) \\\hline
-\end{tabular}
-}
-\caption{Node time average over runs with standard deviation in seconds using 2048 bit multiplicative group.}
+Node1 & 19.145 (0.039) & 19.142 (0.035) & 20.125 (0.560) & 24.769 (1.373) & 0.074 (0.005) & 0.140 (0.005) \\\hline
+Node2 & 19.215 (0.041) & 19.140 (0.035) & 20.114 (0.661) & 24.509 (2.063) & 0.072 (0.005) & 0.139 (0.006) \\\hline
+Node3 & 19.219 (0.073) & 19.152 (0.066) & 20.235 (1.183) & 24.560 (2.845) & 0.074 (0.006) & 1.475 (0.018) \\\hline
+\end{tabularx}
+\end{tiny}
+\caption{Node time average over 50 runs with standard deviation in seconds using 2048 bit multiplicative group and running 500 clients.}
\end{table}