authorDennis Brentjes <d.brentjes@gmail.com>2017-06-05 09:45:31 +0200
committerDennis Brentjes <d.brentjes@gmail.com>2017-06-05 09:45:31 +0200
commit5482f6b544fa91273ec983892681b6c67e59e825 (patch)
treed2a1de44153deef445508249eceb807cafa518a0 /content
parent33483109b741824e163210acfda07dfa96876cc9 (diff)
downloadthesis-5482f6b544fa91273ec983892681b6c67e59e825.tar.gz
thesis-5482f6b544fa91273ec983892681b6c67e59e825.tar.bz2
thesis-5482f6b544fa91273ec983892681b6c67e59e825.zip
Minor fixes for readability.
Diffstat (limited to 'content')
-rw-r--r--content/cmix.tex56
-rw-r--r--content/cmix_additions.tex4
-rw-r--r--content/discussion.tex26
-rw-r--r--content/implementation.tex35
-rw-r--r--content/introduction.tex4
-rw-r--r--content/results.tex4
-rw-r--r--content/title.tex2
7 files changed, 64 insertions, 67 deletions
diff --git a/content/cmix.tex b/content/cmix.tex
index cc689fd..0a368b3 100644
--- a/content/cmix.tex
+++ b/content/cmix.tex
@@ -1,7 +1,7 @@
\section{The \cmix network}
\label{sec:cmix}
-This section will explain how the \cmix network works, it's starts by explaining what a mix network is. Then it explains what the \cmix mix network does differently and why.
+This section explains how the \cmix network works. It starts by explaining what a mix network is, then elaborates on what the \cmix network does differently and why.
\newcommand{\NODE}[1]{
$node_{#1}$
@@ -9,24 +9,24 @@ $node_{#1}$
\subsection{Mix networks}
-\cmix is a new anonymity mix network\cite{cMix}. Just like any other mix network it aims to provide anonymity by hiding timing information of messages.
+\cmix is a new anonymity mix network\cite{cMix}. Just like any other mix network, it aims to provide anonymity by hiding timing information of messages, that is, the difference in time between a message leaving the client and arriving at its destination.
-The first mix network was developed by David Chaum \cite{chaum1981untraceable}. this mix network consists of $N$ nodes. Each of these nodes have a public/private key pair. Users that want to use the mix network have to package their message as follows, it prepends the the identifier of the destination to the message and encrypts it with the public key of \NODE{N-1}. It then prepends the identifier of \NODE{N-1} and encrypts it with the public key of \NODE{N-2} He does this for all the nodes in the network working backwards and sends it to the first node.
+The first mix network was developed by David Chaum \cite{chaum1981untraceable}. This mix network consists of $N$ nodes, each of which has a public/private key pair. Users that want to use the mix network have to package their message as follows: the client prepends the identifier of the destination to the message and encrypts it with the public key of \NODE{N-1}. It then prepends the identifier of \NODE{N-1} and encrypts the result with the public key of \NODE{N-2}. The client repeats this for all the nodes in the network, working backwards, and sends the final result to the first node.
-This first node can now unpack the message it receives and retrieve an identifier for the next node and a encrypted message which only \NODE{N+1} can decrypt. The last node can decrypt the original message which contains its destination and sends it the end user. Up until this point this is roughly how the TOR\cite{goldschlag1999onion} anonymity network operates, but there is a difference. The first node in this network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is achieved it will decrypt all the messages and randomly shuffle the order they were in, otherwise known as mixing. It then sends them to the next node.
+The first node can now unpack the message it receives and retrieve an identifier for the next node along with an encrypted message which only that next node can decrypt. The last node can decrypt the original message, which contains its destination, and sends it to the end user. Up until this point this is roughly how the TOR\cite{goldschlag1999onion} anonymity network operates, but there is a difference. The first node in the \cmix network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is reached it decrypts all the messages and randomly shuffles their order, otherwise known as mixing. It then sends them to the next node. Another subtle difference is that each message should have the same length. It is, however, possible to choose a large enough message size and pad all the messages to this length. Even when unpacking the messages, the server nodes should keep padding them to the agreed size, as the network is not fixed and clients can choose their own path through the network.
-This causes an arbitrary amount of delay on client connection messages. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation. So it cannot keep track of a specific message. This is what grants anonymity in mix networks. The downside however is the amount of public key operations the client and nodes need to do to send a single message. And while this isn't really an issue in modern day desktop computers and or low volume traffic, it is an issue for mobile phones and low-power devices.
+This introduces an arbitrary amount of delay on client messages. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation, so it cannot keep track of a specific message. This is what grants anonymity within mix networks. A major downside of this classic mix network is the number of public-key operations the client and nodes need to perform when sending a single message. This may not be an issue on modern desktop computers or with low-volume traffic, but it is an issue for mobile phones and low-power devices.
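The layered packaging and threshold mixing described above can be sketched in a few lines of Python. This is a minimal sketch with a toy stand-in for public-key encryption; the node ids, keys and threshold are illustrative, not part of the thesis implementation.

```python
import random

# Toy stand-in for public-key encryption: a tagged tuple. A real mix
# network would use actual public-key encryption (e.g. RSA or ElGamal).
def enc(key, payload):
    return ("enc", key, payload)

def dec(key, packet):
    tag, k, payload = packet
    assert tag == "enc" and k == key, "wrong key for this layer"
    return payload

def package(message, destination, node_ids, keys):
    # Innermost layer: destination + message, readable only by the last node.
    packet = enc(keys[node_ids[-1]], (destination, message))
    # Work backwards, wrapping each layer with the id of the next hop.
    for nid, nxt in zip(reversed(node_ids[:-1]), reversed(node_ids[1:])):
        packet = enc(keys[nid], (nxt, packet))
    return packet

def mix_node(node_id, keys, batch, threshold):
    # Collect at least `threshold` packets, strip one layer, then shuffle.
    assert len(batch) >= threshold
    unwrapped = [dec(keys[node_id], p) for p in batch]
    random.shuffle(unwrapped)  # the actual "mixing"
    return unwrapped           # list of (next_hop, inner_packet) pairs

keys = {"n0": "k0", "n1": "k1", "n2": "k2"}
pkt = package("hello", "alice", ["n0", "n1", "n2"], keys)
batch = mix_node("n0", keys, [pkt, pkt, pkt], threshold=3)
```

Unwrapping the layers in order recovers first the next-hop identifiers and finally the destination and message, exactly as in the description above.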
\subsection{The \cmix protocol}
-A \cmix network is a fixed network consisting of $N$ nodes. This means there is a fixed network order. All clients know which computer represents each node in the network. It uses ElGamal encryption. And it relies heavily on the homomorphic encryption property of ElGamal.
+A \cmix network is a fixed network consisting of $N$ nodes. This means there is a fixed network order and all clients know which computer represents each node in the network. It uses ElGamal encryption and relies heavily on its homomorphic properties.
-The \cmix network operates in 3 phases. Initialization, precomputation and realtime. The Initialization phase is only to do some key setup. This is the only time clients need to do public key operations as they have to establish a shared key every node using Diffie-Hellman. This is why all communcations between the nodes have to be authenticated, so one way to do this is having SSL connections between the nodes and between the clients and the nodes.
+The \cmix network operates in three phases: initialization, precomputation and realtime. During the initialization phase only some key setup is done. This is the only time clients need to do public-key operations, as they have to establish a shared key with every node using a Diffie-Hellman key exchange. This is why all communications between the nodes and from client to node have to be authenticated. One way to accomplish this is by using SSL connections for all communications within the network. Remember that the focus of this network is not encrypting traffic (recall that the last node sees all the plaintexts) but hiding timing information from an attacker.
\subsection{Initialization phase}
-During the initialization phase the first node sends his public key to the next node which multiplies his own public key with the value he received and sends it to the next. The last node sends it back to the first node which stores this final value and sends it on the next node which does the same. Now all nodes have the product of all the nodes' public keys.
+During the initialization phase the first node sends its public key to the next node, which multiplies its own public key with the value it received and sends the result to the next node. The last node sends it back to the first node, which stores this final value and sends it on to the next node, which also stores this value. Now all nodes have the product of all the nodes' public keys.
-$$ D = \prod_N d_i $$
+$$ E = \prod_N d_i $$
\vspace{-1em}
where:
\begin{itemize}[label=]
@@ -34,7 +34,7 @@ where:
\item $E$ is the shared key between nodes
\end{itemize}
-When a client connects, the client agrees upon a key using Diffie-Hellman key-exchange with each node. These keys will be used to send messages into the network, which will be used during the realtime phase
+When a client connects, it agrees upon a key with each node using a Diffie-Hellman key exchange. These keys will be used to send messages into the network, which will happen during the realtime phase.
$$ K_c = \{k_0, k_1,\ldots, k_{N-1}\} $$
\vspace{-1em}
@@ -44,10 +44,10 @@ where:
\item $K_c$ is the vector of keys stored by the client
\end{itemize}
-During any part of the protocol a client may send a message into the network. When using multiplicative group ElGamal, It does this by multiplying a plain-text message with the shared keys in $K_c$. Then it sends this result to the first node. When using elliptic curve however the group elements, such as messages and shared keys, and need to be added.
+During any part of the protocol a client may send a message into the network. When using multiplicative-group ElGamal, it does this by multiplying a plaintext message with the shared keys in $K_c$ and sending the result to the first node. When using elliptic curves, however, the group elements, such as messages and shared keys, need to be added instead. Note that $\cdot$ denotes combining two values: multiplication for multiplicative groups and addition for elliptic curves. As the original paper uses a multiplicative group, the rest of this description of \cmix will refer to this operation as multiplication.
-\begin{equation}[]
-Message = M * k_0 * k_1 * ... * k_{N-1} \label{form:message}
+\begin{equation}
+Message = M \cdot k_0 \cdot k_1 \cdot \ldots \cdot k_{N-1} \label{form:message}
\end{equation}
\vspace{-1em}
where:
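Formula \eqref{form:message} can be illustrated with a small sketch of multiplicative-group blinding. The modulus, keys and message below are toy parameters chosen for readability; a real deployment uses a 2048-bit group. Note that `pow(k, -1, p)` (modular inverse) requires Python 3.8+.

```python
p = 8191                 # toy prime modulus; cMix would use a 2048-bit group
K_c = [123, 456, 789]    # shared keys k_0..k_{N-1} from the DH exchanges
M = 42                   # plaintext message encoded as a group element

# Message = M · k_0 · k_1 · ... · k_{N-1}
blinded = M
for k in K_c:
    blinded = (blinded * k) % p

# Each node i can later strip its key by multiplying with k_i's inverse:
recovered = blinded
for k in K_c:
    recovered = (recovered * pow(k, -1, p)) % p
assert recovered == M
```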
@@ -58,13 +58,13 @@ where:
\subsection{Precomputation phase}
-The precomputation phase can be started as soon $P$, the minimum number of messages to have accumulated in \NODE{0}, is known. The first node generates $P$ random values $R$ and $S$. $R$ is a vector of $P$ values as is $S$. When encrypting or multiplying these vectors the component-wise encryption or multiplication is implied. It also generates a random permutation function $\pi$ which can randomly shuffle $P$ messages.
+The precomputation phase can be started as soon as $P$, the minimum number of messages to have accumulated in \NODE{0}, is known. The first node generates two vectors $R$ and $S$ of $P$ random values each. When encrypting or multiplying these vectors, component-wise encryption or multiplication is implied. It also generates a random permutation function $\pi$ which can randomly shuffle $P$ messages.
The precomputation can be split up into 3 phases.
-In the first phase the first node encrypts his $R$ with the key $E$ and sends it to the $node_{next}$. $Node_{next}$ also generates $P$ random values $R$ and $S$ and encrypts his $R$ with the key $E$. $Node_{next}$ multiplies it's encryption result with the values received by the first node, and by the homomorphic encryption property of ElGamal the result of this multiplication is the encryption of the multiplication of the two original $R$ vectors. It then sends the result of the multiplication to the next node which also encrypts his $R$ and multiplies it with his input and sends in on.
+In the first phase the first node encrypts its $R$ with the key $E$ and sends it to $node_{next}$. $Node_{next}$ also generates $P$ random values $R$ and $S$ and encrypts its $R$ with the key $E$. $Node_{next}$ multiplies its encryption result with the values received from the first node; by the homomorphic property of ElGamal, the result of this multiplication is the encryption of the product of the two original $R$ vectors. It then sends the result of the multiplication to the next node, which also encrypts its $R$, multiplies it with its input and sends it on.
-$$ \mathcal{E}_E(R_0 * R_1) = \mathcal{E}_S(R_0) * \mathcal{E}_S(R_1) $$
+$$ \mathcal{E}_E(R_0 \cdot R_1) = \mathcal{E}_E(R_0) \cdot \mathcal{E}_E(R_1) $$
\vspace{-1em}
where:
\begin{itemize}[label=]
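This homomorphic property can be demonstrated with a toy multiplicative ElGamal in Python. The group parameters and key are illustrative only; the point is that multiplying two ciphertexts component-wise yields an encryption of the product of the plaintexts.

```python
import random

p = 8191          # toy prime; real deployments use 2048-bit groups
g = 17            # toy generator
x = 1234          # private key behind the shared key E
h = pow(g, x, p)  # public key

def elgamal_enc(m):
    r = random.randrange(1, p - 1)
    return (pow(g, r, p), m * pow(h, r, p) % p)

def elgamal_dec(c):
    c1, c2 = c
    return c2 * pow(pow(c1, x, p), -1, p) % p

def ct_mul(a, b):
    # Component-wise product of two ciphertexts.
    return (a[0] * b[0] % p, a[1] * b[1] % p)

R0, R1 = 111, 222
combined = ct_mul(elgamal_enc(R0), elgamal_enc(R1))
assert elgamal_dec(combined) == (R0 * R1) % p
```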
@@ -78,9 +78,9 @@ Whenever the result reaches the first node again it uses its permutation functio
&\mathcal{E}_E( \nonumber \\
&\hspace{1em}\pi_{N-1}( ... \nonumber \\
&\hspace{2em}\pi_{1}( \nonumber \\
-&\hspace{3em}\pi_{0}(R_{0} * ... * R_{N-1}) * S_0 \nonumber \\
-&\hspace{2em}) * S_1 \nonumber \\
-&\hspace{1em} ... ) * S_{N-1} \label{form:EPiRS} \nonumber \\
+&\hspace{3em}\pi_{0}(R_{0} \cdot ... \cdot R_{N-1}) \cdot S_0 \nonumber \\
+&\hspace{2em}) \cdot S_1 \nonumber \\
+&\hspace{1em} ... ) \cdot S_{N-1} \label{form:EPiRS} \\
&)
\end{align}
\vspace{-1em}
@@ -92,17 +92,17 @@ where:
\item $\pi_i$ is the permutation function of node $i$
\end{itemize}
-The third part of the precomputation is decrypting this value, and every node has a part of the decryption it can do. So each node calculates its decryption share and multiplies this with the value it receives. Which is ElGamal decryption, but as the key is spread over all the nodes the result is still encrypted with keys of the other nodes. The result is passed over to the next node in the network. The last node in the network stores the value for use in the realtime phase. This value is the decrypted value of formula \eqref{form:EPiRS}
+The third part of the precomputation is decrypting this final value. Each node can perform part of the decryption with its private-key share of $E$; in combination with the encryption-specific random value used in ElGamal, this is called the decryption share. Multiplying the value above with a node's decryption share removes that node's part of the encryption. Each node therefore multiplies its decryption share with its input and passes the result on to the next node. After the last node performs this step, it holds the decrypted value of \eqref{form:EPiRS}, which it stores for use in the realtime phase.
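The share-by-share decryption can be sketched as follows, assuming (as in the initialization phase) that $E$ is the product of per-node public keys $g^{x_i}$. All values are toy parameters for illustration.

```python
p = 8191
g = 17
node_secrets = [101, 202, 303]   # per-node private keys x_i (illustrative)
x = sum(node_secrets)            # secret behind E = prod_i g^{x_i}
h = pow(g, x, p)

# A ciphertext under the combined key E:
r, m = 555, 1000
c1, c2 = pow(g, r, p), m * pow(h, r, p) % p

# Each node multiplies in its decryption share c1^{-x_i}, removing its
# part of the encryption; after the last node only the plaintext remains.
value = c2
for x_i in node_secrets:
    share = pow(pow(c1, x_i, p), -1, p)
    value = value * share % p
assert value == m
```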
\subsection{Realtime phase}
\label{sec:realpre}
Whenever the first node has received $P$ messages as described in formula \eqref{form:message} and the precomputation has ended, the realtime phase can begin. The realtime phase can also be split into 3 different stages.
-In the first stage each node multiplies the message with the inverse of the key it agreed upon in the initialization phase. it then multiplies with that result with $r_i$, where i is the position of the message in the buffer, replacing $K_c$ with one of the values in $R$. The result is not yet permuted so the messages stay in the same place. The result gets passed to the next node which does the same steps. So for all the messages in the input we do.
+In the first stage each node multiplies the message with the inverse of the key it agreed upon in the initialization phase. It then multiplies that result with $r_i$, where $i$ is the position of the message in the buffer, replacing $K_c$ with the corresponding value in $R$. The result is not yet permuted, so the messages stay in the same place. The result gets passed to the next node, which performs the same steps. So for each message in the input we compute:
\[
-M_{out} = M_{in} * K_{ci}^{-1} * R_j
+M_{out} = M_{in} \cdot K_{ci}^{-1} \cdot R_j
\]
\vspace{-1em}
where:
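Per slot, the formula above amounts to one modular inversion and two multiplications. A toy sketch with illustrative values and a small modulus:

```python
p = 8191
M = 42                   # the message content carried in this slot
k_ci = 321               # key this node shares with the client in slot i
r_i = 654                # this node's precomputed random value for slot i

m_in = M * k_ci % p                  # as received: still blinded by k_ci
m_out = m_in * pow(k_ci, -1, p) % p  # strip the client key...
m_out = m_out * r_i % p              # ...and replace it with r_i
assert m_out == M * r_i % p
```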
@@ -115,11 +115,11 @@ where:
The second phase is similar to the second phase of the precomputation: a node takes the message buffer, swaps the messages around according to its permutation function $\pi$, and then multiplies it by $S$. Each node performs this step.
\begin{align}
-&Messages\_out = \pi_{N-1}( \ldots \nonumber \\
+&M_{out} = \pi_{N-1}( \ldots \nonumber \\
&\hspace{1em}\pi_{1}( \nonumber \\
-&\hspace{2em}\pi_{0}(R_{0} * \ldots * R_{N-1} * M) * S_0 \nonumber \\
-&\hspace{1em}) * S_1 \nonumber \\
-& \ldots ) * S_{N-1} \label{form:PiMRS} \nonumber \\
+&\hspace{2em}\pi_{0}(R_{0} \cdot \ldots \cdot R_{N-1} \cdot M_{in}) \cdot S_0 \nonumber \\
+&\hspace{1em}) \cdot S_1 \nonumber \\
+& \ldots ) \cdot S_{N-1} \label{form:PiMRS}
\end{align}
\vspace{-1em}
where:
@@ -131,11 +131,11 @@ where:
The last phase is to remove the $R$ and $S$ vectors from the input. The last node stored the decryption of formula \ref{form:EPiRS}. It calculates the inverse of this group element and multiplies it with the result of \ref{form:PiMRS}. This cancels out all permuted $R$ and $S$ values, and the last node is left with $M$.
-Very critical to see is that during the realtime phase no public-key cryptography is being done, only some multiplications. And the clients, outside of the initialization phase, never have to do any public-key cryptography. It is also possible for client and servers to update the shared key using the old key as a seed. So in theory a client only needs to do the key agreement once per network.
+Crucially, during the realtime phase no public-key cryptography is used, only some multiplications. Moreover, outside of the initialization phase, the clients never have to do any public-key cryptography. It is also possible for clients and servers to update the shared keys $K_c$ using the old key as a seed, so in theory a client only needs to do the key agreement once per network.
\subsection{Advantages of \cmix}
-The fact that \cmix minimizes the amount of public-key cryptographic operations for the client, makes it more appealing for low power devices. So devices that cannot draw high amounts of power all the time due to battery constrains or NFC power limits. Also mobile phones of which you want to conserve battery power.
+The fact that \cmix minimizes the number of public-key cryptographic operations for the client makes it more appealing for low-power devices: devices that cannot draw large amounts of power all the time due to battery constraints or NFC power limits, such as mobile phones whose battery power you want to conserve.
Another advantage is that in theory the latency of the actual messages during the realtime phase should be lower than other mix networks. Again due to the lack of public-key cryptography during this phase.
diff --git a/content/cmix_additions.tex b/content/cmix_additions.tex
index e2c2ee5..277dd02 100644
--- a/content/cmix_additions.tex
+++ b/content/cmix_additions.tex
@@ -1,6 +1,6 @@
\section{\cmix additions}
\label{sec:cmixaddtions}
-So the base protocol still has some issues, thankfully these issues can be addressed at the cost of some speed and clarity. Because it would not be safe to use \cmix in the wild without these attack mitigations we implemented the extra messages needed. This makes for a more realistic benchmark.
+The base protocol still has some issues; thankfully these can be addressed at the cost of some speed and clarity. Because it would not be safe to use \cmix in the wild without these attack mitigations, this implementation adds the extra messages needed, which results in a more realistic benchmark.
\subsection{Tagging attack}
\label{sec:tagging}
@@ -8,5 +8,5 @@ In a tagging attack an adversary changes a message slightly and later can detect
When you control the last node you can change the output of the realtime precomputation phase slightly when combining your node's $r$ value for this slot with the input: you either combine the input with $r * i$, for cyclic-group ElGamal, or $r + p$, for elliptic-curve implementations. After all computations are done you have the plaintexts that you want to send out. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. You now know the slot this value used to be in and you can reverse your tag by doing the reverse operation. This is undetectable in the network and thus compromises the network. Note that the last node is not special in \cmix; if all but one node is malicious the protocol should still be safe.
-To stop this attack we need to change the protocol. First we need to change the third step of the precomputation phase. Instead of sending the decryption shares of each of the nodes to the next, we send a hash, a commitment to our decryption shares to the next node. The nodes keep the decryption shares to themselves, and will use them separately in the realtime phase. The last node also includes a hash of the current mix result. So the hash of $\pi(R)*S$ for all slots. This makes that an adversary can no longer tamper with the $r$ values in the realtime phase, which caused the tagging attack to be possible in the first place.
+To stop this attack we need to change the protocol. First we need to change the third step of the precomputation phase. Instead of sending the decryption shares of each of the nodes to the next, we send a hash, a commitment to our decryption shares, to the next node. The nodes keep the decryption shares to themselves and use them separately in the realtime phase. The last node also includes a hash of the current mix result, i.e. the hash of the decryption of formula \ref{form:EPiRS}. This ensures that an adversary can no longer tamper with the $r$ values in the realtime phase, which is what made the tagging attack possible in the first place.
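The commitment can be as simple as a cryptographic hash of the decryption share, verified once the share is later revealed. A minimal sketch; the share bytes below are placeholders for the real group element:

```python
import hashlib

def commit(share: bytes) -> str:
    # Hash commitment: binding, and hiding as long as the share has
    # enough entropy (real schemes may add a random nonce).
    return hashlib.sha256(share).hexdigest()

share = b"node-0 decryption share"   # placeholder for the real value
c = commit(share)

# In the realtime phase the share is revealed and anyone can check it:
assert commit(share) == c
assert commit(b"tampered share") != c
```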
diff --git a/content/discussion.tex b/content/discussion.tex
index 03d4de1..ff3a688 100644
--- a/content/discussion.tex
+++ b/content/discussion.tex
@@ -8,37 +8,35 @@ In this section \ec and \mg will refer to the 2 different implementations that w
So let's first talk about what our expectations are. The sizes of the messages being sent differ. The ed25519 implementation can send messages of up to $31$ bytes; the 2048-bit multiplicative group can send up to $255$ bytes of data. This implementation prefixes the message data with the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. This means that the ed25519 implementation only sends $11$ bytes of payload versus $235$ bytes of payload for the multiplicative-group implementation.
-However there are ways around this. By doing 8 mixes of the ed25519 implementation in one single \cmix run with the same permutation functions but with different $R$ and $S$, we can significantly boost the effective payload. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes of the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to a factor of $8$.
+However there are ways around this. By doing 8 mixes of the ed25519 implementation in a single \cmix run, with the same permutation functions but with different $R$ and $S$, we can significantly boost the effective payload. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes for the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to be a factor of $8$.
Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison.
-So by the above dissertation, we should see that \ec is 8 times faster than \mg in all respects. Just by the virtue of only having to handle $\frac{1}{8}$ of the data. We hope to even be a bit faster than that.
+So by the above reasoning, we hope to see that \ec is 8 times faster than \mg in all respects, just by virtue of only having to handle $\frac{1}{8}$ of the data. We hope to be even a bit faster than that.
\subsection{Precomputation precomputation}
First of all, this step does not seem to differ between nodes: all \ec nodes spend around $3.67s$ and the \mg nodes spend about $19.19s$. That is $5.23$ times faster, not the 8 times we would expect.
-This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, if that number is a valid $y$ coordinate. For Ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid will be $.5$. Because of this you need to, on average, generate and check twice as many coordinates as the actual number of random curve points you want \ref{eq:prepre}.
+This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, whether that number is a valid $y$ coordinate. For ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid is $0.5$. Because of this you need to, on average, generate and check twice as many coordinates as the actual number of random curve points you want, see equation \ref{eq:prepre}.
\begin{equation}
\label{eq:prepre}
\sum_{x=0}^{\infty} n \cdot \left( \frac{1}{2}\right)^x =\ 2n
\end{equation}
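The expected factor of 2 can be checked empirically with a quick simulation. The acceptance test below is a stand-in for the real validity check of a $y$ coordinate, assumed to succeed with probability $\frac{1}{2}$.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

def candidates_needed(n):
    # Count random candidates drawn until n are accepted, where each
    # candidate is valid with probability 1/2 ("y is on the curve").
    tried = accepted = 0
    while accepted < n:
        tried += 1
        if rng.random() < 0.5:
            accepted += 1
    return tried

n = 10_000
ratio = candidates_needed(n) / n
assert 1.9 < ratio < 2.1   # close to the expected factor of 2
```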
-Therefore we only expect to be 4 times faster so comparing to that we are still slightly faster than the \mg counterpart, but \ec is slower on a fundamental level because of the extra calculations needed to generate random curve points.
+Therefore we only expect to be 4 times faster; compared to that, we are still slightly faster than the \mg counterpart, but \ec is slower on a fundamental level because of the extra calculations needed to generate random curve points.
\subsection{Precomputation mixing}
\label{sec:premix}
The values for the premix step are close to our ideal ratio of 8, at $6.79$ times faster. There is no fundamental explanation why it should be slower than expected, but there is one implementation detail that might have had its influence here.
-When serializing elliptic curve points, there are several choices you can make about the format. You can send both affine coordinates. Only sending the $y$ coordinate, for ed25519, and derive the $x$ coordinate later, or send the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate. Which is relatively slow. The reason for this is that the raw affine $x$ and $y$ coordinate are easier to inspect and debug. Trying to be faster by only sending the defining affine coordinate would still require the inverse of $Z$ to be calculated so there is not much to gain there. On top of that there is no easy way to find the corresponding $x$ for any given $y$ in the \emph{libgcrypt} elliptic curve interface.
+As discussed in section \ref{sec:wireformat}, when serializing elliptic-curve points there are several choices you can make about the format. This implementation chose to send both affine coordinates, which adds unnecessary inversions. This means there is time to gain by switching from the affine to the projective coordinate system within the network format. It would mean sending more data, but the conversions from network to internal format would be faster and would avoid the inversion of $Z$. This would be a large time saver for many of the steps, even the $prepre$ step we already discussed. My expectation, based on some informal testing and the results of the realtime postprocessing phase, is that this would make this step at least 8 times faster than the \mg implementation, making it comparable in performance.
-There is time to gain if we switch from affine to projected coordinate system within the network format. it would mean sending more data but the conversions from network format to internal would be much faster. This would avoid the inversion of $Z$ and this would be the big time saver for many of the steps, even the $prepre$ step we already discussed. My expectation is that it would make this step at least 8 times faster than the \mg implementation making it comparable in performance.
+\subsection{Precomputation postprocessing}
-\subsection{Precomputation postcomputation}
-
-So here in the pre-computation post-computation phase the elliptic curve algorithm pays off. with $1.17s$ vs $20.16s$ a $17.26$ times speed increase. This is probably because the inversions in \mg are much more computational intensive as inversions in \ec. This does not mean that they are free in \ec as discussed in section \ref{sec:premix}
+Here, in the precomputation postprocessing phase, the elliptic-curve algorithm pays off: with $1.17s$ vs $20.16s$, a $17.26$ times speed increase. This is probably because inversions in \mg are much more computationally intensive than inversions in \ec. This does not mean that they are free in \ec, as discussed in section \ref{sec:premix}.
\subsection{Realtime precomputation}
@@ -46,22 +44,18 @@ Here is a large time saver, namely \ec is $81.90$ times faster. Additionally it
Why \ec is so much faster is uncertain. This step consist of removing the shared key the user and node agreed upon during initialization, and adding the $r$ value of the node for this client slot, see section \ref{sec:realpre}. It could be related to the fact that no encryption is applied at this stage of the protocol. And maybe because the elliptic curve is in comparison so much smaller that it saves exponentially more time than the \mg algorithm.
-Another timesaver compared to \mg. It's $81.90$ times faster.
-
\subsection{Realtime mix and realtime postcomputation}
The problem with these 2 phases is that their durations are relatively short and therefore more unreliable to derive any information from. For instance, the realtime mix operation seems to be slower in \ec. This could be the case, and could again be explained by the serialization choices made for elliptic-curve points. However, the difference is so small with respect to the clock resolution that it is risky to claim this. It could be that these phases call more functions and we are just measuring the indirection overhead of the \cmix library implementation.
-It could also be due to memory fragmentation. The \ec implementation is uses smaller elements, allocating an element size buffer has a bigger chance to succeed within just cleared memory, but subsequent buffers might not find free bytes in the neighborhood. I try to minimize this by allocating all the elements I need at a time. However I don't have much control over for instance the protobuf library. I use the arena allocator from libprotobuf\cite{protobuf-arena} but have no idea how large these arenas are and how the cache locality is for the cmix network messages.
-
-In short there is not many conclusions we can derive from these results.
+It could also be due to memory fragmentation. The \ec implementation uses smaller elements; allocating an element-sized buffer has a bigger chance to succeed within just-cleared memory, but subsequent buffers might not find free bytes in the neighborhood. I try to minimize this by allocating all the elements I need at once. However I don't have much control over, for instance, the protobuf library. I use the arena allocator from libprotobuf\cite{protobuf-arena} but have no idea how large these arenas are and what the cache locality is for the \cmix network messages.
-These phases are similar between the 2 implementations. The One interesting ting here though is the Realtime phase in node 3 of both \ec and \mg. It takes quite some more time to remove the R and S than I expected that being said it seems to take about 3 times as long for \mg than \ec and that gives us some insight how long it takes to invert an element in both implementations.
+In short there are not many conclusions we can derive from these results. However these phases seem to be similar between the 2 implementations. One interesting thing here is the Realtime phase in node 3 of both \ec and \mg. Removing the $R$ and $S$ values takes quite a bit more time than I expected. That said, it seems to take about 3 times as long for \mg as for \ec, which gives us some insight into how long it takes to invert an element in both implementations.
\subsection{Precomputation phase}
The complete precomputation phase on average takes $22.98$ seconds for \ec versus $175.49$ seconds for \mg. So for elliptic curve to be worth it we would need to make it a bit faster than it is now, but I fully believe that is possible. Again, removing the unnecessary inversions would help. Using a different cryptographic library as the backend for the ed25519 cryptography would have a much greater impact. However useful libgcrypt may have been for experimenting and debugging certain aspects of the application, its cryptographic routines might not be as optimized as, say, ed25519-donna's. That being said, ed25519-donna does not implement all the primitives needed to implement the \cmix primitives. But creating a custom ed25519 library, or using one that does support all the primitives \cmix needs, would have a huge impact on performance.
-This also holds for the multiplicative group implementation of libgcrypt but I expect that both algorithms would have the same level optimization within the library.
+This also holds for the multiplicative group implementation of libgcrypt, but I expect that both algorithms have the same level of optimization within the library, because both implementations use the same underlying big integer library and operations.
\subsection{Realtime phase}
diff --git a/content/implementation.tex b/content/implementation.tex
index 4fe06d8..6de7e4b 100644
--- a/content/implementation.tex
+++ b/content/implementation.tex
@@ -9,39 +9,42 @@ A large part of this research is actually making an implementation of the protoc
\item Allows back-ends to be comparably benchmarked.
\end{itemize}
-The following section will talk about some implementation specific things, to talk about how I achieved those goals or why something might need some attention in future research. For more information on where to find the implementation see appendix \ref{app:impl}
+The following section discusses some implementation-specific topics, explaining how this framework achieves these goals and why some parts might need attention in future research. For more information on where to find the implementation see appendix \ref{app:impl}.
-\subsection{ElGamal in Cyclic Group and Elliptic Curve}
+\subsection{ElGamal in Multiplicative Group and Elliptic Curve}
-The goal of this research is to see how differently these two ways of using ElGamal in this crypto scheme effect things like latency and throughput in \cmix. But also doing this in as reusable way possible so that the implementation is of use for further research.
+The goal of this research is to see how differently these two ways of using ElGamal in this cryptographic scheme affect things like latency and throughput in \cmix. A secondary goal is to do this in the most reusable way possible, so that the implementation is of use for further research.
-This means I will be trying to find cryptographic libraries that do most of the work while I focus on getting a uniform API over different back-ends and the correct implementation of said back-ends. Unfortunately ElGamal is not a very popular encryption algorithm in modern cryptographic libraries. and even if it is supported with a high level interface it is very rigid and wouldn't allow for any secret sharing as needed in \cmix. Because of this I needed access to lower level cryptographic primitives. Which I found in the libgcrypt library. The next step is to create my own \cmix library in ``C'' which abstracts the lower lever cryptographic primitives for both ``cyclic group'' and ``elliptic curve'' ElGamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used in the application level easier. For the application level code I used ``C++'', but in theory this should be easily swapped for a different application language.
+This means I will be trying to find cryptographic libraries that do most of the work, while I focus on getting a uniform API over different back-ends and a correct implementation of said back-ends. Unfortunately ElGamal is not a very popular encryption algorithm in modern cryptographic libraries, and even when it is supported with a high-level interface, that interface is very rigid and wouldn't allow for the secret sharing needed in \cmix. Because of this I needed access to lower-level cryptographic primitives, which I found in the libgcrypt library. The next step was to create my own \cmix library which abstracts the lower-level cryptographic primitives for both ``multiplicative group'' and ``elliptic curve'' ElGamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used at the application level easier. For the application-level code I used ``C++'', but in theory this should be easily swapped for a different application language.
-However using libgcrypt makes it possible to implement \cmix, it doesn't mean it was easily done. The library still lacks some convenience functions that needed workarounds in the eventual implementation. This is especially the case for the elliptic curve back-end. Some specific problems will be discussed later.
+Although using libgcrypt makes it possible to implement \cmix, that doesn't mean it was an easy task. The library still lacks some convenience functions, which required workarounds in the eventual implementation. This is especially the case for the elliptic curve back-end. Some specific problems will be discussed later.
\subsection{Wire format}
-For \cmix to work, we need to send group elements of the cryptographic algorithms from one node to the next. For cyclic group this is easy. They are just big integers with a max size and can be sent as arrays of a fixed length.
+\label{sec:wireformat}
+For \cmix to work, we need to send group elements of the cryptographic algorithms from one node to the next. For the multiplicative group this is easy: the elements are just big integers with a maximum size and can be sent as arrays of a fixed length.
-When serializing elliptic curve points, there are several choices. You can send both affine coordinates. You can send only the y coordinate, in the case of ed25519. This means you need to calculate, one of the, corresponding x'es when de-serializing. Or you can send the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate. Which is relatively slow.
+When serializing elliptic curve points, there are several options. You can send both affine coordinates. In the case of ed25519 you can send only the $y$ coordinate, which means you need to calculate one of the corresponding $x$ coordinates when deserializing. Or you can send the projective coordinates. In this implementation I chose to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate, which is relatively slow.
-The main reason for this is that the raw affine $x$ and $y$ coordinate are easier to inspect and debug. On top of that libgcrypt is not very forthcoming on documentation of their internal structure, especially in their elliptic curve library. This means I was not sure hoe to properly serialize the projective coordinates of a point. So just for now keep in mind that the overall implementation could be a bit faster when these unnecessary inversions of $Z$ would be eliminated.
+The main reason for this is that the raw affine $x$ and $y$ coordinates are easier to inspect and debug. On top of that libgcrypt is not very forthcoming with documentation on its internal structures, especially in its elliptic curve library. This means it was not clear how to properly serialize the projective coordinates of a point. So for now keep in mind that the overall implementation could be a bit faster if these unnecessary inversions of $Z$ were eliminated.
-So as to not worry network level things I used Google protobuf to serialize these messages further. This also means that you could have multiple nodes which are implemented in different languages interact with each other.
+So as to not worry about any other network serialization, this framework uses Google protobuf to serialize the \cmix messages further. This also means that you could have multiple nodes implemented in different languages interact with each other, as long as those languages have protobuf implementations available.
\subsection{Mapping strings to Ed25519 points}
-There is however the Koblitz's method\cite{Koblitz}. This is a probabilistic way of finding a curve point for a random integer in such a way that it is reversible. It starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result of this multiplication must also be a group element. This means your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding x coordinate. If it has one you are done, if it doesn't you add $1$ to your current trial y coordinate and check again. This continues up until your trial y coordinate reaches the value of $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable y coordinate by now this mapping algorithm fails.
+There exists no deterministic and reversible mapping from integer values to elliptic curve points. However, there is a probabilistic algorithm called Koblitz's method\cite{Koblitz}. It starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride; the result of this multiplication must also be a group element. This means your message space is made smaller by a factor ``stride''. Now you can check whether your result has a corresponding $x$ coordinate, in the case of Ed25519. If it has one you are done; if it doesn't, you add $1$ to your current trial $y$ coordinate and check again. This continues until your trial $y$ coordinate reaches the value of $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable $y$ coordinate by then, the mapping algorithm fails.
-You can map a point back to the original string by dividing the y coordinate with the ``stride''. This works because integer division and only adding less than the stride to the original random number.
+You can map a point back to the original string by dividing the $y$ coordinate by the ``stride''. This works because integer division discards the remainder, and we only ever add a value less than the stride to the original number.
-The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the max distance between two consecutive suitable y coordinates could be. We do know that about half of the possible group elements would be suitable in Ed25519, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make educated guesses and stay on the safe side that guess.
+The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the maximum distance between two consecutive suitable $y$ coordinates could be. We do know that about half of the possible group elements would be suitable in Ed25519, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make an educated guess and stay on the safe side of that guess.
-Now to address the concern that you divide your message space by your stride. This theoretically effects your throughput, however it only does if you would optimally pack your messages in the 252 bit message space you have available in Ed25519. However if you only use the lower 248 bits, which gives you 31 byte message space. You have 5 bits to use as a stride. Which, anecdotally, seems to be enough. Of course you can never know if this stride will work for all possible messages. But for the purpose of this benchmark it seems to work well enough. Maybe it is possible to find out how stride influences the chance of not finding a suitable y coordinate. But that is outside of the scope of this research.
+Now to address the concern that you divide your message space by the stride. This reduction will theoretically affect your throughput, but only if you would optimally pack your messages into the 252 bit message space you have available in Ed25519. If you only use the lower 248 bits, which gives you a 31 byte message space, you have 5 bits to use as a stride, which, anecdotally, seems to be enough. Of course you can never know whether this stride will work for all possible messages, but for the purpose of this benchmark it seems to work well enough. Maybe it is possible to find out how the stride influences the chance of not finding a suitable $y$ coordinate, but that is outside the scope of this research.
-A couple of small but nice benefits of using $2^5 = 32$ as stride. Multiplication and division are bit shifts as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$. This makes any hexadecimal representation of a number just shift up one character. However in actual runs of the algorithm with a stride of $16$ was insufficient. Some of the runs would fail because there was no suitable $y$ coordinate within $16$ value range. This has not yet happened for $32$ yet. To reiterate this is no guarantee that it will never happen.
+A small but nice benefit of using $2^5 = 32$ as the stride: multiplication and division become bit shifts, as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$, which makes any hexadecimal representation of a number just shift up one character. However, in actual runs of the algorithm a stride of $16$ proved insufficient: some runs would fail because there was no suitable $y$ coordinate within a $16$ value range. This has not happened for a stride of $32$ yet, though to reiterate, that is no guarantee it never will.
\subsection{Debugging the \cmix operations.}
-Debugging the \cmix operations is hard. Intermediate results produced by the nodes are hard to validate for the real 2048 bit numbers used in the cyclic group and the 256 bit numbers used in the ElGamal cases. Fortunately it is possible to use smaller cyclic groups and smaller curves to debug structural issues in the algorithms. For the cyclic groups this works like a charm, as the operations are pretty easy to do by hand or calculator, The operations for elliptic curve are a lot less intuitive, even for small curves. Especially with no known good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming.
+Debugging the \cmix operations is hard. Intermediate results produced by the nodes are hard to validate for the real 2048 bit numbers used in the multiplicative group case and the 256 bit numbers used in the elliptic curve case. Fortunately it is possible to use smaller multiplicative groups and smaller curves to debug structural issues in the algorithms. For the multiplicative groups this works like a charm, as the operations are pretty easy to do by hand or with a calculator. The operations for elliptic curves are a lot less intuitive, even for small curves, especially with no known-good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming.
-Some tools that have been a great help in creating the implementation in general are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all the executed code paths for any object or array access outside of the allowed bounds. This is no guarantee the application is completely memory safe. It does ensure that any out of bounds access is detected, even if the access was still technically in valid memory just not part of the object or array. The leak tool also helped to keep an eye on the tokens passed around by the \cmix library. Because the implementation language of the library was $C$ and there are 2 different implementations for the 2 different ElGamal back-ends. There is a lot of token passing. These tokens need to be destructed but have no clear overarching owner. This makes it harder to keep track of object lifetime. So LeakSanitizer helped out tracking these lost tokens. Which allowed me to eventually eliminate memory leaks. Which in turn allowed me to run the benchmark for a high number of clients on a system with relatively low specs.
+Some tools that have been a great help in creating the implementation in general are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all executed code paths for any object or array access outside of the allowed bounds. This is no guarantee that the application is completely memory safe, but it does ensure that any out-of-bounds access is detected, even if the access was still technically in valid memory, just not part of the object or array.
+
+The leak tool also helped to keep an eye on the tokens passed around by the \cmix library. Because the implementation language of the library is ``C'' and there are 2 different implementations for the 2 different ElGamal back-ends, there is a lot of token passing. These tokens need to be destructed but have no clear overarching owner, which makes it harder to keep track of object lifetimes. LeakSanitizer helped track down these lost tokens, which allowed me to eventually eliminate the memory leaks, which in turn allowed me to run the benchmark with a high number of clients on a system with relatively low specs.
diff --git a/content/introduction.tex b/content/introduction.tex
index aa1ec2a..ebb45fe 100644
--- a/content/introduction.tex
+++ b/content/introduction.tex
@@ -1,5 +1,5 @@
\section{Introduction}
-Showing that one piece of software is faster than another is somewhat of an art. Trying to keep as many of the variables the same while varying the one you are interested in is not easy. Especially when the implementations are not of the same algorithm. This is the case for the \cmix mix network. The only condition for the underlying cryptography is that it is ElGamal based. This means that both multiplicative groups and elliptic curves are valid implementations. This makes benchmarking the fundamental performance differences between these two difficult. This paper and companion framework implementation focuses on providing a fair comparison by between the two, by providing a common interface.
+Showing that one piece of software is faster than another is somewhat of an art. Trying to keep as many of the variables the same while varying the one you are interested in is not easy, especially when the implementations are not of the same algorithm. This is the case for the \cmix mix network. The only condition for the underlying cryptography is that it is ElGamal based. This means that both multiplicative groups and elliptic curves are valid implementations, which makes benchmarking the fundamental performance differences between the two difficult and interesting. This paper and the companion framework implementation focus on providing a fair comparison between the two by providing a common interface, together with 2 basic implementations backed by the same cryptographic library.
-Section \ref{sec:cmix} will talk about the \cmix network and how it works and why it works with ElGamal. Followed by \ref{sec:implementation}; which talks about some implementation specific things. Section \ref{sec:cmixaddtions} will talk about some flaws in the original \cmix protocol and discusses the fix to them. Then we talk about the results in section \ref{sec:results}, followed by the discussion of the results in section \ref{sec:discussion}. Final thought and further research ideas are in the conclusion section \ref{sec:conclusion}. \ No newline at end of file
+The paper is structured as follows. Section \ref{sec:cmix} explains the \cmix network, how it works and why it works with ElGamal. It is followed by section \ref{sec:implementation}, which discusses some implementation-specific topics. Section \ref{sec:cmixaddtions} covers some flaws in the original \cmix protocol and discusses fixes for them. Then we present the results in section \ref{sec:results}, followed by a discussion of the results in section \ref{sec:discussion}. Final thoughts and further research ideas are in the conclusion, section \ref{sec:conclusion}. \ No newline at end of file
diff --git a/content/results.tex b/content/results.tex
index 36c14b7..12db128 100644
--- a/content/results.tex
+++ b/content/results.tex
@@ -6,11 +6,11 @@ Network latency is off course negligible because all the participants are runnin
The reason behind running 3 nodes is simple. There are subtle distinctions between what nodes do, depending on their position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally the last node needs to decrypt the final message and send it to its destination. So the minimal test case should contain 3 nodes: 1 first, 1 middle and 1 last node. I don't expect to see much difference between these nodes, with the exception of the ``RealPost'' step, as the last node needs to decrypt the ciphertext and prepare plaintext buffers to send out to the clients.
-The reasoning behind running 500 clients is 2-fold. In the original \cmix paper \cite{TODO} The largest test was with 500 clients. So I wanted to mimic that. The second reason is that it still feasible to do 500 clients using a single pc with 12GB or RAM. We could still increase the number of clients by about 100 but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers. And not running the extra 100 clients just gives us some headroom.
+The reasoning behind running 500 clients is 2-fold. In the original \cmix paper\cite{cMix} the largest test was with 500 clients, so I wanted to mimic that. The second reason is that it is still feasible to run 500 clients using a single pc with 12GB of RAM. We could still increase the number of clients by about 100, but running 500 of them gives us timings large enough that we don't need to worry about the resolution of the used CPU timers. Not running the extra 100 clients just gives us some headroom when other applications need some extra RAM.
For the timings I used the \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer} which has a timer resolution of $10000000ns$ for both user and system clocks on a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so-called ``User'' timings. This eliminates the time spent context switching, which gives us slightly more accurate results. The system and wall times are also recorded, but filtered out in the results table as they are not relevant.
-So for gathering results I created a program called statsd, it is included in the repository. The program receives timer snapshots over TCP. So each node sends a snapshot just before they start working on a phase of the \cmix algorithm. After we are done with computational work but before sending the data to the next node another snapshot of the clock state is send to the statsd. So the results are purely the computation of that \cmix phase. With some additional conversions to the wire format, but not the overhead of sending the message over the socket. This is done just after the \cmix operation complete courtesy of the implicit strand of the boost::asio asynchronous socket operations.
+For gathering results I created a program called statsd, which is included in the repository. The program receives timer snapshots over TCP. Each node sends a snapshot just before it starts working on a phase of the \cmix algorithm. After the computational work is done, but before sending the data to the next node, another snapshot of the clock state is sent to the statsd. So the results are purely about the computation of that \cmix phase, with some additional conversions to the wire format of the timer snapshots, but not the overhead of sending the message over the socket. This is done just after the \cmix operation completes, courtesy of the implicit ``strand'' of the boost::asio asynchronous socket operations.
Gathering the results over TCP with a separate daemon enables people to run this same benchmark across separate servers, enabling some nice test scenarios, as you can control network congestion and packet loss.
diff --git a/content/title.tex b/content/title.tex
index 5986056..8144e8e 100644
--- a/content/title.tex
+++ b/content/title.tex
@@ -6,7 +6,7 @@
\pretitle{\begin{center}\Huge\bfseries} % Article title formatting
\posttitle{\end{center}} % Article title closing formatting
-\title{Benchmarking CMix} % Article title
+\title{Benchmarking cMix} % Article title
\author{%
\textsc{Dennis Brentjes}\thanks{\url{www.brentj.es}} \\[1ex] % Your name
\normalsize Radboud University Nijmegen \\ % Your institution