diff options
| author | Dennis Brentjes <dennis@brentj.es> | 2018-09-04 19:28:20 +0200 |
|---|---|---|
| committer | Dennis Brentjes <dennis@brentj.es> | 2018-09-07 00:21:35 +0200 |
| commit | 4cf29b0c3beacafd8565ca5461381f53832688ed (patch) | |
| tree | d70c95146ca8d72eb9017fce0e9c3e2e88308472 /content | |
| parent | 1e316c9a7437580f499453cdafbb0c7433a46b88 (diff) | |
| download | thesis-master.tar.gz thesis-master.tar.bz2 thesis-master.zip | |
Diffstat (limited to 'content')
| -rw-r--r-- | content/abstract.tex | 2 | ||||
| -rw-r--r-- | content/cmix.tex | 136 | ||||
| -rw-r--r-- | content/cmix_additions.tex | 10 | ||||
| -rw-r--r-- | content/conclusion.tex | 12 | ||||
| -rw-r--r-- | content/discussion.tex | 32 | ||||
| -rw-r--r-- | content/implementation.tex | 80 | ||||
| -rw-r--r-- | content/introduction.tex | 10 | ||||
| -rw-r--r-- | content/related_work.tex | 8 | ||||
| -rw-r--r-- | content/results.tex | 31 |
9 files changed, 168 insertions, 153 deletions
diff --git a/content/abstract.tex b/content/abstract.tex index 79fbd54..87fd6a6 100644 --- a/content/abstract.tex +++ b/content/abstract.tex @@ -1,3 +1,3 @@ \begin{abstract} -This paper describes a way to compare 2 different cryptographic backends for the cMix mix network. It tries to focus on making a fair comparison between the two and focuses mainly on the latency aspect of the cMix primitives. It concludes that although it still has some implementation specific problems the elliptic curve backend for cMix seems to be a faster alternative to the more conservative multiplicative group algorithm. +This paper describes a way to compare two different cryptographic backends for the \cmix mix network. The benchmark tries to focus on making a fair comparison between the two and focuses mainly on the latency aspect of the \cmix primitives. It concludes that although it still has some implementation specific problems the elliptic curve backend for \cmix is a faster alternative to the more conservative multiplicative group backend used in this benchmark. \end{abstract}
\ No newline at end of file diff --git a/content/cmix.tex b/content/cmix.tex index 5b6504b..6900918 100644 --- a/content/cmix.tex +++ b/content/cmix.tex @@ -1,81 +1,80 @@ \newcommand{\NODE}[1]{ - $node_{#1}$ + \unskip$node_{#1}$\xspace } \section{Anonymity networks} \label{sec:anon} -Networks providing anonymity are needed and are being used in the present day. Unfortunately they are partly needed to escape surveillance and prosecution from oppressive governments\cite{jardine2018tor}. Fortunately there are also other applications like voting schemes\cite{park1993efficient}. In any it helps to understand what different types of anonymity networks exist and how they work in relation to \cmix. And how the techniques used in \cmix were pioneer elsewhere and what problems \cmix addresses with regards to other anonymity networks. +Networks providing anonymity are needed and are being used in the present day. They are used by journalist and/or activist to escape surveillance and prosecution from oppressive governments\cite{jardine2018tor} or they can be used for electronic voting schemes\cite{park1993efficient}. Either way it is useful to understand what different types of anonymity networks exist, what their goals are, and how they work with regards to \cmix{}. They show how the techniques used in \cmix{} today were pioneered by other anonymity networks. Showing the problems and shortcomings of the various networks and how \cmix{} addresses some of them to better suit their target user base. -\subsection{The onion router} -We can't talk about anonymity networks without talking about The Onion Router\cite{goldschlag1999onion} or TOR for short. It's a free software project that provides access to an anonymity network based on Onion Routing. +\subsection{The Onion Router} +We can't talk about anonymity networks without talking about The Onion Router\cite{goldschlag1999onion} or TOR for short. It's a popular and free software project that provides access to an anonymity network based on onion routing. \begin{figure}[!ht] \centering \begin{subfigure}{.5\textwidth} \centering - \includegraphics[width=.9\linewidth]{images/tor.png} - \caption{A tor route, image taken from\\ \protect\url{https://www.torproject.org/about/overview.html.en}} + \includegraphics[width=.9\linewidth]{images/TOR.png} + \caption{A TOR route, image taken from\\ \protect\url{https://www.torproject.org/about/overview.html.en}} \label{fig:tor} \end{subfigure}% \begin{subfigure}{.5\textwidth} \centering \includegraphics[width=.9\linewidth]{images/layers.png} - \caption{Tor layers, image taken from\\ \protect\url{https://www.cryptologie.net/article/203/how-does-tor-works/}} + \caption{TOR layers, image taken from\\ \protect\url{https://www.cryptologie.net/article/203/how-does-tor-works/}} \label{fig:layers} \end{subfigure} - \caption{2 figures visualizing how tor works} - \label{fig:howtorworks} + \caption{Two figures visualizing how TOR works} + \label{fig:how TOR works} \end{figure} -Tor works by allowing users to select a random path consisting of 3 nodes trough the tor network as you can see in \autoref{fig:tor}. When a user wants to send a message it has to encrypt it with the key of the exit node of the chosen path. This encryption results then needs to be encrypted with the key of the middle node, a relay node. Lastly the encryption result needs to be encrypted with the key of the first node, which also is a relay node. The the user can send this encrypted ``onion'' to the first node. 
The first node can peel off the outer layer of encryption and send it to the second node which in turn can peel of the new outer layer. The last node removes the last layer of encryption, revealing the plaintext message of the user. This plaintext can be sent to the user chosen destination as you can see in \autoref{fig:layers}. Which from the outside decouples the actual sender and receiver of the message. +TOR works by allowing users to select a random path consisting of three nodes trough the TOR network as you can see in \autoref{fig:tor}. When a user wants to send a message it has to encrypt it with the key of the exit node of the chosen path. This encryption results then needs to be encrypted with the key of the middle node, a relay node. Lastly the encryption result needs to be encrypted with the key of the first node, which also is a relay node. The user can send this encrypted ``onion'' to the first node. The first node can peel off the outer layer of encryption and send it to the second node which in turn can peel of the new outer layer. The last node removes the last layer of encryption, revealing the plaintext message of the user. This plaintext can be sent to the users chosen destination as you can see in \autoref{fig:layers}. Which from the outside decouples the actual sender and receiver of the message. -This simplified view of TOR reveals that the sender of a certain message remains anonymous so long as at least one of your 3 nodes is not compromised. And you use End to End encryption because the last node will see your plaintext. +This simplified view of TOR reveals that the sender of a certain message remains anonymous as long as at least one of your three nodes is not compromised. One should also use end-to-end encryption, because the last node will see your plaintext. -Unfortunately there is a fairly simple timing attack that can de anonymize the traffic trough the network. This is exploited by a correlation attack\cite{Johnson:2013:UGR:2508859.2516651}. This is an attack where the attacker sees incoming traffic of your first node and the outgoing traffic of the last node. It can then try to correlate the entry times, exit times, origin, destination and size of the packets. An attacker can use this data to correlate a packet entering and exiting the network and therefore link a user and a destination with a certain probability. Therefore de-anonymizing the traffic. This attack is highly probabilistic, but could flag users for further targeted investigation. Therefore we really want to prevent this attack from being possible. +Unfortunately there is a fairly simple timing attack that can de anonymize the traffic trough the network. This is exploited by a correlation attack\cite{Johnson:2013:UGR:2508859.2516651}. This is an attack where the attacker sees incoming traffic of your first node and the outgoing traffic of the last node. It can then try to correlate the entry times, exit times, origin, destination, and size of the packets. An attacker can use this data to correlate a packet entering and exiting the network and therefore link a user and a destination with a certain probability. Thereby de-anonymizing the traffic. This attack is highly probabilistic, but could flag users for further targeted investigation. Therefore we really want to prevent this attack from being possible. -In the case of TOR, because the timing of a single packet trough the network doesn't differ that much, the attack is relatively simple. 
This simple form of correlation attack would be mitigated by a mix network, a different type of anonymity network. +In the case of TOR the attack is relatively simple. The timing of a single packet trough the network is stable enough between consequent packets that you can easily correlate the input into and the output out of the network. This simple form of correlation attack would be mitigated by a mix network, a different type of anonymity network. \subsection{Mix networks} -The first mix network was proposed and developed by David Chaum \cite{chaum1981untraceable}. This mix network makes use of $N$ nodes. Each of these nodes has a public/private key pair. All users and nodes also have a identifier. Users that want to use the mix network have to package their message by pre-pending the identifier of the destination to the message and encrypts it with the public key of \NODE{N}. It then prepends the identifier of \NODE{N} and encrypts it with the public key of \NODE{N-1}. The client does this for all the nodes in the network working backwards. It can then send the message to the first node. +The first mix network was proposed and developed by David Chaum \cite{chaum1981untraceable}. This mix network makes use of $N$ nodes. Each of these nodes has a public/private key pair. All users and nodes also have an identifier. Users that want to use the mix network have to create their message by pre-pending the identifier of the destination to the message. The user then encrypts it with the public key of \NODE{N}. It then prepends the identifier of \NODE{N} and encrypts it with the public key of \NODE{N-1}. The client does this for all the nodes in the network working backwards. It can then send the message to the first node (\NODE{1}.) -This first node can now unpack the message it receives and retrieve an identifier for the next node and a encrypted message which only the next node can decrypt. The last node can decrypt the original destination and message which it can send it to the end user. Up until this point this is roughly how the TOR anonymity network operates, but there is a difference. The first node in the \cmix network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is achieved it will decrypt all the messages and randomly shuffle the order they were in, otherwise known as mixing. It then sends them to the next node. Another subtle difference is that each message within one mix operation should have the same length, However it is possible to choose a large enough message size and pad all the messages to this fixed length. +This first node can now decrypt the message it received. It then ends up with an identifier for the next node and a encrypted message which only the node with that identifier can decrypt. The current node sends that encrypted message to the destination with the corresponding identifier. This will continue until it reaches the last node. The last node can decrypt the original destination and message which it can send it to the end user. This is also how the TOR network operates, however we left out some details. The first node in the mix network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is reached it will decrypt all the messages and randomly shuffle the order they were in, also known as mixing. It then sends them to the next node. 
Another subtle difference is that each message within one mix operation should have the same length. However, it is possible to choose a large enough message size and pad all the messages to this fixed length. -This mixing and delaying in the first node causes an arbitrary amount of delay on when a message arrives at the first node and exits the first node. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation. So it cannot keep track of a specific message. This is what grants the additional anonymity within mix networks, as it mitigates the simple timing attacks possible on TOR. However more complex attacks are still possible. By keeping track of multiple runs of the mix network and probing in between mix nodes. The mix network, without additional mitigations, is still vulnerable to flow analysis attacks\cite{zhu2004flow}. +The aggregation of messages in the first node cause an arbitrary amount of delay on the arrival of the messages. Either the arrival in the next node or the final destination of the packet. The mixing of hte messages makes it impossible for an outsider analyzing the input and output to correlate the input timing of a packet and it's position in the output buffer of a node. This means it cannot keep track of a specific message. This is what grants the additional anonymity within mix networks, as it mitigates the simple timing attacks possible on, for instance, TOR. However, more complex attacks are still possible. By keeping track of multiple runs of the mix network and probing in between mix nodes. The mix network, without additional mitigations, is still vulnerable to flow analysis attacks\cite{zhu2004flow}. \subsection{Re-encryption mix networks} -The introduction of Re-encryption mix nets by Park et all \cite{park1993efficient} -introduces the usage of \elgamal encryption in mix nets. Using it's homomorphic properties to no longer de-encrypt each incoming message but rather just re-encrypt the message which is faster. It also has the effect that the ciphertext no longer lengthens in the order of the amount of nodes in the mix network. Which does happen in the classic Chaum mix network and in TOR. +The introduction of Re-encryption mix networks by Park et al. \cite{park1993efficient} +introduces the usage of \elgamal{} encryption in mix networks. Using it's homomorphic encryption properties to no longer de-encrypt each incoming message but rather just re-encrypt the message, which is faster. It also has the effect that the ciphertext does not lengthen in the order of the amount of nodes in the mix network. This does happen in the classic Chaum mix network and in TOR, which needs to be fixed by adding additional padding. -The network operates by each node having a public and private \elgamal key. Each node publishes its public key and a user can use all of the public keys of the nodes to encrypt a message. Each node can re-encrypt the message with it's private key. After the last nodes successfully re-encrypts the value the plain text is revealed. In the original paper this is used as a voting scheme and therefore no mention is made of a Receiver as it just needs to aggregate in one place. But is of course extensible to support sending messages to specific recipients. +The network operates by each node having a public and private \elgamal{} key. Each node publishes its public key and an user can use all of the public keys of the nodes to encrypt a message. 
Each node can re-encrypt the message with it's private key. After the last nodes successfully re-encrypts the value the plain text is revealed. In the paper by Park et al. \cite{park1993efficient} this is used as a voting scheme and therefore no mention is made of a ``receiver'' or ``identifier'' as the goal is to aggregate the plaintexts in one place without revealing the original sender of a particular message. This scheme could be extended to support sending messages to specific recipients. -A major downside of these classic mix network is the amount of public key operations the client and nodes need to do when sending single message. This may not be an issue on modern day desktop computers and or low traffic volume applications, but it is an issue for mobile phones' with respect to battery life and low-power devices such as sensors networks. +A major downside of these classic mix network is the amount of public key operations the client and nodes need to do when sending a single message. This may not be an issue on modern day desktop computers and/or low traffic volume applications. It is however, an issue for mobile phones with respect to battery life and low-power devices such as sensors networks. -This is something that \cmix is attempting to resolve by introducing a precomputation phase and the use of the homomorphic properties of \elgamal. +This is something that \cmix{} is attempting to resolve by introducing a precomputation phase and the use of the homomorphic properties of \elgamal{}. -\section{\cmix} +\section{\cmix{}} \label{sec:cmix} -\cmix is a new anonymity mix network\cite{cMix}. Just like any other mix network it aims to provide anonymity by hiding timing information of messages. This means hiding the difference in time between a message leaving the client and arriving at its destination. +\cmix{} is a new anonymity mix network\cite{cMix}. Just like any other mix network it aims to provide anonymity by hiding timing information of messages. This means hiding the difference in time between a message leaving the client and arriving at its destination. -A \cmix network is a fixed network consisting of $N$ nodes. This means there is a fixed network order and all clients know which computer represents each node in the network. It uses \elgamal encryption. And it relies heavily on the homomorphic properties of \elgamal. +A \cmix{} network is a fixed network consisting of $N$ nodes. This means there is a fixed network order and all clients know which computer represents each node in the network. It uses \elgamal{} encryption and it relies heavily on the homomorphic properties of \elgamal{}. -The \cmix network operates in 3 phases. Initialization, precomputation and realtime. During the initialization phase shared keys are set up between nodes and clients. This is the only time clients need to do public key operations as they have to establish a shared key with every node using Diffie-Hellman key exchange. This is why all communications between the nodes and from client to node have to be authenticated. One way to accomplish this is by using SSL connections for all communications within the network. Remember that the focus of this network is not encrypted traffic, recall that the last nodes sees all the plaintexts, but rather to hide timing information from an attacker. +The \cmix{} network operates in three phases. Initialization, precomputation, and realtime. During the initialization phase, shared keys are set up between nodes and clients. 
This is the only time clients need to do public key operations as they have to establish a shared key with every node using Diffie-Hellman key exchange\cite{diffie1976new}. This is why all communication between the nodes and from client to node have to be authenticated. One way to accomplish this is by using SSL connections for all communications within the network. Remember that the focus of this network is not encrypted traffic, recall that the last nodes sees all the plaintexts, but rather to hide timing information from an attacker. -The fact that \cmix minimizes the amount of public-key cryptographic operations for the client, makes it appealing for low power devices. This allows devices that cannot draw high amounts of power at all times due to battery constrains or NFC power limits, or mobile phones of which you want to conserve battery power, to use the \cmix network. +The fact that \cmix{} minimizes the amount of public-key cryptographic operations for the client, makes it appealing for low power devices. This allows devices that cannot draw high amounts of power at all times due to battery constrains or NFC power limits to make use of this anonymity network with minimal impact on the design. It is also interesting for devices which want to save power instead of having a hard limit, such as mobile phones. -Another advantage is that in theory the latency of the actual messages during the realtime phase should be lower than other mix networks. Again due to the lack of public-key cryptography during this phase. - -The fact that \cmix minimizes the amount of public-key cryptographic operations for the client, makes it more appealing for low power devices. So devices that cannot draw high amounts of power all the time due to battery constrains or NFC power limits. Mobile phones of which you want to conserve battery power. - -Another advantage is that in theory the latency of the actual messages during the realtime phase should be lower than other mix networks. Again due to the lack of public-key cryptography during this phase. +Another advantage is that in theory the latency of the communication during the realtime phase is lower than other mix networks. Again due to the lack of public-key cryptography during the realtime phase. \subsection{Initialization phase} -During the initialization phase the first node sends his public key to the next node, which multiplies his own public key with the value he received and sends it to the next. The last node sends it back to the first node which stores this final value and sends it on the next node which also stores this value. Now all nodes have the product of all the nodes' public keys. +During the initialization phase the first node sends his public key ($d_1$) to the next node, which multiplies his own public key ($d_2$) with the value it received and sends the product of the two, to the next node. The last node also multiplies it's public key ($d_N$) and sends the product back to the first node which stores this final product ($E$) and forwards it to the next node which also stores the value of $E$. it keeps getting forwarded until it reaches the last node. Now all nodes have the shared network key $E$ as shown in \autoref{eq:E}. 
-$$ E = \prod_N d_i $$ +\begin{equation} +\label{eq:E} +E = \prod_N d_i +\end{equation} \vspace{-1em} where: \begin{itemize}[label=] @@ -83,9 +82,12 @@ where: \item $E$ is the shared key between nodes \end{itemize} -When a client connects, the client agrees upon a key using Diffie-Hellman key-exchange with each node. These keys will be used to send messages into the network, which will happen during the realtime phase +When a client connects, the client agrees upon a key using Diffie-Hellman key-exchange with each node. Each client will store these keys in a vector $K_c$ as show in \autoref{eq:Kc} These keys will be used to send messages into the network, which will happen during the realtime phase. The Nodes will keep track of a similar vector per user to undo the shared key encryption done by the user in the realtime phase. -$$ K_c = \{k_0, k_1,\ldots, k_{N-1}\} $$ +\begin{equation} +\label{eq:Kc} +K_c = \{k_0, k_1,\ldots, k_{N-1}\} +\end{equation} \vspace{-1em} where: \begin{itemize}[label=] @@ -93,15 +95,15 @@ where: \item $K_c$ The vector of Keys stored by the client \end{itemize} -During any part of the protocol a client may send a message into the network. This is done by either multiplying the message with the multiplicative group shared keys $K_c$, or by adding the elliptic curve points shared keys $K_c$ to the message. Because the original paper only referenced multiplicative group \elgamal the rest of this description will refer to this operation as multiplication. But you can refer to the ``$\cdot$'' operation as a ``combine'' operation which either means multiplication or addition within their respective domains. This is also the name of the operation in the \cmix library created by this benchmark framework, together with its companion method ``uncombine'' which multiplies or adds the inverse in their respective domain. +During any part of the protocol a client may send a message into the network. This is done by either multiplying the message with the multiplicative group shared keys $K_c$, see \autoref{eq:message}, or by adding the elliptic curve points shared keys $K_c$ to the message. Because the original paper only referenced multiplicative group \elgamal{} the rest of this description will refer to this operation as multiplication. But you can refer to the ``$\cdot$'' operation as a ``combine'' operation which either means multiplication or addition within their respective domains. This is also the name of the operation in the \cmix{} library created by this benchmark framework, together with its companion method ``uncombine'' which multiplies or adds the inverse in their respective domain. \begin{equation} -Message = M \cdot k_0 \cdot k_1 \cdot ... \cdot k_{N-1} \label{form:message} +Message = M \cdot k_{c_0} \cdot k_{c_1} \cdot ... 
\cdot k_{c_{N-1}} \label{eq:message} \end{equation} \vspace{-1em} where: \begin{itemize}[label=] -\item $K_c$ The shared key between client and node $i$ +\item $K_{c_i}$ The shared key between client and node $i$ \item $M$ The plaintext message \end{itemize} @@ -110,79 +112,87 @@ where: \begin{figure}[!ht] \centering \includegraphics[width=.9\linewidth]{images/basic_precomputation.pdf} - \caption{\cmix precomputation phase, image provided by Joeri de Ruiter from the \cmix paper \cite{cMix}} + \caption{\cmix{} precomputation phase, image provided by Joeri de Ruiter from the \cmix{} paper \cite{cMix}} \label{fig:cmix_pre} \end{figure} -The precomputation phase can be started as soon as $P$, the minimum number of messages to have accumulated in \NODE{0}, is known. The first node generates $P$ random values $R$ and $S$. $R$ is a vector of $P$ values as is $S$. When encrypting or multiplying these vectors the component-wise encryption or multiplication is implied. It also generates a random permutation function $\pi$ which can randomly shuffle $P$ messages. +The precomputation phase can be started as soon as $P$, the minimum number of messages to have accumulated in \NODE{0}, is known. The whole precomputation phase looks like \autoref{fig:cmix_pre}. The rest of this section will discuss the internals. -The precomputation can be split up into 3 phases. +In preparation of \emph{each} precomputation phase all the nodes generate 2 vectors of length $P$ random values called R and S. When encrypting or multiplying these vectors the component-wise encryption or multiplication is implied. The nodes also generate a random permutation function $\pi$ which randomly shuffles $P$ messages. -In the first phase the first node encrypts his $R$ with the key $E$ and sends it to the $node_{next}$. $Node_{next}$ also generates $P$ random values $R$ and $S$ and encrypts his $R$ with the key $E$. $Node_{next}$ multiplies it's encryption result with the values received by the first node, and by the homomorphic encryption property of \elgamal the result of this multiplication is the encryption of the multiplication of the two original $R$ vectors. It then sends the result of the multiplication to the next node which also encrypts his $R$ and multiplies it with his input and sends it on. +The precomputation can be split up into three parts. -$$ \mathcal{E}_E(R_0 \cdot R_1) = \mathcal{E}_E(R_0) \cdot \mathcal{E}_E(R_1) $$ +In the first part of the precomputation phase, the first node encrypts his vector $R$ with the key $E$ and sends the encrypted result to $node_{next}$. $Node_{next}$ also encrypts his $R$ with the key $E$. $Node_{next}$ multiplies it's encryption result with the values it received from the first node, and by the homomorphic encryption property of \elgamal{} (\autoref{eq:homo}), the result of this multiplication is the encryption of the multiplication of the two original $R$ vectors. The node then sends the result of this multiplication to the next node which also encrypts his $R$ and multiplies it with his input and forwards the result again. 
+ +\begin{equation} +\label{eq:homo} +\mathcal{E}_E(R_0 \cdot R_1) = \mathcal{E}_E(R_0) \cdot \mathcal{E}_E(R_1) +\end{equation} \vspace{-1em} where: \begin{itemize}[label=] -\item $\mathcal{E}_E$ is \elgamal encryption under key $E$ +\item $\mathcal{E}_E$ is \elgamal{} encryption under key $E$ \item $R_i$ is the R vector of node $i$ \end{itemize} -Because of the circular nature of the \cmix network, where the last node is connected to the first node, whenever the result reaches the first node again phase 2 of the precomputation can start. The next phase is called the mix phase. Each node uses its permutation function $\pi$ on the incoming vector of values and multiplies that result with the encryption of $S$. It sends its result to the next node which will do the same. The last node produces the following result. +Because of the circular nature of the \cmix{} network, the last node is connected to the first node, whenever the result reaches the first node again part two of the precomputation can start. The next part is called the mix part. Each node uses its permutation function $\pi$ on the incoming vector of values and multiplies that result with the encryption of $S$. It sends the result to the next node which will do the same. The last node produces the following end-result (\autoref{eq:EPiRS}). \begin{align} -&\mathcal{E}_E( \nonumber \\ -&\hspace{1em}\pi_{N-1}( ... \nonumber \\ +&\mathcal{E}_E( \label{eq:EPiRS} \\ +&\hspace{1em}\pi_{N-1}( \nonumber \\ +&\hspace{2em}... \nonumber \\ &\hspace{2em}\pi_{1}( \nonumber \\ &\hspace{3em}\pi_{0}(R_{0} \cdot ... \cdot R_{N-1}) \cdot S_0 \nonumber \\ &\hspace{2em}) \cdot S_1 \nonumber \\ -&\hspace{1em} ... ) \cdot S_{N-1} \label{form:EPiRS} \nonumber \\ -&) +&\hspace{1em} ... ) \cdot S_{N-1} \nonumber \\ +&) \nonumber \end{align} \vspace{-1em} where: \begin{itemize}[label=] -\item $\mathcal{E}_E$ is \elgamal encryption under key $E$ +\item $\mathcal{E}_E$ is \elgamal{} encryption under key $E$ \item $R_i$ is the R vector of node $i$ \item $S_i$ is the S vector of node $i$ \item $\pi_i$ is the permutation function of node $i$ \end{itemize} -The third part of the precomputation, is about decrypting this final value. Each node can perform part of the decryption with his private key part of $E$. Together with the encryption specific random value which is part of \elgamal this value is called the decryption share. With this decryption share you can remove your random permutation and the random value vectors $R$ and $S$ from the eventual realtime messages that will be generated in the realtime phase. +The third part of the precomputation, is about decrypting this final value. Each node can perform part of the decryption with his private key part of $E$. Together with the encryption specific random value, which is part of \elgamal{}, this value is called the decryption share. With this decryption share you can remove your random permutation and the random value vectors $R$ and $S$ from the eventual realtime input message buffer that will be generated by the realtime phase. \subsection{Realtime phase} \label{sec:realpre} -Whenever the first node received $P$ messages as described in formula \eqref{form:message}, and the precomputation has ended, the realtime phase can begin. The realtime phase can also be split into 3 different stages. +Whenever the first node received $P$ messages as described in \autoref{eq:message}, and the precomputation phase has ended, the realtime phase can begin. The realtime phase can also be split into three different parts. 
The whole realtime phase looks like \autoref{fig:cmix_real}. \begin{figure}[!ht] \centering \includegraphics[width=.9\linewidth]{images/basic_realtime.pdf} - \caption{\cmix realtime phase, image provided by Joeri de Ruiter from the \cmix paper \cite{cMix}} + \caption{\cmix{} realtime phase, image provided by Joeri de Ruiter from the \cmix{} paper \cite{cMix}} \label{fig:cmix_real} \end{figure} -In the first stage each node multiplies the message with the inverse of the key it agreed upon in the initialization phase. It then multiplies that result with $r_i$, where $i$ is the position of the message in the buffer, replacing $K_c$ with the corresponding value in $R$. The result is not yet permuted so the messages stay in the same place. The result gets passed to the next node which does the same steps. So for all the messages in the input we do. +In the first part, the precomputation part of the realtime phase, each node multiplies the message with the inverse of the key it agreed upon with the sender of the message in the initialization phase. It then multiplies that result with $r_i$, which is the $i^{th}$ value of the random vector R. $i$ is the position index of the message in the buffer. So in fact it replaces $K_c$ with a value of $R$. The result is not yet permuted so the messages stay in the same place. The result gets passed to the next node which does the same steps. This means, for all the messages in the input we do the following (\autoref{eq:MR}). -\[ -M_{out} = M_{in} \cdot K_{ci}^{-1} \cdot R_j -\] +\begin{equation} +\label{eq:MR} +M_{out} = M_{in} \cdot K_{c_j}^{-1} \cdot r_i +\end{equation} \vspace{-1em} where: \begin{itemize}[label=] \item $M_{in}$ is one of the input messages -\item $R_j$ the corresponding r of $M_{in}s$ position -\item $K_{ci}$ Is the shared key of $M_{in}s$ position +\item $r_i$ is the value of $R$ at $M_{in}s$ position +\item $K_{c_j}$ Is the shared key of $M_{in}s$ position \end{itemize} -The second phase is similar to the second phase of the precomputation, A node takes the message buffer, swaps around the messages according to its permutation function $\pi$ it then multiplies it by $S$, each node does this step. +The second part of the realtime phase is similar to the second phase of the precomputation phase. A node takes the message buffer, swaps around the messages according to its permutation function $\pi$ it then multiplies it by $S$, each node does this and the computation is described in \autoref{eq:EPIRS} \begin{align} -&M_{out} = \pi_{N-1}( \ldots \nonumber \\ +&M_{out} = \pi_{N-1}( \label{eq:PiMRS} \\ +&\ldots \nonumber \\ &\hspace{1em}\pi_{1}( \nonumber \\ &\hspace{2em}\pi_{0}(R_{0} \cdot \ldots \cdot R_{N-1} \cdot M_{in}) \cdot S_0 \nonumber \\ &\hspace{1em}) \cdot S_1 \nonumber \\ -& \ldots ) \cdot S_{N-1} \label{form:PiMRS} \nonumber \\ +& \ldots ) \cdot S_{N-1} \nonumber \end{align} \vspace{-1em} where: @@ -192,7 +202,7 @@ where: \item $S_i$ is the S vector of node $i$ \end{itemize} -The last phase is to remove the R and S vectors from the input. The last node stored decryption of formula \ref{form:EPiRS}. The last node calculates the inverse of this group element and multiplies it with the result of \ref{form:PiMRS}. This will cancel out all permuted R and S values, and the last node will be left with $M$. +The last phase is to remove the $R$ and $S$ vectors from the input. The last node stored decryption of formula \autoref{eq:EPiRS}. 
The last node calculates the inverse of this group element and multiplies it with the result of \autoref{eq:PiMRS}. This will cancel out all permuted R and S values, and the last node will be left with $M$. Very critical to see is that during the realtime phase no public-key cryptography is being used, only multiplications. And the clients, outside of the initialization phase, never have to do any public-key cryptography. It is also possible for client and servers to update the shared keys $K_c$ using the old key as a seed. So in theory a client only needs to do the key agreement once per network. diff --git a/content/cmix_additions.tex b/content/cmix_additions.tex index a9675a9..ef6c0ff 100644 --- a/content/cmix_additions.tex +++ b/content/cmix_additions.tex @@ -1,12 +1,12 @@ -\section{\cmix additions} +\section{\cmix{} additions} \label{sec:cmixaddtions} -The base protocol still has some issues\cite{galteland2016attacks}, thankfully these issues can be addressed at the cost of some speed and clarity. Because it would not be safe to use \cmix in the wild without these attack mitigations. This implementation adds the extra messages needed as this results in a more realistic benchmark. +The base protocol of \cmix{} has some issues\cite{galteland2016attacks}, thankfully these issues can be addressed at the cost of some speed and clarity. Because it would not be safe to use \cmix{} in real world applications without these attack mitigations. This implementation adds the extra messages needed as this results in a more realistic benchmark. \subsection{Tagging attack} \label{sec:tagging} -In a tagging attack an adversary changes a message slightly in such a way that it can later detect and reverse the change. Detection to be able to track a message even though it has been permuted. Reversible because the adversary needs to stay undetected. The easiest variant; when a malicious person had control over the last node. This type of attack and the mitigation being discussed here were found and explained in \cite{galteland2016attacks} +In a tagging attack an adversary changes a message slightly in such a way that it can later detect and reverse the change. This modification allows the attacker to track a message even though it has been permuted. By making the change reversible the adversary stays undetected. This type of attack and the mitigation being discussed here are described by Galteland et al.\cite{galteland2016attacks} -When you control the last node you can change the output of realtime precomputation phase slightly. You can do this by slightly changing your value of $r$ one slot of the input. You either combine the input with $r * i$, for cyclic group ElGamal, or $r + p$, for elliptic curve implementations. After all the realtime computations are done you have the plaintexts that you want to send to their destination. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. Now you know the slot this value used to be in and you can reverse your tag by doing the reverse operation. This is undetectable by other nodes or any client and thus compromises the network. Note that no node in \cmix is special. The claim is that when all but one node is compromised the network should still function as intended and keep your transmissions anonymous. +When you control the last node you can change the output of realtime precomputation phase slightly. You can do this by slightly changing your value of $r$ in one slot of the input. 
You either combine the input with $r * i$, for cyclic group ElGamal, or $r + p$, for elliptic curve implementations. After all the realtime computations are done you have the plaintexts that you want to send to their destinations. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. Now you know the slot this value occupied when it was send to the first node in the realtime phase. Then you can reverse your tag by doing the reverse operation, which makes it undetectable by other nodes or any of the clients and thus compromises the network. Note that no node in \cmix{} is special. The claim is that when all but one node is compromised the network should still function as intended and keep your transmissions anonymous. -To stop this attack we need to change the protocol slightly as described in. First we need to change the third step of the precomputation phase. Instead of sending the decryption shares of each node to the next, we send a hash, a commitment to our decryption shares. The nodes keep the decryption shares to themselves, and will use them separately in the realtime phase. The last node also includes a hash of the current mix result. So the hash of the decryption of formula \ref{form:EPiRS}. This makes that an adversary can no longer tamper with the $r$ values in the realtime phase, which means an attacker can no longer apply the tag without being detectable by other nodes. +To stop this attack we need to change the protocol slightly as described in the paper by Galteland et al.\cite{galteland2016attacks}. First we need to change the third step of the precomputation phase. Instead of sending the decryption shares of each node to the next, we send a hash, a commitment to our decryption shares. The nodes keep the decryption shares to themselves, and will use them separately in the realtime phase. The last node also includes a hash of the current mix result. So the hash of the decryption of formula \autoref{eq:EPiRS}. This makes that an adversary can no longer tamper with the $r$ values in the realtime phase, which means an attacker can no longer apply the tag without being detectable by other nodes. diff --git a/content/conclusion.tex b/content/conclusion.tex index befeb8d..ea73c04 100644 --- a/content/conclusion.tex +++ b/content/conclusion.tex @@ -1,14 +1,14 @@ -\section{conclusion} +\section{Conclusion} \label{sec:conclusion} -The big picture shows using elliptic curve for \cmix can be very beneficial. Especially in the realtime phase of the protocol. And I've shown that there is room for optimization of the the protocol implementation itself. These optimizations could even make it perform better in the precomputation phase as well. +Using elliptic curve for \cmix{} can be very beneficial overall. Especially in the realtime phase of the protocol. This benchmark has shown that there is room for optimization of the protocol implementation itself. These optimizations could even make it perform better in the precomputation phase as well. -So in general using elliptic curve for \cmix shows promise. However there is more research to be done. Writing optimized backends for both algorithms and rerunning the tests to see if the differences still hold, is one of them. This is one of the fundamental things that need to be checked in further research. The goal of this research was feasibility, and this research shows just that. 
Now somebody with knowledge of writing fast and constant time cryptography implementations can pickup the topic of writing specialized backends and retest the algorithm. +So in general using elliptic curve for \cmix{} is promising. However there is more research to be done. For example, writing optimized backends for both algorithms and rerunning the tests to see if the differences still hold. This is one of the fundamental things that need to be checked in future research. The goal of this research was feasibility, this research shows just that. Hopefully this will be picked up to fully exploit the opportunities offered by elliptic curve \elgamal{}. -Another point to be taken into consideration is that this is the happy flow of the protocol. Checks like the tagging attack mitigation talked about in section \ref{sec:tagging} are implemented in a ``null'' fashion. Meaning the functions are in place and are being called but the implementation just returns a ``null'' byte as hash of the things it supposed to hash. Therefore the hash checking code is not in place. This was a deliberate choice, as these checks would not meaningfully affect the timings. The hashing algorithm scales linearly with the input size and would be the same over the 2 protocols. and checking it would also scale linearly with the input. Therefore it would be a constant time addition. In fact it would only pollute the timing results of the other cryptographic operations. However the protocol therefore needs some work to incorporate the hash checking where necessary. Therefore complying to the protocol standard. +Another point to be taken into consideration is that this is the happy flow of the protocol. Checks like the tagging attack mitigation talked about in section \ref{sec:tagging} are implemented in a ``null'' fashion. Meaning the functions are in place and are being called but the implementation just returns a ``null'' byte as hash of the things it supposed to hash. Therefore the hash checking code is not in place. This was a deliberate choice, as these checks would not meaningfully affect the timings. The hashing algorithm scales linearly with the input size and would be the same over the two protocols. Checking it would also scale linearly with the input. Therefore it would result in a constant time addition. In fact it would only pollute the timing results of the other cryptographic operations. However for the \cmix{} library to be used a reference implementation this addition needs to be implemented. -Another interesting avenue to take is to simulate real network traffic. There are frameworks\cite{zhang2015survey} to this end but none of the more popular and established ones work on the application layer. And to adapt the current framework to work on a network level or to convert route network traffic over the application layer is too much work to be in scope for this research. +Another interesting research direction to take is to simulate real network traffic. There are frameworks\cite{zhang2015survey} that do this, but none of the more popular and established ones work on the application layer. The work needed to adapt the current framework to work on a network level or to route the network traffic over the application layer is too much work to be in scope of this study. -And finally, there is still room to run this benchmark on separate machines. Having a server dedicated for each of the nodes and have 500 separate clients run the protocol. The benchmark framework is capable of this type of setup. 
All the communication is done over $TCP + SSL$ for the node communication and $TCP$ for the communication to the statistics collection daemon. But again writing additional tooling to distribute the executables and automation scripts would just take too long for the current research. +Finally, there is still room for future research by running this benchmark on separate machines. Having a server dedicated for each of the nodes and have 500 separate clients run the protocol. The benchmark framework can support such a setup. All the communication between nodes and clients is done over $TCP$ sockets with $SSL$ and the communication with the statistics daemon is done over $TCP$. Unfortunately writing the additional tooling to facilitate the deployment of all client and nodes on separate machines is out of scope for this paper. \newpage diff --git a/content/discussion.tex b/content/discussion.tex index 34bdc4e..cd646cc 100644 --- a/content/discussion.tex +++ b/content/discussion.tex @@ -3,14 +3,14 @@ We will now discuss the results in more detail, zooming in on all the separate phases and see if we can either verify our expectations or explain any deviations. -\subsection{Justification af claims} +\subsection{Justification of claims} \label{sec:microbench} -To justify the claims made in the rest of the discussion section a micro benchmark was created for both implementations, \ec and \mg. Please note that these are raw operation on their respective elements and thus does not take into account the difference in size. Therefore the \ec results should effectively be multiplied by $8$ to have a straight up comparison. +To justify the claims made in the rest of the discussion section a micro benchmark was created for both implementations, \ec and \mg. Please note that these are raw operation on their respective elements and thus does not take into account the difference in group element size. Therefore the \ec results should effectively be multiplied by $8$ to have a straight up comparison. \begin{description} -\item[Random] The time it took to generate 10000 random group elements / curve points. -\item[Combine] The time it took to multiply 10000 group elements or to add 10000 curve points. -\item[Uncombine] The time it took to invert 10000 group elements / curve points. And then multiply or add them. +\item[Random] The time it took to generate 10,000 random group elements / curve points. +\item[Combine] The time it took to multiply 10,000 group elements or to add 10,000 curve points. +\item[Uncombine] The time it took to invert 10,000 group elements / curve points and subsequently multiply or add them. \end{description} \noindent The results of a micro benchmark on multiplicative group. @@ -19,40 +19,40 @@ To justify the claims made in the rest of the discussion section a micro benchma \noindent The result of a micro benchmark on elliptic curve Ed25519 \vspace{-\parsep}\vspace{-\parsep}\VerbatimInput{results/mb_ec.txt} -This clearly shows that adding curve points is much faster than multiplying 2048 bit numbers in a group, but it also clearly shows but it also shows how slow inverting is. This gives us insight into how the results we see in the above tables came to be. +This clearly shows that adding curve points is much faster than multiplying 2048 bit numbers in a group. It also clearly shows how slow inverting is. This gives us insight into how the results we see in the above tables (Tables \ref{tab:ec500} and \ref{tab:mg500}) came to be. 
\subsection{Precomputation precomputation} -Right away we can see a large difference between \ec and \mg, with \ec being more than $2$ times slower. Most likely this can be explained by 2 factors. The first being that this step includes generating all the randomness needed for the \cmix run. It generates $R$ and $S$ and the permutation $\pi$. For Ed25519 it tries a $Y$-value and if it doesn't map onto the curve we try a different one. On average only half of the points would be valid. +Right away we can see a large difference between \ec and \mg, with \ec being more than $2$ times slower. Most likely this can be explained by two factors. The first being that this step includes generating all the randomness needed for the \cmix{} run. It generates $R$, $S$, and the permutation $\pi$. For $Ed25519$ it tries a $y$-value and if it doesn't map onto the curve we try a different one. On average only half of the points would be valid. This means that, in the limit towards infinity, twice as many candidate $y$ coordinates need to be generated and checked if they map to the curve, than we need random curve points (see \autoref{eq:limit}.) \begin{equation} -\label{eq:prepre} +\label{eq:limit} \sum_{x=0}^{\infty} n \cdot \left( \frac{1}{2}\right)^x =\ 2n \end{equation} -A quick benchmark done with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} shows us that for the entire run $70\%$ of time is spend in getting random bytes trough the $libgcrypt$ api for \ec. But ``only'' $46\%$ is spend gathering random bytes for the \mg +A quick benchmark done with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} shows us that for the entire run $70\%$ of time is spend in getting random bytes trough the $libgcrypt$ api for \ec, but only $46\%$ is spend gathering random bytes for the \mg -Another slowing factor might be the memory fragmentation. The way that libgcrypt operates is that each object is it's own opaque pointer to an internal data structure. So there is no way to consolidate memory into a single block. This wouldn't be a factor if both algorithms would use the same amount of allocations, but now that we do 8 simultaneous mixes for \ec we have 8 times the amount of the allocations for the \ec algorithm. This impacts the performance quite heavily +Another slowing factor might be the memory fragmentation. The way that libgcrypt operates is that each object is it's own opaque pointer to an internal data structure. So there is no way to consolidate memory into a single block. This would not be a factor if both algorithms would use the same amount of allocations, but now that we do 8 simultaneous mixes for \ec we have 8 times the amount of the allocations for the \ec algorithm. This impacts the performance quite heavily \subsection{Precomputation mixing} \label{sec:premix} -Another big time loss is attributed to the wire format chosen. because we send over the affine $x$ and $y$ coordinate for \ec, we need to convert the coordinate from projected to affine coordinate systems. This takes up extra time which can be eliminated if we choose. Unfortunately if we want to do this using libgcrypt we need to send over all 3 projected coordinates as there is no way to use only the $Y$ and $Z$ coordinates to reconstruct the curve point in libgcrypt. +Another big time loss is attributed to the wire format chosen, because we send over the affine $x$ and $y$ coordinate for \ec, we need to convert the coordinate from projected to affine coordinate systems. 
This takes up extra time which can be eliminated if desired. Unfortunately if we want to do this using libgcrypt we need to send over all three projected coordinates as there is no way to only use the $Y$ and $Z$ coordinates to reconstruct the curve point in libgcrypt. -Another downside to fixing this by sending projected coordinates is that there is no way of knowing how many bits a projected coordinate is, so the allocation of buffer space in the wire format container will become dependent on the point input. Something which is not necessary for the \mg implementation because it's buffer size will always be the size of the group element. This has to be exposed via the generic API if you to be able to send projected coordinates, because the application binaries, the nodes and clients, are responsible for all the network transmissions, not the \cmix libraries. +Another downside to fixing this by sending projected coordinates is that there is no way of knowing how many bits of buffer space a projected coordinate will occupy, so the allocation of buffer space in the wire format container will become dependent on the point input. Something which is not necessary for the \mg implementation because it's buffer size will always be the size of the group element. Because the application binaries, the nodes and clients, are responsible for all the network transmissions. This specific projected point size needs to be exposed by the generic API. \subsection{Precomputation postprocessing} -In this stage we only do a `simple' cryptographic operation on each of the points to determine the node's decryption share. There are no extra allocations and no unneeded inversions, and we can see \ec is $2.5$ times faster. +In this stage we only do a `simple' cryptographic operation on each of the points to determine the node's decryption share. There are no extra allocations nor unneeded inversions, and we can see \ec is $2.5$ times faster then \mg. \subsection{Realtime precomputation} -this phase is a large time saver, namely \ec is almost $13$ times faster. Additionally it is a time saver in the realtime portion of the protocol. This will massively impact perceived latency by the clients. The cryptographic primitives in this step are $1$ inversion and $1$ multiplication or point add. All though inversion are still very slow compared to adding 2 points in Ed25519, it is still considerably faster than a inversion in 2048 bit multiplicative group as we can see in section \ref{sec:microbench}. +This phase is a large time saver, namely \ec is almost $13$ times faster. Additionally it is a time saver in the realtime portion of the protocol. This will massively impact perceived latency by the clients. The cryptographic primitives in this step are $1$ inversion and $1$ multiplication or point addition. Although inversions are still very slow compared to adding 2 points in Ed25519, it is still considerably faster than an inversion in 2048 bit multiplicative group as we can see in section \ref{sec:microbench}. \subsection{Realtime mix and realtime postcomputation} -In the last 2 phases \ec is considerably slower than \mg. This seemed surprising. On closer inspection with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} we find that almost 94\% of the time was spent in the $gcry\_mpi\_ec\_get\_affine$ function call. Which does inversions which we already know are slow from our micro benchmark. 
To reiterate 1 of these inversions could be avoided if we would change the wire format to not send over the affine Y coordinate. +In the last two phases \ec is considerably slower than \mg. This seems surprising. On closer inspection with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} we find that almost 94\% of the time was spent in the $gcry\_mpi\_ec\_get\_affine$ function call. Which does inversions which are known to be slow based on our micro benchmark. One of these inversions could be avoided if we would change the wire format to not send over the affine $y$ coordinate. -That being said, the phases take around $1$s per node. And the time saved during the realtime precomputation phase still makes this worth while. +That being said, the phases take around $1$ second per node. And the time saved during the realtime precomputation phase still makes using \ec worth while. diff --git a/content/implementation.tex b/content/implementation.tex index abd05aa..98435e7 100644 --- a/content/implementation.tex +++ b/content/implementation.tex @@ -1,7 +1,7 @@ \section{Implementation} \label{sec:implementation} -A large portion of research effort was consumed by the implementation of \cmix and benchmark framework. Including setting up the benchmarking infrastructure in such a way that it; +A large portion of research effort was consumed by the implementation of \cmix{} and benchmark framework. Including setting up the benchmarking infrastructure in such a way that it. \begin{itemize} \item Supports multiple but subtly different cryptographic back-ends. @@ -9,94 +9,94 @@ A large portion of research effort was consumed by the implementation of \cmix a \item Allows different back-ends to be comparably benchmarked. \end{itemize} -The following section will talk about some implementation specific things, talking about how this framework achieves these goals and where things could be improved. Information on where to find the implementation can be found in appendix \ref{app:impl} +The following section will talk about some implementation specific details, talking about how this framework achieves these goals and where things could be improved. Information on where to find the implementation can be found in Appendix \autoref{app:impl} \subsection{Overview of the software} -The software consists of 9 modules, either small libraries or separate binaries. The modules are; +The software consists of nine modules. The modules are. \begin{description} - \item[libcmix-crypto] Library containing the API definition of both the generic API and the specific API's for both multiplicative group and elliptic curve variants of the API. Supplying the real cryptographic operations to implement \cmix in. This library is responsible for filling the generic API with the corresponding specific API. The specific API can be chosen at project configuration time. + \item[libcmix-crypto] Library containing the API definition of both the generic API and the specific API's for both the multiplicative group and elliptic curve variants of the API. These specific implementations supply the real cryptographic operations that facilitate the implementation of \cmix{}. The specific API can be chosen at project configuration time. - \item[libcmix] This library contains the implmentation of \cmix using $libcmix{\text -}crypto$. This library only supplies \cmix related data structures and operations. The library operates by passing each call to a library function a so called $CMixContext$ structure, which in turn contains a generic API. 
+ \item[libcmix] Library containing the implementation of \cmix{} using $libcmix{\text -}crypto$. This library only supplies \cmix{} related data structures and operations. The library operates by passing a so-called $CMixContext$ structure to each call of the \cmix{} library functions. This context structure contains, amongst other things, a generic API from the $libcmix{\text -}crypto$ library. - \item[libcmix-protobuf] This library contains just the \cmix definitions for the $google protobuf$ protocol. This\cmix implementation uses $protobuf$ to transmit messages between nodes and clients. This makes sure all communication between actors are properly serialized and de-serialized and allows it to inter-operate with different clients and/or nodes. + \item[libcmix-protobuf] Library containing just the \cmix{} definitions for the Google $protobuf$ protocol. This \cmix{} implementation uses $protobuf$ to transmit messages between nodes and clients. This makes sure all communication between actors is properly serialized. It also allows the current implementation of actors to easily interoperate with different actor implementations. - \item[libcmix-network] Small networking library that supplies TCP servers and clients using the $Boost::asio$ library. The library supports both plain TCP and SSL connections. + \item[libcmix-network] Networking library that supplies TCP servers and clients using the $Boost$$::$$asio$ library. The library supports both plain TCP and SSL connections. - \item[libcmix-common] Small library that contains code that is relevant to both the $client$ and $node$ binaries. It contains abstractions over the $libcmix{\text -}network$ library. This was done because there is a sense of directionality in \cmix node communication, the network forms a ring where the last nodes has a connection to the first node. Therefore the library supplies Senders, Receivers and SenderReceivers. This prevents you from sending messages to the wrong node as you only receive from your previous node and only send to the next node. + \item[libcmix-common] Library that contains code that is relevant to both the $client$ and $node$ binaries. It contains abstractions over the $libcmix{\text -}network$ library. This was required because there is a sense of directionality in \cmix{} node communication: the network forms a ring where the last node has a connection to the first node. Therefore the library supplies ``Senders'', ``Receivers'' and ``SenderReceivers''. This prevents you from sending messages to the wrong node as you only receive from your previous node and only send to the next node. - \item[liblog] Small logging library, this is just an abstraction over $Boost::Log$ and only uses the $trivial\_log$ variant. + \item[liblog] Logging library; it is just an abstraction over $Boost$$::$$Log$ and only uses the $trivial\_log$ variant. - \item[node] A binary that implements the \cmix node behavior. Nodes can be configured at startup to fulfill certain roles. For full explanation of the configuration please refer to the \emph{-{}-help} output + \item[node] A binary that implements the \cmix{} node behavior. Nodes can be configured at startup to fulfill certain roles. For a full explanation of the configuration please refer to the \emph{-{}-help} output. - \item[client] A binary that implements a \cmix client. Clients can also be configured as startup. Please refer to the \emph{-{}-help} output for more information. + \item[client] A binary that implements a \cmix{} client.
Clients can also be configured at startup. Please refer to the \emph{-{}-help} output for more information. - \item[statsd] A binary that accepts Performance statistics of the nodes via TCP. Nodes can be configured to either connect to a stats daemon or perform operation without it. + \item[statsd] A binary that accepts performance statistics of the nodes via TCP. Nodes can be configured to either connect to a stats daemon or operate normally without one. \end{description} \subsubsection{libcmix-crypto} -One goal of $libcmix{\text -}crypto$ was to be as reusable and accessible as possible. This is why $libcmix{\text -}crypto$ was written in $C$. Another goal was to make it easy to add new cryptographic backends and an easy way to swap between them. This implementation chose to only build and use one implementation based on the configuration of the project. This was done to make sure that no code change was needed to switch between algorithms. One can easily configure the project, and which cryptographic backend to use with $CMake$. In $CMake$ one can choose between either using elliptic curve, or multiplicative group. And choose out of a list of implementations of either the elliptic curve or multiplicative group variants. +One goal of $libcmix{\text -}crypto$ was to be as reusable and accessible as possible. This is why $libcmix{\text -}crypto$ was written in $C$. Another goal was to make it easy to add new cryptographic backends and to swap between them. This implementation chose to only build and use one implementation based on the configuration of the project. This was done to make sure that no code change was needed to switch between algorithms. One can easily configure the project, and which cryptographic backend to use, with $CMake$. In $CMake$ one can choose between either elliptic curve or multiplicative group, and then choose out of a list of implementations of the selected variant. -For each type of encryption (multiplicative group or ed25519) multiple backends can be created and used. By adding a folder with a descriptive chosen $name$ and adding a file to that folder named $name\_algorithm.c$ for instance $gcrypt\_ed25519.c$. After rerunning the $CMake$ configuration step you can choose what backend you want to use. After a build of the project this implementation will be used as the \cmix cryptographic backend. +For each type of encryption (multiplicative group or $Ed25519$) multiple backends can be created and used by adding a folder with a descriptively chosen $name$ and adding a file to that folder named $name\_algorithm.c$, for instance $gcrypt\_ed25519.c$. This will make the implementation show up in the $CMake$ configuration after running $CMake$ once. When building the project, the chosen implementation will be used as the \cmix{} cryptographic backend. -This additional backend should initialize all the $extern$ declared variables defined in one of the specific APIs, for example $ed25519.h$ (\autoref{lst:ed25519.h}), with compatible function pointers. There are some convenience macros to help you set up additional backends. +This additional backend should initialize all the $extern$ declared variables defined in one of the specific APIs, for example $ed25519.h$ (\autoref{lst:ed25519.h} in Appendix \autoref{app:cryptolst}), with compatible function pointers. There are some convenience macros to help you set up additional backends.
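As an illustration of this extern function pointer pattern, the following is a minimal C sketch of what a specific API header and a matching backend could look like. All names here ($ed25519\_random\_element$, $gcrypt\_random\_element$, and so on) are hypothetical placeholders that mirror the structure described above; they are not the actual $libcmix{\text -}crypto$ code.

/* ed25519.h -- hypothetical specific API: extern function pointers that a
 * concrete backend must fill in with compatible implementations. */
#include <stddef.h>

typedef struct GroupElement GroupElement;   /* opaque token type */

extern GroupElement *(*ed25519_random_element)(void);
extern GroupElement *(*ed25519_encrypt)(const GroupElement *msg, const GroupElement *key);
extern size_t (*ed25519_serialized_size)(void);

/* gcrypt_ed25519.c -- hypothetical backend, built only when selected in CMake. */
static GroupElement *gcrypt_random_element(void) { /* ... libgcrypt calls ... */ return 0; }
static GroupElement *gcrypt_encrypt(const GroupElement *m, const GroupElement *k) { /* ... */ return 0; }
static size_t gcrypt_serialized_size(void) { return 64; /* e.g. affine x||y */ }

/* The backend defines the extern declared variables with its own functions;
 * a convenience macro could generate this boilerplate. */
GroupElement *(*ed25519_random_element)(void) = gcrypt_random_element;
GroupElement *(*ed25519_encrypt)(const GroupElement *, const GroupElement *) = gcrypt_encrypt;
size_t (*ed25519_serialized_size)(void) = gcrypt_serialized_size;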
\subsubsection{libcmix} Like $libcmix{\text -}crypto$, $libcmix$ was also implemented in $C$. All $libcmix$ functions operate by taking a $CMixContext$ structure as first parameter. -The $CMixContext$ contains all the randomness and other information needed for mix network operation. Return values from $libcmix{\text -}crypto$ are passed and stored in the opaque $GroupElement$ type. The specific cryptographic backend will be able to reinterpret objects of this type back to the right representation to do the actual cryptography with. This allows us to use the same functions (and thus function signatures) for all cryptography types and implementations. +The $CMixContext$ contains all the randomness and other information needed for mix network operation. Return values from $libcmix{\text -}crypto$ are passed and stored in the ``token'' $GroupElement$ type. The specific cryptographic backend will be able to reinterpret objects of this type. This allows us to use the same functions (and thus function signatures) for all cryptography types and implementations. -The downside is that the cryptographic backend library has to implement a kind of serialization. It is responsible for transforming $GroupElement$s to some kind of string representation and back. This representation can be, and in all current cases is, binary. Which also means that the library needs a function to let the caller know how large the serialized output will be. Furthermore the message representation does not have to match the serialized group element representation, in case of ed25519 this is impossible. So we get separate serialization for messages. +The downside is that the cryptographic backend library has to implement serialization. It is responsible for transforming $GroupElement$s to a string representation and back. This representation can be, and in all current cases is, binary, which also means that the library needs a function to let the caller know how large the serialized output will be. Furthermore the message representation does not have to match the serialized group element representation. In the case of $Ed25519$ serialized group elements are the affine $x$ and $y$ coordinates, but the message format is just a random string; we will discuss this problem in \autoref{sec:mappingcurve}. This means we get separate serialization for messages and intermediate results. -It also moves responsibility of generating randomness to the cryptographic library. As we need to generate random group element vectors R and S and only the cryptographic backend library knows what those are. The library also generates the permutation function which means it needs to generate a uniform int securely, And it also implements the Diffie-Hellman key-exchange functions as it also needs a random group element to perform the key-exchange. +It also moves the responsibility of generating randomness to the cryptographic library. The \cmix{} implementation needs random group element vectors $R$ and $S$ and only the cryptographic backend library knows what those are. The backend library also generates the permutation function, which means it needs to be able to securely generate an integer value uniformly within a range. It should also implement the Diffie-Hellman key-exchange functions as the key-exchange needs to generate a random group element. -All these extra functionality which needs to be implemented in the backend adds up and makes the API look a bit bloated, but most functions are easily implemented in a couple of lines of code.
Unfortunately it is simply impossible to have a higher level library take care of these operations without losing possible abstraction of the underlying cryptographic libraries or data structure used by that library. +All this extra functionality, which needs to be implemented in the backend, adds up and makes the API look a bit bloated. However most functions are easily implemented in a couple of lines of code. Unfortunately it is simply impossible to have a higher level library that takes care of these operations without losing possible abstraction of the underlying cryptographic libraries or data structures used by such a library. -\subsection{ElGamal in multiplicative Group and Elliptic Curve} +\subsection{\elgamal{} in multiplicative group and elliptic curve} -The goal of this research is to see how differently these two ways of using ElGamal in this cryptographic scheme effect things like latency and throughput in \cmix. But also doing this in the most reusable way possible so that the implementation is of use for further research. +The goal of this research is to see how differently these two ways of using ElGamal in this cryptographic scheme affect things like latency and throughput in \cmix{}. The research also aims to do this in the most reusable way possible so that the implementation can be used in future research. -This means we will be trying to find cryptographic libraries that do most of the work while we focus on getting a uniform API over different back-ends and the correct implementation of said back-ends. Unfortunately ElGamal is not a very popular encryption algorithm in modern cryptographic libraries. and even if it is supported with a high level interface it is very rigid and wouldn't allow for any secret sharing as needed in \cmix. Because of this we needed access to lower level cryptographic primitives. Which was found in the libgcrypt library. The next step is to create a \cmix library which abstracts the lower level cryptographic primitives for both ``multiplicative group'' and ``elliptic curve'' ElGamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used in the application level easier. For the application level code I used ``C++'', but in theory this should be easily swapped for a different application language. +This means we will be trying to find cryptographic libraries that do most of the work while we focus on getting a uniform API over different back-ends and the correct implementation of those back-ends. Unfortunately ElGamal is not a very popular encryption algorithm in modern cryptographic libraries, and even if it is supported with a high level interface it is very rigid and wouldn't allow for any secret sharing as needed in \cmix{}. Therefore we need access to lower level cryptographic primitives, which are found in the $libgcrypt$ library. The next step is to create a \cmix{} library which abstracts the lower level cryptographic primitives for both ``multiplicative group'' and ``elliptic curve'' ElGamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it easier, for example by a different language used in the application level. For the benchmark C++ was used as the application level language and in theory this should be easily swapped for a different application language.
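To make the shape of such a uniform API, and the serialization obligations discussed above, a bit more concrete, here is a minimal C sketch. The struct and function names are hypothetical and only illustrate the idea of an opaque token plus backend-supplied function pointers; they are not the actual $libcmix{\text -}crypto$ headers.

#include <stddef.h>
#include <stdint.h>

/* Opaque token: callers only ever hold pointers, only the chosen backend
 * knows the concrete layout (big integer or curve point). */
typedef struct GroupElement GroupElement;

typedef struct CryptoApi {
    /* How many bytes a serialized group element occupies on the wire. */
    size_t (*element_size)(void);
    /* Serialize a group element into caller-provided storage and back. */
    void (*serialize)(const GroupElement *e, uint8_t *out);
    GroupElement *(*deserialize)(const uint8_t *in);
    /* Separate message (de)serialization, since the message format need not
     * match the group element format (see the Ed25519 mapping section). */
    GroupElement *(*message_to_element)(const uint8_t *msg, size_t len);
    size_t (*element_to_message)(const GroupElement *e, uint8_t *msg);
    /* Randomness lives in the backend too: random elements for the R and S
     * vectors and uniform integers for generating the permutation. */
    GroupElement *(*random_element)(void);
    uint64_t (*uniform_int)(uint64_t upper_bound);
} CryptoApi;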
-Although using libgcrypt makes it possible to implement \cmix, it doesn't mean it was an easy task. The library still lacks some convenience functions that needed workarounds in the eventual implementation. This especially is the case for the elliptic curve back-end. Some specific problems will be discussed later. +Although using $libgcrypt$ makes it possible to implement \cmix{}, it doesn't mean it was an easy task. The library still lacks some convenience functions that required workarounds, especially for the elliptic curve back-end. Some specific problems will be discussed later. \subsection{Wire format} \label{sec:wireformat} -For \cmix to work, we need to send group elements of the cryptographic algorithms from one node to the next. For multiplicative group this is easy. They are just big integers with a max size and can be sent as arrays of a fixed length. +For \cmix{} to work, we need to send group elements of the cryptographic algorithms from one node to the next. For multiplicative group this is easy. They are just big integers with a maximum size and can be sent as arrays of a fixed length. -When serializing elliptic curve points, there are several options. You can send both affine coordinates. You can send only the y coordinate, in the case of ed25519. This means you need to calculate, one of the, corresponding x'es when de-serializing. Or you can send the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate. Which is relatively slow. +When serializing elliptic curve points, there are several options. You can send both affine coordinates. In the case of $Ed25519$ you can send only the $y$ coordinate, which means you need to calculate one of the corresponding $x$ coordinates when de-serializing. Or you can send the projective coordinates. This benchmark's $Ed25519$ implementation sends both affine $x$ and $y$ coordinates, mainly due to the fact that the length of projective coordinates can vary wildly and due to some limitations of the $libgcrypt$ library. Unfortunately this forces each node to calculate the inverse of the internally used projective $Z$ coordinate, which is relatively slow. -The main reason for this is that the raw affine $x$ and $y$ coordinate are easier to inspect and debug. On top of that libgcrypt is not very forthcoming on documentation of their internal structure, especially in their elliptic curve library. This means that it was not clear how to properly serialize the projective coordinates of a point. So just for now keep in mind that the overall implementation could be a bit faster when these unnecessary inversions of $Z$ would be eliminated. +One nice side effect of this decision is that the affine $x$ and $y$ coordinates are easier to inspect and debug. On top of that $libgcrypt$ is not very forthcoming on documentation of their internal structure, especially in their elliptic curve library. This means that it was not clear how to properly serialize the projective coordinates of a point. So keep in mind that the overall implementation could be a bit faster if these unnecessary inversions of $Z$ were eliminated. -So as to not worry about any other network serialization, this framework uses Google protobuf to serialize the \cmix messages further. This also means that you could have multiple nodes which are implemented in different languages interact with each other. As long as these languages have protobuf implementations available.
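For context, the following C sketch shows roughly how producing the affine wire format with $libgcrypt$ can look; it is a simplified illustration with error handling omitted, not the benchmark's actual serialization code. The call to $gcry\_mpi\_ec\_get\_affine$ is where the costly inversion of the projective $Z$ coordinate happens.

#include <gcrypt.h>

/* Sketch: convert a curve point into affine x and y for the wire.
 * gcry_mpi_ec_get_affine() internally inverts the projective Z coordinate,
 * which is the relatively slow step discussed above. */
static void serialize_point(gcry_mpi_point_t point, gcry_ctx_t curve_ctx,
                            unsigned char *out_x, unsigned char *out_y,
                            size_t buf_len)
{
    gcry_mpi_t x = gcry_mpi_new(0);
    gcry_mpi_t y = gcry_mpi_new(0);
    size_t written;

    /* Returns non-zero for the point at infinity, which a valid message
     * is not expected to produce. */
    if (!gcry_mpi_ec_get_affine(x, y, point, curve_ctx)) {
        gcry_mpi_print(GCRYMPI_FMT_USG, out_x, buf_len, &written, x);
        gcry_mpi_print(GCRYMPI_FMT_USG, out_y, buf_len, &written, y);
    }

    gcry_mpi_release(x);
    gcry_mpi_release(y);
}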
+So as to not worry about any other network serialization, this framework uses Google $protobuf$ to serialize the \cmix{} messages further. This also means that you could have multiple nodes which are implemented in different languages interact with each other, as long as these languages have protobuf implementations available. -\subsection{Mapping strings to Ed25519 points} - -So to use the homomorphic property of ElGamal we need to directly encode user input of the \cmix network into, in this case, valid curve points. +\subsection{Mapping strings to $Ed25519$ points} +\label{sec:mappingcurve} +To use the homomorphic property of ElGamal we need to directly encode user input of the \cmix{} network into, in this case, valid curve points. \subsubsection{Elligator} -An algorithm that might spring to mind first is the Elligator algorithm\cite{Bernstein:2013:EEP:2541806.2516734} to map curve points to strings indistinguishable from random strings. However this is the exact reverse of the operation I need in \cmix. Elligator works by converting curve points to a representative, a randomly seeming string of bytes. However not every possible string of that size is an representative of a curve point. This makes it impossible to use the Elligator for the reverse mapping in this problem. +An algorithm that might spring to mind first is the $Elligator$ algorithm\cite{Bernstein:2013:EEP:2541806.2516734} to map curve points to strings indistinguishable from random strings. Unfortunately this is the exact inverse of the operation that \cmix{} needs. $Elligator$ works by converting curve points to a representative, which is a seemingly random string of bytes. However not every possible string of that size is a representative of a curve point. This makes it impossible to use Elligator for the reverse mapping in this problem. \subsubsection{Koblitz's method} -There does exist an algorithm to map arbitrary strings to curve points. It is a probabilistic algorithm called the Koblitz's method\cite{Koblitz}. It starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result of this multiplication must also be a group element. This means your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding $x$ coordinate, in the case of Ed25519. If it has one you are done, if it doesn't you add $1$ to your current trial $y$ coordinate and check again. This continues up until your trial $y$ coordinate reaches the value of $first\_trial\_y\_coordinate + stride$. If you haven't found an $x$ coordinate corresponding to your trial $y$ the mapping algorithm fails. +There does exist an algorithm to map arbitrary strings to curve points. It is a probabilistic algorithm called Koblitz's method\cite{Koblitz}. It starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result of this multiplication must also be a group element. This means your message space is made smaller by a factor ``stride''. In the case of $Ed25519$ you can check if your result has a corresponding $x$ coordinate. If it has one you are done; if it doesn't, you add $1$ to your current trial $y$ coordinate and check again. This continues up until your trial $y$ coordinate reaches the value of $first\_trial\_y\_coordinate + stride$.
If you haven't found an $x$ coordinate corresponding to your trial $y$ the mapping algorithm fails. -You can map a point back to the original string by dividing the y coordinate with the ``stride''. This works because we only add a value less than the stride to the original number and integer division discards these additions. +You can map a point back to the original string by dividing the $y$ coordinate by the ``stride''. This works because we only add a value less than the stride to the original number and integer division discards these additions. -The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the max distance between two consecutive suitable y coordinates could be. We do know that about half of the possible group elements would be suitable in Ed25519, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make an educated guess, and stay on the safe side of that guess. +The problem with Koblitz's probabilistic method is choosing your ``stride''. There is no way of knowing what the maximum distance between two consecutive suitable $y$ coordinates could be. We do know that about half of the possible group elements would be suitable in $Ed25519$, but it is impossible to list all the $Ed25519$ curve points and check. This makes it an unsolvable problem. This is why this paper makes an educated guess on how big the ``stride'' should be and backs up this decision with empirical data. -Now to address the concern that you divide your message space by the stride. This reduction will theoretically effect your throughput, however it only does if you would optimally pack your messages in the 252 bit message space you have available in Ed25519. However if you only use the lower 248 bits, which gives you 31 byte message space. You have 5 bits to use as a stride. Which, anecdotally, seems to be enough. Of course there is no way of knowing if this stride will work for all possible messages. But for the purpose of this benchmark it works well enough. It might be possible to find out how stride influences the chance of not finding a suitable y coordinate. But that is outside of the scope of this research. +Now to address the concern that you divide your message space by the stride: this reduction will theoretically affect your throughput, however it only does so if you would optimally pack your messages into the 252 bit message space available in $Ed25519$. If you only use the lower 248 bits, which gives you a 31 byte message space, you have 5 bits to use as a stride, which, anecdotally, seems to be enough. Of course there is no way of knowing if this stride will work for all possible messages, but for the purpose of this benchmark it works well enough. It might be possible to find out how the stride influences the chance of not finding a suitable $y$ coordinate, but that is outside of the scope of this research. -A small but nice benefit of using $2^5 = 32$ as stride; multiplication and division are bit shifts as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$. This makes any hexadecimal representation of a number just shift up one character. However in actual runs of the algorithm with a stride of $16$ was insufficient. Some of the runs would fail because there was no suitable $y$ coordinate within $16$ value range. This has not happened for a stride of $32$ yet.
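To make the stride construction above concrete, here is a minimal C sketch of the encode and decode logic. It is illustrative only: the $has\_valid\_x$ predicate is a hypothetical placeholder for asking the curve library whether a matching $x$ coordinate exists, and the roughly 252-bit values are shown as plain integers rather than big integers for brevity.

#include <stdbool.h>
#include <stdint.h>

#define STRIDE 32u  /* 2^5, so multiplying and dividing are shifts by 5 */

/* Hypothetical: asks the curve library whether this trial y coordinate has a
 * corresponding x coordinate on Ed25519. */
extern bool has_valid_x(uint64_t trial_y);

/* Encode: shift the message up by the stride and probe consecutive trial y
 * candidates; the mapping fails if none of the STRIDE candidates is on the curve. */
static bool koblitz_encode(uint64_t message, uint64_t *encoded_y)
{
    uint64_t base = message * STRIDE;            /* message << 5 */
    for (uint64_t offset = 0; offset < STRIDE; offset++) {
        if (has_valid_x(base + offset)) {
            *encoded_y = base + offset;
            return true;
        }
    }
    return false;                                /* no suitable y within the stride */
}

/* Decode: integer division discards whatever offset was added during encoding. */
static uint64_t koblitz_decode(uint64_t encoded_y)
{
    return encoded_y / STRIDE;                   /* encoded_y >> 5 */
}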
To reiterate this is no guarantee that it will never happen. +A small but nice benefit of using $2^5 = 32$ as stride is that multiplication and division are bit shifts as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$, because this makes any hexadecimal representation of a number just shift up one character. In actual runs of the algorithm, however, using a stride of $16$ was insufficient. Some of the runs would fail because there was no suitable $y$ coordinate within the $16$ value range. This has not happened for a stride of $32$ yet, although this is no guarantee that it will never happen. -\subsection{Debugging the \cmix operations.} +\subsection{Debugging the \cmix{} operations} -Debugging the \cmix operations is hard. Intermediate results produced by the nodes are hard to validate for the real 2048 bit numbers used in the multiplicative group and the 256 bit numbers used in the ElGamal cases. Fortunately it is possible to use smaller multiplicative groups and smaller curves to debug structural issues in the algorithms. For the multiplicative groups this works like a charm, as the operations are pretty easy to do by hand or calculator, The operations for elliptic curve are a lot less intuitive, even for small curves. Especially with no known good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming. +Debugging the \cmix{} operations is hard. Intermediate results produced by the nodes are hard to validate for the real 2048 bit numbers used in the multiplicative group case and the 256 bit numbers used in the elliptic curve case. Fortunately it is possible to use smaller multiplicative groups and smaller curves to debug structural issues in the algorithms. For the multiplicative groups this works like a charm, as the operations are pretty easy to do by hand or with a calculator. The operations for elliptic curve are a lot less intuitive, even for small curves, especially with no known good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming. -Some tools that have been a great help in creating the implementation in general are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all the executed code paths for any object or array access outside of the allowed bounds. This is no guarantee the application is completely memory safe. It does ensure that any out of bounds access is detected, even if the access was still technically in valid memory just not part of the object or array. +Some tools that have been a great help in creating the implementation in general are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all the executed code paths for any object or array access outside of the allowed bounds. This is no guarantee the application is completely memory safe, but it does ensure that any out of bounds access is detected, even if the access was still technically in valid memory. -The leak tool also helped to keep an eye on the tokens passed around by the \cmix library. Because the implementation language of the library was $C$ and there are 2 different implementations for the 2 different ElGamal back-ends. There is lots of token passing. These tokens need to be destructed but have no clear overarching owner at times. This makes it harder to keep track of object lifetime.
So LeakSanitizer helped out tracking these lost tokens. Which allowed me to eventually eliminate memory leaks. Which in turn allowed me to run the benchmark for a high number of clients on a system with relatively low specs. +The leak tool also helped to keep an eye on the tokens passed around by the \cmix{} library. Because the implementation language of the library is $C$ and there are two different implementations for the two different ElGamal back-ends, many tokens are being passed around. These tokens need to be destructed but have no clear overarching owner. This makes it harder to keep track of object lifetime. So LeakSanitizer helped track down these lost tokens, which allowed elimination of the memory leaks, which in turn allowed the execution of the benchmark for a high number of clients on a system with relatively low specs. diff --git a/content/introduction.tex b/content/introduction.tex index c2ebc15..cd6dbdb 100644 --- a/content/introduction.tex +++ b/content/introduction.tex @@ -1,11 +1,11 @@ \section{Introduction} -Showing that one piece of software is faster than another is somewhat of an art. Trying to keep as many of the variables the same while varying the one you are interested in is not easy. Especially when the implementations are not of the same algorithm. This is the case for the \cmix mix network, where you can choose between doing \elgamal in either multiplicative groups or in elliptic curves. This makes benchmarking the fundamental performance differences between these two difficult and interesting. This paper and companion framework implementation focuses on facilitating a fair comparison between the two, by providing a common interface that can implement the \cmix primitives. Allowing 2 different cryptographic back-ends to be implemented. +Showing that one piece of software is faster than another is an art. Trying to keep as many of the variables the same while varying the one you are interested in is not easy, especially when the implementations are not based on the same algorithm. This is the case for the \cmix{} mix network, where you can choose between doing \elgamal{} in either multiplicative groups or in elliptic curves. This makes benchmarking the fundamental performance differences between two implementations of these two algorithms difficult and interesting. This paper and its companion framework implementation focus on facilitating a comparison between the two. The framework provides a common interface that can implement the \cmix{} primitives, allowing two different cryptographic back-ends to be implemented and benchmarked. -The goal of this research is to verify whether there is an benefit in using \elgamal over elliptic curves. This paper implements an elliptic curve and multiplicative group. And compares them back-end with the same underlying cryptographic primitive library that supported both. +The goal of this research is to verify whether there is a benefit in using \elgamal{} over elliptic curves. This paper implements an elliptic curve and a multiplicative group version of \elgamal{} using the same underlying cryptographic primitive library that supports both. Finally it will compare these two implementations to verify whether elliptic curve \elgamal{} is faster for this specific application. -We will try to show how \cmix works and how it needs to be implemented to be safe against some of the known attacks of which the mitigations will influence the delay of the individual \cmix phases.
Explain some of the implementation details and problems that needed to be solved and what kind of impact it had on the run time of the algorithms, and elaborate how we timed the applications and gathered data keeping in mind the limitations of the platform used. +We will show how \cmix{} works and how it needs to be implemented to be safe against some of the known attacks. This implementation will focus on the mitigations that actually introduce extra messages between the nodes and therefore introduce more latency for the different \cmix{} phases. We will then discuss some of the implementation details and problems that needed to be solved and what kind of impact these implementation details had on the running time of the implementations. This also means discussing how the results are gathered whilst keeping in mind the limitations of the platform used. -Finally we will show that both of the \cmix implementations supplied by this research paper scale linearly in the amount of clients that participate in a run, and that elliptic curve implementations for \cmix are an interesting alternative to multiplicative group with espei. +Finally we will show that both of the \cmix{} implementations supplied by this research paper scale linearly in the number of clients that participate in a run.
\ No newline at end of file +The paper is structured as follows. First we will talk about other anonymity networks and their current day use in \cref{sec:anon}. In \cref{sec:cmix} we will talk about the \cmix{} network, how it works and why it uses \elgamal{}. \Cref{sec:related} discusses related work. This is followed by \cref{sec:implementation}, which talks about the benchmark implementation and some implementation details. \Cref{sec:cmixaddtions} will talk about some flaws in the original \cmix{} protocol and discusses how to fix them. Then we talk about the results in \cref{sec:results}, followed by the discussion of the results in \cref{sec:discussion}. Final thoughts and further research ideas are in the conclusion, \cref{sec:conclusion}.
\ No newline at end of file diff --git a/content/related_work.tex b/content/related_work.tex index 7dd41f9..7c5ade0 100644 --- a/content/related_work.tex +++ b/content/related_work.tex @@ -1,11 +1,11 @@ \section{Related work} \label{sec:related} -The \cmix network uses raw \elgamal encryption and we try to benchmark the performance of using either multiplicative group or elliptic curve variants. Within the field of benchmarking mix networks and \elgamal implementations some work has already been done. But most of the work is about comparing elliptic curve cryptography with RSA. Which unfortunately is not that interesting for us. +The \cmix{} network uses raw \elgamal{} encryption and we try to benchmark the performance of using either multiplicative group or elliptic curve variants. Within the field of benchmarking mix networks and \elgamal{} implementations some work has already been done. However, most of the work is about comparing elliptic curve cryptography with RSA, which unfortunately is not that interesting for our benchmark. -A paper by Osman Ugus, Alban Hessler and Dirk Westhoff \cite{ugus2007performance} shows that additive homomorphic EC-\elgamal Encryption has its merits within sensor networks. +A paper by Osman Ugus, Alban Hessler and Dirk Westhoff \cite{ugus2007performance} shows that additive homomorphic EC-\elgamal{} encryption has its merits within sensor networks. They implemented the encryption scheme on a small 8 bit CPU platform, the \emph{MicaZ mote}, and benchmarked 500 runs of the encryption scheme to get their results. They show that they can run their benchmark in just over a second which makes it, according to them, the fastest implementation on that platform at that time. -In a paper by Rosy Sunuwar and Suraj Ketan Samal\cite{sunuwar2015elgamal} they compare classical symmetric encryption schemes against elliptic curve \elgamal. They show that EC-\elgamal is relatively slow, but propose a fix by introducing $E^3C^2K$ algorithm. The algorithm in itself might not be interesting for us. The results of their comparison with standard symmetric encryption schemes, for both encryption and decryption is. It shows the relative slowness of decryption compared to encryption of standard EC-\elgamal +A paper by Rosy Sunuwar and Suraj Ketan Samal\cite{sunuwar2015elgamal} compares classical symmetric encryption schemes against elliptic curve \elgamal{}. They show that EC-\elgamal{} is relatively slow, but propose a fix by introducing the $E^3C^2K$ algorithm. The algorithm in itself might not be interesting for this research paper, but the results of their comparison with standard symmetric encryption schemes, for both encryption and decryption, are: they show the relative slowness of decryption compared to encryption for standard EC-\elgamal{}. -and finally the paper by Ye Zhu et al.\cite{zhu2004flow} not only describes the flow analysis attacks that effects all mix networks, but also benchmarks them to show that mitigating the found attacks can be done without losing too much performance. They use the total time of a mix for specific mix network as a metric. This is useful for them because they are only interested in the bigger picture of the network performance. For this research we want to be able to zoom in on specific phases if need be to inspect what is causing the slow down and if that is something implementation specific or a fundamental problem of the cryptographic primitives being used.
\ No newline at end of file +Finally the paper by Ye Zhu et al.\cite{zhu2004flow} not only describes the flow analysis attacks that affect all mix networks, but also benchmarks them to show that mitigating the found attacks can be done without losing too much performance. They use the total time of a mix for a specific mix network as a metric. This is useful for them because they are only interested in the bigger picture of the network performance. For this research we want to zoom in on specific phases and/or parts if need be, enabling us to inspect what is causing the slow down and to investigate whether that slow down is caused by something implementation specific or by some fundamental problem of the cryptographic primitives being used.
\ No newline at end of file diff --git a/content/results.tex b/content/results.tex index 9f81d2d..40f50de 100644 --- a/content/results.tex +++ b/content/results.tex @@ -4,29 +4,29 @@ \newcommand{\mg}[0]{\emph{mg}\xspace} -In this section \ec and \mg will refer to the 2 different implementations that we compare. They stand for elliptic curve and multiplicative group. +In this section \ec and \mg will refer to the two different implementations that we compare. They stand for elliptic curve and multiplicative group respectively. -So the raw results, which can be found in the cmix repository \ref{app:code}, were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see a \cmix setup. All connections, either from node to node or client to node, are TCP connections encrypted using TLS. Each one of these 500 clients prepare a message of 248 bytes for \ec or a message of 256 bytes for \mg and send it to the first node of the network. This is achieved by either using a 2048 bit group for \mg or by using 31 bytes of the ed25519 group element and doing 8 mixes per run to get up to 248 bytes. The timings in the table correspond with the average of 100 runs and the standard deviation of that average for each step in the protocol. For example $prepre$ stands for $precomputation precomputation$ phase and $realpost$ stands for $realtime postcomputation$ phase. +The raw results, which can be found in the \cmix{} repository (see Appendix \autoref{app:code}), were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see in a \cmix{} setup. All connections, either from node to node or client to node, are TCP connections encrypted using TLS. Each one of these 500 clients prepares a message of 248 bytes for \ec or a message of 256 bytes for \mg and sends it to the first node. This is achieved by either using a 2048 bit group for \mg or by using 31 bytes of the $Ed25519$ group element and doing 8 mixes per run to get up to 248 bytes. The timings in Tables \ref{tab:ec500} and \ref{tab:mg500} correspond with the average of 100 runs and the standard deviation of the average for each step in the protocol. For example, $prepre$ stands for the $precomputation$ $precomputation$ phase and $realpost$ stands for the $realtime$ $postcomputation$ phase. -Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison. But as we are interested in a straight up comparison this implementation does not parallelize multiple mixes. +Note that these separate runs of the $Ed25519$ protocol can trivially be parallelized, possibly making the benchmark results even more interesting by comparison. However, as we are interested in a straight up comparison, this implementation does not parallelize multiple mixes. -This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. So the payload would become $236$ and $228$ bytes for \mg and \ec respectively. +This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. So the payload would become $236$ and $228$ bytes for \mg and \ec, respectively.
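As a worked example of these message sizes: an \ec run packs $8 \times 31 = 248$ bytes of message, and subtracting the $20$ byte SHA1 destination prefix leaves $248 - 20 = 228$ bytes of payload; for \mg a single $2048$ bit ($256$ byte) group element minus the same $20$ byte prefix leaves $256 - 20 = 236$ bytes.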
Network latency is negligible because all the participants are running on the same computer but the goal is not to measure network latency. Rather we want to know if there is a benefit in using elliptic curve as opposed to multiplicative group ElGamal. -The reason behind running 3 nodes is simple. There are subtle distinctions between what nodes do, depending on the position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally the last node needs to decrypt the final message en send it to its destination. So the minimal test case should contain 3 nodes. 1 first, 1 middle and 1 last node. I don't expect to see much difference between these nodes with the exception of the ``RealPost'' step as the last node needs to decrypt the ciphertext, and prepare plaintext buffers to send out to the clients. +The reason behind running three nodes is simple. There are subtle distinctions between what nodes do, depending on the position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally the last node needs to decrypt the final message and send it to its destination. So the minimal test case should contain three nodes: one first, one middle and one last node. No large difference in time between these nodes is expected, with the exception of the ``RealPost'' step as the last node needs to decrypt the ciphertext and prepare plaintext buffers to send out to the clients. -In this benchmark we are running 500 clients per mix because of 2 reasons. In the original \cmix paper \cite{cMix} The largest test was with 500 clients. So I wanted to mimic that. The second reason is that it still feasible to do 500 clients using a single pc with 12GB or RAM. We could still increase the number of clients by about 100 but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers. And not running the extra 100 clients just gives us some headroom when other applications needs some extra ram. +In this benchmark we are running 500 clients per mix for two reasons. In the original \cmix{} paper \cite{cMix} the largest test was run with 500 clients, so this benchmark mimics that. The second reason is that it is still feasible to run 500 clients using a single PC with 12GB of RAM. We could still increase the number of clients by about 100, but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers. In fact not running the extra 100 clients gives us some headroom with regards to other applications still running in the background. -For the timings I used the \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer} which has a timer resolution of $10000000ns$ for both user and system clocks on a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so called ``User'' timings. This eliminates the time spend context switching, which gives us slightly more accurate results. The system and wall times are also recorded, but just filtered out in the results table as they are not relevant.
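To illustrate what the ``User'' timings measure, here is a minimal C sketch using $getrusage$; it is not the benchmark's code (which uses \emph{boost::timer::cpu\_timer}), but it shows the same idea of bracketing a phase with snapshots of user CPU time so that time spent in the kernel or in other processes is excluded.

#include <stdio.h>
#include <sys/resource.h>

/* Read the user CPU time consumed by this process so far, in seconds. */
static double user_cpu_seconds(void)
{
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
}

int main(void)
{
    double before = user_cpu_seconds();
    /* ... perform one cMix phase here ... */
    double after = user_cpu_seconds();
    printf("phase used %.2f s of user CPU time\n", after - before);
    return 0;
}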
+For the timings this benchmark used the \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer} which has a timer resolution of $10,000,000$ ns for both user and system clocks on a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so-called ``User'' timings. This eliminates the time spent context switching, which gives us slightly more accurate results. The system and wall times are also recorded, but just filtered out in the results table as they are not relevant. -So for gathering results I created a program called statsd, it is included in the repository. The program receives timer snapshots over TCP. So each node sends a snapshot just before they start working on a phase of the \cmix algorithm. After we are done with computational work but before sending the data to the next node another snapshot of the clock state is send to the statsd. So the results are purely about the computation of that \cmix phase. With some additional conversions to the wire format of the timer snapshots, but not the overhead of sending the message over the socket. This is done just after the \cmix operation complete courtesy of the implicit ``strand'' of the boost::asio asynchronous socket operations. +Gathering results is done by the $statsd$ application mentioned in \autoref{sec:implementation}. The program receives timer snapshots over TCP. Each node sends a snapshot just before it starts working on a phase of the \cmix{} algorithm. After a node is done with its computational work but before sending the data to the next node, another time snapshot is sent to $statsd$. This means the results are purely about the computation of that \cmix{} phase. Some overhead of serializing the timer snapshot is recorded by this measuring method; however, this is a very small contribution to the total time spent, both in comparison to the measured effect and compared to the timer's resolution. -Gathering the results over TCP with a separate daemon enables people to run this same benchmark over separate servers. Enabling some nice test vectors as you can control network congestion and packet loss. +Gathering the results over TCP with a separate daemon enables people to run this same benchmark over separate servers, enabling additional test scenarios as you can control network congestion and packet loss. \subsection{Summary of the results} -The following results were gathered with the pc specs as listed in appendix \ref{app-specs}. The optimization specific flags that were used are listed in appendix \ref{app-ccopts}. +The following results were gathered with the PC specs as listed in Appendix \autoref{app-specs}. The optimization specific flags that were used are listed in Appendix \autoref{app-ccopts}.
\begin{table}[!ht] \centering @@ -38,6 +38,7 @@ The following results were gathered with the pc specs as listed in appendix \ref \end{footnotesize} \caption{Node time average over 100 runs with standard deviation in seconds using ed25519 and running 500 clients.} + \label{tab:ec500} \end{table} \begin{table}[!ht] @@ -49,9 +50,13 @@ The following results were gathered with the pc specs as listed in appendix \ref \end{tabular} \end{footnotesize} \caption{Node time average over 100 runs with standard deviation in seconds using 2048 bit multiplicative group and running 500 clients.} - + \label{tab:mg500} \end{table} +To show that the algorithms scale linearly in the number of clients used in a run, additional data has been gathered (see Tables \ref{tab:ec100} to \ref{tab:mg400} in Appendix \autoref{app:addtables}). Graphs \ref{graph:prepre} to \ref{graph:realpost} show the average timings and standard deviations of all the results summarized in the tables. From these graphs we can derive that most of the operations are linear, except for two. These differences can easily be explained by external factors like entropy running low or micro effects like inefficient cache usage. + +We can also clearly see that all nodes do the same amount of work except in the $realtime$ $postcomputation$ phase. In this particular part of the protocol the last node needs to perform some additional calculations which are not performed by the other nodes, and this is clearly visible in the graph. + \clearpage \tikzset{every picture/.style={line width=2pt}} @@ -77,7 +82,7 @@ The following results were gathered with the pc specs as listed in appendix \ref \end{axis} \end{tikzpicture} \caption{A graph of the average times for the precomputation precomputation step.} - \label{fig:prepre} + \label{graph:prepre} \end{figure} \begin{figure}[!ht] @@ -204,7 +209,7 @@ The following results were gathered with the pc specs as listed in appendix \ref \end{tikzpicture} \caption{A graph of the average times for the realtime postcomputation step.} - \label{fig:realpost} + \label{graph:realpost} \end{figure} \pagebreak
