Processes review comments.

author: Dennis Brentjes <dennis@brentj.es> 2018-08-18 14:14:55 +0200
committer: Dennis Brentjes <dennis@brentj.es> 2018-09-02 21:56:20 +0200
commit: 1e316c9a7437580f499453cdafbb0c7433a46b88 (patch)
tree: 918079a02069294d7043412280e95a003de464f0 /content/cmix.tex
parent: 23968a760efa6e03e8d47fbff108ec5aae010fe3 (diff)
download: thesis-1e316c9a7437580f499453cdafbb0c7433a46b88.tar.gz
thesis-1e316c9a7437580f499453cdafbb0c7433a46b88.tar.bz2
thesis-1e316c9a7437580f499453cdafbb0c7433a46b88.zip
1 files changed, 65 insertions, 30 deletions
diff --git a/content/cmix.tex b/content/cmix.tex
index eac6840..5b6504b 100644
--- a/content/cmix.tex
+++ b/content/cmix.tex
@@ -5,44 +5,71 @@
 \section{Anonymity networks}
 \label{sec:anon}
 
+Anonymity networks are needed and in use today. Unfortunately, part of that need comes from people escaping surveillance and prosecution by oppressive governments\cite{jardine2018tor}. Fortunately, there are also other applications, such as voting schemes\cite{park1993efficient}. In any case, it helps to understand which types of anonymity networks exist, how they work in relation to \cmix, where the techniques used in \cmix were pioneered, and which problems of other anonymity networks \cmix addresses.
+
 \subsection{The onion router}
 We can't talk about anonymity networks without talking about The Onion Router\cite{goldschlag1999onion} or TOR for short. It's a free software project that provides access to an anonymity network based on Onion Routing.
 
-Tor works by users selecting a path trough the tor network consisting of 3 nodes. When a user wants to send a message it has to encrypt its message with the key of the last node in the network. This yields a result, which needs to be encrypted with the key of the middle node. This results needs to be encrypted with the key of the first node and then can be send out the first node. The first node can peel off the outer layer of encryption and send it to the second node which in turn can peel of the new outer layer. The last node removes the last layer of encryption and reveals the plaintext. This plaintext can be sent to the original destination. This simplified view of TOR reveals that the sender of a certain message remains anonymous so long as at least one of your 3 nodes is not compromised. And you use End to End encryption because the last node will see your plaintext.
+\begin{figure}[!ht]
+	\centering
+	\begin{subfigure}{.5\textwidth}
+		\centering
+		\includegraphics[width=.9\linewidth]{images/tor.png}
+		\caption{A tor route, image taken from\\ \protect\url{https://www.torproject.org/about/overview.html.en}}
+		\label{fig:tor}
+	\end{subfigure}%
+	\begin{subfigure}{.5\textwidth}
+		\centering
+		\includegraphics[width=.9\linewidth]{images/layers.png}
+		\caption{Tor layers, image taken from\\ \protect\url{https://www.cryptologie.net/article/203/how-does-tor-works/}}
+		\label{fig:layers}
+	\end{subfigure}
+	\caption{Two figures visualizing how Tor works}
+	\label{fig:howtorworks}
+\end{figure}
+
+Tor works by letting users select a random path of 3 nodes through the Tor network, as shown in \autoref{fig:tor}. To send a message, the user first encrypts it with the key of the exit node of the chosen path. The result is then encrypted with the key of the middle node, a relay node. Finally, that result is encrypted with the key of the first node, which is also a relay node. The user then sends this encrypted ``onion'' to the first node. The first node peels off the outer layer of encryption and forwards the result to the second node, which in turn peels off the next layer. The last node removes the final layer, revealing the user's plaintext message, which it sends to the destination chosen by the user, as shown in \autoref{fig:layers}. Seen from the outside, this decouples the actual sender from the receiver of the message.
 
-Unfortunately some information is leaked, and this is exploited by a correlation  attack\cite{Johnson:2013:UGR:2508859.2516651}. This is an attack where the attacker sees incoming traffic of your first node and the outgoing traffic of the last node. It can then try to correlate the entry times, exit times, origin, destination and size of the packets. An attacker can use this data to correlate a packet entering and exiting the network and therefore link a user and a destination with a certain probability. Therefore de-anonymizing the traffic. This attack is highly probabilistic, but could flag users for further targeted investigation. Therefore we really want to prevent this attack from being possible.
+This simplified view of Tor shows that the sender of a message remains anonymous as long as at least one of the 3 nodes on the path is not compromised. Because the last node sees the plaintext, end-to-end encryption between the user and the destination is still needed to keep the content of the message confidential.
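The layering described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions: the helper names (keystream, xor_layer, build_onion, relay) are hypothetical, and a SHA-256 based XOR keystream stands in for the real ciphers and circuit keys Tor negotiates. Because XOR layers commute, this toy does not enforce the peeling order a real onion scheme relies on; it only shows that the client adds one layer per node and that each node removes exactly one.

```python
import hashlib
from typing import List


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks (not Tor's cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer: XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(message: bytes, path_keys: List[bytes]) -> bytes:
    """Client side: path_keys is [entry, middle, exit]; the exit layer is wrapped first."""
    onion = message
    for key in reversed(path_keys):
        onion = xor_layer(key, onion)
    return onion


def relay(onion: bytes, path_keys: List[bytes]) -> bytes:
    """Network side: each node on the path, in order, peels off one layer."""
    for key in path_keys:
        onion = xor_layer(key, onion)
    return onion


if __name__ == "__main__":
    keys = [b"entry-key", b"middle-key", b"exit-key"]  # hypothetical per-node keys
    onion = build_onion(b"GET / HTTP/1.1", keys)
    print(relay(onion, keys))  # b'GET / HTTP/1.1'
```

The exit node in this sketch ends up with the plaintext, which is exactly why end-to-end encryption is still required.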
 
-This attack would be mitigated by the another type of anonymity network called a Mix net.
+Unfortunately, a fairly simple timing attack can de-anonymize traffic through the network: the correlation attack\cite{Johnson:2013:UGR:2508859.2516651}. In this attack the adversary observes the incoming traffic of the first node and the outgoing traffic of the last node. It then tries to correlate the entry times, exit times, origins, destinations and sizes of the packets. With this data the attacker can match a packet entering the network to a packet leaving it, and thereby link a user to a destination with a certain probability, de-anonymizing the traffic. The attack is probabilistic, but it could flag users for further targeted investigation, so we want to prevent it.
 
+In the case of Tor the attack is relatively simple, because the time a single packet takes to traverse the network does not vary much. This simple form of correlation attack is mitigated by a different type of anonymity network: the mix network.
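To make the timing part of this attack concrete, the following Python sketch shows only the matching step, under strong simplifying assumptions: one packet per user, a known and nearly constant network latency, and an attacker who sees every entry and exit timestamp. The function name and the data are hypothetical; the attack described above also uses origins, destinations and packet sizes, and works statistically over many observations.

```python
from typing import Dict, List, Tuple


def correlate(entries: List[Tuple[str, float]],
              exits: List[Tuple[str, float]],
              latency: float) -> Dict[str, str]:
    """Guess, for every observed exit, which user's entry it belongs to.

    entries: (user, time the packet entered the first node)
    exits:   (destination, time the packet left the last node)
    latency: assumed, roughly constant transit time through the network
    """
    guesses = {}
    for destination, t_out in exits:
        expected_entry = t_out - latency
        # Pick the entry whose timestamp is closest to the expected entry time.
        user, _ = min(entries, key=lambda e: abs(e[1] - expected_entry))
        guesses[destination] = user
    return guesses


if __name__ == "__main__":
    entries = [("alice", 0.00), ("bob", 5.00)]
    exits = [("site-x", 5.31), ("site-y", 0.29)]
    print(correlate(entries, exits, latency=0.30))  # {'site-x': 'bob', 'site-y': 'alice'}
```

A mix node breaks this step: batching and shuffling decouple the exit times from the individual entry times.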
 
 \subsection{Mix networks}
 
-The first mix network was proposed and developed by David Chaum \cite{chaum1981untraceable}. this mix network consists of $N$ nodes. Each of these nodes have a public/private key pair. Users that want to use the mix network have to package their message as follows, it prepends the identifier of the destination to the message and encrypts it with the public key of \NODE{N-1}. It then prepends the identifier of \NODE{N-1} and encrypts it with the public key of \NODE{N-2}. The client does this for all the nodes in the network working backwards and sends it to the first node.
+The first mix network was proposed and developed by David Chaum \cite{chaum1981untraceable}. This mix network consists of $N$ nodes, each of which has a public/private key pair. All users and nodes also have an identifier. A user who wants to use the mix network packages a message by prepending the identifier of the destination to it and encrypting the result with the public key of the last node, \NODE{N-1}. The user then prepends the identifier of \NODE{N-1} and encrypts the result with the public key of \NODE{N-2}. The user repeats this for all nodes in the network, working backwards, and then sends the message to the first node.
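The backwards packaging can be sketched in Python as follows. Public-key encryption is modelled symbolically by a tagged tuple that only the named node can open, so the sketch shows the layering and the prepended identifiers, not the cryptography. All names (seal, open_sealed, package, deliver) are hypothetical, and nodes are indexed from 0 to N-1 as in the rest of this chapter.

```python
from typing import Any, List, Tuple

Sealed = Tuple[str, str, Any]


def seal(node_id: str, payload: Any) -> Sealed:
    """Symbolic stand-in for encrypting payload under node_id's public key."""
    return ("sealed", node_id, payload)


def open_sealed(node_id: str, box: Sealed) -> Any:
    """Symbolic decryption: only the intended node may open the box."""
    tag, recipient, payload = box
    if tag != "sealed" or recipient != node_id:
        raise ValueError(f"{node_id} cannot decrypt a layer meant for {recipient}")
    return payload


def package(message: str, destination: str, node_ids: List[str]) -> Sealed:
    """Client side: the innermost layer is (destination, message) for the last node;
    then each next-hop identifier is prepended, working backwards to node 0."""
    packet = seal(node_ids[-1], (destination, message))
    for i in range(len(node_ids) - 2, -1, -1):
        packet = seal(node_ids[i], (node_ids[i + 1], packet))
    return packet


def deliver(packet: Sealed, node_ids: List[str]) -> Tuple[str, Any]:
    """Network side: every node strips one layer and learns only the next hop."""
    next_hop = node_ids[0]
    for node in node_ids:
        if node != next_hop:
            raise ValueError("packet routed to the wrong node")
        next_hop, packet = open_sealed(node, packet)
    return next_hop, packet  # (destination, plaintext message)


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2"]
    print(deliver(package("hello", "bob", nodes), nodes))  # ('bob', 'hello')
```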
 
-This first node can now unpack the message it receives and retrieve an identifier for the next node and a encrypted message which only \NODE{N+1} can decrypt. The last node can decrypt the original message which contains its destination and sends it the end user. Up until this point this is roughly how the TOR anonymity network operates, but there is a difference. The first node in the \cmix network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is achieved it will decrypt all the messages and randomly shuffle the order they were in, otherwise known as mixing. It then sends them to the next node. Another subtle difference is that each message within one mix operation should have the same length, However it is possible to choose a large enough message size and pad all the messages to this length.
+The first node can now unpack the message it receives and retrieve the identifier of the next node, together with an encrypted message that only the next node can decrypt. The last node decrypts the original destination and message and sends the message to the end user. Up to this point this is roughly how the Tor anonymity network operates, but there is a difference. The first node in the mix network does not immediately send out the messages it receives. It first collects $P$ messages. When this threshold is reached, it decrypts all the messages and randomly shuffles their order, which is known as mixing. It then sends them to the next node. Another subtle difference is that all messages within one mix operation should have the same length. This can be achieved by choosing a sufficiently large message size and padding all messages to this fixed length.
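The batching step of a single mix node can be sketched in Python, assuming the node has already removed its layer of encryption from every message. mix_batch and its parameters are hypothetical names: batch_size plays the role of $P$, and zero-byte padding is an illustrative choice rather than a prescribed scheme.

```python
import random
from typing import List


def mix_batch(messages: List[bytes], batch_size: int, msg_len: int) -> List[bytes]:
    """Collect a batch, pad every message to one fixed length and shuffle it.

    Decryption of each message, which a real mix node performs first, is left out.
    """
    if len(messages) < batch_size:
        raise ValueError("threshold P not reached yet; keep waiting")
    batch = messages[:batch_size]
    if any(len(m) > msg_len for m in batch):
        raise ValueError("message longer than the fixed message size")
    padded = [m.ljust(msg_len, b"\x00") for m in batch]  # equal lengths hide sizes
    random.SystemRandom().shuffle(padded)  # OS-backed randomness for the permutation
    return padded
```

random.SystemRandom draws from the operating system's randomness source, which is preferable to the default generator for a permutation that must stay secret.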
 
-This mixing and delaying in the first node causes an arbitrary amount of delay on client connection messages. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation. So it cannot keep track of a specific message. This is what grants the additional anonymity within mix networks, as it mitigates the correlation attack possible on TOR.
+This mixing and delaying in the first node introduces an arbitrary delay between the moment a message arrives at the first node and the moment it leaves it. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where during the mixing operation, so it cannot keep track of a specific message. This is what grants mix networks their additional anonymity, as it mitigates the simple timing attacks possible on Tor. However, more complex attacks are still possible. By keeping track of multiple runs of the mix network and probing between mix nodes, an attacker can still mount flow analysis attacks against a mix network that lacks additional mitigations\cite{zhu2004flow}.
 
-The introduction of Re-encryption mix nets by Park et all \cite{park1993efficient}
-introduces the usage of ElGamal encryption in mix nets. Using it's homomorphic properties to no longer de-encrypt each incoming message but rather just re-encrypt the message which makes it faster to run. It also has the effect that the ciphertext no longer lengthens in the order of the amount of nodes in the mix network. Which does happen in the classic Chaum mix network and in TOR.
+\subsection{Re-encryption mix networks}
 
-The network operates by each node having a public and private ElGamal key. It publishes its public key and a user can use all of the public keys of all of the nodes en encrypt his message. Each node can re-encrypt the message with it's private key. After the last nodes successfully re-encrypts the value the plain text is revealed. In the original paper this is used as a voting scheme and therefore no mention is made of a Receiver as it just needs to aggregate in one place. But is of course extensible to support sending messages to specific recipients.
+Re-encryption mix nets, introduced by Park et al.\ \cite{park1993efficient},
+bring \elgamal encryption to mix nets. They use its homomorphic properties so that a node no longer decrypts each incoming message but merely re-encrypts it, which is faster. As a further effect, the ciphertext no longer grows with the number of nodes in the mix network, as it does in the classic Chaum mix network and in Tor.
 
+In this network each node has a public and a private \elgamal key. Each node publishes its public key, and a user encrypts a message using the public keys of all nodes. Each node then re-encrypts the message using its private key, and after the last node has done so, the plaintext is revealed. The original paper uses this construction as a voting scheme, so it makes no mention of a receiver: the votes only need to be aggregated in one place. The scheme can, however, be extended to support sending messages to specific recipients.
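The property that keeps the ciphertext size constant can be shown with textbook \elgamal. The Python sketch below assumes toy parameters (the Mersenne prime 2^127 - 1 as modulus and 3 as base), chosen for readability and not for security, and uses hypothetical function names. It shows only re-randomization, multiplying a ciphertext with a fresh encryption of 1, which needs just the public key; the per-node use of private keys described above is not modelled.

```python
import secrets
from typing import Tuple

P = 2**127 - 1  # a Mersenne prime; toy modulus, not a secure parameter choice
G = 3           # toy base

Ciphertext = Tuple[int, int]


def rand_exponent() -> int:
    return secrets.randbelow(P - 2) + 1


def keygen() -> Tuple[int, int]:
    x = rand_exponent()
    return x, pow(G, x, P)  # (private key x, public key h = G^x)


def encrypt(h: int, m: int) -> Ciphertext:
    r = rand_exponent()
    return pow(G, r, P), (m * pow(h, r, P)) % P


def reencrypt(h: int, c: Ciphertext) -> Ciphertext:
    """Multiply by a fresh encryption of 1: new randomness, same plaintext and size."""
    s = rand_exponent()
    return (c[0] * pow(G, s, P)) % P, (c[1] * pow(h, s, P)) % P


def decrypt(x: int, c: Ciphertext) -> int:
    return (c[1] * pow(pow(c[0], x, P), -1, P)) % P


if __name__ == "__main__":
    x, h = keygen()
    c2 = reencrypt(h, encrypt(h, 42))
    print(decrypt(x, c2))  # 42
```

However often a ciphertext is re-encrypted, it remains a pair of group elements, unlike the nested layers of a Chaum mix.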
 
+A major downside of these classic mix networks is the number of public-key operations the client and the nodes need to perform to send a single message. This may not be an issue for modern desktop computers or low-volume applications, but it is an issue for the battery life of mobile phones and for low-power devices such as sensor networks.
 
-A major downside of these classic mix network is the amount of public key operations the client and nodes need to do when sending single message. This may not be an issue on modern day desktop computers and or low volume traffic, but it is an issue for mobile phones' battery life and low-power devices. This is were the precomputation and use of ElGamals homomorphic properties come in to play.
+\cmix attempts to resolve this by introducing a precomputation phase and by using the homomorphic properties of \elgamal.
 
+\section{\cmix}
+\label{sec:cmix}
+\cmix is a new anonymity mix network\cite{cMix}. Like any other mix network, it aims to provide anonymity by hiding the timing information of messages, that is, the difference in time between a message leaving the client and arriving at its destination.
 
+A \cmix network is a fixed network consisting of $N$ nodes. This means there is a fixed network order and all clients know which computer represents each node in the network. It uses \elgamal encryption and relies heavily on the homomorphic properties of \elgamal.
 
+The \cmix network operates in 3 phases: initialization, precomputation and realtime. During the initialization phase, shared keys are set up between the nodes and the clients. This is the only time clients need to perform public-key operations, as they have to establish a shared key with every node using a Diffie-Hellman key exchange. Because Diffie-Hellman by itself does not authenticate the parties, all communication between the nodes and from clients to nodes has to be authenticated. One way to accomplish this is to use SSL connections for all communication within the network. Recall that the focus of this network is not to encrypt traffic, since the last node sees all plaintexts, but to hide timing information from an attacker.
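The key setup of the initialization phase can be sketched as plain Diffie-Hellman in Python. This is an assumption-laden toy: the modulus 2^127 - 1 and base 3 are illustrative rather than a vetted group, the function names are hypothetical, and the exchange is unauthenticated, which is exactly why the text requires authenticated channels around it.

```python
import secrets
from typing import List, Tuple

P = 2**127 - 1  # toy modulus (Mersenne prime); not a vetted Diffie-Hellman group
G = 3           # toy base


def dh_keypair() -> Tuple[int, int]:
    a = secrets.randbelow(P - 2) + 1
    return a, pow(G, a, P)  # (private exponent, public value G^a)


def dh_shared(own_private: int, other_public: int) -> int:
    return pow(other_public, own_private, P)  # (G^b)^a == (G^a)^b


def client_key_setup(node_publics: List[int]) -> Tuple[int, List[int]]:
    """One exchange per node; the returned list plays the role of K_c."""
    c_priv, c_pub = dh_keypair()
    return c_pub, [dh_shared(c_priv, pub) for pub in node_publics]
```

Each modular exponentiation here is a public-key operation; the client performs one exchange per node during setup and none during the realtime phase.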
 
+Because \cmix minimizes the number of public-key operations for the client, it is appealing for low-power devices. It allows devices that cannot draw much power at all times, due to battery constraints or NFC power limits, as well as mobile phones whose battery power should be conserved, to use the \cmix network.
 
-\section{\cmix}
-\label{sec:cmix}
-\cmix is a new anonymity mix network\cite{cMix}. Just like any other mix network it aims to provide anonymity by hiding timing information of messages. This means hiding the difference in time between a message leaving the client and arriving at its destination.
+Another advantage is that, in theory, the latency of messages during the realtime phase should be lower than in other mix networks, again because no public-key cryptography is used in this phase.
 
-A \cmix network is a fixed network consisting of $N$ nodes. This means there is a fixed network order and all clients know which computer represents each node in the network. It uses ElGamal encryption. And it relies heavily on the homomorphic properties of ElGamal.
 
-The \cmix network operates in 3 phases. Initialization, precomputation and realtime. During the initialization phase only some key setup is done. This is the only time clients need to do public key operations as they have to establish a shared key with every node using Diffie-Hellman key exchange. This is why all communications between the nodes and from client to node have to be authenticated. One way to accomplish this is by using SSL connections for all communications within the network. Remember that the focus of this network is not encrypted traffic, recall that the last nodes sees all the plaintexts, but rather to hide timing information from an attacker.
 
 \subsection{Initialization phase}
 
@@ -66,8 +93,8 @@ where:
 \item $K_c$ The vector of Keys stored by the client
 \end{itemize}
 
-During any part of the protocol a client may send a message into the network. When using multiplicative group ElGamal, It does this by multiplying a plaintext message with the shared keys in $K_c$. Then it sends this result to the first node. When using elliptic curve however the group elements, such as messages and shared keys, and need to be added. Note that $\cdot$ means combining 2 values, meaning multiplication for multiplicative groups and addition for elliptic curves. For now as the original paper referenced multiplicative group the rest of this description of \cmix will refer to this operation as multiplication.
-
+A client may send a message into the network during any part of the protocol. It does so either by multiplying the message with its shared keys $K_c$ when using a multiplicative group, or by adding its shared keys $K_c$, which are elliptic curve points, to the message when using an elliptic curve. Because the original paper only describes multiplicative group \elgamal, the rest of this description refers to this operation as multiplication. The ``$\cdot$'' operation can, however, be read as a ``combine'' operation, meaning multiplication or addition in the respective group. This is also the name of the operation in the \cmix library created for this benchmark framework, together with its companion operation ``uncombine'', which multiplies by or adds the inverse in the respective group.
+ 
 \begin{equation}
 Message = M \cdot k_0 \cdot k_1 \cdot ... \cdot k_{N-1} \label{form:message}
 \end{equation}
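For the multiplicative group case, the message construction above can be sketched in Python. The names combine and uncombine mirror the terminology in the text but are hypothetical helpers, not the actual API or signatures of the \cmix library; the modulus 2^127 - 1 is a toy choice, and the elliptic curve variant (point addition) is not shown.

```python
import secrets
from functools import reduce
from typing import List

P = 2**127 - 1  # toy prime modulus for the multiplicative group variant


def combine(a: int, b: int) -> int:
    """The '·' operation in the multiplicative group: a * b mod P."""
    return (a * b) % P


def uncombine(a: int, b: int) -> int:
    """Combine with the inverse: a * b^-1 mod P."""
    return (a * pow(b, -1, P)) % P


def client_message(m: int, k_c: List[int]) -> int:
    """Message = M · k_0 · k_1 · ... · k_{N-1}"""
    return reduce(combine, k_c, m)


if __name__ == "__main__":
    k_c = [secrets.randbelow(P - 1) + 1 for _ in range(3)]  # hypothetical shared keys
    msg = client_message(42, k_c)
    for k in k_c:  # each node removes its own key
        msg = uncombine(msg, k)
    print(msg)  # 42
```

Because the group operation is commutative, the nodes can remove their keys in any order.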
@@ -80,21 +107,28 @@ where:
 
 \subsection{Precomputation phase}
 
+\begin{figure}[!ht]
+	\centering
+	\includegraphics[width=.9\linewidth]{images/basic_precomputation.pdf}
+	\caption{\cmix precomputation phase, image provided by Joeri de Ruiter from the \cmix paper \cite{cMix}}
+	\label{fig:cmix_pre}
+\end{figure}
+
 The precomputation phase can be started as soon as $P$, the minimum number of messages to have accumulated in \NODE{0}, is known. The first node generates $P$ random values $R$ and $S$. $R$ is a vector of $P$ values as is $S$. When encrypting or multiplying these vectors the component-wise encryption or multiplication is implied. It also generates a random permutation function $\pi$ which can randomly shuffle $P$ messages. 
 
 The precomputation can be split up into 3 phases.
 
-In the first phase the first node encrypts his $R$ with the key $E$ and sends it to the $node_{next}$. $Node_{next}$ also generates $P$ random values $R$ and $S$ and encrypts his $R$ with the key $E$. $Node_{next}$ multiplies it's encryption result with the values received by the first node, and by the homomorphic encryption property of ElGamal the result of this multiplication is the encryption of the multiplication of the two original $R$ vectors. It then sends the result of the multiplication to the next node which also encrypts his $R$ and multiplies it with his input and sends it on.
+In the first phase, the first node encrypts its $R$ under the key $E$ and sends the result to the next node, $node_{next}$. $Node_{next}$ also generates $P$ random values $R$ and $S$ and encrypts its own $R$ under $E$. It multiplies its encryption result with the values received from the first node; by the homomorphic property of \elgamal, the result of this multiplication is the encryption of the product of the two original $R$ vectors. It then sends this product to the next node, which likewise encrypts its $R$, multiplies it with its input and passes the result on.
 
-$$ \mathcal{E}_E(R_0 \cdot R_1) = \mathcal{E}_S(R_0) \cdot \mathcal{E}_S(R_1)  $$
+$$ \mathcal{E}_E(R_0 \cdot R_1) = \mathcal{E}_E(R_0) \cdot \mathcal{E}_E(R_1)  $$
 \vspace{-1em}
 where:
 \begin{itemize}[label=]
-\item $\mathcal{E}_E$ is ElGamal encryption under key $E$
+\item $\mathcal{E}_E$ is \elgamal encryption under key $E$
 \item $R_i$ is the R vector of node $i$
 \end{itemize}
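This first precomputation phase can be sketched in Python with textbook \elgamal over toy parameters (modulus 2^127 - 1, base 3). For brevity the sketch assumes a single private exponent x for $E$, whereas in \cmix the private key of $E$ is split among the nodes; the function names are hypothetical.

```python
import secrets
from typing import List, Tuple

P = 2**127 - 1  # toy prime modulus
G = 3           # toy base
Ciphertext = Tuple[int, int]


def rand_elem() -> int:
    return secrets.randbelow(P - 2) + 1


def encrypt(h: int, m: int) -> Ciphertext:
    r = rand_elem()
    return pow(G, r, P), (m * pow(h, r, P)) % P


def mul_ct(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    """Component-wise product: an encryption of the product of the plaintexts."""
    return (a[0] * b[0]) % P, (a[1] * b[1]) % P


def decrypt(x: int, c: Ciphertext) -> int:
    return (c[1] * pow(pow(c[0], x, P), -1, P)) % P


def precomp_phase1(h: int, r_vectors: List[List[int]]) -> List[Ciphertext]:
    """Node 0 encrypts its R; every next node encrypts its own R under the same
    key E and multiplies it, slot by slot, into what it received."""
    acc = [encrypt(h, r) for r in r_vectors[0]]
    for r_vec in r_vectors[1:]:
        acc = [mul_ct(c, encrypt(h, r)) for c, r in zip(acc, r_vec)]
    return acc
```

No node ever sees another node's $R$ in the clear; only the encrypted product travels around the ring.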
 
-Whenever the result reaches the first node again phase 2 of the precomputation starts. this is a mix phase. It uses its permutation function $\pi$ on the incoming vector of values and multiplies that result with the encryption of $S$. It sends its result to the next node which does the same. The last node gets the following result.
+Because the \cmix network is circular, with the last node connected to the first, phase 2 of the precomputation can start as soon as the result reaches the first node again. This phase is called the mix phase. Each node applies its permutation function $\pi$ to the incoming vector of values and multiplies the result with the encryption of its $S$. It sends its result to the next node, which does the same. The last node produces the following result.
 
 \begin{align}
 &\mathcal{E}_E( \nonumber \\
@@ -108,19 +142,26 @@ Whenever the result reaches the first node again phase 2 of the precomputation s
 \vspace{-1em}
 where:
 \begin{itemize}[label=]
-\item $\mathcal{E}_E$ is ElGamal encryption under key $E$
+\item $\mathcal{E}_E$ is \elgamal encryption under key $E$
 \item $R_i$ is the R vector of node $i$
 \item $S_i$ is the S vector of node $i$
 \item $\pi_i$ is the permutation function of node $i$
 \end{itemize}
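The mix phase can be sketched in Python as well. The sketch redefines toy \elgamal helpers (modulus 2^127 - 1, base 3, one private exponent for $E$) so that it runs on its own; all names are hypothetical. Representing $\pi_i$ as a list in which element j moves to position pi[j] is an arbitrary convention. Each node permutes the incoming ciphertexts and then multiplies every slot with a fresh encryption of its $S$ value.

```python
import random
import secrets
from typing import List, Tuple

P = 2**127 - 1  # toy prime modulus, for illustration only
G = 3           # toy base
Ciphertext = Tuple[int, int]


def rand_elem() -> int:
    return secrets.randbelow(P - 2) + 1


def encrypt(h: int, m: int) -> Ciphertext:
    r = rand_elem()
    return pow(G, r, P), (m * pow(h, r, P)) % P


def mul_ct(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    return (a[0] * b[0]) % P, (a[1] * b[1]) % P


def decrypt(x: int, c: Ciphertext) -> int:
    return (c[1] * pow(pow(c[0], x, P), -1, P)) % P


def random_permutation(n: int) -> List[int]:
    pi = list(range(n))
    random.SystemRandom().shuffle(pi)
    return pi


def permute(vec: list, pi: List[int]) -> list:
    """Element j of vec moves to position pi[j]."""
    out = [None] * len(vec)
    for j, target in enumerate(pi):
        out[target] = vec[j]
    return out


def mix_step(h: int, incoming: List[Ciphertext], pi: List[int],
             s_vec: List[int]) -> List[Ciphertext]:
    """One node in the mix phase: apply pi_i, then multiply slot-wise by E(S_i)."""
    return [mul_ct(c, encrypt(h, s)) for c, s in zip(permute(incoming, pi), s_vec)]
```

Applying the same permutations and multiplications to the plaintexts in the clear yields exactly the decryption of the final ciphertexts, which is what makes the later cancellation possible.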
 
-The third part of the precomputation, is about decrypting this final value. Each node can perform part of the decryption with his private key part of $E$. in combination with the encryption specific random value which is used in ElGamal it is called the decryption share. When it multiplies the above value with the decryption share you remove your part of the encryption. So when passing your result to the next node each node can multiply it's decryption share with their input. After the last node performs this action the last nodes has the decrypted value of equation \eqref{form:EPiRS}. It stores this for use in the realtime phase.
+The third part of the precomputation decrypts this final value. Each node can perform part of the decryption with its part of the private key of $E$. Combined with the random value used in the \elgamal encryption, this partial decryption is called the decryption share. Each node multiplies its decryption share with its input and passes the result on to the next node. After the last node has done so, it holds the decrypted value of equation \eqref{form:EPiRS}, which it stores for the realtime phase. There it is used to remove the permuted random vectors $R$ and $S$ from the realtime messages.
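Decryption with shares can be sketched in Python under toy assumptions (modulus 2^127 - 1, base 3, hypothetical names). Each node i holds a key part x_i, the public key of $E$ is the product of all G^(x_i), and the share of node i for a ciphertext (c1, c2) is the inverse of c1^(x_i), where c1 = G^r carries the encryption randomness.

```python
import secrets
from functools import reduce
from typing import List, Tuple

P = 2**127 - 1  # toy prime modulus
G = 3           # toy base
Ciphertext = Tuple[int, int]


def rand_elem() -> int:
    return secrets.randbelow(P - 2) + 1


def joint_public_key(key_parts: List[int]) -> int:
    """Public key of E: G^(x_0 + ... + x_{N-1}), built as a product of each G^(x_i)."""
    return reduce(lambda acc, x_i: acc * pow(G, x_i, P) % P, key_parts, 1)


def encrypt(h: int, m: int) -> Ciphertext:
    r = rand_elem()
    return pow(G, r, P), (m * pow(h, r, P)) % P


def decryption_share(x_i: int, c1: int) -> int:
    """Share of node i: (c1^x_i)^-1, combining its key part with the randomness c1."""
    return pow(pow(c1, x_i, P), -1, P)


def decrypt_along_path(c: Ciphertext, key_parts: List[int]) -> int:
    """Each node in turn multiplies its decryption share into the running value."""
    c1, value = c
    for x_i in key_parts:
        value = value * decryption_share(x_i, c1) % P
    return value
```

The plaintext only appears after the last node has applied its share; with any share missing, the value stays masked.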
 
 \subsection{Realtime phase}
 \label{sec:realpre}
 
 Whenever the first node received $P$ messages as described in formula \eqref{form:message}, and the precomputation has ended, the realtime phase can begin. The realtime phase can also be split into 3 different stages.
 
+\begin{figure}[!ht]
+	\centering
+	\includegraphics[width=.9\linewidth]{images/basic_realtime.pdf}
+	\caption{\cmix realtime phase, image provided by Joeri de Ruiter from the \cmix paper \cite{cMix}}
+	\label{fig:cmix_real}
+\end{figure}
+
 In the first stage each node multiplies the message with the inverse of the key it agreed upon in the initialization phase. It then multiplies that result with $r_i$, where $i$ is the position of the message in the buffer, replacing $K_c$ with the corresponding value in $R$. The result is not yet permuted so the messages stay in the same place. The result gets passed to the next node which does the same steps. So for all the messages in the input we do.
 
 \[
@@ -153,12 +194,6 @@ where:
 
 The last phase is to remove the R and S vectors from the input. The last node stored decryption of formula \ref{form:EPiRS}. The last node calculates the inverse of this group element and multiplies it with the result of \ref{form:PiMRS}. This will cancel out all permuted R and S values, and the last node will be left with $M$.
 
-Very critical to see is that during the realtime phase no public-key cryptography is being used, only some multiplications. And the clients, outside of the initialization phase, never have to do any public-key cryptography. It is also possible for client and servers to update the shared keys $K_c$ using the old key as a seed. So in theory a client only needs to do the key agreement once per network.
-
-\subsection{Advantages of \cmix}
-
-The fact that \cmix minimizes the amount of public-key cryptographic operations for the client, makes it more appealing for low power devices. So devices that cannot draw high amounts of power all the time due to battery constrains or NFC power limits. Mobile phones of which you want to conserve battery power.
-
-Another advantage is that in theory the latency of the actual messages during the realtime phase should be lower than other mix networks. Again due to the lack of public-key cryptography during this phase.
+It is crucial to note that no public-key cryptography is used during the realtime phase, only multiplications. Outside of the initialization phase, clients never have to perform any public-key cryptography. Clients and nodes can also update the shared keys $K_c$ using the old key as a seed, so in theory a client only needs to perform the key agreement once per network.
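To see why the realtime phase needs nothing but multiplications, the Python sketch below runs the three realtime stages for a toy batch in the clear. All names are hypothetical and several simplifications are assumed: the value the last node stores after the precomputation is computed directly from $R$, $S$ and $\pi$ instead of through the encrypted precomputation, one random key per node and slot stands in for $K_c$, and the modulus is the toy prime 2^127 - 1.

```python
import random
import secrets
from typing import List

P = 2**127 - 1  # toy prime modulus, for illustration only
_rng = random.SystemRandom()


def rand_elem() -> int:
    return secrets.randbelow(P - 1) + 1  # in [1, P-1], hence invertible mod P


def inv(a: int) -> int:
    return pow(a, -1, P)


def permute(vec: List[int], pi: List[int]) -> List[int]:
    """Element j of vec moves to position pi[j]."""
    out = [0] * len(vec)
    for j, target in enumerate(pi):
        out[target] = vec[j]
    return out


def random_vectors(n_nodes: int, slots: int) -> List[List[int]]:
    return [[rand_elem() for _ in range(slots)] for _ in range(n_nodes)]


def run_realtime(messages: List[int], n_nodes: int) -> List[int]:
    slots = len(messages)               # the batch size P from the text
    k = random_vectors(n_nodes, slots)  # k[i][j]: key of node i with the client in slot j
    r = random_vectors(n_nodes, slots)  # R vector of every node
    s = random_vectors(n_nodes, slots)  # S vector of every node
    pis = []
    for _ in range(n_nodes):
        pi = list(range(slots))
        _rng.shuffle(pi)
        pis.append(pi)

    # Value stored by the last node after the precomputation, computed here in
    # the clear: product of all R vectors, then permuted and multiplied with each S.
    pre = [1] * slots
    for i in range(n_nodes):
        pre = [a * b % P for a, b in zip(pre, r[i])]
    for i in range(n_nodes):
        pre = [a * b % P for a, b in zip(permute(pre, pis[i]), s[i])]

    # Clients send Message = M · k_0 · ... · k_{N-1}.
    vals = list(messages)
    for i in range(n_nodes):
        vals = [v * key % P for v, key in zip(vals, k[i])]

    # Stage 1: every node replaces its shared key with its r value, unpermuted.
    for i in range(n_nodes):
        vals = [v * inv(key) * rv % P for v, key, rv in zip(vals, k[i], r[i])]

    # Stage 2: every node permutes the batch and multiplies in its S vector.
    for i in range(n_nodes):
        vals = [v * sv % P for v, sv in zip(permute(vals, pis[i]), s[i])]

    # Stage 3: the last node cancels all R and S values with the stored inverse.
    return [v * inv(p) % P for v, p in zip(vals, pre)]


if __name__ == "__main__":
    out = run_realtime([11, 22, 33, 44], n_nodes=3)
    print(sorted(out))  # [11, 22, 33, 44]
```

The output is the original batch in permuted order, as the last node would see it; every realtime step is a modular multiplication or inversion, with no exponentiation.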