changed results and conclusion stuff.

author: Dennis Brentjes <dennis@brentj.es> 2018-05-21 17:10:57 +0200
committer: Dennis Brentjes <dennis@brentj.es> 2018-05-21 17:10:57 +0200
commit: fffc35ad3c397359b29f30949547e2c767198600 (patch)
tree: bd43016f8cd92f46fd0a2afe0171ac964dc85f8d
parent: 8728e253d49a28752f4be1ef37467e8374560785 (diff)
download: thesis-fffc35ad3c397359b29f30949547e2c767198600.tar.gz
thesis-fffc35ad3c397359b29f30949547e2c767198600.tar.bz2
thesis-fffc35ad3c397359b29f30949547e2c767198600.zip
4 files changed, 89 insertions, 24 deletions
diff --git a/content/cmix.tex b/content/cmix.tex
index ed5fb71..3af780d 100644
--- a/content/cmix.tex
+++ b/content/cmix.tex
@@ -6,20 +6,35 @@
 \label{sec:anon}
 
 \subsection{The onion router}
-We can't talk about anonymity networks without talking about The Onion Router\cite{goldschlag1999onion} or TOR for short. It's a free software project that provides access to an anonymity network. It has some similarities with some classic mix networks but also has some weaknesses which mix networks try to resolve.
+We can't talk about anonymity networks without talking about The Onion Router\cite{goldschlag1999onion} or TOR for short. It's a free software project that provides access to an anonymity network based on Onion Routing.
 
 Tor works by users selecting a path trough the tor network consisting of 3 nodes. When a user wants to send a message it has to encrypt its message with the key of the last node in the network. This yields a result, which needs to be encrypted with the key of the middle node. This results needs to be encrypted with the key of the first node and then can be send out the first node. The first node can peel off the outer layer of encryption and send it to the second node which in turn can peel of the new outer layer. The last node removes the last layer of encryption and reveals the plaintext. This plaintext can be sent to the original destination. This simplified view of TOR reveals that the sender of a certain message remains anonymous so long as at least one of your 3 nodes is not compromised. And you use End to End encryption because the last node will see your plaintext.
 
-However there is something called a correlation attack\cite{Johnson:2013:UGR:2508859.2516651}. This is an attack where the attacker can see incoming traffic of your first node and the outgoing traffic of the last node. It can then try to correlate the entry times, exit times and size of the packets. An attacker can use this data to correlate a packet entering and exiting the network and therefore link a user and a destination with a certain probability. Therefore de-anonymizing the traffic. This attack is highly probabilistic, but could flag users for further targeted investigation. Therefore we really want to prevent this attack from being possible.
+Unfortunately some information is leaked, and this is exploited by a correlation  attack\cite{Johnson:2013:UGR:2508859.2516651}. This is an attack where the attacker sees incoming traffic of your first node and the outgoing traffic of the last node. It can then try to correlate the entry times, exit times, origin, destination and size of the packets. An attacker can use this data to correlate a packet entering and exiting the network and therefore link a user and a destination with a certain probability. Therefore de-anonymizing the traffic. This attack is highly probabilistic, but could flag users for further targeted investigation. Therefore we really want to prevent this attack from being possible.
+
+This attack would be mitigated by the another type of anonymity network called a Mix net.
 
 
 \subsection{Mix networks}
 
-The first mix network was developed by David Chaum \cite{chaum1981untraceable}. this mix network consists of $N$ nodes. Each of these nodes have a public/private key pair. Users that want to use the mix network have to package their message as follows, it prepends the identifier of the destination to the message and encrypts it with the public key of \NODE{N-1}. It then prepends the identifier of \NODE{N-1} and encrypts it with the public key of \NODE{N-2}. The client does this for all the nodes in the network working backwards and sends it to the first node.
+The first mix network was proposed and developed by David Chaum \cite{chaum1981untraceable}. this mix network consists of $N$ nodes. Each of these nodes have a public/private key pair. Users that want to use the mix network have to package their message as follows, it prepends the identifier of the destination to the message and encrypts it with the public key of \NODE{N-1}. It then prepends the identifier of \NODE{N-1} and encrypts it with the public key of \NODE{N-2}. The client does this for all the nodes in the network working backwards and sends it to the first node.
+
+This first node can now unpack the message it receives and retrieve an identifier for the next node and a encrypted message which only \NODE{N+1} can decrypt. The last node can decrypt the original message which contains its destination and sends it the end user. Up until this point this is roughly how the TOR anonymity network operates, but there is a difference. The first node in the \cmix network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is achieved it will decrypt all the messages and randomly shuffle the order they were in, otherwise known as mixing. It then sends them to the next node. Another subtle difference is that each message within one mix operation should have the same length, However it is possible to choose a large enough message size and pad all the messages to this length.
+
+This mixing and delaying in the first node causes an arbitrary amount of delay on client connection messages. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation. So it cannot keep track of a specific message. This is what grants the additional anonymity within mix networks, as it mitigates the correlation attack possible on TOR.
+
+The introduction of Re-encryption mix nets by Park et all \cite{park1993efficient}
+introduces the usage of ElGamal encryption in mix nets. Using it's homomorphic properties to no longer de-encrypt each incoming message but rather just re-encrypt the message which makes it faster to run. It also has the effect that the ciphertext no longer lengthens in the order of the amount of nodes in the mix network. Which does happen in the classic Chaum mix network and in TOR.
+
+The network operates by each node having a public and private ElGamal key. It publishes its public key and a user can use all of the public keys of all of the nodes en encrypt his message. Each node can re-encrypt the message with it's private key. After the last nodes successfully re-encrypts the value the plain text is revealed. In the original paper this is used as a voting scheme and therefore no mention is made of a Receiver as it just needs to aggregate in one place. But is of course extensible to support sending messages to specific recipients.
+
+
+
+A major downside of these classic mix network is the amount of public key operations the client and nodes need to do when sending single message. This may not be an issue on modern day desktop computers and or low volume traffic, but it is an issue for mobile phones' battery life and low-power devices. This is were the precomputation and use of ElGamals homomorphic properties come in to play.
+
+
 
-This first node can now unpack the message it receives and retrieve an identifier for the next node and a encrypted message which only \NODE{N+1} can decrypt. The last node can decrypt the original message which contains its destination and sends it the end user. Up until this point this is roughly how the TOR anonymity network operates, but there is a difference. The first node in the \cmix network does not immediately send out the messages it receives. The node first collects up to $P$ messages. When this threshold is achieved it will decrypt all the messages and randomly shuffle the order they were in, otherwise known as mixing. It then sends them to the next node. Another subtle difference is that each message should have the same length, However it is possible to choose a large enough message size and pad all the messages to this length. Even when unpacking the messages the server nodes should keep padding the messages to the decided size as the network is not fixed and clients can choose their own path trough the network.
 
-This causes an arbitrary amount of delay on client connection messages. Furthermore, an outsider analyzing the input and output of the nodes cannot see which packet went where in the mixing operation. So it cannot keep track of a specific message. This is what grants the additional anonymity within mix networks, as it mitigates the correlation attack possible on TOR. A major downside of this classic mix network is the amount of public key operations the client and nodes need to do when sending single message. This may not be an issue on modern day desktop computers and or low volume traffic, but it is an issue for mobile phones and low-power devices.
 
 \section{\cmix}
 
diff --git a/content/discussion.tex b/content/discussion.tex
index ff3a688..23196a9 100644
--- a/content/discussion.tex
+++ b/content/discussion.tex
@@ -6,37 +6,37 @@
 
 In this section \ec and \mg will refer to the 2 different implementations that we compare. They stand for elliptic curve and multiplicative group.
 
-So lets first talk about what our expectations are. The size of the messages being sent are different. The ed25519 implementation can send messages up to $31$ bytes. The 2048 bit multiplicative group can send up to $255$ bytes of data. This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. this means that the ed25519 implementation only sends $11$ bytes of payload versus $235$ bytes of payload for the multiplicative group implementation.
+So lets first talk about what our expectations will be.
+The size of the 2 types of messages, for \mg and \ec, are slightly different.
+We use a 2048 bit multiplicative group so we have 256 byte messages.
+Ed25519 has a little below 255 bit group elements giving us less then 32 byte messages.
+For convenience and because we're using Koblitz's method we only use 31 byte for the message part. But to counter this we do 8 mixes at a time to go up to 248 byte messages. Still 8 bytes short, but this makes the timings more comparable.
 
-However there are ways around this. By doing 8 mixes of the ed25519 implementation in one single \cmix run with the same permutation functions but with different $R$ and $S$, we can significantly boost the effective payload. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes of the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to be a factor of $8$.
+Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison. But as we are interested in a straight up comparison this implementation does not use a parallelized implementation of multiple mixes.
 
-Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison.
+This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver.So the payload would become $236$ and $228$ bytes for \mg and \ec respectively.
 
-So by the above dissertation, we hope to see that \ec is 8 times faster than \mg in all respects. Just by the virtue of only having to handle $\frac{1}{8}$ of the data. We hope to even be a bit faster than that.
+So by the above dissertation, we hope to see that \ec is faster than \mg in all respects. This would mean it would be beneficial to research parallelism and more optimal implementations in the future.
 
 \subsection{Precomputation precomputation}
 
-First of all this step does not seems to differ between nodes, all \ec nodes spend around $3.67s$ and the \mg nodes spend about $19.19s$. Which is $5.23$ times faster, not the 8 times we would expect.
-
-This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, if that number is a valid $y$ coordinate. For Ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid will be $.5$. Because of this you need to, on average, generate and check twice as many coordinates as the actual number of random curve points you want, see equation \ref{eq:prepre}.
+Right away we can see a large difference between \ec and \mg, with \ec being more than $2$ times slower. Most likely this can be explained by 2 factors. The first being that this step includes generating all the randomness needed for the \cmix run. It generates $R$ and $S$ and the permutation $\pi$. For Ed25519 it tries a $Y$-value and if it doesn't map onto the curve we try a different one. On average only half of the points would be valid.
 
 \begin{equation}
 \label{eq:prepre}
 \sum_{x=0}^{\infty} n \cdot \left( \frac{1}{2}\right)^x =\ 2n
 \end{equation} 
 
-Therefore we only expect to be 4 times faster in that regard we are still slightly faster than the \mg counterpart, but \ec is slower on a fundamental level because of the extra calculations needed to generate random curve points.
+Another slowing factor might be the memory fragmentation. The way that libgcrypt operates is that each object is it's own opaque pointer to an internal data structure. So there is no way to consolidate memory into a single block. This wouldn't be a factor if both algorithms would use the same amount of allocations, but now that we do 8 simultaneous mixes for \ec we have 8 times the amount of the allocations for the \ec algorithm. This impacts the performance quite heavily 
 
 \subsection{Precomputation mixing}
 \label{sec:premix}
 
-The values for the premix step are very close to our ideal 8 times ratio, $6.79$ times faster. There is however no fundamental explanation why it should be slower than expected. There is however one implementation detail that might have had it's influence here.
-
-Like discussed in section \ref{sec:wireformat}, when serializing elliptic curve points, there are several choices you can make about the format. This implementation chose to send both affine coordinates, which adds unnecessary group element inversions. This means There is time to gain if we switch from affine to projected coordinate system within the network format. it would mean sending more data. However the conversions from network to internal format would be faster and it would avoid the inversion of $Z$. This would be a large time saver for many of the steps, even the $prepre$ step we already discussed. My expectation based on some informal testing and the results of the realtime postprocessing phase  is that it would make this step at least 8 times faster than the \mg implementation making it comparable in performance. 
+Another big time loss is attributed to the wire format chosen. because we only send over the Y coordinate, we need to convert the coordinate from projected to affine coordinate systems. This takes up extra time which can be eliminated if we choose. Unfortunately if we want to do this using libgcrypt we need to send over all 3 projected coordinates as there is no way to use only the $Y$ and $Z$ coordinates to reconstruct the curve point in libgcrypt.
 
 \subsection{Precomputation postprocessing}
 
-So here in the precomputation post-computation phase the elliptic curve algorithm pays off. with $1.17s$ vs $20.16s$ a $17.26$ times speed increase. This is probably because the inversions in \mg are much more computational intensive as inversions in \ec. This does not mean that they are free in \ec as discussed in section \ref{sec:premix}
+In this stage we only do a `simple' cryptographic operation on each of the points to determine the node's decryption share. There are no extra allocations and no unneeded inversions, and we can see \ec is *2.5* times faster.
 	
 \subsection{Realtime precomputation}
 
diff --git a/content/implementation.tex b/content/implementation.tex
index 6de7e4b..d6dbef4 100644
--- a/content/implementation.tex
+++ b/content/implementation.tex
@@ -31,7 +31,15 @@ So as to not worry about any other network serialization, this framework uses Go
 
 \subsection{Mapping strings to Ed25519 points}
 
-There exists no deterministic and reversible mapping from integer values to elliptic curve points. However There is a probabilistic algorithm called the Koblitz's method\cite{Koblitz}. It starts by choosing a number that will be used as a  ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result of this multiplication must also be a group element. This means your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding $x$ coordinate, in the case of Ed25519. If it has one you are done, if it doesn't you add $1$ to your current trial $y$ coordinate and check again. This continues up until your trial $y$ coordinate reaches the value of  $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable $y$ coordinate by now this mapping algorithm fails.
+So to use the homomorphic property of ElGamal we need to directly encode user input of the \cmix network into, in this case, valid curve points.
+
+\subsubsection{Elligator}
+
+An algorithm that might spring to mind first is the Elligator algorithm\cite{Bernstein:2013:EEP:2541806.2516734} to map curve points to strings indistinguishable from random strings. However this is the exact reverse of the operation I need in \cmix. Elligator works by converting curve points to a representative, a random string. However not every possible string of that size is an representative of a curve point. This makes it impossible to use the Elligator reverse mapping for this problem.
+
+\subsubsection{Koblitz's method}
+
+There exists no deterministic and reversible mapping from integer values to elliptic curve points. However there is a probabilistic algorithm called the Koblitz's method\cite{Koblitz}. It starts by choosing a number that will be used as a  ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride. The result of this multiplication must also be a group element. This means your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding $x$ coordinate, in the case of Ed25519. If it has one you are done, if it doesn't you add $1$ to your current trial $y$ coordinate and check again. This continues up until your trial $y$ coordinate reaches the value of  $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable $y$ coordinate by now this mapping algorithm fails.
 
 You can map a point back to the original string by dividing the y coordinate with the ``stride''. This works because of integer division and because we only add a value less than the stride to the original number.
 
diff --git a/thesis.bib b/thesis.bib
index c49350f..7cb6f2b 100644
--- a/thesis.bib
+++ b/thesis.bib
@@ -36,12 +36,27 @@
 	fyear={2010}
 }
 
-@inproceedings{cMix,
-	title={cMix: Anonymization by high-performance scalable mixing},
-	author={Chaum, David and Javani, Farid and Kate, Aniket and Krasnova, Anna and de Ruiter, Joeri and Sherman, Alan T and Das, Debajyoti},
-	booktitle={Proceedings of ACM CCS},
-	volume={2016},
-	year={2016}
+@Inbook{cMix,
+	author="Chaum, David
+	and Das, Debajyoti
+	and Javani, Farid
+	and Kate, Aniket
+	and Krasnova, Anna
+	and De Ruiter, Joeri
+	and Sherman, Alan T.",
+	editor="Gollmann, Dieter
+	and Miyaji, Atsuko
+	and Kikuchi, Hiroaki",
+	title="cMix: Mixing with Minimal Real-Time Asymmetric Cryptographic Operations",
+	bookTitle="Applied Cryptography and Network Security: 15th International Conference, ACNS 2017, Kanazawa, Japan, July 10-12, 2017, Proceedings",
+	year="2017",
+	publisher="Springer International Publishing",
+	address="Cham",
+	pages="557--578",
+	abstract="We introduce cMix, a new approach to anonymous communications. Through a precomputation, the core cMix protocol eliminates all expensive real-time public-key operations---at the senders, recipients and mixnodes---thereby decreasing real-time cryptographic latency and lowering computational costs for clients. The core real-time phase performs only a few fast modular multiplications.",
+	isbn="978-3-319-61204-1",
+	doi="10.1007/978-3-319-61204-1_28",
+	url="https://doi.org/10.1007/978-3-319-61204-1_28"
 }
 
 @article{chaum1981untraceable,
@@ -82,4 +97,31 @@
 	publisher = {ACM},
 	address = {New York, NY, USA},
 	keywords = {anonymity, metrics, onion routing},
+}
+
+@inproceedings{park1993efficient,
+	title={Efficient anonymous channel and all/nothing election scheme},
+	author={Park, Choonsik and Itoh, Kazutomo and Kurosawa, Kaoru},
+	booktitle={Workshop on the Theory and Application of of Cryptographic Techniques},
+	pages={248--259},
+	year={1993},
+	organization={Springer}
+}
+
+@inproceedings{Bernstein:2013:EEP:2541806.2516734,
+	author = {Bernstein, Daniel J. and Hamburg, Mike and Krasnova, Anna and Lange, Tanja},
+	title = {Elligator: elliptic-curve points indistinguishable from uniform random strings},
+	booktitle = {Proceedings of the 2013 ACM SIGSAC conference on Computer \&\#38; communications security},
+	series = {CCS '13},
+	year = {2013},
+	isbn = {978-1-4503-2477-9},
+	location = {Berlin, Germany},
+	pages = {967--980},
+	numpages = {14},
+	url = {http://doi.acm.org/10.1145/2508859.2516734},
+	doi = {10.1145/2508859.2516734},
+	acmid = {2516734},
+	publisher = {ACM},
+	address = {New York, NY, USA},
+	keywords = {censorship circumvention, elliptic curves, injective maps},
 } 
 \ No newline at end of file
author	Dennis Brentjes <dennis@brentj.es>	2018-05-21 17:10:57 +0200
committer	Dennis Brentjes <dennis@brentj.es>	2018-05-21 17:10:57 +0200
commit	fffc35ad3c397359b29f30949547e2c767198600 (patch)
tree	bd43016f8cd92f46fd0a2afe0171ac964dc85f8d
parent	8728e253d49a28752f4be1ef37467e8374560785 (diff)
download	thesis-fffc35ad3c397359b29f30949547e2c767198600.tar.gz thesis-fffc35ad3c397359b29f30949547e2c767198600.tar.bz2 thesis-fffc35ad3c397359b29f30949547e2c767198600.zip