 -rw-r--r--  content/bibliography.tex    |  2
 -rw-r--r--  content/cmix_additions.tex  |  5
 -rw-r--r--  content/implementation.tex  | 25
 -rw-r--r--  content/results.tex         | 14
 -rw-r--r--  thesis.tex                  |  9
 5 files changed, 29 insertions, 26 deletions
diff --git a/content/bibliography.tex b/content/bibliography.tex
index 1521ac2..134e2c5 100644
--- a/content/bibliography.tex
+++ b/content/bibliography.tex
@@ -3,9 +3,7 @@
 %----------------------------------------------------------------------------------------
 
 \begin{thebibliography}{99} % Bibliography - this is intentionally simple in this template
 
-\printbibliography
-
 \end{thebibliography}
 
 %----------------------------------------------------------------------------------------
diff --git a/content/cmix_additions.tex b/content/cmix_additions.tex
index 19cea01..feb411e 100644
--- a/content/cmix_additions.tex
+++ b/content/cmix_additions.tex
@@ -3,9 +3,10 @@
 So the base protocol still has some issues; thankfully, these issues can be addressed at the cost of some speed and clarity. Because it would not be safe to use CMix in the wild without these attack mitigations, we implemented the extra messages needed. This makes for a more realistic benchmark.
 
 \subsection{Tagging attack}
+\label{sec:tagging}
 In a tagging attack an adversary changes a message slightly and later detects this tag and removes it; otherwise the attack would be detectable. This leaks information about the origin of the message and the slot it came from. The easiest variant of this would be if a malicious party had control over the last node.
-When you control the last node you can change the output of Realtime procomputation phase slightly by when combining your nodes $r$ value for this slot with the input. you either combine the input with $r * i$, for cyclic group elgamal, or $r + p$, for elliptic curve implementations. After all computations are done you have the plaintexts that you want to send out. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. You now now the slot this value used to be in and you can reverse your tag by doing the reverse operation. This is undetectable in the network and thus compromise the network. Note that the last node is not special in CMix, if all but one Node is malicious the protocol should still be safe.
+When you control the last node, you can slightly change the output of the realtime precomputation phase when combining your node's $r$ value for a slot with the input: you combine the input with $r * i$, for cyclic group ElGamal, or with $r + p$, for elliptic curve implementations. After all computations are done you have the plaintexts that you want to send out. If you can verify that one of the outputs is not valid, it probably is the value you modified with either $i$ or $p$. You now know which slot this value used to be in and you can remove your tag by doing the reverse operation. This is undetectable in the network and thus compromises the network. Note that the last node is not special in CMix; if all but one node is malicious, the protocol should still be safe.
 
-To stop this attack we need to change the protocol. First we need to change the third step of the precomputation phase. Instead of sending the decryption shares of each of the nodes to the next, we send a hash, a commitment to our decryption shares to the next node. The nodes keep the decryption shares to themselves, and will use them seperately in the realtime phase. The last node also includes a hash of the current mix result. So the has of $\pi(R*S)$ for all slots. This makes that an adversary can no longer tamper with the $r$ values in the realtime phase, which caused the tagging attack to be possible in the first place.
+To stop this attack we need to change the protocol. First we change the third step of the precomputation phase. Instead of sending the decryption shares of each of the nodes to the next, we send a hash, a commitment to our decryption shares, to the next node. The nodes keep the decryption shares to themselves and use them separately in the realtime phase. The last node also includes a hash of the current mix result, i.e. the hash of $\pi(R)*S$ for all slots. This means an adversary can no longer tamper with the $r$ values in the realtime phase, which is what made the tagging attack possible in the first place.
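To make the commitment step above concrete, here is a minimal sketch of how a node could hash its fixed-length decryption shares with libgcrypt. The function name and buffer layout are assumptions for illustration, not code from the patch.

    // Sketch: commit to this node's decryption shares by hashing their serialized,
    // fixed-length form with SHA-256 via libgcrypt. Names and layout are illustrative.
    #include <gcrypt.h>
    #include <array>
    #include <vector>

    using Share = std::vector<unsigned char>;  // one serialized decryption share per slot

    // The digest below is what would be forwarded to the next node instead of the shares.
    std::array<unsigned char, 32> commit_to_shares(const std::vector<Share>& shares) {
        std::vector<unsigned char> buffer;
        for (const Share& s : shares)                         // concatenate all slots in order;
            buffer.insert(buffer.end(), s.begin(), s.end());  // unambiguous if shares are fixed length

        std::array<unsigned char, 32> digest{};
        gcry_md_hash_buffer(GCRY_MD_SHA256, digest.data(), buffer.data(), buffer.size());
        return digest;
    }

The node keeps the shares themselves private until the realtime phase; the forwarded digest only binds the node to them, which is the point of the mitigation described above.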
diff --git a/content/implementation.tex b/content/implementation.tex
index 1da07de..76b7275 100644
--- a/content/implementation.tex
+++ b/content/implementation.tex
@@ -2,32 +2,33 @@
 The goal of this research is to see how these two ways of using ElGamal in this crypto scheme affect latency and throughput in CMix, while doing so in as reusable a way as possible so that the implementation is of use for further research.
 
-Unfortunately elgamal is not a very popular encryption algorithm in modern crypto libraries. and even if it is supported with a high level interface it is very rigid and wouldn't allow for any secret sharing as needed in CMix. So because of this I created my own CMix library in ``C'' which abstracts the lower lever crypto primitives for both ``cyclic group'' and ``elliptic curve'' elgamal. This makes it easy to switch between implementations and test without changing any of the application code. This library is written in ``C'' to make interfacing with it from other languages used in the application level easier. For the application level code I used ``C++''
-
-I ended up implementing the cryptographic operations with libgcrypt. This library provides the primitives I needed for both cyclic group and elliptic curve variants of the algorithm. This did not mean that the library took care of everything and some creativity was required to implement CMix especially for the elliptic curve algorithm.
+This means I will be trying to find cryptographic libraries that do most of the work, while I focus on providing a uniform API over the different backends and on a correct implementation of those backends. Unfortunately ElGamal is not a very popular encryption algorithm in modern crypto libraries, and even where it is supported with a high level interface, that interface is very rigid and wouldn't allow for the secret sharing needed in CMix. Because of this I needed access to lower level cryptographic primitives, which I found in the libgcrypt library. The next step was to create my own CMix library in ``C'' which abstracts the lower level crypto primitives for both ``cyclic group'' and ``elliptic curve'' ElGamal. This makes it easy to switch between implementations and test without changing any of the application code. The library is written in ``C'' to make interfacing with it from other languages at the application level easier. For the application level code I used ``C++'', but in theory this should be easy to swap for a different application language.
+However, while libgcrypt makes it possible to implement CMix, that doesn't mean it was easily done. The library still lacks some convenience functions, which required workarounds in the eventual implementation. This is especially the case for the elliptic curve backend. Some specific problems will be discussed later.
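As a sketch of the uniform API mentioned above (all names here are hypothetical, not the actual library interface), the two ElGamal backends could be exposed to the application as interchangeable tables of operations:

    // Hypothetical sketch of a backend-agnostic C interface (names invented):
    // the application only sees opaque tokens plus this table of operations,
    // so switching between cyclic group and elliptic curve ElGamal means
    // picking a different table, not changing application code.
    extern "C" {

    typedef struct cmix_element cmix_element;   // opaque group element ("token")

    typedef struct {
        cmix_element* (*random_element)(void);                                   // fresh r value
        cmix_element* (*combine)(const cmix_element* a, const cmix_element* b);  // group operation
        cmix_element* (*invert)(const cmix_element* a);                          // inverse element
        void          (*destroy)(cmix_element* e);                               // explicit lifetime
    } cmix_backend;

    // One accessor per backend; both return the same interface.
    const cmix_backend* cmix_cyclic_group_backend(void);
    const cmix_backend* cmix_elliptic_curve_backend(void);

    }  // extern "C"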
 \subsection{Wire format}
-So for cmix to work we need to send group elements of the cryptographic algorithms from one node to the next. For cyclic group this is easy. They are just big integers with a max size and can be sent as arrays of a fixed length. For elliptic curve however there is no standard for sending elliptic curve points. So the choice was made to send both coordinates.
+For CMix to work, we need to send group elements of the cryptographic algorithms from one node to the next. For the cyclic group backend this is easy: the elements are just big integers with a maximum size and can be sent as arrays of a fixed length.
+
+When serializing elliptic curve points there are several choices. You can send both affine coordinates; you can send only the $y$ coordinate, in the case of Ed25519, which means you need to calculate (one of) the corresponding $x$ coordinates when deserializing; or you can send the projective coordinates. In this implementation I chose to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate, which is relatively slow.
 
-This choice was made because libgcrypt makes it hard to find, for instance in Ed25519, the x coordinate corresponding to the y coordinate you just received. The only way I found to do so was convert the y coordinate to a hexadecimal string. reverse the string to have the big-endian representation. Prepend the byte ``0x40'' to the string and import it with ``gcry\_mpi\_ec\_decode\_point'' I tried to avoid this scenario but the documentation on the subject of how ed25519 points are handled when missing their x coordinate, and how to properly initialize the point with only Y and Z coordinates made me take the safe route. So note that there is probably room for improvement in the Elliptic curve implementation.
+The main reason for this is that the raw affine $x$ and $y$ coordinates are easier to inspect and debug. On top of that, libgcrypt is not very forthcoming with documentation of its internal structures, especially in its elliptic curve library, which means I was not sure how to properly serialize the projective coordinates of a point. So for now keep in mind that the overall implementation could be a bit faster if these unnecessary inversions of $Z$ were eliminated. So as to not worry about network level details I used Google protobuf to serialize these messages further. This also means that nodes implemented in different languages could interact with each other.
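For illustration, a minimal sketch of the affine serialization described above, using libgcrypt's gcry_mpi_ec_get_affine; the helper name, buffer layout and error handling are simplified assumptions, not the project's actual code.

    // Sketch: serialize an Ed25519 point as its two affine coordinates with libgcrypt.
    // Assumes an initialized curve context; padding and error handling are simplified.
    #include <gcrypt.h>
    #include <cstring>

    bool serialize_point(gcry_mpi_point_t point, gcry_ctx_t ctx, unsigned char out[64]) {
        gcry_mpi_t x = gcry_mpi_new(0);
        gcry_mpi_t y = gcry_mpi_new(0);
        // Converts from the internal projective representation; this is where the
        // relatively slow inversion of the Z coordinate happens.
        if (gcry_mpi_ec_get_affine(x, y, point, ctx)) {
            gcry_mpi_release(x);
            gcry_mpi_release(y);
            return false;  // point at infinity
        }
        std::memset(out, 0, 64);
        size_t written = 0;
        // Unsigned big-endian export; a real implementation would right-align or
        // record the written lengths so short values round-trip unambiguously.
        gcry_mpi_print(GCRYMPI_FMT_USG, out, 32, &written, x);
        gcry_mpi_print(GCRYMPI_FMT_USG, out + 32, 32, &written, y);
        gcry_mpi_release(x);
        gcry_mpi_release(y);
        return true;
    }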
 \subsection{Mapping strings to Ed25519 points}
-There is however the Koblitz's method\cite{Koblitz} This is a probabilistic method of finding a curve point. it starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point has to be interpreted as a group element and multiplied by the stride, the result also has to be a group element so your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding x coordinate. If it has one you are done, If it doesn't you add one and check again. Until you try to add stride to the original number if you haven't found a suitable y coordinate beforehand this mapping algorithm fails.
+There is however Koblitz's method\cite{Koblitz}. This is a probabilistic method of finding a curve point. It starts by choosing a number that will be used as a ``stride''. The string you want to encode as a point is interpreted as a group element and multiplied by the stride. The result also has to be a group element, so your message space is made smaller by a factor ``stride''. Now you can check if your result has a corresponding $x$ coordinate. If it has one you are done; if it doesn't, you add $1$ to your current trial $y$ coordinate and check again. This continues until your trial $y$ coordinate reaches the value $first\_trial\_y\_coordinate + stride$. If you haven't found a suitable $y$ coordinate by then, this mapping algorithm fails.
 
-You can map a point back to the original string by dividing the y coordinate with the ``stride''
+You can map a point back to the original string by dividing the $y$ coordinate by the ``stride''. This works because the division is integer division and we only ever added less than the stride to the original number.
 
-The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the max distance between two consecutive suitable y coordinates could be, half of the possible group elements would be suitable, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make educated guesses and stay on the safe side of those guesses.
+The problem with this probabilistic Koblitz's method is choosing your ``stride''. There is no way of knowing what the maximum distance between two consecutive suitable $y$ coordinates could be; about half of the possible group elements would be suitable, but it is impossible to list all the Ed25519 curve points and check. This makes it an unsolvable problem, but we can make an educated guess and stay on the safe side of that guess.
 
-Now to address the concern that the you divide your message space by your stride. By doing this you also effect your throughput. Atleast it would if you would optimally pack your messages in the 252 bits you have when using Ed25519. If you only use the lower 248 bits however, which gives you 31 byte messages. you have 5 bits to use as a stride. Which, anecdotally, seems to be enough. Ofcourse you can never know if this stride will work for all possible messages. But for the purpose of this benchmark it seems to work well enough. Maybe it is possible to find out how stride influences the chance of not finding a suitable y coordinate. But that is outside of the scope of this research.
+Now to address the concern that you divide your message space by your stride: this theoretically also affects your throughput, but only if you would optimally pack your messages into the 252 bit message space available in Ed25519. If you only use the lower 248 bits, which gives you a 31 byte message space and saves the time and effort of packing your messages optimally, you have 5 bits to use as a stride. That, anecdotally, seems to be enough. Of course you can never know if this stride will work for all possible messages, but for the purpose of this benchmark it works well enough. Maybe it is possible to find out how the stride influences the chance of not finding a suitable $y$ coordinate, but that is outside of the scope of this research.
 
-A couple of small but nice benefits of using $32$ as stride. Multiplication and division are bit shifts as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$. This makes any hexadecimal representation of a number just shift up one character. However in actual runs of the algorithm some runs would fail because there was no suitable $y$ coordinate within $16$ values of the message value. This has not yet happened for $32$. Again this is no guarantee that it will never happen though.
+There are a couple of small but nice benefits to using $32$ as the stride. Multiplication and division are bit shifts, as $32$ is a power of $2$. For debugging purposes you might want to consider a stride of $16$, which makes any hexadecimal representation of a number simply shift up one character. However, in actual runs of the algorithm with a stride of $16$ some runs did fail because there was no suitable $y$ coordinate within $16$ values of the message value. This has not happened for $32$ yet. To reiterate, this is no guarantee that it will never happen.
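A small self-contained sketch of the mapping loop described in this subsection, using 64-bit integers in place of the real 252-bit values for brevity; the curve membership test is passed in as a caller-supplied predicate rather than implemented here.

    // Sketch of the probabilistic Koblitz mapping with an arbitrary stride (e.g. 32).
    // has_valid_x answers "is there a curve point with this y coordinate?".
    #include <cstdint>
    #include <functional>
    #include <optional>

    std::optional<std::uint64_t> map_to_curve(std::uint64_t m, std::uint64_t stride,
                                              const std::function<bool(std::uint64_t)>& has_valid_x) {
        const std::uint64_t base = m * stride;       // with stride 32 this shifts m up by 5 bits
        for (std::uint64_t i = 0; i < stride; ++i)   // try at most `stride` candidates
            if (has_valid_x(base + i))
                return base + i;                     // usable y coordinate found
        return std::nullopt;                         // mapping failed for this message
    }

    // Inverse mapping: integer division undoes the offset because we added i < stride.
    std::uint64_t map_from_curve(std::uint64_t y, std::uint64_t stride) {
        return y / stride;
    }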
 \subsection{Debugging the cmix operations}
-Debugging the cmix operations is hard. Intermediate results produced by nodes are hard to validate for the real 2048 bit numbers used in the cyclic group and the 256 bit numbers used in the elgamal example. Fortunately it is possible to use smaller cyclic groups and smaller curves to debug structural issues in the algorithms. For the cyclic groups this works like a charm, as the operations are pretty easy to do by hand or calculator, The operations for elliptic curve are a lot less intuitive, even for small curves. Especially with no known good library that implements these operations. This makes debugging some of the Mix primitives tedious and time consuming.
+Debugging the cmix operations is hard. Intermediate results produced by nodes are hard to validate for the real 2048 bit numbers used in the cyclic group backend and the 256 bit numbers used in the elliptic curve backend. Fortunately it is possible to use smaller cyclic groups and smaller curves to debug structural issues in the algorithms. For the cyclic groups this works like a charm, as the operations are pretty easy to do by hand or with a calculator. The operations for elliptic curve are a lot less intuitive, even for small curves, especially with no known good library that implements these operations. This makes debugging some of the mix algorithm primitives tedious and time consuming.
 
-Some tools that have been a great help in creating the implementation are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check for any object and array access outside of the allowed bounds in the code paths that are executed. This is no guarantee the application is completely memory safe, but it does ensure that any out of bounds access is detected. Even if the access was still technically in valid memory just nog part of the object or array. The leak tool also helped to keep an eye on the tokens passed around by the CMix library. Because the implementation language of the library was $C$ and there are 2 different implementations for the 2 different elgamal backends. There is a lot of token passing. These tokens need to be destructed but have no clear overarching owner. This makes it harder to keep track of the lifetime. So LeakSanitizer helped out tracking these lost tokens. Which allowed me to eventually run the benchmark for a high number of clients on a system with relatively low specs.
+Some tools that have been a great help in creating the implementation are AddressSanitizer\cite{ASan} and the companion leak check tool LeakSanitizer\cite{LSan}. These tools check all the executed code paths for any object or array access outside of the allowed bounds. This is no guarantee that the application is completely memory safe, but it does ensure that any out of bounds access is detected, even if the access was still technically in valid memory, just not part of the object or array. The leak tool also helped to keep an eye on the tokens passed around by the CMix library. Because the implementation language of the library is ``C'' and there are 2 different implementations for the 2 different ElGamal backends, there is a lot of token passing. These tokens need to be destructed but have no clear overarching owner, which makes it harder to keep track of object lifetimes. LeakSanitizer helped track down these lost tokens, which allowed me to eventually eliminate the memory leaks, which in turn allowed me to run the benchmark for a high number of clients on a system with relatively low specs.
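On the C++ application side, the token ownership problem described above is the kind of thing an RAII wrapper can make explicit; the C type and function names below are hypothetical stand-ins (reusing the invented cmix_element token from the earlier sketch), not the library's real API.

    // Hypothetical illustration of making token ownership explicit with RAII.
    #include <memory>

    extern "C" {
    typedef struct cmix_element cmix_element;    // opaque token from the C library
    cmix_element* cmix_element_random(void);     // hypothetical constructor
    void cmix_element_destroy(cmix_element* e);  // hypothetical destructor
    }

    struct ElementDeleter {
        void operator()(cmix_element* e) const { cmix_element_destroy(e); }
    };
    using ElementPtr = std::unique_ptr<cmix_element, ElementDeleter>;

    // Whoever holds the ElementPtr owns the token; LeakSanitizer then only has to
    // catch the places where a raw pointer escapes this wrapper.
    ElementPtr make_random_element() {
        return ElementPtr(cmix_element_random());
    }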
diff --git a/content/results.tex b/content/results.tex
index 62cfb65..ab7ad28 100644
--- a/content/results.tex
+++ b/content/results.tex
@@ -1,17 +1,19 @@
 \section{Results}
-So the raw results presented in appendix \ref{app-tables} were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see a cMix setup. All connections, either from node to node or clien to node, are TCP connections encrypted using TLS.
+The raw results presented in appendix \ref{app-tables} were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated the way you would in a normal cMix setup. All connections, either from node to node or from client to node, are TCP connections encrypted using TLS.
 
 Latency is of course negligible because all the participants are running on the same computer, but the goal is not to measure network latency. Rather, we want to know if there is a benefit in using elliptic curve as opposed to multiplicative group ElGamal.
 
-The reason behind running 3 nodes is simple. There are subtle distinctions between nodes do depending on the position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally the last node needs to decrypt the final message en send it to its destination. So the minimal testcase should contain 3 nodes. 1 first, 1 middle and 1 last node. Now I don't expect to see much difference between these nodes with the exception of the ``RealPost'' step as the last node needs to prepare buffers with the plaintext to send out to the clients.
+The reason behind running 3 nodes is simple. There are subtle distinctions between what nodes do, depending on their position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally, the last node needs to decrypt the final message and send it to its destination. So the minimal test case should contain 3 nodes: 1 first, 1 middle and 1 last node. I don't expect to see much difference between these nodes, with the exception of the ``RealPost'' step, as the last node needs to decrypt the ciphertext and prepare plaintext buffers to send out to the clients.
 
-The reasoning behind running 500 clients is 2-fold. in the original cMix paper \cite{TODO} The largest test was with 500 clients. So I wanted to mimic that. The second reason is that it still feasible to do 500 clients using a single pc with 12GB or RAM. We still could increase the number of clients but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers.
+The reasoning behind running 500 clients is twofold. In the original cMix paper \cite{TODO} the largest test was with 500 clients, so I wanted to mimic that. The second reason is that it is still feasible to run 500 clients on a single pc with 12GB of RAM. We could still increase the number of clients by about 100, but running 500 of them already gives us timings large enough that we don't need to worry about the resolution of the used CPU timers, and not running the extra 100 clients gives us some headroom.
 
-For the timings I used the \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer} which has a timer resolution of $10000000ns$ for both user and system clocks on a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so called ``User'' timings. This eliminates the time spend context switching, which gives us slightly more accurate results. The system and wall times are recorded though, just filtered out in the results table as they are not relevant.
+For the timings I used \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer}, which has a timer resolution of $10000000ns$ for both the user and system clocks in a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so called ``user'' timings. This eliminates the time spent context switching, which gives us slightly more accurate results. The system and wall times are also recorded, but filtered out of the results table as they are not relevant.
 
 For gathering results I created a program called statsd, which is included in the repository. The program receives timer snapshots over TCP. Each node sends a snapshot just before it starts working on a phase of the cMix algorithm. After the computational work is done, but before sending the data to the next node, another snapshot of the clock state is sent to statsd. So the results cover purely the computation of that cMix phase, with some additional conversions to the wire format, but without the overhead of sending the message over the socket. The second snapshot is taken right after the cMix operation completes, courtesy of the implicit strand of the boost::asio async socket operations.
 
+Gathering the results over TCP with a separate daemon enables people to run this same benchmark across separate servers, enabling some nice test scenarios in which you can control network congestion and packet loss.
+
 The following results were gathered with the PC specs as listed in appendix \ref{app-specs}.
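As an illustration of how such a per-phase user-time snapshot can be taken with boost::timer (the statsd wire format and the TCP sending are omitted; this is a simplified sketch, not the benchmark code):

    // Simplified sketch of taking a per-phase "user" time snapshot with boost::timer.
    // Link with -lboost_timer.
    #include <boost/timer/timer.hpp>
    #include <iostream>

    int main() {
        boost::timer::cpu_timer timer;  // starts measuring immediately

        // ... one cMix phase worth of computation would run here ...

        boost::timer::cpu_times snapshot = timer.elapsed();
        boost::timer::nanosecond_type user_ns = snapshot.user;  // user time only, ignoring context switches
        std::cout << "phase user time: " << user_ns << " ns\n";
        return 0;
    }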
 \begin{table}
@@ -19,7 +21,7 @@ The following results were gathered with the pc specs as listed in appendix \ref
 \resizebox{\columnwidth}{!}{
 \begin{tabular}{|c||c|c|c|c|c|c|}
 \hline
-Node & prepre & premix & prepost & realpre & realmix & realpost \\\hline\hline
+Node & prepre (s (\textsigma)) & premix (s (\textsigma)) & prepost (s (\textsigma)) & realpre (s (\textsigma)) & realmix (s (\textsigma)) & realpost (s (\textsigma)) \\\hline\hline
 Node1 ec 500 & 3.662 (0.030) & 2.819 (0.015) & 1.163 (0.029) & 0.299 (0.004) & 0.122 (0.004) & 0.123 (0.004) \\\hline
 Node2 ec 500 & 3.676 (0.025) & 2.818 (0.020) & 1.170 (0.029) & 0.302 (0.005) & 0.124 (0.005) & 0.123 (0.005) \\\hline
 Node3 ec 500 & 3.680 (0.026) & 2.819 (0.018) & 1.169 (0.028) & 0.302 (0.004) & 0.212 (0.004) & 0.451 (0.020) \\\hline
@@ -32,7 +34,7 @@
 \resizebox{\columnwidth}{!}
 {
 \begin{tabular}{|c||c|c|c|c|c|c|}
 \hline
-Node & prepre & premix & prepost & realpre & realmix & realpost \\\hline\hline
+Node & prepre (s (\textsigma)) & premix (s (\textsigma)) & prepost (s (\textsigma)) & realpre (s (\textsigma)) & realmix (s (\textsigma)) & realpost (s (\textsigma)) \\\hline\hline
 Node1 mg 500 & 19.145 (0.039) & 19.142 (0.035) & 20.125 (0.560) & 24.769 (1.373) & 0.074 (0.005) & 0.140 (0.005) \\\hline
 Node2 mg 500 & 19.215 (0.041) & 19.140 (0.035) & 20.114 (0.661) & 24.509 (2.063) & 0.072 (0.005) & 0.139 (0.006) \\\hline
 Node3 mg 500 & 19.219 (0.073) & 19.152 (0.066) & 20.235 (1.183) & 24.560 (2.845) & 0.074 (0.006) & 1.475 (0.018) \\\hline
diff --git a/thesis.tex b/thesis.tex
--- a/thesis.tex
+++ b/thesis.tex
@@ -53,7 +53,7 @@
 \pagestyle{fancy} % All pages have headers and footers
 \fancyhead{} % Blank out the default header
 \fancyfoot{} % Blank out the default footer
-\fancyhead[C]{Benchmarking CMix $\bullet$ Jan 2017} % Custom header text
+\fancyhead[C]{Benchmarking CMix $\bullet$ Jun 2017} % Custom header text
 \fancyfoot[RO,LE]{\thepage} % Custom footer text
 
 \usepackage{titling} % Customizing the title section
@@ -62,12 +62,15 @@
 \usepackage{hyperref} % For hyperlinks in the PDF
 
-\usepackage[backend=biber]{biblatex}
+\usepackage[backend=biber,citestyle=numeric]{biblatex}
+\addbibresource{thesis.bib}
 
 \usepackage{pdfpages}
 \usepackage{csquotes}
 
+\usepackage{textgreek}
+
 \usepackage{listings}
 \lstset{
 basicstyle=\small\ttfamily,
@@ -75,8 +78,6 @@
 columns=flexible,
 breaklines=true
 }
-\addbibresource{thesis.bib}
-
 \input{content/title}
 
 \begin{document}
