lots of small changes to the thesis.

author: Dennis Brentjes <d.brentjes@gmail.com> 2017-05-28 18:34:44 +0200
committer: Dennis Brentjes <d.brentjes@gmail.com> 2017-05-28 18:35:08 +0200
commit: 610b3f96ec31ee6192d46767dedae9d9efaedf9b (patch)
tree: 8568071a7b8df54d1e64c9ded9852143263372fd /content/discussion.tex
parent: 6ff78ac5b7b36ada3028d2d5380fa3dbe35bbd66 (diff)
download: thesis-610b3f96ec31ee6192d46767dedae9d9efaedf9b.tar.gz
thesis-610b3f96ec31ee6192d46767dedae9d9efaedf9b.tar.bz2
thesis-610b3f96ec31ee6192d46767dedae9d9efaedf9b.zip
1 files changed, 34 insertions, 14 deletions
diff --git a/content/discussion.tex b/content/discussion.tex
index 3de03e9..44a7377 100644
--- a/content/discussion.tex
+++ b/content/discussion.tex
@@ -6,19 +6,19 @@
 
 In this section \ec and \mg will refer to the 2 different implementations that we compare. They stand for elliptic curve and multiplicative groups.
 
-So lets first talk about what our expectations are. The size of the messages being sent are different. The ed25519 implementation can send messages up to $31$ bytes. The 2048 bit multiplicative group can send up to $255$ bytes of data. This implementation uses an prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. this means that the ed25519 implementation only sends $11$ bytes of payload versus $235$ bytes of payload for the multiplicative group implementation.
+So lets first talk about what our expectations are. The size of the messages being sent are different. The ed25519 implementation can send messages up to $31$ bytes. The 2048 bit multiplicative group can send up to $255$ bytes of data. This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. this means that the ed25519 implementation only sends $11$ bytes of payload versus $235$ bytes of payload for the multiplicative group implementation.
 
-However there are ways around this, by doing multiple mixes of the ed25519 ein one single cMix run, which in turn can be trivially parallelized. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes of the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to a factor of $8$.
+However there are ways around this. By doing 8 mixes of the ed25519 implementation in one single \cmix run with the same permutation functions but with different $R$ and $S$, we can significantly boost the effective payload. The effective payload of the ed25519 algorithm would become $228$ bytes versus $235$ bytes of the multiplicative group. This is why I will consider the payload difference between the multiplicative group and the ed25519 implementation to a factor of $8$.
 
-\subsection{precomputation}
+Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison.
 
-If this is the case we would hope that the ed25519 implementation is atleast 8 times faster than the multiplicative group. Unfortunately this is not the case for the precomputation steps of the algorithm. 
+So by the above dissertation, we should see that \ec is 8 times faster than \mg in all respects. Just by the virtue of only having to handle $\frac{1}{8}$ of the data. We hope to even be a bit faster than that.
 
-\subsubsection{prepre}
+\subsection{Precomputation precomputation}
 
 First of all this step does not seems to differ between nodes, all \ec nodes spend around $3.67s$ and the \mg nodes spend about $19.19s$. Which is $5.23$ times faster, not the 8 times we would expect.
 
-This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, if that number is a valid $y$ coordinate. For ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid will be $.5$. Because of this you need to, on average, generate and check twice as many coordinates as the actual number of random curve points you want \ref{eq:prerpe}.
+This can easily be explained by the way one generates random curve points on an elliptic curve. You have to generate a random number and check, in the case of ed25519, if that number is a valid $y$ coordinate. For Ed25519 about half of the possible $y$ values are valid, so the chance that a randomly generated number is valid will be $.5$. Because of this you need to, on average, generate and check twice as many coordinates as the actual number of random curve points you want \ref{eq:prepre}.
 
 \begin{equation}
 \label{eq:prepre}
@@ -27,25 +27,45 @@ This can easily be explained by the way one generates random curve points on an
 
 Therefore we only expect to be 4 times faster so comparing to that we are still slightly faster than the \mg counterpart, but \ec is slower on a fundamental level because of the extra calculations needed to generate random curve points.
 
-\subsubsection{premix}
+\subsection{Precomputation mixing}
+\label{sec:premix}
 
 The values for the premix step are very close to our ideal 8 times ratio, $6.79$ times faster. There is however no fundamental explanation why it should be slower than expected. There is however one implementation detail that might have had it's influence here.
 
-When serializing elliptic curve points, there are several choices you can make about the format. You can send both affine coordinates. Only the defining affine coordinate, or the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate. Which is relatively slow. The reason for this is that the raw affine $x$ and $y$ coordinate are easier to inspect and debug. Trying to be faster by only sending the defining affine coordinate would still require the inverse of $Z$ to be calculated so there is not much to gain there. On top of that there is no easy way to find the corresponding $x$ for any given $y$ in the \emph{libgcrypt} elliptic curve interface.
+When serializing elliptic curve points, there are several choices you can make about the format. You can send both affine coordinates. Only sending the $y$ coordinate, for ed25519, and derive the $x$ coordinate later, or send the projective coordinates. I chose in this implementation to send both affine $x$ and $y$ coordinates. This forces each node to calculate the inverse of the internally used projective $Z$ coordinate. Which is relatively slow. The reason for this is that the raw affine $x$ and $y$ coordinate are easier to inspect and debug. Trying to be faster by only sending the defining affine coordinate would still require the inverse of $Z$ to be calculated so there is not much to gain there. On top of that there is no easy way to find the corresponding $x$ for any given $y$ in the \emph{libgcrypt} elliptic curve interface.
 
-There is time to gain if we switch from affine to projected coordinate system within the network format. it would mean sending more data but the conversions from network format to internal would be much faster. This would avoid the inversion of $Z$ and this would be the big time saver for many of the steps even the $prepre$ step we already discussed. My expectation is that it would make this step at least 8 times faster than the \mg implementation making it comparable in performance. 
+There is time to gain if we switch from affine to projected coordinate system within the network format. it would mean sending more data but the conversions from network format to internal would be much faster. This would avoid the inversion of $Z$ and this would be the big time saver for many of the steps, even the $prepre$ step we already discussed. My expectation is that it would make this step at least 8 times faster than the \mg implementation making it comparable in performance. 
 
-\subsubsection{prepost}
+\subsection{Precomputation postcomputation}
 
-So here in the precomputaion-postcompuation phase the elliptic curve algorithm pays off. with $1.17s$ vs $20.16s$ a $17.26$ times speed increase. This is probably because the inversions in \mg are much more computational intensive as inversions in \ec.
+So here in the pre-computation post-computation phase the elliptic curve algorithm pays off. with $1.17s$ vs $20.16s$ a $17.26$ times speed increase. This is probably because the inversions in \mg are much more computational intensive as inversions in \ec. This does not mean that they are free in \ec as discussed in section \ref{sec:premix}
+	
+\subsection{Realtime precomputation}
 
-\subsubsection{realpre}
+Here is a large time saver, namely \ec is $81.90$ times faster. Additionally it is a time saver in the realtime portion of the protocol. This will massively impact perceived latency by the clients. So if that is your goal then \ec seems to be the way to go.
+
+Why \ec is so much faster is uncertain. This step consist of removing the shared key the user and node agreed upon during initialization, and adding the $r$ value of the node for this client slot, see section \ref{sec:realpre}. It could be related to the fact that no encryption is applied at this stage of the protocol. And maybe because the elliptic curve is in comparison so much smaller that it saves exponentially more time than the \mg algorithm.
 
 Another timesaver compared to \mg. It's $81.90$ times faster.
 
-\subsubsection{realmix and realpost}
+\subsection{Realtime mix and realtime postcomputation}
+The problem with these 2 phases is that the duration of these phases are relatively short and therefore are more unreliable to derive any information from. For instance the Realtime mix operation seems to be slower in \ec. This could be the case and could be again explained by the serialization choices made for elliptic curve points. However the difference is so small with respect to the clock resolution that it is risky to claim this. It could be that these phases are calling more functions and we are just measuring the indirection overhead of the \cmix library implementation.
+
+It could also be due to memory fragmentation. The \ec implementation is uses smaller elements, allocating an element size buffer has a bigger chance to succeed within just cleared memory, but subsequent buffers might not find free bytes in the neighborhood. I try to minimize this by allocating all the elements I need at a time. However I don't have much control over for instance the protobuf library. I use the arena allocator from libprotobuf\cite{protobuf-arena} but have no idea how large these arenas are and how the cache locality is for the cmix network messages.
+
+In short there is not many conclusions we can derive from these results.
+
+These phases are similar between the 2 implementations. The One interesting ting here though is the Realtime phase in node 3 of both \ec and \mg. It takes quite some more time to remove the R and S than I expected that being said it seems to take about 3 times as long for \mg than \ec and that gives us some insight how long it takes to invert an element in both implementations.
+
+\subsection{Precomputation phase}
+
+The complete precomputation phase on average takes $22.98$ seconds for \ec versus $175.49$ seconds for \mg. So for elliptic curve to be  worth it we would need to make it a bit faster than it now is. But I fully believe it is possible. Again removing the unnecessary inversions would help. Using a different cryptographic library as backend for the ed25519 cryptography would have a much greater impact. How useful libgcrypt may have been for experimenting and debugging certain aspects of the application, it's cryptographic routines might not be as optimized as say an ed25519-donna. That being said ed25519-donna does not implement all the primitives I need to implement all the \cmix primitives. But creating a custom ed25519 library or using one that does support all the primitives \cmix needs would make a huge impact on the performance.
+
+This also holds for the multiplicative group implementation of libgcrypt but I expect that both algorithms would have the same level optimization within the library.
+
+\subsection{Realtime phase}
 
-These phases are similar. The \ec variant is relatively slow, but this might be related to the inversions that are incurred whenever the software serializes the elliptic curve points to the network format.
+The complete realtime phase on average takes $2.06$ seconds for \ec versus $75.81$ seconds for \mg. So in the Realtime phase using ed25519 is really worth it.
author	Dennis Brentjes <d.brentjes@gmail.com>	2017-05-28 18:34:44 +0200
committer	Dennis Brentjes <d.brentjes@gmail.com>	2017-05-28 18:35:08 +0200
commit	610b3f96ec31ee6192d46767dedae9d9efaedf9b (patch)
tree	8568071a7b8df54d1e64c9ded9852143263372fd /content/discussion.tex
parent	6ff78ac5b7b36ada3028d2d5380fa3dbe35bbd66 (diff)
download	thesis-610b3f96ec31ee6192d46767dedae9d9efaedf9b.tar.gz thesis-610b3f96ec31ee6192d46767dedae9d9efaedf9b.tar.bz2 thesis-610b3f96ec31ee6192d46767dedae9d9efaedf9b.zip