From 23968a760efa6e03e8d47fbff108ec5aae010fe3 Mon Sep 17 00:00:00 2001
From: Dennis Brentjes <dennis@brentj.es>
Date: Tue, 26 Jun 2018 23:16:41 +0200
Subject: Small revision of the thesis.

---
 content/discussion.tex | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

(limited to 'content/discussion.tex')

diff --git a/content/discussion.tex b/content/discussion.tex
index e27fcd6..397f43d 100644
--- a/content/discussion.tex
+++ b/content/discussion.tex
@@ -16,7 +16,8 @@ Note that these separate runs of the ed25519 protocol can trivially be paralleli
 
 This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver.So the payload would become $236$ and $228$ bytes for \mg and \ec respectively.
 
-\subsection{justification af claims}
+\subsection{Justification af claims}
+\label{sec:microbench}
 To justify the claims made in the rest of the discussion section I created a micro benchmark on both implementations, \ec and \mg. Please note that these are raw operation on their respective elements and thus does not take into account the difference in size. Therefore the \ec results should effectively be multiplied by $8$ to have a straight up comparison.
  
 \begin{description}
@@ -26,10 +27,10 @@ To justify the claims made in the rest of the discussion section I created a mic
 \end{description}
 
 \noindent The results of a micro benchmark on multiplicative group.
-\VerbatimInput{results/mb_mg.txt}
+\vspace{-\parsep}\vspace{-\parsep}\VerbatimInput{results/mb_mg.txt}
 
 \noindent The result of a micro benchmark on elliptic curve Ed25519
-\noin\VerbatimInput{results/mb_ec.txt}
+\vspace{-\parsep}\vspace{-\parsep}\VerbatimInput{results/mb_ec.txt}
 
 This clearly shows that adding curve points is much faster than multiplying 2048 bit numbers in a group, but it also clearly shows but it also shows how slow inverting is. This gives us insight into how the results we see in the above tables came to be.
 
@@ -40,7 +41,9 @@ Right away we can see a large difference between \ec and \mg, with \ec being mor
 \begin{equation}
 \label{eq:prepre}
 \sum_{x=0}^{\infty} n \cdot \left( \frac{1}{2}\right)^x =\ 2n
-\end{equation} 
+\end{equation}
+
+A quick benchmark done with ``valgrind --tool=callgrind'' shows us that for the entire run $70\%$ of time is spend in getting random bytes trough the $libgcrypt$ api for \ec. But ``only'' $46\%$ is spend gathering random bytes for the \mg
 
 Another slowing factor might be the memory fragmentation. The way that libgcrypt operates is that each object is it's own opaque pointer to an internal data structure. So there is no way to consolidate memory into a single block. This wouldn't be a factor if both algorithms would use the same amount of allocations, but now that we do 8 simultaneous mixes for \ec we have 8 times the amount of the allocations for the \ec algorithm. This impacts the performance quite heavily 
 
@@ -55,28 +58,12 @@ In this stage we only do a `simple' cryptographic operation on each of the point
 	
 \subsection{Realtime precomputation}
 
-Here is a large time saver, namely \ec is almost $13$ times faster. Additionally it is a time saver in the realtime portion of the protocol. This will massively impact perceived latency by the clients.
-
-\ec is much faster because the we only need to multiply the incoming values with 2 precomputed values in the precomputation phase, the inverse of the session key $K^{-1}$, and the value of $R$. These operations are faster in \ec than they are in \mg. 
+Here is a large time saver, namely \ec is almost $13$ times faster. Additionally it is a time saver in the realtime portion of the protocol. This will massively impact perceived latency by the clients. The cryptographic primitives in this step are $1$ inversion and $1$ multiplication or point add. All though inversion are still very slow compared to adding 2 points in Ed25519, it is still considerably faster than a inversion in 2048 bit multiplicative group as we can see in section \ref{sec:microbench}. 
 
 \subsection{Realtime mix and realtime postcomputation}
-The problem with these 2 phases is that the duration of these phases are relatively short and therefore are more unreliable to derive any information from. For instance the Realtime mix operation seems to be slower in \ec. This could be the case and could be again explained by the serialization choices made for elliptic curve points. However the difference is so small with respect to the clock resolution that it is risky to claim this. It could be that these phases are calling more functions and we are just measuring the indirection overhead of the \cmix library implementation.
-
-It could also be due to memory fragmentation. The \ec implementation is uses smaller elements, allocating an element size buffer has a bigger chance to succeed within just cleared memory, but subsequent buffers might not find free bytes in the neighborhood. I try to minimize this by allocating all the elements I need at a time. However I don't have much control over for instance the protobuf library. I use the arena allocator from libprotobuf\cite{protobuf-arena} but have no idea how large these arenas are and how the cache locality is for the \cmix network messages.
-
-In short there is not many conclusions we can derive from these results. However these phases seem to be similar between the 2 implementations. One interesting thing here is the Realtime phase in node 3 of both \ec and \mg. It takes quite some more time to remove the R and S than I expected that being said it seems to take about 3 times as long for \mg than \ec and that gives us some insight how long it takes to invert an element in both implementations.
-
-\subsection{Precomputation phase}
-
-The complete precomputation phase on average takes $22.98$ seconds for \ec versus $175.49$ seconds for \mg. So for elliptic curve to be  worth it we would need to make it a bit faster than it now is. But I fully believe it is possible. Again removing the unnecessary inversions would help. Using a different cryptographic library as backend for the ed25519 cryptography would have a much greater impact. How useful libgcrypt may have been for experimenting and debugging certain aspects of the application, it's cryptographic routines might not be as optimized as say an ed25519-donna. That being said ed25519-donna does not implement all the primitives I need to implement all the \cmix primitives. But creating a custom ed25519 library or using one that does support all the primitives \cmix needs would make a huge impact on the performance.
-
-This also holds for the multiplicative group implementation of libgcrypt but I expect that both algorithms would have the same level optimization within the library, because both implementation use the underlying big integer library and operations.
-
-\subsection{Realtime phase}
-
-The complete realtime phase on average takes $2.06$ seconds for \ec versus $75.81$ seconds for \mg. So in the Realtime phase using ed25519 is really worth it.
-
 
+In the last 2 phases \ec is considerably slower than \mg. The unnecessary inversion in the wire protocol might be to blame for this. It is hard to say for certain but taking into account the operations done in those phases and section \ref{sec:microbench} it is the most likely cause.
 
+That being said, the phases take around $1$s per node. And the time saved during the realtime precomputation phase still makes this worth while.
 
 
-- 
cgit v1.2.3-70-g09d2