path: root/content/discussion.tex
authorDennis Brentjes <dennis@brentj.es>2018-09-04 19:28:20 +0200
committerDennis Brentjes <dennis@brentj.es>2018-09-07 00:21:35 +0200
commit4cf29b0c3beacafd8565ca5461381f53832688ed (patch)
treed70c95146ca8d72eb9017fce0e9c3e2e88308472 /content/discussion.tex
parent1e316c9a7437580f499453cdafbb0c7433a46b88 (diff)
Applied Colin's feedback.
Diffstat (limited to 'content/discussion.tex')
content/discussion.tex | 32
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/content/discussion.tex b/content/discussion.tex
index 34bdc4e..cd646cc 100644
--- a/content/discussion.tex
+++ b/content/discussion.tex
@@ -3,14 +3,14 @@
We will now discuss the results in more detail, zooming in on each of the separate phases to see if we can either verify our expectations or explain any deviations.
-\subsection{Justification af claims}
+\subsection{Justification of claims}
\label{sec:microbench}
-To justify the claims made in the rest of the discussion section a micro benchmark was created for both implementations, \ec and \mg. Please note that these are raw operation on their respective elements and thus does not take into account the difference in size. Therefore the \ec results should effectively be multiplied by $8$ to have a straight up comparison.
+To justify the claims made in the rest of the discussion section, a micro benchmark was created for both implementations, \ec and \mg. Please note that these are raw operations on their respective elements and thus do not take into account the difference in group element size. Therefore the \ec results should effectively be multiplied by $8$ for a direct comparison.
\begin{description}
-\item[Random] The time it took to generate 10000 random group elements / curve points.
-\item[Combine] The time it took to multiply 10000 group elements or to add 10000 curve points.
-\item[Uncombine] The time it took to invert 10000 group elements / curve points. And then multiply or add them.
+\item[Random] The time it took to generate 10,000 random group elements / curve points.
+\item[Combine] The time it took to multiply 10,000 group elements or to add 10,000 curve points.
+\item[Uncombine] The time it took to invert 10,000 group elements / curve points and subsequently multiply or add them.
\end{description}
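The three operations above can be sketched for the \mg case with a toy modular-arithmetic benchmark. This is purely illustrative Python, not the libgcrypt-based harness used for the actual measurements, and the 2048-bit group is replaced by a small prime modulus:

```python
import random
import time

# Toy stand-in for the multiplicative-group micro benchmark: a small
# prime modulus replaces the 2048-bit group of the real measurements.
P = (1 << 127) - 1  # a Mersenne prime, purely illustrative
N = 10_000

def bench(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.4f}s")

elems = [random.randrange(1, P) for _ in range(N)]

# Random: generate N random group elements.
bench("Random", lambda: [random.randrange(1, P) for _ in range(N)])

# Combine: multiply N group elements together.
def combine():
    acc = 1
    for e in elems:
        acc = acc * e % P
    return acc

# Uncombine: invert N elements (Fermat: e^(P-2) mod P), then multiply.
def uncombine():
    acc = 1
    for e in elems:
        acc = acc * pow(e, P - 2, P) % P
    return acc

bench("Combine", combine)
bench("Uncombine", uncombine)  # inversion dominates the cost
```

Even at this toy scale the relative cost of the inversion in ``Uncombine'' is clearly visible, which mirrors the pattern in the tables below.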
\noindent The results of a micro benchmark on the multiplicative group.
@@ -19,40 +19,40 @@ To justify the claims made in the rest of the discussion section a micro benchma
\noindent The results of a micro benchmark on the elliptic curve Ed25519.
\vspace{-\parsep}\vspace{-\parsep}\VerbatimInput{results/mb_ec.txt}
-This clearly shows that adding curve points is much faster than multiplying 2048 bit numbers in a group, but it also clearly shows but it also shows how slow inverting is. This gives us insight into how the results we see in the above tables came to be.
+This clearly shows that adding curve points is much faster than multiplying 2048-bit numbers in a group. It also clearly shows how slow inverting is. This gives us insight into how the results in the above tables (Tables \ref{tab:ec500} and \ref{tab:mg500}) came to be.
\subsection{Precomputation preprocessing}
-Right away we can see a large difference between \ec and \mg, with \ec being more than $2$ times slower. Most likely this can be explained by 2 factors. The first being that this step includes generating all the randomness needed for the \cmix run. It generates $R$ and $S$ and the permutation $\pi$. For Ed25519 it tries a $Y$-value and if it doesn't map onto the curve we try a different one. On average only half of the points would be valid.
+Right away we can see a large difference between \ec and \mg, with \ec being more than $2$ times slower. Most likely this can be explained by two factors. The first is that this step includes generating all the randomness needed for the \cmix{} run: it generates $R$, $S$, and the permutation $\pi$. For Ed25519 it tries a $y$-value and, if it does not map onto the curve, tries a different one. On average only half of the candidates are valid. This means that, in the limit, twice as many candidate $y$ coordinates need to be generated and checked as there are random curve points needed (see \autoref{eq:limit}).
\begin{equation}
-\label{eq:prepre}
+\label{eq:limit}
\sum_{x=0}^{\infty} n \cdot \left( \frac{1}{2}\right)^x =\ 2n
\end{equation}
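The expectation in \autoref{eq:limit} can also be checked empirically: if each candidate $y$ is valid with probability $\frac{1}{2}$, the mean number of candidates per accepted point converges to $2$. A small simulation (illustrative only, not the thesis code):

```python
import random

random.seed(1)

def candidates_needed(n_points, p_valid=0.5):
    """Count candidate y-values drawn until n_points valid points are found."""
    drawn = 0
    found = 0
    while found < n_points:
        drawn += 1
        if random.random() < p_valid:  # candidate maps onto the curve
            found += 1
    return drawn

n = 100_000
avg = candidates_needed(n) / n
print(f"candidates per valid point: {avg:.3f}")  # converges to 2
```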
-A quick benchmark done with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} shows us that for the entire run $70\%$ of time is spend in getting random bytes trough the $libgcrypt$ api for \ec. But ``only'' $46\%$ is spend gathering random bytes for the \mg
+A quick benchmark done with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} shows us that for the entire run $70\%$ of the time is spent getting random bytes through the libgcrypt API for \ec, but only $46\%$ is spent gathering random bytes for \mg.
-Another slowing factor might be the memory fragmentation. The way that libgcrypt operates is that each object is it's own opaque pointer to an internal data structure. So there is no way to consolidate memory into a single block. This wouldn't be a factor if both algorithms would use the same amount of allocations, but now that we do 8 simultaneous mixes for \ec we have 8 times the amount of the allocations for the \ec algorithm. This impacts the performance quite heavily
+Another slowing factor might be memory fragmentation. The way libgcrypt operates, each object is its own opaque pointer to an internal data structure, so there is no way to consolidate memory into a single block. This would not be a factor if both algorithms used the same number of allocations, but because we do $8$ simultaneous mixes for \ec, the \ec algorithm performs $8$ times as many allocations. This impacts performance quite heavily.
\subsection{Precomputation mixing}
\label{sec:premix}
-Another big time loss is attributed to the wire format chosen. because we send over the affine $x$ and $y$ coordinate for \ec, we need to convert the coordinate from projected to affine coordinate systems. This takes up extra time which can be eliminated if we choose. Unfortunately if we want to do this using libgcrypt we need to send over all 3 projected coordinates as there is no way to use only the $Y$ and $Z$ coordinates to reconstruct the curve point in libgcrypt.
+Another big time loss is attributed to the wire format chosen. Because we send over the affine $x$ and $y$ coordinates for \ec, we need to convert the coordinates from the projective to the affine coordinate system. This takes up extra time, which can be eliminated if desired. Unfortunately, if we want to do this using libgcrypt we need to send over all three projective coordinates, as there is no way to reconstruct the curve point in libgcrypt from only the $Y$ and $Z$ coordinates.
-Another downside to fixing this by sending projected coordinates is that there is no way of knowing how many bits a projected coordinate is, so the allocation of buffer space in the wire format container will become dependent on the point input. Something which is not necessary for the \mg implementation because it's buffer size will always be the size of the group element. This has to be exposed via the generic API if you to be able to send projected coordinates, because the application binaries, the nodes and clients, are responsible for all the network transmissions, not the \cmix libraries.
+Another downside to fixing this by sending projective coordinates is that there is no way of knowing in advance how many bits of buffer space a projective coordinate will occupy, so the allocation of buffer space in the wire format container becomes dependent on the point input. This is not necessary for the \mg implementation, because its buffer size will always be the size of the group element. Because the application binaries, the nodes and clients, are responsible for all the network transmissions, this projective point size needs to be exposed via the generic API.
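The conversion cost discussed above comes from the fact that a projective point $(X, Y, Z)$ represents the affine point $(X/Z, Y/Z)$, so recovering affine coordinates requires a field inversion. A minimal sketch of that arithmetic (the field prime is Ed25519's, but the point values and helper are hypothetical, not libgcrypt's internal representation):

```python
# Why projective-to-affine conversion is expensive: it needs one modular
# inversion of Z per point. Values below are illustrative only.
P = 2**255 - 19  # the Ed25519 base field prime

def to_affine(X, Y, Z, p=P):
    """Convert projective (X, Y, Z) to affine (x, y) = (X/Z, Y/Z) mod p."""
    z_inv = pow(Z, p - 2, p)  # inversion via Fermat's little theorem
    return (X * z_inv % p, Y * z_inv % p)

# Round trip: scale an affine pair into projective form and back.
x, y, z = 9, 14781619, 2**200 + 7
X, Y = x * z % P, y * z % P
assert to_affine(X, Y, z) == (x, y)
```

Sending the projective triple instead of the affine pair trades this per-point inversion for a larger, input-dependent wire size, which is exactly the trade-off described above.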
\subsection{Precomputation postprocessing}
-In this stage we only do a `simple' cryptographic operation on each of the points to determine the node's decryption share. There are no extra allocations and no unneeded inversions, and we can see \ec is $2.5$ times faster.
+In this stage we only do a `simple' cryptographic operation on each of the points to determine the node's decryption share. There are no extra allocations nor unneeded inversions, and we can see \ec is $2.5$ times faster than \mg.
\subsection{Realtime precomputation}
-this phase is a large time saver, namely \ec is almost $13$ times faster. Additionally it is a time saver in the realtime portion of the protocol. This will massively impact perceived latency by the clients. The cryptographic primitives in this step are $1$ inversion and $1$ multiplication or point add. All though inversion are still very slow compared to adding 2 points in Ed25519, it is still considerably faster than a inversion in 2048 bit multiplicative group as we can see in section \ref{sec:microbench}.
+This phase is a large time saver, with \ec being almost $13$ times faster. Additionally, the saving falls in the realtime portion of the protocol, which will massively reduce the latency perceived by the clients. The cryptographic primitives in this step are $1$ inversion and $1$ multiplication or point addition. Although inversions are still very slow compared to adding two points on Ed25519, an inversion on Ed25519 is still considerably faster than an inversion in a 2048-bit multiplicative group, as we can see in section \ref{sec:microbench}.
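In \mg terms, the per-element work of this phase reduces to one inversion followed by one multiplication. A hedged sketch, with a small prime modulus standing in for the 2048-bit group and an illustrative helper name:

```python
# One inversion plus one multiplication per element, as in the realtime
# precomputation phase. Small modulus and values are illustrative only.
P = 2**61 - 1

def realtime_precompute(r, s, p=P):
    """Combine the inverse of r with s: r^-1 * s mod p."""
    return pow(r, p - 2, p) * s % p

r, s = 123456789, 987654321
out = realtime_precompute(r, s)
assert out * r % P == s  # multiplying back by r recovers s
```

The inversion dominates this cost in both implementations, which is why the gap between the two inversion speeds in section \ref{sec:microbench} translates almost directly into the $13\times$ difference.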
\subsection{Realtime mix and realtime postcomputation}
-In the last 2 phases \ec is considerably slower than \mg. This seemed surprising. On closer inspection with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} we find that almost 94\% of the time was spent in the $gcry\_mpi\_ec\_get\_affine$ function call. Which does inversions which we already know are slow from our micro benchmark. To reiterate 1 of these inversions could be avoided if we would change the wire format to not send over the affine Y coordinate.
+In the last two phases \ec is considerably slower than \mg. This seems surprising. On closer inspection with ``valgrind --tool=callgrind''\cite{ValgrindCallgrind} we find that almost $94\%$ of the time was spent in the \texttt{gcry\_mpi\_ec\_get\_affine} function call, which performs inversions that our micro benchmark already showed to be slow. One of these inversions could be avoided if we changed the wire format to not send over the affine $y$ coordinate.
-That being said, the phases take around $1$s per node. And the time saved during the realtime precomputation phase still makes this worth while.
+That being said, these phases take around $1$ second per node, and the time saved during the realtime precomputation phase still makes using \ec worthwhile.