\section{Discussion} \label{sec:discussion}
\newcommand{\ec}[0]{\emph{ec}\xspace}
\newcommand{\mg}[0]{\emph{mg}\xspace}
In this section \ec and \mg refer to the two implementations we compare: the elliptic curve and the multiplicative group implementation, respectively. Let us first discuss what our expectations should be.

The message sizes of the two implementations differ slightly. We use a 2048-bit multiplicative group, so \mg has 256-byte messages. Ed25519 has group elements of slightly below 255 bits, giving messages of less than 32 bytes. For convenience, and because we use Koblitz's method, only 31 bytes are used for the message part. To compensate, we run 8 mixes at a time, bringing the total up to 248-byte messages. This is still 8 bytes short, but it makes the timings more comparable. Note that these separate runs of the Ed25519 protocol can trivially be parallelized, which could make the comparison even more favorable for \ec; however, since we are interested in a direct comparison, this implementation does not parallelize the mixes. The implementation prefixes the message data with a destination id: the SHA1 hash of the receiver's public key, which takes up $20$ bytes of each message. The payload therefore becomes $236$ and $228$ bytes for \mg and \ec respectively.

\subsection{Justification of claims} \label{sec:microbench}
To justify the claims made in the rest of this section, I created a micro benchmark for both implementations, \ec and \mg. Note that these are raw operations on their respective elements and thus do not take the difference in size into account; the \ec results should effectively be multiplied by $8$ for a direct comparison.
\begin{description}
\item[Random] The time it took to generate 10000 random group elements / curve points.
\item[Combine] The time it took to multiply 10000 group elements or to add 10000 curve points.
\item[Uncombine] The time it took to invert 10000 group elements / curve points and then multiply or add them.
\end{description}

\noindent The results of the micro benchmark on the multiplicative group:
\vspace{-\parsep}\vspace{-\parsep}\VerbatimInput{results/mb_mg.txt}
\noindent The results of the micro benchmark on the elliptic curve Ed25519:
\vspace{-\parsep}\vspace{-\parsep}\VerbatimInput{results/mb_ec.txt}
These results clearly show that adding curve points is much faster than multiplying 2048-bit group elements, but also how slow inverting is. This gives us insight into how the results in the tables above came to be.

\subsection{Precomputation precomputation}
Right away we see a large difference between \ec and \mg, with \ec being more than $2$ times slower. This can most likely be explained by two factors. The first is that this step generates all the randomness needed for the \cmix run: the values $R$ and $S$ and the permutation $\pi$. For Ed25519 we pick a candidate $Y$-value and, if it does not map onto the curve, try a different one. On average only half of the candidates are valid, so generating $n$ points requires an expected $n + n/2 + n/4 + \dots$ trials:
\begin{equation} \label{eq:prepre}
\sum_{x=0}^{\infty} n \cdot \left( \frac{1}{2}\right)^x = 2n
\end{equation}
A quick benchmark with \texttt{valgrind --tool=callgrind} shows that for the entire run $70\%$ of the time is spent getting random bytes through the \texttt{libgcrypt} API for \ec, but ``only'' $46\%$ is spent gathering random bytes for \mg.

Another slowing factor might be memory fragmentation. In \texttt{libgcrypt}, each object is its own opaque pointer to an internal data structure, so there is no way to consolidate memory into a single block. This would not be a factor if both algorithms used the same number of allocations, but because we run 8 simultaneous mixes for \ec, the \ec algorithm performs 8 times as many allocations.
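To make the message layout described at the start of this section concrete, the following sketch packs one \ec super-message: a 20-byte SHA1 destination id plus a 228-byte payload, split into the $8 \times 31$-byte chunks fed to the parallel Ed25519 mixes. This is a hypothetical Python illustration of the layout only, not the actual C implementation:

```python
import hashlib

CHUNKS = 8       # number of Ed25519 mixes run side by side
CHUNK_SIZE = 31  # usable bytes per curve point (Koblitz encoding)

def pack(dest_pubkey: bytes, payload: bytes) -> list[bytes]:
    """Prefix the payload with the 20-byte SHA1 destination id and
    split the resulting 248-byte message into 8 chunks of 31 bytes."""
    dest_id = hashlib.sha1(dest_pubkey).digest()  # 20-byte destination id
    message = dest_id + payload
    assert len(message) == CHUNKS * CHUNK_SIZE    # 20 + 228 = 248 bytes
    return [message[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
            for i in range(CHUNKS)]

if __name__ == "__main__":
    chunks = pack(b"\x01" * 32, b"\x00" * 228)    # 228-byte ec payload
    print(len(chunks), len(chunks[0]))
```

The same arithmetic gives the \mg layout: $256 - 20 = 236$ payload bytes in a single 2048-bit group element.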
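The expected $2n$ trials of Equation~\ref{eq:prepre} can be checked with a short simulation. This is an illustrative sketch that models the success probability as exactly $1/2$; it is not the actual point-generation code:

```python
import random

def attempts_until_valid(rng, p_valid=0.5):
    """Count candidate Y-values tried until one maps onto the curve,
    assuming each candidate is valid with probability ~1/2."""
    tries = 1
    while rng.random() >= p_valid:
        tries += 1
    return tries

def mean_attempts(n_points, seed=42):
    """Average number of candidates needed per generated point."""
    rng = random.Random(seed)
    total = sum(attempts_until_valid(rng) for _ in range(n_points))
    return total / n_points

if __name__ == "__main__":
    # Geometric distribution with p = 1/2: expectation is 2 tries per point,
    # i.e. 2n trials for n points, matching Equation (eq:prepre).
    print(f"average attempts per point: {mean_attempts(100_000):.3f}")
```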
This fragmentation impacts the performance quite heavily.

\subsection{Precomputation mixing} \label{sec:premix}
Another big time loss is attributable to the chosen wire format. Because we only send the $Y$ coordinate, we need to convert from projective to affine coordinates. This takes extra time that could be eliminated if we chose to. Unfortunately, doing so with \texttt{libgcrypt} would require sending all three projective coordinates, as \texttt{libgcrypt} offers no way to reconstruct the curve point from only the $Y$ and $Z$ coordinates.

\subsection{Precomputation postprocessing}
In this stage we only perform a `simple' cryptographic operation on each of the points to determine the node's decryption share. There are no extra allocations and no unneeded inversions, and \ec is $2.5$ times faster.

\subsection{Realtime precomputation}
Here we find a large time saver: \ec is almost $13$ times faster. Moreover, the saving occurs in the realtime portion of the protocol, which massively reduces the latency perceived by the clients. The cryptographic primitives in this step are $1$ inversion and $1$ multiplication or point addition. Although inversions are still slow compared to adding two points on Ed25519, an Ed25519 inversion is considerably faster than an inversion in the 2048-bit multiplicative group, as Section~\ref{sec:microbench} shows.

\subsection{Realtime mix and realtime postcomputation}
In the last two phases \ec is considerably slower than \mg. The unnecessary inversion in the wire protocol is the most likely cause: it is hard to say for certain, but this is what the operations performed in these phases, together with Section~\ref{sec:microbench}, suggest. That said, these phases take around $1$s per node, and the time saved during realtime precomputation still makes the trade-off worthwhile.
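The realtime cost profile discussed above (one inversion plus one combine per message) can be sketched in miniature. The following toy works in a small prime-field multiplicative group rather than the 2048-bit group or the Ed25519 curve used in the benchmarks, and the modulus and values are purely illustrative:

```python
# Toy illustration of the realtime unblinding step: one inversion and
# one modular multiplication per message. Not the benchmarked C code.
P = 2**127 - 1  # illustrative small prime modulus (a Mersenne prime)

def blind(message: int, r: int, p: int = P) -> int:
    """Blind a message with precomputed randomness r (one multiplication)."""
    return (message * r) % p

def unblind(blinded: int, r: int, p: int = P) -> int:
    """Remove the blinding: one inversion, then one multiplication."""
    r_inv = pow(r, -1, p)  # the inversion is the expensive operation
    return (blinded * r_inv) % p

if __name__ == "__main__":
    m, r = 123456789, 987654321
    assert unblind(blind(m, r), r) == m
    print("round trip ok")
```

In the \mg implementation both operations run in the 2048-bit group; in the \ec implementation the combine is a cheap point addition, which is why this phase is where Ed25519 wins most clearly.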