\section{Results} \label{sec:results} \newcommand{\ec}[0]{\emph{ec}\xspace} \newcommand{\mg}[0]{\emph{mg}\xspace} In this section, \ec and \mg refer to the two implementations that we compare; they stand for elliptic curve and multiplicative group, respectively. The raw results, which can be found in the \cmix{} repository (see Appendix \ref{app:code}), were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operate exactly as they would in a normal \cmix{} deployment: all connections, whether node to node or client to node, are TCP connections encrypted using TLS. Each of the 500 clients prepares a message of 248 bytes for \ec or 256 bytes for \mg and sends it to the first node. These sizes follow from using a 2048 bit group for \mg, or from using 31 bytes of an ed25519 group element and performing 8 mixes per run to reach 248 bytes for \ec. The timings in Tables \ref{tab:ec500} and \ref{tab:mg500} give, for each step of the protocol, the average over 100 runs together with the standard deviation of the average. The step names combine phase and step: for example, $prepre$ stands for the $precomputation$ step of the $precomputation$ phase, and $realpost$ stands for the $postcomputation$ step of the $realtime$ phase. Note that the 8 separate mixes of the ed25519 variant could trivially be parallelized, which could improve its benchmark results further; however, since we are interested in a direct comparison, this implementation does not parallelize multiple mixes. The implementation prefixes the message data with a destination id: the SHA1 hash of the receiver's public key, which takes up $20$ bytes of each message. The effective payload therefore becomes $236$ and $228$ bytes for \mg and \ec, respectively. Network latency is negligible because all participants run on the same computer, but measuring network latency is not the goal here.
Rather, we want to know whether there is a benefit in using elliptic curve as opposed to multiplicative group ElGamal. The reason for running three nodes is simple: there are subtle distinctions between what nodes do, depending on their position in the network. The first node needs to aggregate messages and initiate the mix once enough messages have been received. The last node needs to perform additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}; additionally, it needs to decrypt the final message and send it to its destination. The minimal test case therefore contains 3 nodes: one first, one middle and one last node. No large difference in time between these nodes is expected, with the exception of the ``RealPost'' step, as the last node needs to decrypt the ciphertext and prepare plaintext buffers to send out to the clients. This benchmark runs 500 clients per mix for two reasons. First, the largest test in the original \cmix{} paper \cite{cMix} was run with 500 clients, so this benchmark mimics that. Second, it is still feasible to run 500 clients on a single PC with 12 GB of RAM. We could increase the number of clients by about 100, but running 500 of them already gives timings large enough that the resolution of the CPU timers used is not a concern, and not running the extra 100 clients leaves some headroom for other applications still running in the background. For the timings, this benchmark used \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer}, which has a timer resolution of $10{,}000{,}000\,ns$ for both the user and system clocks on a Linux environment. This is why all results are accurate to one-hundredth of a second. The timings used are the so-called ``user'' timings, which exclude the time spent context switching and therefore give slightly more accurate results.
The system and wall times are also recorded, but they are filtered out of the results tables as they are not relevant here. Gathering results is done by the $statsd$ application mentioned in \autoref{sec:implementation}. This program receives timer snapshots over TCP: each node sends a snapshot just before it starts working on a phase of the \cmix{} algorithm, and, after the node has finished its computational work but before it sends the data to the next node, another snapshot is sent to $statsd$. This means the results measure purely the computation of each \cmix{} phase. Some overhead from serializing the timer snapshots is included by this measuring method, but its contribution to the total time is very small compared to the measured effect, and even compared to the timer resolution. Gathering the results over TCP with a separate daemon also makes it possible to run this same benchmark across separate servers, enabling additional test scenarios in which network congestion and packet loss can be controlled. \subsection{Summary of the results} The following results were gathered with the PC specifications listed in Appendix \ref{app-specs}. The optimization-specific compiler flags that were used are listed in Appendix \ref{app-ccopts}.
\begin{table}[!ht] \centering \begin{footnotesize} \begin{tabular}{|r|r|r|r|r|r|r|} \hline \input{results/ec_summary.tab} \end{tabular} \end{footnotesize} \caption{Node time average over 100 runs with standard deviation in seconds using ed25519 and running 500 clients.} \label{tab:ec500} \end{table} \begin{table}[!ht] \centering \begin{footnotesize} \begin{tabular}{|r|r|r|r|r|r|r|} \hline \input{results/mg_summary.tab} \end{tabular} \end{footnotesize} \caption{Node time average over 100 runs with standard deviation in seconds using a 2048 bit multiplicative group and running 500 clients.} \label{tab:mg500} \end{table} To show that the algorithms scale linearly in the number of clients per run, additional data has been gathered (see Tables \ref{tab:ec100} to \ref{tab:mg400} in Appendix \ref{app:addtables}). Graphs \ref{graph:prepre} to \ref{graph:realpost} show the average timings and standard deviations of all the results summarized in the tables. From these graphs we can see that most of the operations scale linearly, with two exceptions; these deviations can be explained by external factors such as the entropy pool running low, or by micro-effects such as inefficient cache usage. We can also clearly see that all nodes do the same amount of work except in the $realtime$ $postcomputation$ phase, where the last node performs the additional calculations described above; this is clearly visible in the graph.
\clearpage \tikzset{every picture/.style={line width=2pt}} \begin{figure}[!ht] \centering \begin{tikzpicture} \begin{axis}[ xmin=0, xmax=500, xtick=data, ymin=0, ymax=200, ymajorgrids=true, grid style=dashed, legend pos=north west, width=.98*\columnwidth, height=.63*\columnwidth, xlabel={Number of clients}, ylabel={Time in s} ] \input{results/prePre.graph} \legend{Node1 ec prepre,,,,Node2 ec prepre,,,,Node3 ec prepre,,,,Node1 mg prepre,,,,Node2 mg prepre,,,,Node3 mg prepre} \end{axis} \end{tikzpicture} \caption{A graph of the average times for the precomputation precomputation step.} \label{graph:prepre} \end{figure} \begin{figure}[!ht] \centering \begin{tikzpicture} \begin{axis}[ xmin=0, xmax=500, xtick=data, ymin=0, ymax=80, ymajorgrids=true, grid style=dashed, legend pos=north west, width=.98*\columnwidth, height=.63*\columnwidth, xlabel={Number of clients}, ylabel={Time in s} ] \input{results/preMix.graph} \legend{Node1 ec premix,,,,Node2 ec premix,,,,Node3 ec premix,,,,Node1 mg premix,,,,Node2 mg premix,,,,Node3 mg premix} \end{axis} \end{tikzpicture} \caption{A graph of the average times for the precomputation mix step.} \label{fig:premix} \end{figure} \begin{figure}[!ht] \centering \begin{tikzpicture} \begin{axis}[ xmin=0, xmax=500, xtick=data, ymin=0, ymax=22, ymajorgrids=true, grid style=dashed, legend pos=north west, width=.98*\columnwidth, height=.63*\columnwidth, xlabel={Number of clients}, ylabel={Time in s} ] \input{results/prePost.graph} \legend{Node1 ec prepost,,,,Node2 ec prepost,,,,Node3 ec prepost,,,,Node1 mg prepost,,,,Node2 mg prepost,,,,Node3 mg prepost} \end{axis} \end{tikzpicture} \caption{A graph of the average times for the precomputation postcomputation step.} \label{fig:prepost} \end{figure} \begin{figure}[!ht] \centering \begin{tikzpicture} \begin{axis}[ xmin=0, xmax=500, xtick=data, ymin=0, ymax=15, ymajorgrids=true, grid style=dashed, legend pos=north west, width=.98*\columnwidth, height=.64*\columnwidth, xlabel={Number of clients}, 
ylabel={Time in s} ] \input{results/realPre.graph} \legend{Node1 ec realpre,,,,Node2 ec realpre,,,,Node3 ec realpre,,,,Node1 mg realpre,,,,Node2 mg realpre,,,,Node3 mg realpre} \end{axis} \end{tikzpicture} \caption{A graph of the average times for the realtime precomputation step.} \label{fig:realpre} \end{figure} \begin{figure}[!ht] \centering \begin{tikzpicture} \begin{axis}[ scaled ticks=false, tick label style={/pgf/number format/fixed}, xmin=0, xmax=500, xtick=data, ymin=0, ymax=1.2, ymajorgrids=true, grid style=dashed, legend pos=north west, width=.98*\columnwidth, height=.64*\columnwidth, xlabel={Number of clients}, ylabel={Time in s} ] \input{results/realMix.graph} \legend{Node1 ec realmix,,,,Node2 ec realmix,,,,Node3 ec realmix,,,,Node1 mg realmix,,,,Node2 mg realmix,,,,Node3 mg realmix} \end{axis} \end{tikzpicture} \caption{A graph of the average times for the realtime mix step.} \label{fig:realmix} \end{figure} \begin{figure}[!ht] \centering \begin{tikzpicture} \begin{axis}[ scaled ticks=false, tick label style={/pgf/number format/fixed}, xmin=0, xmax=500, xtick=data, ymin=0, ymax=2.5, ymajorgrids=true, grid style=dashed, legend pos=north west, width=.98*\columnwidth, height=.63*\columnwidth, xlabel={Number of clients}, ylabel={Time in s} ] \input{results/realPost.graph} \legend{Node1 ec realpost,,,,Node2 ec realpost,,,,Node3 ec realpost,,,,Node1 mg realpost,,,,Node2 mg realpost,,,,Node3 mg realpost} \end{axis} \end{tikzpicture} \caption{A graph of the average times for the realtime postcomputation step.} \label{graph:realpost} \end{figure} \pagebreak