author    Dennis Brentjes <dennis@brentj.es>  2018-09-04 19:28:20 +0200
committer Dennis Brentjes <dennis@brentj.es>  2018-09-07 00:21:35 +0200
commit    4cf29b0c3beacafd8565ca5461381f53832688ed (patch)
tree      d70c95146ca8d72eb9017fce0e9c3e2e88308472 /content/results.tex
parent    1e316c9a7437580f499453cdafbb0c7433a46b88 (diff)
Applied Colins feedback. (HEAD, master)
Diffstat (limited to 'content/results.tex')
-rw-r--r--  content/results.tex  31
1 file changed, 18 insertions, 13 deletions
diff --git a/content/results.tex b/content/results.tex
index 9f81d2d..40f50de 100644
--- a/content/results.tex
+++ b/content/results.tex
@@ -4,29 +4,29 @@
\newcommand{\mg}[0]{\emph{mg}\xspace}
-In this section \ec and \mg will refer to the 2 different implementations that we compare. They stand for elliptic curve and multiplicative group.
+In this section \ec and \mg will refer to the two different implementations that we compare. They stand for elliptic curve and multiplicative group, respectively.
-So the raw results, which can be found in the cmix repository \ref{app:code}, were obtained by running 3 nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see a \cmix setup. All connections, either from node to node or client to node, are TCP connections encrypted using TLS. Each one of these 500 clients prepare a message of 248 bytes for \ec or a message of 256 bytes for \mg and send it to the first node of the network. This is achieved by either using a 2048 bit group for \mg or by using 31 bytes of the ed25519 group element and doing 8 mixes per run to get up to 248 bytes. The timings in the table correspond with the average of 100 runs and the standard deviation of that average for each step in the protocol. For example $prepre$ stands for $precomputation precomputation$ phase and $realpost$ stands for $realtime postcomputation$ phase.
+The raw results, which can be found in the \cmix{} repository (see \autoref{app:code}), were obtained by running three nodes and 500 clients on the same computer. The clients and nodes operated in the way you would normally see in a \cmix{} setup. All connections, whether from node to node or from client to node, are TCP connections encrypted using TLS. Each of these 500 clients prepares a message of 248 bytes for \ec or 256 bytes for \mg and sends it to the first node. This is achieved either by using a 2048-bit group for \mg or by using 31 bytes of an Ed25519 group element and doing 8 mixes per run to get up to 248 bytes. The timings in Tables \ref{tab:ec500} and \ref{tab:mg500} correspond to the average over 100 runs and the standard deviation of that average for each step in the protocol. For example, $prepre$ stands for the $precomputation$ $precomputation$ phase and $realpost$ for the $realtime$ $postcomputation$ phase.
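The two message layouts above can be sanity-checked with a little arithmetic. This is an illustrative sketch, not code from the thesis implementation, and the constant names are invented:

```python
# Illustrative arithmetic for the two message layouts described above.
# Constant names are invented for this sketch; the thesis code is C++.

USABLE_BYTES_PER_MIX = 31   # usable bytes of one Ed25519 group element
MIXES_PER_RUN = 8           # number of mixes chained per run
MG_GROUP_BITS = 2048        # multiplicative-group modulus size

ec_message = USABLE_BYTES_PER_MIX * MIXES_PER_RUN   # 8 * 31 = 248 bytes
mg_message = MG_GROUP_BITS // 8                     # 2048 / 8 = 256 bytes

print(ec_message, mg_message)  # 248 256
```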
-Note that these separate runs of the ed25519 protocol can trivially be parallelized, possibly making it even more interesting by comparison. But as we are interested in a straight up comparison this implementation does not parallelize multiple mixes.
+Note that these separate runs of the Ed25519 protocol can trivially be parallelized, possibly making the benchmark results even more interesting by comparison. But as we are interested in a direct comparison, this implementation does not parallelize multiple mixes.
-This implementation uses a prefix to the message data that is the destination id. This takes up $20$ bytes of each message, as it is the SHA1 hash of the public key of the receiver. So the payload would become $236$ and $228$ bytes for \mg and \ec respectively.
+This implementation prefixes the message data with the destination ID. This takes up $20$ bytes of each message, as it is the SHA-1 hash of the receiver's public key. The payload therefore becomes $236$ and $228$ bytes for \mg and \ec, respectively.
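The $20$-byte overhead follows directly from the SHA-1 digest size. A minimal sketch, with Python's `hashlib` standing in for whatever hash implementation the thesis code uses:

```python
import hashlib

# SHA-1 digests are always 20 bytes, regardless of the input size.
DEST_ID_BYTES = hashlib.sha1(b"example receiver public key").digest_size

# Total message sizes from the benchmark setup described above.
payload_mg = 256 - DEST_ID_BYTES   # 236 bytes left for mg
payload_ec = 248 - DEST_ID_BYTES   # 228 bytes left for ec

print(DEST_ID_BYTES, payload_mg, payload_ec)  # 20 236 228
```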
Network latency is negligible because all the participants are running on the same computer, but measuring network latency is not the goal anyway. Rather, we want to know whether there is a benefit to using elliptic curve as opposed to multiplicative group ElGamal.
-The reason behind running 3 nodes is simple. There are subtle distinctions between what nodes do, depending on the position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in section \ref{sec:tagging}. Additionally the last node needs to decrypt the final message en send it to its destination. So the minimal test case should contain 3 nodes. 1 first, 1 middle and 1 last node. I don't expect to see much difference between these nodes with the exception of the ``RealPost'' step as the last node needs to decrypt the ciphertext, and prepare plaintext buffers to send out to the clients.
+The reason behind running three nodes is simple. There are subtle distinctions between what nodes do, depending on their position in the network. The first node needs to aggregate messages and initiate the mix when enough messages have been received. The last node needs to do additional calculations to prevent the tagging attack mentioned in Section \ref{sec:tagging}. Additionally, the last node needs to decrypt the final message and send it to its destination. So the minimal test case should contain three nodes: one first, one middle, and one last node. No large difference in time between these nodes is expected, with the exception of the ``RealPost'' step, as the last node needs to decrypt the ciphertext and prepare plaintext buffers to send out to the clients.
-In this benchmark we are running 500 clients per mix because of 2 reasons. In the original \cmix paper \cite{cMix} The largest test was with 500 clients. So I wanted to mimic that. The second reason is that it still feasible to do 500 clients using a single pc with 12GB or RAM. We could still increase the number of clients by about 100 but running 500 of them gives us large enough timings that we don't need to worry about the timer resolution of the used CPU timers. And not running the extra 100 clients just gives us some headroom when other applications needs some extra ram.
+In this benchmark we run 500 clients per mix for two reasons. First, the largest test in the original \cmix{} paper \cite{cMix} was run with 500 clients, so this benchmark mimics that. Second, it is still feasible to run 500 clients on a single PC with 12\,GB of RAM. We could increase the number of clients by about 100, but running 500 of them already gives timings large enough that we do not need to worry about the resolution of the CPU timers used. Not running the extra 100 clients also leaves some headroom for other applications still running in the background.
-For the timings I used the \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer} which has a timer resolution of $10000000ns$ for both user and system clocks on a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so called ``User'' timings. This eliminates the time spend context switching, which gives us slightly more accurate results. The system and wall times are also recorded, but just filtered out in the results table as they are not relevant.
+For the timings this benchmark used \emph{boost::timer::cpu\_timer}\cite{BoostCpuTimer}, which has a timer resolution of $10{,}000{,}000$\,ns for both the user and system clocks in a Linux environment. This is why all the results are accurate to one-hundredth of a second. The timings used are the so-called ``user'' timings. This eliminates the time spent context switching, which gives slightly more accurate results. The system and wall times are also recorded, but filtered out of the results table as they are not relevant.
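The claimed accuracy follows directly from the stated resolution: $10{,}000{,}000$\,ns is exactly $0.01$\,s. A trivial unit-conversion check:

```python
RESOLUTION_NS = 10_000_000                     # boost cpu_timer resolution
resolution_s = RESOLUTION_NS / 1_000_000_000   # nanoseconds to seconds

print(resolution_s)  # 0.01, i.e. one-hundredth of a second
```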
-So for gathering results I created a program called statsd, it is included in the repository. The program receives timer snapshots over TCP. So each node sends a snapshot just before they start working on a phase of the \cmix algorithm. After we are done with computational work but before sending the data to the next node another snapshot of the clock state is send to the statsd. So the results are purely about the computation of that \cmix phase. With some additional conversions to the wire format of the timer snapshots, but not the overhead of sending the message over the socket. This is done just after the \cmix operation complete courtesy of the implicit ``strand'' of the boost::asio asynchronous socket operations.
+Gathering results is done by the $statsd$ application mentioned in \autoref{sec:implementation}. The program receives timer snapshots over TCP: each node sends a snapshot just before it starts working on a phase of the \cmix{} algorithm, and another snapshot after it has finished the computational work but before it sends the data to the next node. This means the results are purely about the computation of that \cmix{} phase. Some overhead for serializing the timer snapshots is recorded by this measuring method, but it is a very small contribution compared to the measured effect, and even compared to the timer resolution.
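The snapshot scheme can be sketched as follows. This is a hypothetical Python illustration: the function names, the newline-delimited JSON wire format, and the use of `os.times` are invented stand-ins for the actual C++ $statsd$ protocol.

```python
import json
import os
import socket

def send_snapshot(sock, node_id, phase, marker):
    # os.times()[0] is the "user" CPU time, matching the timing the
    # benchmark keeps; system and wall times are dropped here for brevity.
    snap = {"node": node_id, "phase": phase, "marker": marker,
            "user_s": os.times()[0]}
    # Serialization happens outside the measured span, so its (small)
    # cost never pollutes the computation being timed.
    sock.sendall((json.dumps(snap) + "\n").encode())

def timed_phase(sock, node_id, phase, work):
    send_snapshot(sock, node_id, phase, "start")  # just before the phase
    result = work()                               # the cMix phase itself
    send_snapshot(sock, node_id, phase, "end")    # before forwarding data
    return result
```

With a `socketpair` standing in for the TCP link to the daemon, `timed_phase(sock, "node-1", "prepre", some_work)` emits two snapshots bracketing the computation.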
-Gathering the results over TCP with a separate daemon enables people to run this same benchmark over separate servers. Enabling some nice test vectors as you can control network congestion and packet loss.
+Gathering the results over TCP with a separate daemon enables people to run this same benchmark across separate servers, enabling additional test scenarios in which network congestion and packet loss can be controlled.
\subsection{Summary of the results}
-The following results were gathered with the pc specs as listed in appendix \ref{app-specs}. The optimization specific flags that were used are listed in appendix \ref{app-ccopts}.
+The following results were gathered with the PC specifications listed in \autoref{app-specs}. The optimization-specific compiler flags that were used are listed in \autoref{app-ccopts}.
\begin{table}[!ht]
\centering
@@ -38,6 +38,7 @@ The following results were gathered with the pc specs as listed in appendix \ref
\end{footnotesize}
\caption{Node time average over 100 runs with standard deviation in seconds using Ed25519 and running 500 clients.}
+ \label{tab:ec500}
\end{table}
\begin{table}[!ht]
@@ -49,9 +50,13 @@ The following results were gathered with the pc specs as listed in appendix \ref
\end{tabular}
\end{footnotesize}
\caption{Node time average over 100 runs with standard deviation in seconds using a 2048-bit multiplicative group and running 500 clients.}
-
+ \label{tab:mg500}
\end{table}
+To show that the algorithms scale linearly in the number of clients used in a run, additional data has been gathered (see Tables \ref{tab:ec100} to \ref{tab:mg400} in \autoref{app:addtables}). Figures \ref{graph:prepre} to \ref{graph:realpost} show the average timings and standard deviations of all the results summarized in the tables. From these graphs we can see that most of the operations are linear, except for two. These deviations can easily be explained by external factors such as entropy running low, or micro-effects such as inefficient cache usage.
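The linearity claim can be checked mechanically with an ordinary least-squares fit. The sketch below uses invented placeholder timings, NOT the measured results from the tables:

```python
def linear_fit(xs, ys):
    # Ordinary least-squares fit of ys = slope * xs + intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Placeholder data, invented for illustration only (not measured results):
clients = [100, 200, 300, 400, 500]
times_s = [0.50, 1.00, 1.50, 2.00, 2.50]

slope, intercept = linear_fit(clients, times_s)
print(slope)  # per-client cost; a near-constant slope => linear scaling
```

Fitting each phase's timings this way and checking that the residuals stay small would make the "mostly linear" observation quantitative.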
+
+We can also clearly see that all nodes do the same amount of work except in the $realtime$ $postcomputation$ phase. In this part of the protocol the last node performs additional calculations that the other nodes do not, and this is clearly visible in the graph.
+
\clearpage
\tikzset{every picture/.style={line width=2pt}}
@@ -77,7 +82,7 @@ The following results were gathered with the pc specs as listed in appendix \ref
\end{axis}
\end{tikzpicture}
\caption{A graph of the average times for the precomputation precomputation step.}
- \label{fig:prepre}
+ \label{graph:prepre}
\end{figure}
\begin{figure}[!ht]
@@ -204,7 +209,7 @@ The following results were gathered with the pc specs as listed in appendix \ref
\end{tikzpicture}
\caption{A graph of the average times for the realtime postcomputation step.}
- \label{fig:realpost}
+ \label{graph:realpost}
\end{figure}
\pagebreak