Diffstat (limited to 'Projects/leakindexer.markdown')
-rw-r--r-- Projects/leakindexer.markdown 45
1 files changed, 45 insertions, 0 deletions
diff --git a/Projects/leakindexer.markdown b/Projects/leakindexer.markdown
new file mode 100644
index 0000000..4f65cb7
--- /dev/null
+++ b/Projects/leakindexer.markdown
@@ -0,0 +1,45 @@
+
+This was the first project I ever did.
+It was during my first year in computing science.
+I had no prior knowledge of programming besides an introductory course in C++.
+But when the opportunity arose during a security lecture I just had to grab it.
+The security teacher was friends with an investigative journalist, Huub Jaspers.
+He wanted an easy way to search through the WikiLeaks documents that had just been leaked at the time.
+So he asked our security professor if he knew of any students who might be interested.
+The following lecture the professor asked the whole lecture room who was interested, and five people raised their hands:
+Erik Boss, [Sjors Gielen], Rik Harink, Nick Overdijk and Dennis Brentjes (me).
+
+The project was pretty time intensive, and I had to learn a lot and be quick on my feet, as I was the least knowledgeable member of the group at the time.
+But in the long run this project was a fun and wonderful experience.
+The cooperation with the investigative journalists was refreshing.
+In a way they are power users of search engines, but they don't necessarily know how to express their power-user needs.
+This became obvious when we started testing the first versions of the software with a small group of researchers.
+Some of them compared it to another search engine called LexisNexis and highlighted missing features.
+We then implemented some of these features.
+
+The project culminated in the [VVOJ][] [Legebeke Legaat 2011][], where Huub Jaspers presented the product to a large group of Dutch investigative journalists.
+We also hosted a small workshop on site, which unfortunately was scheduled alongside other interesting talks and therefore didn't attract many people.
+But the research tool did come up during a panel discussion with some prominent editors of the Dutch press.
+The discussion focused on how to disclose the information contained in the WikiLeaks documents now that this search engine existed.
+The documents were unredacted and could pose serious threats to the people named in them.
+Huub Jaspers explained that only journalists who approached him, VPRO, or Argos would get access.
+Huub Jaspers had decided this at the beginning of our project.
+Although other public search engines did exist, it was a matter of principle not to disclose possibly dangerous information.
+The added capabilities to search for dates and geo-coordinates also made him decide not to make it publicly available.
+
+Looking back at this project, though, we could have done things differently.
+All things considered, we used standard search engine techniques like inverted indexes.
+We were able to do full-text search, search for dates and date ranges, and even tried our hand at geo-coordinates.
+The search engine was tailored to the needs of these particular researchers.
+The problem, though, was that we had no idea how to process these relatively large datasets.
+We kept everything in memory, which was barely possible.
+So the system stopped scaling after the Iraq and Afghanistan war logs were added.
+
+Nowadays we should be able to solve these problems, or even use and extend a standard search engine system like Xapian.
+That is something we didn't find when we looked for a standard solution at the start of this project.
+
+The logo was created by Erik Boss.
+
+[Sjors Gielen]: http://sjorsgielen.nl/
+[VVOJ]: http://www.vvoj.nl/
+[Legebeke Legaat 2011]: http://www.vvoj.nl/2011/10/24/programma-legebeke-legaat-2011/