| field | value | date |
|---|---|---|
| author | Dennis Brentjes <d.brentjes@gmail.com> | 2014-10-26 14:53:49 +0100 |
| committer | Dennis Brentjes <d.brentjes@gmail.com> | 2014-10-26 14:53:49 +0100 |
| commit | 5d31563d239d40824b5f312f4fa48aa267964bb6 | |
| tree | 0a6d2c602d3c0b646dc2ecbe88e57b057cc9df7e /Projects/leakindexer.markdown | |
| parent | 2f9f2c85e9974a7d0284e282fe4b5cdbe0e36dae | |
Added a start for a project page/portfolio.
Diffstat (limited to 'Projects/leakindexer.markdown')
| -rw-r--r-- | Projects/leakindexer.markdown | 45 |
1 files changed, 45 insertions, 0 deletions
diff --git a/Projects/leakindexer.markdown b/Projects/leakindexer.markdown
new file mode 100644
index 0000000..4f65cb7
--- /dev/null
+++ b/Projects/leakindexer.markdown
@@ -0,0 +1,45 @@
+
+This was the first project I ever did.
+It took place during my first year of computing science.
+I had no prior knowledge of programming besides an introductory course in C++.
+But when the opportunity arose during a security lecture, I just had to grab it.
+The security teacher was friends with an investigative journalist, Huub Jaspers.
+He wanted an easy way to search through the WikiLeaks documents that had just been leaked at the time.
+So he asked our security professor whether he knew any students who might be interested.
+In the following lecture the professor asked the whole room who was interested, and five people raised their hands:
+Erik Boss, [Sjors Gielen], Rik Harink, Nick Overdijk and Dennis Brentjes (me).
+
+The project was pretty time-intensive, and I had to learn a lot and be quick on my feet, as I was the least knowledgeable member of the group at the time.
+But in the long run this project was a fun and wonderful experience.
+The cooperation with the investigative journalists was refreshing.
+In a way they are power users of search engines, but they don't necessarily know how to express their power-user needs.
+This became obvious when we started testing the first versions of the software with a small group of researchers.
+Some of them compared it to another search engine, LexisNexis, and pointed out missing features.
+We then implemented some of these features.
+
+The project culminated in the [VVOJ][] [Legebeke Legaat 2011][], where Huub Jaspers presented the product to a large group of Dutch investigative journalists.
+We also hosted a small workshop on site, which unfortunately was scheduled alongside other interesting talks and therefore didn't attract many people.
+But the research tool did come up during a discussion panel with some prominent editors of the Dutch press.
+The discussion focused on how to disclose the information contained in the WikiLeaks documents now that this search engine existed.
+The documents were unredacted and could pose serious threats to the people named in them.
+Huub Jaspers explained that only journalists who approached him, VPRO or Argos would get access.
+He had decided this at the beginning of our project.
+Although other public search engines did exist, it was a matter of principle not to disclose possibly dangerous information.
+The added capabilities to search for dates and geo-coordinates also made him decide not to make it publicly available.
+
+But looking back at this project, we could have done things differently.
+All things considered, we used standard search engine techniques like inverted indexes.
+We were able to do full-text search, search for dates and date ranges, and even tried our hand at geo-coordinates.
+The search engine was tailored to the needs of these particular researchers.
+The problem, though, was that we had no idea how to process these relatively large datasets.
+We kept everything in memory, which was barely possible.
+So the system stopped scaling after the Iraq and Afghanistan war logs were added.
+
+Nowadays we should be able to solve these problems, or even use and extend a standard search engine system like Xapian.
+That is something we didn't find when we looked for a standard solution at the start of this project.
+
+The logo was created by Erik Boss.
+
+[Sjors Gielen]: http://sjorsgielen.nl/
+[VVOJ]: http://www.vvoj.nl/
+[Legebeke Legaat 2011]: http://www.vvoj.nl/2011/10/24/programma-legebeke-legaat-2011/
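
The inverted ("reversed") index with full-text search mentioned in the project page above can be illustrated with a minimal sketch. This is not the project's code: the page does not say how the index was built or in which language, so the `InvertedIndex` class, the `tokenize` helper and the sample documents are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of ids of the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Register every term of the text under the given document id."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Use .get so that looking up an unknown term does not create an entry.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical sample documents, invented for illustration.
index = InvertedIndex()
index.add(1, "Embassy cable about a trade agreement")
index.add(2, "War log entry: convoy reported near the border")
index.add(3, "Cable on border security and trade")

print(sorted(index.search("cable trade")))
print(sorted(index.search("border")))
```

Because every posting set lives in memory, a structure like this grows with the corpus, which is consistent with the scaling limit the page describes once the larger datasets were added.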
