| author | Dennis Brentjes <d.brentjes@gmail.com> | 2014-10-27 18:16:50 +0100 |
|---|---|---|
| committer | Dennis Brentjes <d.brentjes@gmail.com> | 2014-10-27 18:16:50 +0100 |
| commit | 8d4d4ca442696e10254468049270dfe3fa477585 (patch) | |
| tree | db48d93b79cbc2d4c018c6dfdaf71472554877f5 /Projects | |
| parent | 863ec71a7603423436236a883cb1fe8e484f4674 (diff) | |
Improved the project: wikileaks leak indexer.
Diffstat (limited to 'Projects')
| -rw-r--r-- | Projects/leakindexer.markdown | 15 |
1 files changed, 6 insertions, 9 deletions
diff --git a/Projects/leakindexer.markdown b/Projects/leakindexer.markdown
index da2b2fc..090c84b 100644
--- a/Projects/leakindexer.markdown
+++ b/Projects/leakindexer.markdown
@@ -27,16 +27,13 @@ This was decided by Huub Jasper in the beginning of our project.
 Although other public search engines did exists it was a matter of principle to not disclose possibly dangerous information.
 Also the added capabilities to search for dates and geo-coordinates made him decide to make it publicly available.
-But looking back at this project we could have done things differently.
-All things considered we used standard search engine techniques like reversed indexes.
-We were able to do full text search, search for dates and date ranges and even tried our hand on geo-coordinates.
-The search engine tailored to the needs of these particular researchers.
-The problem though is that we had no idea how to process these relatively large datasets.
-We kept everything in memory which was barely possible.
-So the system stopped scaling after the Iraqi and Afghanistan war-logs were added.
+Looking back at this project we could have done things differently.
+Although we did use standard search engine techniques like reversed indexes and smart merging of result vectors.
+We could have implemented the search engine in a standard search engine package like Xapian.
+But as we didn't find Xapian when we started this project we implemented everything on our own and this was a good learning experience for all of us.
+We were able to do full text search, search for dates and date ranges and even tried our hand on geo-coordinates. But the most important thing; the search engine is tailored to the needs of the research journalists.
-Nowadays we should be able to solve these problems or even use and extend a standard search engine system like Xapian.
-Something we didn't find when looking for standard solution when we begun with this project.
+The downside of our 'roll your own' searchengine was it's scalability after indexing the afghan and iraqi warlogs and all the released cables we used up all of our 16GB ram. We little to no idea how to reduce the ram usage without dramatically impacting the performance. Nowadays we have some ideas how to this mostly due to experience in software constructure/architecture that we now have, for which this project was a great kickstarter.
 The logo was created by Erik Boss
