Some MVT modules such as the WhatsApp module can be very slow as it was taking a naive approach to look for IOCs. The code was checking URLs (potentially more than 100k) against
1000's of IOC domains resulting in a quadratic run-time with hundreds of millions of comparisons as the number of IOCs increases.
This commit add an Aho-Corasick library which allows the efficient search in a string (the URL in this case) for all matches in set of keys (the IOCs). This data structure is perfect for this use case.
A quick measurement shows a 80% performance improvement for a WhatsApp database with 100k entries. The slow path is now the time spent fetching and expanding short URLs found in the database. This
can also be sped up significantly by fetching each URL asynchronously. This would require reworking modules to split the URL expansion from the IOC check so I will implement in a separate PR.
Not every system uses 'utf-8' as a default encoding for opening files in Python.
Before you say that there must be a way to set default encoding in one line, no, there is not. At least, I didn't found a way to do this.
Squashed commit of the following:
commit c0d9e8d5d188c13e7e5ec0612e99bfb7e25f47d4
Author: Donncha Ó Cearbhaill <donncha.ocearbhaill@amnesty.org>
Date: Fri Jan 7 16:05:12 2022 +0100
Update name of indicators JSON file
commit f719e49c5f942cef64931ecf422b6a6e7b8c9f17
Author: Donncha Ó Cearbhaill <donncha.ocearbhaill@amnesty.org>
Date: Fri Jan 7 15:38:03 2022 +0100
Do not set indicators option on module if no indicators were loaded
commit a289eb8de936f7d74c6c787cbb8daf5c5bec015c
Author: Donncha Ó Cearbhaill <donncha.ocearbhaill@amnesty.org>
Date: Fri Jan 7 14:43:00 2022 +0100
Simplify code for loading IoCs
commit 0804563415ee80d76c13d3b38ffe639fa14caa14
Author: Donncha Ó Cearbhaill <donncha.ocearbhaill@amnesty.org>
Date: Fri Jan 7 13:43:47 2022 +0100
Add metadata to IoC entries
commit 97d0e893c1a0736c4931363ff40f09a030b90cf6
Author: tek <tek@randhome.io>
Date: Fri Dec 17 16:43:09 2021 +0100
Implements automated loading of indicators
commit c381e14df92ae4d7d846a1c97bcf6639cc526082
Author: tek <tek@randhome.io>
Date: Fri Dec 17 12:41:15 2021 +0100
Improves download-indicators
commit b938e02ddfd0b916fd883f510b467491a4a84e5f
Author: tek <tek@randhome.io>
Date: Fri Dec 17 01:44:26 2021 +0100
Adds download-indicators for mvt-ios and mvt-android