mvt/mvt
Donncha Ó Cearbhaill 41db117168 Improve performance when checking URLs and domains
Some MVT modules such as the WhatsApp module can be very slow as it was taking a naive approach to look for IOCs. The code was checking URLs (potentially more than 100k) against
1000's of IOC domains resulting in a quadratic run-time with hundreds of millions of comparisons as the number of IOCs increases.

This commit add an Aho-Corasick library which allows the efficient search in a string (the URL in this case) for all matches in set of keys (the IOCs). This data structure is perfect for this use case.

A quick measurement shows a 80% performance improvement for a WhatsApp database with 100k entries. The slow path is now the time spent fetching and expanding short URLs found in the database. This
can also be sped up significantly by fetching each URL asynchronously. This would require reworking modules to split the URL expansion from the IOC check so I will implement in a separate PR.
2023-06-29 14:14:44 +02:00
..
android Linted code using isort + autoflake + black, fixed wrong use of Optional[bool] 2023-06-01 23:40:26 +02:00
common Improve performance when checking URLs and domains 2023-06-29 14:14:44 +02:00
ios Add new iOS versions and build numbers 2023-06-22 00:17:52 +00:00
__init__.py Update copyright information 2023-02-08 20:18:16 +01:00