Senior Data Science @
SentinelOne
The Customer: SentinelOne (S1) is an Israeli unicorn whose antivirus product protects large corporations against ransomware (tens of thousands of customers, including Fortune 500 companies, and millions of end users).
The group: Big Data Infra is responsible for making efficient use of AWS to analyze the vast data collected from millions of end-user computers (process creation, network events and about 50 other event types).
Background: SentinelOne had already built (1) effective rule-based malware detection at scale and (2) sophisticated ML to identify advanced persistent threats (APTs) carried out by professional cyber gangs. The challenge: pushing this work to production. To address it, Rafy Bryl, a senior director who worked with me directly, devised a collaborative vision built on open-source tools and AWS tools.
The project: Build simple anomaly detection algorithms at scale (EMR Spark, Jupyter, PySpark, Scala, Presto). Steps: (1) read event data files (Parquet, ORC); (2) perform feature engineering and feature enrichment; (3) apply statistical tests as well as counting, entropy and seasonality algorithms (FFT, STL). Surprisingly, this relatively simple analysis surfaced very interesting findings at multiple customers (active viruses). These findings proved the value of the system and motivated the team to push it to production. The system is now in beta.
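To illustrate step (3), here is a minimal sketch of an entropy score and a seasonality check. It is not the production code: plain Python stands in for PySpark, a naive discrete Fourier transform stands in for a library FFT, and the function names `shannon_entropy` and `dominant_period` are hypothetical.

```python
import cmath
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    Low entropy means a few values dominate; a sudden change in entropy
    for an endpoint can be flagged as an anomaly candidate.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dominant_period(series):
    """Period (in samples) of the strongest non-zero frequency, or None.

    Naive O(n^2) DFT for illustration only; at scale an FFT library
    would be used instead (assumption, not taken from the project).
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (DC) component
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None
```

For example, hourly event counts with a daily cycle should yield a dominant period near 24, and an endpoint whose events suddenly concentrate on one process name would show a drop in entropy.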