This is the Definitive Security Data Science and Machine Learning Guide. It includes books, tutorials, presentations, blog posts, and research papers about solving security problems using data science.
Table of Contents
- Machine Learning and Security Papers
- Deep Learning and Security Papers
- Deep Learning and Security Presentations
- Security Data Science Blogs
- Security Data Science Blogposts / Tutorials
- Security Data Science Projects
- Security Data
- Security Data Science Books
- Security Data Science Presentations / Talks
- Misc
Machine Learning and Security Papers
Intrusion Detection Papers
- A Close Look on n-Grams in Intrusion Detection - Anomaly Detection vs. Classification
- A Framework for the Application of Association Rule Mining in Large Intrusion Detection Infrastructures
- A Kill Chain Analysis of the 2013 Target Data Breach
- A Lone Wolf No More - Supporting Network Intrusion Detection with Real-Time Intelligence
- A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks
- Acquiring Digital Evidence from Botnet Attacks: Procedures and Methods (PhD Thesis)
- ALERT-ID - Analyze Logs of the network Element in Real Time for Intrusion Detection
- Anagram - A Content Anomaly Detector Resistant to Mimicry Attack
- Anomaly-based Intrusion Detection in Software as a Service
- Application of the PageRank Algorithm to Alarm Graphs
- Back to Basics - Beyond Network Hygiene
- Beehive - Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks
- Behavioral Clustering of HTTP-based Malware and Signature Generation Using Malicious Network Traces
- Beheading Hydras - Performing Effective Botnet Takedowns
- Bloodhound - Searching Out Malicious Input in Network Flows for Automatic Repair Validation
- Boosting the Scalability of Botnet Detection Using Adaptive Traffic Sampling
- CAMP - Content Agnostic Malware Protection
- Casting out demons - Sanitizing training data for anomaly sensors
- CloudFence - Data Flow Tracking as a Cloud Service
- Comparing anomaly detection techniques for HTTP
- Cujo - Efficient detection and prevention of drive-by-download attacks
- Decoy Document Deployment for Effective Masquerade Attack Detection
- Detecting Spammers with SNARE - Spatio-temporal Network-level Automatic Reputation Engine
- Detecting Unknown Network Attacks Using Language Models
- Early Detection of Malicious Flux Networks via Large-Scale Passive DNS Traffic Analysis
- Effective Anomaly Detection with Scarce Training Data
- Efficient Multidimensional Aggregation for Large Scale Monitoring
- EFFORT - Efficient and Effective Bot Malware Detection
- ExecScent - Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates - slides
- ExecScent - Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates
- EXPOSURE - Finding Malicious Domains Using Passive DNS Analysis
- FiG - Automatic Fingerprint Generation
- Filtering Spam with Behavioral Blacklisting
- Finding The Needle - Suppression of False Alarms in Large Intrusion Detection Data Sets
- FLIPS - Hybrid Adaptive Intrusion Prevention
- Heuristics for Improved Enterprise Intrusion Detection by Jim Treinen
- HMMPayl - An Intrusion Detection System Based on Hidden Markov Models
- Kopis - Detecting malware domains at the upper dns hierarchy
- Large-Scale Malware Analysis, Detection, and Signature Generation
- Leveraging Honest Users - Stealth Command-and-Control of Botnets - slides
- Leveraging Honest Users - Stealth Command-and-Control of Botnets
- Local System Security via SSHD Instrumentation
- Machine Learning In Adversarial Environments
- Malware vs. Big Data (Umbrella Labs)
- McPAD - A Multiple Classifier System for Accurate Payload-based Anomaly Detection
- Measuring and Detecting Malware Downloads in Live Network Traffic
- Mining Botnet Sink Holes - slides
- MISHIMA - Multilateration of Internet hosts hidden using malicious fast-flux agents
- Monitoring the Initial DNS Behavior of Malicious Domains
- N-Gram against the Machine - On the Feasibility of the N-Gram Network Analysis for Binary Protocols
- Nazca - Detecting Malware Distribution in Large-Scale Networks
- Netgator - Malware Detection Using Program Interactive Challenges - slides
- Network Traffic Characterization Using (p, n)-grams Packet Representation
- Notos - Building a Dynamic Reputation System for DNS
- On the Feasibility of Online Malware Detection with Performance Counters
- On the Infeasibility of Modeling Polymorphic Shellcode
- On the Mismanagement and Maliciousness of Networks
- Outside the Closed World - On Using Machine Learning For Network Intrusion Detection
- PAYL - Anomalous Payload-based Network Intrusion Detection
- PAYL2 - Anomalous Payload-based Worm Detection and Signature Generation
- Pleiades - From Throw-away Traffic To Bots - Detecting The Rise Of DGA-based Malware
- Polonium - Tera-Scale Graph Mining for Malware Detection
- Practical Comprehensive Bounds on Surreptitious Communication Over DNS - slides
- Practical Comprehensive Bounds on Surreptitious Communication Over DNS
- Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection
- Revealing Botnet Membership Using DNSBL Counter-Intelligence
- Revolver - An Automated Approach to the Detection of Evasive Web-based Malware
- Self-organized Collaboration of Distributed IDS Sensors
- SinkMiner - Mining Botnet Sinkholes for Fun and Profit
- Spamming Botnets - Signatures and Characteristics
- Spectrogram - A Mixture of Markov Chain models for Anomaly Detection in Web Traffic
- The Security of Machine Learning
- Toward Stealthy Malware Detection
- Traffic Aggregation for Malware Detection
- Understanding the Domain Registration Behavior of Spammers
- Understanding the Network-Level Behavior of Spammers
- VAST - Network Visibility Across Space and Time
Malware Papers
- A static, packer-agnostic filter to detect similar malware samples
- A study of malcode-bearing documents
- A survey on automated dynamic malware-analysis techniques and tools
- APT1 Technical backstage (malware.lu hack backs of APT1 servers)
- Automatic Analysis of Malware Behavior using Machine Learning
- BitShred - Fast, Scalable Code Reuse Detection in Binary Code
- BitShred - Fast, Scalable Malware Triage
- Deobfuscating Embedded Malware using Probable-Plaintext Attacks
- Escape from Monkey Island - Evading High-Interaction Honeyclients
- Eureka - A framework for enabling static malware analysis
- Extraction of Statistically Significant Malware Behaviors
- Fast Automated Unpacking and Classification of Malware
- FIRMA - Malware Clustering and Network Signature Generation with Mixed Network Behaviors
- FuncTracker - Discovering Shared Code (to aid malware forensics) - slides
- FuncTracker - Discovering Shared Code to Aid Malware Forensics Extended Abstract
- Malware files clustering based on file geometry and visualization using R language
- Mobile Malware Detection Based on Energy Fingerprints — A Dead End
- Polonium - Tera-Scale Graph Mining for Malware Detection
- Putting out a HIT - Crowdsourcing Malware Installs
- Scalable Fine-grained Behavioral Clustering of HTTP-based Malware
- Selecting Features to Classify Malware by Karthik Raman
- SigMal - A Static Signal Processing Based Malware Triage
- Tracking Memory Writes for Malware Classification and Code Reuse Identification
- Using File Relationships in Malware Classification
- VAMO - Towards a Fully Automated Malware Clustering Validity Analysis
Data Collection Papers
- Crawling BitTorrent DHTs for Fun and Profit
- CyberProbe - Towards Internet-Scale Active Detection of Malicious Servers
- Demystifying service discovery - Implementing an internet-wide scanner
- gitDigger - Creating useful wordlists from GitHub
- PoisonAmplifier - A Guided Approach of Discovering Compromised Websites through Reversing Search Poisoning Attacks
- ZMap - Fast Internet-Wide Scanning and its Security Applications (slides)
- ZMap - Fast Internet-Wide Scanning and its Security Applications
Vulnerability Analysis/Reversing Papers
- A Preliminary Analysis of Vulnerability Scores for Attacks in Wild
- Attacker Economics for Internet-scale Vulnerability Risk Assessment
- Detecting Logic Vulnerabilities in E-Commerce Applications
- ReDeBug - Finding Unpatched Code Clones in Entire OS Distributions
- The Classification of Valuable Data in an Assumption of Breach Paradigm
- Toward Black-Box Detection of Logic Flaws in Web Applications
- Vulnerability Extrapolation - Assisted Discovery of Vulnerabilities using Machine Learning - slides
- Vulnerability Extrapolation - Assisted Discovery of Vulnerabilities using Machine Learning
Anonymity/Privacy/OPSEC/Censorship Papers
- Anonymous Hacking Group – #OpNewblood Super Secret Security Handbook
- Detecting Traffic Snooping in Tor Using Decoys
- Risks and Realization of HTTPS Traffic Analysis
- Selling Off Privacy at Auction
- The Sniper Attack - Anonymously Deanonymizing and Disabling the Tor Network
- The Velocity of Censorship - High-Fidelity Detection of Microblog Post Deletions - slides
- The Velocity of Censorship - High-Fidelity Detection of Microblog Post Deletions
- Tor vs. NSA
Data Mining Papers
- An Exploration of Geolocation and Traffic Visualization Using Network Flows to Aid in Cyber Defense
- DSpin - Detecting Automatically Spun Content on the Web
- Gyrus - A Framework for User-Intent Monitoring of Text-Based Networked Applications
- Indexing Million of Packets per Second using GPUs
- Multi-Label Learning with Millions of Labels - Recommending Advertiser Bid Phrases for Web Pages
- Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework
- Shingled Graph Disassembly - Finding the Undecideable Path
- Synoptic Graphlet - Bridging the Gap between Supervised and Unsupervised Profiling of Host-level Network Traffic
Cyber Crime Papers
- Connected Colors - Unveiling the Structure of Criminal Networks
- Image Matching for Branding Phishing Kit Images - slides
- Image Matching for Branding Phishing Kit Images
- Inside a Targeted Point-of-Sale Data Breach
- Investigating Advanced Persistent Threat 1 (APT1)
- Measuring pay-per-install - the Commoditization of Malware Distribution
- Scambaiter - Understanding Targeted Nigerian Scams on Craigslist
- Sherlock Holmes and the Case of the Advanced Persistent Threat
- The Role of the Underground Market in Twitter Spam and Abuse
- The Tangled Web of Password Reuse
- Trafficking Fraudulent Accounts - The Role of the Underground Market in Twitter Spam and Abuse
CND/CNA/CNE/CNO Papers
- Amplification Hell - Revisiting Network Protocols for DDoS Abuse
- Defending The Enterprise, the Russian Way
- Protecting a Moving Target - Addressing Web Application Concept Drift
- Timing of Cyber Conflict
Deep Learning and Security Papers
- A Deep Learning Approach for Network Intrusion Detection System
- A Hybrid Malicious Code Detection Method based on Deep Learning
- A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks
- A Multi-task Learning Model for Malware Classification with Useful File Access Pattern from API Call Sequence
- A Novel LSTM-RNN Decoding Algorithm in CAPTCHA Recognition (Short paper)
- An Analysis of Recurrent Neural Networks for Botnet Detection Behavior
- Application of Recurrent Neural Networks for User Verification based on Keystroke Dynamics
- Applications of Deep Learning On Traffic Identification (video)
- Combining Restricted Boltzmann Machine and One Side Perceptron for Malware Detection
- Comparison Deep Learning Method to Traditional Methods Using for Network Intrusion Detection (short paper)
- Convolutional Neural Networks for Malware Classification (THESIS)
- Deep Learning Approach for Network Intrusion Detection in Software Defined Networking
- Deep Learning for Classification of Malware System Call Sequences
- Deep Learning for Zero-day Flash Malware Detection (Short Paper)
- Deep Learning is a Good Steganalysis Tool When Embedding Key is Reused for Different Images, even if there is a cover source mismatch
- Deep Learning-based Feature Selection for Intrusion Detection System in Transport Layer (Short Paper)
- Deep Neural Network Based Malware Detection using Two Dimensional Binary Program Features
- DeepDGA: Adversarially-Tuned Domain Generation and Detection
- DeepSign: Deep Learning for Automatic Malware Signature Generation and Classification
- DL4MD: A Deep Learning Framework for Intelligent Malware Detection
- Droid-Sec: Deep Learning in Android Malware Detection
- DroidDetector: Android Malware Characterization and Detection using Deep Learning
- HADM: Hybrid Analysis for Detection of Malware
- Identifying Top Sellers In Underground Economy Using Deep Learning-based Sentiment Analysis
- Intrusion Detection System Using Deep Neural Network for In-Vehicle Network Security
- Large-scale Malware Classification using Random Projections and Neural Networks
- Learning a Static Analyzer: A Case Study on a Toy Language
- Learning Spam Features using Restricted Boltzmann Machines
- Long Short Term Memory Recurrent Neural Network Classifier for Intrusion Detection
- LSTM-based System-call Language Modeling and Robust Ensemble Method for Designing Host-based Intrusion Detection Systems
- Malware Classification on Time Series Data Through Machine Learning (THESIS)
- Malware Classification with Recurrent Networks
- Malware Detection with Deep Neural Network using Process Behavior
- MS-LSTM: a Multi-Scale LSTM Model for BGP Anomaly Detection
- MtNet: A Multi-Task Neural Network for Dynamic Malware Classification
- Network Anomaly Detection with the Restricted Boltzmann Machine
- Predicting Domain Generation Algorithms with Long Short-Term Memory Networks
- Recognizing Functions in Binaries with Neural Networks
- The Limitations of Deep Learning in Adversarial Settings
- Toward large-scale vulnerability discovery using Machine Learning
Deep Learning and Security Presentations
- A Deep Learning Approach for Network Intrusion Detection System
- Deep Learning on Disassembly Data (video)
Security Data Science Blogs
Blogs that frequently cover topics on security data science, machine learning, etc. These are recommended for your RSS feed.
Security Data Science Blogposts / Tutorials
- An Introduction to Machine Learning for Cybersecurity and Threat Hunting (code)
- Click Security’s Data Hacking (code)
- Dominos, Botnets, and a little LSTM (code)
- Machine Learning based Password Strength Classification (code)
- Recurrent neural networks for decoding CAPTCHAS
- Sequence to sequence learning to decode variable length captchas
- Using deep learning to break a Captcha system (code)
- Using Machine Learning to Detect Malicious URLs (code)
- Using Neural Networks to generate human readable passwords
- Netflow Flow2vec
Security Data Science Projects
Open source projects and code applying data science/machine learning to security problems.
- Clearcut - a tool that uses machine learning to help you focus on the log entries that really need manual review
- Click Security’s Data Hacking Project
- Combine - Tool to gather Threat Intelligence indicators from publicly available sources
- dga_predict - Predicting Domain Generation Algorithms using LSTMs.
- mlsec.org - Various Machine Learning and Computer Security Research projects from mlsec.org.
- tiq-test - Threat Intelligence Quotient Test - Dataviz and Statistical Analysis of TI feeds.
- CuckooML - Machine Learning for Cuckoo Sandbox (https://honeynet.github.io/cuckooml/)
Security Data
Collection of Security and Network Data Resources.
- See Covert.io Data Page
- See Covert.io Threat Intelligence Page
- See also secrepo.com, which is more comprehensive and should be checked as well.
Security Data Science Books
- A Machine-Learning Approach to Phishing Detection and Defense
- Applied Security Visualization
- Data Mining and Machine Learning in Cybersecurity
- Data-Driven Security: Analysis, Visualization and Dashboards
- Information Security Analytics: Finding Security Insights, Patterns, and Anomalies in Big Data
- Machine Learning and Data Mining for Computer Security
- Network Anomaly Detection: A Machine Learning Perspective
- Network Security Through Data Analysis: Building Situational Awareness
Security Data Science Presentations / Talks
- Applied Machine Learning for Data Exfil and Other Fun Topics
- Applying Machine Learning to Network Security Monitoring
- Build an Antivirus in 5 Min – Fresh Machine Learning #7. A fun video to watch
- CrowdSource: Crowd Trained Machine Learning Model for Malware Capability Det
- Data-Driven Threat Intelligence: Metrics On Indicator Dissemination And Sharing
- Defeating Machine Learning What Your Security Vendor Is Not Telling You
- Defeating Machine Learning: Systemic Deficiencies for Detecting Malware
- Defending Networks With Incomplete Information: A Machine Learning Approach
- Defending Networks with Incomplete Information
- Delta Zero, KingPhish3r – Weaponizing Data Science for Social Engineering
- Fraud detection using machine learning & deep learning
- Hunting for Malware with Machine Learning
- Machine Duping 101: Pwning Deep Learning Systems
- Machine Learning and the Cloud: Disrupting Threat Detection and Prevention
- Machine Learning for Threat Detection
- Measuring the IQ of your Threat Intelligence Feeds
- Packet Capture Village – Theodora Titonis – How Machine Learning Finds Malware
- Secure Because Math: A Deep-Dive on ML-Based Monitoring
- The Applications Of Deep Learning On Traffic Identification
- Using Machine Learning to Support Information Security
- Clusterf*ck Actionable Intelligence from Machine Learning
- I Am Packer And So Can You
- Practical Applications of Data Science in Detection