Update (1/1/2017): I will no longer update this page. All future updates will go to The Definitive Security Data Science and Machine Learning Guide (see its Machine Learning and Security Papers section).
Over the past several years I have collected and read many security research papers and slides, and I have started a small catalog of sorts. The topics span intrusion detection, anomaly detection, machine learning/data mining, Internet-scale data collection, malware analysis, and intrusion/breach reports. I figured this collection might be useful to others. All links lead to PDFs hosted here.
I hope to clean this up (add author info, date, and publication venue) when I have more time, and to add the detailed notes I have on the features, models, algorithms, and datasets used in many of these papers.
Here are some of my favorites (nice uses of machine learning, graph analytics, and/or anomaly detection to solve interesting security problems):
- CAMP - Content Agnostic Malware Protection
- Notos - Building a Dynamic Reputation System for DNS
- Kopis - Detecting malware domains at the upper DNS hierarchy
- Pleiades - From Throw-away Traffic To Bots - Detecting The Rise Of DGA-based Malware
- EXPOSURE - Finding Malicious Domains Using Passive DNS Analysis
- Polonium - Tera-Scale Graph Mining for Malware Detection
- Nazca - Detecting Malware Distribution in Large-Scale Networks
- PAYL - Anomalous Payload-based Network Intrusion Detection
- Anagram - A Content Anomaly Detector Resistant to Mimicry Attack
Here is the entire collection:
Intrusion Detection
- A Close Look on n-Grams in Intrusion Detection - Anomaly Detection vs. Classification
- A Kill Chain Analysis of the 2013 Target Data Breach
- A Lone Wolf No More - Supporting Network Intrusion Detection with Real-Time Intelligence
- A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks
- Acquiring Digital Evidence from Botnet Attacks: Procedures and Methods (PhD Thesis)
- ALERT-ID - Analyze Logs of the network Element in Real Time for Intrusion Detection
- Anagram - A Content Anomaly Detector Resistant to Mimicry Attack
- Anomaly-based Intrusion Detection in Software as a Service
- Back to Basics - Beyond Network Hygiene
- Beehive - Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks
- Behavioral Clustering of HTTP-based Malware and Signature Generation Using Malicious Network Traces
- Beheading Hydras - Performing Effective Botnet Takedowns
- Bloodhound - Searching Out Malicious Input in Network Flows for Automatic Repair Validation
- Boosting the Scalability of Botnet Detection Using Adaptive Traffic Sampling
- CAMP - Content Agnostic Malware Protection
- Casting out demons - Sanitizing training data for anomaly sensors
- CloudFence - Data Flow Tracking as a Cloud Service
- Comparing anomaly detection techniques for HTTP
- Cujo - Efficient detection and prevention of drive-by-download attacks
- Decoy Document Deployment for Effective Masquerade Attack Detection
- Detecting Spammers with SNARE - Spatio-temporal Network-level Automatic Reputation Engine
- Detecting Unknown Network Attacks Using Language Models
- Early Detection of Malicious Flux Networks via Large-Scale Passive DNS Traffic Analysis
- Effective Anomaly Detection with Scarce Training Data
- Efficient Multidimensional Aggregation for Large Scale Monitoring
- EFFORT - Efficient and Effective Bot Malware Detection
- ExecScent - Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates - slides
- ExecScent - Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates
- EXPOSURE - Finding Malicious Domains Using Passive DNS Analysis
- FiG - Automatic Fingerprint Generation
- Filtering Spam with Behavioral Blacklisting
- FLIPS - Hybrid Adaptive Intrusion Prevention
- HMMPayl - An Intrusion Detection System Based on Hidden Markov Models
- Kopis - Detecting malware domains at the upper DNS hierarchy
- Large-Scale Malware Analysis, Detection, and Signature Generation
- Leveraging Honest Users - Stealth Command-and-Control of Botnets - slides
- Leveraging Honest Users - Stealth Command-and-Control of Botnets
- Local System Security via SSHD Instrumentation
- Machine Learning In Adversarial Environments
- Malware vs. Big Data (Umbrella Labs)
- McPAD - A Multiple Classifier System for Accurate Payload-based Anomaly Detection
- Measuring and Detecting Malware Downloads in Live Network Traffic
- Mining Botnet Sink Holes - slides
- MISHIMA - Multilateration of Internet hosts hidden using malicious fast-flux agents
- Monitoring the Initial DNS Behavior of Malicious Domains
- N-Gram against the Machine - On the Feasibility of the N-Gram Network Analysis for Binary Protocols
- Nazca - Detecting Malware Distribution in Large-Scale Networks
- Netgator - Malware Detection Using Program Interactive Challenges - slides
- Network Traffic Characterization Using (p, n)-grams Packet Representation
- Notos - Building a Dynamic Reputation System for DNS
- On the Feasibility of Online Malware Detection with Performance Counters
- On the Infeasibility of Modeling Polymorphic Shellcode
- On the Mismanagement and Maliciousness of Networks
- Outside the Closed World - On Using Machine Learning For Network Intrusion Detection
- PAYL - Anomalous Payload-based Network Intrusion Detection
- PAYL2 - Anomalous Payload-based Worm Detection and Signature Generation
- Pleiades - From Throw-away Traffic To Bots - Detecting The Rise Of DGA-based Malware
- Practical Comprehensive Bounds on Surreptitious Communication Over DNS - slides
- Practical Comprehensive Bounds on Surreptitious Communication Over DNS
- Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection
- Revealing Botnet Membership Using DNSBL Counter-Intelligence
- Revolver - An Automated Approach to the Detection of Evasive Web-based Malware
- Self-organized Collaboration of Distributed IDS Sensors
- SinkMiner - Mining Botnet Sinkholes for Fun and Profit
- Spamming Botnets - Signatures and Characteristics
- Spectrogram - A Mixture of Markov Chain models for Anomaly Detection in Web Traffic
- The Security of Machine Learning
- Toward Stealthy Malware Detection
- Traffic Aggregation for Malware Detection
- Understanding the Domain Registration Behavior of Spammers
- Understanding the Network-Level Behavior of Spammers
- VAST - Network Visibility Across Space and Time
- A Framework for the Application of Association Rule Mining in Large Intrusion Detection Infrastructures
- Application of the PageRank Algorithm to Alarm Graphs
- Finding The Needle - Suppression of False Alarms in Large Intrusion Detection Data Sets
- Heuristics for Improved Enterprise Intrusion Detection by Jim Treinen
Malware
- A static, packer-agnostic filter to detect similar malware samples
- A study of malcode-bearing documents
- A survey on automated dynamic malware-analysis techniques and tools
- APT1 Technical Backstage (malware.lu hack-back of APT1 servers)
- Automatic Analysis of Malware Behavior using Machine Learning
- BitShred - Fast, Scalable Code Reuse Detection in Binary Code
- BitShred - Fast, Scalable Malware Triage
- Deobfuscating Embedded Malware using Probable-Plaintext Attacks
- Escape from Monkey Island - Evading High-Interaction Honeyclients
- Eureka - A framework for enabling static malware analysis
- Extraction of Statistically Significant Malware Behaviors
- Fast Automated Unpacking and Classification of Malware
- FIRMA - Malware Clustering and Network Signature Generation with Mixed Network Behaviors
- FuncTracker - Discovering Shared Code (to aid malware forensics) - slides
- FuncTracker - Discovering Shared Code to Aid Malware Forensics Extended Abstract
- Malware files clustering based on file geometry and visualization using R language
- Mobile Malware Detection Based on Energy Fingerprints - A Dead End
- Polonium - Tera-Scale Graph Mining for Malware Detection
- Putting out a HIT - Crowdsourcing Malware Installs
- Scalable Fine-grained Behavioral Clustering of HTTP-based Malware
- SigMal - A Static Signal Processing Based Malware Triage
- Tracking Memory Writes for Malware Classification and Code Reuse Identification
- Using File Relationships in Malware Classification
- VAMO - Towards a Fully Automated Malware Clustering Validity Analysis
- Selecting Features to Classify Malware by Karthik Raman
Data Collection
- Crawling BitTorrent DHTs for Fun and Profit
- CyberProbe - Towards Internet-Scale Active Detection of Malicious Servers
- Demystifying service discovery - Implementing an internet-wide scanner
- gitDigger - Creating useful wordlists from GitHub
- PoisonAmplifier - A Guided Approach of Discovering Compromised Websites through Reversing Search Poisoning Attacks
- ZMap - Fast Internet-Wide Scanning and its Security Applications - slides
- ZMap - Fast Internet-Wide Scanning and its Security Applications
Vulnerability Analysis/Reversing
- A Preliminary Analysis of Vulnerability Scores for Attacks in Wild
- Attacker Economics for Internet-scale Vulnerability Risk Assessment
- Detecting Logic Vulnerabilities in E-Commerce Applications
- ReDeBug - Finding Unpatched Code Clones in Entire OS Distributions
- The Classification of Valuable Data in an Assumption of Breach Paradigm
- Toward Black-Box Detection of Logic Flaws in Web Applications
- Vulnerability Extrapolation - Assisted Discovery of Vulnerabilities using Machine Learning - slides
- Vulnerability Extrapolation - Assisted Discovery of Vulnerabilities using Machine Learning
Anonymity/Privacy/OPSEC/Censorship
- Anonymous Hacking Group - #OpNewblood Super Secret Security Handbook
- Detecting Traffic Snooping in Tor Using Decoys
- Risks and Realization of HTTPS Traffic Analysis
- Selling Off Privacy at Auction
- The Sniper Attack - Anonymously Deanonymizing and Disabling the Tor Network
- The Velocity of Censorship - High-Fidelity Detection of Microblog Post Deletions - slides
- The Velocity of Censorship - High-Fidelity Detection of Microblog Post Deletions
- Tor vs. NSA
Data Mining
- An Exploration of Geolocation and Traffic Visualization Using Network Flows to Aid in Cyber Defense
- DSpin - Detecting Automatically Spun Content on the Web
- Gyrus - A Framework for User-Intent Monitoring of Text-Based Networked Applications
- Indexing Million of Packets per Second using GPUs
- Multi-Label Learning with Millions of Labels - Recommending Advertiser Bid Phrases for Web Pages
- Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework
- Shingled Graph Disassembly - Finding the Undecideable Path
- Synoptic Graphlet - Bridging the Gap between Supervised and Unsupervised Profiling of Host-level Network Traffic
Cyber Crime
- Connected Colors - Unveiling the Structure of Criminal Networks
- Image Matching for Branding Phishing Kit Images - slides
- Image Matching for Branding Phishing Kit Images
- Inside a Targeted Point-of-Sale Data Breach
- Investigating Advanced Persistent Threat 1 (APT1)
- Measuring pay-per-install - the Commoditization of Malware Distribution
- Scambaiter - Understanding Targeted Nigerian Scams on Craigslist
- Sherlock Holmes and the Case of the Advanced Persistent Threat
- The Role of the Underground Market in Twitter Spam and Abuse
- The Tangled Web of Password Reuse
- Trafficking Fraudulent Accounts - The Role of the Underground Market in Twitter Spam and Abuse
CND/CNA/CNE/CNO
- Amplification Hell - Revisiting Network Protocols for DDoS Abuse
- Defending The Enterprise, the Russian Way
- Protecting a Moving Target - Addressing Web Application Concept Drift
- Timing of Cyber Conflict
–Jason
@jason_trost