Faculty Focus: Kexin Pei

January 22, 2025


Kexin Pei is a Neubauer Family Assistant Professor of Computer Science. His research lies at the intersection of security, software engineering, and machine learning. He is interested in developing data-driven program analysis approaches to improve the security and reliability of software systems. Specifically, he develops machine learning models that can reason about program structure and behavior to precisely and efficiently analyze, detect, and fix software vulnerabilities. His research has received the Best Paper Award at SOSP and a Distinguished Artifact Award, has been featured as a CACM Research Highlight, and was the runner-up in the CSAW Applied Research Competition. He works with the Learning for Code team at Google DeepMind, building program analysis tools based on large language models.

What overarching question are you trying to answer with your research? 

My research focuses on building safe machine learning for safe software. The research questions involve building intelligent ML models that can reliably and precisely reason about symbolic data, i.e., computer code, such that their predictions are interpretable and come with formal guarantees. I also explore how these improved ML approaches can automate existing software security applications and enable new capabilities.

What are you working on right now? 

Machine learning for code, with applications in software security (e.g., vulnerability detection and repair) and reliability (finding correctness and performance bugs).

Can you share an example of how interdisciplinary collaboration has enhanced your research and led to unexpected or exciting findings? 

My research spans machine learning and program analysis, two seemingly distinct areas in CS. While this direction has become quite popular, with large language models for code such as GitHub Copilot, it was once an unusual line of interdisciplinary research bridging two communities with disparate philosophies: connectionism (neural networks) versus symbolism (symbolic methods), for instance.

Alpha, the Alaskan malamute

Collaborating with experts from both domains, I have been amazed by how many new capabilities and impactful applications can be enabled by combining the two complementary directions. For example, by pre-training large language models (ML models) on computer code and its execution traces (symbolic data), I built a binary code similarity tool to detect vulnerabilities, outperforming state-of-the-art approaches by orders of magnitude in both accuracy and efficiency (impactful results).

How do you spend your time outside of work? 

I play basketball and walk my dog, a 100-pound Alaskan malamute named Alpha.
