Welcome!
My name is Mike Wilkins, and I research optimizations for AI workloads on high-performance computing systems. As a Maria Goeppert Mayer Fellow at Argonne National Laboratory, I’m currently leading the development of a new holistic online autotuner. I previously completed my Ph.D. in Computer Engineering at Northwestern University and have industry experience at Cornelis Networks and Meta. I am open to collaboration opportunities, so please feel free to reach out with ideas or questions!
Experience
- Directed an independent research program on autotuning and collective communication, supported by a 3-year, $1M award from Argonne
- Translated my MPI autotuning research into production, achieving speedups up to 35x for collective operations on Argonne’s exascale system, Aurora
- Contributed major enhancements to MPICH, the leading open-source MPI implementation, with a focus on optimizing collective communication for high-performance computing environments
- Spearheaded major performance optimizations for the OPX libfabric provider, delivering a 5x bandwidth improvement for GPU communication along with other critical gains
- Led the architecture and development of the reference libfabric provider for the Ultra Ethernet Consortium, achieving a key milestone in the standard’s development
- Created OPX developer tools, including a profiler and autotuner, boosting team velocity
- Designed and implemented an application-aware communication (NCCL) autotuner for large-scale AI workloads (the first sketch after this list illustrates the core selection idea)
- Developed an AI application emulation tool that mimics production models by overlapping communication with generic compute kernels (the second sketch after this list illustrates the overlap)
- Founded a research project on machine learning for MPI collective algorithms, initially under the supervision of Dr. Min Si and Dr. Pavan Balaji, and later Dr. Yanfei Guo and Dr. Rajeev Thakur
- Earned external funding from ANL that supported the remainder of my Ph.D.
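To give a flavor of the autotuning work above, here is a minimal sketch of the decision loop at the heart of any collective autotuner: time each candidate algorithm for a given collective and message size, then keep the winner. This is an illustrative assumption rather than the production design; `algorithms` and `run_collective` are hypothetical stand-ins for a real MPI/NCCL tuning hook.

```python
import time

def pick_best_algorithm(algorithms, run_collective, msg_size, trials=5):
    """Benchmark every candidate collective algorithm at one message
    size and return the fastest one. Both `algorithms` and
    `run_collective` are hypothetical placeholders for a real
    MPI/NCCL tuning interface."""
    best_algo, best_time = None, float("inf")
    for algo in algorithms:
        start = time.perf_counter()
        for _ in range(trials):
            run_collective(algo, msg_size)  # e.g., one allreduce call
        elapsed = (time.perf_counter() - start) / trials
        if elapsed < best_time:
            best_algo, best_time = algo, elapsed
    return best_algo
```

A real online tuner must amortize this search across a running application and cache decisions per (collective, message size, communicator) key, which is where most of the research difficulty lies.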
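Similarly, the emulation tool relies on the observation that, for performance purposes, a training step is roughly a compute phase overlapped with communication. The toy sketch below assumes placeholder timings and uses a thread to stand in for the network; it is not the tool itself.

```python
import threading
import time

def emulate_step(compute_s, comm_s):
    """Emulate one training step: a generic compute burn runs
    concurrently with a fake communication delay, so the step time
    approaches max(compute, comm) when overlap is perfect."""
    def compute():
        deadline = time.perf_counter() + compute_s
        while time.perf_counter() < deadline:  # generic compute stand-in
            pass
    comm = threading.Thread(target=time.sleep, args=(comm_s,))
    start = time.perf_counter()
    comm.start()
    compute()
    comm.join()
    return time.perf_counter() - start

# With overlap, 0.15 s of "communication" hides behind 0.2 s of "compute".
print(f"step took {emulate_step(0.20, 0.15):.2f} s")
```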
Sample Research Projects
Here are high-level descriptions of some of my active and past research projects.
- Invented many optimizations to make ML-based MPI autotuning feasible on large-scale systems
- Developed the world’s first exascale-capable MPI collective algorithm autotuner and achieved up to 20% speedups for production applications
- Exploring new “holistic” tuning methodologies that encompass performance-critical parameters across the software stack, targeting large-scale AI workloads
- Created new generalized MPI collective algorithms that expose a tunable radix and outperform the previous best algorithms by up to 4.5x (see the radix sketch after this list)
- Exploring new generalized algorithms for GPU-specific collective communication (e.g., NCCL) and new abstractions (e.g., circulant graphs)
- Developed a new hardware/software co-design for the Standard ML language targeted at HPC systems and applications, including AI
- Created a new version of the NAS benchmark suite using MPL (a compiler for Parallel ML, a parallel extension of Standard ML) to enable direct comparison between high-level parallel languages (HLPLs) and lower-level languages for HPC
- Identified a low-level memory property called WARD in high-level parallel programs
- Implemented a custom cache coherence protocol in the Sniper architectural simulator and found an average speedup of 1.46x across the PBBS benchmark suite
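To make “tunable radix” concrete: the sketch below computes each rank’s parent and children in a textbook k-nomial broadcast tree (my illustrative assumption, not the generalized algorithms from this project). Radix k = 2 recovers the familiar binomial tree, while larger radices trade fewer communication rounds for more sends per rank.

```python
def knomial_bcast_schedule(rank, size, k):
    """Return (parent, children) for `rank` in a k-nomial broadcast
    tree rooted at rank 0; the radix k is the tunable knob."""
    assert k >= 2 and 0 <= rank < size
    parent, mask = None, 1
    while mask < size:  # find the round in which this rank receives
        if rank % (k * mask) != 0:
            parent = rank - rank % (k * mask)
            break
        mask *= k
    children = []
    mask //= k  # then fan out to every lower level
    while mask > 0:
        for j in range(1, k):
            if rank + j * mask < size:
                children.append(rank + j * mask)
        mask //= k
    return parent, children

# Radix 2 vs. radix 4 on 16 ranks, viewed from the root:
print(knomial_bcast_schedule(0, 16, 2))  # (None, [8, 4, 2, 1])
print(knomial_bcast_schedule(0, 16, 4))  # (None, [4, 8, 12, 1, 2, 3])
```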
Publications
Skills
Software/Scripting Languages
C, C++, Python, Standard/Parallel ML, C#, LabVIEW, Java, SQL, Bash
Parallel Programming/Communication
MPI, Libfabric, NCCL, CUDA, PyTorch, Parallel ML
Simulators/Tools
Sniper, gem5, ZSim, Xilinx Vivado, Xilinx ISE, Quartus II
Hardware Description Languages
Chisel, VHDL, Verilog, SPICE