Welcome!

My name is Mike Wilkins, and I research optimizations for AI workloads on high-performance computing systems. As a Maria Goeppert Mayer Fellow at Argonne National Laboratory, I am currently leading the development of a new holistic online autotuner. I completed my Ph.D. in Computer Engineering at Northwestern University and have industry experience at Cornelis Networks and Meta. I am always open to collaboration opportunities, so please feel free to reach out with ideas or questions!

Experience

Maria Goeppert Mayer Fellow

Oct 2024 - Present
Argonne National Laboratory
  • Directed an independent research program on autotuning and collective communication, supported by a 3-year, $1M award from Argonne
  • Translated my MPI autotuning research into production, achieving speedups up to 35x for collective operations on Argonne’s exascale system, Aurora
  • Contributed major enhancements to MPICH, the leading open-source MPI implementation, with a focus on optimizing collective communication for high-performance computing environments

Software Engineer

Jan-Sep 2024
Cornelis Networks
  • Spearheaded major performance optimizations for the OPX libfabric provider, achieving 5× bandwidth improvements for GPU communication along with other critical gains
  • Led the architecture and development of the reference libfabric provider for the Ultra Ethernet Consortium, achieving a key milestone in the standard’s development
  • Created OPX developer tools, including a profiler and autotuner, boosting team velocity

AI Research Intern

Summer 2023
Meta
  • Designed and implemented an application-aware autotuner for NCCL communication in large-scale AI workloads
  • Developed an AI application emulation tool that mimics production models by overlapping communication with generic compute kernels (a simplified sketch of this overlap pattern follows this list)
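
To illustrate the overlap pattern at the heart of that emulation tool, here is a minimal PyTorch sketch. It is not the internal tool itself: it assumes a NCCL process group launched with torchrun, and it substitutes a plain matmul for the generic compute kernels.

```python
# Minimal sketch of overlapping a NCCL collective with a generic compute kernel,
# in the spirit of the emulation tool described above. Launch with, e.g.:
#   torchrun --nproc_per_node=<ngpus> overlap_sketch.py
# Assumes one GPU per rank; the matmul loop stands in for "generic" compute.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    grad_like = torch.randn(64 * 1024 * 1024, device="cuda")  # communication payload
    a = torch.randn(4096, 4096, device="cuda")                # generic compute operands
    b = torch.randn(4096, 4096, device="cuda")

    # Issue the collective asynchronously; NCCL runs it on its own stream.
    work = dist.all_reduce(grad_like, op=dist.ReduceOp.SUM, async_op=True)

    # Meanwhile, keep the GPU busy with compute on the default stream.
    c = a @ b
    for _ in range(8):
        c = c @ b

    work.wait()                  # the collective must finish before its result is used
    torch.cuda.synchronize()
    if rank == 0:
        print("overlapped allreduce + compute complete", c.norm().item())

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The key point is that the collective is issued asynchronously, so NCCL's communication stream and the default compute stream can make progress concurrently.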

Research Aide/Visiting Student

2020 - 2023
Argonne National Laboratory
  • Founded the MPI collective algorithm/machine learning project, initially under the supervision of Dr. Min Si and Dr. Pavan Balaji, and later of Dr. Yanfei Guo and Dr. Rajeev Thakur
  • Secured continued external funding from ANL for the remainder of my Ph.D.

Undergraduate Internships

Power Solutions International (2016), Flexware Innovation (2017), National Instruments (2018)

Sample Research Projects

Here are high-level descriptions of some of my active and past research projects.

ML Autotuning for MPI

Ongoing
  • Developed a series of optimizations that make ML-based MPI autotuning feasible on large-scale systems (a simplified measurement sketch follows this list)
  • Developed the world’s first exascale-capable MPI collective algorithm autotuner and achieved up to 20% speedups for production applications
  • Exploring new “holistic” tuning methodologies that encompass performance-critical parameters across the software stack, targeting large-scale AI workloads
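
As a rough illustration of the measurement core behind this autotuning work, here is a simplified mpi4py sketch. It is not ACCLAiM or the production Aurora tuner: the MPICH CVAR name and its accepted values are version-dependent placeholders, and the real work replaces the final "pick the fastest" step with machine learning models.

```python
# Minimal sketch of the measure-and-select core of collective autotuning.
# Run once per candidate algorithm, selecting it via an MPICH CVAR, e.g.:
#   MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM=recursive_doubling mpiexec -n 64 python bench_allreduce.py
# (CVAR names/values are MPICH-specific and version-dependent; treat them as placeholders.)
import os
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
algo = os.environ.get("MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM", "auto")

for nbytes in (1 << 10, 1 << 14, 1 << 18, 1 << 22):
    sendbuf = np.ones(nbytes // 8, dtype=np.float64)
    recvbuf = np.empty_like(sendbuf)

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(50):                        # average over repetitions
        comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)
    local = (MPI.Wtime() - t0) / 50
    worst = comm.allreduce(local, op=MPI.MAX)  # slowest rank defines collective latency

    if rank == 0:
        # CSV: algorithm, message size (bytes), latency (s); one run per algorithm,
        # later merged to pick (or learn to predict) the best algorithm per size/scale.
        print(f"{algo},{nbytes},{worst:.6e}")
```

In this toy setup, sweeping the CVAR across candidate algorithms, message sizes, and node counts produces the timing data that the ML models then learn to generalize from.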

Algorithms for Collective Communication

Ongoing
  • Created new generalized MPI collective algorithms that expose a tunable radix and outperform the previous best algorithms by up to 4.5x (a toy radix-k simulation follows this list)
  • Exploring new generalized algorithms for GPU-specific collective communication (e.g., NCCL) and new abstractions (e.g., circulant graphs)
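
To give a concrete sense of what a tunable radix buys, below is a toy, pure-Python simulation of the k-ary generalization of recursive doubling. It is a pedagogical sketch rather than the generalized algorithms from the CLUSTER'23 paper: it only verifies that the radix-k schedule yields a correct allreduce and shows the round count shrinking as the radix grows.

```python
# Toy simulation of a radix-k allreduce schedule (the k-ary generalization of
# recursive doubling), illustrating what a "tunable radix" exposes. This is a
# pedagogical sketch, not the generalized algorithms from the CLUSTER'23 paper.
# Requires the number of ranks p to be a power of the radix k.

def radix_k_allreduce(values, k):
    p = len(values)
    vals = list(values)
    rounds = 0
    group = 1                      # size of the fully reduced block so far (k^rounds)
    while group < p:
        new_vals = list(vals)
        for rank in range(p):
            digit = (rank // group) % k
            base = rank - digit * group          # first peer in this rank's k-group
            peers = [base + j * group for j in range(k)]
            # Each rank combines the partial results of its k-group this round.
            new_vals[rank] = sum(vals[r] for r in peers)
        vals = new_vals
        group *= k
        rounds += 1
    return vals, rounds

if __name__ == "__main__":
    p = 64
    values = list(range(p))                      # rank i contributes the value i
    for k in (2, 4, 8):                          # radix 2 is plain recursive doubling
        result, rounds = radix_k_allreduce(values, k)
        assert all(v == sum(values) for v in result)
        # Higher radix -> fewer rounds (log_k p), but k-1 peers per round.
        print(f"radix {k}: {rounds} rounds, result correct")
```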

High-Level Parallel Languages for HPC

2019-2023
  • Developed a new hardware/software co-design for the Standard ML language targeted at HPC systems and applications, including AI
  • Created a new version of the NAS benchmark suite using MPL (a parallel compiler for Standard ML) to enable direct comparison between high-level parallel languages (HLPLs) and lower-level languages for HPC

Cache Coherence for High-Level Parallel Languages

2019-2022
  • Identified a low-level memory property called WARD in high-level parallel programs
  • Implemented a custom cache coherence protocol in the Sniper architectural simulator and found an average speedup of 1.46x across the PBBS benchmark suite

Publications

  • Generalized Collective Algorithms for the Exascale Era
    Michael Wilkins, Hanming Wang, Peizhi Liu, Bangyen Pham, Yanfei Guo, Rajeev Thakur, Nikos Hardavellas, and Peter Dinda
    CLUSTER'23
  • Evaluating Functional Memory-Managed Parallel Languages for HPC using the NAS Parallel Benchmarks
    Michael Wilkins, Garrett Weil, Luke Arnold, Nikos Hardavellas, and Peter Dinda
    HIPS'23 Workshop
  • WARDen: Specializing Cache Coherence for High-Level Parallel Languages
    Michael Wilkins, Sam Westrick, Vijay Kandiah, Alex Bernat, Brian Suchy, Enrico Armenio Deiana, Simone Campanoni, Umut Acar, Peter Dinda, and Nikos Hardavellas
    CGO'23
  • Program State Element Characterization
    Enrico Deiana, Brian Suchy, Michael Wilkins, Brian Homerding, Tommy McMichen, Katarzyna Dunajewski, Nikos Hardavellas, Peter Dinda, and Simone Campanoni
    CGO'23
  • ACCLAiM: Advancing the Practicality of MPI Collective Communication Autotuning Using Machine Learning
    Michael Wilkins, Yanfei Guo, Rajeev Thakur, Peter Dinda, and Nikos Hardavellas
    CLUSTER'22
  • A FACT-Based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems
    Michael Wilkins, Yanfei Guo, Rajeev Thakur, Nikos Hardavellas, Peter Dinda, and Min Si
    ExaMPI'21 Workshop

Skills

Software/Scripting Languages

C, C++, Python, Standard/Parallel ML, C#, LabVIEW, Java, SQL, Bash

Parallel Programming/Communication

MPI, Libfabric, NCCL, CUDA, PyTorch, Parallel ML

Simulators/Tools

Sniper, gem5, ZSim, Xilinx Vivado, Xilinx ISE, Quartus II

Hardware Description Languages

Chisel, VHDL, Verilog, SPICE