Welcome!
My name is Mike Wilkins, and I research high-performance computing systems, with a focus on optimizing them for scientific and AI workloads. I am currently a Maria Goeppert Mayer Fellow at Argonne National Laboratory, supervised by Dr. Yanfei Guo and Dr. Rajeev Thakur. I completed my Ph.D. in Computer Engineering at Northwestern University under the advisement of Dr. Peter Dinda and Dr. Nikos Hardavellas. Below you will find details about my experience and current and past projects.
Experience
- Leading my own research project at the intersection of HPC and AI; excited to share more soon!
- Optimized the OPX libfabric provider, achieving a 5x bandwidth improvement for GPU communication, among other advancements
- Led the development of the reference libfabric provider for the Ultra Ethernet Consortium
- Created developer productivity tooling, including an OPX performance profiler and a runtime parameter autotuner
- Designed and implemented an application-aware communication (NCCL) autotuner for large-scale AI workloads
- Developed an AI application emulation tool that mimics production models by overlapping communication with generic compute kernels
- Founded the MPI collective algorithm/machine learning project, initially under the supervision of Dr. Min Si and Dr. Pavan Balaji, later Dr. Yanfei Guo and Dr. Rajeev Thakur
- Earned ongoing external funding from ANL that supported the remainder of my Ph.D.
- Engaged with technical leaders through field presentations to multiple companies in the Seattle area
- Assisted customers in designing and troubleshooting data-acquisition applications built on NI platforms
- Designed an RFID-based tracking solution to replace a malfunctioning inventory-locating system
- Produced a full-stack BI database solution analyzing internal employee and revenue data
- Organized and managed the company’s inventory of CNC machining tools, valued at more than $500,000
- Trained company technicians on new processes and managed tool services employees
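At their core, runtime-parameter autotuners like the ones mentioned above time each candidate configuration and keep the fastest. Here is a minimal sketch of that idea; the function names and the toy workload are illustrative only and not taken from any of the projects listed (real autotuners add warmup runs, repeated trials, and statistical filtering):

```python
import time

def autotune(workload, configs):
    """Run `workload` once per candidate config and return the fastest one.

    This brute-force timing loop is the simplest form of autotuning.
    """
    best_config, best_time = None, float("inf")
    for cfg in configs:
        start = time.perf_counter()
        workload(cfg)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_config, best_time = cfg, elapsed
    return best_config

# Toy workload: cost grows with chunk size, so the smallest chunk wins.
def toy_workload(cfg):
    time.sleep(cfg["chunk_kib"] / 10000)

best = autotune(toy_workload, [{"chunk_kib": c} for c in (64, 16, 256)])
print(best)  # -> {'chunk_kib': 16}
```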
Research Projects
Here are high-level descriptions of my active and past research projects.
- Creating new generalized MPI collective algorithms and a machine-learning autotuner that automatically selects and tunes the best algorithm for a given system and workload
- Invented multiple optimizations to make ML-based MPI autotuning feasible on large-scale systems
- Developing a new hardware/software co-design for the Standard ML language targeted at HPC systems and applications, including AI
- Created a new version of the NAS benchmark suite using MPL (a parallel compiler for Standard ML) to enable direct comparison between high-level parallel languages (HLPLs) and lower-level languages for HPC
- Identified a low-level memory property called WARD that can be introduced by construction in high-level parallel programs
- Implemented a custom cache coherence protocol in the Sniper architectural simulator and found an average speedup of 1.46x across the PBBS benchmark suite
- Implemented a source-level automatic parallelization tool using compiler and runtime techniques
- Built a pintool using the Intel Pin interface to report memory locations allocated and freed within statically compiled libraries
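The ML-based collective autotuning above learns, from offline benchmark data, which algorithm is fastest for a given message size and scale. A nearest-neighbor lookup is the simplest model that captures this; the sketch below is illustrative only (the sample data is invented, though recursive doubling and ring are classic allreduce algorithms):

```python
import math

# Hypothetical benchmark data: (log2 message bytes, node count) -> fastest
# allreduce algorithm, as might be gathered from offline profiling runs.
SAMPLES = [
    ((3, 2), "recursive_doubling"),   # tiny messages, few nodes
    ((6, 4), "recursive_doubling"),
    ((20, 4), "ring"),                # large messages favor ring
    ((24, 16), "ring"),
]

def predict_algorithm(msg_bytes, nodes):
    """1-nearest-neighbor prediction over (log2 size, node count).

    Real ML-based autotuners use richer features and models; this only
    illustrates the lookup-by-similarity idea.
    """
    query = (math.log2(max(msg_bytes, 1)), nodes)
    features, algorithm = min(SAMPLES, key=lambda s: math.dist(s[0], query))
    return algorithm

print(predict_algorithm(8, 2))          # -> recursive_doubling
print(predict_algorithm(4 * 2**20, 8))  # -> ring
```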
Publications
Skills
Software/Scripting Languages
C, C++, Python, Standard/Parallel ML, C#, LabVIEW, Java, SQL, Bash
Parallel Programming/Communication
MPI, Libfabric, NCCL, CUDA, Parallel ML, PyTorch
Simulators/Tools
ZSim, gem5, Xilinx Vivado, Xilinx ISE, Quartus II
Hardware Description/Modeling Languages
Chisel, VHDL, Verilog, SPICE