My name is Mike Wilkins, and I research high-performance computing systems, with a focus on optimizing them for scientific and AI workloads. This year, I expect to complete my Ph.D. in Computer Engineering at Northwestern University, advised by Dr. Peter Dinda and Dr. Nikos Hardavellas. I am also a visiting student at Argonne National Laboratory, co-advised by Dr. Yanfei Guo and Dr. Rajeev Thakur. Below you will find details on my experience and my current and past projects.
- Member of the AI and Systems Co-Design research team
- Founded the MPI collective algorithm/machine learning project, initially under the supervision of Dr. Min Si and Dr. Pavan Balaji, and now under Dr. Yanfei Guo and Dr. Rajeev Thakur
- Earned continuing external funding from ANL for the remainder of my Ph.D.
- Engaged with technical leaders through field presentations to multiple companies in the Seattle area
- Helped customers design and troubleshoot data-acquisition applications using NI platforms
- Designed an innovative RFID tracking solution to repair a malfunctioning inventory locating system
- Produced a full-stack BI database solution analyzing internal employee and revenue data
- Organized and managed the company’s inventory of CNC machining tools, valued at more than $500,000
- Trained company technicians on new processes and managed tool services employees
Here is a high-level description of my active and former research projects.
- Creating new generalized MPI collective algorithms and a machine-learning autotuner that automatically selects and optimizes the best algorithm
- Invented multiple optimizations to make ML-based MPI autotuning feasible on large-scale systems
- Developing a new hardware/software co-design for the Standard ML language targeted at HPC systems and applications, including AI
- Created a new version of the NAS benchmark suite using MPL (a parallel compiler for Standard ML) to enable direct comparison between high-level parallel languages (HLPLs) and lower-level languages for HPC
- Identified a low-level memory property called WARD that can be introduced by construction in high-level parallel programs
- Implemented a custom cache coherence protocol in the Sniper architectural simulator and measured an average speedup of 1.46x across the PBBS benchmark suite
- Implemented a source-level automatic parallelization tool using compiler and runtime techniques
- Built a Pintool using the Intel Pin interface to report memory locations allocated and freed within statically compiled libraries