Portfolio

Post-Moore’s Law: Specialized Hardware Design for Scientific Computing

As transistor scaling under Moore’s Law slows, gains in transistor density alone no longer deliver the performance that many scientific computing workloads need. These workloads are increasingly limited by data movement, memory bandwidth, and latency rather than by raw arithmetic throughput. In response, the computing ecosystem has shifted toward specialized hardware acceleration, in which domain-specific architectures are co-designed with algorithms to overcome the memory and latency walls that limit conventional CPU- and GPU-based systems. This approach has shown significant benefits in linear algebra, signal processing, and simulation-driven workloads by exploiting fine-grained parallelism, customized dataflows, and application-aware memory hierarchies. In the post–Moore’s Law era, tight algorithm–hardware co-design is therefore a key enabler of sustained performance and energy efficiency, allowing scientific applications to scale beyond the limits of general-purpose architectures.

A Quantum-Classical Co-Design Framework for Scalable Quantum Circuit Simulation and Acceleration

With Moore’s Law scaling slowing, the industry has increasingly turned to specialized hardware to sustain performance gains, particularly for artificial intelligence and high-performance computing applications. Representative examples include Google’s Tensor Processing Units (TPUs); Amazon Web Services’ Graviton, Trainium, and Inferentia processors; Apple’s A- and M-series systems-on-chip (SoCs); the Meta Training and Inference Accelerator (MTIA); Microsoft’s Maia and Cobalt chips; Tesla’s Dojo accelerators; Broadcom’s custom AI application-specific integrated circuits (ASICs); and large-scale FPGA-based systems. Building on this trend, this project introduces a quantum–classical co-design framework that tightly integrates circuit-cutting techniques, which partition a large quantum circuit into smaller subcircuits, with specialized multi-FPGA architectures to enable scalable, accelerated quantum circuit simulation.