See also:
Artificial Intelligence GPU | LLM Evolution Timeline
[Aug 2025]
2000s
Beginning with CUDA’s introduction (2006) and the first Tesla compute boards (2007–2008), researchers gained programmable access to commodity GPUs and to BLAS libraries such as cuBLAS. By 2009, practical toolchains (e.g., CUDAMat) and wider cross-vendor support via OpenCL made GPU acceleration accessible beyond graphics labs. This enabled order-of-magnitude speedups for the core linear-algebra workloads of neural language models (embedding lookups, dense and sparse matrix multiplies, and backpropagation), cutting training times from days to hours and making larger vocabularies, wider layers, and deeper stacks practical to explore. Raina et al. (ICML 2009) quantified these gains for deep architectures, catalyzing adoption across NLP groups that had been constrained by CPU throughput.

While many 2000s systems (including Collobert & Weston, 2008) were still CPU-bound, the 2009 inflection established the computational path that directly benefited early-2010s neural LMs (e.g., RNN-LMs) and set the stage for the large-scale transformer era by normalizing GPU-centric experimentation and scaling.
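The workloads named above can be made concrete with a minimal sketch. This is an illustrative toy, not any historical system: the dimensions and weight names are invented, and plain NumPy on the CPU stands in for the GPU BLAS routines (e.g., cuBLAS GEMM) that the text credits with the speedups. One training step of a one-hidden-layer neural language model is dominated by exactly these operations: an embedding lookup (a gather), dense matrix multiplies in the forward pass, and transposed multiplies of the same shapes in backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h = 10_000, 64, 128        # hypothetical vocab size, embedding dim, hidden dim

E  = rng.standard_normal((V, d)) * 0.01   # embedding table
W1 = rng.standard_normal((d, h)) * 0.01   # embedding -> hidden weights
W2 = rng.standard_normal((h, V)) * 0.01   # hidden -> vocabulary logits

ids = np.array([3, 17, 256, 9999])        # a mini-batch of token ids

# Forward pass: a gather followed by two dense matmuls (GEMMs).
x = E[ids]                                # embedding lookup, shape (B, d)
a = np.tanh(x @ W1)                       # dense matmul + nonlinearity, (B, h)
logits = a @ W2                           # the big GEMM over the vocabulary, (B, V)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True) # softmax

# Backpropagation: again dominated by GEMMs of the same shapes, transposed.
targets = np.array([1, 2, 3, 4])          # hypothetical next-token labels
dlogits = probs.copy()
dlogits[np.arange(len(ids)), targets] -= 1.0
dW2 = a.T @ dlogits                       # (h, B) @ (B, V)
da  = dlogits @ W2.T                      # (B, V) @ (V, h)
dW1 = x.T @ (da * (1 - a**2))             # (d, B) @ (B, h), tanh' = 1 - tanh^2
```

On a 2000s CPU these GEMMs were the bottleneck; routing them through GPU BLAS is precisely what produced the order-of-magnitude gains described above, since the rest of the step (gathers, elementwise ops) is comparatively cheap.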