Explore Insights by Sudipta Sarkar
FlashAttention represents a groundbreaking advancement in optimizing the attention mechanism within Transformer models by addressing inherent memory-bound bottlenecks through I/O-aware design principles. This survey traces the evolution from the original FlashAttention algorithm—which introduced tiling and recomputation to enable exact, memory-efficient attention—to subsequent iterations that leverage advanced GPU architectures for unprecedented performance gains. We examine the core innovations, performance benchmarks, and broader impacts on model training and deployment, including significant speedups in training foundational models such as LLaMA and Falcon. By minimizing high-bandwidth memory (HBM) accesses and maximizing hardware utilization, FlashAttention enables scaling to longer context lengths while reducing computational costs, profoundly influencing the development landscape of large language models.
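To make the tiling idea concrete, here is a minimal NumPy sketch (function name and block size are illustrative, not the paper's fused CUDA kernel): attention is computed one key/value tile at a time with running max and sum statistics, so the full N×N score matrix is never materialized.

import numpy as np

def tiled_attention(Q, K, V, block_size=64):
    """Exact attention computed block by block (online softmax).

    Illustrative sketch only: it shows that tiling plus running
    max/sum statistics reproduce softmax(Q K^T / sqrt(d)) V
    without ever forming the full N x N score matrix.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row-wise max of scores
    l = np.zeros(N)           # running row-wise softmax denominator

    for start in range(0, N, block_size):    # iterate over K/V tiles
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale                # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))  # updated running max
        alpha = np.exp(m - m_new)             # rescale earlier accumulations
        P = np.exp(S - m_new[:, None])        # unnormalized tile probabilities
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]

# Sanity check against naive attention
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)

The final assertion confirms the tiled result matches naive attention, which is the key property: FlashAttention is exact, not an approximation.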
Read the Full Article

This paper provides a comprehensive and intuitive introduction to fundamental linear algebra concepts, structured to facilitate understanding for students and researchers. Concepts are presented in a logical progression, from vectors to eigenvalues, each with precise definitions, intuitive real-life analogies, and detailed mathematical and geometric examples. Visualizations in 2D and 3D coordinate systems, rendered using TikZ, enhance geometric intuition. The paper serves as a resource for academic study and applications in fields like machine learning, physics, and engineering.
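As a small taste of the eigenvalue material, here is a NumPy check (the matrix and variable names are chosen for illustration) verifying the defining property Av = λv:

import numpy as np

# A 2x2 matrix acting on the plane: eigenvectors are the directions
# the map merely stretches (by the eigenvalue) rather than rotates.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns are eigenvectors
for lam, v in zip(eigenvalues, eigenvectors.T):
    # Geometric meaning: A v points the same way as v, scaled by lam
    assert np.allclose(A @ v, lam * v)
    print(f"lambda = {lam:.1f}, v = {v}")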
Read the Full Article

This report presents a comprehensive graduate-level survey of complexity theory, a foundational branch of theoretical computer science concerned with the computational resources required to solve problems. The discussion begins with computability theory, covering fundamental notions such as decidability, the Chomsky hierarchy, and variants of Turing machines (TM, DTM, NTM, HTM). We then introduce formal measures of complexity and examine central complexity classes (L, NL, P, NP, co-NP, NP-Hard, NP-Complete, PSPACE, EXP, and beyond), providing illustrative examples and emphasizing their theoretical and practical importance. The report also catalogs major theorems (e.g., Cook–Levin, Savitch, PCP), foundational axioms and theses (e.g., Church–Turing Thesis), central hypotheses (e.g., P ≠ NP, ETH), and canonical decision problems (e.g., the Halting Problem, Circuit-SAT, TQBF). Where appropriate, proof sketches and visual aids are included to clarify concepts, and the interrelationships among classes are summarized. Designed for both seminar presentations and academic study, this document aims to deliver a rigorous yet accessible account of complexity theory as of August 2025.
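One recurring idea in the survey that code makes tangible: a problem is in NP when a proposed solution (a certificate) can be verified in polynomial time, even if finding one is believed to be hard. A minimal CNF-SAT verifier, with an encoding of my own choosing:

def verify_sat(clauses, assignment):
    """Polynomial-time verifier for CNF-SAT.

    clauses: list of clauses; each clause is a list of ints,
             where k means variable k and -k means its negation.
    assignment: dict mapping variable -> bool (the NP certificate).

    SAT is in NP because this check runs in time linear in the
    formula size; finding a satisfying assignment is the hard part.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
print(verify_sat(formula, {1: True, 2: False, 3: True}))  # True
print(verify_sat(formula, {1: True, 2: True, 3: True}))   # False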
Read the Full Article

This paper provides a comprehensive introduction to proof methodologies in discrete mathematics and computer science, emphasizing graph theory and algorithms. We define proofs, categorize their types, and highlight their importance in validating theoretical and applied results. Disproof strategies, particularly counterexamples, are also explored. Each proof method is illustrated with three detailed examples, primarily in graph theory and algorithms, with step-by-step explanations to ensure clarity. This work serves as an accessible resource for students and researchers.
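In the spirit of disproof by counterexample, here is a small brute-force search (my own illustration, not an example from the paper) refuting the classic false claim that n² + n + 41 is prime for every n ≥ 0:

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

# Claim (false): n^2 + n + 41 is prime for all n >= 0.
# A single counterexample suffices to disprove a universal statement.
for n in range(100):
    value = n * n + n + 41
    if not is_prime(value):
        print(f"counterexample: n = {n}, {value} = 41 * {value // 41}")
        break

The search prints n = 40, where the value is 1681 = 41², showing the claim fails even though it holds for every smaller n.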
Read the Full Article

Backpropagation is the cornerstone of neural network training, enabling weight optimization to minimize prediction errors. This paper provides a rigorous yet intuitive introduction to backpropagation, covering weighted sums, activation functions, loss and cost functions, and gradient descent. Six numerical examples, each fully solved with forward and backward pass calculations and neural network diagrams, demonstrate its application across diverse architectures, activation functions, and loss functions. Visualizations of gradient descent and network structures, combined with real-world analogies, make the concepts accessible to researchers and practitioners.
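A compact sketch in the same spirit as the paper's worked examples (the network shape, initial values, and learning rate are my own): one forward and one backward pass through a single sigmoid neuron with squared-error loss, followed by a gradient-descent update.

import math

# One neuron: y_hat = sigmoid(w*x + b), loss L = 0.5 * (y_hat - y)^2
x, y = 1.0, 0.0   # input and target
w, b = 0.5, 0.1   # initial weight and bias
lr = 0.5          # learning rate

# Forward pass
z = w * x + b                       # weighted sum
y_hat = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
loss = 0.5 * (y_hat - y) ** 2

# Backward pass (chain rule)
dL_dyhat = y_hat - y                # dL/dy_hat
dyhat_dz = y_hat * (1.0 - y_hat)    # sigmoid derivative
dL_dz = dL_dyhat * dyhat_dz
dL_dw = dL_dz * x                   # dz/dw = x
dL_db = dL_dz                       # dz/db = 1

# Gradient-descent update
w -= lr * dL_dw
b -= lr * dL_db
print(f"loss = {loss:.4f}, new w = {w:.4f}, new b = {b:.4f}")

Each gradient above is one link in the chain rule; deeper networks simply repeat this pattern layer by layer, propagating dL_dz backward.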
Read the Full Article