tinygrad has four pieces (minimal code sketches of each follow the list below)
- frontend (Tensor -> LazyBuffer)
  - See tensor.py, function.py, multi.py, and lazy.py
  - The user interacts with the Tensor class
  - This outputs LazyBuffers, which form the simple compute graph
- scheduler (LazyBuffer -> ScheduleItem)
  - See engine/schedule.py
  - When a Tensor is realized, the scheduler is run on the LazyBuffers that need to be computed
  - It takes in LazyBuffers and groups them as appropriate into kernels
  - It returns a list of ScheduleItems + all the Variables used in the graph
- lowering (TODO: lots of work to clean this up still)
  - See codegen/ (ScheduleItem.ast -> UOps)
    - ScheduleItems have an ast that's compiled into actual GPU code
    - Many optimization choices can be made here; this contains a beam search
  - renderer/compiler (UOps -> machine code)
    - UOps are tinygrad's IR, similar to LLVM IR
    - Here we either convert them to a high-level language or to machine code directly
  - engine/realize.py (ScheduleItem -> ExecItem)
- runtime
  - See runtime/
  - The runtime actually interacts with the GPUs
  - It manages Buffers, Programs, and Queues
  - Sadly, the METAL and GPU (OpenCL) backends don't have a compiler that can be used separately from the device itself
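
As a minimal sketch of the frontend: everything below uses only the public Tensor API, and no kernel runs until the result is realized (the exact realization point may differ slightly between tinygrad versions).

```python
from tinygrad import Tensor

# building the graph is lazy: no GPU work happens on these lines
a = Tensor.rand(4, 4)
b = Tensor.rand(4, 4)
c = (a @ b).relu()   # c is an unrealized Tensor backed by the lazy compute graph

# realization (.realize() or .numpy()) kicks off schedule -> lower -> run
print(c.numpy())
```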
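To see what the scheduler hands to lowering, a sketch assuming the Tensor.schedule() helper and an ast field on ScheduleItem (both present in recent versions, but internal details move around):

```python
from tinygrad import Tensor

a, b = Tensor.rand(4, 4), Tensor.rand(4, 4)
c = (a @ b).relu()

sched = c.schedule()   # groups c's LazyBuffers into kernels, returns a list of ScheduleItems
for si in sched:
  print(si.ast)        # the ast that codegen turns into UOps and then GPU code
```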
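The lowering and rendering steps can be watched without touching internal APIs via tinygrad's DEBUG and BEAM environment variables; both variables are real, but the mapping of debug levels to output below is an assumption and may vary by version.

```python
# hypothetical script name: matmul.py; run it as, for example:
#   DEBUG=4 python matmul.py   # print the source the renderer produced for each kernel (level assumed)
#   BEAM=2  python matmul.py   # beam-search over the kernel optimization choices
from tinygrad import Tensor

(Tensor.rand(64, 64) @ Tensor.rand(64, 64)).relu().realize()
```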
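For the last hop of lowering, a sketch that drives engine/realize.py by hand; lower_schedule_item and run_schedule are internal helpers I believe that file exposes, so treat the names and signatures as assumptions:

```python
from tinygrad import Tensor
from tinygrad.engine.realize import lower_schedule_item, run_schedule

a, b = Tensor.rand(4, 4), Tensor.rand(4, 4)
sched = (a @ b).relu().schedule()    # ScheduleItems from the scheduler

ei = lower_schedule_item(sched[0])   # one ScheduleItem lowered to an ExecItem (compiled program + Buffers)
print(ei)

run_schedule(sched)                  # lower and run the whole schedule
```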
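Finally, a small peek at the runtime layer. Device and Device.DEFAULT are public; the allocator attribute and the device class names in the comments are assumptions about where Buffer management lives.

```python
from tinygrad import Tensor, Device

print(Device.DEFAULT)          # which backend was picked, e.g. METAL, CUDA, GPU (OpenCL), or CPU

dev = Device[Device.DEFAULT]   # the runtime object for that backend
print(type(dev).__name__)      # e.g. MetalDevice or CUDADevice (class names assumed)
print(type(dev.allocator))     # manages raw device Buffers (attribute name assumed)

Tensor.rand(4, 4).realize()    # the runtime runs the compiled kernel on this device
```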