The tinygrad framework has four pieces:

* a PyTorch-like <b>frontend</b>.
* a <b>scheduler</b> which breaks the compute into kernels.
* a <b>lowering</b> engine which converts ASTs into code that can run on the accelerator.
* an <b>execution</b> engine which can run that code.

There is a good [bunch of tutorials](https://mesozoic-egg.github.io/tinygrad-notes/) by Di Zhu that go over tinygrad internals.

## Frontend

Everything in [Tensor](../tensor/index.md) is syntactic sugar around [function.py](function.md), where the forward and backward passes are implemented for the different functions. There are about 25 of them, implemented using about 20 basic ops. Those basic ops go on to construct a graph of:

::: tinygrad.lazy.LazyBuffer
    options:
        show_source: false
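To make the forward/backward split concrete, here is a minimal sketch on plain Python floats. It is a hypothetical toy, not tinygrad's `function.py`: the `Function`, `Mul`, and `Exp` classes and the `BASIC_OPS` table are invented for illustration. It only shows the idea that each function defines both passes in terms of a small set of basic ops (here, `exp` is built from a base-2 exponential and a multiply).

```python
import math
from typing import Callable

# Hypothetical "basic ops" for this toy; the real set of low-level ops differs.
BASIC_OPS: dict[str, Callable[..., float]] = {
    "MUL": lambda a, b: a * b,
    "ADD": lambda a, b: a + b,
    "EXP2": lambda a: 2.0 ** a,
}


class Function:
    def forward(self, *xs: float) -> float:
        raise NotImplementedError

    def backward(self, grad: float) -> tuple[float, ...]:
        raise NotImplementedError


class Mul(Function):
    def forward(self, x: float, y: float) -> float:
        self.x, self.y = x, y  # save inputs for the backward pass
        return BASIC_OPS["MUL"](x, y)

    def backward(self, grad: float) -> tuple[float, float]:
        # d(x*y)/dx = y and d(x*y)/dy = x, each scaled by the incoming gradient
        return BASIC_OPS["MUL"](grad, self.y), BASIC_OPS["MUL"](grad, self.x)


class Exp(Function):
    def forward(self, x: float) -> float:
        # exp(x) = 2 ** (x / ln 2), so only EXP2 and MUL are needed
        self.ret = BASIC_OPS["EXP2"](BASIC_OPS["MUL"](x, 1 / math.log(2)))
        return self.ret

    def backward(self, grad: float) -> tuple[float]:
        return (BASIC_OPS["MUL"](grad, self.ret),)  # d exp(x)/dx = exp(x)


m = Mul()
print(m.forward(3.0, 4.0), m.backward(1.0))  # 12.0 (4.0, 3.0)
```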

The `LazyBuffer` graph specifies the compute in terms of low-level tinygrad ops. Not all LazyBuffers will actually become realized. There are two types of LazyBuffers: base and view. A base contains compute into a contiguous buffer, and a view is a view of a base (specified by a ShapeTracker). Inputs to a base can be either bases or views, while the input to a view can only be a single base.
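A toy model of the base/view split, assuming a view is nothing more than `(shape, strides, offset)` over a single base. `ToyBuffer`, `load`, `permute`, and `read` are hypothetical names; a real ShapeTracker handles far more (masks, stacked views, and so on). Note how permuting a view produces a new view of the same base rather than a view of a view, which preserves the rule that a view has exactly one base as its input.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class ToyBuffer:
    op: str                          # "LOAD" (a base holding data) or "VIEW"
    srcs: tuple[ToyBuffer, ...] = ()
    shape: tuple[int, ...] = ()
    strides: tuple[int, ...] = ()
    offset: int = 0
    data: list[float] | None = None  # only LOAD bases carry data in this toy

    def __post_init__(self) -> None:
        if self.op == "VIEW":
            # the input to a view is exactly one base, never another view
            assert len(self.srcs) == 1 and self.srcs[0].op != "VIEW", "a view must wrap one base"

    @property
    def base(self) -> ToyBuffer:
        return self.srcs[0] if self.op == "VIEW" else self


def contiguous_strides(shape: tuple[int, ...]) -> tuple[int, ...]:
    strides, acc = [], 1
    for s in reversed(shape):
        strides.append(acc)
        acc *= s
    return tuple(reversed(strides))


def load(data, shape: tuple[int, ...]) -> ToyBuffer:
    return ToyBuffer("LOAD", shape=shape, strides=contiguous_strides(shape), data=list(data))


def permute(buf: ToyBuffer, order: tuple[int, ...]) -> ToyBuffer:
    # reorder shape/strides; the result always points at buf.base, so views never nest
    shape = tuple(buf.shape[i] for i in order)
    strides = tuple(buf.strides[i] for i in order)
    return ToyBuffer("VIEW", (buf.base,), shape, strides, buf.offset)


def read(buf: ToyBuffer) -> list[float]:
    # walk the view's index space and gather elements from the base's memory
    data = buf.base.data
    assert data is not None, "only LOAD bases hold data in this toy"
    return [data[buf.offset + sum(i * st for i, st in zip(idx, buf.strides))]
            for idx in itertools.product(*(range(s) for s in buf.shape))]


a = load(range(6), (2, 3))
t = permute(a, (1, 0))   # a transpose: no data is moved, only strides change
print(read(t))           # [0, 3, 1, 4, 2, 5]
```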

## Scheduling

The [scheduler](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/engine/schedule.py) converts the graph of LazyBuffers into a list of `ScheduleItem`. One `ScheduleItem` is one kernel on the GPU, and the scheduler is responsible for breaking the large compute graph into subgraphs that can fit in a kernel. `ast` specifies what compute to run, and `bufs` specifies what buffers to run it on.

::: tinygrad.engine.schedule.ScheduleItem
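A toy version of the kernel-splitting idea, under simplifying assumptions that are not tinygrad's actual rules: inputs (`LOAD`), reductions (`SUM`), and requested outputs are realized into their own buffers, and every elementwise op in between is fused into the kernel that consumes it. `Node`, `ToyScheduleItem`, and `schedule` are hypothetical names; as in the text, `ast` describes the compute and `bufs` lists the buffers it runs on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    op: str                      # "LOAD", elementwise ("ADD", "MUL"), or "SUM" (a reduction)
    srcs: tuple[Node, ...] = ()


@dataclass(frozen=True)
class ToyScheduleItem:
    ast: str                     # printable description of the fused compute
    bufs: tuple[str, ...]        # output buffer first, then the buffers it reads


def schedule(outputs: list[Node]) -> list[ToyScheduleItem]:
    out_names = {o.name for o in outputs}
    items: list[ToyScheduleItem] = []
    done: set[str] = set()

    def realized(n: Node) -> bool:
        # toy rule: inputs, reductions, and requested outputs get their own buffer
        return n.op in ("LOAD", "SUM") or n.name in out_names

    def expr(n: Node, reads: list[str], root: bool) -> str:
        if not root and realized(n):
            visit(n)                 # schedule the producing kernel first
            if n.name not in reads:
                reads.append(n.name)
            return n.name            # read it from memory instead of recomputing
        return f"{n.op}({', '.join(expr(s, reads, False) for s in n.srcs)})"

    def visit(n: Node) -> None:
        if n.name in done or n.op == "LOAD":  # LOAD buffers already exist
            return
        done.add(n.name)
        reads: list[str] = []
        ast = expr(n, reads, root=True)      # fuses every unrealized elementwise source
        items.append(ToyScheduleItem(ast, (n.name, *reads)))

    for o in outputs:
        visit(o)
    return items


x, y = Node("x", "LOAD"), Node("y", "LOAD")
s = Node("s", "SUM", (Node("m", "MUL", (Node("t", "ADD", (x, y)), x)),))
out = Node("out", "MUL", (s, s))
for item in schedule([out]):  # two kernels: the fused SUM, then the MUL reading s
    print(item)
```

One consequence of this naive rule is that an elementwise node with several consumers gets recomputed in each kernel that uses it; a real scheduler needs extra rules to decide when that is worth it.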

## Lowering

The code in [realize](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/engine/realize.py) lowers `ScheduleItem` to `ExecItem` with

::: tinygrad.engine.realize.lower_schedule

A lot of complexity is hidden behind this; see the `codegen/` directory.

First we lower the AST to UOps, a linear list of the compute to be run. This is where the BEAM search happens.
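A minimal sketch of turning a tree-shaped AST into a linear list, assuming the AST is a nested tuple and each entry refers to its sources by index into the list. The `UOp` dataclass and `linearize` function here are hypothetical and much simpler than tinygrad's own `UOp`; identical nodes are deduplicated so shared subexpressions are computed once. BEAM search is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UOp:
    op: str                  # "LOAD", "CONST", "ADD", "MUL", ...
    src: tuple[int, ...]     # indices of earlier UOps in the linear list
    arg: object = None       # buffer name for LOAD, value for CONST


def linearize(ast: tuple) -> list[UOp]:
    # ast is a nested tuple, e.g. ("ADD", ("LOAD", "x"), ("CONST", 2.0))
    uops: list[UOp] = []
    index: dict[UOp, int] = {}   # identical UOps are emitted once (simple CSE)

    def walk(node: tuple) -> int:
        op, *rest = node
        if op in ("LOAD", "CONST"):
            u = UOp(op, (), rest[0])
        else:
            u = UOp(op, tuple(walk(s) for s in rest))  # sources are emitted first
        if u not in index:
            index[u] = len(uops)
            uops.append(u)
        return index[u]

    walk(ast)
    return uops


# x * x + 2.0: the two loads of x collapse into a single UOp
for i, u in enumerate(linearize(("ADD", ("MUL", ("LOAD", "x"), ("LOAD", "x")), ("CONST", 2.0)))):
    print(i, u)
```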

Then we render the UOps into code with a `Renderer` and compile that code to a binary with a `Compiler`.
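A toy render-then-compile pair, using a hypothetical linear UOp format (defined here so the block stands alone). As a stand-in for a real backend, `render` emits an elementwise loop as Python source and `compile_kernel` uses Python's built-in `compile()` to turn that source into a code object; real backends emit device code (C, CUDA, and so on) and compile it with a native toolchain. Neither function is tinygrad's `Renderer` or `Compiler` API.

```python
from typing import Callable, NamedTuple


class UOp(NamedTuple):
    op: str                   # "LOAD", "CONST", "ADD", "MUL"
    src: tuple[int, ...] = () # indices of earlier UOps
    arg: object = None        # buffer name for LOAD, value for CONST


BINOPS = {"ADD": "+", "MUL": "*"}


def render(name: str, uops: list[UOp]) -> str:
    # toy Renderer: one elementwise loop, one local variable per UOp
    bufs = sorted({u.arg for u in uops if u.op == "LOAD"})
    lines = [f"def {name}(out, {', '.join(bufs)}):", "    for i in range(len(out)):"]
    for i, u in enumerate(uops):
        if u.op == "LOAD":
            expr = f"{u.arg}[i]"
        elif u.op == "CONST":
            expr = repr(u.arg)
        else:
            expr = f"v{u.src[0]} {BINOPS[u.op]} v{u.src[1]}"
        lines.append(f"        v{i} = {expr}")
    lines.append(f"        out[i] = v{len(uops) - 1}")  # the last UOp is the result
    return "\n".join(lines)


def compile_kernel(name: str, src: str) -> Callable:
    # toy Compiler: Python's compile() turns source text into a code object
    code = compile(src, f"<{name}>", "exec")
    namespace: dict = {}
    exec(code, namespace)  # running the module code defines the function
    return namespace[name]


uops = [UOp("LOAD", (), "x"), UOp("MUL", (0, 0)), UOp("CONST", (), 2.0), UOp("ADD", (1, 2))]
kernel = compile_kernel("kernel", render("kernel", uops))
out = [0.0] * 3
kernel(out, [1.0, 2.0, 3.0])
print(out)  # [3.0, 6.0, 11.0]
```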

## Execution

Lowering produces `ExecItem`s, each of which has a `run` method:

::: tinygrad.engine.realize.ExecItem
    options:
        members: true

Lists of `ExecItem` can be condensed into a single `ExecItem` with the Graph API (rename to Queue?).
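A sketch of the relationship between an `ExecItem` and a condensed graph, assuming an `ExecItem` is just a compiled program plus the buffers it runs on. `ToyExecItem` and `condense` are hypothetical names. In a real backend the benefit of a graph comes from submitting many kernels to the device in one go, which this Python loop does not model; it only shows that the caller ends up holding a single item with a single `run`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyExecItem:
    prg: Callable[..., None]                  # a compiled kernel (any callable here)
    bufs: list = field(default_factory=list)  # buffers handed to the kernel, output first

    def run(self) -> None:
        self.prg(*self.bufs)


def condense(items: list[ToyExecItem]) -> ToyExecItem:
    # toy "graph": capture the items once and replay them as one ExecItem
    captured = list(items)

    def replay() -> None:
        for ei in captured:
            ei.run()

    return ToyExecItem(replay)


def double(out, x):
    for i in range(len(x)):
        out[i] = 2 * x[i]


def add_one(out, x):
    for i in range(len(x)):
        out[i] = x[i] + 1


a, b, c = [1.0, 2.0], [0.0, 0.0], [0.0, 0.0]
graph = condense([ToyExecItem(double, [b, a]), ToyExecItem(add_one, [c, b])])
graph.run()
print(c)  # [3.0, 5.0]
```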

## Runtime

Runtimes are responsible for device-specific interactions. They handle tasks such as initializing devices, allocating memory, loading/launching programs, and more. You can find more information about the runtime API on the [runtime overview page](runtime.md).
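A toy runtime that treats host memory as device memory, to illustrate the split of responsibilities described above: an allocator that allocates buffers and copies data in and out, and a program object that gets launched on those buffers. All class and method names here are hypothetical; see the runtime overview page for the actual interface.

```python
from array import array
from typing import Callable


class ToyAllocator:
    # "device memory" is a host bytearray in this toy
    def alloc(self, size: int) -> bytearray:
        return bytearray(size)

    def copyin(self, dest: bytearray, src: bytes) -> None:
        memoryview(dest)[:len(src)] = src  # host -> device (raises if src is too big)

    def copyout(self, dest: bytearray, src: bytearray) -> None:
        memoryview(dest)[:] = memoryview(src)[:len(dest)]  # device -> host


class ToyProgram:
    def __init__(self, name: str, fn: Callable[..., None]):
        self.name, self.fn = name, fn

    def __call__(self, *bufs: bytearray) -> None:
        # "launch": give the kernel float32 views of the raw device buffers
        self.fn(*(memoryview(b).cast("f") for b in bufs))


def square(out, x):
    for i in range(len(x)):
        out[i] = x[i] * x[i]


alloc = ToyAllocator()
host_in = array("f", [1.0, 2.0, 3.0])
nbytes = len(host_in) * host_in.itemsize
dev_in, dev_out = alloc.alloc(nbytes), alloc.alloc(nbytes)
alloc.copyin(dev_in, host_in.tobytes())
ToyProgram("square", square)(dev_out, dev_in)
host_out = bytearray(nbytes)
alloc.copyout(host_out, dev_out)
print(array("f", host_out).tolist())  # [1.0, 4.0, 9.0]
```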

All runtime implementations can be found in the [runtime directory](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime).

### HCQ Compatible Runtimes

The HCQ API is a lower-level API for defining runtimes: HCQ-compatible devices are driven by issuing commands directly to hardware queues. Examples of such backends are [NV](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/ops_nv.py) and [AMD](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/ops_amd.py), which are userspace drivers for NVIDIA and AMD devices respectively. You can find more information about the API on the [HCQ overview page](hcq.md).
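A toy model of the hardware-queue idea: commands are recorded into a queue, submitted to the "hardware" as a batch, and synchronized through signals the device itself writes, rather than through host-side calls per kernel. `ToySignal`, `ToyHWQueue`, and `ToyDevice` are hypothetical, and their method names are illustrative only; the real API is described on the HCQ overview page.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToySignal:
    value: int = 0  # a timeline value the "hardware" writes


@dataclass
class ToyHWQueue:
    cmds: list = field(default_factory=list)  # recorded, not executed yet

    def exec(self, fn: Callable[[], None]) -> "ToyHWQueue":
        self.cmds.append(("exec", fn))
        return self

    def signal(self, sig: ToySignal, value: int) -> "ToyHWQueue":
        self.cmds.append(("signal", sig, value))
        return self

    def wait(self, sig: ToySignal, value: int) -> "ToyHWQueue":
        self.cmds.append(("wait", sig, value))
        return self

    def submit(self, dev: "ToyDevice") -> None:
        dev.ring.extend(self.cmds)  # hand the whole batch to the device at once


class ToyDevice:
    def __init__(self) -> None:
        self.ring: list = []  # stand-in for a hardware ring buffer

    def process(self) -> None:
        # the fake hardware drains commands in order, stalling on unmet waits
        while self.ring:
            kind, *args = self.ring[0]
            if kind == "wait" and args[0].value < args[1]:
                return  # stall until someone updates the signal
            self.ring.pop(0)
            if kind == "exec":
                args[0]()
            elif kind == "signal":
                args[0].value = args[1]


log: list = []
ready, done = ToySignal(), ToySignal()
dev = ToyDevice()
ToyHWQueue().wait(ready, 1).exec(lambda: log.append("kernel")).signal(done, 1).submit(dev)
dev.process()   # stalls on the wait: nothing has run yet
ready.value = 1 # e.g. another queue or the host updates the signal
dev.process()   # now the kernel runs and `done` reaches 1
print(log, done.value)  # ['kernel'] 1
```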