mirror of https://github.com/commaai/tinygrad.git
cd97b036cc
* triton can add * print stuff from triton * write out file * ops triton working * reduce ops * sort of works * Triton bugfixes & implementation of remaining ops (#490) * padding * support pow, max, relu, gt0 * allocate return buffer * Fix reduce * Add tests for power op * Fix triton illegal memory accesses and memory leak (#512) * Fix mypy issue * Add triton to setup.py * Replace torch with pycuda * Use one cuda stream for data transfer and kernels * Remove triton submodule * Fix memory leak by using weakrefs for caching * Fix memory access by adding valid as mask for load * Fix invalid kernel launches by flattening the grid (#515) --------- Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> |
||
---|---|---|
.. | ||
test.py |