.gitignore | … |
amx.py | update amx gemm (#3991) | 2024-03-29 11:45:03 -04:00
cuda_matmul.py | fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531) | 2024-02-28 17:15:32 -08:00
fuzz_matmul.py | wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216) | 2024-04-22 16:50:31 -04:00
gemm.c | only 62 gflops (#2629) | 2023-12-05 13:28:24 -08:00
gemm.py | only 62 gflops (#2629) | 2023-12-05 13:28:24 -08:00
hip_matmul.py | retire hsa (#4885) | 2024-06-09 11:33:03 +03:00
intel_xmx.py | Intel XMX Tensor Core Support (#5622) | 2024-08-16 09:19:21 -07:00
jax_pmatmul.py | jax parallel matmul example | 2023-11-28 13:48:11 -08:00
metal_conv.py | create engine folder and move code (#3948) | 2024-03-26 20:38:03 -07:00
metal_matmul.py | create engine folder and move code (#3948) | 2024-03-26 20:38:03 -07:00
metal_matvec.py | move GlobalCounter to helpers (#4002) | 2024-03-30 00:30:30 -04:00
mlx_matmul.py | mlx benchmark, a lil slower than tg | 2023-12-05 19:00:43 -08:00
real_pmatmul.py | pmatmul example + GB/s bugfix [run_process_replay] (#5974) | 2024-08-07 22:32:11 -07:00
simple_conv.py | wmma: refactor to remove wmma_func and create TC funcs as needed (#3945) | 2024-03-27 16:43:09 -04:00
simple_matmul.py | test: add fuzz_matmul and better debugging for simple_matmul (#4199) | 2024-04-16 23:40:31 -04:00
simple_matvec.py | extra/gemm/simple_matvec: add simple_matvec.py (#4021) | 2024-03-31 16:38:52 -04:00
tf_gemm.py | Add tensorflow GEMM benchmark script (#1000) | 2023-06-18 10:57:45 -07:00
tinygrad_nv_matmul.py | work to make GEMV fast (#5824) | 2024-07-30 17:41:40 -07:00
torch_gemm.py | faster RDNA assembly backend (#990) | 2023-06-16 12:06:38 -07:00
triton_nv_matmul.py | extra/gemm/triton_nv_matmul: fix Program arguments (#6212) | 2024-08-20 14:05:38 -07:00
tvm_gemm.py | lowerer is kernel [run_process_replay] (#5437) | 2024-07-12 18:50:55 -07:00