Francis Lam
bbb0ad4800
wmma: widen TC usage in search by using PADTO on TC axes when possible ( #4216 )
* wmma: widen TC usage in search by using PADTO on TC axes when possible
* test: start tests for the new padding TC behavior
* search: upgrade padded TC search to TC_OPT >= 2
* test: add behavior and correctness test for padded TC
added optional argument to apply_tensor_core to set TC_OPT level
* linearizer: add tests for the PADTO behavior and docs
2024-04-22 16:50:31 -04:00
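A rough sketch of the PADTO idea above: round a tensor-core axis up to the next tile multiple so WMMA still applies to shapes that don't divide evenly (illustrative Python only; pad_to_multiple and the 16-wide tile are assumptions, not tinygrad's actual apply_tensor_core code):

    def pad_to_multiple(dim: int, tile: int) -> int:
        # Round dim up to the next multiple of tile, e.g. 60 -> 64 for a 16-wide TC axis.
        return (dim + tile - 1) // tile * tile

    # Hypothetical 60x100 @ 100x60 matmul padded so a 16x16x16 tensor-core tile
    # divides every axis; the padded region would be masked/zero-filled.
    M, N, K = 60, 60, 100
    print([pad_to_multiple(d, 16) for d in (M, N, K)])  # [64, 64, 112]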
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul ( #4199 )
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
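The fuzzing approach can be pictured with a generic NumPy check (a sketch with assumed shapes and tolerances, not the actual fuzz_matmul or simple_matmul code):

    import numpy as np

    def fuzz_matmul(matmul, trials=10, rtol=1e-3, atol=1e-3, seed=0):
        # Compare an arbitrary matmul implementation against NumPy on random shapes.
        rng = np.random.default_rng(seed)
        for _ in range(trials):
            M, K, N = (int(x) for x in rng.integers(1, 256, size=3))
            a = rng.standard_normal((M, K), dtype=np.float32)
            b = rng.standard_normal((K, N), dtype=np.float32)
            np.testing.assert_allclose(matmul(a, b), a @ b, rtol=rtol, atol=atol)

    fuzz_matmul(lambda a, b: a @ b)  # trivially passes against NumPy itself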
Francis Lam
dcb58d3bed
extra/gemm/simple_matvec: add simple_matvec.py ( #4021 )
we can test with this or add it to CI for benchmarks
2024-03-31 16:38:52 -04:00
Francis Lam
04746022b1
extra/gemm/hip_matmul: fix to use new HSA devices and no headers ( #3999 )
* extra/gemm/hip_matmul: fix to use new HSA devices and no headers
* remove compile_hip import
2024-03-30 15:42:23 -04:00
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
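The general shape of that fix, moving the shared class into a leaf module both sides can import, is roughly (attribute names here are illustrative, not the exact tinygrad definitions):

    # helpers.py -- leaf module that imports nothing from ops or buffer
    class GlobalCounters:
        global_ops: int = 0
        global_mem: int = 0

    # ops.py and buffer.py then both do
    #   from helpers import GlobalCounters
    # instead of importing each other, which was the source of the cycle.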
Akshit Talwar
0affbbf81c
update amx gemm ( #3991 )
2024-03-29 11:45:03 -04:00
chenyu
b47f6cebb2
LinearizerOptions -> CompilerOptions ( #3978 )
2024-03-28 17:50:23 -04:00
Francis Lam
7c5729a3bd
wmma: refactor to remove wmma_func and create TC funcs as needed ( #3945 )
* wmma: refactor to remove wmma_func and create TC funcs as needed
* test_linearizer: disable bf16 CUDA during emulation testing
* cstyle: clean up creation of CUDA vec dtypes
* extra/gemm: add option to accumulate to bfloat16
* cleanups
* benchmark: add CUDA bfloat16 matmul
* more cleanups
2024-03-27 16:43:09 -04:00
George Hotz
68ca4d4276
split to schedule.py ( #3949 )
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
George Hotz
778d17fbd3
intel matmul ( #3830 )
* almost right
* intel xmx
2024-03-25 22:37:20 -07:00
Francis Lam
a26090d404
search: change to use "spawn" and limit the number of tasks per child ( #3862 )
also clean up some examples to use __main__ and not initialize
resources outside of main
2024-03-21 21:23:36 -07:00
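The pattern referred to here, "spawn" workers, a cap on tasks per child, and setup kept under a __main__ guard, looks roughly like this (an illustrative sketch; compile_and_time is a hypothetical stand-in for the real search worker):

    import multiprocessing as mp

    def compile_and_time(candidate):
        # Stand-in for the per-candidate work a search worker would do.
        return candidate * candidate

    if __name__ == "__main__":
        # "spawn" starts each worker with a fresh interpreter (no inherited GPU or
        # driver state), and maxtasksperchild recycles workers before leaks build up.
        ctx = mp.get_context("spawn")
        with ctx.Pool(processes=4, maxtasksperchild=16) as pool:
            print(pool.map(compile_and_time, range(8)))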
Caleb Bunch
0b1fc5888a
fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py ( #3531 )
2024-02-28 17:15:32 -08:00
George Hotz
2e60012bcf
move create schedule and delete old API ( #3377 )
* move create schedule and delete old API
* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
41efaa848c
move graph.py and jit.py into features ( #3376 )
* move graph.py into features
* move jit into features
* fix quickstart
2024-02-12 17:34:34 +01:00
Yoshinori Sano
98c732cf9d
fix metal compile error in extra/gemm ( #3365 )
2024-02-10 12:54:41 +01:00
Francis Lam
4273aabe31
extra/gemm: add a simple_conv.py along with correctness check ( #3236 )
* extra/gemm: add a simple_conv.py along with correctness check
The goal is to easily test tensor core triggering situations
* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
Ahmed Harmouche
168b1f879c
Fix hip_matmul gemm in extra ( #3241 )
2024-01-25 16:03:04 -08:00
Francis Lam
ddbdb52f77
wmma: enable METAL half tensor cores and clean up cstyle ( #3095 )
* wmma: enable METAL half tensor cores and clean up cstyle
* revert simple_matmul rand changes and break line in tensor
* added metal fp16->fp32 tensor core
2024-01-12 16:25:28 -05:00
chenyu
1d730b8853
remove ACCUM_FP32 in simple_matmul.py ( #3045 )
* remove ACCUM_FP32 in simple_matmul.py
accumulation for half inputs is always in float
* move test llama compile speed to metal
2024-01-08 17:37:57 -05:00
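The accumulation-dtype point can be seen with a small NumPy comparison, accumulating the same half-precision inputs in float32 versus float16 (illustrative only; the exact error depends on size and data):

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal((256, 256)).astype(np.float16)
    b = rng.standard_normal((256, 256)).astype(np.float16)

    acc32 = a.astype(np.float32) @ b.astype(np.float32)  # accumulate in float32
    acc16 = np.zeros((256, 256), dtype=np.float16)
    for k in range(a.shape[1]):                           # accumulate in float16
        acc16 += a[:, [k]] * b[[k], :]

    # The float16-accumulated result drifts measurably from the float32 one.
    print(np.abs(acc32 - acc16.astype(np.float32)).max())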
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d
move globalcounters to ops ( #2960 )
* move globalcounters to ops
* missed a few
* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
7da2325dc7
get_lazyops() -> lazyops ( #2884 )
* get_lazyops() -> lazyops
* don't compare empty mem
2023-12-20 18:04:49 -08:00
Rory Clear
f409b57854
update metal matmul and matvec for new device style ( #2732 )
* update for new device style
* create device before compile
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-17 16:15:07 -05:00
Nguyen Nguyen Phuong
07cf45e133
fix cuda matmul ( #2725 )
2023-12-12 07:59:31 -08:00
George Hotz
b5fd160b39
hotfix: increase rtol on simple_matmul
2023-12-11 10:10:29 -08:00
George Hotz
a73579919f
mlx benchmark, a lil slower than tg
2023-12-05 19:00:43 -08:00
George Hotz
0be5d16950
only 62 gflops ( #2629 )
2023-12-05 13:28:24 -08:00
Yixiang Gao
fde44aed76
update hip_matmul with new abstraction ( #2605 )
2023-12-04 13:37:10 -08:00
Jake
5588922884
Update cuda_matmul.py ( #2495 )
2023-11-28 19:46:01 -08:00
George Hotz
3f137b134a
jax parallel matmul example
2023-11-28 13:48:11 -08:00
Davi Silva
186ac77ec3
Update hip_matmul.py ( #2480 )
2023-11-27 18:36:19 -08:00
George Hotz
9e07824542
move device to device.py ( #2466 )
* move device to device.py
* pylint test --disable R,C,W,E --enable E0611
* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
0cbf6c1811
move things, clean up extra ( #2292 )
* move things
* idk why pylint needs that now
* delete unused
2023-11-13 20:18:40 -08:00
Rory Clear
553688f12a
update metal matmul and matvec for compile api ( #2238 )
2023-11-08 08:08:35 -08:00
George Hotz
2f7aab3d13
move optimize_local_size ( #2221 )
* move optimize_local_size
* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz
5472a14544
openpilot compile2 ( #1977 )
* start compile2
* tweak
* why are there two more kernels?
* minor cleanups
* don't break onnx tests
* add __metadata__ support to safetensors
* no early realize in onnx
* cleanups
* bugfix
* clean up image type, add optimize
* opt to match old
* try that
* opt work
* run compile2
* optimizer
* prt more
* prerealize
* imp
* NOLOCALS works
* no locals means no locals
* support fractional globals
* all locals welcome
* int that
* cleanups
* show gemv regression
* clean up diff
* use idx for the cond
* nolocals
---------
Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
George Hotz
8db92bd060
fix tvm gemm example
2023-10-08 05:57:41 -07:00
Francis Lam
dece9958f8
wmma: clean up to make WMMA arg order consistent ( #2014 )
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00
Francis Lam
0ba75c4370
optimizer: add matvec optimizations ( #1972 )
* optimizer: add matvec optimizations
* renderer: fix alignment of shared memory in opencl
2023-10-04 14:16:27 -07:00
George Hotz
717451a244
Revert "optimizer: add matvec optimizations ( #1753 )" ( #1959 )
This reverts commit f520323054.
2023-10-03 00:28:42 -07:00
Francis Lam
f520323054
optimizer: add matvec optimizations ( #1753 )
* optimizer: add matvec optimizations
* Update optimizer.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 00:01:59 -07:00
Francis Lam
f445e056ed
wmma: add test and tensor core shape ( #1925 )
2023-09-28 18:04:28 -07:00
George Hotz
c36d0e3bd8
tvm import hook
2023-09-28 09:24:32 -07:00
qazal
d0e752003d
fixes ( #1893 )
2023-09-22 07:20:27 +08:00
George Hotz
4613c9e77c
add tvm example, formatting ( #1813 )
* add tvm example
* no realize
2023-09-07 11:50:41 -07:00
Pavol Rusnak
52a92bf95d
use class Foo: instead of class Foo(): ( #1797 )
* use class Foo: instead of class Foo():
* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
George Hotz
a6d842af7a
move device to ops ( #1646 )
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
e464442adf
WMMA for 7900XTX ( #1563 )
* go
* hip no LRU
* work
* works
* 16 TFLOPS
* 29 TFLOPS
* 30 TFLOPS
* never mind, it's 60 TFLOPS
* fix metal WMMA
* put hip alloc back
2023-08-19 09:07:23 -07:00
George Hotz
c417cd3c97
fast HIP gemm -> 100 TFLOPS ( #1476 )
* fast HIP gemm
* wmma
* correct b
* fix spilling
* 60 TFLOPS
* 64 TFLOPS
* 65 TFLOPS
2023-08-09 06:54:15 -07:00
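For reference on what headline numbers like 60 or 100 TFLOPS mean for a square GEMM: the usual count is 2*M*N*K floating-point operations (one multiply plus one add per accumulated term) divided by wall time (the size and timings below are arbitrary examples, not the ones behind these commits):

    N = 4096
    flops = 2 * N ** 3                                 # ~137.4 GFLOP for a 4096^3 GEMM
    for seconds in (2.29e-3, 1.37e-3):
        print(f"{flops / seconds / 1e12:.1f} TFLOPS")  # ~60.0, then ~100.3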