Commit Graph

76 Commits

Author SHA1 Message Date
Francis Lam bbb0ad4800
wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216)
* wmma: widen TC usage in search by using PADTO on TC axes when possible

* test: start tests for the new padding TC behavior

* search: upgrade padded TC search to TC_OPT >= 2

* test: add behavior and correctness test for padded TC

added an optional argument to apply_tensor_core to set the TC_OPT level

* linearizer: add tests for the PADTO behavior and docs
2024-04-22 16:50:31 -04:00
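
The PADTO trick in the commit above generalizes beyond tinygrad: pad matrix axes up to a multiple of the tensor-core tile size so the fast path applies, compute on the padded buffers, then slice the padding back off; the zeros contribute nothing to the product. A minimal numpy sketch of the idea (the tile size of 16 and the pad_to_multiple helper are illustrative assumptions, not tinygrad's API):

```python
import numpy as np

TILE = 16  # assumed tensor-core tile size; real tile shapes vary by hardware

def pad_to_multiple(x: np.ndarray, multiple: int) -> np.ndarray:
    # hypothetical helper: zero-pad every axis up to the next multiple
    return np.pad(x, [(0, (-d) % multiple) for d in x.shape])

A = np.random.rand(30, 50).astype(np.float32)  # shapes not tile-aligned
B = np.random.rand(50, 70).astype(np.float32)

Ap, Bp = pad_to_multiple(A, TILE), pad_to_multiple(B, TILE)
Cp = Ap @ Bp                                   # tile-aligned matmul
C = Cp[:A.shape[0], :B.shape[1]]               # strip the padding

np.testing.assert_allclose(C, A @ B, rtol=1e-5)
```
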
Francis Lam c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
Francis Lam dcb58d3bed
extra/gemm/simple_matvec: add simple_matvec.py (#4021)
we can test with this or add it to CI for benchmarks
2024-03-31 16:38:52 -04:00
Francis Lam 04746022b1
extra/gemm/hip_matmul: fix to use new HSA devices and no headers (#3999)
* extra/gemm/hip_matmul: fix to use new HSA devices and no headers

* remove compile_hip import
2024-03-30 15:42:23 -04:00
chenyu c71627fee6
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
Akshit Talwar 0affbbf81c
update amx gemm (#3991) 2024-03-29 11:45:03 -04:00
chenyu b47f6cebb2
LinearizerOptions -> CompilerOptions (#3978) 2024-03-28 17:50:23 -04:00
Francis Lam 7c5729a3bd
wmma: refactor to remove wmma_func and create TC funcs as needed (#3945)
* wmma: refactor to remove wmma_func and create TC funcs as needed

* test_linearizer: disable bf16 CUDA during emulation testing

* cstyle: clean up creation of CUDA vec dtypes

* extra/gemm: add option to accumulate to bfloat16

* cleanups

* benchmark: add CUDA bfloat16 matmul

* more cleanups
2024-03-27 16:43:09 -04:00
George Hotz 68ca4d4276
split to schedule.py (#3949)
* split to schedule.py

* split
2024-03-26 21:02:46 -07:00
George Hotz 150ea2eb76
create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
George Hotz 778d17fbd3
intel matmul (#3830)
* almost right

* intel xmx
2024-03-25 22:37:20 -07:00
Francis Lam a26090d404
search: change to use "spawn" and limit the number of tasks per child (#3862)
also clean up some examples to use __main__ and not initialize
resources outside of main
2024-03-21 21:23:36 -07:00
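
The __main__ hygiene mentioned in the commit above matters because the "spawn" start method re-imports the main module in every worker, so any module-level resource initialization would run once per child. A minimal sketch of the pattern, assuming a stand-in worker function (the real search tasks compile and time kernels):

```python
import multiprocessing as mp

def run_task(opts: int) -> int:
    # stand-in for per-task work
    return opts * opts

if __name__ == "__main__":
    # "spawn" re-imports this module in each worker, so anything that
    # allocates devices or other resources must stay under this guard.
    ctx = mp.get_context("spawn")
    # maxtasksperchild bounds how many tasks a worker runs before being
    # replaced, limiting accumulated per-process state.
    with ctx.Pool(processes=4, maxtasksperchild=16) as pool:
        print(pool.map(run_task, range(8)))
```
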
Caleb Bunch 0b1fc5888a
fix 'ImportError: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531) 2024-02-28 17:15:32 -08:00
George Hotz 2e60012bcf
move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz 41efaa848c
move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
Yoshinori Sano 98c732cf9d
fix metal compile error in extra/gemm (#3365) 2024-02-10 12:54:41 +01:00
Francis Lam 4273aabe31
extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check

The goal is to easily test tensor core triggering situations

* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
Ahmed Harmouche 168b1f879c
Fix hip_matmul gemm in extra (#3241) 2024-01-25 16:03:04 -08:00
Francis Lam ddbdb52f77
wmma: enable METAL half tensor cores and clean up cstyle (#3095)
* wmma: enable METAL half tensor cores and clean up cstyle

* revert simple_matmul rand changes and break line in tensor

* added metal fp16->fp32 tensor core
2024-01-12 16:25:28 -05:00
chenyu 1d730b8853
remove ACCUM_FP32 in simple_matmul.py (#3045)
* remove ACCUM_FP32 in simple_matmul.py

accumulation for half inputs is always in float

* move test llama compile speed to metal
2024-01-08 17:37:57 -05:00
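
The rationale in the commit above is easy to demonstrate: summing many fp16 products in an fp16 accumulator rounds every partial sum, while an fp32 accumulator keeps the precision. A small numpy sketch, independent of tinygrad:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float16)
b = rng.standard_normal(4096).astype(np.float16)

ref = np.dot(a.astype(np.float64), b.astype(np.float64))

# fp16 inputs with an fp32 accumulator: close to the fp64 reference
acc32 = np.dot(a.astype(np.float32), b.astype(np.float32))

# fp16 accumulator: every partial sum is rounded back to half precision
acc16 = np.float16(0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + x * y)

print(abs(acc32 - ref))         # small
print(abs(float(acc16) - ref))  # noticeably larger
```
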
George Hotz a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz 7da2325dc7
get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops

* don't compare empty mem
2023-12-20 18:04:49 -08:00
Rory Clear f409b57854
update metal matmul and matvec for new device style (#2732)
* update for new device style

* create device before compile

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-17 16:15:07 -05:00
Nguyen Nguyen Phuong 07cf45e133
fix cuda matmul (#2725) 2023-12-12 07:59:31 -08:00
George Hotz b5fd160b39 hotfix: increase rtol on simple_matmul 2023-12-11 10:10:29 -08:00
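
For context on that hotfix: assuming the check is numpy's assert_allclose (which simple_matmul-style scripts typically use), the comparison passes when |actual - desired| <= atol + rtol * |desired|, so raising rtol widens the accepted relative error:

```python
import numpy as np

desired = np.float32(100.0)
actual = np.float32(100.4)

np.testing.assert_allclose(actual, desired, rtol=5e-3)    # 0.4 <= 0.5, passes
# np.testing.assert_allclose(actual, desired, rtol=1e-3)  # would raise: 0.4 > 0.1
```
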
George Hotz a73579919f mlx benchmark, a lil slower than tg 2023-12-05 19:00:43 -08:00
George Hotz 0be5d16950
only 62 gflops (#2629) 2023-12-05 13:28:24 -08:00
Yixiang Gao fde44aed76
update hip_matmul with new abstraction (#2605) 2023-12-04 13:37:10 -08:00
Jake 5588922884
Update cuda_matmul.py (#2495) 2023-11-28 19:46:01 -08:00
George Hotz 3f137b134a jax parallel matmul example 2023-11-28 13:48:11 -08:00
Davi Silva 186ac77ec3
Update hip_matmul.py (#2480) 2023-11-27 18:36:19 -08:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz 0cbf6c1811
move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
Rory Clear 553688f12a
update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz 2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz 5472a14544
openpilot compile2 (#1977)
* start compile2

* tweak

* why are there two more kernels?

* minor cleanups

* don't break onnx tests

* add __metadata__ support to safetensors

* no early realize in onnx

* cleanups

* bugfix

* clean up image type, add optimize

* opt to match old

* try that

* opt work

* run compile2

* optimizer

* prt more

* prerealize

* imp

* NOLOCALS works

* no locals means no locals

* support fractional globals

* all locals welcome

* int that

* cleanups

* show gemv regression

* clean up diff

* use idx for the cond

* nolocals

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
George Hotz 8db92bd060 fix tvm gemm example 2023-10-08 05:57:41 -07:00
Francis Lam dece9958f8
wmma: clean up to make WMMA arg order consistent (#2014)
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00
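
"Cache defeat" in the commit above refers to keeping a GEMM benchmark honest: if identical buffers are reused every iteration, caching at some layer can make timings unrealistically good. One common form of the technique, sketched here with plain numpy (names and sizes are illustrative):

```python
import time
import numpy as np

N = 1024
flop = 2 * N * N * N  # multiply-adds in an N x N x N matmul

for _ in range(5):
    # fresh random inputs each iteration so no layer can reuse a cached result
    a = np.random.rand(N, N).astype(np.float32)
    b = np.random.rand(N, N).astype(np.float32)
    t0 = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - t0
    print(f"{flop / dt * 1e-9:.1f} GFLOPS")
```
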
Francis Lam 0ba75c4370
optimizer: add matvec optimizations (#1972)
* optimizer: add matvec optimizations

* renderer: fix alignment of shared memory in opencl
2023-10-04 14:16:27 -07:00
George Hotz 717451a244
Revert "optimizer: add matvec optimizations (#1753)" (#1959)
This reverts commit f520323054.
2023-10-03 00:28:42 -07:00
Francis Lam f520323054
optimizer: add matvec optimizations (#1753)
* optimizer: add matvec optimizations

* Update optimizer.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 00:01:59 -07:00
Francis Lam f445e056ed
wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
George Hotz c36d0e3bd8 tvm import hook 2023-09-28 09:24:32 -07:00
qazal d0e752003d
fixes (#1893) 2023-09-22 07:20:27 +08:00
George Hotz 4613c9e77c
add tvm example, formatting (#1813)
* add tvm example

* no realize
2023-09-07 11:50:41 -07:00
Pavol Rusnak 52a92bf95d
use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
George Hotz a6d842af7a
move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz e464442adf
WMMA for 7900XTX (#1563)
* go

* hip no LRU

* work

* works

* 16 TFLOPS

* 29 TFLOPS

* 30 TFLOPS

* never mind, it's 60 TFLOPS

* fix metal WMMA

* put hip alloc back
2023-08-19 09:07:23 -07:00
George Hotz c417cd3c97
fast HIP gemm -> 100 TFLOPS (#1476)
* fast HIP gemm

* wmma

* correct b

* fix spilling

* 60 TFLOPS

* 64 TFLOPS

* 65 TFLOPS
2023-08-09 06:54:15 -07:00