tinygrad/.github/workflows
CaltropHungerton 38fb1e14a2
Intel XMX Tensor Core Support (#5622)
* fixed xmx demo

* i think i'm invoking the DPAS but it's slow

* compiler build arg to stop register spilling, indicated where to fix flop counter

* don't mind this

* do NOT mind me

* do not mind me

* do not view

* i will add bf16 later

* in process of figuring out tc fields

* we figured out the fields!!!

* added check for cl device vendor, added separate IntelRenderer

* remove tc thread_local_aliases

* cleaning debris before draft pr

* edits for linter

* deduping and checking device extensions

* i will find more line reductions in other places

* before merge upstream

* double grf size in compiler to fix register spilling (bandaid), device checking changes

* tc python emulation

* fixed emulation

* tests for emulated intel tensor core

* TC=0, 1 working on upstream, fixed perf

* test

* debris

* check for specialized cl device when we canonicalize device

* bf16 support, tc=3 test added

* address tests

* revert half2 loads on intel tc, cleanup

* linter

* fold_expanded revert

* lint, whitespace fix

* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too

* make line shorter, no need for noqa E501

* removed device intel

* fix python emulation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
benchmark.yml       load balance NV benchmark ci (#6107)     2024-08-16 10:08:08 -04:00
docs.yml            add strict mkdocs check (#5497)          2024-07-15 14:21:37 -07:00
python-publish.yml  update gh actions (#3033)                2024-01-09 17:52:22 -08:00
szdiff.yml          update gh actions (#3033)                2024-01-09 17:52:22 -08:00
test.yml            Intel XMX Tensor Core Support (#5622)    2024-08-16 09:19:21 -07:00