qazal
9d2ea94fe9
temp: disable process replay on metal ( #6062 )
2024-08-13 16:31:55 +03:00
nimlgen
8f787785d9
fix openpilot benchmark ( #6049 )
2024-08-12 21:12:32 +03:00
chenyu
e6c7c3e499
update pylint path to check indent/space for all ( #6022 )
...
also fixed many errors. it was not checking nested dirs. exclude autogen for now.
can we use ruff for this?
2024-08-10 14:41:09 -04:00
George Hotz
cfb04c67d1
run unit tests separate from others (and only once) ( #6020 )
...
* run unit tests separate from others
* ignore unit tests elsewhere
2024-08-10 11:17:56 -07:00
qazal
266afad8ed
hotfix: skip schedule capture in benchmarks ( #6012 )
2024-08-10 17:13:53 +03:00
qazal
24c7c41ce0
diff LazyBuffer schedules in process replay ( #5996 )
...
* start diff printing
* this should be 2
* add to process_replay.py
* enable schedule capture
* arange diff is process replay
2024-08-09 14:16:43 +03:00
George Hotz
3d445039c2
hotfix: 8800 lines for AMX+intel tc
2024-08-06 17:50:26 -07:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI ( #5905 )
...
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
George Hotz
7348c40d9d
sampling time sync (8700 lines) ( #5843 )
...
* sampling time sync
* jitter matrix
* comment
* pass mypy
* line count
2024-08-02 14:44:35 -07:00
wozeparrot
acadccf344
comma benchmark ( #5518 )
2024-08-02 14:36:54 -07:00
chenyu
f27f949a5d
Revert "revert some UOp IDIV bound ( #5863 )" ( #5871 )
...
This reverts commit 0c8d202348.
2024-08-01 21:38:31 -04:00
chenyu
df138bc558
Revert "revert a mod pattern ( #5864 )" ( #5870 )
...
This reverts commit 5c8de2d044.
2024-08-01 20:44:26 -04:00
chenyu
1b0314d9ef
Revert "remove one more UOp mod pattern ( #5865 )" ( #5868 )
...
This reverts commit b03b8e18c2.
2024-08-01 20:28:35 -04:00
chenyu
b03b8e18c2
remove one more UOp mod pattern ( #5865 )
...
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu
5c8de2d044
revert a mod pattern ( #5864 )
...
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
chenyu
0c8d202348
revert some UOp IDIV bound ( #5863 )
...
* revert some UOp IDIV bound
breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI
* those are correct
* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz
5eedd9e3ad
raise the line ceiling to 8600. USE LINES CAREFULLY
2024-07-31 09:56:39 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark ( #5806 )
2024-07-30 12:05:36 -07:00
chenyu
cb6718347f
`python -m mkdocs build --strict` in CI ( #5800 )
2024-07-29 16:46:30 -04:00
chenyu
be3899d211
hotfix increase ci timeout to 20 minutes ( #5799 )
...
when the cache is cleared it takes time to repopulate it
2024-07-29 16:25:27 -04:00
chenyu
471b188d79
fix mypy errors in latest mypy ( #5794 )
...
* fix mypy errors in latest mypy
mypy has stricter partial and api arg checks now
* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
George Hotz
0392123e6e
TC=2 still sets tensor cores (and TC=3 support for locals) ( #5780 )
...
* TC=2 still sets tensor cores
* add TC=3 support for using locals
* bugfix
* lines + TC=3 tests
* CUDA can use threads, fix fuzz linearizer
2024-07-28 16:16:53 -07:00
qazal
3e49d86c01
process replay diffs 3 things now ( #5731 )
...
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
qazal
57b4a8e98d
assert process replay asserts ( #5737 )
...
* assert process replay asserts
* one ci job is fine
* test: Revert "separate process replay main loop (#5734 )"
This reverts commit 94d578396f.
* mac sed needs that
* Revert "test: Revert "separate process replay main loop (#5734 )""
This reverts commit e4ad7684d5472a64841a66b43bc1db7c9bbbf9e8.
* disable process replay capture
* save time
* amd is tiny
* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV ( #5746 )
2024-07-26 16:56:41 -07:00
chenyu
eff7c5fd2c
halve kernel counts in metal Fuzz Test linearizer ( #5716 )
...
the test time has increased to 3 minutes
2024-07-25 14:35:11 -04:00
chenyu
7c8fe0fe47
skip interpolate tests for PYTHON=1 ( #5664 )
2024-07-23 18:47:15 -04:00
George Hotz
e3f00ac77d
Fix cuda tc emu test ( #5663 )
...
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
* fix test emulated CUDA tensor cores
* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
qazal
fdfc0015a7
[run_process_replay] for opencl/openpilot ( #5009 )
...
* lil reset script
* find the prg
* use lower_schedule_item
* add process replay back
* cleanups
2024-07-18 19:42:33 +03:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging ( #5529 )
2024-07-17 20:40:57 -07:00
George Hotz
d3b098299d
add failing regression test for image ( #5540 )
...
* add failing regression test for image
* tg type
* simpler test
* don't realize image to image casts caused issue
* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging ( #5541 )
2024-07-17 17:11:52 -07:00
Alessandro Benetti
13e200b437
add strict mkdocs check ( #5497 )
2024-07-15 14:21:37 -07:00
qazal
40ec9410f9
simpler process replay ( #5452 )
...
* remove check_process_replay
* that can go to the top
* add assert back
* [run_process_replay]
* checkout code [run_process_replay]
* temp [run_process_replay]
* revert temp [run_process_replay]
* ahh this is why [run_process_replay]
* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
George Hotz
955e1179fb
move compile tests and merge ( #5451 )
...
* move compile tests and merge
* revert enet move, bump download cache
* oh, try setting clang
2024-07-13 08:04:46 -07:00
chenyu
9a187e6102
fix handcode_opt script ( #5435 )
...
* fix handcode_opt script
* run in ci
* real run in ci
* HALF=0
2024-07-12 20:52:28 -04:00
George Hotz
b055ece550
hotfix: bump to cache gpuocelot
2024-07-12 13:54:14 -07:00
chenyu
b17e4adb3a
add `-c advice.detachedHead=false` to process replay git checkout ( #5419 )
...
remove the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` in log
2024-07-12 15:13:26 -04:00
qazal
31fcc516dc
more process replay tooling ( #5407 )
...
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
Roelof van Dijk
6ec7dbc287
ci: parallelize uops tests ( #5405 )
2024-07-12 11:22:41 +03:00
qazal
b91a0ccdc3
make [run_process_replay] [no_assert] the default ( #5390 )
2024-07-11 22:36:59 +03:00
qazal
004366b193
context aware process replay [run_process_replay] ( #5378 )
...
* test tc as ctx var
* remove from opts
* process replay
* pop variable
* B -> Variable
* fix re-assign
* pop temp vars
* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
chenyu
2396ab9b33
more transcend cleanup [run_process_replay] ( #5369 )
...
fix test name, fewer `# noqa: E501`, and removed the cast
2024-07-10 23:05:03 -04:00
chenyu
64986f949c
more transcend math tests in ci ( #5368 )
...
* more transcend math tests in ci
test large inputs to trig functions that hit a different reduction algorithm, and test TRANSCENDENTAL=2 for all backends
* no CUDACPU
* try that
2024-07-10 21:19:09 -04:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively making gpt2 default to jit
* fix test_gpt2
2024-07-09 15:04:43 -04:00
Ian Paul
d5a68ae6b3
Simple abstractions3.py fix ( #5343 )
...
* abstractions3.py fix
* Add abstractions3.py to CI tests
2024-07-09 13:48:42 +03:00
chenyu
631bc974a0
raise line count limit to 8500 ( #5331 )
2024-07-08 14:00:28 -04:00
SnakeOnex
8c03816ae9
fix README example ( #5284 )
...
* fixed README example
* README test
* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
chenyu
191463a919
add timing to SDXL ( #5273 )
2024-07-02 23:29:54 -04:00
chenyu
5808c37302
hotfix disable flaky llama3 beam benchmark on green ( #5249 )
2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry ( #5248 )
...
* Revert "use threefry in stable diffusion benchmark (#4988 )"
This reverts commit 44dfa37c70.
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
nimlgen
57e89645cd
hcq spec test ( #5226 )
...
* start hcq spec test
* more test
* fixes
* run on amd as well
* test amdgpu exec
* fix amd
* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
chenyu
88763eb9ff
fix stable_diffusion with fp16 ( #5239 )
2024-06-30 12:59:31 -04:00
nimlgen
dd7eef7d71
libc defs to autogen ( #5217 )
...
* libc defs to autogen
* amd import libc
* linter
* better a bit
* remove comment, check this
* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
6b08cb5e38
ptx runs on nv in benchmarks ( #5224 )
2024-06-29 11:06:44 +03:00
nimlgen
b4c49ae3fa
remove cudacpu in favour of mockgpu ( #5225 )
...
* remove cudacpu in favour of mockgpu
* remove unused import
* not used as well
2024-06-29 11:05:16 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark ( #5198 )
...
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558
use NV instead of CUDA in benchmark ( #5192 )
...
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark ( #5191 )
...
ignoring beam cache but using compile cache should be fine, saved some benchmark time.
also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
chenyu
c12de4f47d
benchmark use JITBEAM for llama and gpt2 ( #5189 )
2024-06-27 12:56:02 -04:00
qazal
3af17849bf
safely parse quoted titles [run_process_replay] ( #5183 )
2024-06-27 16:39:48 +03:00
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] ( #5154 )
...
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
qazal
8aa786232d
docs for running process replay locally ( #5083 )
2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe
io_uring for copies from disk ( #5035 )
...
* exp uring
* fixes and old version
* nv
* cleaner
* cmp vs aio
* fix
* no lib
* fix nv
* linter
* disk_speed_test now runs default
* fixes
* uring -> io_uring
* linter happy
* get_temp_buf comment added
* tiny nits
* put wait back
* test runs everywhere
* remove consts
* remove mmap consts
* do not require io_uring to run tests, they are generic
2024-06-21 11:36:51 +03:00
qazal
97f1347dd9
fix check_process_replay for special characters ( #5072 )
...
* 'test' [run_process_replay] [no_assert]
* test with ( ) { } '' " "
* remove the log [run_process_replay] '' () { } '{
* helpful echos [run_process_replay] [no_assert] () ''
* test [run_process_replay] [no_assert]
* test2 [run_process_replay] [no_assert]
* test3 [run_process_replay] [no_assert]
* it's also correct this way [run_process_replay] [no_assert]
* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
qazal
a6a5dba637
Revert "UPat for has_valid in load/store ( #5052 )" ( #5056 )
...
* manually insert in the Linearizer
* fix process replay
2024-06-19 20:53:36 +03:00
qazal
ee01e464e3
use process replay as a diff creator ( #4903 )
...
* add no_assert option [run_process_replay] [no_assert]
* test [run_process_replay] [no_assert]
* [run_process_replay]
* back to normal [run_process_replay]
* remove the log
2024-06-19 18:17:31 +03:00
chenyu
dc942bf1f6
jit sampling function in test_randomness.test_multinomial ( #5034 )
...
* jit sampling function in test_randomness.test_multinomial
`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec
* skip that
2024-06-18 14:21:05 -04:00
chenyu
e9c6a36894
remove CACHELEVEL=0 in llama3 benchmark ( #5025 )
2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d
RECIP(-0.0) should be -inf ( #5024 )
...
* RECIP(-0.0) should be -inf
added test_dtype_alu for PYTHON backend
* catch that
* fix those two
2024-06-17 22:26:58 -04:00
George Hotz
bee8fc29ee
add GPT2 half/half+beam to AMD ( #5000 )
...
* add GPT2 half/half+beam to AMD
* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark ( #4988 )
...
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes ( #4971 )
2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3
hotfix: don't use ASSERT_COMPILE for benchmarks process replay ( #4981 )
...
* use replay_codegen [run_process_replay]
* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06
Trigger process replay from pull request title [run_process_replay] ( #4980 )
...
* Trigger process replay from pull request title
* idk how this thing works btw
* test if it will work
* try 2
* Revert "idk how this thing works btw"
This reverts commit 580da51b07a243020f79b1c333c8a2349ea00beb.
* Revert "try 2"
This reverts commit 7ff1e86d5d15d1a1745a139db1e1c13c5903b366.
* test if it works
* meh
* Reapply "idk how this thing works btw"
This reverts commit dd33ad7c143d1649d3f071970aceeb266291d24f.
* revert
2024-06-15 16:21:00 +03:00
wozeparrot
62dc36d371
autogen _try_dlopen ( #4949 )
2024-06-14 12:12:18 -07:00
chenyu
f902af4f0b
increase metal ci test timeout to 20 minutes ( #4920 )
...
make it less annoying for now
2024-06-11 18:45:51 -04:00
qazal
7f3d9e6d94
revert hsa autogen removal ( #4914 )
...
* Revert "only install comgr in AMD CI (#4909 )"
This reverts commit 7f03420d05.
* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
qazal
7f03420d05
only install comgr in AMD CI ( #4909 )
...
* test
* delete hsa autogen
2024-06-11 06:19:33 -04:00
qazal
8b5bcf309a
process replay in all of CI ( #4884 )
2024-06-10 14:49:29 -04:00
George Hotz
f42183ba28
hotfix: relax cifar to 93.2
2024-06-09 13:09:21 +02:00
nimlgen
654a8b9ef7
retire hsa ( #4885 )
...
* retire hsa
* EMULATE_AMD
2024-06-09 11:33:03 +03:00
nimlgen
6327b50e51
amd in benchmarks ( #4861 )
...
* amd in benchmarks
* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
66dfd5e7bf
faster codegen process replay ( #4858 )
...
* faster codegen process replay
* use self.copy
* regenerate
* delete copy
* test a real error [run_process_replay]
* revert the error change
2024-06-07 16:20:57 +03:00
qazal
0db9674dea
skip process replay on master ( #4808 )
2024-06-03 12:29:28 +03:00
qazal
f64fa51a64
process replay for test/* ( #4799 )
...
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
2024-06-03 12:01:58 +03:00
qazal
240d6b5bc0
process replay benchmarks ( #4668 )
2024-06-01 14:36:21 +03:00
nimlgen
bd2e7c8b31
amd registers from file ( #4778 )
...
* amd registers from file
* remove comments
* linter
* no off
2024-05-31 18:48:57 +03:00
Szymon Ożóg
a4de81e9a6
Update ocelot version ( #4715 )
2024-05-24 14:32:53 -04:00
chenyu
38bc38cdff
fix llama example quantize ( #4699 )
...
* fix llama example quantize
import quantize layers from new example llama3
add to mac benchmark
* fix that
* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe
add CACHELEVEL=0 to tinybox green GEMM BEAM ( #4693 )
...
* add CACHELEVEL=0 to tinybox green GEMM BEAM
* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a
fix HSA/KFD load for system-wide installation ( #4218 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564
disable cuda test in ci ( #4630 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
qazal
498cf3e7e0
fuzzer path search for DEFINE_ACC ( #4656 )
...
* insert acc
* add test_ops
* find toposorts
* todo - not yet ready
* remove the import
* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
458a3961eb
catch compile errors in uops tests ( #4672 )
...
* use helper and compile
* llama beam=2
* ast length
* skip float4, fix hsa
* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7
feat: tinyboxgreen ( #4366 )
...
* feat: tinyboxgreen
* feat: tinyboxgreenv2
* fix symlink weights
* fix: remove llama 2 70b for now
* feat: naming
* fix: remove extra cifar steps
* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb
CI test timeout 20 min -> 10 min ( #4645 )
...
if it takes more than 10 minutes, setup usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
b74cc1d01a
uops cleanup ( #4634 )
...
* def add cleanup
* minor speedup
* add back ptx speed
* a little faster
* merge that
* only linearize once for ptx
* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz
07b350a8f4
new uops is an actual graph ( #4560 )
...
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304 )"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00