tinygrad

Commit Graph

Author	SHA1	Message	Date
wozeparrot	acadccf344	comma benchmark (#5518 )	2024-08-02 14:36:54 -07:00
chenyu	f27f949a5d	Revert "revert some UOp IDIV bound (#5863 )" (#5871 ) This reverts commit `0c8d202348`.	2024-08-01 21:38:31 -04:00
chenyu	df138bc558	Revert "revert a mod pattern (#5864 )" (#5870 ) This reverts commit `5c8de2d044`.	2024-08-01 20:44:26 -04:00
chenyu	1b0314d9ef	Revert "remove one more UOp mod pattern (#5865 )" (#5868 ) This reverts commit `b03b8e18c2`.	2024-08-01 20:28:35 -04:00
chenyu	b03b8e18c2	remove one more UOp mod pattern (#5865 ) fixed UOP_IS_SYMBOLIC=1 test_failure_40	2024-08-01 18:29:04 -04:00
chenyu	5c8de2d044	revert a mod pattern (#5864 ) fixed UOP_IS_SYMBOLIC=1 linearizer failure 47	2024-08-01 17:24:26 -04:00
chenyu	0c8d202348	revert some UOp IDIV bound (#5863 ) * revert some UOp IDIV bound breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI * those are correct * skip slow ones	2024-08-01 15:09:06 -04:00
George Hotz	5eedd9e3ad	raise the line ceiling to 8600. USE LINES CAREFULLY	2024-07-31 09:56:39 -07:00
wozeparrot	eebb1b9922	feat: temperature 0 llama3 benchmark (#5806 )	2024-07-30 12:05:36 -07:00
chenyu	cb6718347f	`python -m mkdocs build --strict` in CI (#5800 )	2024-07-29 16:46:30 -04:00
chenyu	be3899d211	hotfix increase ci timeout to 20 mintues (#5799 ) when cache is clear it takes time to populate cache	2024-07-29 16:25:27 -04:00
chenyu	471b188d79	fix mypy errors in latest mypy (#5794 ) * fix mypy errors in latest mypy mypy has stricter partial and api arg checks now * PYTHONPATH="."	2024-07-29 14:53:30 -04:00
George Hotz	0392123e6e	TC=2 still sets tensor cores (and TC=3 support for locals) (#5780 ) * TC=2 still sets tensor cores * add TC=3 support for using locals * bugfix * lines + TC=3 tests * CUDA can use threads, fix fuzz linearizer	2024-07-28 16:16:53 -07:00
qazal	3e49d86c01	process replay diffs 3 things now (#5731 ) * github api infra * process replay is 3 parts now * parse benchmarks * add gh_token * complete diff * move process replay tests * last successful run * add tempdir * skip master	2024-07-27 12:52:20 +03:00
qazal	57b4a8e98d	assert process replay asserts (#5737 ) * assert process replay asserts * one ci job is fine * test: Revert "separate process replay main loop (#5734)" This reverts commit `94d578396f`. * mac sed needs that * Revert "test: Revert "separate process replay main loop (#5734)"" This reverts commit e4ad7684d5472a64841a66b43bc1db7c9bbbf9e8. * disable process replay capture * save time * amd is tiny * send to /dev/null	2024-07-27 12:07:50 +03:00
George Hotz	db1d093b29	reenable LLaMA-3 8B BEAM on NV (#5746 )	2024-07-26 16:56:41 -07:00
chenyu	eff7c5fd2c	halve kernel counts in metal Fuzz Test linearizer (#5716 ) the test time has increased to 3 minutes	2024-07-25 14:35:11 -04:00
chenyu	7c8fe0fe47	skip interpolate tests for PYTHON=1 (#5664 )	2024-07-23 18:47:15 -04:00
George Hotz	e3f00ac77d	Fix cuda tc emu test (#5663 ) * fix acc folding for NV tensor cores * fix correctness of reduce_before_expand * fix test emulated CUDA tensor cores * test_gemm_fp16 on some devices	2024-07-23 15:04:25 -07:00
qazal	fdfc0015a7	[run_process_replay] for opencl/openpilot (#5009 ) * lil reset script * find the prg * use lower_schedule_item * add process replay back * cleanups	2024-07-18 19:42:33 +03:00
wozeparrot	6ccb2390c3	feat: update_benchmark_staging (#5529 )	2024-07-17 20:40:57 -07:00
George Hotz	d3b098299d	add failing regression test for image (#5540 ) * add failing regression test for image * tg type * simpler test * don't realize image to image casts caused issue * simple pad	2024-07-17 17:27:18 -07:00
wozeparrot	218e157f00	benchmark on update_benchmark_staging (#5541 )	2024-07-17 17:11:52 -07:00
Alessandro Benetti	13e200b437	add strict mkdocs check (#5497 )	2024-07-15 14:21:37 -07:00
qazal	40ec9410f9	simpler process replay (#5452 ) * remove check_process_replay * that can go to the top * add assert back * [run_process_replay] * checkout code [run_process_replay] * temp [run_process_replay] * revert temp [run_process_replay] * ahh this is why [run_process_replay] * revert temp [run_process_replay]	2024-07-13 19:55:06 +03:00
George Hotz	955e1179fb	move compile tests and merge (#5451 ) * move compile tests and merge * revert enet move, bump download cache * oh, try setting clang	2024-07-13 08:04:46 -07:00
chenyu	9a187e6102	fix handcode_opt script (#5435 ) * fix handcode_opt script * run in ci * real run in ci * HALF=0	2024-07-12 20:52:28 -04:00
George Hotz	b055ece550	hotfix: bump to cache gpuocelot	2024-07-12 13:54:14 -07:00
chenyu	b17e4adb3a	add `-c advice.detachedHead=false` to process replay git checkout (#5419 ) remove the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` in log	2024-07-12 15:13:26 -04:00
qazal	31fcc516dc	more process replay tooling (#5407 ) * replays * what's in there * can it be up there * sha is enough * insert sha as the key * fix str * update reset utils * that nested try/except was terrible * github_context can go	2024-07-12 13:11:34 +03:00
Roelof van Dijk	6ec7dbc287	ci: parallelize uops tests (#5405 )	2024-07-12 11:22:41 +03:00
qazal	b91a0ccdc3	make [run_process_replay] [no_assert] the default (#5390 )	2024-07-11 22:36:59 +03:00
qazal	004366b193	context aware process replay [run_process_replay] (#5378 ) * test tc as ctx var * remove from opts * process replay * pop variable * B -> Variable * fix re-assign * pop temp vars * move TRANSCENDENTAL=2	2024-07-11 13:07:28 +03:00
chenyu	2396ab9b33	more transcend cleanup [run_process_replay] (#5369 ) fix test name, less # noqa: E501 and removed the cast	2024-07-10 23:05:03 -04:00
chenyu	64986f949c	more transcend math tests in ci (#5368 ) * more transcend math tests in ci test large input to trig functions that hit different reduction algo, and test TRANSCENDENTAL=2 for all backend * no CUDACPU * try that	2024-07-10 21:19:09 -04:00
chenyu	322c37e621	use helpers.JIT in llama and gpt2 examples (#5350 ) * use helpers.JIT in llama and gpt2 examples replaced getenv("JIT"), effectively made gpt2 default jit * fix test_gpt2	2024-07-09 15:04:43 -04:00
Ian Paul	d5a68ae6b3	Simple abstractions3.py fix (#5343 ) * abstractions3.py fix * Add abstractions3.py to CI tests	2024-07-09 13:48:42 +03:00
chenyu	631bc974a0	raise line count limit to 8500 (#5331 )	2024-07-08 14:00:28 -04:00
SnakeOnex	8c03816ae9	fix README example (#5284 ) * fixed README example * README test * changed py -> python markdown code flags in REAME	2024-07-04 11:15:07 -04:00
chenyu	191463a919	add timing to SDXL (#5273 )	2024-07-02 23:29:54 -04:00
chenyu	5808c37302	hotfix disable flaky llama3 beam benchmark on green (#5249 )	2024-07-01 15:00:47 -04:00
chenyu	b9122ecdaf	revert stable diffusion validation with threefry (#5248 ) * Revert "use threefry in stable diffusion benchmark (#4988)" This reverts commit `44dfa37c70`. * sdxl and validation fix * relax threshold	2024-07-01 14:43:47 -04:00
nimlgen	57e89645cd	hcq spec test (#5226 ) * start hcq spec test * more test * fixes * run on amd as well * test amdgpu exec * fix amd * amd mockgpu support sdma timestamp	2024-07-01 17:36:37 +03:00
chenyu	88763eb9ff	fix stable_diffusion with fp16 (#5239 )	2024-06-30 12:59:31 -04:00
nimlgen	dd7eef7d71	libc defs to autogen (#5217 ) * libc defs to autogen * amd import libc * linter * better a bit * remove comment, check this * not hardcoded path	2024-06-29 14:37:33 +03:00
nimlgen	6b08cb5e38	ptx runs on nv in benchmarks (#5224 )	2024-06-29 11:06:44 +03:00
nimlgen	b4c49ae3fa	remove cudacpu in favour of mockgpu (#5225 ) * remove cudacpu in favour of mockgpu * remove unused import * not used as well	2024-06-29 11:05:16 +03:00
chenyu	7090eac8cb	validate sdxl output and put it in benchmark (#5211 ) * validate sdxl output and put it in benchmark * don't print fetch progress_bar in CI	2024-06-28 11:40:52 -04:00
chenyu	d8dc43ad06	remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198 ) this no longer helps	2024-06-27 15:20:34 -04:00
chenyu	83da8b3558	use NV instead of CUDA in benchmark (#5192 ) also reenabled mixtral on green	2024-06-27 13:52:58 -04:00
chenyu	0c6c7c5f7b	CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191 ) ignoring beam cache but using compile cache should be fine, saved some benchmark time. also updated `beam_search` to check flag value before accessing diskcache	2024-06-27 13:15:18 -04:00
chenyu	c12de4f47d	benchmark use JITBEAM for llama and gpt2 (#5189 )	2024-06-27 12:56:02 -04:00
qazal	3af17849bf	safely parse quoted titles [run_process_replay] (#5183 )	2024-06-27 16:39:48 +03:00
qazal	6ca7b13ed1	limit pickled objects [run_process_replay] (#5154 ) * limit pickled objects * delete uop from the list * debug metal * need self.opts for TC * dont need device * [run_process_replay] * minor	2024-06-26 13:51:32 +03:00
qazal	8aa786232d	docs for running process replay locally (#5083 )	2024-06-21 09:55:08 -04:00
nimlgen	fb1bf48cfe	io_uring for copies from disk (#5035 ) * exp uring * fixes and old version * nv * cleaner * cmp vs aio * fix * no lib * fix nv * linter * disk_speed_test now runs default * fixes * uring -> io_uring * linter happy * get_temp_buf comment added * tiny nits * put wait back * test runs everywhere * remove consts * remove mmap consts * do not require iouring to run test, they are generic	2024-06-21 11:36:51 +03:00
qazal	97f1347dd9	fix check_process_replay for special characters (#5072 ) * 'test' [run_process_replay] [no_assert] * test with ( ) { } '' " " * remove the log [run_process_replay] '' () { } '{ * helpful echos [run_process_replay] [no_assert] () '' * test [run_process_replay] [no_assert] * test2 [run_process_replay] [no_assert] * test3 [run_process_replay] [no_assert] * it's also correct this way [run_process_replay] [no_assert] * remove extras [run_process_replay]	2024-06-20 20:23:29 +03:00
qazal	a6a5dba637	Revert "UPat for has_valid in load/store (#5052 )" (#5056 ) * manually insert in the Linearizer * fix process replay	2024-06-19 20:53:36 +03:00
qazal	ee01e464e3	use process replay as a diff creator (#4903 ) * add no_assert option [run_process_replay] [no_assert] * test [run_process_replay] [no_assert] * [run_process_replay] * back to normal [run_process_replay] * remove the log	2024-06-19 18:17:31 +03:00
chenyu	dc942bf1f6	jit sampling functionn in test_randomness.test_multinomial (#5034 ) * jit sampling functionn in test_randomness.test_multinomial `THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec * skip that	2024-06-18 14:21:05 -04:00
chenyu	e9c6a36894	remove CACHELEVEL=0 in llama3 benchmark (#5025 )	2024-06-17 22:43:16 -04:00
chenyu	acaf9a490d	RECIP(-0.0) should be -inf (#5024 ) * RECIP(-0.0) should be -inf added test_dtype_alu for PYTHON backend * catcht that * fix those two	2024-06-17 22:26:58 -04:00
George Hotz	bee8fc29ee	add GPT2 half/half+beam to AMD (#5000 ) * add GPT2 half/half+beam to AMD * winograd in training. half and half/beam file upload	2024-06-16 14:07:14 -07:00
chenyu	44dfa37c70	use threefry in stable diffusion benchmark (#4988 ) also updated default steps to 10. easier to tell the image is following the prompt.	2024-06-15 20:25:29 -04:00
wozeparrot	ce1ed374c9	more tinychat fixes (#4971 )	2024-06-15 16:29:39 -07:00
qazal	ff8e9eefc3	hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981 ) * use replay_codegen [run_process_replay] * disable for now [run_process_replay]	2024-06-15 16:57:47 +03:00
uuuvn	92f49efd06	Trigger process replay from pull request title [run_process_replay] (#4980 ) * Trigger process replay from pull request title * idk how this thing works btw * test if it will work * try 2 * Revert "idk how this thing works btw" This reverts commit 580da51b07a243020f79b1c333c8a2349ea00beb. * Revert "try 2" This reverts commit 7ff1e86d5d15d1a1745a139db1e1c13c5903b366. * test if it works * meh * Reapply "idk how this thing works btw" This reverts commit dd33ad7c143d1649d3f071970aceeb266291d24f. * revert	2024-06-15 16:21:00 +03:00
wozeparrot	62dc36d371	autogen _try_dlopen (#4949 )	2024-06-14 12:12:18 -07:00
chenyu	f902af4f0b	increase metal ci test timeout to 20 minutes (#4920 ) make it less annoying for now	2024-06-11 18:45:51 -04:00
qazal	7f3d9e6d94	revert hsa autogen removal (#4914 ) * Revert "only install comgr in AMD CI (#4909)" This reverts commit `7f03420d05`. * rocm-llvm only removal	2024-06-11 12:55:45 -04:00
qazal	7f03420d05	only install comgr in AMD CI (#4909 ) * test * delete hsa autogen	2024-06-11 06:19:33 -04:00
qazal	8b5bcf309a	process replay in all of CI (#4884 )	2024-06-10 14:49:29 -04:00
George Hotz	f42183ba28	hotfix: relax cifar to 93.2	2024-06-09 13:09:21 +02:00
nimlgen	654a8b9ef7	retire hsa (#4885 ) * retire hsa * EMULATE_AMD	2024-06-09 11:33:03 +03:00
nimlgen	6327b50e51	amd in benchmarks (#4861 ) * amd in benchmarks * remove all hsa	2024-06-08 23:24:46 +03:00
qazal	66dfd5e7bf	faster codegen process replay (#4858 ) * faster codegen process replay * use self.copy * regenerate * delete copy * test a real error [run_process_replay] * revert the error change	2024-06-07 16:20:57 +03:00
qazal	0db9674dea	skip process replay on master (#4808 )	2024-06-03 12:29:28 +03:00
qazal	f64fa51a64	process replay for test/* (#4799 ) * add input to unit tests [run_process_replay] * add setup [run_process_replay] * run tests [run_process_replay] * add cuda and amd [run_process_replay] * run everything but BEAM=2 [run_process_replay] * skip export_model [run_process_replay] * fix amd CI * add concurrency back	2024-06-03 12:01:58 +03:00
qazal	240d6b5bc0	process replay benchmarks (#4668 )	2024-06-01 14:36:21 +03:00
nimlgen	bd2e7c8b31	amd registers from file (#4778 ) * amd registers from file * remove commentes * linetr * no off	2024-05-31 18:48:57 +03:00
Szymon Ożóg	a4de81e9a6	Update ocelot version (#4715 )	2024-05-24 14:32:53 -04:00
chenyu	38bc38cdff	fix llama example quantize (#4699 ) * fix llama example quantize import quantize layers from new example llama3 add to mac benchmark * fix that * save the files	2024-05-23 15:35:26 -04:00
chenyu	72560e30fe	add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693 ) * add CACHELEVEL=0 to tinybox green GEMM BEAM * BEAM=4 is more stable	2024-05-22 23:59:50 -04:00
Yury Zhuravlev	af56f0e68a	fix HSA/KFD load for system-wide installation (#4218 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2024-05-22 20:33:21 -07:00
nimlgen	12339f6564	disable cuda test in ci (#4630 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-05-22 23:23:32 -04:00
qazal	498cf3e7e0	fuzzer path search for DEFINE_ACC (#4656 ) * insert acc * add test_ops * find toposorts * todo - not yet ready * remove the import * atol and childless children	2024-05-23 00:50:01 +03:00
qazal	458a3961eb	catch compile errors in uops tests (#4672 ) * use helper and compile * llama beam=2 * ast length * skip float4, fix hsa * use empty tensors	2024-05-21 12:20:35 +03:00
wozeparrot	00432496d7	feat: tinyboxgreen (#4366 ) * feat: tinyboxgreen * feat: tinyboxgreenv2 * fix symlink weights * fix: remove llama 2 70b for now * feat: naming * fix: remove extra cifar steps * feat: disable mixtral on nvidia	2024-05-20 22:39:34 -04:00
chenyu	8a0d1ca7bb	CI test timeout 20 min -> 10 min (#4645 ) if it takes more than 10 usually setup fails anyway. also updated matmul_kfd -> matmul_amd in benchmark	2024-05-18 13:58:28 -04:00
George Hotz	b74cc1d01a	uops cleanup (#4634 ) * def add cleanup * minor speedup * add back ptx speed * a little faster * merge that * only linearize once for ptx * two graph rewrites for ptx, bug?	2024-05-17 20:02:38 -07:00
George Hotz	07b350a8f4	new uops is an actual graph (#4560 ) * new uops is an actual graph * it's way slower * simpler * fix define acc * render_loop unique * ops test pass * add pattern matcher back, there's bugs * rewrite * use priority queue * recursive children * fix tests * fix tests with SINK * fix abstractions * fix assembly * simpler * link define_acc * fix DEFINE_ACC placement * type verify * full cmp * fix cmp * ACCESS_ACC * insert DEFINE_ACC * fix PHI * recursive rewrite * fix many tests * sum collapse * more patterns * correct change * fold arange * fix that lin test * space * big folding rule works * close * has more maxes, meh * cached node replace * set changed * simplest folding yet * works * works * DIV * all tests pass * del * fuzz linearizer fails * sum_collapse * test depth 2 cf * fix lin test 14 * fix clang depth * disable that * failure 14 is fixed * fix ptx * failure 27 is fixed * fix llama * run_cnt * Revert "Optimize PTX gated loads index calculation (#4304)" This reverts commit `d97d5a7689`. * fix uops loop * fix ptx bugs * add barrier * print * mem_type in ptx direct * bypass tests that fail in CI but pass locally * ptx remove ptr_ar * more ptx passing * fix ptx tests * assert compile support * remove model inference benchmark from red	2024-05-17 18:00:18 -07:00
chenyu	ca1df20fa9	benchmark name fix - resnet eval is on eval data (#4628 )	2024-05-17 12:56:12 -04:00
chenyu	e5d4e6a8aa	BEAM=2 in green CI for 100 TFLOPS (#4624 )	2024-05-16 23:28:28 -04:00
nimlgen	eb9689336e	nv mockgpu (#4600 ) * mockgpu nv * works * comment that out * fix merge * setup gpuocelot * install packages * not run all of them * passes * fix ci * almost * should pass * linter * linter 2 * try this? * ugn, not supported * ci * remove ticket from description * better descs	2024-05-15 23:46:08 +03:00
George Hotz	5ba611787d	move image into tensor.py. delete features (#4603 ) * move image into tensor.py * change setup.py * openpilot tests need pythonpath now	2024-05-15 10:50:25 -07:00
George Hotz	afa9753d39	ruff cleanup (#4594 ) * check editor config * no editorconfig, it doesn't work * ruff cleanups	2024-05-14 21:16:14 -07:00
George Hotz	9425973bc7	docs cleanup and move (#4593 ) * cleanup and move * docs-legacy is gone * don't update setup.py	2024-05-14 20:44:59 -07:00
George Hotz	fd02ab1e8b	move disassemblers and openpilot (#4592 ) * move disassemblers and openpilot * delete junk * put that in pre-commit * fixup readme	2024-05-14 19:30:02 -07:00
nimlgen	9b02aef45a	remove rhip (#4579 ) * remove rhip * remove hip runner	2024-05-14 17:58:19 +03:00
nimlgen	2131556c2c	amd mockgpu (#4535 ) * start mock amd gpu * virt files * cleaner * init ci * small fixes * linter * better? * ugh * linter * fix * diable some * run shorter * fixes * add hcq test * fix * fix cmd revert	2024-05-14 14:28:04 +03:00


566 Commits