Commit Graph

448 Commits

Author · SHA1 · Message · Date
chenyu f902af4f0b
increase metal ci test timeout to 20 minutes (#4920)
make it less annoying for now
2024-06-11 18:45:51 -04:00
qazal 7f3d9e6d94
revert hsa autogen removal (#4914)
* Revert "only install comgr in AMD CI (#4909)"

This reverts commit 7f03420d05.

* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
qazal 7f03420d05
only install comgr in AMD CI (#4909)
* test

* delete hsa autogen
2024-06-11 06:19:33 -04:00
qazal 8b5bcf309a
process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
George Hotz f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
nimlgen 654a8b9ef7
retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
nimlgen 6327b50e51
amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
qazal 66dfd5e7bf
faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
qazal 0db9674dea
skip process replay on master (#4808) 2024-06-03 12:29:28 +03:00
qazal f64fa51a64
process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
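
Per the commits above, the bracketed "[run_process_replay]" tag in a commit message appears to be the opt-in trigger: CI runs process replay for a commit only when its message carries it. A hypothetical example commit subject using the tag:

    refactor codegen helpers [run_process_replay]
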
qazal 240d6b5bc0
process replay benchmarks (#4668) 2024-06-01 14:36:21 +03:00
nimlgen bd2e7c8b31
amd registers from file (#4778)
* amd registers from file

* remove comments

* linter

* no off
2024-05-31 18:48:57 +03:00
Szymon Ożóg a4de81e9a6
Update ocelot version (#4715) 2024-05-24 14:32:53 -04:00
chenyu 38bc38cdff
fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
chenyu 72560e30fe
add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
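
A minimal sketch of what such a GEMM BEAM run might look like (the env-var names come from this commit; treating CACHELEVEL=0 as a cache-disable switch, and the matmul shape, are assumptions):

    # hypothetical sketch; env vars must be set before tinygrad is imported
    import os
    os.environ["CACHELEVEL"] = "0"  # assumption: skip the on-disk cache so BEAM searches from scratch
    os.environ["BEAM"] = "4"        # the beam width this commit settled on as more stable
    from tinygrad import Tensor
    a, b = Tensor.rand(4096, 4096), Tensor.rand(4096, 4096)
    (a @ b).realize()               # BEAM search runs when this kernel is compiled
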
Yury Zhuravlev af56f0e68a
fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen 12339f6564
disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
qazal 498cf3e7e0
fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal 458a3961eb
catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot 00432496d7
feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu 8a0d1ca7bb
CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 minutes, the setup has usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz b74cc1d01a
uops cleanup (#4634)
* def add cleanup

* minor speedup

* add back ptx speed

* a little faster

* merge that

* only linearize once for ptx

* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz 07b350a8f4
new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
chenyu ca1df20fa9
benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu e5d4e6a8aa
BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* don't run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugh, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz 5ba611787d
move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
George Hotz afa9753d39
ruff cleanup (#4594)
* check editor config

* no editorconfig, it doesn't work

* ruff cleanups
2024-05-14 21:16:14 -07:00
George Hotz 9425973bc7
docs cleanup and move (#4593)
* cleanup and move

* docs-legacy is gone

* don't update setup.py
2024-05-14 20:44:59 -07:00
George Hotz fd02ab1e8b
move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
nimlgen 9b02aef45a
remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
nimlgen 2131556c2c
amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* disable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
chenyu 5de4a46f10
re-enable gpt2 half/beam mac benchmark (#4496)
* re-enable gpt2 half/beam mac benchmark

from the fuzzer it seems to be flaky due to a numerical issue, not a kernel bug. we used to have half in the split reduce.

ran this on an M1 Max for 20 loops and it's fine

* that should be jitted
2024-05-09 19:15:32 -04:00
chenyu c508eb7425
revert the removal of CAST_BEFORE_VIEW (#4471)
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
qazal 760776c59d
merge EfficientNet to C with clang job (#4426)
* merge ImageNet to C with linters

* add to clang

* delete from linter
2024-05-05 20:33:12 +03:00
chenyu d4062cb6fc
NV tensor_cores in kernel.py (#4399) 2024-05-02 22:33:08 -04:00
chenyu dce7ac0160
NOCLANG=1 for tinybox green ci. (#4378)
CLANG was disabled for tinybox red for speed
2024-05-01 13:31:01 -04:00
wozeparrot 4a26718ca9
feat: tinyboxgreen (#4365) 2024-04-30 19:05:37 -04:00
chenyu fdc8fabae5
disable flaky mac gpt2 beam benchmark and add back cifar mac with JIT=2 (#4358)
* debug flaky mac gpt2 beam run

* disable for now
2024-04-30 10:41:37 -04:00
Francis Lata bb849a57d1
[MLPerf] UNet3D dataloader (#4343)
* add support for train/val datasets for kits19

* split dataset into train and val sets

* add tests for kits19 dataloader

* add MLPerf dataset tests to CI

* update unet3d model_eval script

* fix linting

* add nibabel

* fix how mock dataset gets created

* update ref implementation with permalink and no edits

* clean up test and update rand_flip implementation

* cleanups
2024-04-28 22:34:18 -04:00
chenyu 3ec4b745d6
JIT=2 for mac cifar benchmark (#4300)
also double BS for resnet training benchmark to match submission target
2024-04-25 18:33:40 -04:00
chenyu c1fbacb182
resnet benchmarks use DEFAULT_FLOAT=HALF (#4285)
also update the LR default to scale based on BS=1536 (the batch size we are submitting)
2024-04-24 12:10:57 -04:00
Szymon Ożóg 002a14088e
Ptx store gate cast to bool (#4284)
* Cast gate to bool

* Update

* Add PTX fuzzing to benchmark
2024-04-24 11:43:44 -04:00
George Hotz dbe3e1d548
or true fixes ci (#4283)
* or true fixes ci

* all with two pipes
2024-04-24 20:48:26 +08:00
chenyu 759b4f41c3
few more KFD -> AMD (#4262)
benchmark gemm and default_parallel
2024-04-23 10:15:37 -04:00
Francis Lam 3f6c7ca8bf
test: fix test_tensor_core_padded on CUDA and add to benchmarks (#4258)
* test: fix test_tensor_core_padded on CUDA and add to benchmarks

* fix linter

* run both tests in one call
2024-04-22 23:22:11 -04:00
Francis Lam bbb0ad4800
wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216)
* wmma: widen TC usage in search by using PADTO on TC axes when possible

* test: start tests for the new padding TC behavior

* search: upgrade padded TC search to TC_OPT >= 2

* test: add behavior and correctness test for padded TC

added optional argument to apply_tensor_core to set TC_OPT level

* linearizer: add tests for the PADTO behavior and docs
2024-04-22 16:50:31 -04:00
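
The same env-var pattern as the GEMM sketch above applies here; a minimal sketch of exercising the padded-TC search this commit describes (TC_OPT and BEAM come from the log; the shapes, and the claim that 255 forces PADTO, are assumptions):

    # hypothetical sketch; env vars must be set before tinygrad is imported
    import os
    os.environ["TC_OPT"] = "2"  # per this commit, TC_OPT >= 2 enables the padded tensor-core search
    os.environ["BEAM"] = "2"
    from tinygrad import Tensor
    a, b = Tensor.rand(255, 255), Tensor.rand(255, 255)  # not tile-aligned, so TC use would need PADTO
    (a @ b).realize()
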
George Hotz 9e53d6cffa hotfix: 8000 lines 2024-04-22 20:58:16 +04:00
nimlgen e6227bdb15
nv driver (#4044)
* start

* fix err 93

* gpu

* ioctl mappings

* alloc like cuda

* semaphores

* wait for semaphores value

* start ops_nv

* very simple kernels work

* init several gpus

* qmd dumper

* dirty, but most of kernels work

* always run all test_ops

* progress, more tests, stable

* test_ops passes, gpt2 works

but with a big fifo, wrap of the fifo doesn't work, i think it's something coherency related

* need better sync

* fix sync

* alloc2

* all tests pass!

* cleanup 1

* cleanup

* multigpu, simple transfer

* fix sync

* correct init

* nv_gpu autogen + sync bug fix

* clean extra/nv_gpu_driver

* p2p

* clean up

* remove old gen

* small fixes

* cleanup

* cleanup 2

* small fixes

* bigger queue size

* cleanups

* wait

* fixed signals for devs

* fix hang + parallel beam

* small fixes

* detect when local memory is big in kernel

* correct assert

* small fixes

* correct tls size est

* one va space

* less lines

* shorter

* save 2 lines

* save some lines

* remove type ignores

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-22 19:50:20 +04:00
chenyu f1d9d0a151
cleanup external_test_opt (#4234)
no more OPT=2 or OPT=3; check strict number of kernels; enabled tests now that fusion works
2024-04-20 04:00:08 -04:00