tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	150ea2eb76	create engine folder and move code (#3948 ) * retry * older tf * that	2024-03-26 20:38:03 -07:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	c81ce9643d	move globalcounters to ops (#2960 ) * move globalcounters to ops * missed a few * sick of that failing	2024-01-01 14:21:02 -08:00
chenyu	73cadfbb3c	Remove pytest markers (#2831 ) * remove pytest marker * fix some, skip some * tweak * fix * skip slow * skip more	2023-12-18 18:53:28 -05:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734 ) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
Christopher Mauri Milan	7f01dd04f0	Apply ruff linting rules to tests (#2473 ) * everything except F821 * enable F821 with noqa * dumb fix * fix remaining imports and (former) lambdas * replace _ with noqa to avoid gc	2023-11-27 21:24:06 -08:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
George Hotz	2f7aab3d13	move optimize_local_size (#2221 ) * move optimize_local_size * interpret_ast	2023-11-05 21:00:52 -08:00
chenyu	f582ec56d5	Replace (getenv("CI", "") != "") with helpers.CI (#2213 )	2023-11-03 15:20:44 -07:00
Francis Lam	f445e056ed	wmma: add test and tensor core shape (#1925 )	2023-09-28 18:04:28 -07:00
Francis Lam	651205fa5c	linearizer: support local and group_for_reduce dimensions together (#1821 ) also minor changes to test_speed_v_torch.py and size of UOps.SPECIAL	2023-09-08 12:39:27 -07:00
Adrian Kretz	3473c9e88d	Metal conv tensor cores (#1696 ) * Benchmark 5x5 conv kernel which is optimized * Use Metal tensor cores in 2d convs	2023-09-04 15:14:46 -07:00
George Hotz	a6d842af7a	move device to ops (#1646 ) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
George Hotz	d25046e66a	matvec tests (#1634 ) * matvec tests * f16 * f16 is broken	2023-08-22 17:33:58 -07:00
Yixiang Gao	4f02491cd4	add cpu if torch tensor (#1609 )	2023-08-21 16:57:59 -07:00
Yixiang Gao	8d6662a741	.cpu().numpy() -> .numpy() (#1594 ) * .cpu().numpy() -> .numpy() * restore ops_torch * restore test_speed_v_torch	2023-08-21 09:53:29 -07:00
Diogo	d7d1011f1e	Add WEBGPU tests to CI (#1463 ) * webgpu tests * assert device is webgpu * missed env set * exclude failing ci tests * ignore test file * changed acc for adam test	2023-08-06 10:32:01 -07:00
George Hotz	486a9dbfd9	speed v torch (#1464 ) * speed v torch * always print * change print * torch speed tee * all exposed	2023-08-06 09:32:33 -07:00
Diogo	ba5e3818a0	Limit dims based on max size (#1390 ) * working * whitespace * changed defaults to None * linter * last linter error	2023-07-31 19:18:19 -07:00
Umut Zengin	d4ebadf2da	Small Tensor.cat optimization and reformating (#1347 )	2023-07-26 18:01:12 -04:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? 
* remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
George Hotz	9dffc9ba23	Use nevergrad to optimize kernels (try 2) (#1301 ) * nevergrad try 2 * touchups * no ones * opt fixup * cleanups * touchup * make new optimizer file	2023-07-20 16:46:45 -07:00
George Hotz	17830e25da	real world tests (#1297 ) * real world test * touchup * sync device	2023-07-20 10:50:22 -07:00
George Hotz	d6637623e3	torch test touchup	2023-07-19 09:37:23 -07:00
Umut Zengin	f8c539989e	Re-open create cumsum speed test (#1255 ) * Reduced tensor size in testing * Update formatting test_speed_v_torch.py	2023-07-17 18:59:36 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
George Hotz	67e34b356a	good stuff from tensor cores branch (#1199 )	2023-07-08 16:58:26 -07:00
George Hotz	a98e361da0	torch speed test, add add	2023-06-26 18:55:27 -07:00
George Hotz	2977fb17f6	various touchups (#1058 ) * op isn't optional * barrier + named local buffers * end global and local loop together to avoid useless if statement * better comments	2023-06-26 15:41:23 -07:00
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
George Hotz	5428b5d774	good changes from tensor_cores branch (#1005 ) * good changes from tensor_cores branch * touchups * real_strides fixup * refactor merge_views	2023-06-18 20:28:06 -07:00
George Hotz	20894991ed	good changes from the M1 Tensor Core project (#730 ) * good changes * working except llvm * llvm types * nice acc * archprobe * lang.float4 * use self.acc for late acc * fix store bug	2023-03-29 05:11:02 +04:00
George Hotz	23f88fb026	synchronize for honest speed compare	2023-03-24 10:24:27 -07:00
George Hotz	036737a12a	mem_estimate tracks bytes, not items	2023-03-10 09:44:12 -08:00
George Hotz	28a6ada4ce	line reduction in metal	2023-03-03 23:14:40 -08:00
George Hotz	8919ca8163	test cleanups	2023-03-03 06:36:06 -08:00
George Hotz	fca055bd66	NOOP means contiguous	2023-03-01 21:54:51 -08:00
George Hotz	d062cc82b8	put restrict back	2023-03-01 21:34:45 -08:00
George Hotz	b442e75c7a	test speed v torch	2023-03-01 19:50:12 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
George Hotz	1d01842232	remove fake test	2023-02-25 10:21:07 -08:00
voidz	94bec40110	moved extras/jit.py -> tinygrad/jit.py (#599 ) * moved extras/jit.py to tinygrad/jit.py * fixed indent * removed tinygrad.helpers.DEBUG from jit.py	2023-02-25 08:32:33 -08:00
George Hotz	628ce067a1	add tests to mypy	2023-02-22 07:07:38 -08:00
George Hotz	104c3c5e73	oops, forgot that debug	2023-02-22 06:58:27 -08:00
George Hotz	fae7654924	fix sync issue	2023-02-17 12:42:45 -08:00
George Hotz	5e6265be6e	metal timing, fix speed test	2023-02-17 12:31:54 -08:00
George Hotz	121bd03cbd	metal globalcounters	2023-02-17 12:02:54 -08:00
George Hotz	20a03d5017	woah, don't sync torch if it's not torch	2023-02-12 07:48:56 -08:00
George Hotz	de71c13934	test speed v torch uses jit	2023-02-12 07:43:17 -08:00
George Hotz	b9f02671d3	oops, broke torch speed test	2023-02-10 16:13:53 -06:00

72 Commits