Commit Graph

38 Commits

Author SHA1 Message Date
qazal be09cc87c1
Bitcast support / fast bf16 load (#2011)
* bitcast renderers

* fast llama load

* make it one kernel

* regression testing p1: re-enable test_dtype for all backends

fix GPU

* regression testing p2: fuzz all possible cases against numpy

remove hardcoded tests since the fuzzer covers them

* define ushort

* fix indent, probably need flake8 back for CI to catch

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-05 16:19:28 -08:00
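A quick illustration of the "bitcast" in this PR's title, as a sketch in plain numpy rather than the renderer changes the PR actually touches: a cast converts values, while a bitcast reinterprets the same bytes as another dtype of equal width.

```python
import numpy as np

x = np.array([1.0, -2.0], dtype=np.float32)
print(x.astype(np.int32))  # [ 1 -2]                    value cast: converts each number
print(x.view(np.int32))    # [1065353216 -1073741824]   bitcast: raw float32 bits reread as int32
```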
George Hotz 8c67eb1c92
GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
George Hotz d87a246439
move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz 8ff2e13550
From teeny (#2426)
* changes from teenygrad work

* support not supporting ImageDType/PtrDType

* fixups from teeny
2023-11-24 12:50:56 -08:00
qazal b6aaf12df7
Internal cast 2 with more tests (#2257)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51.

* Revert "render phi as the dtype"

This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711.

* reenable triton tests

* no vstore_half if dtype is already half

* upcast max
2023-11-10 10:42:39 -08:00
George Hotz 330484c072
Revert "Internal casting support (#2046)" (#2256)
This reverts commit 7e1d08b2ae.
2023-11-09 21:27:13 -08:00
qazal 7e1d08b2ae
Internal casting support (#2046)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51.

* Revert "render phi as the dtype"

This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711.

* reenable triton tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-09 21:02:32 -08:00
qazal 2465d5d267
fix ops tests in test_dtype (#2237)
* fix test ops

* decompose the err from test_ops

* skipTest skips the entire test, we don't want that

* handle cases with the same priority

* add int16 to torch map
2023-11-09 15:17:43 -08:00
qazal be5f185ac0
Higher test coverage for dtypes (#2156)
* refactor unit tests for dtypes

* add missing dtypes in llvmir.py and lib.py

* skip torch tests

* webgpu

* cleaner skips

* fix llvm bool casting issue using compare

* llvm 100% passing

* llvm segfault

* TEMP decrease timeout mins to 11

debug

* add bf16 to setup

* skip half tests in cuda cpu

* check for CUDACPU instead

* add int16 to triton dtypes

* u16 for triton

* remove debug - diff is still hard to read

* derive from base class TestDType

* enhance test_upcast and downcast by running on every possible version

* dummy commit to rerun the flakey test

* skip the correct tests for CUDA

* bf16 should be skipped in the common TestDType cases

* re-enable bf16

* more consistent structure

* tiny changes to is_dtype_supported 1

* tiny changes 2

add reason

* fuzz

* fuzzer p2

* run fp32 twice

* remove duplicate fp32 run

* clang: use stdbool

* skip triton on bool casts

* merge and resolve conflicts
2023-10-30 22:38:42 -07:00
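The "fuzz" bullets above boil down to comparing every cast pair against numpy. A minimal sketch of the idea follows; the import paths, the dtype list, and the `DType.np` field are assumptions based on the 2023-era tree and may differ from the actual fuzzer.

```python
import itertools
import numpy as np
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumed 2023-era location of dtypes

candidates = [dtypes.int8, dtypes.uint8, dtypes.int32, dtypes.float16, dtypes.float32]

for src, dst in itertools.product(candidates, repeat=2):
    data = np.arange(8, dtype=src.np)                       # small deterministic input
    expected = data.astype(dst.np)                          # numpy is the reference result
    result = Tensor(data.tolist(), dtype=src).cast(dst).numpy()
    np.testing.assert_allclose(result, expected)
```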
qazal a7439af786
Fix llvm int->bool cast (#2164)
* add to ir

* add test case

* minimize diff

* todo

* enable fast math

* added both False and True case
2023-10-30 15:28:23 -07:00
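Why an int-to-bool cast wants a compare (as this fix uses) rather than a bit truncation: truncating to one bit keeps only the lowest bit, so even values like 2 would become false. A small numpy sketch of the intended semantics:

```python
import numpy as np

x = np.array([0, 1, 2, -3], dtype=np.int32)
print(x.astype(np.bool_))  # [False  True  True  True]   "x != 0", the desired semantics
print(x & 1)               # [0 1 0 1]                    what a plain 1-bit truncation would keep
```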
George Hotz 1bf4aef0f5
fix image dtype cmp (#2089)
* fix image dtype cmp

* print that with debug 3
2023-10-16 17:52:38 -07:00
Ahmed Harmouche fb4d830a2a
Fix cast error in render_load in wgsl (#1956)
* Fix cast error in wgsl

* Use render_cast instead of introducing a new method

* Make it shorter

* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz a6d842af7a
move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz 739f327d2d
Shorter (#1582)
* deleting lines

* remove insert dims

* if statement is never hit

* bug fixes
2023-08-20 08:12:16 -07:00
corranr 68ebbd2954
for issue #1555, int64 and int8 in CI=1 ARM64=1 CLANG=1 (#1572)
* fixed for int8, int64; added dtype broadcasting test; passing all CI, ARM64, CLANG tests

* remove shifts
2023-08-18 21:40:13 -07:00
Ethan Sorrell cb62911f6b
PTX Reintegration and Passing Tests (#1512)
* move assembly, assembly_ptx

* successful but broken rendering of ptx asm

* clear ins before render asm

* slightly less broken :')

* we needed thread syncs

* fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half

* Fix runtime_args for gpuocelot

* our casts were flipped on both ends

* more casting

* add ternary where op

* dealing with storing/loading bool

* add test for casting to bool from negative

* Fix args.valid on ConstOp

* add to CI, TODO: fix runtime_args for test_uops

* fix placement of runtime_args to work with lazy.Device

* undo ci changes so I can push

* fix lints

* start cleanup and fix things we broke fixing lints

* add checks for PTX specific asm instructions

* revert added test -- doesn't pass on llvm

* skip tests for underflow,overflow

* another fix for how we're setting runtime args

* Less broken cleanup

* add to CI

* add more env variables for ci test

* fix ci to install pycuda for ptx

* ci: copy cuda test command

* cleanup

* assert to make sure we're actually running ptx in ci

* remove test assert

* move is_ptx arg

* move assembly, assembly_ptx back to extras

* fix imports

* initial merge fixes

* clear registers, fix UOps.LOAD with invalid value

* draft merge fixes

* remove prints

* quick lint and merge fixes

* cleanup

* remove PTXProgram wrapper

* final cleanup

* temp change for ci rerun

* ci rerun

* rollback ISA version
2023-08-16 16:20:20 -07:00
Diogo d17ecccd78
Torch/LLVM/arm F64 support (#1551) 2023-08-15 21:21:08 -04:00
geohotstan 07b79f210f
llvmir support for bool <-> float casting (#1492) 2023-08-09 13:12:52 -04:00
Diogo d7d1011f1e
Add WEBGPU tests to CI (#1463)
* webgpu tests

* assert device is webgpu

* missed env set

* exclude failing ci tests

* ignore test file

* changed acc for adam test
2023-08-06 10:32:01 -07:00
George Hotz d67e248d9b
simple bitcast 2 (#1445)
* simple bitcast 2

* bc 2

* empty

* Revert "empty"

This reverts commit d8ee083655b67947afb1e577020b4395d001832c.
2023-08-06 00:30:50 -07:00
cheeetoo a0965ee198
CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
waifairer 7cac5ea16c
[GH-1305] Refactor test_dtypes.py to be cleaner (#1306)
Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-21 18:18:02 -04:00
George Hotz ca77d6cd72
bfloat16 in LLVM (enough for llama 2) (#1293)
* add bf16 support to LLVM

* bf16 read works
2023-07-19 20:18:32 -07:00
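The reason "bf16 read works" can be cheap: bfloat16 is just the top 16 bits of a float32, so a read can be done with integer ops. A numpy sketch of the idea (the PR itself implements this inside the LLVM backend):

```python
import numpy as np

bf16_raw = np.array([0x3F80, 0xC000], dtype=np.uint16)        # bf16 bit patterns of 1.0 and -2.0
as_f32 = (bf16_raw.astype(np.uint32) << 16).view(np.float32)  # widen, shift into the high half, bitcast
print(as_f32)  # [ 1. -2.]
```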
Diogo a9a1df785f
Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Yosef Frost 613bcd945d
Added Test Coverage to Int32 and Make Sure Tests Succeed (#1174)
* Added test coverage for int32 in `test/test_dtype.py`

Tests for int32 include:
- testing that int32 can be converted into a numpy array
- testing that float and int64 can be cast into int32
- testing that int32 can be cast into float and int64
- testing addition, multiplication, and matrix multiplication with int32
- testing that addition, multiplication, and matrix multiplication with int32 and either float or int64 gets successfully cast into float and int64, respectively

Additional changes include testing that int8 casts into int32 and testing that float16 casts into int32

* Added type casting to the add, subtract, and divide binary operations

* Added automatic type casting when types differ to FusedOps.MULACC

I moved the match_types function back so that I could call it in einsum_mulacc where it would cast the types of the MULACC to be the same

* Added unit test for match_types and added type hints to the parameters

* Added tests for ops_cpu.match_types

* Changed ops_cpu.einsum logic to play nicely with PyTorch

Changed `tinygrad.runtime.ops_cpu.einsum_mulacc` logic to not perform type matching. Type matching was instead moved to the numpy_fxn_for_op dictionary in the ops_cpu file. Since ops_torch uses the same einsum_mulacc function, this should fix all the broken pytorch tests.

* empty commit to rerun ci

* reverting PR#1213 in an attempt to fix broken test

* Removed all tests I added to see if they are causing CI issues

* Added back type matching tests

* removed type matching tests and added back int tests

* added back part of the type matching tests

* removed breaking type matching tests

* empty commit for testing

* added test back but inside comment

* removed a test from the comment to see if it breaks CI

* removed another function

* more testing

* emptied test comment

* cleaned up comments

* Added optimize=True flag to einsum_mulacc in cpu_ops.py

* Removed unnecessary imports from tests

* optimized match_types by removing unnecessary array copying
2023-07-12 10:29:15 -07:00
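A hypothetical sketch of what a match_types-style helper does (not the repository's actual implementation): promote both operands to a common numpy dtype so the binary op and MULACC see matching types.

```python
import numpy as np

def match_types(x: np.ndarray, y: np.ndarray):
    # promote both operands to a common dtype before the binary op
    up = np.promote_types(x.dtype, y.dtype)
    return x.astype(up), y.astype(up)

a, b = match_types(np.arange(3, dtype=np.int32), np.ones(3, dtype=np.float32))
print(a.dtype, b.dtype)  # float64 float64 (numpy promotes int32 + float32 to float64)
```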
Reza Rezvan 535224ac20
Remove float64 (#1101)
* Refactor: Remove float64

* Refactor: Remove unused imports

* Refactor: Remove float64

* Refactor: Remove float64

* Refactor: Exclude float64 onnx backend

* Add: Skip jacobian and gradcheck tests;
2023-07-04 08:40:51 -07:00
cloud11665 2407690d82
add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
George Hotz 039f0d372f
delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Diogo 0629791cbd
F64 support (#976)
* initial commit

* added osx check for opencl

* added llvm f64 conversions

* typo in llvmir

* more tests and modified unsupported error

* fixed linting error

* added pragma fp64

* simplified exclusion for OSX

* fixed device check and also added it to cast func

* added ifdef check for fp16 in ops_gpu

* Revert "added ifdef check for fp16 in ops_gpu"

This reverts commit 92de754d48cba19c04ef20b3d4a1c3003046a9d0.

* f64 prekernel signature match f16

* moved condition to buffer init
2023-06-13 21:31:31 -07:00
Diogo 1272d8526a
Llvm int support (#866)
* added int val support to llvm

* lint fix

* added types

* fix merge issues
2023-05-30 17:49:26 -07:00
Diogo 0dab8edc97
support Int64 type in cstyle gen (#860)
* added metal int64 and some simple tests

* removed bool return type def

* typo in test

* also missing in clang and gpu runtimes

* switched order for opencl

* increased atol and removed new line in kernel prefix
2023-05-30 16:04:46 -07:00
wozeparrot 2fd2fb6380
int8/uint8 support (#837)
* feat: int8 support

* feat: uint8 support

* feat: int8 tests

* fix: fix uint8 on clang

* feat: test casting between int8/uint8/float16/float32

* clean: way cleaner dtype tests

* feat: preprocess_imagenet using the correct dtype

* feat: add test for overflow between uint8 and int8
2023-05-28 23:15:06 -07:00
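The uint8/int8 overflow case that the last test above covers, shown with numpy as the reference behavior: values above 127 wrap around when converted to signed 8-bit.

```python
import numpy as np

u = np.array([0, 127, 128, 255], dtype=np.uint8)
print(u.astype(np.int8))  # [   0  127 -128   -1]  two's-complement wrap-around
```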
Jacky Lee fafe8e9ce2
casting: support all backends and implement half (#726)
* casting: support all backends and implement half

* map torch types in ops_torch

* reuse type map for torch buffer

* inverse dict lookup
2023-03-24 09:58:03 -07:00
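The "inverse dict lookup" bullet above amounts to keeping one forward dtype map and inverting it once for the reverse direction. A small sketch; the string keys stand in for tinygrad dtypes and are illustrative only:

```python
import torch

# forward map: tinygrad dtype name -> torch dtype (names here are illustrative)
type_map = {"float32": torch.float32, "half": torch.float16, "int8": torch.int8}
inverse_type_map = {v: k for k, v in type_map.items()}  # torch dtype -> name

print(inverse_type_map[torch.float16])  # "half"
```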
Jacky Lee e009b6f341
Add tests for casting (#724)
* Add tests for casting

* Skip half_matmul_upcast when TORCH=1

* Fix promotion on torch

* Fix spacing
2023-03-23 08:02:52 -07:00
George Hotz 5495c7d64e
linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
George Hotz dc9a6b4bb7 fix float16 in CLANG on linux 2023-03-11 21:51:22 -08:00
George Hotz 1826ff6b89
dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00