tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	86a32ffb1a	lt sum (#1617 )	2023-08-21 21:19:16 -07:00
George Hotz	c64c47a6ae	test arange simple	2023-08-21 20:16:17 -07:00
Yixiang Gao	4f02491cd4	add cpu if torch tensor (#1609 )	2023-08-21 16:57:59 -07:00
Yixiang Gao	4d54afb6df	sparse cat cross entropy (#1597 ) * add sparse cat cross entropy * minor fix * add log_softmax into loss function * add test * update docs * fix training loss * add device	2023-08-21 14:14:54 -07:00
George Hotz	2e60920317	Revert "sparse cat cross entropy (#1591 )" (#1596 ) This reverts commit `f0ee850e98`.	2023-08-21 10:04:26 -07:00
Yixiang Gao	f0ee850e98	sparse cat cross entropy (#1591 ) * add sparse cat cross entropy * minor fix * add log_softmax into loss function * add test * update docs	2023-08-21 09:56:41 -07:00
Yixiang Gao	8d6662a741	.cpu().numpy() -> .numpy() (#1594 ) * .cpu().numpy() -> .numpy() * restore ops_torch * restore test_speed_v_torch	2023-08-21 09:53:29 -07:00
Umut Zengin	35bf21276f	Argmax/Argmin Feature (#1576 ) * implemented argmax and argmin * lint * lint * match torch behaviour * format * removed flip	2023-08-20 18:46:46 -07:00
George Hotz	012ee7d162	not worth the speed (#1584 ) * not worth the speed * no slots * uops comments * bump to python 3.11 for speed * add critical slots back	2023-08-20 10:24:58 -07:00
George Hotz	739f327d2d	Shorter (#1582 ) * deleting lines * remove insert dims * if statement is never hit * bug fixes	2023-08-20 08:12:16 -07:00
David Hou	4fbce972d7	CSE at uop level (#1483 ) * uop-level cse * add test * don't cache reduce alu ops * types * rename variable * fix * delete lines	2023-08-19 23:40:40 -07:00
David Hou	92754e177c	cache buffer loads across multiple bufs (#1482 ) * cache loads across buffers (since they may share rawbufs) * typing * add test * fix test * small changes to test * fix test * one big cache * whitespace * golf a line? * invalid is RawBuffer(0)[0], valid 1.	2023-08-19 09:09:58 -07:00
corranr	68ebbd2954	for issue #1555 , int64 and int8 in CI=1 ARM64=1 CLANG=1 (#1572 ) * fixed for int8,int64, added dtype broadcasting test, passing all CI,ARM64,CLANG tests * remove shifts	2023-08-18 21:40:13 -07:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353 ) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
chenyu	be50b2fe8f	more symbolic symbolic ops (#1564 ) * more symbolic symbolic ops * handle NumNode in __mul__	2023-08-18 09:21:41 -07:00
chenyu	dfec16cc83	Support arg int for CUDA kernel (#1565 )	2023-08-18 09:19:40 -07:00
nimlgen	bd111411bf	init allocator for compiled backends (#1467 ) * init allocator for compiled backends * Update ops_webgpu.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-17 10:33:32 -07:00
geohotstan	a293c18d34	Gather bugfix (#1561 )	2023-08-16 19:53:14 -04:00
Ethan Sorrell	cb62911f6b	PTX Reintegration and Passing Tests (#1512 ) * move assembly, assembly_ptx * successful but broken rendering of ptx asm * clear ins before render asm * slightly less broken :') * we needed thread syncs * fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half * Fix runtime_args for gpuocelot * our casts were flipped on both ends * more casting * add ternary where op * dealing with storing/loading bool * add test for casting to bool from negative * Fix args.valid on ConstOp * add to CI, TODO: fix runtime_args for test_uops * fix placement of runtime_args to work with lazy.Device * undo ci changes so I can push * fix lints * start cleanup and fix things we broke fixing lints * add checks for PTX specifc asm instructions * revert added test -- doesn't pass on llvm * skip tests for underflow,overflow * another fix for how we're setting runtime args * Less broken cleanup * add to CI * add more env variables for ci test * fix ci to install pycuda for ptx * ci: copy cuda test command * cleanup * assert to make sure we're actually running ptx in ci * remove test assert * move is_ptx arg * move assembly, assembly_ptx back to extras * fix imports * initial merge fixes * clear registers, fix UOps.LOAD with invalid value * draft merge fixes * remove prints * quick lint and merge fixes * cleanup * remove PTXProgram wrapper * final cleanup * temp change for ci rerun * ci rerun * rollback ISA version	2023-08-16 16:20:20 -07:00
geohotstan	8763037f0e	Fancy indexing is fancy wow and gather thing (#1399 )	2023-08-16 18:35:49 -04:00
chenyu	11dd9b1741	symbolic codegen and exec (#1552 ) * symbolic codegen and exec * fix and add test * no sketchy * merge_dicts type * dtypes._arg_int32	2023-08-16 14:43:41 -07:00
George Hotz	1e1d48b4e6	single model (#1560 )	2023-08-16 13:22:19 -07:00
JaSpa99	491e85597a	Run onnx commavq model (#1537 ) * try to run commavq * fix 0 dim, start implementing new ops - Implement EmbedLayerNormalization - Implement Attention * SkipLayerNormalization and FastGelu * use original torch model, cast inputs * fix some ops: - properly do Cast - Attention: bi- and unidirectional - FastGelu: add bias before gelu * cleanup onnx_ops.py * add validation option to benchmark * cleanup imports * add checks incase onnx2torch implements ops in future * run onnx instead of original torch * just skip gpu on m1 * reactivate the other models * check for strange params & squash whitespace * cleanup * fix causal mask Attention * Range doesn't need int cast * embedding vocab_counter same dtype as input * no need to cast * always validate, fix PosixPath ort --------- Co-authored-by: George Hotz <george@comma.ai>	2023-08-16 12:24:40 -07:00
nimlgen	c93e63b8b5	make TestNonFloatUOps.test_mul_bool pass on all platforms (#1557 )	2023-08-16 11:34:09 -07:00
madt2709	962972ee68	Fix uops int32 for llvm (#1554 ) * fix-uops-int32-llvm * fix tests * Ignore mypy error	2023-08-15 23:22:32 -07:00
Diogo	d17ecccd78	Torch/LLVM/arm F64 support (#1551 )	2023-08-15 21:21:08 -04:00
George Hotz	0b5930d406	more uops testing, who isn't passing right now... (#1522 ) * more uops * llvm refactor * update test uops * rest of the nodes * ors and ands	2023-08-15 09:07:26 -07:00
Steven Anderson	93a36c3659	Arm (#1421 ) * testing new memops * better debugging * testing padded conv * branching with load * refactoring a bit * first try * fixing bugs * fixing some * eq * eq2 * do not use x's * working * fixing imm * getting things working * refactor * pow not working * working except one * refactor: one store mem * refactor: global load * refactor: imm * refactor: cleaning * fixing big offsets * refactor with ci * try ci * typo * another typo * ubuntu default * forgot git * do i need git? * missing packages * adding python-dev * with cache? * buildx action * buildx name issue? * maybe now? * python3 * newline warning * maybe now * i actually need this * ci should work now * improved caching * fixing cache * maybe now it will cache * this * testing cache * trying again * load * missing platform * caching gha * testing cache * full testing * typo * now? * why * adding checkout back * bad formatting * fixing convention issues * supporting python * adding CI flag * testing all * better comments * adding debugging * takes 12x longer * does it output progress now? * ignore models for speed * fixing merge * excluding conv_transpose2d * only 2 test cuz is to slow * another approach * let's see * faster duh * my bad * T_T * typo * sup * with output? * comment test * comment test * comment test * :? * no comment * with cache * back to normal * testing that ci works * back to passing * trying again * does it create another entry * does it create another entry? * build local * hey * Revert "excluding conv_transpose2d" This reverts commit cc7348de03033e032f47d69caff174e2f1a7bfea. * does it cache if done before? * does it cache? * done * adding test ops * bad formatting * no need for this * working static mem * sum 1d * add ndim * better reg import * fix stack * back to np * working except for softmax * 5 failing * no pogress * remove keystone * remove keystone * testops passing * cleanups * more cleanup * typo * ci * ci2 * cond import * ci3 * ci4 * ci4 * ci5 * ci5 * ci6 * aligment * test all * correct test * err read_unmapped * passing test * ignore for speed * ignore for speed * ci7 * cleanup * remove docker * fixing merge * fixing bugs * add skipload for const ops * comments * First merge to master: Renderer * fix emulation * passing all tests arm64 * cleaning * fix handcoded binary * cleaning * fix errs * fix runtime arg binary * clean git diff * fix and clean * fixing metal test * cleaning * fix metal test * ci ~8 min * fix pylint and clang * cache the files in ops_clang --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-14 19:29:30 -07:00
chenyu	a89142e46f	ShapeTracker.var_vals (#1540 )	2023-08-14 18:53:37 -07:00
wozeparrot	9cb2bda34f	Revert "Better reshape (#1423 )" (#1538 )	2023-08-14 13:04:54 -04:00
Sieds Lykles	cf2bf1518d	Better reshape (#1423 ) * do reshaping without merge_views and reshape masks * added tests * properly do reshaping of zero or negative masks * replace while loop with single expression * remove old condition * add more tests and comments * remove empty file	2023-08-14 09:09:04 -07:00
YiMing Han	e00acb1eaf	fix deepwalk ctx check (#1536 )	2023-08-13 23:03:17 -07:00
nimlgen	b6937acb7e	fix casting behavior for interpreted buffers (#1525 )	2023-08-13 19:21:37 -07:00
chenyu	3e0c2d256f	symbolic shapetracker (#1506 ) * symbolic shapetracker * no need * keep only symbolic and clean up * explicit // and % Node support * NumNode * Node	2023-08-12 12:22:58 -07:00
JaSpa99	d3d58a37e5	Bert: use Tensor.scaled_dot_product_attention (#1528 ) * use scaled attn from Tensor * add a test for bert * linter * no more tokenizer * without loading weights * remove prints * tribute to linter lords * smaller input and less runs * small bert	2023-08-12 08:46:04 -07:00
wozeparrot	29d5801387	distributed collectives (#1519 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device * feat: allreduce * feat: test * feat: need contiguous * feat: test in ci * feat: exit with correct code * feat: don't need that * feat: opencl wait_for just doesn't work * feat: synchronize on out * feat: try? * feat: try again? * feat: add extra realizes * feat: print * feat: seed * feat: tol * feat: test ones and zeros * feat: remove print * feat: are you just flaky * feat: seperate scatter and gather? * feat: just try synchronizing * feat: remove print again * feat: bring back difference * feat: no sync * feat: revert that * feat: back to wait_for * fix: typo	2023-08-11 10:22:07 -07:00
George Hotz	38fe84d92b	cleanup mlops (#1521 ) * cleanup mlops * that line belongs there	2023-08-10 19:53:28 -07:00
wozeparrot	7e7c9001e9	distributed world (#1481 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device	2023-08-10 10:00:51 -07:00
geohotstan	07b79f210f	llvmir support for bool <-> float casting (#1492 )	2023-08-09 13:12:52 -04:00
Jacky Lee	ef5f648e2f	Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502 ) * Implement scaled_dot_product_attention and test * Support attn_mask * Support is_causal too * Use in llama * Don't forget to reshape * Set requires_grad=False for causal * Remove staticmethod * Remove extra spaces	2023-08-08 23:27:13 -07:00
nimlgen	dabfd7569a	use allclose instead of equals in test_jit (#1504 ) Closes #1503	2023-08-08 22:22:17 -07:00
Yixiang Gao	7c2ea85bb0	Raise memory limit for CIFAR test (#1499 )	2023-08-08 19:40:56 -04:00
Yixiang Gao	6480a1a180	CIFAR 94.03% (#1340 ) * add disk_tensor * fix jit * new baseline before whitening * whitening through torch * whiting done currently at 91.65% * 91.99% * clean up mixup and 92.3% * clean up 92.30% * 92.49% before searching for new hyper-parameters * fix CI * fix white space * add whitening init in test * refactor, update hyperpara, 92.72% * converting whiting to tinygrad operation * update CI kernels count for CIFAR * add pad reflect * add random crop 92.53% * update hyperpara 93% * 93.15% on docker container, need to refactor the assignment for hyper param * print out weights and bias to be separated * bias/non-bias params separated * fix whitespace * clean up * refactor hyper-param with dict * refactor lr schedular params * fix whitespace * fix cross entropy loss * fix whitespace * move opt hyp to hyp dict * minor fixup * adjust model, loss scaling * 92.74% while using half of compute as before * update hyp for cutmix * random shuffle during batches * clean up * updating the model * update ConvGroup * disable gradients for batchnorm layer weights * whitespace * 93.92% * clean up * finally 94%git add .! * rewrite whitening to remove dependency on torch * whitespace * remove dependency on torch, 93.91% * back to 94.03% * clean up * update test_real_world	2023-08-08 15:13:24 -07:00
George Hotz	d24f936501	just cmplt (#1493 ) * just cmplt * fix maximum * don't save, there's no backward * ugh, no slot either * eq is a scam	2023-08-08 13:58:10 -07:00
nimlgen	932dad1a2b	fix cast bool->float in llvmir (#1480 ) Closes #1479	2023-08-07 21:30:51 -07:00
nimlgen	046fd7437a	use fake buffer for external_test_speed_llama.py (#1478 )	2023-08-07 22:05:44 -04:00
Diogo	d7d1011f1e	Add WEBGPU tests to CI (#1463 ) * webgpu tests * assert device is webgpu * missed env set * exclude failing ci tests * ignore test file * changed acc for adam test	2023-08-06 10:32:01 -07:00
George Hotz	486a9dbfd9	speed v torch (#1464 ) * speed v torch * always print * change print * torch speed tee * all exposed	2023-08-06 09:32:33 -07:00
George Hotz	2ab282bfec	run on update_benchmark too (#1460 ) * run on update_benchmark too * amd inference test * name it better * add 10 CIFAR training steps	2023-08-06 08:58:37 -07:00
George Hotz	d67e248d9b	simple bitcast 2 (#1445 ) * simple bitcast 2 * bc 2 * empty * Revert "empty" This reverts commit d8ee083655b67947afb1e577020b4395d001832c.	2023-08-06 00:30:50 -07:00
George Hotz	bf21aec81f	do benchmarking (#1451 ) * do benchmarking * system * artifact * go * name artifact	2023-08-05 23:35:01 -07:00
nimlgen	1ba8ae62a1	Match Torch speed for sum reduction (#1387 ) Co-authored-by: Alexander Edwards <alex@alexedw.com>	2023-08-05 22:27:33 -07:00
George Hotz	7fa730b506	external model benchmark test	2023-08-05 22:10:48 -07:00
George Hotz	7b8d06c9f1	test uops (#1444 ) * test uops * tests should pass * improve uops * precision	2023-08-05 12:35:56 -07:00
George Hotz	84c430355e	fix backends for new style (#1443 ) * fix backends for new style * fix method cache * fix fakeless * llvm blacklist * fix kernel optimizer	2023-08-05 11:07:04 -07:00
George Hotz	bd7f4b1249	move renamer to linearizer (#1442 ) * move renamer to linearizer * uops converter * Delete test_uops.py	2023-08-05 08:53:25 -07:00
nimlgen	669b406ec6	correct children count with lazycache (#1429 )	2023-08-05 00:30:16 -07:00
Felix	97a6029cf7	Corrected a few misspelled words (#1435 )	2023-08-04 16:51:08 -07:00
Francesco Castelli	579f4615a0	Add assert for wrong matmul/dot shapes (#1438 )	2023-08-04 18:16:56 -04:00
Umut Zengin	52db7d7435	inf, -inf support for pad (#1436 )	2023-08-04 15:05:25 -04:00
Alex Telon	7325bc914f	fix: Context (#1430 ) * Fixed issue in Context * Cleaned up fix Now that DEBUG.value = 3 always works we can do so in __new__ as well.	2023-08-04 10:53:48 -04:00
wozeparrot	801bed4f66	Add ops_shm (#1413 ) * feat: add ops_shm * clean: extra newline * feat: add test * feat: ci doesn't like that * feat: ci still doesn't like that * feat: skip big test on ci * feat: testing * feat: big * feat: testing again * feat: reskip test	2023-08-03 17:40:52 -07:00
chenyu	34f348643b	Support constant expand to symbolic shape (#1411 )	2023-08-02 21:21:22 -07:00
chenyu	6572ca6835	support symbolic expand (#1407 )	2023-08-02 20:03:46 -04:00
chenyu	18d0a93f09	LazyBuffer.get_variable_buffers() (#1391 ) * LazyBudder.get_variable_buffers() * remove left_only, add ProdNode * no vars for OpNode.b * do not change symbolic vars, remove ProdNode	2023-08-02 09:01:35 -07:00
Umut Zengin	8889821547	Const pad support to pad2d and slice (#1392 ) * slice to pad2d migrate * Gain line * Mypy happy * Mypy happy * Revert * whitespace	2023-08-02 08:58:52 -07:00
Alex Telon	b66361843a	Timing and Context can now be used as decorators (#1385 ) * Context and Timing can now be used as decorators * Using Timing decorator in quickstart.md The time formating is better and is a useful tool to learn. Old: Time: 3.5260659999912605 New: Time: 3526.14 ms * Updated env_vars documentation for Context * Added test for Context decorator * Put new import on same line as others	2023-08-01 17:16:10 -07:00
Diogo	4dc8595069	simple exporting models (#1344 ) * unified exporting * json exporting * ignore more * simplified buffer export * added dtypes * added assert * swift example * fix tests * linter * remove whitespace * fixed tests * remove swift example * remove unintended changes * allow callable models to be used * whitespace * more readable json export * name change * whitespace * whitespace	2023-08-01 09:35:48 -07:00
Diogo	ba5e3818a0	Limit dims based on max size (#1390 ) * working * whitespace * changed defaults to None * linter * last linter error	2023-07-31 19:18:19 -07:00
chenyu	b2fde9ec36	reshape to register variable value (#1386 ) * reshape to register variable value * better error message	2023-07-31 17:10:02 -07:00
Umut Zengin	0de5f20970	Re-open constant pad support to Tensor.pad (#1388 ) * Added const padding support to .pad * Linter	2023-07-31 17:08:57 -07:00
Alex Telon	2d10e0340e	Refactored ContextVars (#1331 )	2023-07-31 15:44:46 -04:00
chenyu	f5ef445cb6	trim space (#1381 )	2023-07-31 10:37:57 -07:00
JaSpa99	5ab12059da	rng hlops: add normal and kaiming_normal (#1378 ) * add normal and kaiming_normal * make sure its float * add tests	2023-07-31 10:37:02 -07:00
George Hotz	37fa7e96fb	Revert "update editorconfig, enforce via CI (#1343 )" (#1380 ) This reverts commit `da2efecbe2`.	2023-07-31 10:35:50 -07:00
Pavol Rusnak	da2efecbe2	update editorconfig, enforce via CI (#1343 ) * update editorconfig to set unix-style newlines and trim whitespace * add editorconfig github action to the CI * fix whitespace	2023-07-30 18:44:30 -07:00
S-Lykles	c2b82ea8ac	fix to_shape_strides (#1374 ) * add tests for expr_node and expr_idxs * simplify condition and add missing optimization	2023-07-30 18:42:46 -07:00
chenyu	1fdf560fb1	simplify get_contraction (#1373 )	2023-07-30 18:35:22 -07:00
S-Lykles	a32c677601	Fix off by one error in View.expr_node (#1363 ) * Fix off_by_one error in View.expr_node * Add test for expr_node * Remove whitespace before : * test no arguments and properly test idx=None	2023-07-29 08:10:37 -07:00
Karan Handa	e0a69bdbe6	Fix argfix and add tests (#1365 ) * Remove unreachable code * Fixed argfix * Add empty check and tests * Removed redundant tests"	2023-07-28 09:09:49 -07:00
wozeparrot	32d1afa4b5	feat: correct case when base is 0 (#1360 )	2023-07-27 13:53:38 -04:00
wozeparrot	c22e77abfd	Match torch on fractional negative base pow (#1352 ) * feat: match torch on fractional negative base pow * feat: tests for trunc	2023-07-26 19:14:54 -07:00
Umut Zengin	d4ebadf2da	Small Tensor.cat optimization and reformating (#1347 )	2023-07-26 18:01:12 -04:00
geohotstan	4056f97187	Gather (#1329 )	2023-07-25 15:05:41 -04:00
Francis Lam	9d142430cb	Add option in llama.py to quantize weights to int8 at runtime (#1289 ) * Add option in llama.py to quantize weights to int8 at runtime Also added lm-eval to external * Add support for llama-2 evaluation	2023-07-24 17:22:38 -07:00
Pavol Rusnak	cd60b8561c	Add LLaMA-2 support (#1284 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-07-24 17:12:02 -04:00
waifairer	d89fb729e5	flake8 (#1323 ) * flake8: Ignore frequent violations, correct infrequent ones * Ignore some rules in test * Reorder test ignores * Lint test + main * EOF indent * Include all E71,E72 errors * Test the failing case in CI * Revert "Test the failing case in CI" This reverts commit 110add0a70f5a619d07631269104e84f908af6b9. * Push to test! This reverts commit f317532779a0e1ac8401e2474fd5c6c8695c08e9. * ok back to passing This reverts commit ba5052685f93f83e06152cdc696b9e26131d8ab7. * Prove that CI fails when formatting is incorrect. * Fix formatting * Remove duplicitous E117 rule * Use flake8 config for precommit --------- Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-24 11:19:58 -04:00
George Hotz	086382b64e	Revert "Fix max nan (#1298 )" (#1334 ) This reverts commit `50774470b2`.	2023-07-23 20:41:28 -07:00
uncommonSensor	50774470b2	Fix max nan (#1298 ) * Fix max nan * Adds nan check option to max function * Calls to max can pass in "ignore_nan=True" argument * Added max nan CI tests * Fix max nan * Adds nan check option to max function * Calls to max can pass in "ignore_nan=True" argument * Added max nan CI tests * Turned off due to the need for granularity	2023-07-23 19:39:44 -07:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? * remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
George Hotz	47f9d82722	test_conv: relax to 0.93	2023-07-23 12:57:29 -07:00
chenyu	aa05495620	symbolic stride (#1326 )	2023-07-23 12:41:22 -07:00
Cole Sutyak	2d4e182294	change fetch to allow for local file selection (#1309 )	2023-07-23 15:00:16 -04:00
waifairer	7cac5ea16c	[GH-1305] Refactor test_dtypes.py to be cleaner (#1306 ) Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-21 18:18:02 -04:00
Jacob Pradels	b112edd2c3	Add pylint trailing whitespace rule (#1314 )	2023-07-21 13:37:55 -04:00
madt2709	d2c1e8409a	Update arange to be (start, stop, step) (#1308 )	2023-07-21 00:27:23 -04:00
George Hotz	f45013f0a3	stable diffusion: remove realizes we don't need	2023-07-20 19:53:07 -07:00
George Hotz	9dffc9ba23	Use nevergrad to optimize kernels (try 2) (#1301 ) * nevergrad try 2 * touchups * no ones * opt fixup * cleanups * touchup * make new optimizer file	2023-07-20 16:46:45 -07:00
George Hotz	50a399ffa3	real world test: relax memory	2023-07-20 14:06:22 -07:00
George Hotz	17830e25da	real world tests (#1297 ) * real world test * touchup * sync device	2023-07-20 10:50:22 -07:00
George Hotz	ca77d6cd72	bfloat16 in LLVM (enough for llama 2) (#1293 ) * add bf16 support to LLVM * bf16 read works	2023-07-19 20:18:32 -07:00
Umut Zengin	74e63fe4ee	Added test_chunk and fixed (#1283 )	2023-07-19 22:21:26 -04:00
George Hotz	f7b0320d8b	add cifar training regression test (#1287 ) * add cifar training regression test * clean up print	2023-07-19 14:17:09 -07:00
George Hotz	45ecae1ab3	Revert "Match Torch speed for sum reduction on M1 (#1187 )" (#1286 ) This reverts commit `59af9b81c5`.	2023-07-19 13:39:16 -07:00
chenyu	120ae74008	Enable JIT test for size 1 tensor (#1285 )	2023-07-19 11:06:40 -07:00
chenyu	940b6fd21a	Revert "Fix constant folding for Tensor([3]) (#1227 )" (#1274 ) This reverts commit `ab645317c9`.	2023-07-19 10:51:06 -07:00
chenyu	0aed3f73da	More JIT test cases (#1280 ) * More JIT test cases * test against jit_cache directly * remove unused	2023-07-19 10:45:43 -07:00
George Hotz	d6637623e3	torch test touchup	2023-07-19 09:37:23 -07:00
Alexander Edwards	59af9b81c5	Match Torch speed for sum reduction on M1 (#1187 ) * Add additional kernel when reducing multiple dimensions at once. * Faster for smaller inputs * Whitespace and naming * Cleaner, guard for Metal only, and max 1 split rather than N * Draft of different approach * One additional kernel call for this test (as expected)	2023-07-19 09:18:58 -07:00
Umut Zengin	fde9f0e60d	Slice migrated in Eye op (#1281 ) * Migrated from slice to pad and shrink, made cleaner * Changed repeat with reshape and expand	2023-07-19 09:08:38 -07:00
chenyu	a5f5330d91	Add Fuzz Test symbolic / shapetracker to CI. (#1278 ) * Fuzz test symbolic and shapetracker This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8. * mess again * no tail * test shapetracker too * Revert mess and enable all tests * removed leftover	2023-07-19 09:05:45 -07:00
David Hou	56ee97b37f	dedup kernel args v2 (#1272 ) * new version * fix abstractions * try remove test * Revert "try remove test" This reverts commit 2fc18a9f8ed180540baf73d32b568262709822f1. * assert_allclose * minimize the test * minimize the test * minimize the test * minimize the test * Revert "minimize the test" This reverts commit e0c092959636109f745d1c8a73f2db90c75fe3c1. * Revert "minimize the test" This reverts commit 88240551b13403b21a81765043d5736103a49293. * Revert "minimize the test" This reverts commit 78328a7ce27328c8bf9a325ae017cc2a4d98f65b. * Revert "minimize the test" This reverts commit 989523fded4319b13db047e45ad8c35c861a36aa. * skip test inside body * oops * oops	2023-07-18 20:03:42 -07:00
Umut Zengin	fa0265b173	Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266 )	2023-07-18 16:09:19 -04:00
chenyu	c96bf395df	Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265 ) * Enable JIT test * really test metal * Skip some device	2023-07-18 11:40:37 -07:00
Umut Zengin	f8c539989e	Re-open create cumsum speed test (#1255 ) * Reduced tensor size in testing * Update formatting test_speed_v_torch.py	2023-07-17 18:59:36 -07:00
Stan	ed472bffea	Fix: negative axis in `tensor.cumsum` (#1261 )	2023-07-17 16:16:38 -07:00
Adrian Kretz	5a8ad57163	Add WHERE ternary (or trinary?) op (#1196 ) * Rename FusedOps to TernaryOps * Support ternary broadcast * Add where llop and mlop * Make where op work in cstyle codegen * Don't skip test_inf_where * Add backward path to where op * Use bool in cstyle codegen * Add LLVM where op * Add numpy where op * Add torch where op * Simplify where mlop * Update documentation * Forgot a rename * Merged relevant changes from PR #1195 onto PR #1196 * Add test to cover changes to linearizer.ast_parse for WHERE op Without this METAL will try to use ternary op on float4 and fail * Make where op work in wgsl backend * Allow ternary ops to be merged * Make mypy happy --------- Co-authored-by: Francis Lam <flam@alum.mit.edu>	2023-07-16 00:31:55 -07:00
Stan	872e2198fe	Added `nn.ConvTranspose1d` (#1243 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-15 00:42:42 -07:00
Stan	264d467f2b	Added `tensor.squeeze` and support for testing exceptions (#1241 ) * WIP: `tensor.squeeze` function * Added `test_except` param to `helper_test_op` to avoid false positives * Extracted new method `helper_test_exception` for testing exceptions * Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch	2023-07-15 00:33:24 -07:00
Stan	a8f3b3f4ed	Added test for nn.Conv1d (#1242 )	2023-07-15 00:30:50 -07:00
chenyu	32be39554c	Simplify symbolic.SumNode.__floordiv__ logic (#1220 )	2023-07-12 12:54:12 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
Yosef Frost	613bcd945d	Added Test Coverage to Int32 and Make Sure Tests Succeed (#1174 ) * Added test coverage for int32 in `test/test_dtype.py` Tests for int32 include: - testing that int32 can be converted into a numpy array - testing that float and int64 can be cast into int32 - testing that int32 can be cast into float and int64 - testing addition, multiplication, and matrix multiplication with int32 - testing that addition, multiplication, and matrix multiplication with int32 and either float or int64 gets successfully cast into float and int64, respectively Additional changes include testing that int8 casts into int32 and testing that float16 casts into int32 * Added type casting to the add, subtract, and divide binary operations * Added automatic type casting when types differ to FusedOps.MULACC I moved the match_types function back so that I could call it in einsum_mulacc where it would cast the types of the MULACC to be the same * Added unit test for match_types and added type hints to the parameters * Added tests for ops_cpu.match_types * Changed ops_cpu.einsum logic to play nicely with PyTorch Changed `tinygrad.runtime.ops_cpu.einsum_mulacc` logic to not perform type matching. Type matching was instead moved to the numpy_fxn_for_op dictionary in the ops_cpu file. Since ops_torch uses the same einsum_mulacc function, this should fix all the broken pytorch tests. * empty commit to rerun ci * reverting PR#1213 in attempt to fix broken test * Removed all tests I added to see if they are causing CI issues * Added back type matching tests * removed type matching tests and added back int tests * added back part of the type matching tests * removed braking type matching tests * empty commit for testing * added test back but inside comment * removed a test from the comment to see if it breaks CI * removed another function * more testing * emptied test comment * cleaned up comments * Added optimize=True flag to einsum_mullac in cpu_ops.py * Removed unnecessary imports from tests * optimized match_types by removing unnecessary array copying	2023-07-12 10:29:15 -07:00
Francis Lam	df86672bd4	Fix LazyBuffer SHUFFLE_PAD_OPS to prevent invalid pad movement (#1223 ) In addition to div, any ops that will generate non-zero outputs from zero inputs need to be guarded.	2023-07-11 15:30:35 -07:00
chenyu	ab645317c9	Fix constant folding for Tensor([3]) (#1227 ) * Fix constant folding for Tensor([3]) * Remove duplicated prod import * load in the same device * better numpy * add constant fold shape test cases * improve tests	2023-07-11 14:01:32 -07:00
madt2709	bb316a42af	Fix pow to work with negative tensors (#1191 )	2023-07-09 17:33:04 -07:00
George Hotz	43385c7dbf	remove contiguous on full (#1212 )	2023-07-09 17:31:15 -07:00
George Hotz	67e34b356a	good stuff from tensor cores branch (#1199 )	2023-07-08 16:58:26 -07:00
George Hotz	7151382364	Refactor load/store before tensor cores (#1193 ) * minor cleanups * render_const * now that's a nice refactor * clean up vload/vstore * clean up render_load * debugs there * dumb * err, this? * const float4 * what's failing * bugfix * statement includes semicolon * bugfix	2023-07-08 15:54:58 -07:00
fluffy χατγιρλ	628ee46627	Fix bug where Tensor.randn returns inf (#1192 ) * fix randn inf bug * add test * more compact test * clarify test purpose	2023-07-08 12:03:46 -07:00
George Hotz	0ad99038ef	Revert "Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156 )" (#1181 )" + add test This reverts commit `a374b62bfe`.	2023-07-07 18:37:04 -07:00
George Hotz	a374b62bfe	Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156 )" (#1181 ) This reverts commit `8ff7184b1b`.	2023-07-07 18:29:05 -07:00
fluffy χατγιρλ	8ff7184b1b	Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156 ) * init shape tracker with strides to fix mismatch Author: sekstini <sekstinilol@gmail.com> * fix whitespace * add tests	2023-07-07 18:28:21 -07:00
Stan	69d33cab0d	Fix: auto create parent dir when downloading file (#1173 ) * Fix: auto create parent dir when downloading file also removed duplicate import `os` * Added test for auto parent dir creation when downloading file	2023-07-07 13:40:29 -07:00
terafo	aa60feda48	Fix naming conflict with huggingface datasets (#1161 ) * Rename in files * Move files * Moved to extra/datasets as suggested * Changes to files * Fixed stupid mistake --------- Co-authored-by: terafo <terafo@protonmail.com>	2023-07-07 10:43:44 -07:00
Stan	9b6e57eccd	helpers.py: improved test coverage + exception handling (#1165 ) * Fixes + improved test coverage for helpers.py - added exception handling in `proc`, if an exception was thrown, the thread would hang - made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread * Made `_early_exec_process` catch any Exception Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output` * Fixed `from tinygrad.helpers import Timing` import oops, for some reason my IDE cleaned that import from extra/helpers. * Fixed import in llama.py Another one that I skipped by accident, mybad * Extracted a class for tests of early exec * Normalize line endings, windows uses /r/n * Made `cross_process` not a daemon	2023-07-07 10:26:05 -07:00
Kunwar Raj Singh	8391648822	Over 90% on CIFAR with examples/hlb_cifar10.py (#1073 ) * fix eval, lr decay, best eval * 82.27 * 82.64 * 82.79, reproducable * add lr sched, 85.26 * 87.42 * 87.94 * 87.42 * tta with flip * training flip aug * refactor * using Tensor for LR is faster * 89.5 * refactor, flip only train set * 90.01 * 90.64 * eval jit * refactor * only JIT model * fix eval JIT * fix eval JIT * 90.82 * STEPS=900 reaches 90.22 * TTA envvar * TTA default 0 * fully jit training * refactor optim * fix sched * add label smoothing * param changes * patial gelu * OneCycle with pause * gelu maybe works * 90.12 * remove pause lr * maybe fix lr schedulers * scheduler test passing * comments * try mixup * shuffle! * add back the missing last eval * fix shuffle bugs * add mixup prob * fix mixup prob * 90.19 * correct mixup * correct mixup * correct mixup * 90.24 * 90.33 * refactor, add type hints * add gradient clipping * maybe fix test * full JIT * back to relu for now * pass mixup prob as param * add typehints * maybe CI works * try erf gelu * CI, types * remove useless import/ * refactor optim * refactor optim * try leakyrelu * try celu * gelu * 90.67 * remove grad clip * remove grad clip tests * revert params * add test for OneCycleLR * 90.62 * fix eval timing * fix eval timing again * so where i calculate mixup_prob matters --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-06 20:46:22 -07:00
Rayan Hatout	9975f24452	Fold expand preceding reduce if the reduction is on the same axis as the expansion (#1134 ) * fold expands that precede a reduce if the reduction is on the same axis as the expansion * add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization * add a test case to make sure we don't fold reduce-expand-reduce on different axes	2023-07-06 13:41:05 -07:00
Eli Frigo	801564f31b	Remove POW llop and add SQRT llop (#1104 ) * fixed division by zero for fast operations * made et closer to 0 * replace POW llop with SQRT * updated mlops to swap SQRT and POW llops * updated hlops to swap POW and SQRT * added sqrt llop to cpu runtime * added sqrt llop to cstyle codegen * added POW llop to llvm ir codegen * added SQRT llop to torch runtime * moved pow from mlops to hlops * found a better way to do reverse pow * fixed indentation * added SQRT llop to triton * update docs to match new llops * removed POW operator from assembly codegen * added sqrt and rsqrt to pow hlop * rewrote pow function in tensor.py * Adjust tolerance * Adjust for adamw * Reduce for Adam too * removed accidental leftover code * removed all of accidental code * added rsqrt test * removed pow from mlops again it was added back when resolving merge conflicts --------- Co-authored-by: Jacky Lee <jla524@sfu.ca>	2023-07-05 18:07:58 -07:00
Reza Rezvan	d1356cac27	Fix: Jacobian tests [WIP] (#1126 ) * Fix: Jacobian tests; num_jacobian either bugged or not accurate enough; * Fix: Jacobian tests; * Fix: Gradcheck;	2023-07-05 15:36:22 -07:00
George Hotz	793a670187	from tensor cores + lb touchup (#1127 )	2023-07-04 15:45:20 -07:00
Reza Rezvan	535224ac20	Remove float64 (#1101 ) * Refactor: Remove float64 * Refactor: Remove unused imports * Refactor: Remove float64 * Refactor: Remove float64 * Refactor: Exclude float64 onnx backend * Add: Skip jacobian and gradcheck tests;	2023-07-04 08:40:51 -07:00
Daniel Hipke	b4ce23e4b8	Make cross_process use cloudpickle (#1118 ) * fix syntax issues in imagenet_download.py * use cloudpickle in cross_process to make it work in Python 3.9+ * add cross_process test * prevent unpickling on every function call * add cloudpickle to setup.py * add support for args/kwargs	2023-07-04 00:47:34 -07:00
George Hotz	c709dec8b5	gelu: weird test was broken for metal	2023-07-04 00:43:54 -07:00
George Hotz	daf8e1942f	sigmoid: test large postive also and add note	2023-07-04 00:18:31 -07:00
Kunwar Raj Singh	9e6067378f	Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113 ) * Add failing sigmoid test * update more tests * add mlop for sigmoid * add back test * math.log(math.e) = 1 * remove divides --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-04 00:14:22 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
geohotstan	575f75f613	hello (#1084 )	2023-07-01 01:29:35 -07:00
Jacky Lee	754e54ebb9	Fix Tensor ceil and floor for whole numbers (#1071 ) * Works on non-special numbers * Test different cases	2023-06-27 23:22:17 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00

1 2 3 4 5 ...

878 Commits