tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	55ae73e951	Replicate llm.c in tinygrad (#4179) * write llm.c and add a few new methods to tensor * training works * add jit * tests for new functions * test tolist * simple fix for onnx test failures (#4186) * write llm.c and add a few new methods to tensor * training works * add jit * tests for new functions * bump line count to 7500 * simplest fix * safenumpy tolist for now --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> --------- Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>	2024-04-16 15:40:48 +04:00
geohotstan	183708b3fd	broadcast expand to match torch (#4085) * initial version * heh gimme grrrreen * version 2 * clean ups * some test confusion * fix onnx * rename to _broadcast_tensors * improved errors and test * fixed? * some test fixup * version 3 lol * comments * cleaner * add failure test for expand to 0 test * 1 more assertRaises test * make err msg better * also rewrite the expand onnx op? :s	2024-04-07 16:23:13 -04:00
wozeparrot	a0ab755317	threefry again (#3785 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? 
* feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * 
clean: ruff * feat: bfloat16 fix * fix: :| * feat: restore old --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-18 16:47:07 -04:00
George Hotz	311cf2b7d3	Revert "threefry_2x32 (#2601)" (#3784) This reverts commit `db3de54bc4`.	2024-03-17 10:27:20 -07:00
wozeparrot	db3de54bc4	threefry_2x32 (#2601 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? 
* feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * 
clean: ruff * feat: bfloat16 fix * fix: :| --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-17 10:19:33 -07:00
Francis Lata	957ae9b594	Fix Tensor's __repr__ for printing out grad (#3673) * update check for Tensor's __repr__ with grad * add test for repr with grad bugfix	2024-03-10 17:04:29 -04:00
Maximilian Wolf	8ae85b2cf5	add inference_mode context manager with decorator support (#3621) * add inference_mode context manager with decorator support * change val to mode for train and inference_mode * fix wrong rename	2024-03-09 08:38:26 -08:00
chenyu	4552248c84	fix Tensor.to preserves grad.data (#3636)	2024-03-06 21:44:49 -05:00
chenyu	8f10bfa2ff	ban __bool__ on Tensor (#3632) * ban __bool__ on Tensor avoid misuse * test case * fix tests * fix more tests	2024-03-06 17:12:35 -05:00
chenyu	282bbd5acb	check the input length into argfix (#3610) * check the input length into argfix it's possible to overlook setting keyword for kwargs and argfix silently truncates input * add test	2024-03-04 19:50:17 -05:00
Marcin Słowik	56d21d77b3	Fix two bugs concerning Tensor.to. (#3593) 1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device. 2. Tensor.to result was missing graph, even though requires_grad and grad were propagated . Add corresponding tests.	2024-03-03 08:48:56 -08:00
chenyu	30f26279c5	add back "CPU" in test_onnx_backend supports_device (#3426) the onnx tests were all skipped.	2024-02-16 00:49:30 -05:00
xarkes	28a8b72024	Remove Interpreted device & remaining CPU/TORCH ref (#3423) * Remove Interpreted device & remaining CPU/TORCH ref * Oops * supports_device was useful * Fix doc wording --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-16 00:30:21 -05:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
George Hotz	93eceef727	remove cpu prereqs (#3410)	2024-02-15 13:45:06 +01:00
Obada Khalili	ee25f73283	Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318) * fix Tensor.mean to compute the mean correctly with 0-length axes are selected * add a regression test * rename sum variable to sum_t to avoid conflict with built it function * refactor Tensor.mean to has less lines	2024-02-05 01:40:37 -05:00
chenyu	2f4b3ab1c0	shard and to should preserve requires_grad (#3224) dtypes are inferred from underlying lazydata, requires_grad needs to be passed explicitly	2024-01-24 00:15:10 -05:00
chenyu	e6c71f1b26	fix device of Tensor.arange inside Tensor.one_hot (#3199) it should have the same device as self	2024-01-21 21:03:50 -05:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
chenyu	2783e1b50d	bugfix Tensor.item when it's unbased (#2913) it's possible for numel 1 tensor lazydata to be unbased and should call lazydata.base.realized	2023-12-22 13:50:06 -05:00
chenyu	20ea43b6e7	dtypes.from_py to convert py types to dtypes (#2826) also updated some tests to test against default dtypes	2023-12-18 14:23:31 -05:00
chenyu	0723f26c80	dtypes.default_float and dtypes.default_int (#2824)	2023-12-18 12:21:44 -05:00
chenyu	b4fa189c8c	Revert "Revert "Make Tensor creation allow multi-dim list of int and bool (#2793)" (#2810)" (#2813) This reverts commit `71a60762ed`.	2023-12-17 11:48:27 -05:00
chenyu	71a60762ed	Revert "Make Tensor creation allow multi-dim list of int and bool (#2793)" (#2810) This reverts commit `798bf813b1`.	2023-12-17 02:03:52 -05:00
geohotstan	798bf813b1	Make Tensor creation allow multi-dim list of int and bool (#2793) * the universe is flat as a 2D tensor * try this * TESTS * less lines in test * don't change all_int since other places use it * add tests and del noqa by making non-aesthetic spacing LOOOOOL * some reordering * fixed empty list and add tests * more tests * add list bool tensors * clearer with least lines added * added bool * oops * more tests * improved tests * oops	2023-12-17 01:58:10 -05:00
chenyu	c5fa9eb36e	int / List[int] data -> dtypes.int32 (#2789)	2023-12-16 01:25:44 -05:00
George Hotz	d87a246439	move to new cached fetch (#2493) * move to new cached fetch * extra.utils is over * loads * bump download cache * bump timeout	2023-11-28 17:36:55 -08:00
Christopher Mauri Milan	7f01dd04f0	Apply ruff linting rules to tests (#2473) * everything except F821 * enable F821 with noqa * dumb fix * fix remaining imports and (former) lambdas * replace _ with noqa to avoid gc	2023-11-27 21:24:06 -08:00
chenyu	c4cc4966ed	update some test_tensor.py cases with 0 in shape (#2368)	2023-11-19 20:35:05 -05:00
chenyu	6add808f6a	support tuple shape input for rand and empty (#2367)	2023-11-19 20:20:39 -05:00
chenyu	9a20bc08d6	Tensor(None) is Tensor([]) (#2316)	2023-11-15 13:49:18 -05:00
chenyu	f1f863c953	allow 0-dim array to broadcast into zero shape tensor (#2315) * allow 0-dim array to broadcast into zero shape tensor * not in	2023-11-15 13:12:21 -05:00
chenyu	123a0b86b2	support zero in shape (#2303) * zero in shape start * no assert for that * if output size is 0, return without exec * tweak * strides * reduce over non-zero * shrink and expand * fix import * test_elementwise where * cannot reshape from size 0 to size 1 * compiled backend reduce over 0 * zeros for numpy * reduce over 0 and keepdim resulted in 1 * reduce empty set default values * compare with same input * pad test case * cat test case * torch does not support that?	2023-11-15 11:57:48 -05:00
imaolo	6ee0435263	added from unaligned np test (#2134)	2023-10-23 11:38:57 -04:00
nimlgen	2a49f7e456	fix transfer to mapped buffers (#1923)	2023-09-29 00:50:24 -07:00
Yixiang Gao	094d3d71be	with Tensor.train() (#1935) * add with.train * remove the rest TODOs * fix pyflake * fix pyflake error * fix mypy	2023-09-28 18:02:31 -07:00
Yixiang Gao	a32951a001	add test_tensor_copy (#1840) * add test_tensor_copy * fix whitespace * add value check	2023-09-10 16:01:58 -07:00
badcc	ee9ac20752	Use correct dtype in Tensor when data is an ndarray (#1785) * use correct dtype in Tensor when data is an ndarray * attempt 2 * add assert to be consistent * Add test case for ndarray * Add test case for list * remove whitespace	2023-09-06 07:35:32 -07:00
nimlgen	355b02dc3f	allow zerosized tensors (#1659) * allow zerosized tensors * works with numpy	2023-08-30 10:39:24 -07:00
Yixiang Gao	8d6662a741	.cpu().numpy() -> .numpy() (#1594) * .cpu().numpy() -> .numpy() * restore ops_torch * restore test_speed_v_torch	2023-08-21 09:53:29 -07:00
YiMing Han	e00acb1eaf	fix deepwalk ctx check (#1536)	2023-08-13 23:03:17 -07:00
Diogo	d7d1011f1e	Add WEBGPU tests to CI (#1463) * webgpu tests * assert device is webgpu * missed env set * exclude failing ci tests * ignore test file * changed acc for adam test	2023-08-06 10:32:01 -07:00
Diogo	ba5e3818a0	Limit dims based on max size (#1390) * working * whitespace * changed defaults to None * linter * last linter error	2023-07-31 19:18:19 -07:00
JaSpa99	5ab12059da	rng hlops: add normal and kaiming_normal (#1378) * add normal and kaiming_normal * make sure its float * add tests	2023-07-31 10:37:02 -07:00
Karan Handa	e0a69bdbe6	Fix argfix and add tests (#1365) * Remove unreachable code * Fixed argfix * Add empty check and tests * Removed redundant tests"	2023-07-28 09:09:49 -07:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? 
* remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
chenyu	940b6fd21a	Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274) This reverts commit `ab645317c9`.	2023-07-19 10:51:06 -07:00
chenyu	ab645317c9	Fix constant folding for Tensor([3]) (#1227) * Fix constant folding for Tensor([3]) * Remove duplicated prod import * load in the same device * better numpy * add constant fold shape test cases * improve tests	2023-07-11 14:01:32 -07:00
fluffy χατγιρλ	628ee46627	Fix bug where Tensor.randn returns inf (#1192) * fix randn inf bug * add test * more compact test * clarify test purpose	2023-07-08 12:03:46 -07:00
Reza Rezvan	d1356cac27	Fix: Jacobian tests [WIP] (#1126) * Fix: Jacobian tests; num_jacobian either bugged or not accurate enough; * Fix: Jacobian tests; * Fix: Gradcheck;	2023-07-05 15:36:22 -07:00

92 Commits