tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	d9c62a33c3	add cifar to datasets.py (#6210 )	2024-08-20 11:42:49 -07:00
George Hotz	8390feb7b9	optim.OptimizerGroup in hlb_cifar (#5401 )	2024-07-11 20:14:36 -07:00
George Hotz	5ba611787d	move image into tensor.py. delete features (#4603 ) * move image into tensor.py * change setup.py * openpilot tests need pythonpath now	2024-05-15 10:50:25 -07:00
David Hou	c0a048c044	batchnorm d(var)/d(mean) = 0 (#4430 ) * d(var)/d(mean) = 0 * drop the number in test_schedule!	2024-05-05 00:25:45 -04:00
David Hou	593c90d7d6	Resnet fp16 training with fp32 master weight copy (#4144 ) * add casts to layers * FLOAT flag * detach * no_grad for eval * whitespace * explicit fp32 initialization * oops * whitespace * put back config['DEFAULT_FLOAT'] * bad * live dangerously (don't hide bugs) * don't bundle changes --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-14 11:25:08 -04:00
chenyu	c71627fee6	move GlobalCounter to helpers (#4002 ) break circular import between ops and buffer	2024-03-30 00:30:30 -04:00
David Hou	4b95350c41	fp16 resnet (without expand backwards sum in float, doesn't work) (#3816 ) * fp16 resnet * cast running mean and var back to default float * extra cast * check symbolic no overflow * add linearizer failure * loss scaler after grad contig * oops * i think this works * don't loss scale fp32 * remove overflow test case * remove symbolic bounds check * loss scaler should be float * temporarily disable padto cuz bug shruggie * make running stats in batchnorm float32? * calculate lars stuff in fp32? * oops * remove most changes * move loss scaler out of optimizer * no more FP16 var * oops --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-28 01:25:37 -04:00
chenyu	83f39a8ceb	env var to change default float (#3902 ) * env var to change default float to fp16 or bf16 looking for standard names for these. we have FLOAT16 that does something to IMAGE and HALF to convert weights. working on default bf16 too. ``` RuntimeError: compile failed: <null>(6): error: identifier "__bf16" is undefined __bf16 cast0 = (nv_bfloat16)(val0); ``` remove that in cifar * DEFAULT_FLOAT * default of default * unit test * don't check default * tests work on linux	2024-03-24 20:33:57 -04:00
chenyu	e22d78b3d2	training cifar with BF16 on CUDA (#3905 ) * training cifar with BF16 on CUDA memory usage is between float and half due to numpy calls on dataset preprocessing, which converts into float. * simpler bf16 functions * bf16 cifar works for HSA too just very slow * simpler bf16 functions, we love cuda	2024-03-24 01:37:47 -04:00
Francis Lam	a26090d404	search: change to use "spawn" and limit the number of tasks per child (#3862 ) also clean up some examples to use __main__ and not initialize resources outside of main	2024-03-21 21:23:36 -07:00
chenyu	b13457e4a7	explicit dtypes in hlb_cifar (#3707 ) prepared bfloat16 change. added float() and cast(default_float) in whiteing, explicitly set dtype in various places that convert between numpy and Tensor	2024-03-12 18:20:23 -04:00
David Hou	d16aa89561	don't allow MLB assigns with different axes (#3557 ) * allow LB <- MLB assign, but don't reuse buffer * update test * update test * assign assert axes are the same * update tests to manually shard running stats * unused import	2024-03-01 07:59:06 -05:00
David Hou	e5385eecfc	UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472 ) * UnsyncedBatchNorm with synced trainable weights for hlb cifar * multitensor reshape tests * test mlb assign change axis * E501 * argfix axis * don't import batchnorm from hlb_cifar in test_multitensor * pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB * add backprop test for UnsyncedBatchNorm * break out MLB assign and reshape changes * manually shard running mean and running var * don't shard unless syncbn=0 * replace nn.BatchNorm2d with UnsyncedBatchNorm * don't increment num_batches_tracked if not tracking running stats * update tests * oops * Revert "oops" This reverts commit 5e8a67a535abea2ff288b1b804a9aa95eba40732. * Revert "update tests" This reverts commit 7ebf65d89ace1d3a32c3b28ee323ddee253262d6. * Revert "don't increment num_batches_tracked if not tracking running stats" This reverts commit 78de0ea9ee8cbd65dce28bd4abcc131c98451aa2. * Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm" This reverts commit d03da53da70f009338e95f2b46315ac02a30149a. * don't increment num_batched_tracked if not tracking running stats * oops * test_batchnorm_axis * compare against torch * types --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-29 22:52:07 -05:00
chenyu	d8ad9e5660	verify eval acc for hlb_cifar training (#3344 ) set to 93% to reduce flakiness for now	2024-02-07 19:19:59 -05:00
chenyu	18e854cdbf	shrink MLB on sharded axis (#3255 ) * shrink MLB on sharded axis use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training. draft version in https://github.com/chenyuxyz/tinygrad/pull/109 * SYNCBN flag * test unclean shrinks * UnsyncedBatchNorm reuses BatchNorm * more robust pad arg check * better types * more tests! * 6 gpus in benchmark * disable slow GPUS=6 benchmark	2024-01-31 21:48:25 -05:00
chenyu	b0a755288f	cifar EVAL_BS set default value to BS (#3274 ) less compile time for eval due to cache. 500 was a slow uneven number for 6 GPU too. eval time 5.9s -> 3.4s	2024-01-29 17:37:12 -05:00
chenyu	9e5409be6c	cifar move GlobalCounters.reset() before shard (#3217 ) * cifar move GlobalCounters.reset() before shard also shard mini batch inplace * don't eval with DISABLE_BACKWARD	2024-01-23 16:07:43 -05:00
chenyu	3c179cc27c	cifar only shuffle data at epoch start (#3216 ) save 1ms CPU time per batch. also only shuffle training set	2024-01-23 14:41:22 -05:00
chenyu	8465938d29	minor hlb_cifar cleanups (#3208 ) mostly cosmetic. LATEBEAM=4 single 7900xtx 59.2 seconds	2024-01-22 12:38:39 -05:00
chenyu	827b7a3c64	cleanup pad_reflect and make_square_mask in hlb_cifar (#3206 ) removed some complicated looking stuff. no wall time difference	2024-01-22 11:30:46 -05:00
chenyu	99884f4c98	cifar flags for RANDOM_CROP, RANDOM_FLIP, and CUTMIX (#3204 ) experimenting with different setups, also would like to jit the data augmentation next	2024-01-22 01:12:51 -05:00
chenyu	836883fedc	comment out cutmix in hlb_cifar (#3201 ) it's no-op with multi gpu and less STEPS. also the patch was selected from the whole dataset, not from the same batch	2024-01-21 22:24:53 -05:00
chenyu	e52a609240	make WINO a context var, and LATEWINO in hlb_cifar (#3161 )	2024-01-17 20:21:26 -05:00
chenyu	589c16756f	hlb_cifar multi gpu training (#3150 ) * cifar train with multi gpu * GPUS=1 is noop	2024-01-16 14:38:45 -05:00
chenyu	b9d470577c	gelu -> quick_gelu in hlb_cifar (#3147 ) 89 -> 86 seconds, same eval acc	2024-01-16 02:03:37 -05:00
chenyu	ec5a212b0a	modernize hlb_cifar (#3146 ) * modernize hlb_cifar do more things in Tensor space instead of numpy, clean up dtypes and use more Tensor methods. * eigens are float64	2024-01-16 01:35:11 -05:00
chenyu	22920a7e55	add LATEBEAM to hlb_cifar (#3142 ) still too slow to search on tinybox though	2024-01-15 23:26:03 -05:00
Yixiang Gao	8e1fd6ae9d	test works	2024-01-03 07:22:01 -08:00
Yixiang Gao	4f89f8b73a	make sure the old hyp breaks the test	2024-01-03 07:13:54 -08:00
Yixiang Gao	b753d280f7	move hyp out of the train so it can be imported	2024-01-02 15:56:17 -08:00
Yixiang Gao	2e4d9ad936	adjsut div factor to avoid underflow	2024-01-02 13:47:13 -08:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964 ) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
George Hotz	c81ce9643d	move globalcounters to ops (#2960 ) * move globalcounters to ops * missed a few * sick of that failing	2024-01-01 14:21:02 -08:00
chenyu	6d7e9e0a56	hotfix convert Y_train to int before passing into index (#2850 )	2023-12-19 11:40:56 -05:00
chenyu	0723f26c80	dtypes.default_float and dtypes.default_int (#2824 )	2023-12-18 12:21:44 -05:00
George Hotz	c6eb618013	tests from new lazy branch (#2774 ) * tests from new lazy branch * fix lin 11 * that was needed * doesn't fail * mark * meant that * llvm passes	2023-12-14 23:06:39 -08:00
qazal	ab2d4d8d29	Fix cl import in the copy_speed test and cifar example (#2586 ) * fix CL import * update test to only run on GPU * update hlb_cifar too	2023-12-03 09:22:07 -08:00
George Hotz	2c363b5f0b	new style device (#2530 ) * cpu tests pass * torch works * works * metal works * fix ops_disk * metal jit works * fix openpilot * llvm and clang work * fix webgpu * docs are rly broken * LRU works on metal * delete comment * revert name to ._buf. LRU only on Compiled * changes * allocator * allocator, getting closer * lru alloc * LRUAllocator * all pass * metal * cuda * test examples * linearizer * test fixes * fix custom + clean realize * fix hip * skip tests * fix tests * fix size=0 * fix MOCKHIP * fix thneed * copy better * simple * old style metal copy * fix thneed * np reshape * give cuda a device	2023-11-30 17:07:16 -08:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
wozeparrot	4c44d1344b	feat: remove cache_id (#2236 )	2023-11-08 08:09:21 -08:00
George Hotz	2f7aab3d13	move optimize_local_size (#2221 ) * move optimize_local_size * interpret_ast	2023-11-05 21:00:52 -08:00
wozeparrot	c29653605e	hip multigpu training (#1878 ) * feat: move to hip * feat: special path for RawBufferTransfer * feat: initial rawbuffertransfer * feat: hip ipc * feat: working hip ipc * feat: need to base device without args * feat: close mem handle * feat: modified test * feat: more multihip stuff * clean: cleanup * feat: cleaner * feat: don't crash * feat: test more * clean: way cleaner hip wrapper * feat: barrier * feat: barrier * feat: this breaks stuff * feat: we can use empty here * feat: maybe fix tests * feat: maybe fix tests again? * fix: probably fix tests * feat: no waiting here * feat: wait here * feat: much larger test * feat: need to sync here * feat: make this async * feat: no waiting! * feat: cut here * feat: sync copy * feat: random imports * feat: much cleaner world * feat: restore this * feat: restore this * clean: cleanup * feat: set this	2023-10-24 17:35:53 -04:00
George Hotz	5cfec59abc	hlb cifar touchups (#2113 ) * types and cnt and EVAL_STEPS * eval time + always print eval	2023-10-18 16:26:15 -07:00
wozeparrot	4d1e59abfd	fix: only when distributed (#2102 )	2023-10-17 20:09:04 -07:00
Sean D'Souza	999c95ea29	fix: hlb cifar types (#2099 )	2023-10-17 19:23:50 -07:00
George Hotz	9b1c3cd9ca	hlb_cifar: support EVAL_STEPS=1000, print when dataset is shuffled	2023-10-18 01:11:08 +00:00
Yixiang Gao	3187962476	CIFAR HALF mode (#2041 ) * load weights in fp16 * add dtype option in nn * fix test * no need for dtype in nn * add option to load weights in FP16, but NaN * change loss scaler * cast to float32 for norm layer * add a todo for the forward pass padding * fix transform	2023-10-12 10:19:51 -07:00
Yixiang Gao	094d3d71be	with Tensor.train() (#1935 ) * add with.train * remove the rest TODOs * fix pyflake * fix pyflake error * fix mypy	2023-09-28 18:02:31 -07:00
Yixiang Gao	cb5d6576cb	cifar step time 65ms while stay above 94% (#1888 ) * change reduceop heruistics * add model ema and jit hack * add ema eval * have to create a duplicate eval function for jit * remove manual seed * 94% achieveable with normal eval * ema is outputting the same results as normal * fix ema bug * ema achieves 94% with fix seed * multigpu tested * constant fold decay, fix jit, adjust message for multigpu * pull SpeedyResNet out of train_cifar()	2023-09-21 11:19:32 +08:00
Yixiang Gao	9d93a82354	remove FAKEDATA (#1685 )	2023-08-26 20:15:54 -04:00

92 Commits