Commit Graph

864 Commits

Author SHA1 Message Date
Friedrich Carl Eichenroth 859d6d0407
Fix mypy examples/beautiful_*.py (#6978)
* fix mypy examples/beautiful_*.py

* backwards

* add test

* Revert "add test"

This reverts commit 4d88845ba3f24d83621da0abf55096553abda7fa.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
Kinvert 960c495755
added beautiful fashion mnist and example (#6961)
* added beautiful fashion mnist and example

* fixing whitespace

* refactor Fashion MNIST to fewer lines

* fix newline to reduce diff

* Update beautiful_mnist.py

* Update beautiful_mnist.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-10 12:01:07 +08:00
chenyu b5546912e2
10% more TRAIN_STEPS for bert (#6971)
got two very close runs, adding more steps for buffer
2024-10-09 19:21:43 -04:00
chenyu 35cf48659b
limit beam param for bert on green (#6966)
seems to mitigate the crash
2024-10-09 11:48:18 -04:00
chenyu 1ff2c98f8a
fix logfile name for bert red (#6952) 2024-10-08 05:37:52 -04:00
chenyu a78c96273a
update bert epoch logging (#6940)
* update bert epoch logging

epoch for bert is simply the number of examples seen (which is used for the RCP check)

* update total steps too

* more changes
2024-10-08 00:34:06 -04:00
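The entry above notes that the bert "epoch" is simply the number of examples seen, logged for the RCP check. A minimal sketch of that bookkeeping (names like GLOBAL_BS and log_epoch are hypothetical, not the repo's actual logger):

```python
# Sketch: bert "epoch" logged as examples seen (hypothetical names, not the repo's code).
GLOBAL_BS = 66  # assumed global batch size across GPUs

def log_epoch(step: int) -> None:
  examples_seen = step * GLOBAL_BS  # the bert "epoch_num" is just examples seen
  print(f"epoch_num (examples seen): {examples_seen}")  # consumed by the MLPerf RCP check

for step in (1, 2, 3):
  log_epoch(step)
```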
chenyu 102dfe5510
back to 2**10 for bert loss scaler (#6934)
got 2 NaN runs with this, reverting back to 2**10
2024-10-07 10:17:21 -04:00
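For context on what the scaler does: the loss is multiplied by a constant before backward so small fp16 gradients don't underflow, and the gradients are divided by the same constant before the optimizer step; too large a scale overflows and produces NaNs, which is why 2**10 is used here. A minimal sketch of the idea with tinygrad Tensors (illustrative only, not the repo's bert training loop):

```python
# Sketch of static loss scaling (illustrative; not the repo's training loop).
from tinygrad import Tensor

LOSS_SCALER = 2**10  # too large a scale can overflow fp16 gradients -> NaN

w = Tensor.randn(4, 4, requires_grad=True)
x = Tensor.randn(2, 4)
loss = (x @ w).square().mean()

(loss * LOSS_SCALER).backward()        # scale the loss so small gradients don't underflow
unscaled_grad = w.grad / LOSS_SCALER   # unscale before the optimizer step
```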
chenyu 0cf815a93a
bert use BS=66 and update hparams (#6932)
with the dropout memory improvement, we can fit BS=66 now. also reverting back to the hparams in #5891
2024-10-07 05:08:27 -04:00
chenyu 718b959349
log epoch start and stop for bert (#6912) 2024-10-06 06:39:46 -04:00
chenyu 16c1fa4208
use BEAM=3 for red box bert runs (#6904)
BEAM=4 slightly exceeded 30 minutes of setup time
2024-10-05 09:21:12 -04:00
chenyu 0e706227a2
add seed to bert result log filename (#6903)
* add seed to bert result log filename

* different name for different benchmark
2024-10-05 09:15:24 -04:00
George Hotz f4ec39fe58
switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630316487509980af20c6d2981de00bec.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu 7391376528
update bert hparams (#6876)
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview.

loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.

increased lr and total steps a bit.

`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
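These hparams are the kind of thing the MLPerf scripts typically pick up from environment variables; a hedged sketch using tinygrad's getenv helper (variable names and defaults below are illustrative, not necessarily the script's exact ones):

```python
# Sketch: reading bert hparams from the environment (illustrative names/defaults).
from tinygrad.helpers import getenv

BS          = getenv("BS", 66)              # global batch size
LOSS_SCALER = getenv("LOSS_SCALER", 2**10)  # 2**13 -> 2**10 to avoid NaNs
PARALLEL    = getenv("PARALLEL", 0)         # PARALLEL=0 after setup, same as resnet
print(BS, LOSS_SCALER, PARALLEL)
```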
chenyu 5f77217772
bert default CKPT to 0 (#6840)
not required
2024-10-01 21:55:56 -04:00
George Hotz 547733e57c
stunning_mnist [run_process_replay] (#6828)
* stunning_mnist [run_process_replay]

* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
chenyu f59517754e
add RESET_STEP in bert to control reset (#6818)
same as resnet
2024-09-30 09:39:04 -04:00
George Hotz 2ed94e447f gpt2: corealize opt and loss 2024-09-30 09:11:20 +08:00
George Hotz a76c6c740c
hand pad gpt2 (#6805) 2024-09-30 09:03:07 +08:00
chenyu 494b20e886
bert BS back to 54 (#6791)
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu 572d77d1d9
bert script delete eval data after eval (#6790)
fits BS=60, which is 2% faster than 54. also fixed wandb logging params
2024-09-27 20:54:00 -04:00
chenyu f9c8e144ff
chmod +x mlperf bert script for red (#6789)
also disabled raising the power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issues on red
2024-09-27 11:27:32 -04:00
Francis Lata d3a387be63
[MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu 2fc26890c9
default BS=9 in handcode_opt bert (#6783)
using 54 across 6 gpus now (9 per gpu), and 2 is not a good default
2024-09-27 04:38:16 -04:00
George Hotz 9a3f6f392d llm.c tok/s 2024-09-27 00:46:18 -07:00
George Hotz b0e70ab04f llm.c updates 2024-09-27 15:25:59 +08:00
chenyu bea7ed5986
add RUNMLPERF=1 to bert dev_run.sh (#6775)
already set in run_and_time.sh; RUNMLPERF=1 is needed for it to load real data
2024-09-26 11:00:49 -04:00
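As a rough illustration of what the flag gates (the loader names below are hypothetical stand-ins; only RUNMLPERF comes from the scripts):

```python
# Sketch: selecting real vs. synthetic data based on RUNMLPERF (hypothetical loaders).
from tinygrad.helpers import getenv

def load_real_batches():   # stand-in for the real MLPerf bert data loader
  return ["real batch"]

def make_fake_batches():   # stand-in for synthetic data used when just benchmarking
  return ["fake batch"]

batches = load_real_batches() if getenv("RUNMLPERF") else make_fake_batches()
print(batches)
```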
chenyu 12de203a43
add IGNORE_JIT_FIRST_BEAM to bert scripts (#6769)
* update bert BEAM params

copied from resnet to start with

* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
wozeparrot 15cd42cfb9
feat: support TRACEMETA=2 in handcode_opt (#6767) 2024-09-26 16:58:29 +08:00
chenyu 5a5fbfa1eb
smaller bert script change (#6768)
only changes the WANDB and RUNMLPERF order. BENCHMARK and BEAM will be handled differently
2024-09-26 04:54:28 -04:00
chenyu 0424c4967d
fix handcode_opt.py for bert (#6756) 2024-09-26 00:20:24 -04:00
chenyu 396c96357b
update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54, which works on tinyboxes with dropout.
used bert's sparse_categorical_crossentropy that takes a Tensor ignore_index in the accuracy method
2024-09-25 23:55:05 -04:00
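For intuition about the ignore_index part: bert's masked-LM accuracy only counts positions whose label is not the ignore value. A minimal sketch of that idea with tinygrad Tensors (illustration only, not the repo's helper; it uses a plain int ignore_index):

```python
# Sketch: accuracy that skips ignore_index positions (illustrative, not the repo's code).
from tinygrad import Tensor

def masked_accuracy(logits: Tensor, labels: Tensor, ignore_index: int = -1) -> Tensor:
  preds = logits.argmax(-1)
  valid = (labels != ignore_index).float()     # 1.0 where the position counts, 0.0 where ignored
  correct = (preds == labels).float() * valid
  return correct.sum() / valid.sum()

logits = Tensor.randn(4, 10)
labels = Tensor([1, 3, -1, 7])                 # -1 marks ignored positions
print(masked_accuracy(logits, labels).item())
```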
George Hotz 7e73c7b3cc hotfix: bump stable diffusion val distance 2024-09-26 11:15:29 +08:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
George Hotz f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
wozeparrot f932116e05
feat: small things from default_threefry (#6708) 2024-09-24 17:00:47 +08:00
Anurag Lamsal 568757e087
fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649) 2024-09-24 11:20:44 +08:00
samm393 19c11792fd
Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
George Hotz b9e6d42a1f
Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
2024-09-24 08:48:10 +08:00
George Hotz 2fe3eeed17
gated native math in OpenCL (#6683)
* gated native math

* Update cstyle.py
2024-09-23 19:22:13 +08:00
Tobias Fischer c1bbd15bd9
Sharded SDXL Inference (#6328)
* initial sharding fixes

* sigma device fix

* emptyline space fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-21 01:26:43 -04:00
chenyu b14c1bc417
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing

283 -> 47 valids

* test
2024-09-20 03:14:52 -04:00
George Hotz d02bb270b7
add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]

* add timing

* enqueue vs total run

* it's failing but that's fine
2024-09-18 16:06:20 +08:00
George Hotz d4b662c318
new openpilot compile (#6573)
* new openpilot compile

* note, copyout doesn't work for images
2024-09-18 14:22:50 +08:00
kormann f5dd25d376
enable whisper batch for long sequences (#6458)
* long batch +test

* long batch +test

* cleanup

* rollback syntactic changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu 798be6bb74
add gated read_image count in openpilot compile2 (#6546)
530 to go
2024-09-16 21:17:00 -04:00
Francis Lata b7ce9a1530
UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8a67469e7f694e09f2d15a8c40065da.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + clean up various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is run and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129dfbe2e1c46d1937135a60b4ed25caa3d.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d26f59ed1227acb670d5ce7b997606ae.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322ec912bd8617cbe167c542e9ba229d9.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
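The UNet3D entry above adds a DICE cross-entropy loss. As a rough sketch of the concept (not the repo's implementation): the DICE term measures overlap between prediction and label, 2|X∩Y| / (|X| + |Y|), and is averaged with a cross-entropy term:

```python
# Sketch of a combined DICE + cross-entropy loss (illustrative only, not the repo's code).
from tinygrad import Tensor

def dice_ce_loss(logits: Tensor, target_onehot: Tensor, eps: float = 1e-6) -> Tensor:
  probs = logits.softmax(axis=1)                               # (N, C, D, H, W) class probabilities
  dims = tuple(range(2, len(logits.shape)))                    # reduce over spatial dims
  intersection = (probs * target_onehot).sum(axis=dims)
  denom = probs.sum(axis=dims) + target_onehot.sum(axis=dims)
  dice = (2 * intersection + eps) / (denom + eps)              # 2|X∩Y| / (|X| + |Y|) per class
  dice_loss = 1 - dice.mean()
  ce_loss = -(target_onehot * probs.log()).sum(axis=1).mean()  # plain cross entropy
  return (dice_loss + ce_loss) / 2

logits = Tensor.randn(1, 3, 8, 8, 8)                           # (N, C, D, H, W)
target = Tensor.randn(1, 3, 8, 8, 8).softmax(axis=1)           # stand-in "soft" target
print(dice_ce_loss(logits, target).item())
```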
kormann f6f4f3222f
whisper long batch (#6335)
* reset

* test

* only part refactor
2024-09-09 21:03:59 -04:00
qazal 935b6b658f
delete seen from the scheduler api [run_process_replay] (#6427)
docs
2024-09-09 16:26:34 +08:00
wozeparrot cb61cfce24
feat: example and extra tweaks (#6310) 2024-08-28 19:26:11 -07:00
Tobias Fischer 3517aa89d9
sdxl batched inference fixes (#6293) 2024-08-28 07:44:58 -04:00