tinygrad

Commit Graph

Author	SHA1	Message	Date
nimlgen	408c0a5e7f	qcom match texture/sampler descriptors to OpenCL (#7622 ) * qcom ioctl compare more regs * bug fix	2024-11-11 15:57:42 -05:00
nimlgen	45db7d9045	fuzz qcom vs opencl (#7130 ) * fuzz qcom vs opencl * fix nv * bettre? * typo * open both devs	2024-10-17 18:49:08 +03:00
George Hotz	3169cb386d	remove graph [pr] (#7085 )	2024-10-16 11:40:07 +08:00
nimlgen	b025495e5c	fuzz nv vs cuda (#7066 ) * fuzz nv vs cuda * fixes * smth * um * cmp the same * dnrt * correct gpfifo scan * fix	2024-10-15 22:22:40 +03:00
qazal	8ff6514ba3	delete extra/ops.py [pr] (#7072 )	2024-10-15 22:14:21 +03:00
nimlgen	586ff4c910	nv record uvm mappings (#7059 ) * nv record uvm mappings * linteeer * smth * ooops	2024-10-15 00:12:49 +03:00
nimlgen	8094340221	nv print info about faults (#7057 ) * nv print info about faults * unrelated changes * nv_gpu.GT200_DEBUGGER in mockgpu * regen with ocrrect version * spacing	2024-10-14 21:49:38 +03:00
chenyu	bd8ecf7fd6	remove NumNode (#7035 )	2024-10-13 16:42:19 -04:00
chenyu	c4c806a210	generate new kernel dataset (#7034 ) * generate new kernel dataset pre req to remove NumNode ``` extra/optimization/generate_dataset.sh gzip -k /tmp/sops mv /tmp/sops.gz extra/datasets/ ``` * fix var range in fuzz_linearizer	2024-10-13 16:19:41 -04:00
qazal	13846930cd	hotfix: extract_dataset.py (#7029 )	2024-10-13 11:18:23 +03:00
George Hotz	a71bb09ec3	remove symbolic file [pr] (#7012 )	2024-10-12 18:44:44 +08:00
George Hotz	5ae2de9845	UOp.variable (#7010 ) * UOp.variable [pr] * fix tests * clean * improve name rendering * last bug	2024-10-12 18:20:44 +08:00
qazal	20d3c2d113	unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955 ) * add UOps.VIEW * update hardcoded asts * update sops.gz	2024-10-09 02:00:17 +08:00
Tobias Fischer	f9e32f2bb2	clip device fix (#6924 )	2024-10-07 00:47:32 +08:00
chenyu	01a2d7316d	dtype=float in bert log_softmax for loss and accuracy (#6916 )	2024-10-06 11:15:56 -04:00
George Hotz	4df5c7a4ef	move lazy to engine [pr] (#6886 ) * move lazy to engine [pr] * engine.lazy	2024-10-04 23:19:26 +08:00
George Hotz	8ca506ee37	remove the magic methods for moving between devices [pr] (#6881 ) * remove the magic methods for moving between devices [pr] * remove unneeded clang	2024-10-04 20:27:52 +08:00
chenyu	7c8849010a	fix var_vals in MCTS (#6882 ) tested with JITBEAM=100 llama	2024-10-04 08:19:35 -04:00
George Hotz	a0cb16ac61	node cleanup + local metal test speed [pr] (#6880 ) * node cleanup [pr] * fix tests, including the double one on metal * no time tqdm tests	2024-10-04 18:14:23 +08:00
George Hotz	cdff1d75b6	things that are only used in one place don't belong in helpers [pr] (#6878 ) * things that are only used in one place don't belong in helpers [pr] * pretty print moved	2024-10-04 17:27:38 +08:00
George Hotz	f4ec39fe58	switch symbolic from old to uops, final PR (#6872 ) * switch symbolic from old to uops, final PR * two wrong answers * not needed resolves * symbolic ops passes * symbolic ops passes * progress * tests pass (almost) * fix last test * fix some tests * global binding and unbinding * Revert "global binding and unbinding" This reverts commit 9456725630316487509980af20c6d2981de00bec. * that test works now * vars on uop doesn't recurse * fix fuzzer * update * fix type * fix gpt, it's UOp now * ssimplify symbolics	2024-10-04 16:42:27 +08:00
chenyu	c3c93f332a	symbolic bool raise ValueError when not sure [pr] (#6853 )	2024-10-02 09:10:58 -04:00
Tobias Fischer	33f7599158	Compute FID Score (#6802 ) * compute fid score code * cleaner s1 and m1 loading	2024-10-01 19:47:58 -04:00
Francis Lata	d3a387be63	[MLPerf] Prepare openimages dataset script (#6747 ) * prepare openimages for MLPerf * cleanup * fix issue when clearing jit_cache on retinanet eval * revert pandas specific changes	2024-09-27 11:13:56 -04:00
nimlgen	3c56aeee70	add Tensor.from_blob (#6765 ) * draft tensor from pointer init * some docs and types * comment * cleaner * test * malloc * qcom cl interop * jit example * cleaner * dealoc * wording * docs	2024-09-26 18:33:19 +08:00
chenyu	396c96357b	update mlperf bert scripts (#6755 ) removed DISABLE_DROPOUT=1. updated BS to 54 that works on tinyboxes with dropouts. used bert's sparse_categorical_crossentropy that takes Tensor ignore_index in accuracy method	2024-09-25 23:55:05 -04:00
wozeparrot	4ebc9589a6	feat: make buffer (#6745 )	2024-09-25 18:31:03 +08:00
nimlgen	56979aa3ed	qcom ioctl log levels (#6735 )	2024-09-25 14:59:27 +08:00
wozeparrot	2be0b26a1f	rand only supports single device (#6682 )	2024-09-24 16:07:44 +08:00
nimlgen	ca66b11e07	qcom fix disasm (#6703 )	2024-09-24 15:23:43 +08:00
samm393	19c11792fd	Flux.1 (#6334 ) * initial commit * whitespace * get rid of torch import * indentation * less hardcoding * add flux.1-dev * jit * no double * t5 tidy up * validation image * reuse sdxl autoencoder * typing changes * empty lines * remove unneeded comments --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-24 10:08:04 +08:00
chenyu	31b9c74c77	tiny import cleanup and fix typo (#6692 )	2024-09-23 21:48:23 -04:00
George Hotz	2f2f933e50	fix buffer shape regression from onnx (#6678 )	2024-09-23 16:58:42 +08:00
qazal	982086f54c	UOps.VALID try 2 (#6623 ) * make UOps.VALID compile * fixable tests * bufs dedup * cleanup the CONST spec * regenerate dataset with graph_rewrite ```py def rewrite_const(const:UOp, st_src:UOp) -> UOp: st: ShapeTracker = st_src.arg return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0)) pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)]) ``` * rm arg * remove arg * revert arg removal This reverts commit 2c35c75c950075d38c9fb8572f14640fe8235f74. * red test_pickle_define_var	2024-09-21 14:19:25 +08:00
Tobias Fischer	c1bbd15bd9	Sharded SDXL Inference (#6328 ) * initial sharding fixes * sigma device fix * emptyline space fix --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-09-21 01:26:43 -04:00
nimlgen	641586cb87	qcom updated ioctl (#6627 )	2024-09-20 18:02:40 +08:00
George Hotz	c4d5575c61	beat mlx at resnet 18 (#6611 ) * work to beat mlx at resnet18 [run_process_replay] * pruning * wino sometimes * shorter * comment	2024-09-20 11:28:01 +08:00
qazal	9295bc0189	viz more work [run_process_replay] (#6568 ) * infra * found it * real work * bring those back * cleanup test_viz * comment that out	2024-09-17 19:27:09 +08:00
nimlgen	81a4a9623c	add qcom dsp runtime (#6112 ) * calling qualcomm dsp from python * include so files * add include file * adsprpc.py * running with adsprpc * work * 32-bit support in elf * compilation works * ion * msm_ion * working DSP backend * getting 500 MFLOPS on matmul * beam works with timing * move to autogen * disasm * progress * simple tests pass * qcom_dsp * more dsp autogen * progress * some progress * works w/o lib * checkpoint * no lib * ugh, better * cleaner, but with lib. test good, but with the hack * remove autogens * small * push * simpler * revert this * run_3 * simpler * android * handle * run it * why? * run2 * to gen * cc * cleaner * elf * part of autogen * comemnt * no lib * autohen * linter * bug reproducer * cleaner * this repro is almost empty and doesn't work!!!! * with this test_ops passes, no crashes anymore * cleaner * linter * renames * shorter * remoev contextlib * ugh * myoy * cleaner * cleaner * remove import * conn * import * revert this * remove heavy .so * shorter alloc * not tue anymore --------- Co-authored-by: Comma Device <device@comma.ai> Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <george@comma.ai>	2024-09-13 21:01:33 +03:00
George Hotz	bdd0c06f29	add void type to uop (#6471 ) * unwrap_dtype maybe * uopgraph stuff that hardcoded None * test_ops passes * dtypes.py fixups * update test_linearizer and friends * more ast updates * test_beam and test_schedule too * add void type to uop [run_process_replay] * remove dumb casts * start making it green * more cast cleanups * more cls methods to fix * regenerate dataset * split UOp and NOp const * maybe that too * fix docs * update test_uop_symbolic * test_verify_ast * new sops with no diff * meh, type_ignore is alright * remove that assert --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-09-11 18:16:28 +08:00
Francis Lata	b7ce9a1530	UNet3D MLPerf (#3470 ) * add training set transforms * add DICE cross entropy loss * convert pred and label to Tensor when calculating DICE score * cleanups and allow train dataset batching * fix DICE CE loss calculation * jitted training step * clean up DICE CE loss calculation * initial support for sharding * Revert "initial support for sharding" This reverts commit e3670813b8a67469e7f694e09f2d15a8c40065da. * minor updates * cleanup imports * add support for sharding * apply temp patch to try to avoid OOM * revert cstyle changes * add gradient acc * hotfix * add FP16 support * add ability to train on smaller image sizes * add support for saving and loading checkpoints + cleanup some various modes * fix issue with using smaller patch size + update W&B logging * disable LR_WARMUP_EPOCHS * updates * minor cleanups * cleanup * update order of transformations * more cleanups * realize loss * cleanup * more cleanup * some cleanups * add RAM usage * minor cleanups * add support for gradient accumulation * cleanup imports * minor updates to not use GA_STEPS * remove FP16 option since it's available now globally * update multi-GPU setup * add timing logs for training loop * go back to using existing dataloader and add ability to preprocess data to save time * clean up optimization and re-enable JIT and multi-GPU support for training and evaluation * free train and eval steps memory * cleanups and scale batch size based on the number of GPUs * fix GlobalCounters import * fix seed * fix W&B setup * update batch size default size * add back metric divergence check * put back JIT on UNet3d eval * move dataset preprocessing inside training code * add test for dice_loss * add config logging support to W&B and other cleanups * change how default float is getting retrieved * remove TinyJit import duplicate * update config logging to W&B and remove JIT on eval_step * no need for caching preprocessed data anymore * fix how evaluation is ran and how often * add support for LR scaling * fix issue with gaussian being moved to scipy.signal.windows * remove DICE loss unit test * fix issue where loss isn't compatible with multiGPU * add individual BEAM control for train and eval steps * fix ndimage scipy import * add BENCHMARK * cleanups on BENCHMARK + fix on rand_flip augmentation during training * cleanup train and eval BEAM envs * add checkpointing support after every eval * cleanup model_eval * disable grad during eval * use new preprocessing dataset mechanism * remove unused import * use training and inference_mode contexts * start eval after benchmarking * add data fetching time * cleanup decorators * more cleanups on training script * add message during benchmarking mode * realize when reassigning LR on scheduler and update default number of epochs * add JIT on eval step * remove JIT on eval_step * add train dataloader for unet3d * move checkpointing to be done after every epoch * revert removal of JIT on unet3d inference * save checkpoint if metric is not successful * Revert "add train dataloader for unet3d" This reverts commit c166d129dfbe2e1c46d1937135a60b4ed25caa3d. * Revert "Revert "add train dataloader for unet3d"" This reverts commit 36366c65d26f59ed1227acb670d5ce7b997606ae. * hotfix: seed was defaulting to a value of 0 * fix SEED value * remove the usage of context managers for setting BEAM and going from training to inference * support new stack API for calculating eval loss and metric * Revert "remove the usage of context managers for setting BEAM and going from training to inference" This reverts commit 2c0ba8d322ec912bd8617cbe167c542e9ba229d9. * check training and test preprocessed folders separately * clean up imports and log FUSE_CONV_BW * use train and val preprocessing constants * add kits19 dataset setup script * update to use the new test decorator for disabling grad * update kits19 dataset setup script * add docs on how to train the model * set default value for BASEDIR * add detailed instruction about BASEDIR usage --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-09-10 04:37:28 -04:00
George Hotz	904f6a63fa	Revert "Revert "cleanup process_replay/* namings [run_process_replay] (#6429 )…" (#6442 ) This reverts commit `eda177da84`.	2024-09-10 07:00:16 +08:00
George Hotz	dbd4536167	Revert "add UOps.VALID (#6387 )" (#6441 ) This reverts commit `8186e4e7d6`.	2024-09-09 21:33:00 +08:00
George Hotz	eda177da84	Revert "cleanup process_replay/* namings [run_process_replay] (#6429 )" (#6437 ) This reverts commit `f4e83b30b4`.	2024-09-09 18:52:36 +08:00
qazal	f4e83b30b4	cleanup process_replay/* namings [run_process_replay] (#6429 )	2024-09-09 16:59:04 +08:00
George Hotz	8186e4e7d6	add UOps.VALID (#6387 ) * uops valid * broke full_shape * fixup that st (hardcoded asts still red) * fixup DEFINE_VAR debug more debug * start moving stuff to ast_const * move test_linearizer * move test_linearizer_failures to ast_const * fixup test_schedule * small diff change * regenerate dataset * fixup test_multitensor * regen dataset try 2 --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-09-09 16:58:43 +08:00
qazal	c5bae55ec8	new generate_dataset.sh (#6423 ) * new generate_dataset.sh * keep those there * test: rm expected failures * rename to extract	2024-09-09 15:13:07 +08:00
geohotstan	65da03e186	remove _slice [run_process_replay] (#6395 ) * try * pass * clean up * done * I'm becoming dumber * clean up 2 * remove useless max * useless but make computer brrr [run_process_replay] * try process replay * try again * 1 less line, just use pad2d	2024-09-08 09:12:39 +08:00
George Hotz	8f6d0485e7	hotfix: resnet to obj.device	2024-09-06 13:06:02 +08:00
George Hotz	9d72119a0c	minor resnet cleanups (#6382 ) * minor resnet cleanups * that should have been long * jit * meh	2024-09-06 12:50:21 +08:00

1 2 3 4 5 ...

824 Commits