George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] ( #5421 )
2024-07-12 13:26:50 -07:00
wozeparrot
d1cbd6bb95
unify handcode_resnet_opt and handcode_bert_opt ( #5418 )
2024-07-12 12:05:01 -07:00
wozeparrot
b7cc75a9df
usage summary in handcode opt ( #5414 )
2024-07-12 11:21:18 -07:00
George Hotz
8390feb7b9
optim.OptimizerGroup in hlb_cifar ( #5401 )
2024-07-11 20:14:36 -07:00
wozeparrot
c24d495ef9
metadata in handcode_opt ( #5400 )
2024-07-11 17:45:34 -07:00
George Hotz
5232e405ce
hotfix: add BS to beautiful_mnist
2024-07-11 10:55:05 -07:00
wozeparrot
c9b3ae6bbf
fix llama.py chat mode assert ( #5366 )
2024-07-10 18:06:14 -07:00
wozeparrot
fa873df9c1
bring tinychat more in line with tinyos' version ( #5358 )
2024-07-10 13:13:52 -07:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"); effectively made gpt2 default to JIT
* fix test_gpt2
2024-07-09 15:04:43 -04:00
Elias Wahl
73bddc44f6
Fix fake dataloader ( #5326 )
2024-07-08 09:07:44 -04:00
chenyu
43c3f73fbc
handcode_bert_opt.py ( #5295 )
...
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference ( #5283 )
...
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
reddyn12
d3e244d8b7
prev speed improvements ( #5252 )
...
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
chenyu
191463a919
add timing to SDXL ( #5273 )
2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
...
the norm itself is not significant enough to add as a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
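For reference, RMSNorm scales by the root mean square of the input instead of centering with mean and variance. A minimal pure-Python sketch of the math (illustrative only, not tinygrad's nn.RMSNorm API):

```python
import math

def rms_norm(x, weight=None, eps=1e-6):
    # y_i = x_i / sqrt(mean(x_j^2) + eps), optionally scaled by a learned weight
    ms = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(ms + eps)
    out = [v * inv for v in x]
    return out if weight is None else [o * w for o, w in zip(out, weight)]

print(rms_norm([3.0, 4.0]))  # ≈ [0.8485, 1.1314]
```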
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry ( #5248 )
...
* Revert "use threefry in stable diffusion benchmark (#4988 )"
This reverts commit 44dfa37c70.
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
...
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
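repeat_interleave repeats each element in place rather than tiling the whole sequence. A hedged pure-Python sketch of the semantics for a flat list with a scalar repeat count (mirroring the torch-style op, not tinygrad's implementation):

```python
def repeat_interleave(xs, repeats):
    # each element is repeated `repeats` times in place: [a, b] -> [a, a, b, b]
    # (contrast with tiling, which would give [a, b, a, b])
    return [x for x in xs for _ in range(repeats)]

print(repeat_interleave([1, 2, 3], 2))  # [1, 1, 2, 2, 3, 3]
```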
chenyu
88763eb9ff
fix stable_diffusion with fp16 ( #5239 )
2024-06-30 12:59:31 -04:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e
fix seed = 0 in sdxl ( #5209 )
...
removed a few unneeded realize and contiguous too
2024-06-28 08:48:59 -04:00
Tobias Fischer
4688f97d48
Add SDXL Inference to Examples ( #5206 )
...
* added sdxl inference code
* fixed trailing whitespace
* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
...
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Elias Wahl
e267f3161d
Add MLLogger ( #5125 )
...
* add MLPerf logger
* eval steps
* start with step 1
* compliance for 3.1.0 and 4.0.0
* more compliance
* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
David Hou
3604642847
Llama shard axis 0 sometimes ( #5123 )
...
* make buffer view optional with a flag [run_process_replay]
* do not view when sharding to save memory [run_process_replay]
* llama shard axis=0 sometimes
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" ( #5138 )
2024-06-24 20:58:25 -04:00
chenyu
055e616302
cleanup mnist data load in beautiful_mnist ( #5106 )
2024-06-22 18:31:51 -04:00
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm ( #5103 )
...
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
chenyu
e468601226
update llama attention casting ( #5096 )
...
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
wozeparrot
acb715c64c
fix: llama3 special tokens ( #5045 )
2024-06-18 17:08:44 -07:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples ( #5038 )
...
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
Elias Wahl
f31ef11537
Better default hparams for large BS ( #5030 )
...
* better default hparams for large BS
* bf16 too
* use tuple
2024-06-18 11:13:06 -04:00
Elias Wahl
7bfa9101c0
Float in scaled dot product attention ( #4985 )
...
* Monkeypatch scaled-dot-product-attention
* Use dot instead of matmul
* new api
* imports
* least_upper_dtype
2024-06-18 08:16:41 -04:00
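The commit is about computing the softmax inside attention in float even when the model runs in half. As a reference for what is being computed, scaled dot-product attention is softmax(QK^T / sqrt(d)) V; a pure-Python sketch over lists of rows (illustrative, not the tinygrad API):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(q, k, v):
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, one query row at a time
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wi * vj[c] for wi, vj in zip(w, v)) for c in range(len(v[0]))])
    return out
```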
chenyu
c52352bd9a
fix yolov8 example ( #5003 )
...
it was creating a Tensor from a list of numpy arrays, which is no longer supported after Tensor creation from a list stopped using numpy.
2024-06-16 20:47:29 -04:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark ( #4988 )
...
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes ( #4971 )
2024-06-15 16:29:39 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir ( #4938 )
2024-06-14 13:47:27 -07:00
chenyu
67e8df4969
remove numpy from dtype ( #4969 )
...
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
wozeparrot
2a974ff257
fix: no ReadableStream `for await...of`, too new ( #4965 )
2024-06-14 11:22:19 -07:00
Elias Wahl
d2e3c391e8
Residual in MLM loss + Change default steps ( #4935 )
...
* Residual in mlm loss
* Reduce default steps to 160K * 24
* oops
* comment
2024-06-12 16:09:18 -04:00
wozeparrot
3d13c23bfa
llama3 `--download_model` ( #4922 )
2024-06-11 22:59:59 -07:00
wozeparrot
2849d0a2a1
fix copying to clipboard on a non secure context ( #4890 )
2024-06-08 16:51:47 -07:00
wozeparrot
6c24eda522
feat: tinychat ( #4869 )
2024-06-08 12:05:45 -07:00
Brennan Kinney
9445946cae
docs: Update referenced yaml in `yolov8.py` ( #4871 )
...
YAML files have since been relocated.
2024-06-08 15:05:00 -04:00
Nik
085c0bbf6b
add mlperf train subset of openimages ( #4841 )
2024-06-05 10:10:11 -04:00
Elias Wahl
e576aca044
Disable dropout ( #4837 )
2024-06-04 18:57:26 -04:00
Elias Wahl
bb248a0dd1
Optional half matmul ( #4835 )
...
* half linear
* move weight cast back
* oops
* matmul dtype var
* todo comment
2024-06-04 17:53:41 -04:00
Elias Wahl
04e237328b
Refactor to class style ( #4804 )
2024-06-04 14:08:31 -07:00
George Hotz
eecfdd2f6e
hotfix: fix dataset reading for new llm.c
2024-06-03 14:10:05 +02:00
Francis Lata
707099487a
Multiprocessing UNet3D dataloader ( #4801 )
...
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocesed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat ( #4792 )
2024-05-30 22:47:31 -07:00
chenyu
f2414c666f
fix train_gpt2.py ( #4771 )
...
added `with Tensor.train():`
2024-05-29 12:01:34 -04:00
chenyu
7624ad3ddd
add --timing and --profile to llama3 example ( #4767 )
2024-05-28 16:24:44 -04:00
chenyu
e614b7c696
docs: showcase remove mnist_gan and add conversation.py ( #4757 )
...
fixed both examples, and i think it's better to show conversation
2024-05-28 11:09:26 -04:00
chenyu
fd249422f5
minor cleanup example stable_diffusion ( #4753 )
2024-05-28 00:05:37 -04:00
Elias Wahl
c4b0acf095
Global norm + small changes ( #4749 )
...
* norm
* no empty
* default loss scaler in float
2024-05-27 18:35:27 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
38bc38cdff
fix llama example quantize ( #4699 )
...
* fix llama example quantize
import quantize layers from new example llama3
add to mac benchmark
* fix that
* save the files
2024-05-23 15:35:26 -04:00
chenyu
792a494eb8
fix various examples ( #4691 )
...
* fix examples that used ax1 and ax2 for transpose
* fix that
* update those
2024-05-22 20:43:21 -04:00
Elias Wahl
acc0039cfc
Resume fix + scheduler for non weight decay params ( #4679 )
...
* move ckpt dir
* fix resume. Add scheduler group
2024-05-21 19:38:13 -04:00
chenyu
5e3fbbb33e
llama3 example add manual seed and log seed ( #4667 )
2024-05-20 19:09:57 -04:00
chenyu
704cb1d8a0
fix conversation.py quantize ( #4663 )
...
it used to be True for int8; now it's a string for int8 or nf4
2024-05-20 17:36:37 -04:00
chenyu
ae861325ce
update llama sample for mac 32 input buffer limit ( #4662 )
...
set default sampling params in the function call to 0, and top k in llama3 to 25.
2024-05-20 17:23:39 -04:00
Elias Wahl
993091adfa
loss scaler + nan fixes ( #4661 )
2024-05-20 17:08:35 -04:00
wozeparrot
b144d4b460
new llama3 example ( #4576 )
2024-05-19 22:42:23 -07:00
George Hotz
5ba611787d
move image into tensor.py. delete features ( #4603 )
...
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
George Hotz
53d082a2aa
move memory into schedule ( #4597 )
2024-05-15 07:54:20 -07:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
George Hotz
fd02ab1e8b
move disassemblers and openpilot ( #4592 )
...
* move disassemblers and openpilot
* delete junk
* put that in pre-commit
* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
2b0ee74bb6
lshift and rshift ( #4591 )
2024-05-14 19:16:31 -04:00
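These ops follow the usual integer shift identities: a left shift by k multiplies by 2^k, and a right shift floor-divides by 2^k for non-negative integers. In plain Python:

```python
def lshift(x, k):
    return x << k  # equivalent to x * 2**k

def rshift(x, k):
    return x >> k  # equivalent to x // 2**k for non-negative x

print(lshift(13, 2), rshift(13, 2))  # 52 3
```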
qazal
9aa5e02229
update llmc export ( #4584 )
...
* update example
* move train to optim
* rename
* b2
2024-05-14 21:18:38 +03:00
wozeparrot
d7670f8141
quantized llama multilazybuffer fix ( #4557 )
2024-05-12 14:19:21 -07:00
chenyu
01a0c1a948
slightly faster nf4 llama ( #4542 )
2024-05-12 14:24:42 -04:00
wozeparrot
e07c7668b3
nf4 llama ( #4540 )
2024-05-11 22:22:34 -07:00
chenyu
bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ ( #4534 )
2024-05-11 14:36:56 -04:00
chenyu
04a4980a51
touchup bert script ( #4531 )
...
small adjustments, remove duplicated training setting and stop the script once target is hit
2024-05-11 13:02:02 -04:00
George Hotz
347a3acb37
add renderer class ( #4524 )
...
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert ( #4525 )
...
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
George Hotz
4eef1ee9bf
move renderer into options ( #4514 )
...
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
7c630a9a53
hotfix: fix llama spacing + fix hcq
2024-05-10 15:10:13 +00:00
chenyu
b399d98e41
fix resnet eval ( #4507 )
2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3
feat: more mlperf fixes ( #4505 )
2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288
use fake data in beam searching resnet ( #4504 )
2024-05-09 23:43:50 -04:00
wozeparrot
29daea4e60
fix: core count and os ( #4503 )
2024-05-09 19:55:07 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py ( #4502 )
...
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
chenyu
ef93e41a15
resnet mlperf systems add tinygrad commit and python / runtime versions ( #4494 )
2024-05-09 16:04:15 -04:00
chenyu
b5afdfbc5b
first draft resnet mlperf readme ( #4493 )
...
* start readme
* something
2024-05-09 15:51:44 -04:00
chenyu
047c7f3e5b
polish resnet mlperf logging ( #4490 )
...
don't include the final checkpoint save time in run time, plus some cosmetic ordering changes
2024-05-09 13:04:24 -04:00
chenyu
d78e159aa3
resnet logging move RUN_START to start of the script ( #4488 )
2024-05-09 12:32:32 -04:00
chenyu
1bcb58479d
resnet setup power cap red box gpu to 350W ( #4484 )
...
1%-2% faster
2024-05-08 23:32:41 -04:00
chenyu
0ed755bcf5
resnet use EVAL_BS=192 ( #4482 )
...
* resnet use EVAL_BS=192
also lower green run BEAM_MIN_PROGRESS from 10 to 5
* BEAM_MIN_PROGRESS 5 is too close to setup limit
2024-05-08 22:29:27 -04:00
chenyu
1f6bf9d2f7
real diskcache_clear in model_train resnet ( #4445 )
...
clear cache if INITMLPERF is set, or running run_and_time. dev_beam and dev_run do not clear cache
2024-05-08 19:06:09 -04:00
chenyu
1b4645bea6
hotfix resnet move init_start to start of the script ( #4481 )
2024-05-08 19:03:52 -04:00
wozeparrot
a347ae94d6
feat: remove wandb ( #4480 )
2024-05-08 15:31:16 -07:00
chenyu
db7e15c46f
hotfix resnet only log epoch start with RUNMLPERF ( #4477 )
2024-05-08 15:14:41 -04:00
chenyu
062c6dd65d
mlperf logging, truncate dir in logs and log seed ( #4475 )
2024-05-08 12:54:02 -04:00
chenyu
b62a65b617
redo faster sparse_categorical_crossentropy ( #4461 )
...
updated the resnet LR and DECAY defaults, which help convergence too
2024-05-08 11:21:43 -04:00
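For reference, "sparse" categorical cross-entropy takes integer class labels directly (no one-hot vectors), averaging -log softmax(logits)[label] over samples. The commit's speedup is in tinygrad's implementation; this is just a reference definition in pure Python:

```python
import math

def sparse_categorical_crossentropy(logits, labels):
    # mean over samples of -log(softmax(row)[label]); "sparse" means labels are
    # class indices rather than one-hot vectors
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # log-sum-exp with max subtraction for stability
        logsumexp = m + math.log(sum(math.exp(v - m) for v in row))
        total += logsumexp - row[y]
    return total / len(labels)
```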
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
George Hotz
f4e49a7c1a
resnet 50 opt: correct loop + LARS ( #4449 )
...
* correct loop + LARS
* ops
2024-05-06 08:01:26 -07:00
George Hotz
fc995d4446
add backward to handcode_resnet50_opt
2024-05-06 06:42:26 -07:00
wozeparrot
603d3a351b
feat: allow keeping multiple cookies ( #4440 )
2024-05-05 19:26:48 -07:00
Francis Lam
709410071c
mlperf/resnet: updated BEAM params to increase performance ( #4443 )
2024-05-05 21:49:46 -04:00
chenyu
3b30756cbb
update mlperf submission system ( #4435 )
...
more required fields.
2024-05-05 13:19:07 -04:00
David Hou
c0a048c044
batchnorm d(var)/d(mean) = 0 ( #4430 )
...
* d(var)/d(mean) = 0
* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
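The simplification behind this commit: with mean and variance defined over the same batch, the variance's partial derivative with respect to the mean vanishes, so that branch can be dropped from the batchnorm backward. A sketch of the derivation:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
```

```latex
\frac{\partial \sigma^2}{\partial \mu}
  = -\frac{2}{N}\sum_{i=1}^{N} (x_i - \mu)
  = -2\Big(\frac{1}{N}\sum_{i=1}^{N} x_i \;-\; \mu\Big)
  = 0
```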
qazal
fa17dcaf07
Fix llm.c/export.py ( #4423 )
...
* fix headers
* add CI
* add stdio
* merge clang tests
* revert llm.c
* revert ci
* Revert "revert llm.c"
This reverts commit 5fd17e3c8b38dc9549d0548e9515185b7b032573.
2024-05-04 19:37:10 +03:00
George Hotz
cb7289f9c9
remove clang program header ( #4422 )
...
* remove clang program header
* proper max
* bools are numbers
* fix compile enet
2024-05-04 08:38:01 -07:00
chenyu
473ecb978a
remove SPLIT_REDUCEOP=1 from resnet scripts ( #4404 )
...
SPLIT_REDUCEOP=1 is default
2024-05-03 12:36:23 -04:00
David Hou
b767d59684
resnet trainer: keep old cookie around until next step has been queued ( #4401 )
...
* keep old cookie around until next step has been queued (-10ms 6gpu)
* also for eval
* drop cookie before data_get?
* Revert "drop cookie before data_get?"
This reverts commit b01e6aa2b27f49aeab04b448f09e0ef9e689ea53.
* Revert "Revert "drop cookie before data_get?""
This reverts commit 23464e73d445007c15537c69818fdee89adf0740.
2024-05-03 12:15:21 -04:00
chenyu
2c3b7f8e70
pad resnet training data with training data mean ( #4369 )
...
update model_train resnet to pad the training data
2024-05-02 20:26:15 -04:00
Francis Lam
3cf8291f2f
mlperf/resnet: update beam params to increase time and quality ( #4396 )
...
* mlperf/resnet: update beam params to increase time and quality
* revert upcast 8 in search space and add rocm setup function
* refactor to independent setup.sh script
2024-05-02 20:14:46 -04:00
chenyu
ab01a9433d
resnet eval 4n+3 if epoch < 33 ( #4391 )
...
the rule is to eval at epochs 4n+k, and we can stop the clock as soon as eval hits the target. this can save 24 evals, or 12 minutes
2024-05-02 16:52:07 -04:00
chenyu
7492e5d3e7
resnet correct log name for red ( #4390 )
2024-05-02 10:58:55 -04:00
chenyu
bf31837e6d
resnet correct steps_in_val_epoch in logging ( #4389 )
...
also added random seed from system in scripts
2024-05-02 10:51:36 -04:00
ym555
3113785604
Llama 3 Models ( #4339 )
...
* Full Impl
* fix test
* Fix inference loop
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-05-02 06:06:07 -07:00
chenyu
22376e53b7
resnet mlperf logging ( #4361 )
...
* resnet mlperf logging
* cropping too much?
2024-05-02 00:00:04 -04:00
chenyu
ad116dc5c6
fill in mlperf system description ( #4381 )
...
it did not ask for too many details. will fill in software versions later along with the tinygrad commit.
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0
INFO - System description checker passed for tinybox red
```
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0
INFO - System description checker passed for tinybox green
```
2024-05-01 16:47:45 -04:00
chenyu
9358b62073
rename resnet script to dev_beam.sh and dev_run.sh ( #4379 )
...
final run_and_time needs to be one script for both. rename the old scripts
2024-05-01 14:41:35 -04:00
chenyu
6628e13a5f
pad resnet eval data in model_train ( #4374 )
...
asserted if eval sample count is different from total eval file count.
2024-05-01 14:33:42 -04:00
chenyu
826cccd54d
fix mean underflow for half tensor ( #4377 )
...
* fix mean underflow for half tensor
divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failing test case for symbolic shape var
* skip for python backend
2024-05-01 13:38:57 -04:00
George Hotz
b683d0f496
hotfix: 100% accuracy is wrong
2024-05-01 08:07:18 -07:00
chenyu
683b7c605a
pad first batch of imagenet dataloader and update eval ( #4368 )
...
* pad first batch of imagenet dataloader and update eval
* pad zero instead of empty for training
2024-05-01 00:21:52 -04:00
Francis Lam
16838eae08
mlperf/resnet: update tinybox_red parameters to new best values ( #4364 )
...
about 27 minutes to setup and 345ms/110TF steps
2024-04-30 18:08:12 -04:00
Francis Lam
0d33c54d99
kernel: change PADTO check to allow up to 4x padding ( #4354 )
...
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Elias Wahl
71ff68b445
dropout after eval step ( #4351 )
2024-04-29 15:47:21 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
Francis Lata
bb849a57d1
[MLPerf] UNet3D dataloader ( #4343 )
...
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
2024-04-28 22:34:18 -04:00
Arnav Mehta
f3de17912f
added the missing download-if-not-present function ( #4318 )
2024-04-28 16:31:08 +08:00
chenyu
ec65aea32f
resnet stop the script once hit target ( #4303 )
...
* resnet stop the script once hit target
* comment
2024-04-25 23:54:56 -04:00
George Hotz
1e37c4a7a1
minor llm.c improvements
2024-04-26 11:15:31 +08:00
chenyu
f9a7badace
use LR=7 for resnet with BS=1536 ( #4299 )
...
had 3 runs after the lr float32 change; seems quite stable and converges at epochs 34 and 35
2024-04-25 15:23:10 -04:00
chenyu
c11bad766d
prepare mlperf submission ( #4270 )
...
* prepare mlperf submission
* 28min compile and 3h53m
* red 30 minute compile and 56 TFLOPS
2024-04-24 13:19:31 -04:00
chenyu
c1fbacb182
resnet benchmarks use DEFAULT_FLOAT=HALF ( #4285 )
...
also update the LR default to scale based on BS=1536 (the BS we are submitting)
2024-04-24 12:10:57 -04:00
George Hotz
ad28fdecb1
si.inputs+outputs -> bufs ( #4279 )
2024-04-24 15:12:34 +08:00
chenyu
8401de9922
resnet benchmark return early in eval ( #4278 )
...
only do a few eval steps to compile, and skip the second epoch when doing beam + benchmark. saves 2 minutes
2024-04-24 00:55:01 -04:00
chenyu
6637ecc5fe
use IGNORE_JIT_FIRST_BEAM to not BEAM in jit cnt=0 ( #4269 )
...
we want to have different BEAM values for resnet train and eval. global JITBEAM cannot do this. added the flag to change beam behavior at cnt=0 (so by default it behaves the same with or without TinyJit), and for cnt=1 it uses the existing BEAM.value.
Also updated the context var BEAM in resnet to be outside of TinyJit. saves about 3 minutes compile time
2024-04-23 18:59:43 -04:00
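A hedged sketch of the gating described above (the helper name and structure here are illustrative, not tinygrad internals): BEAM is suppressed on the cnt=0 pre-capture run when the flag is set, and applies normally from cnt=1 onward.

```python
import os

def beam_value_for(cnt):
    # hypothetical helper: returns the BEAM width to use at a given JIT count.
    # cnt=0 is the un-jitted warmup run; IGNORE_JIT_FIRST_BEAM=1 disables BEAM there
    beam = int(os.environ.get("BEAM", "0"))
    if cnt == 0 and os.environ.get("IGNORE_JIT_FIRST_BEAM", "0") == "1":
        return 0
    return beam

os.environ["BEAM"] = "4"
os.environ["IGNORE_JIT_FIRST_BEAM"] = "1"
print(beam_value_for(0), beam_value_for(1))  # 0 4
```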
Elias Wahl
3a48773f1a
BERT dataloader ( #4252 )
...
* add dataloader
* comment
2024-04-23 13:44:49 -04:00
chenyu
37f8be6450
resnet print epoch ops and mem in benchmark ( #4244 )
...
* resnet print epoch ops and mem in benchmark
also added a flag to optionally disable reset jitted steps
* real per epoch stats
2024-04-21 18:32:31 -04:00
chenyu
30fc1ad415
remove TODO: remove explicit dtypes after broadcast fix in stable_diffusion ( #4241 )
...
this is done
2024-04-21 00:31:24 -04:00
chenyu
a1940ced77
remove the assign hack in whisper ( #4240 )
...
no longer needed, the commented test case was removed too
2024-04-20 23:56:44 -04:00
chenyu
3f126c7664
fix examples vits / conversation.py ( #4239 )
...
it was passing a const numpy array into Tensor.arange
2024-04-20 23:29:12 -04:00
George Hotz
cd88afc98b
datasets isn't a feature + filter docstrings ( #4228 )
...
* datasets isn't a feature
* filter docstrings in sz
2024-04-19 16:16:10 +04:00
George Hotz
d99b512084
llm.c timing ( #4219 )
...
* add timing info
* fix malloc
* 8s with beam
2024-04-19 12:43:21 +04:00
George Hotz
39b60a25f0
more llm c work ( #4207 )
...
* more llm c work
* print nicely
* fake load pretrained
* select warmups
* output c code
2024-04-18 22:20:44 +04:00
chenyu
f7416916df
update resnet hparams based on BS=1632 RCP ( #4210 )
...
https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_4.0.0/rcps_resnet.json
2024-04-18 12:01:46 -04:00
George Hotz
fa57c3e7ce
continue llm.c ( #4190 )
...
* continue llm.c
* export more
* progress on llm.c
* simpler optim, names work
2024-04-18 10:57:54 +04:00