Commit Graph

864 Commits

Author SHA1 Message Date
Friedrich Carl Eichenroth 859d6d0407
Fix mypy examples/beautiful_*.py (#6978)
* fix mypy examples/beautiful_*.py

* backwards

* add test

* Revert "add test"

This reverts commit 4d88845ba3f24d83621da0abf55096553abda7fa.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
Kinvert 960c495755
added beautiful fashion mnist and example (#6961)
* added beautiful fashion mnist and example

* fixing whitespace

* refactor Fashion MNIST to fewer lines

* fix newline to reduce diff

* Update beautiful_mnist.py

* Update beautiful_mnist.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-10 12:01:07 +08:00
chenyu b5546912e2
10% more TRAIN_STEPS for bert (#6971)
got two very close runs, adding more steps for buffer
2024-10-09 19:21:43 -04:00
chenyu 35cf48659b
limit beam param for bert on green (#6966)
seems to mitigate the crash
2024-10-09 11:48:18 -04:00
chenyu 1ff2c98f8a
fix logfile name for bert red (#6952) 2024-10-08 05:37:52 -04:00
chenyu a78c96273a
update bert epoch logging (#6940)
* update bert epoch logging

epoch for bert is simply the number of examples seen (which is used for the RCP check)

* update total steps too

* more changes
2024-10-08 00:34:06 -04:00
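The entry above notes that the bert "epoch" is simply the number of examples seen, logged for the RCP check. A minimal sketch of that bookkeeping (names like GLOBAL_BS and log_epoch are hypothetical, not the repo's actual logger):

```python
# Sketch: bert "epoch" logged as examples seen (hypothetical names, not the repo's code).
GLOBAL_BS = 66  # assumed global batch size across GPUs

def log_epoch(step: int) -> None:
  examples_seen = step * GLOBAL_BS  # the bert "epoch_num" is just examples seen
  print(f"epoch_num (examples seen): {examples_seen}")  # consumed by the MLPerf RCP check

for step in (1, 2, 3):
  log_epoch(step)
```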
chenyu 102dfe5510
back to 2**10 for bert loss scaler (#6934)
got 2 NaN runs with this, reverting back to 2**10
2024-10-07 10:17:21 -04:00
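For context on what the scaler does: the loss is multiplied by a constant before backward so small fp16 gradients don't underflow, and the gradients are divided by the same constant before the optimizer step; too large a scale overflows and produces NaNs, which is why 2**10 is used here. A minimal sketch of the idea with tinygrad Tensors (illustrative only, not the repo's bert training loop):

```python
# Sketch of static loss scaling (illustrative; not the repo's training loop).
from tinygrad import Tensor

LOSS_SCALER = 2**10  # too large a scale can overflow fp16 gradients -> NaN

w = Tensor.randn(4, 4, requires_grad=True)
x = Tensor.randn(2, 4)
loss = (x @ w).square().mean()

(loss * LOSS_SCALER).backward()        # scale the loss so small gradients don't underflow
unscaled_grad = w.grad / LOSS_SCALER   # unscale before the optimizer step
```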
chenyu 0cf815a93a
bert use BS=66 and update hparams (#6932)
with the dropout memory improvement, we can fit BS=66 now. also reverting back to the hparams in #5891
2024-10-07 05:08:27 -04:00
chenyu 718b959349
log epoch start and stop for bert (#6912) 2024-10-06 06:39:46 -04:00
chenyu 16c1fa4208
use BEAM=3 for red box bert runs (#6904)
BEAM=4 slightly exceeded 30 minutes of setup time
2024-10-05 09:21:12 -04:00
chenyu 0e706227a2
add seed to bert result log filename (#6903)
* add seed to bert result log filename

* different name for different benchmark
2024-10-05 09:15:24 -04:00
George Hotz f4ec39fe58
switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630316487509980af20c6d2981de00bec.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu 7391376528
update bert hparams (#6876)
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview.

loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.

increased lr and total steps a bit.

`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
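These hparams are the kind of thing the MLPerf scripts typically pick up from environment variables; a hedged sketch using tinygrad's getenv helper (variable names and defaults below are illustrative, not necessarily the script's exact ones):

```python
# Sketch: reading bert hparams from the environment (illustrative names/defaults).
from tinygrad.helpers import getenv

BS          = getenv("BS", 66)              # global batch size
LOSS_SCALER = getenv("LOSS_SCALER", 2**10)  # 2**13 -> 2**10 to avoid NaNs
PARALLEL    = getenv("PARALLEL", 0)         # PARALLEL=0 after setup, same as resnet
print(BS, LOSS_SCALER, PARALLEL)
```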
chenyu 5f77217772
bert default CKPT to 0 (#6840)
not required
2024-10-01 21:55:56 -04:00
George Hotz 547733e57c
stunning_mnist [run_process_replay] (#6828)
* stunning_mnist [run_process_replay]

* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
chenyu f59517754e
add RESET_STEP in bert to control reset (#6818)
same as resnet
2024-09-30 09:39:04 -04:00
George Hotz 2ed94e447f gpt2: corealize opt and loss 2024-09-30 09:11:20 +08:00
George Hotz a76c6c740c
hand pad gpt2 (#6805) 2024-09-30 09:03:07 +08:00
chenyu 494b20e886
bert BS back to 54 (#6791)
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu 572d77d1d9
bert script delete eval data after eval (#6790)
fits BS=60, which is 2% faster than 54. also fixed wandb logging params
2024-09-27 20:54:00 -04:00
chenyu f9c8e144ff
chmod +x mlperf bert script for red (#6789)
also disabled raising the power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issues on red
2024-09-27 11:27:32 -04:00
Francis Lata d3a387be63
[MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu 2fc26890c9
default BS=9 in handcode_opt bert (#6783)
using 54 across 6 gpus now (9 per gpu), and 2 is not a good default
2024-09-27 04:38:16 -04:00
George Hotz 9a3f6f392d llm.c tok/s 2024-09-27 00:46:18 -07:00
George Hotz b0e70ab04f llm.c updates 2024-09-27 15:25:59 +08:00
chenyu bea7ed5986
add RUNMLPERF=1 to bert dev_run.sh (#6775)
already set in run_and_time.sh; RUNMLPERF=1 is needed for it to load real data
2024-09-26 11:00:49 -04:00
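As a rough illustration of what the flag gates (the loader names below are hypothetical stand-ins; only RUNMLPERF comes from the scripts):

```python
# Sketch: selecting real vs. synthetic data based on RUNMLPERF (hypothetical loaders).
from tinygrad.helpers import getenv

def load_real_batches():   # stand-in for the real MLPerf bert data loader
  return ["real batch"]

def make_fake_batches():   # stand-in for synthetic data used when just benchmarking
  return ["fake batch"]

batches = load_real_batches() if getenv("RUNMLPERF") else make_fake_batches()
print(batches)
```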
chenyu 12de203a43
add IGNORE_JIT_FIRST_BEAM to bert scripts (#6769)
* update bert BEAM params

copied from resnet to start with

* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
wozeparrot 15cd42cfb9
feat: support TRACEMETA=2 in handcode_opt (#6767) 2024-09-26 16:58:29 +08:00
chenyu 5a5fbfa1eb
smaller bert script change (#6768)
only changes the WANDB and RUNMLPERF order. BENCHMARK and BEAM will be handled differently
2024-09-26 04:54:28 -04:00
chenyu 0424c4967d
fix handcode_opt.py for bert (#6756) 2024-09-26 00:20:24 -04:00
chenyu 396c96357b
update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54, which works on tinyboxes with dropout.
used bert's sparse_categorical_crossentropy that takes a Tensor ignore_index in the accuracy method
2024-09-25 23:55:05 -04:00
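For intuition about the ignore_index part: bert's masked-LM accuracy only counts positions whose label is not the ignore value. A minimal sketch of that idea with tinygrad Tensors (illustration only, not the repo's helper; it uses a plain int ignore_index):

```python
# Sketch: accuracy that skips ignore_index positions (illustrative, not the repo's code).
from tinygrad import Tensor

def masked_accuracy(logits: Tensor, labels: Tensor, ignore_index: int = -1) -> Tensor:
  preds = logits.argmax(-1)
  valid = (labels != ignore_index).float()     # 1.0 where the position counts, 0.0 where ignored
  correct = (preds == labels).float() * valid
  return correct.sum() / valid.sum()

logits = Tensor.randn(4, 10)
labels = Tensor([1, 3, -1, 7])                 # -1 marks ignored positions
print(masked_accuracy(logits, labels).item())
```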
George Hotz 7e73c7b3cc hotfix: bump stable diffusion val distance 2024-09-26 11:15:29 +08:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
George Hotz f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
wozeparrot f932116e05
feat: small things from default_threefry (#6708) 2024-09-24 17:00:47 +08:00
Anurag Lamsal 568757e087
fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649) 2024-09-24 11:20:44 +08:00
samm393 19c11792fd
Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
George Hotz b9e6d42a1f
Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
2024-09-24 08:48:10 +08:00
George Hotz 2fe3eeed17
gated native math in OpenCL (#6683)
* gated native math

* Update cstyle.py
2024-09-23 19:22:13 +08:00
Tobias Fischer c1bbd15bd9
Sharded SDXL Inference (#6328)
* initial sharding fixes

* sigma device fix

* emptyline space fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-21 01:26:43 -04:00
chenyu b14c1bc417
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing

283 -> 47 valids

* test
2024-09-20 03:14:52 -04:00
George Hotz d02bb270b7
add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]

* add timing

* enqueue vs total run

* it's failing but that's fine
2024-09-18 16:06:20 +08:00
George Hotz d4b662c318
new openpilot compile (#6573)
* new openpilot compile

* note, copyout doesn't work for images
2024-09-18 14:22:50 +08:00
kormann f5dd25d376
enable whisper batch for long sequences (#6458)
* long batch +test

* long batch +test

* cleanup

* rollback syntactic changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu 798be6bb74
add gated read_image count in openpilot compile2 (#6546)
530 to go
2024-09-16 21:17:00 -04:00
Francis Lata b7ce9a1530
UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8a67469e7f694e09f2d15a8c40065da.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + clean up various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is run and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129dfbe2e1c46d1937135a60b4ed25caa3d.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d26f59ed1227acb670d5ce7b997606ae.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322ec912bd8617cbe167c542e9ba229d9.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
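The UNet3D entry above adds a DICE cross-entropy loss. As a rough sketch of the concept (not the repo's implementation): the DICE term measures overlap between prediction and label, 2|X∩Y| / (|X| + |Y|), and is averaged with a cross-entropy term:

```python
# Sketch of a combined DICE + cross-entropy loss (illustrative only, not the repo's code).
from tinygrad import Tensor

def dice_ce_loss(logits: Tensor, target_onehot: Tensor, eps: float = 1e-6) -> Tensor:
  probs = logits.softmax(axis=1)                               # (N, C, D, H, W) class probabilities
  dims = tuple(range(2, len(logits.shape)))                    # reduce over spatial dims
  intersection = (probs * target_onehot).sum(axis=dims)
  denom = probs.sum(axis=dims) + target_onehot.sum(axis=dims)
  dice = (2 * intersection + eps) / (denom + eps)              # 2|X∩Y| / (|X| + |Y|) per class
  dice_loss = 1 - dice.mean()
  ce_loss = -(target_onehot * probs.log()).sum(axis=1).mean()  # plain cross entropy
  return (dice_loss + ce_loss) / 2

logits = Tensor.randn(1, 3, 8, 8, 8)                           # (N, C, D, H, W)
target = Tensor.randn(1, 3, 8, 8, 8).softmax(axis=1)           # stand-in "soft" target
print(dice_ce_loss(logits, target).item())
```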
kormann f6f4f3222f
whisper long batch (#6335)
* reset

* test

* only part refactor
2024-09-09 21:03:59 -04:00
qazal 935b6b658f
delete seen from the scheduler api [run_process_replay] (#6427)
docs
2024-09-09 16:26:34 +08:00
wozeparrot cb61cfce24
feat: example and extra tweaks (#6310) 2024-08-28 19:26:11 -07:00
Tobias Fischer 3517aa89d9
sdxl batched inference fixes (#6293) 2024-08-28 07:44:58 -04:00