George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] ( #5421 )
2024-07-12 13:26:50 -07:00
wozeparrot
d1cbd6bb95
unify handcode_resnet_opt and handcode_bert_opt ( #5418 )
2024-07-12 12:05:01 -07:00
wozeparrot
b7cc75a9df
usage summary in handcode opt ( #5414 )
2024-07-12 11:21:18 -07:00
George Hotz
8390feb7b9
optim.OptimizerGroup in hlb_cifar ( #5401 )
2024-07-11 20:14:36 -07:00
wozeparrot
c24d495ef9
metadata in handcode_opt ( #5400 )
2024-07-11 17:45:34 -07:00
George Hotz
5232e405ce
hotfix: add BS to beautiful_mnist
2024-07-11 10:55:05 -07:00
wozeparrot
c9b3ae6bbf
fix llama.py chat mode assert ( #5366 )
2024-07-10 18:06:14 -07:00
wozeparrot
fa873df9c1
bring tinychat more in line with tinyos' version ( #5358 )
2024-07-10 13:13:52 -07:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"); effectively made gpt2 default to JIT
* fix test_gpt2
2024-07-09 15:04:43 -04:00
Elias Wahl
73bddc44f6
Fix fake dataloader ( #5326 )
2024-07-08 09:07:44 -04:00
chenyu
43c3f73fbc
handcode_bert_opt.py ( #5295 )
...
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference ( #5283 )
...
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
reddyn12
d3e244d8b7
prev speed improvements ( #5252 )
...
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
chenyu
191463a919
add timing to SDXL ( #5273 )
2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
...
the norm itself is not significant enough to add as a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
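For reference, RMSNorm scales by the root mean square of the input instead of centering with mean and variance. A minimal pure-Python sketch of the math (illustrative only, not tinygrad's nn.RMSNorm API):

```python
import math

def rms_norm(x, weight=None, eps=1e-6):
    # y_i = x_i / sqrt(mean(x_j^2) + eps), optionally scaled by a learned weight
    ms = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(ms + eps)
    out = [v * inv for v in x]
    return out if weight is None else [o * w for o, w in zip(out, weight)]

print(rms_norm([3.0, 4.0]))  # ≈ [0.8485, 1.1314]
```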
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry ( #5248 )
...
* Revert "use threefry in stable diffusion benchmark (#4988 )"
This reverts commit 44dfa37c70.
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
...
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
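repeat_interleave repeats each element in place rather than tiling the whole sequence. A hedged pure-Python sketch of the semantics for a flat list with a scalar repeat count (mirroring the torch-style op, not tinygrad's implementation):

```python
def repeat_interleave(xs, repeats):
    # each element is repeated `repeats` times in place: [a, b] -> [a, a, b, b]
    # (contrast with tiling, which would give [a, b, a, b])
    return [x for x in xs for _ in range(repeats)]

print(repeat_interleave([1, 2, 3], 2))  # [1, 1, 2, 2, 3, 3]
```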
chenyu
88763eb9ff
fix stable_diffusion with fp16 ( #5239 )
2024-06-30 12:59:31 -04:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e
fix seed = 0 in sdxl ( #5209 )
...
removed a few unneeded realize and contiguous too
2024-06-28 08:48:59 -04:00
Tobias Fischer
4688f97d48
Add SDXL Inference to Examples ( #5206 )
...
* added sdxl inference code
* fixed trailing whitespace
* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
...
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Elias Wahl
e267f3161d
Add MLLogger ( #5125 )
...
* add MLPerf logger
* eval steps
* start with step 1
* compliance for 3.1.0 and 4.0.0
* more compliance
* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
David Hou
3604642847
Llama shard axis 0 sometimes ( #5123 )
...
* make buffer view optional with a flag [run_process_replay]
* do not view when sharding to save memory [run_process_replay]
* llama shard axis=0 sometimes
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" ( #5138 )
2024-06-24 20:58:25 -04:00
chenyu
055e616302
cleanup mnist data load in beautiful_mnist ( #5106 )
2024-06-22 18:31:51 -04:00
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm ( #5103 )
...
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
chenyu
e468601226
update llama attention casting ( #5096 )
...
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
wozeparrot
acb715c64c
fix: llama3 special tokens ( #5045 )
2024-06-18 17:08:44 -07:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples ( #5038 )
...
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
Elias Wahl
f31ef11537
Better default hparams for large BS ( #5030 )
...
* better default hparams for large BS
* bf16 too
* use tuple
2024-06-18 11:13:06 -04:00
Elias Wahl
7bfa9101c0
Float in scaled dot product attention ( #4985 )
...
* Monkeypatch scaled-dot-product-attention
* Use dot instead of matmul
* new api
* imports
* least_upper_dtype
2024-06-18 08:16:41 -04:00
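The commit is about computing the softmax inside attention in float even when the model runs in half. As a reference for what is being computed, scaled dot-product attention is softmax(QK^T / sqrt(d)) V; a pure-Python sketch over lists of rows (illustrative, not the tinygrad API):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(q, k, v):
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, one query row at a time
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wi * vj[c] for wi, vj in zip(w, v)) for c in range(len(v[0]))])
    return out
```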
chenyu
c52352bd9a
fix yolov8 example ( #5003 )
...
it was creating a Tensor from a list of numpy arrays, which is no longer supported after Tensor creation from a list stopped using numpy.
2024-06-16 20:47:29 -04:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark ( #4988 )
...
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes ( #4971 )
2024-06-15 16:29:39 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir ( #4938 )
2024-06-14 13:47:27 -07:00
chenyu
67e8df4969
remove numpy from dtype ( #4969 )
...
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
wozeparrot
2a974ff257
fix: no ReadableStream `for await...of`, too new ( #4965 )
2024-06-14 11:22:19 -07:00
Elias Wahl
d2e3c391e8
Residual in MLM loss + Change default steps ( #4935 )
...
* Residual in mlm loss
* Reduce default steps to 160K * 24
* oops
* comment
2024-06-12 16:09:18 -04:00
wozeparrot
3d13c23bfa
llama3 `--download_model` ( #4922 )
2024-06-11 22:59:59 -07:00
wozeparrot
2849d0a2a1
fix copying to clipboard on a non secure context ( #4890 )
2024-06-08 16:51:47 -07:00
wozeparrot
6c24eda522
feat: tinychat ( #4869 )
2024-06-08 12:05:45 -07:00
Brennan Kinney
9445946cae
docs: Update referenced yaml in `yolov8.py` ( #4871 )
...
YAML files have since been relocated.
2024-06-08 15:05:00 -04:00
Nik
085c0bbf6b
add mlperf train subset of openimages ( #4841 )
2024-06-05 10:10:11 -04:00
Elias Wahl
e576aca044
Disable dropout ( #4837 )
2024-06-04 18:57:26 -04:00
Elias Wahl
bb248a0dd1
Optional half matmul ( #4835 )
...
* half linear
* move weight cast back
* oops
* matmul dtype var
* todo comment
2024-06-04 17:53:41 -04:00
Elias Wahl
04e237328b
Refactor to class style ( #4804 )
2024-06-04 14:08:31 -07:00
George Hotz
eecfdd2f6e
hotfix: fix dataset reading for new llm.c
2024-06-03 14:10:05 +02:00
Francis Lata
707099487a
Multiprocessing UNet3D dataloader ( #4801 )
...
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocesed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat ( #4792 )
2024-05-30 22:47:31 -07:00
chenyu
f2414c666f
fix train_gpt2.py ( #4771 )
...
added `with Tensor.train():`
2024-05-29 12:01:34 -04:00
chenyu
7624ad3ddd
add --timing and --profile to llama3 example ( #4767 )
2024-05-28 16:24:44 -04:00
chenyu
e614b7c696
docs: showcase remove mnist_gan and add conversation.py ( #4757 )
...
fixed both examples, and i think it's better to show conversation
2024-05-28 11:09:26 -04:00
chenyu
fd249422f5
minor cleanup example stable_diffusion ( #4753 )
2024-05-28 00:05:37 -04:00
Elias Wahl
c4b0acf095
Global norm + small changes ( #4749 )
...
* norm
* no empty
* default loss scaler in float
2024-05-27 18:35:27 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
38bc38cdff
fix llama example quantize ( #4699 )
...
* fix llama example quantize
import quantize layers from new example llama3
add to mac benchmark
* fix that
* save the files
2024-05-23 15:35:26 -04:00
chenyu
792a494eb8
fix various examples ( #4691 )
...
* fix examples that used ax1 and ax2 for transpose
* fix that
* update those
2024-05-22 20:43:21 -04:00
Elias Wahl
acc0039cfc
Resume fix + scheduler for non weight decay params ( #4679 )
...
* move ckpt dir
* fix resume. Add scheduler group
2024-05-21 19:38:13 -04:00
chenyu
5e3fbbb33e
llama3 example add manual seed and log seed ( #4667 )
2024-05-20 19:09:57 -04:00
chenyu
704cb1d8a0
fix conversation.py quantize ( #4663 )
...
it used to be True for int8; now it's a string for int8 or nf4
2024-05-20 17:36:37 -04:00
chenyu
ae861325ce
update llama sample for mac 32 input buffer limit ( #4662 )
...
set default sampling params in the function call to 0, and top k in llama3 to 25.
2024-05-20 17:23:39 -04:00
Elias Wahl
993091adfa
loss scaler + nan fixes ( #4661 )
2024-05-20 17:08:35 -04:00
wozeparrot
b144d4b460
new llama3 example ( #4576 )
2024-05-19 22:42:23 -07:00
George Hotz
5ba611787d
move image into tensor.py. delete features ( #4603 )
...
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
George Hotz
53d082a2aa
move memory into schedule ( #4597 )
2024-05-15 07:54:20 -07:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
George Hotz
fd02ab1e8b
move disassemblers and openpilot ( #4592 )
...
* move disassemblers and openpilot
* delete junk
* put that in pre-commit
* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
2b0ee74bb6
lshift and rshift ( #4591 )
2024-05-14 19:16:31 -04:00
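These ops follow the usual integer shift identities: a left shift by k multiplies by 2^k, and a right shift floor-divides by 2^k for non-negative integers. In plain Python:

```python
def lshift(x, k):
    return x << k  # equivalent to x * 2**k

def rshift(x, k):
    return x >> k  # equivalent to x // 2**k for non-negative x

print(lshift(13, 2), rshift(13, 2))  # 52 3
```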
qazal
9aa5e02229
update llmc export ( #4584 )
...
* update example
* move train to optim
* rename
* b2
2024-05-14 21:18:38 +03:00
wozeparrot
d7670f8141
quantized llama multilazybuffer fix ( #4557 )
2024-05-12 14:19:21 -07:00
chenyu
01a0c1a948
slightly faster nf4 llama ( #4542 )
2024-05-12 14:24:42 -04:00
wozeparrot
e07c7668b3
nf4 llama ( #4540 )
2024-05-11 22:22:34 -07:00
chenyu
bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ ( #4534 )
2024-05-11 14:36:56 -04:00
chenyu
04a4980a51
touchup bert script ( #4531 )
...
small adjustments, remove duplicated training setting and stop the script once target is hit
2024-05-11 13:02:02 -04:00
George Hotz
347a3acb37
add renderer class ( #4524 )
...
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert ( #4525 )
...
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
George Hotz
4eef1ee9bf
move renderer into options ( #4514 )
...
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
7c630a9a53
hotfix: fix llama spacing + fix hcq
2024-05-10 15:10:13 +00:00
chenyu
b399d98e41
fix resnet eval ( #4507 )
2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3
feat: more mlperf fixes ( #4505 )
2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288
use fake data in beam searching resnet ( #4504 )
2024-05-09 23:43:50 -04:00
wozeparrot
29daea4e60
fix: core count and os ( #4503 )
2024-05-09 19:55:07 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py ( #4502 )
...
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
chenyu
ef93e41a15
resnet mlperf systems add tinygrad commit and python / runtime versions ( #4494 )
2024-05-09 16:04:15 -04:00
chenyu
b5afdfbc5b
first draft resnet mlperf readme ( #4493 )
...
* start readme
* something
2024-05-09 15:51:44 -04:00
chenyu
047c7f3e5b
polish resnet mlperf logging ( #4490 )
...
don't include the final checkpoint save time in run time, plus some cosmetic ordering changes
2024-05-09 13:04:24 -04:00
chenyu
d78e159aa3
resnet logging move RUN_START to start of the script ( #4488 )
2024-05-09 12:32:32 -04:00
chenyu
1bcb58479d
resnet setup power cap red box gpu to 350W ( #4484 )
...
1%-2% faster
2024-05-08 23:32:41 -04:00
chenyu
0ed755bcf5
resnet use EVAL_BS=192 ( #4482 )
...
* resnet use EVAL_BS=192
also lower green run BEAM_MIN_PROGRESS from 10 to 5
* BEAM_MIN_PROGRESS 5 is too close to setup limit
2024-05-08 22:29:27 -04:00
chenyu
1f6bf9d2f7
real diskcache_clear in model_train resnet ( #4445 )
...
clear cache if INITMLPERF is set, or running run_and_time. dev_beam and dev_run do not clear cache
2024-05-08 19:06:09 -04:00
chenyu
1b4645bea6
hotfix resnet move init_start to start of the script ( #4481 )
2024-05-08 19:03:52 -04:00
wozeparrot
a347ae94d6
feat: remove wandb ( #4480 )
2024-05-08 15:31:16 -07:00
chenyu
db7e15c46f
hotfix resnet only log epoch start with RUNMLPERF ( #4477 )
2024-05-08 15:14:41 -04:00
chenyu
062c6dd65d
mlperf logging, truncate dir in logs and log seed ( #4475 )
2024-05-08 12:54:02 -04:00
chenyu
b62a65b617
redo faster sparse_categorical_crossentropy ( #4461 )
...
updated the resnet LR and DECAY defaults, which help convergence too
2024-05-08 11:21:43 -04:00
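For reference, "sparse" categorical cross-entropy takes integer class labels directly (no one-hot vectors), averaging -log softmax(logits)[label] over samples. The commit's speedup is in tinygrad's implementation; this is just a reference definition in pure Python:

```python
import math

def sparse_categorical_crossentropy(logits, labels):
    # mean over samples of -log(softmax(row)[label]); "sparse" means labels are
    # class indices rather than one-hot vectors
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # log-sum-exp with max subtraction for stability
        logsumexp = m + math.log(sum(math.exp(v - m) for v in row))
        total += logsumexp - row[y]
    return total / len(labels)
```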
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
George Hotz
f4e49a7c1a
resnet 50 opt: correct loop + LARS ( #4449 )
...
* correct loop + LARS
* ops
2024-05-06 08:01:26 -07:00
George Hotz
fc995d4446
add backward to handcode_resnet50_opt
2024-05-06 06:42:26 -07:00
wozeparrot
603d3a351b
feat: allow keeping multiple cookies ( #4440 )
2024-05-05 19:26:48 -07:00
Francis Lam
709410071c
mlperf/resnet: updated BEAM params to increase performance ( #4443 )
2024-05-05 21:49:46 -04:00
chenyu
3b30756cbb
update mlperf submission system ( #4435 )
...
more required fields.
2024-05-05 13:19:07 -04:00
David Hou
c0a048c044
batchnorm d(var)/d(mean) = 0 ( #4430 )
...
* d(var)/d(mean) = 0
* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
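The simplification behind this commit: with mean and variance defined over the same batch, the variance's partial derivative with respect to the mean vanishes, so that branch can be dropped from the batchnorm backward. A sketch of the derivation:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
```

```latex
\frac{\partial \sigma^2}{\partial \mu}
  = -\frac{2}{N}\sum_{i=1}^{N} (x_i - \mu)
  = -2\Big(\frac{1}{N}\sum_{i=1}^{N} x_i \;-\; \mu\Big)
  = 0
```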
qazal
fa17dcaf07
Fix llm.c/export.py ( #4423 )
...
* fix headers
* add CI
* add stdio
* merge clang tests
* revert llm.c
* revert ci
* Revert "revert llm.c"
This reverts commit 5fd17e3c8b38dc9549d0548e9515185b7b032573.
2024-05-04 19:37:10 +03:00
George Hotz
cb7289f9c9
remove clang program header ( #4422 )
...
* remove clang program header
* proper max
* bools are numbers
* fix compile enet
2024-05-04 08:38:01 -07:00
chenyu
473ecb978a
remove SPLIT_REDUCEOP=1 from resnet scripts ( #4404 )
...
SPLIT_REDUCEOP=1 is default
2024-05-03 12:36:23 -04:00
David Hou
b767d59684
resnet trainer: keep old cookie around until next step has been queued ( #4401 )
...
* keep old cookie around until next step has been queued (-10ms 6gpu)
* also for eval
* drop cookie before data_get?
* Revert "drop cookie before data_get?"
This reverts commit b01e6aa2b27f49aeab04b448f09e0ef9e689ea53.
* Revert "Revert "drop cookie before data_get?""
This reverts commit 23464e73d445007c15537c69818fdee89adf0740.
2024-05-03 12:15:21 -04:00
chenyu
2c3b7f8e70
pad resnet training data with training data mean ( #4369 )
...
update model_train resnet to pad the training data
2024-05-02 20:26:15 -04:00
Francis Lam
3cf8291f2f
mlperf/resnet: update beam params to increase time and quality ( #4396 )
...
* mlperf/resnet: update beam params to increase time and quality
* revert upcast 8 in search space and add rocm setup function
* refactor to independent setup.sh script
2024-05-02 20:14:46 -04:00
chenyu
ab01a9433d
resnet eval 4n+3 if epoch < 33 ( #4391 )
...
the rule is to eval at epochs 4n+k, and we can stop the clock as soon as eval hits the target. this can save 24 evals, or 12 minutes
2024-05-02 16:52:07 -04:00
chenyu
7492e5d3e7
resnet correct log name for red ( #4390 )
2024-05-02 10:58:55 -04:00
chenyu
bf31837e6d
resnet correct steps_in_val_epoch in logging ( #4389 )
...
also added random seed from system in scripts
2024-05-02 10:51:36 -04:00
ym555
3113785604
Llama 3 Models ( #4339 )
...
* Full Impl
* fix test
* Fix inference loop
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-05-02 06:06:07 -07:00
chenyu
22376e53b7
resnet mlperf logging ( #4361 )
...
* resnet mlperf logging
* cropping too much?
2024-05-02 00:00:04 -04:00
chenyu
ad116dc5c6
fill in mlperf system description ( #4381 )
...
it did not ask for too many details. will fill in software versions later along with the tinygrad commit.
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0
INFO - System description checker passed for tinybox red
```
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0
INFO - System description checker passed for tinybox green
```
2024-05-01 16:47:45 -04:00
chenyu
9358b62073
rename resnet script to dev_beam.sh and dev_run.sh ( #4379 )
...
final run_and_time needs to be one script for both. rename the old scripts
2024-05-01 14:41:35 -04:00
chenyu
6628e13a5f
pad resnet eval data in model_train ( #4374 )
...
asserted if eval sample count is different from total eval file count.
2024-05-01 14:33:42 -04:00
chenyu
826cccd54d
fix mean underflow for half tensor ( #4377 )
...
* fix mean underflow for half tensor
divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failing test case for symbolic shape var
* skip for python backend
2024-05-01 13:38:57 -04:00
George Hotz
b683d0f496
hotfix: 100% accuracy is wrong
2024-05-01 08:07:18 -07:00
chenyu
683b7c605a
pad first batch of imagenet dataloader and update eval ( #4368 )
...
* pad first batch of imagenet dataloader and update eval
* pad zero instead of empty for training
2024-05-01 00:21:52 -04:00
Francis Lam
16838eae08
mlperf/resnet: update tinybox_red parameters to new best values ( #4364 )
...
about 27 minutes to setup and 345ms/110TF steps
2024-04-30 18:08:12 -04:00
Francis Lam
0d33c54d99
kernel: change PADTO check to allow up to 4x padding ( #4354 )
...
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Elias Wahl
71ff68b445
dropout after eval step ( #4351 )
2024-04-29 15:47:21 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
Francis Lata
bb849a57d1
[MLPerf] UNet3D dataloader ( #4343 )
...
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
2024-04-28 22:34:18 -04:00
Arnav Mehta
f3de17912f
added the missing download-if-not-present function ( #4318 )
2024-04-28 16:31:08 +08:00
chenyu
ec65aea32f
resnet stop the script once hit target ( #4303 )
...
* resnet stop the script once hit target
* comment
2024-04-25 23:54:56 -04:00
George Hotz
1e37c4a7a1
minor llm.c improvements
2024-04-26 11:15:31 +08:00
chenyu
f9a7badace
use LR=7 for resnet with BS=1536 ( #4299 )
...
had 3 runs after the lr float32 change; seems quite stable and converges at epochs 34 and 35
2024-04-25 15:23:10 -04:00
chenyu
c11bad766d
prepare mlperf submission ( #4270 )
...
* prepare mlperf submission
* 28min compile and 3h53m
* red 30 minute compile and 56 TFLOPS
2024-04-24 13:19:31 -04:00
chenyu
c1fbacb182
resnet benchmarks use DEFAULT_FLOAT=HALF ( #4285 )
...
also update the LR default to scale based on BS=1536 (the BS we are submitting)
2024-04-24 12:10:57 -04:00
George Hotz
ad28fdecb1
si.inputs+outputs -> bufs ( #4279 )
2024-04-24 15:12:34 +08:00
chenyu
8401de9922
resnet benchmark return early in eval ( #4278 )
...
only do a few eval steps to compile, and skip the second epoch when doing beam + benchmark. saves 2 minutes
2024-04-24 00:55:01 -04:00
chenyu
6637ecc5fe
use IGNORE_JIT_FIRST_BEAM to not BEAM in jit cnt=0 ( #4269 )
...
we want to have different BEAM values for resnet train and eval. global JITBEAM cannot do this. added the flag to change beam behavior at cnt=0 (so by default it behaves the same with or without TinyJit), and for cnt=1 it uses the existing BEAM.value.
Also updated the context var BEAM in resnet to be outside of TinyJit. saves about 3 minutes compile time
2024-04-23 18:59:43 -04:00
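A hedged sketch of the gating described above (the helper name and structure here are illustrative, not tinygrad internals): BEAM is suppressed on the cnt=0 pre-capture run when the flag is set, and applies normally from cnt=1 onward.

```python
import os

def beam_value_for(cnt):
    # hypothetical helper: returns the BEAM width to use at a given JIT count.
    # cnt=0 is the un-jitted warmup run; IGNORE_JIT_FIRST_BEAM=1 disables BEAM there
    beam = int(os.environ.get("BEAM", "0"))
    if cnt == 0 and os.environ.get("IGNORE_JIT_FIRST_BEAM", "0") == "1":
        return 0
    return beam

os.environ["BEAM"] = "4"
os.environ["IGNORE_JIT_FIRST_BEAM"] = "1"
print(beam_value_for(0), beam_value_for(1))  # 0 4
```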
Elias Wahl
3a48773f1a
BERT dataloader ( #4252 )
...
* add dataloader
* comment
2024-04-23 13:44:49 -04:00
chenyu
37f8be6450
resnet print epoch ops and mem in benchmark ( #4244 )
...
* resnet print epoch ops and mem in benchmark
also added a flag to optionally disable reset jitted steps
* real per epoch stats
2024-04-21 18:32:31 -04:00
chenyu
30fc1ad415
remove TODO: remove explicit dtypes after broadcast fix in stable_diffusion ( #4241 )
...
this is done
2024-04-21 00:31:24 -04:00
chenyu
a1940ced77
remove the assign hack in whisper ( #4240 )
...
no longer needed, the commented test case was removed too
2024-04-20 23:56:44 -04:00
chenyu
3f126c7664
fix examples vits / conversation.py ( #4239 )
...
it was passing a const numpy array into Tensor.arange
2024-04-20 23:29:12 -04:00
George Hotz
cd88afc98b
datasets isn't a feature + filter docstrings ( #4228 )
...
* datasets isn't a feature
* filter docstrings in sz
2024-04-19 16:16:10 +04:00
George Hotz
d99b512084
llm.c timing ( #4219 )
...
* add timing info
* fix malloc
* 8s with beam
2024-04-19 12:43:21 +04:00
George Hotz
39b60a25f0
more llm c work ( #4207 )
...
* more llm c work
* print nicely
* fake load pretrained
* select warmups
* output c code
2024-04-18 22:20:44 +04:00
chenyu
f7416916df
update resnet hparams based on BS=1632 RCP ( #4210 )
...
https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_4.0.0/rcps_resnet.json
2024-04-18 12:01:46 -04:00
George Hotz
fa57c3e7ce
continue llm.c ( #4190 )
...
* continue llm.c
* export more
* progress on llm.c
* simpler optim, names work
2024-04-18 10:57:54 +04:00