Commit Graph

824 Commits

Author SHA1 Message Date
nimlgen 408c0a5e7f
qcom match texture/sampler descriptors to OpenCL (#7622)
* qcom ioctl compare more regs

* bug fix
2024-11-11 15:57:42 -05:00
nimlgen 45db7d9045
fuzz qcom vs opencl (#7130)
* fuzz qcom vs opencl

* fix nv

* better?

* typo

* open both devs
2024-10-17 18:49:08 +03:00
George Hotz 3169cb386d
remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
nimlgen b025495e5c
fuzz nv vs cuda (#7066)
* fuzz nv vs cuda

* fixes

* smth

* um

* cmp the same

* dnrt

* correct gpfifo scan

* fix
2024-10-15 22:22:40 +03:00
qazal 8ff6514ba3
delete extra/ops.py [pr] (#7072) 2024-10-15 22:14:21 +03:00
nimlgen 586ff4c910
nv record uvm mappings (#7059)
* nv record uvm mappings

* linter

* smth

* ooops
2024-10-15 00:12:49 +03:00
nimlgen 8094340221
nv print info about faults (#7057)
* nv print info about faults

* unrelated changes

* nv_gpu.GT200_DEBUGGER in mockgpu

* regen with correct version

* spacing
2024-10-14 21:49:38 +03:00
chenyu bd8ecf7fd6
remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu c4c806a210
generate new kernel dataset (#7034)
* generate new kernel dataset

pre req to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
qazal 13846930cd
hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
George Hotz a71bb09ec3
remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz 5ae2de9845
UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
qazal 20d3c2d113
unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW

* update hardcoded asts

* update sops.gz
2024-10-09 02:00:17 +08:00
Tobias Fischer f9e32f2bb2
clip device fix (#6924) 2024-10-07 00:47:32 +08:00
chenyu 01a2d7316d
dtype=float in bert log_softmax for loss and accuracy (#6916) 2024-10-06 11:15:56 -04:00
George Hotz 4df5c7a4ef
move lazy to engine [pr] (#6886)
* move lazy to engine [pr]

* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz 8ca506ee37
remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu 7c8849010a
fix var_vals in MCTS (#6882)
tested with JITBEAM=100 on llama
2024-10-04 08:19:35 -04:00
George Hotz a0cb16ac61
node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz cdff1d75b6
things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz f4ec39fe58
switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630316487509980af20c6d2981de00bec.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu c3c93f332a
symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
Tobias Fischer 33f7599158
Compute FID Score (#6802)
* compute fid score code

* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
Francis Lata d3a387be63
[MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
nimlgen 3c56aeee70
add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealloc

* wording

* docs
2024-09-26 18:33:19 +08:00
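The Tensor.from_blob entry above wraps externally allocated memory as a Tensor without copying. A minimal usage sketch, assuming a host-visible pointer and the CLANG (CPU) backend; the exact kwargs are an assumption, not taken from the PR:

```py
# hedged sketch: assumes Tensor.from_blob(ptr, shape, dtype=..., device=...) as added in #6765;
# the Tensor does not own the memory, so `data` must outlive `t`
import ctypes
from tinygrad import Tensor, dtypes

data = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)  # external buffer
t = Tensor.from_blob(ctypes.addressof(data), (4,), dtype=dtypes.float32, device="CLANG")
print(t.numpy())  # -> [1. 2. 3. 4.]
```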
chenyu 396c96357b
update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54, which works on tinyboxes with dropout enabled.
used bert's sparse_categorical_crossentropy, which takes a Tensor ignore_index, in the accuracy method (see the sketch after this entry)
2024-09-25 23:55:05 -04:00
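For context on the ignore_index change above: tinygrad's stock Tensor.sparse_categorical_crossentropy takes an integer ignore_index, while the bert script's variant accepts a Tensor. A minimal sketch of the stock method, with -1 labels excluded from the loss (shapes here are illustrative assumptions):

```py
# hedged sketch of masking padded positions out of the loss with ignore_index
from tinygrad import Tensor

logits = Tensor.randn(4, 32)      # (tokens, vocab) - illustrative sizes
labels = Tensor([5, -1, 17, -1])  # -1 marks positions to ignore
loss = logits.sparse_categorical_crossentropy(labels, ignore_index=-1)
print(loss.item())
```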
wozeparrot 4ebc9589a6
feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
nimlgen 56979aa3ed
qcom ioctl log levels (#6735) 2024-09-25 14:59:27 +08:00
wozeparrot 2be0b26a1f
rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
nimlgen ca66b11e07
qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
samm393 19c11792fd
Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
chenyu 31b9c74c77
tiny import cleanup and fix typo (#6692) 2024-09-23 21:48:23 -04:00
George Hotz 2f2f933e50
fix buffer shape regression from onnx (#6678) 2024-09-23 16:58:42 +08:00
qazal 982086f54c
UOps.VALID try 2 (#6623)
* make UOps.VALID compile

* fixable tests

* bufs dedup

* cleanup the CONST spec

* regenerate dataset with graph_rewrite

```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```

* rm arg

* remove arg

* revert arg removal

This reverts commit 2c35c75c950075d38c9fb8572f14640fe8235f74.

* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
Tobias Fischer c1bbd15bd9
Sharded SDXL Inference (#6328)
* initial sharding fixes

* sigma device fix

* emptyline space fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-21 01:26:43 -04:00
nimlgen 641586cb87
qcom updated ioctl (#6627) 2024-09-20 18:02:40 +08:00
George Hotz c4d5575c61
beat mlx at resnet 18 (#6611)
* work to beat mlx at resnet18 [run_process_replay]

* pruning

* wino sometimes

* shorter

* comment
2024-09-20 11:28:01 +08:00
qazal 9295bc0189
viz more work [run_process_replay] (#6568)
* infra

* found it

* real work

* bring those back

* cleanup test_viz

* comment that out
2024-09-17 19:27:09 +08:00
nimlgen 81a4a9623c
add qcom dsp runtime (#6112)
* calling qualcomm dsp from python

* include so files

* add include file

* adsprpc.py

* running with adsprpc

* work

* 32-bit support in elf

* compilation works

* ion

* msm_ion

* working DSP backend

* getting 500 MFLOPS on matmul

* beam works with timing

* move to autogen

* disasm

* progress

* simple tests pass

* qcom_dsp

* more dsp autogen

* progress

* some progress

* works w/o lib

* checkpoint

* no lib

* ugh, better

* cleaner, but with lib. test good, but with the hack

* remove autogens

* small

* push

* simpler

* revert this

* run_3

* simpler

* android

* handle

* run it

* why?

* run2

* to gen

* cc

* cleaner

* elf

* part of autogen

* comment

* no lib

* autogen

* linter

* bug reproducer

* cleaner

* this repro is almost empty and doesn't work!!!!

* with this test_ops passes, no crashes anymore

* cleaner

* linter

* renames

* shorter

* remove contextlib

* ugh

* mypy

* cleaner

* cleaner

* remove import

* conn

* import

* revert this

* remove heavy .so

* shorter alloc

* not true anymore

---------

Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <george@comma.ai>
2024-09-13 21:01:33 +03:00
George Hotz bdd0c06f29
add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
Francis Lata b7ce9a1530
UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8a67469e7f694e09f2d15a8c40065da.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + cleanup some various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is ran and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129dfbe2e1c46d1937135a60b4ed25caa3d.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d26f59ed1227acb670d5ce7b997606ae.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322ec912bd8617cbe167c542e9ba229d9.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
George Hotz 904f6a63fa
Revert "Revert "cleanup process_replay/* namings [run_process_replay] (#6429)…" (#6442)
This reverts commit eda177da84.
2024-09-10 07:00:16 +08:00
George Hotz dbd4536167
Revert "add UOps.VALID (#6387)" (#6441)
This reverts commit 8186e4e7d6.
2024-09-09 21:33:00 +08:00
George Hotz eda177da84
Revert "cleanup process_replay/* namings [run_process_replay] (#6429)" (#6437)
This reverts commit f4e83b30b4.
2024-09-09 18:52:36 +08:00
qazal f4e83b30b4
cleanup process_replay/* namings [run_process_replay] (#6429) 2024-09-09 16:59:04 +08:00
George Hotz 8186e4e7d6
add UOps.VALID (#6387)
* uops valid

* broke full_shape

* fixup that st (hardcoded asts still red)

* fixup DEFINE_VAR

debug

more debug

* start moving stuff to ast_const

* move test_linearizer

* move test_linearizer_failures to ast_const

* fixup test_schedule

* small diff change

* regenerate dataset

* fixup test_multitensor

* regen dataset try 2

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-09 16:58:43 +08:00
qazal c5bae55ec8
new generate_dataset.sh (#6423)
* new generate_dataset.sh

* keep those there

* test: rm expected failures

* rename to extract
2024-09-09 15:13:07 +08:00
geohotstan 65da03e186
remove _slice [run_process_replay] (#6395)
* try

* pass

* clean up

* done

* I'm becoming dumber

* clean up 2

* remove useless max

* useless but make computer brrr [run_process_replay]

* try process replay

* try again

* 1 less line, just use pad2d
2024-09-08 09:12:39 +08:00
George Hotz 8f6d0485e7
hotfix: resnet to obj.device 2024-09-06 13:06:02 +08:00
George Hotz 9d72119a0c
minor resnet cleanups (#6382)
* minor resnet cleanups

* that should have been long

* jit

* meh
2024-09-06 12:50:21 +08:00