chenyu
999e3780e9
dropout contiguous after >= p ( #6892 )
make it a bool buffer
2024-10-06 19:40:42 -04:00
wozeparrot
9eb6eef441
seed in tensor ( #6869 )
2024-10-06 14:46:58 -04:00
Tobias Fischer
f9e32f2bb2
clip device fix ( #6924 )
2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d
dtype=float in bert log_softmax for loss and accuracy ( #6916 )
2024-10-06 11:15:56 -04:00
jeffzh4ng
19a7e41113
implement logcumsumexp ( #6921 )
* implement logcumsumexp
* change axis=None to axis=0
2024-10-06 10:45:36 -04:00
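Note on logcumsumexp: the operation returns, at each position, the log of the running sum of exponentials. The sketch below is a plain-Python illustration over a 1-D sequence (a single axis), not the code from this commit; the function body and the pairwise update are assumptions chosen only to show a numerically stable way to compute it.

```python
import math

def logcumsumexp(xs):
    """Running log(sum(exp(x))) over a 1-D sequence, computed stably."""
    out = []
    running = -math.inf  # log of an empty sum
    for x in xs:
        hi, lo = max(running, x), min(running, x)
        # log(e^hi + e^lo) = hi + log1p(e^(lo - hi)); the exponent is <= 0,
        # so exp() cannot overflow even for very large inputs.
        running = hi if lo == -math.inf else hi + math.log1p(math.exp(lo - hi))
        out.append(running)
    return out
```

Subtracting the larger term before exponentiating is what keeps inputs such as 1000.0 from overflowing, which a naive log(cumsum(exp(x))) would do.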
George Hotz
f588169fdc
hotfix: ad for DEBUG=2 in the mnist tutorial
2024-10-06 21:05:48 +08:00
qazal
10ff1d6fb9
viz prep refactor for tracked scope decorator [pr] ( #6920 )
* viz prep refactor for tracked scope decorator [pr]
* fix fuzzer
2024-10-06 16:02:09 +03:00
qazal
837f9c6832
new viz fuzz tests, track multiple contexts ( #6913 )
* add FUZZ_VIZ option
* add FUZZ_VIZ=1 tests
* use .replace
* rewrites test
* add rewrite_stack
* add FUZZ_VIZ to ops
* what if FUZZ_VIZ was up there
* leave fuzz_viz for now
2024-10-06 14:58:15 +03:00
chenyu
75d9dcf000
support dtype in softmax and log_softmax ( #6914 )
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
chenyu
718b959349
log epoch start and stop for bert ( #6912 )
2024-10-06 06:39:46 -04:00
qazal
b066ef2282
small changes from the viz_rewrite branch [pr] ( #6907 )
* simpler replace
* dont show shapetracker consts
* changed_nodes shouldn't exist for the first sink
2024-10-06 12:00:55 +03:00
chenyu
16c1fa4208
use BEAM=3 for red box bert runs ( #6904 )
BEAM=4 slightly exceeded 30 minutes setup
2024-10-05 09:21:12 -04:00
chenyu
0e706227a2
add seed to bert result log filename ( #6903 )
* add seed to bert result log filename
* different name for different benchmark
2024-10-05 09:15:24 -04:00
George Hotz
8ed3a00c9c
ceildiv helper [pr] ( #6899 )
2024-10-05 14:59:10 +08:00
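Note on ceildiv: a ceiling-division helper is commonly written with floor division and two negations. The sketch below is an illustration only; the names `ceildiv` and `round_up` appear in this log, but the signatures and bodies here are assumptions, not the code from these commits, and it covers plain integers only.

```python
def ceildiv(num: int, amt: int) -> int:
    # Floor division rounds toward negative infinity, so negating the
    # divisor and then the result rounds toward positive infinity instead.
    return -(num // -amt)

def round_up(num: int, amt: int) -> int:
    # Smallest multiple of amt that is >= num (for positive amt).
    return ceildiv(num, amt) * amt
```

Staying in integer arithmetic avoids the float rounding that `math.ceil(num / amt)` can introduce for large values.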
chenyu
fd68b6dbc2
type annotation to round_up ( #6898 )
* type annotation to round_up
also cleaned up places where round_up was potentially called on symbolic
* fix
2024-10-04 23:27:23 -04:00
chenyu
3c12244cfc
remove DTypeLike from lazy ( #6897 )
keep only in tensor
2024-10-04 22:49:21 -04:00
George Hotz
0d6216aba1
bump the download cache ( #6896 )
2024-10-05 10:23:18 +08:00
George Hotz
4058a99275
symbolic in ops 2 [pr] ( #6895 )
* move symbolic to ops, simple [pr]
* fix for shapetracker
2024-10-05 10:20:07 +08:00
chenyu
08414d7b7c
cleanup test_uop_symbolic.py ( #6894 )
no more test_symbolic for reference, so force expected output to be exact instead of a set
2024-10-04 20:53:10 -04:00
ignaciosica
555bcb5e54
static access for code_for_op ( #6889 )
2024-10-05 07:38:01 +08:00
vladov
5f6b6162b3
Suppress warnings in transcendental tests. ( #6891 )
2024-10-05 07:37:17 +08:00
nimlgen
707c805a68
nv set localmem sm count to max ( #6890 )
2024-10-04 23:29:46 +03:00
George Hotz
4df5c7a4ef
move lazy to engine [pr] ( #6886 )
* move lazy to engine [pr]
* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
6b063450df
move hcq device to runtime [pr] ( #6879 )
* things that are only used in one place don't belong in helpers [pr]
* start moving hcq device [pr]
* fix paths
2024-10-04 22:26:50 +08:00
George Hotz
5be2bd18a6
use UOps.BIND instead of ASSIGN, it's different ( #6885 )
2024-10-04 22:26:33 +08:00
chenyu
4c3895744e
type annotation for layernorm ( #6883 )
2024-10-04 09:03:56 -04:00
George Hotz
8ca506ee37
remove the magic methods for moving between devices [pr] ( #6881 )
* remove the magic methods for moving between devices [pr]
* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a
fix var_vals in MCTS ( #6882 )
tested with JITBEAM=100 llama
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61
node cleanup + local metal test speed [pr] ( #6880 )
* node cleanup [pr]
* fix tests, including the double one on metal
* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6
things that are only used in one place don't belong in helpers [pr] ( #6878 )
* things that are only used in one place don't belong in helpers [pr]
* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58
switch symbolic from old to uops, final PR ( #6872 )
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630316487509980af20c6d2981de00bec.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9
last update for new symbolic [pr] ( #6877 )
2024-10-04 14:58:51 +08:00
chenyu
7391376528
update bert hparams ( #6876 )
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview .
loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.
increased lr and total step a bit.
`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
George Hotz
0dee49637e
small symbolic changes [pr] ( #6874 )
* small symbolic changes [pr]
* need that unbind
2024-10-04 12:03:08 +08:00
George Hotz
c50d3c4979
move const mover to ops [pr] ( #6873 )
* move const mover to ops [pr]
* move more
2024-10-04 11:49:32 +08:00
Tim Becker
d42cb5596f
Restore fast path for matching new_src in rewrite ( #6870 )
2024-10-04 11:22:24 +08:00
ignaciosica
8931f20765
CLANG fixed ops python [run_process_replay] ( #6866 )
* hotfix: fixed values in ops_python for AMX
* hotfix: remove unused import
2024-10-03 20:40:04 +08:00
George Hotz
4b6732c4f6
safe changes for new symbolic [pr] ( #6864 )
2024-10-03 20:39:15 +08:00
qazal
17068410e6
give EXT schedules metadata [pr] ( #6865 )
2024-10-03 20:14:18 +08:00
qazal
5517a07a09
viz late to_program and benchmarks [pr] ( #6851 )
* viz late to_program [pr]
* benchmark resnet
* delete all of checkStatus
* revert that
* fixup
* get from kernel
2024-10-03 18:29:04 +08:00
qazal
c7925414df
don't default print the whole graph in buf limit error [pr] ( #6861 )
2024-10-03 18:02:19 +08:00
George Hotz
e10245909a
explore global uop cache [pr] ( #6863 )
* explore global uop cache
* wvd uops
* remove useless lru caches
* key is is
* simpler rewriter
2024-10-03 13:08:13 +08:00
George Hotz
a26c6a0ad0
cleanup with smax [pr] ( #6854 )
* cleanup with smax [pr]
* add that resolve
2024-10-03 08:11:02 +08:00
nimlgen
8bbf6fb88c
use mv_address in ops_gpu ( #6856 )
2024-10-02 22:31:51 +03:00
chenyu
c3c93f332a
symbolic bool raise ValueError when not sure [pr] ( #6853 )
2024-10-02 09:10:58 -04:00
chenyu
08850da026
minor rand_like change [run_process_replay] ( #6848 )
2024-10-02 07:27:51 -04:00
George Hotz
7214450c23
little symbolic changes [pr] ( #6849 )
* little symbolic changes [pr]
* symbolic needs resolve too
* no resolve
* less change
2024-10-02 17:12:30 +08:00
qazal
fc78716d31
Buffer arg from big graph [pr] ( #6847 )
* Buffer arg from big graph [pr]
* x.dtype
2024-10-02 15:28:47 +08:00
qazal
29363fb85e
add dtype.ptr() [pr] ( #6839 )
2024-10-02 15:03:05 +08:00
George Hotz
be12409b51
changes for symbolic ( #6844 )
* changes for symbolic
* only for ints
* check int first
2024-10-02 12:57:16 +08:00