6154 Commits

Author SHA1 Message Date
chenyu 68e59eb3f5
update mlperf-logging to 4.1.0-rc3 (#6796) 2024-09-28 21:45:37 -04:00
qazal dab05ff070
match dataclass.replace in UOp.replace [run_process_replay] (#6792)
* UOp replace matching dataclass replace

* p2

* replace creates a copy
2024-09-28 16:28:49 +08:00
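For context on the "replace creates a copy" note above: the standard library's `dataclasses.replace` always builds a new instance, even when no fields are changed. The sketch below only illustrates that stdlib behavior. `Node` is a hypothetical stand-in and is not tinygrad's `UOp`, whose actual `replace` implementation is not shown in this log.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in class for illustration only (not tinygrad's UOp).
@dataclass(frozen=True)
class Node:
    op: str
    arg: int = 0

a = Node("CONST", 1)

# replace() returns a new object with the given fields overridden.
b = replace(a, arg=2)

# With no overrides it still returns a fresh, equal copy.
c = replace(a)
```

Here `b` differs from `a` only in `arg`, while `c` compares equal to `a` but is a distinct object.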
chenyu 494b20e886
bert BS back to 54 (#6791)
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu 572d77d1d9
bert script delete eval data after eval (#6790)
fits BS=60 which is 2% faster than 54. also fixed wandb logging params
2024-09-27 20:54:00 -04:00
chenyu f9c8e144ff
chmod +x mlperf bert script for red (#6789)
also disabled raising power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issues on red
2024-09-27 11:27:32 -04:00
Francis Lata d3a387be63
[MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu bc82f8c5be
use where in dropout (#6758)
should save memory since we only store the mask as bool instead of the upcasted mask used in mul
2024-09-27 11:11:43 -04:00
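The idea behind the dropout change above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tinygrad implementation: `dropout_mul` and `dropout_where` are hypothetical names, and lists stand in for tensors. The point is that the multiply form needs a mask in the input's float type, while the select form only needs a boolean mask.

```python
import random

def dropout_mul(xs, p, rng):
    # Mask is upcast to float so it can be multiplied with the input.
    mask = [1.0 if rng.random() >= p else 0.0 for _ in xs]
    return [x * m / (1.0 - p) for x, m in zip(xs, mask)]

def dropout_where(xs, p, rng):
    # Mask stays boolean; it only selects between the scaled input and 0.
    keep = [rng.random() >= p for _ in xs]
    return [x / (1.0 - p) if k else 0.0 for x, k in zip(xs, keep)]

xs = [1.0, 2.0, 3.0, 4.0]
# Same seed for both so the two variants draw the same mask.
out_mul = dropout_mul(xs, 0.5, random.Random(0))
out_where = dropout_where(xs, 0.5, random.Random(0))
```

With the same random seed both variants produce the same output; only the stored mask type differs.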
qazal 76b3c1e818
add all realized Buffers to schedule graph edges [run_process_replay] (#6786)
* add realized Buffers to bufs

* simpler checks
2024-09-27 19:25:51 +08:00
qazal 568c97f7a2
add UOp.define_global [run_process_replay] (#6787)
* add UOp.define_global [run_process_replay]

* no src
2024-09-27 19:24:03 +08:00
nimlgen b95f47784a
qcom sleep when sync (#6785)
* qcom sleep when sync

* linter

* short
2024-09-27 19:14:10 +08:00
qazal fb3fe6f39b
better VIZ (#6781)
* ui changes

* make kernels global

* dont save buffers when running VIZ=1

* remove flex in layout

* use os.execv

* del server thread

* server close

* cleanup

* logs cleanup

* rm getenv

* cleanups

* remove global
2024-09-27 18:38:31 +08:00
chenyu 2fc26890c9
default BS=9 in handcode_opt bert (#6783)
using 54 for 6 gpus now, and 2 is not a good default
2024-09-27 04:38:16 -04:00
George Hotz 9a3f6f392d llm.c tok/s 2024-09-27 00:46:18 -07:00
George Hotz b0e70ab04f llm.c updates 2024-09-27 15:25:59 +08:00
George Hotz eaa1e0eeeb
rename constant_folder to sym [run_process_replay] (#6780) 2024-09-27 14:54:54 +08:00
qazal 900b21ef0c
viz delete const after fold (#6778)
* viz delete const after fold

* add base to tests
2024-09-27 11:58:01 +08:00
qazal 94e43dc49a
add Buffer.to_uop [run_process_replay] (#6777) 2024-09-27 11:41:23 +08:00
qazal 98a81b36e1
viz table view (#6743)
* fix matcher with ctx

* current_kernel fix

* add table

* make the right things clickable

* some more init work

* add kernel resizer

* Revert "add kernel resizer"

This reverts commit 035eef37039aa1e848a766a29e3c4e81bbff2bab.

* allow scroll
2024-09-27 10:26:46 +08:00
chenyu bea7ed5986
add RUNMLPERF=1 to bert dev_run.sh (#6775)
already set in run_and_time.sh, need RUNMLPERF=1 for it to load real data
2024-09-26 11:00:49 -04:00
George Hotz c178dc1071
faster uops ci [run_process_replay] (#6774) 2024-09-26 20:15:01 +08:00
George Hotz 249af24f18
metal bfloat as cast (#6773) 2024-09-26 19:31:40 +08:00
George Hotz ed2f28388f
render cast is rewrite rules [run_process_replay] (#6772)
* render cast is rewrite rules [run_process_replay]

* move load/store to rewrite rules

* render_alu smaller

* render_gep
2024-09-26 19:03:31 +08:00
nimlgen 3c56aeee70
add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealloc

* wording

* docs
2024-09-26 18:33:19 +08:00
George Hotz 14ad47b515
rewrite to use uops if (#6764)
* rewrite to use uops if

* does this pass

* careful penalty

* fix tests

* remove unused stuff

* that's a cstyle rewrite

* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
George Hotz 7e7184bb13
cleaner ptx match rules [run_process_replay] (#6770)
* cleaner ptx match rules [run_process_replay]

* clean up load/store rules

* now that's clean

* oops, typo

* cast back to bool
2024-09-26 17:44:10 +08:00
chenyu 12de203a43
add IGNORE_JIT_FIRST_BEAM to bert scripts (#6769)
* update bert BEAM params

copied from resnet to start with

* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
wozeparrot 15cd42cfb9
feat: support TRACEMETA=2 in handcode_opt (#6767) 2024-09-26 16:58:29 +08:00
chenyu 5a5fbfa1eb
smaller bert script change (#6768)
only WANDB and RUNMLPERF order. BENCHMARK and BEAM will be done differently
2024-09-26 04:54:28 -04:00
wozeparrot abd484a9f7
fix: need numpy for docs and testing (#6766) 2024-09-26 16:44:59 +08:00
wozeparrot 2b899164c6
no numpy (#6751) 2024-09-26 16:40:18 +08:00
George Hotz 7fca0bc912
use pattern matcher for image [run_process_replay] (#6762)
* use pattern matcher for image [run_process_replay]

* try again

* this
2024-09-26 15:49:09 +08:00
qazal 197f8fd986
early uop globals with Buffer (#6753) 2024-09-26 15:34:21 +08:00
George Hotz e999281502
match_to_scalar (#6761) 2024-09-26 14:50:47 +08:00
George Hotz 0c7d34ceb7
did vload do anything? [run_process_replay] (#6760) 2024-09-26 14:46:16 +08:00
qazal ee4feedb77
delete test_variable_const [run_process_replay] (#6757)
* delete test_variable_const [run_process_replay]

* don't allow variable UPat
2024-09-26 12:27:11 +08:00
chenyu 0424c4967d
fix handcode_opt.py for bert (#6756) 2024-09-26 00:20:24 -04:00
chenyu 396c96357b
update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54 that works on tinyboxes with dropouts.
used bert's sparse_categorical_crossentropy that takes Tensor ignore_index in accuracy method
2024-09-25 23:55:05 -04:00
George Hotz 717b394391
remove defaultdict from PatternMatcher [run_process_replay] (#6754)
* remove defaultdict from PatternMatcher [run_process_replay]

* nicer way to write that

* same line count

* tpm too
2024-09-26 11:25:01 +08:00
George Hotz 7e73c7b3cc hotfix: bump stable diffusion val distance 2024-09-26 11:15:29 +08:00
George Hotz ff880f5be4 hotfix: force_transcendental to fix process replay 2024-09-26 11:13:16 +08:00
George Hotz a6a70aa4bd
add optional NEG and SUB (#6750)
* add optional NEG and SUB

* describe that compute + optional mulacc

* ptx cleanup

* lil cleanups
2024-09-26 10:50:53 +08:00
George Hotz 197dbbda0f add UnaryOps.NEG + BinaryOps.SUB so process replay can work 2024-09-26 10:36:33 +08:00
George Hotz b199b699ed
use shl everywhere (#6744)
* use shl everywhere

* fix parens

* late patterns

* works as an extra pass

* ptx
2024-09-26 09:59:36 +08:00
qazal 88160e59b2
gate engine.graph imports [run_process_replay] (#6748) 2024-09-26 09:13:49 +08:00
qazal 12e4a4900a
hotfix: missing return in METAL dm benchmark (#6749) 2024-09-26 09:12:38 +08:00
qazal 8a15ccb414
start gc/mem usage tests for buffer schedule [run_process_replay] (#6737)
* gc tests for buffer schedule [run_process_replay]

* assert global counters, maybe del

* check init

* rm global counters
2024-09-26 08:26:31 +08:00
qazal b629a7998d
early assert buffer count limit [run_process_replay] (#6746)
* better error message for buffer count limit [run_process_replay]

* 3.9 needs that

* assert ScheduleItem

* new _test_buf_cnt
2024-09-26 08:24:26 +08:00
wozeparrot 4ebc9589a6
feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
mesozoic-egg 992cde05d7
Metal with CDLL instead of py-objc (#6545)
* Add CDLL interface for metal

* remove two unused functions

* Cover most of the API methods

* switch to cdll

* directly call objc message in ops_metal

* keep only obj interface

* Use direct message sending for graph

* may have found a solution to the memoryview on ctypes pointer

* buf indexing bug fixed

* fix c_int

* fix c int to bytes

* fix gpu time bug

* line savings for cdll metal core

* wip

* c int bug

* fix buf casting

* dedup for c_void_p

* dedup for c_void_p

* linter fix

* remove unused stuff

* mypy fix

* more mypy error fix

* line savings

* line savings

* rename send_message to msg; add __hash__ and __eq__ for dedup

* wip

* refactor

* refactor

* remove named import from ctypes

* forgot to change variable name

* file reorg, put support.py to ops_metal

* refactor

* hash error

* remove to_ns_array

* test oom exception, fix exception change

* typevar for msg

* add back dedup

* test for compile error

* move constant to graph

* move header constant around

* get label for icb buffer

* check icb label using "in"

* wip fixing mypy reported error

* fixed mypy error

* code formatting

* all_resources dedup match previous

* code formatting

* code formatting; buffer set to objc_id

* revert changes on buf for the manual release, seems like _free is not always called

* skip unless on metal, for test_metal

* fix premature mem release causing seg fault

* test_metal check for device before importing

* Buffer should only be released under _free explicitly

* mypy fixes

* change object ownership

* test compile success

* lint fixes

* remove load_library

* wrap sel_register in cache

* simplify to_struct

* swap lines

* fix type error in to_struct

* bump line to 9800

* remove pyobjc from setup.py

* command buffer should be objc_instance and get released

* stringWithUTF8String: returns objc_instance

* Use constant for MTLPipelineOptionNone

* better explanation for [MTLBuffer contents:] return

* Use dyld_find in case the path differs

* trailing whitespace

* handle exception for methods that take error:

* load /System/Library instead of /Library

* Init c_void_p with None instead of zero for error objects

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-25 17:43:01 +08:00