* Context and Timing can now be used as decorators
* Using Timing decorator in quickstart.md
The time formatting is better, and the decorator is a useful tool to learn.
Old: Time: 3.5260659999912605
New: Time: 3526.14 ms
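A minimal sketch of the decorator form, assuming a `contextlib.ContextDecorator`-based helper with millisecond formatting (names illustrative, not necessarily tinygrad's exact implementation):

```python
import time
from contextlib import ContextDecorator

class Timing(ContextDecorator):
    """Usable both as `with Timing(...):` and as `@Timing(...)`."""
    def __init__(self, prefix="Time: "):
        self.prefix = prefix
    def __enter__(self):
        self.st = time.perf_counter_ns()
        return self
    def __exit__(self, *exc):
        # print elapsed time in ms, matching the "3526.14 ms" style above
        print(f"{self.prefix}{(time.perf_counter_ns() - self.st) * 1e-6:.2f} ms")
        return False

@Timing()
def work():
    return sum(range(10**6))

work()  # decorator form prints e.g. "Time: 12.34 ms"
```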
* Updated env_vars documentation for Context
* Added test for Context decorator
* Put new import on same line as others
* need a better place for reshape and permute
* add permutation
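For context, here is what the two movement ops do, in numpy terms (`np.transpose` is numpy's permute); this is just an illustration, not the code being moved:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)  # reshape: same data, new shape
y = np.transpose(x, (2, 0, 1))      # permute: reorder the axes
assert y.shape == (4, 2, 3)
assert y[1, 0, 2] == x[0, 2, 1]     # element mapping under the permutation
```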
* cuda fixed
* clean up
* enable nvidia GPU with global max
* fix order
* fix CI
* add check for global dim limit but need refactor
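A hedged sketch of what such a check can look like; the limits below are the usual CUDA maximum grid dimensions, but real code should query the device for its actual limits:

```python
# Typical CUDA max grid dims for compute capability >= 3.0 (x, y, z);
# illustrative only -- query the device attributes for the real values.
MAX_GRID = (2**31 - 1, 65535, 65535)

def check_global_size(global_size):
    if len(global_size) > 3:
        raise ValueError("at most 3 global dimensions")
    for dim, limit in zip(global_size, MAX_GRID):
        if dim > limit:
            raise ValueError(f"global dim {dim} exceeds device limit {limit}")

check_global_size((1024, 1024, 64))  # passes
```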
* refactor
* fix ignore
* Improve Metal runtime command buffer handling
* Remove obsolete mtl_buffers_in_flight list from _METAL class
* remove unused import in ops_metal.py
* Refactor: Use `self.dispatch_group` over `METAL.dispatch_group`
Changes `libdispatch.dispatch_group_enter(METAL.dispatch_group)` to `libdispatch.dispatch_group_enter(self.dispatch_group)`
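An illustrative sketch of the pattern behind the change (a plain counter stands in for libdispatch's dispatch group; this is not the actual ops_metal.py code): state moves from the module-level `METAL` global onto the instance, so each command buffer tracks its own work.

```python
class DispatchGroup:
    """Stand-in for a libdispatch dispatch group (illustrative only)."""
    def __init__(self):
        self.pending = 0
    def enter(self):
        self.pending += 1
    def leave(self):
        self.pending -= 1

class MetalCommandBuffer:
    def __init__(self, dispatch_group):
        self.dispatch_group = dispatch_group  # held per-instance

    def submit(self):
        # was: libdispatch.dispatch_group_enter(METAL.dispatch_group)
        self.dispatch_group.enter()

    def complete(self):
        self.dispatch_group.leave()

buf = MetalCommandBuffer(DispatchGroup())
buf.submit(); buf.complete()
assert buf.dispatch_group.pending == 0
```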
* flake8: Ignore frequent violations, correct infrequent ones
* Ignore some rules in test
* Reorder test ignores
* Lint test + main
* EOF indent
* Include all E71,E72 errors
* Test the failing case in CI
* Revert "Test the failing case in CI"
This reverts commit 110add0a70f5a619d07631269104e84f908af6b9.
* Push to test!
This reverts commit f317532779a0e1ac8401e2474fd5c6c8695c08e9.
* ok back to passing
This reverts commit ba5052685f93f83e06152cdc696b9e26131d8ab7.
* Prove that CI fails when formatting is incorrect.
* Fix formatting
* Remove duplicate E117 rule
* Use flake8 config for precommit
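A hedged example of the kind of config this describes (rule codes are illustrative, not the repo's exact file); pre-commit's standard `pycqa/flake8` hook picks this file up automatically, so the same rules run locally and in CI:

```ini
# .flake8 -- illustrative, not the repo's actual config
[flake8]
# ignore the most frequent violations instead of rewriting every file
extend-ignore = E501
# E71x (comparison) and E72x (bare except / type comparison) checks
# stay enabled; the redundant E117 entry has been dropped
per-file-ignores =
    test/*: E402
```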
---------
Co-authored-by: waifairer <waifairer@gmail.com>
* Fix max nan
* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Turned off due to the need for granularity
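A hedged sketch of the described option (illustrative, not tinygrad's implementation): with `ignore_nan=True`, NaNs are dropped before the reduction, mirroring `numpy.nanmax`.

```python
import math

def max_(values, ignore_nan=False):
    # with ignore_nan=True, drop NaNs before reducing (like numpy.nanmax)
    if ignore_nan:
        values = [v for v in values if not math.isnan(v)]
    return max(values)

assert max_([1.0, float("nan"), 3.0], ignore_nan=True) == 3.0
```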
* models matrix
* fix typo and install gpu deps
* install llvm deps if needed
* fix
* testops with cuda
* remove pip cache since it doesn't work
* cuda env
* install cuda deps
* maybe it will work now
* i can't read
* all tests in matrix
* trim down more
* opencl stuff in matrix
* opencl pip cache
* test split
* change cuda test exclusion
* test
* fix cuda maybe
* add models
* add more n=auto
* third thing
* fix bug
* cache pip more
* change name
* update tests
* try again cause why not
* balance
* try again...
* try apt cache for cuda
* try on gpu:
* try cuda again
* update packages step
* replace libz-dev with zlib1g-dev
* only cache cuda
* why error
* fix gpuocelot bug
* apt cache err
* apt cache too slow?
* opt and image in single runner
* add a couple n=autos
* remove test matrix
* try cuda apt cache again
* libz-dev -> zlib1g-dev
* remove -s since not supported by xdist
* the cache takes too long and doesn't work
* combine webgpu and metal tests
* combine imagenet to c and cpu tests
* torch tests with linters
* torch back by itself
* small windows clang test with torch tests
* fix a goofy windows bug
* i'm dumb
* bro
* clang with linters
* fix pylint error
* linter doesn't work on windows
* try with clang again
* clang and imagenet?
* install deps
* fix
* fix quote
* clang by itself (windows too slow)
* env vars for imagenet
* cache pip for metal and webgpu tests
* try torch with metal and webgpu
* doesn't work, too long
* remove -v
* try -n=logical
* don't use logical
* revert accidental thing
* remove some prints unless CI
* fix print unless CI
* ignore speed tests for slow tests
* clang windows in matrix (ubuntu being tested in imagenet->c test)
* try manual pip cache
* fix windows pip cache path
* all manual pip cache
* fix pip cache dir for macos
* print_ci function in helpers
* CI as variable, no print_ci
* missed one
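A hedged sketch of the helpers change (the exact definition in tinygrad may differ): a module-level `CI` flag replaces the `print_ci` wrapper, so call sites just branch on it.

```python
import os

# truthy only when running under CI (GitHub Actions sets CI=true)
CI = os.getenv("CI", "") != ""

if not CI:
    print("verbose output that only shows up locally")
```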
* cuda tests with docker image
* remove setup-python action for cuda
* python->python3?
* remove -s -v
* try fix pip cache
* maybe fix
* try to fix pip cache
* is this the path?
* maybe cache pip
* try again
* create wheels dir
* ?
* cuda pip deps in dockerfile
* disable pip cache for clang
* image from ghcr instead of docker hub
* why is clang like this
* fast deps
* try use different caches
* remove the fast thing
* try with lighter image
* remove setup python for cuda
* small docker and cuda fast deps
* ignore a few more tests
* cool docker thing (maybe)
* oops
* quotes
* fix docker command
* fix bug
* ignore train efficientnet test
* remove dockerfile (docker stuff takes too long)
* remove docker stuff and normal cuda
* oops
* ignore the tests for cuda
* does this work
* ignore test_train on slow backends
* add space
* llvm ignore same tests as cuda
* nvm
* ignore lr scheduler tests
* get some stats
* fix ignore bug
* remove extra '
* remove and
* ignore test for llvm
* change ignored tests and durations on all backends
* fix
* and -> or
* ignore some more cuda tests
* finally?
* does this fix it
* remove durations=0
* add some more tests to llvm
* make last pytest more readable
* fix
* don't train efficientnet on cpu
* try w/out pip cache
* pip cache seems to be generally better
* pytest file markers
* try apt fast for cuda
* use quick install for apt-fast
* apt-fast not worth it
* apt-get to apt
* fix typo
* suppress warnings
* register markers
* disable debug on fuzz tests
* change marker names
* apt update and apt install in one command
* update marker names in test.yml
* webgpu pytest marker
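A hedged sketch of marker registration using real pytest hooks (the marker name follows the `webgpu` one above): registering in `conftest.py` keeps `--strict-markers` and the "unknown mark" warnings quiet.

```python
# conftest.py
def pytest_configure(config):
    config.addinivalue_line("markers", "webgpu: tests that need a WebGPU device")

# test_webgpu.py
import pytest

@pytest.mark.webgpu
def test_compiles():
    assert True
```

Run `pytest -m webgpu` to select these tests, or `pytest -m "not webgpu"` to skip them.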
* demo for some reason doesn't work on my device and throws the error "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside the setupNet func
* because of that, JS halts execution of the rest of the code and the screen shows "loading..." forever
* added a try/catch here to report the error in a proper way