Commit Graph

2202 Commits

Author SHA1 Message Date
George Hotz fc2303e520 gitignore in weights 2023-08-02 16:26:41 +00:00
chenyu 18d0a93f09
LazyBuffer.get_variable_buffers() (#1391)
* LazyBuffer.get_variable_buffers()

* remove left_only, add ProdNode

* no vars for OpNode.b

* do not change symbolic vars, remove ProdNode
2023-08-02 09:01:35 -07:00
Umut Zengin 8889821547
Const pad support to pad2d and slice (#1392)
* slice to pad2d migrate

* Gain line

* Mypy happy

* Mypy happy

* Revert

* whitespace
2023-08-02 08:58:52 -07:00
wozeparrot ab9e4a2e93
Make cuda CI a bit more consistent (#1403)
* feat: use fast-apt-mirror

* feat: use in more places
2023-08-02 07:38:22 -07:00
wozeparrot 7aff8c4ded
cl fixes (#1402)
* feat: non-blocking

* feat: store event on buffer
2023-08-01 22:13:51 -07:00
Alex Telon b66361843a
Timing and Context can now be used as decorators (#1385)
* Context and Timing can now be used as decorators

* Using Timing decorator in quickstart.md

The time formatting is better and is a useful tool to learn.

Old: Time: 3.5260659999912605
New: Time: 3526.14 ms

* Updated env_vars documentation for Context

* Added test for Context decorator

* Put new import on same line as others
2023-08-01 17:16:10 -07:00
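
The commit above says that Timing and Context can be used as decorators as well as `with` blocks. The following is a minimal sketch of how one class can support both forms using the standard library's `contextlib.ContextDecorator`. It is not tinygrad's implementation: the class body, the `prefix` parameter, and the `elapsed_ms` attribute are assumptions made for illustration, and only the millisecond output format is taken from the commit message.

```python
import time
from contextlib import ContextDecorator


class Timing(ContextDecorator):
    """Hypothetical stand-in for illustration; not the tinygrad class."""

    def __init__(self, prefix="Time: "):
        self.prefix = prefix      # assumed parameter name
        self.elapsed_ms = None    # assumed attribute, set on exit

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        # Report elapsed wall-clock time in milliseconds, e.g. "Time: 3526.14 ms".
        self.elapsed_ms = (time.perf_counter() - self.start) * 1e3
        print(f"{self.prefix}{self.elapsed_ms:.2f} ms")
        return False  # do not swallow exceptions


# Form 1: context manager
with Timing("Time: ") as block_timer:
    block_result = sum(range(1000))

# Form 2: decorator, using the same class
func_timer = Timing("Time: ")


@func_timer
def work():
    return sum(range(1000))


func_result = work()
```

Inheriting from `ContextDecorator` means the `__enter__`/`__exit__` pair is written once and reused for both forms, which is one plausible way to get the behaviour the commit describes.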
chenyu d9d1372dd0
Update pytest.ini format (#1398) 2023-08-01 18:00:51 -04:00
George Hotz f4218b709f
Revert "Improve Metal runtime command buffer handling (#1335)" (#1397)
This reverts commit bd54105b6b.
2023-08-01 12:10:20 -07:00
Diogo 4dc8595069
simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
wozeparrot 7c7cf16ef2
use host ptr for speed on copyouts (#1393)
* feat: use mapped buffer for speed

* fix: whoops don't need that

* feat: don't need explicit call to memoryview
2023-08-01 09:34:12 -07:00
Diogo ba5e3818a0
Limit dims based on max size (#1390)
* working

* whitespace

* changed defaults to None

* linter

* last linter error
2023-07-31 19:18:19 -07:00
chenyu b2fde9ec36
reshape to register variable value (#1386)
* reshape to register variable value

* better error message
2023-07-31 17:10:02 -07:00
Umut Zengin 0de5f20970
Re-open constant pad support to Tensor.pad (#1388)
* Added const padding support to .pad

* Linter
2023-07-31 17:08:57 -07:00
David Hou 3300d0aeaf
syncthreads before wmma (#1389)
(venv) chaos@tiny3:~/tinygrad$ KX=2 KY=2 N=2048 python extra/gemm/hip_matmul.py
   4194304    289.60 us, would be  59322.55 GFLOPS matmul, 173.80 GB/s
2023-07-31 17:05:49 -07:00
Alex Telon 2d10e0340e
Refactored ContextVars (#1331) 2023-07-31 15:44:46 -04:00
George Hotz f27df835a6
delete dead stuff (#1382)
* delete bpe from repo

* remove yolo examples

* Revert "remove yolo examples"

This reverts commit cd1f49d4662a5565726ae1fa7bf3f6a3e3985965.

* no windows
2023-07-31 11:17:49 -07:00
Yixiang Gao 6e62dcfbf3
add check global dim limit in linearizer (#1299)
* need a better place for reshape and permute

* add permutation

* cuda fixed

* clean up

* enable nvidia GPU with global max

* fix order

* fix CI

* add check for global dim limit but need refactor

* refactor

* fix ignore
2023-07-31 11:14:54 -07:00
ronak69 ce0ab1c14e
convert `$@` to `"$@"` in `run_multibackend.sh` (#1379) 2023-07-31 10:39:22 -07:00
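
The change above matters because an unquoted `$@` in a shell script is subject to word splitting, while `"$@"` passes each argument through unchanged. The sketch below only simulates that difference in Python, assuming the default whitespace field separator and ignoring filename globbing; the function names and sample arguments are hypothetical and do not come from `run_multibackend.sh`.

```python
def expand_unquoted(args):
    # Simulates unquoted $@: arguments are joined and re-split on whitespace,
    # so an argument containing a space becomes several words.
    return " ".join(args).split()


def expand_quoted(args):
    # Simulates "$@": each original argument is preserved as one word.
    return list(args)


args = ["my file.txt", "-k", "test one"]  # hypothetical script arguments

unquoted = expand_unquoted(args)  # five words: the spaces split two arguments
quoted = expand_quoted(args)      # three words, identical to the input
```

With the quoted form, a test filter or path that contains a space reaches the wrapped command as a single argument.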
chenyu f5ef445cb6
trim space (#1381) 2023-07-31 10:37:57 -07:00
JaSpa99 5ab12059da
rng hlops: add normal and kaiming_normal (#1378)
* add normal and kaiming_normal

* make sure its float

* add tests
2023-07-31 10:37:02 -07:00
George Hotz 37fa7e96fb
Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak da2efecbe2
update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
S-Lykles c2b82ea8ac
fix to_shape_strides (#1374)
* add tests for expr_node and expr_idxs

* simplify condition and add missing optimization
2023-07-30 18:42:46 -07:00
chenyu 1fdf560fb1
simplify get_contraction (#1373) 2023-07-30 18:35:22 -07:00
S-Lykles a32c677601
Fix off by one error in View.expr_node (#1363)
* Fix off_by_one error in View.expr_node

* Add test for expr_node

* Remove whitespace before :

* test no arguments and properly test idx=None
2023-07-29 08:10:37 -07:00
chenyu ab80ea0d38
use ubuntu for clang ci test (#1368) 2023-07-28 20:51:25 -04:00
Karan Handa e0a69bdbe6
Fix argfix and add tests (#1365)
* Remove unreachable code

* Fixed argfix

* Add empty check and tests

* Removed redundant tests
2023-07-28 09:09:49 -07:00
wozeparrot 32d1afa4b5
feat: correct case when base is 0 (#1360) 2023-07-27 13:53:38 -04:00
wozeparrot c22e77abfd
Match torch on fractional negative base pow (#1352)
* feat: match torch on fractional negative base pow

* feat: tests for trunc
2023-07-26 19:14:54 -07:00
Anthony Zboralski bd54105b6b
Improve Metal runtime command buffer handling (#1335)
* Improve Metal runtime command buffer handling

* Remove obsolete mtl_buffers_in_flight list from _METAL class

* remove unused import in ops_metal.py

* Refactor: Use `self.dispatch_group` over `METAL.dispatch_group`

Changes `libdispatch.dispatch_group_enter(METAL.dispatch_group)` to `libdispatch.dispatch_group_enter(self.dispatch_group)`
2023-07-26 15:45:40 -07:00
Umut Zengin d4ebadf2da
Small Tensor.cat optimization and reformatting (#1347) 2023-07-26 18:01:12 -04:00
geohotstan 4056f97187
Gather (#1329) 2023-07-25 15:05:41 -04:00
Francis Lam 9d142430cb
Add option in llama.py to quantize weights to int8 at runtime (#1289)
* Add option in llama.py to quantize weights to int8 at runtime

Also added lm-eval to external

* Add support for llama-2 evaluation
2023-07-24 17:22:38 -07:00
wozeparrot 12dd09ad54
feat: better comment for state bfloat16 conversion (#1338) 2023-07-24 17:17:40 -04:00
Pavol Rusnak cd60b8561c
Add LLaMA-2 support (#1284)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-07-24 17:12:02 -04:00
waifairer d89fb729e5
flake8 (#1323)
* flake8: Ignore frequent violations, correct infrequent ones

* Ignore some rules in test

* Reorder test ignores

* Lint test + main

* EOF indent

* Include all E71,E72 errors

* Test the failing case in CI

* Revert "Test the failing case in CI"

This reverts commit 110add0a70f5a619d07631269104e84f908af6b9.

* Push to test!
This reverts commit f317532779a0e1ac8401e2474fd5c6c8695c08e9.

* ok back to passing
This reverts commit ba5052685f93f83e06152cdc696b9e26131d8ab7.

* Prove that CI fails when formatting is incorrect.

* Fix formatting

* Remove duplicitous E117 rule

* Use flake8 config for precommit

---------

Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-24 11:19:58 -04:00
wozeparrot 51173f0a48
HIP backend fixes (#1336)
* feat: hip trains cifar

* feat: test_dtype fixes
2023-07-24 08:16:57 -07:00
George Hotz 086382b64e
Revert "Fix max nan (#1298)" (#1334)
This reverts commit 50774470b2.
2023-07-23 20:41:28 -07:00
uncommonSensor 50774470b2
Fix max nan (#1298)
* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests

* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Turned off due to the need for granularity
2023-07-23 19:39:44 -07:00
cheeetoo a0965ee198
CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
George Hotz 47f9d82722 test_conv: relax to 0.93 2023-07-23 12:57:29 -07:00
Giles Bathgate c4238b4ea0
Fix discriminator balancing in mnist_gan example (#1332) 2023-07-23 12:43:05 -07:00
chenyu aa05495620
symbolic stride (#1326) 2023-07-23 12:41:22 -07:00
Cole Sutyak 2d4e182294
change fetch to allow for local file selection (#1309) 2023-07-23 15:00:16 -04:00
waifairer 7cac5ea16c
[GH-1305] Refactor test_dtypes.py to be cleaner (#1306)
Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-21 18:18:02 -04:00
Maxim Zakharov 48c4df1263
fix: prevent infinite "loading..." state (#1319)
* for some reason the demo doesn't work on my device and throws the error "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside the setupNet func
* because of that, JS halts execution of the rest of the code below, and the screen shows "loading..." forever
* added a try/catch here to report the error properly
2023-07-21 14:01:53 -07:00
Jacob Pradels b112edd2c3
Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
George Hotz bfbb8d3d0f
fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
George Hotz 9746f6d094
move hand coded optimizer (#1310)
* move hand coded optimizer

* llvm can optimize

* fix llvm

* save linearizer
2023-07-21 07:53:12 -07:00
madt2709 d2c1e8409a
Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00