Commit Graph

77 Commits

Author SHA1 Message Date
chenyu 4a65010de8
remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
George Hotz 829262a5ee add external_test_speed_theoretical 2024-07-26 17:45:22 -07:00
Edward Wang 9a7d5a148e
move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py

* update references
2024-07-15 11:29:03 -07:00
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
chenyu c71627fee6
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz 150ea2eb76
create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
George Hotz 41efaa848c
move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu 73cadfbb3c
Remove pytest markers (#2831)
* remove pytest marker

* fix some, skip some

* tweak

* fix

* skip slow

* skip more
2023-12-18 18:53:28 -05:00
George Hotz 6d6eb9302d
ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
Christopher Mauri Milan 7f01dd04f0
Apply ruff linting rules to tests (#2473)
* everything except F821

* enable F821 with noqa

* dumb fix

* fix remaining imports and (former) lambdas

* replace _ with noqa to avoid gc
2023-11-27 21:24:06 -08:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz 2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
chenyu f582ec56d5
Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
Francis Lam f445e056ed
wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
Francis Lam 651205fa5c
linearizer: support local and group_for_reduce dimensions together (#1821)
also minor changes to test_speed_v_torch.py and size of UOps.SPECIAL
2023-09-08 12:39:27 -07:00
Adrian Kretz 3473c9e88d
Metal conv tensor cores (#1696)
* Benchmark 5x5 conv kernel which is optimized

* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
George Hotz a6d842af7a
move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz d25046e66a
matvec tests (#1634)
* matvec tests

* f16

* f16 is broken
2023-08-22 17:33:58 -07:00
Yixiang Gao 4f02491cd4
add cpu if torch tensor (#1609) 2023-08-21 16:57:59 -07:00
Yixiang Gao 8d6662a741
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Diogo d7d1011f1e
Add WEBGPU tests to CI (#1463)
* webgpu tests

* assert device is webgpu

* missed env set

* exclude failing ci tests

* ignore test file

* changed acc for adam test
2023-08-06 10:32:01 -07:00
George Hotz 486a9dbfd9
speed v torch (#1464)
* speed v torch

* always print

* change print

* torch speed tee

* all exposed
2023-08-06 09:32:33 -07:00
Diogo ba5e3818a0
Limit dims based on max size (#1390)
* working

* whitespace

* changed defaults to None

* linter

* last linter error
2023-07-31 19:18:19 -07:00
Umut Zengin d4ebadf2da
Small Tensor.cat optimization and reformating (#1347) 2023-07-26 18:01:12 -04:00
cheeetoo a0965ee198
CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache to slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and durationon all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
George Hotz 9dffc9ba23
Use nevergrad to optimize kernels (try 2) (#1301)
* nevergrad try 2

* touchups

* no ones

* opt fixup

* cleanups

* touchup

* make new optimizer file
2023-07-20 16:46:45 -07:00
George Hotz 17830e25da
real world tests (#1297)
* real world test

* touchup

* sync device
2023-07-20 10:50:22 -07:00
George Hotz d6637623e3 torch test touchup 2023-07-19 09:37:23 -07:00
Umut Zengin f8c539989e
Re-open create cumsum speed test (#1255)
* Reduced tensor size in testing

* Update formatting test_speed_v_torch.py
2023-07-17 18:59:36 -07:00
Diogo a9a1df785f
Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and paralel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_trasnpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
George Hotz 67e34b356a
good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz a98e361da0 torch speed test, add add 2023-06-26 18:55:27 -07:00
George Hotz 2977fb17f6
various touchups (#1058)
* op isn't optional

* barrier + named local buffers

* end global and local loop together to avoid useless if statement

* better comments
2023-06-26 15:41:23 -07:00
George Hotz 18892242b0
global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
George Hotz 5428b5d774
good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
George Hotz 20894991ed
good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
George Hotz 23f88fb026 synchronize for honest speed compare 2023-03-24 10:24:27 -07:00
George Hotz 036737a12a mem_estimate tracks bytes, not items 2023-03-10 09:44:12 -08:00
George Hotz 28a6ada4ce line reduction in metal 2023-03-03 23:14:40 -08:00
George Hotz 8919ca8163 test cleanups 2023-03-03 06:36:06 -08:00
George Hotz fca055bd66 NOOP means contiguous 2023-03-01 21:54:51 -08:00
George Hotz d062cc82b8 put restrict back 2023-03-01 21:34:45 -08:00
George Hotz b442e75c7a test speed v torch 2023-03-01 19:50:12 -08:00
George Hotz bfcec234a2
Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz 1d01842232 remove fake test 2023-02-25 10:21:07 -08:00
voidz 94bec40110
moved extras/jit.py -> tinygrad/jit.py (#599)
* moved extras/jit.py to tinygrad/jit.py

* fixed indent

* removed tinygrad.helpers.DEBUG from jit.py
2023-02-25 08:32:33 -08:00
George Hotz 628ce067a1 add tests to mypy 2023-02-22 07:07:38 -08:00
George Hotz 104c3c5e73 oops, forgot that debug 2023-02-22 06:58:27 -08:00
George Hotz fae7654924 fix sync issue 2023-02-17 12:42:45 -08:00