Commit Graph

260 Commits

Author SHA1 Message Date
George Hotz f5f8b09c13
allow manual release (#1704) 2023-08-28 17:54:25 -07:00
George Hotz 715047a1e4
fix release publish (#1703) 2023-08-28 17:48:00 -07:00
chenyu b5d700adae
update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Roelof van Dijk 89b529c07f
[ready] ci: add py38 to linters (#1674)
* ci: add py38 to linters

* fix: run linters only on py38

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-26 09:34:15 -04:00
George Hotz a6d842af7a
move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
Roelof van Dijk 1900acda09
[READY] ci: setup venv cache (#1475)
* ci: cache installed packages

* ci: trigger jobs

* ci: fix hashfiles argument

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
George Hotz 012ee7d162
not worth the speed (#1584)
* not worth the speed

* no slots

* uops comments

* bump to python 3.11 for speed

* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz ad7d26c393
fix __launch_bounds__ and benchmark TC MATMUL (#1575)
* fix

* benchmark matmul
2023-08-19 10:54:39 -07:00
chenyu ae39cf84ab
Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
Roelof van Dijk 84e6693915
fix: apt-get to apt, no recommends, clean up (#1571)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-18 13:48:59 -07:00
Ethan Sorrell cb62911f6b
PTX Reintegration and Passing Tests (#1512)
* move assembly, assembly_ptx

* successful but broken rendering of ptx asm

* clear ins before render asm

* slightly less broken :')

* we needed thread syncs

* fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half

* Fix runtime_args for gpuocelot

* our casts were flipped on both ends

* more casting

* add ternary where op

* dealing with storing/loading bool

* add test for casting to bool from negative

* Fix args.valid on ConstOp

* add to CI, TODO: fix runtime_args for test_uops

* fix placement of runtime_args to work with lazy.Device

* undo ci changes so I can push

* fix lints

* start cleanup and fix things we broke fixing lints

* add checks for PTX specific asm instructions

* revert added test -- doesn't pass on llvm

* skip tests for underflow,overflow

* another fix for how we're setting runtime args

* Less broken cleanup

* add to CI

* add more env variables for ci test

* fix ci to install pycuda for ptx

* ci: copy cuda test command

* cleanup

* assert to make sure we're actually running ptx in ci

* remove test assert

* move is_ptx arg

* move assembly, assembly_ptx back to extras

* fix imports

* initial merge fixes

* clear registers, fix UOps.LOAD with invalid value

* draft merge fixes

* remove prints

* quick lint and merge fixes

* cleanup

* remove PTXProgram wrapper

* final cleanup

* temp change for ci rerun

* ci rerun

* rollback ISA version
2023-08-16 16:20:20 -07:00
chenyu 11dd9b1741
symbolic codegen and exec (#1552)
* symbolic codegen and exec

* fix and add test

* no sketchy

* merge_dicts type

* dtypes._arg_int32
2023-08-16 14:43:41 -07:00
wozeparrot 074c467020
hotfix for broken ci (#1559) 2023-08-16 13:52:03 -04:00
Steven Anderson 93a36c3659
Arm (#1421)
* testing new memops

* better debugging

* testing padded conv

* branching with load

* refactoring a bit

* first try

* fixing bugs

* fixing some

* eq

* eq2

* do not use x's

* working

* fixing imm

* getting things working

* refactor

* pow not working

* working except one

* refactor: one store mem

* refactor: global load

* refactor: imm

* refactor: cleaning

* fixing big offsets

* refactor with ci

* try ci

* typo

* another typo

* ubuntu default

* forgot git

* do i need git?

* missing packages

* adding python-dev

* with cache?

* buildx action

* buildx name issue?

* maybe now?

* python3

* newline warning

* maybe now

* i actually need this

* ci should work now

* improved caching

* fixing cache

* maybe now it will cache

* this

* testing cache

* trying again

* load

* missing platform

* caching gha

* testing cache

* full testing

* typo

* now?

* why

* adding checkout back

* bad formatting

* fixing convention issues

* supporting python

* adding CI flag

* testing all

* better comments

* adding debugging

* takes 12x longer

* does it output progress now?

* ignore models for speed

* fixing merge

* excluding conv_transpose2d

* only 2 tests cuz it's too slow

* another approach

* let's see

* faster duh

* my bad

* T_T

* typo

* sup

* with output?

* comment test

* comment test

* comment test

* :?

* no comment

* with cache

* back to normal

* testing that ci works

* back to passing

* trying again

* does it create another entry

* does it create another entry?

* build local

* hey

* Revert "excluding conv_transpose2d"

This reverts commit cc7348de03033e032f47d69caff174e2f1a7bfea.

* does it cache if done before?

* does it cache?

* done

* adding test ops

* bad formatting

* no need for this

* working static mem

* sum 1d

* add ndim

* better reg import

* fix stack

* back to np

* working except for softmax

* 5 failing

* no progress

* remove keystone

* remove keystone

* testops passing

* cleanups

* more cleanup

* typo

* ci

* ci2

* cond import

* ci3

* ci4

* ci4

* ci5

* ci5

* ci6

* alignment

* test all

* correct test

* err read_unmapped

* passing test

* ignore for speed

* ignore for speed

* ci7

* cleanup

* remove docker

* fixing merge

* fixing bugs

* add skipload for const ops

* comments

* First merge to master: Renderer

* fix emulation

* passing all tests arm64

* cleaning

* fix handcoded binary

* cleaning

* fix errs

* fix runtime arg binary

* clean git diff

* fix and clean

* fixing metal test

* cleaning

* fix metal test

* ci ~8 min

* fix pylint and clang

* cache the files in ops_clang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-14 19:29:30 -07:00
wozeparrot 29d5801387
distributed collectives (#1519)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device

* feat: allreduce

* feat: test

* feat: need contiguous

* feat: test in ci

* feat: exit with correct code

* feat: don't need that

* feat: opencl wait_for just doesn't work

* feat: synchronize on out

* feat: try?

* feat: try again?

* feat: add extra realizes

* feat: print

* feat: seed

* feat: tol

* feat: test ones and zeros

* feat: remove print

* feat: are you just flaky

* feat: separate scatter and gather?

* feat: just try synchronizing

* feat: remove print again

* feat: bring back difference

* feat: no sync

* feat: revert that

* feat: back to wait_for

* fix: typo
2023-08-11 10:22:07 -07:00
wozeparrot 7e7c9001e9
distributed world (#1481)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device
2023-08-10 10:00:51 -07:00
George Hotz e3c6c0c6db
add GPT2 example (#1511) (#1514)
* add gpt2 to examples

* some cleanup

* fixes

* argparse + scaled_dot_product_attention

* add timing

* add to benchmark

Co-authored-by: YassineYousfi <yassine.y10@gmail.com>
2023-08-10 09:09:47 -07:00
wozeparrot 351684395c
dont run on fork (#1510) 2023-08-09 13:06:45 -04:00
wozeparrot 88e2e0c8a3
Revert "don't try to run benchmark on forks" (#1508) 2023-08-09 12:59:49 -04:00
wozeparrot 65b65b760b
don't try to run benchmark on forks (#1507) 2023-08-09 12:59:19 -04:00
Roelof van Dijk aa83a9e910
ci: fix gpuocelot build cache (#1474)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 14:00:04 -07:00
Roelof van Dijk e2cf0f322e
[READY] ci: missing n=auto (#1486)
* ci: missing n=auto

* fix: add to commented test

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:37:24 -07:00
George Hotz 5fdd248617
don't download cifar (#1472) 2023-08-06 21:38:59 -07:00
George Hotz d78fb8f4ed
add stable diffusion and llama (#1471)
* add stable diffusion and llama

* pretty in CI

* was CI not true

* that

* CI=true, wtf

* pythonpath

* debug=1

* oops, wrong place

* uops test broken for wgpu

* wgpu tests flaky
2023-08-06 21:31:51 -07:00
Diogo d7d1011f1e
Add WEBGPU tests to CI (#1463)
* webgpu tests

* assert device is webgpu

* missed env set

* exclude failing ci tests

* ignore test file

* changed acc for adam test
2023-08-06 10:32:01 -07:00
George Hotz 486a9dbfd9
speed v torch (#1464)
* speed v torch

* always print

* change print

* torch speed tee

* all exposed
2023-08-06 09:32:33 -07:00
George Hotz 2ab282bfec
run on update_benchmark too (#1460)
* run on update_benchmark too

* amd inference test

* name it better

* add 10 CIFAR training steps
2023-08-06 08:58:37 -07:00
George Hotz 943b227cb1 only on push to master 2023-08-06 00:10:07 -07:00
George Hotz 2274e3e757
Fix benchmark (#1454)
* do benchmarking

* system

* artifact

* go

* name artifact

* only on push
2023-08-05 23:44:36 -07:00
George Hotz bf21aec81f
do benchmarking (#1451)
* do benchmarking

* system

* artifact

* go

* name artifact
2023-08-05 23:35:01 -07:00
George Hotz 67781fcf5d fix fail fast in CI 2023-08-05 10:24:24 -07:00
wozeparrot ab9e4a2e93
Make cuda CI a bit more consistent (#1403)
* feat: use fast-apt-mirror

* feat: use in more places
2023-08-02 07:38:22 -07:00
Diogo 4dc8595069
simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
George Hotz f27df835a6
delete dead stuff (#1382)
* delete bpe from repo

* remove yolo examples

* Revert "remove yolo examples"

This reverts commit cd1f49d4662a5565726ae1fa7bf3f6a3e3985965.

* no windows
2023-07-31 11:17:49 -07:00
George Hotz 37fa7e96fb
Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak da2efecbe2
update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
chenyu ab80ea0d38
use ubuntu for clang ci test (#1368) 2023-07-28 20:51:25 -04:00
waifairer d89fb729e5
flake8 (#1323)
* flake8: Ignore frequent violations, correct infrequent ones

* Ignore some rules in test

* Reorder test ignores

* Lint test + main

* EOF indent

* Include all E71,E72 errors

* Test the failing case in CI

* Revert "Test the failing case in CI"

This reverts commit 110add0a70f5a619d07631269104e84f908af6b9.

* Push to test!
This reverts commit f317532779a0e1ac8401e2474fd5c6c8695c08e9.

* ok back to passing
This reverts commit ba5052685f93f83e06152cdc696b9e26131d8ab7.

* Prove that CI fails when formatting is incorrect.

* Fix formatting

* Remove duplicitous E117 rule

* Use flake8 config for precommit

---------

Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-24 11:19:58 -04:00
cheeetoo a0965ee198
CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
Jacob Pradels b112edd2c3
Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
chenyu a5f5330d91
Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
chenyu c96bf395df
Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265)
* Enable JIT test

* really test metal

* Skip some device
2023-07-18 11:40:37 -07:00
Diogo a9a1df785f
Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* Fix efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Roelof van Dijk d0e21a7398
ci: don't install recommended packages for GPU (#1215)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-11 15:38:49 -07:00
George Hotz beb4d3ab01
Tensor Cores 2: Local Buffers Edition (#1057)
* local buffers

* work

* works

* invert_strides

* work

* non tc

* fix shapetracker bug

* stride priority

* touchups

* gate tensor cores

* tensor core conv

* cleanups

* bug fixes

* fix metal_matmul

* fast tensor cores

* more speed

* buffer selection bug fix

* fix CI maybe

* ugh, CI is set to true, not 1

* tc allowed

* add_gl_dimension

* split out padding conv tests

* does padding add fail

* test_padded_conv2d_1x1

* skip metal ci stuff

* more strict on yellow

* float2

* strip parens

* fix float2

* touch up

* dtype

* strip parens

* no alias

* bugfix

* cast float2 and test tensor core ops

* oops, don't hardcode 4
2023-07-09 09:06:00 -07:00
George Hotz 7151382364
Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
George Hotz d9c1d81e99
Revert "feat: cancel previous workflow runs on new commits (#1184)" (#1194)
This reverts commit d66a0c285d.
2023-07-08 11:26:13 -07:00
George Hotz 52600d532e add 20 minute timeout 2023-07-07 23:02:28 -07:00
wozeparrot d66a0c285d
feat: cancel previous workflow runs on new commits (#1184) 2023-07-07 22:55:35 -07:00
foreign-sub 574cbda979
Quickstart (#1015)
* fix quickstart md

* add quickstart to ci
2023-06-29 13:26:58 -07:00
George Hotz d16c16ec28
new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz 70c07dfea5
5k line max (#1064) 2023-06-27 10:53:18 -07:00
George Hotz 0f281e7b18 touchups 2023-06-25 15:24:26 -07:00
George Hotz c8fbdeb48e
test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010906dd62376c0e419416978d03d3d62.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Jacky Lee 5d16cc283f
Docker fix (#1039)
* Docker test

* Remove extra installs

* Don't run full test

* No need for testing dependencies
2023-06-25 10:38:58 -07:00
cloud11665 264b1e5f48
cache gpuocelot build in cuda CI (#1032) 2023-06-22 17:42:12 -07:00
cloud11665 2407690d82
add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
George Hotz 18892242b0
global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Diogo 57d3aa76a5
Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* separate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz 0d4c4f4e9e
metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
Diogo 6b1280f01c
fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
kposborne2 00360da05b
Update broken `docs/abstractions.py` for changed ops, and add to CI (#930)
* fix and add to ci

* still have those

* ocd

* update other doc
2023-06-04 19:21:20 -07:00
George Hotz a3feee29c5
make tests faster + add onnx (#815)
* search one dir, disable slow

* onnx tests

* fast rnnt test
2023-05-27 08:53:32 -07:00
George Hotz faf80418b7
pyopencl by default since GPU is default (#802) 2023-05-25 17:48:18 -07:00
George Hotz 03b38864db
fix batchnorm at training (#753)
* e2e testing

* min failure

* no affine on bn, still fails

* why did i think i could detach that?

* allow more kernels for bn

* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz dbc99c243b why did that test break? 2023-04-18 17:08:38 -07:00
George Hotz b12b60af20
fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
Cyril Roumégous 3f08613a2a
apply flake8 E203 rule (#684) 2023-03-11 11:35:16 -08:00
George Hotz 1826ff6b89
dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
George Hotz 5dc227dba6 fix bug in ENABLE_METHOD_CACHE and enable for llvm 2023-03-06 07:43:40 -08:00
George Hotz 50012f679b move get_contraction to shapetracker 2023-03-06 06:42:57 -08:00
George Hotz 7a1d96fd76
No negative (#632)
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz 999b44c274 fix external test + speed 2023-03-03 06:46:16 -08:00
George Hotz 459488bba2
fix linter (#630)
* fix linter

* no imports okay

* explicit bases

* disable in pylintrc
2023-03-02 20:06:20 -08:00
George Hotz bfcec234a2
Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz 3c8da6bd03 add typing 2023-02-28 10:54:46 -08:00
George Hotz d584bae5c0 fine, openpilot can have 197 kernels 2023-02-27 11:48:36 -08:00
George Hotz c9252d38b2 mypy cache breaks if you sometimes check untyped defs, no checking tests for now 2023-02-27 09:57:33 -08:00
George Hotz e74779f19d typing fixup 2023-02-27 09:52:04 -08:00
George Hotz edc8fbfff2 woah, why isn't OPT=2 2023-02-27 08:03:31 -08:00
George Hotz f4ee7d2cad back to 196 kernels 2023-02-25 18:25:34 -08:00
George Hotz 6e98a172a0 fix broken contiguous 2023-02-25 17:41:49 -08:00
George Hotz a44e8e4385 discard children on mop shuffle, 200 -> 196 kernels 2023-02-25 10:51:07 -08:00
George Hotz 758515dcc0
conv2d is an hlop (#589)
* conv2d is an hlop

* shorter conv

* KOPT=-1

* alt imp

* MULACC

* smarter mulacc

* pop conv

* 7x7 -> 5x5

* didn't fix, that's not going to work

* this is faster and matches old behavior

* oh, non lazy just won't work with mulacc

* mulacc in torch

* bool types were creeping in

* optimizer is actually better with hlop conv

* fix pushing permutes issue

* refactor einsum_mulacc

* fix up readme

* update readme

* _image_conv2d

* fix bias addition location

* pushing permutes gets back to 200 kernels

* conv cleanup

* disable hlop conv

* don't hide that in helpers
2023-02-23 17:52:31 -08:00
George Hotz 628ce067a1 add tests to mypy 2023-02-22 07:07:38 -08:00
George Hotz 714bf4b108
clang backend (#572)
* start clang backend

* mostly working

* no group for reduce w clang

* it compiles

* compiles

* a11y

* minor fixups

* formatting

* add a test

* rename test
2023-02-20 18:18:18 -08:00
James Roberts 0d405fd5bc
Parallelize CI tests (#535) 2023-02-06 15:27:44 -06:00
George Hotz 90529d3750
tests are 20% faster (#529)
* pytorch CPU

* no cache, it's slower

* pytorch cpu for real

* remove double onnx
2023-02-06 09:56:14 -06:00
George Hotz 6eb0e6a650 shuffle deps: always tqdm, make linting category 2023-02-06 09:27:01 -06:00
George Hotz 1d80639646 make linter test install testing deps 2023-02-06 09:21:48 -06:00
George Hotz 60bb64811c merge mypy into linters, no useless package update 2023-02-06 09:14:00 -06:00
Martin Loretz 97f0a82be7
Cache pip packages in github actions (#522)
* Cache pip dependencies in github actions

* Add setup.py as cache-dependency-path

* Test caching

* Test caching

* Upgrade setup python action

* Test caching

* Remove setup.py from cache-dependency-path

* Don't remove cache-dependency-path

* Don't cache linter package's

* Test caching

* Test caching

* Test caching

* Upgrade actions/checkout to v3
2023-02-03 20:04:20 -08:00
George Hotz e313c8af20 update openpilot tests from OPENCL to GPU 2023-01-24 14:05:20 -08:00
George Hotz 49c6e6d472
Latest attempt to add image (#462)
* add image

* load + store + boring stuff:

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates less views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00
George Hotz 27211103ae docker: no -it 2023-01-09 12:49:59 -08:00
George Hotz d6e86a29a8 docker: forgot to checkout code 2023-01-09 12:48:03 -08:00
George Hotz 73ce9a771e that fix it 2023-01-09 12:46:33 -08:00
George Hotz bfd4f4e35c testdocker 2023-01-09 12:41:52 -08:00
George Hotz 5e07d4669d
the speedy chonker is going to replace the old chonker (#432)
* bringing back reshape and permute

* done with E701

* 4x4 works in generic way

* max and sum not vectorizing...

* special case single float

* support comparing to MPS

* improve matmul speed, consider generic principles

* GlobalCounter

* fix op tracking

* faster

* comment that out for now

* err, it needs that

* fix minor issues

* fix global_mem
2022-11-11 18:34:24 -08:00
George Hotz b8c94a67c9
Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* simple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
Liam 8dc28dd733
Create python-publish.yml (#163) 2022-11-08 08:45:01 -08:00
George Hotz d02f8f9bc0 can we lose the lines with E701 still there? 2022-10-28 08:36:03 -07:00
George Hotz ef62db3186 cleanups, remove E701 2022-10-28 08:28:56 -07:00
George Hotz b65b70812a
Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz 3b9b7eda48 remove run_thneed dead code 2022-10-20 17:24:18 -07:00
YassineYousfi ae0f9b17df
openpilot: new models and onnx ops (#401)
* ngrl stuff

* fngrl

* fix typo in compile script

* workflow dispatch

* new models in tests

* dont need to up this threshold

Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
2022-10-20 11:49:19 -07:00
George Hotz 8382c51c12 always MATMUL, test the ops in OPENCL 2022-10-01 13:31:29 -04:00
George Hotz e737513c52 external_test_opt 2022-09-28 23:29:41 -04:00
George Hotz f215534a64 1100 lines, but sane linter rules 2022-09-06 13:47:45 -07:00
George Hotz fa33ba716e flip that 2022-08-28 11:27:33 -07:00
George Hotz 227937d6d2 fix test maybe 2022-08-22 09:50:14 -07:00
George Hotz 11626053b0 run_thneed with test 2022-08-22 09:45:46 -07:00
George Hotz 0aca848d10 enable the openpilot test 2022-08-21 12:05:48 -07:00
George Hotz a8734df030 add openpilot tests to tinygrad 2022-08-21 12:03:37 -07:00
George Hotz ccd539e93e maybe that's a better way to do this 2022-08-20 08:25:12 -07:00
George Hotz b7d782b921 hmm, with the new reduce, we have to opt 3 for memory usage 2022-08-20 08:16:15 -07:00
George Hotz b132de677d
tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
George Hotz ef1100fdff touchups 2022-07-19 09:30:06 -07:00
George Hotz ef4afdb5d2 tests maybe 2022-07-18 08:24:14 -07:00
George Hotz a2c4bcf313 disable opencl tests 2022-07-18 08:17:21 -07:00
George Hotz 762e859089 testopencl 2022-07-17 11:56:40 -07:00
George Hotz 608e2431f7 test opencl, commit to removing the crap conv code from GPU 2022-07-17 11:55:37 -07:00
George Hotz 34f43ea10e LAZY and CLCACHE are defaults 2022-07-04 13:09:15 -07:00
George Hotz d7aad46758 test lazy also, make TestMNIST faster 2022-07-03 15:19:19 -07:00
Nicklas Boman 64d986bc8b
add mypy to ci testing (#353) 2022-07-03 15:11:35 -07:00
George Hotz 2ee85812f7
intel opencl (#342)
* intel opencl

* run clinfo

* that fix it?

* meh

* think it's the same

* basekit fix

* it wasn't basekit

* more minimal

* no clinfo
2022-06-20 19:25:55 -07:00
Jacky Lee 81664baf64
Fix OpenCL installation (#301) 2022-01-06 10:35:48 -05:00
George Hotz b0511f9392 Revert "does pybind fix CI?"
This reverts commit d128e4fcae.
2021-12-30 14:13:58 -05:00
George Hotz d128e4fcae does pybind fix CI? 2021-12-30 14:11:39 -05:00
George Hotz 6bee5bdb7d add torch tests 2021-10-30 18:58:45 -07:00
George Hotz 7472a7ebe2 not forcing 3.9 for a stupid type 2021-10-30 16:52:40 -07:00
George Hotz f193eeed25 bump all to python 3.9 2021-10-30 16:15:41 -07:00
George Hotz ac8afd24fa refactor accel 2021-10-30 16:10:59 -07:00
Dinesh Kumar Gnanasekaran 2146860307
fixed OpenCL installation while running tests (#262)
Co-authored-by: dinesh <dinesh-GDK>
2021-06-12 11:14:21 -07:00
George Hotz 6842ad9ec8 minor cleanups, yolo work 2021-01-03 08:14:16 -08:00
Liam ebd72ff437
Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES", which is a comma-separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done through setting the default Tensor Device to Device.CPU unless
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
iainwo 56d44637f3
fixed pylint, formatted python files with cblack on localhost (#204)
* fixed pylint, formatted python files with cblack on localhost

* Revert "fixed pylint, formatted python files iwth cblack on localhost"

This reverts commit 07e2b88466fa53399ad78d962ffb2ad55bc45344.

* dedented 4-spaces, added linter

Co-authored-by: Iain Wong <iainwong@outlook.com>
2020-12-17 14:37:31 -08:00
Mufeed VH e6a5c6c93e
Added indentation linter (#187)
* Added indentation linter

* pylint package latest
2020-12-12 17:15:09 -08:00
Liam e79cda6dad
Add pyopencl to dependency installs (#174)
* Add pyopencl to dependency installs

OpenCL was not actually being tested as pyopencl was not installed.

* Reduce installation to 1 liner
2020-12-10 09:24:08 -08:00
George Hotz df64658a2c weee, opencl tests in CI 2020-11-10 10:04:45 -08:00
George Hotz d47a128812 pocl 2020-11-10 10:02:13 -08:00
George Hotz c05401a9ca sudo maybe 2020-11-10 09:53:49 -08:00
George Hotz 09bc8eddfe clinfo 2020-11-10 09:51:38 -08:00
George Hotz 58e703d099 fix tests 2020-11-10 09:49:19 -08:00
George Hotz 23405cec43 intel opencl 2020-11-10 09:41:40 -08:00
George Hotz 33090c4b0d install more 2020-11-10 09:34:56 -08:00
George Hotz a52590e76c cpu opencl maybe 2020-11-10 09:32:54 -08:00
George Hotz 2e7f16bf3f the power of cheating 2020-11-02 07:42:11 -08:00
George Hotz fc358a07ad print sloccount 2020-11-02 07:38:13 -08:00