Commit Graph

2521 Commits

George Hotz c6d5d45a2b
Remove MemOp (#1750)
* start removing memop

* locals

* support both stores

* might be correct

* remove parens on shape ish

* fix metal ops

* render load and render store

* fix image

* maybe fix asm

* fix test uops

* revert asm

* remove memop itself
2023-09-04 09:58:33 -07:00
George Hotz 56abe04e4b
disable assembly (#1755) 2023-09-04 09:41:20 -07:00
chenyu b8fde6bb0f
Test KOPT in CI (#1744)
* test kopt in ci

* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
George Hotz ed194a1d3b
zero fold (#1748)
* add constant fold

* err, it's just zero folding

* self store fold + caching

* prints and more folds

* simpler winograd kernels

* remove childless uops
2023-09-03 13:48:11 -07:00
George Hotz e17b1af160
UnaryOps.NEG (#1749) 2023-09-03 12:44:26 -07:00
George Hotz 9f1a54acee
pretty kernel in cstyle (#1746)
* pretty kernel in cstyle

* fix mem estimate

* that made it slower

* Revert "that made it slower"

This reverts commit faa4cd0187b1d17ddbb6ce3ce0e842904a9001b4.
2023-09-03 10:21:02 -07:00
George Hotz e910e0e62c
folding mul by 0 (#1743)
* why doesn't this work

* zero mlop

* explicit fold in winograd
2023-09-03 09:04:12 -07:00
David Hou 3151d91f6e
3x3 winograd convs (#1675)
* winograd

* simplify local groups code

* comment

* respects self.opts.has_local

* always simplify ones

* make mypy happy

* move reshape, WINO flag

* wino flag, simple forward backward test for wino

* extra wino test

* merge oops

* comments

* axis_needs_valid -> axis_is_masked

* don't delete needs_valid (it's unused though)

* make linter happy

* make linter happy

* smaller test

* change number

* make wino tests very small
2023-09-03 07:29:43 -07:00
crankygrumpster c8025c319c
Remove Token from abstractions.py (#1741)
* Remove Token from abstractions.py, update output string

* add dtype
2023-09-02 21:56:11 -07:00
geohotstan e36148b1ce
Make __getitem__ TINYer (#1661) 2023-09-02 23:01:01 -04:00
Roelof van Dijk 60590cf8b5
perf: create buffer only when needed (#1684)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-02 17:43:29 -07:00
Yixiang Gao 66a6bbd029
codellama (#1702)
* add codellama with pre-downloaded weights

* add rope_theta, fix param

* fix test

* add 7B-Python

* add 7B-Instruct

* replace single quotes with double

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
chenyu a2745819f6
faster gpt2 jit path and gpt2 in test_real_world (#1738) 2023-09-02 08:39:12 -07:00
George Hotz 89cd380bfc
add nvidia CI (#1737)
* add nvidia

* speed(nvidia)
2023-09-01 22:02:30 -07:00
George Hotz 91258aa67f
render const (#1736)
* render const

* remove constop

* fix llvm and webgpu

* disable consts in llvm again

* assembly special

* fix const rendering

* fix arm64

* imms are int

* fix ptx

* fix arm64
2023-09-01 19:01:43 -07:00
nimlgen a96e54d8bb
search for grouped reduces (#1732) 2023-09-01 14:21:10 -07:00
George Hotz cd844ec4b2
remove Token class (#1723)
* no fusion

* no float4 grouping

* mulacc fusion is fine. remove uop_alu

* fully remove get_grouped_maybe_float4

* removed that test

* that's not float4 anymore

* disable failing arm64

* metal ops pass tokenless

* fix wmma

* update test_uops with new style

* fix gep

* fix float4 store

* fix float4 store more

* cuda tests pass

* disable broadcast pow

* fix ptx

* reenable arm64

* bring cse back

* don't cache the acc

* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz 458eb89463
minor changes from prerender (#1734) 2023-09-01 10:04:47 -07:00
chenyu f964b9e5ee
visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests

* comments
2023-09-01 09:47:45 -07:00
wozeparrot bf05534c6e
hip multidevice (#1728)
* feat: hip multidevice support + p2p

* feat: default device
2023-09-01 06:46:13 -07:00
JaSpa99 024dd690fa
Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface

* increase tols
2023-09-01 06:45:08 -07:00
George Hotz 7780eb3c5a
minor dimensions (#1730) 2023-09-01 06:42:00 -07:00
George Hotz 5c403d43b9
New >3 indexing (#1729)
* move reindexing into linearizer

* get_grouped_dims

* don't limit for clang
2023-08-31 21:24:15 -07:00
George Hotz e3a062ad17 real matvec test 2023-08-31 17:27:25 -07:00
George Hotz 453e437598
move stuff in the linearizer (#1726)
* move stuff in linearizer

* move stuff in linearizer

* minor

* fix opts import
2023-08-31 14:42:09 -07:00
George Hotz c18a497dde
minor global dim cleanup (#1724) 2023-08-31 12:23:39 -07:00
geohotstan 94b1257f5e
Changed DEVICE to Device.DEFAULT in deep_deterministic_policy_gradient (#1715)
* added device in optim and deep

* oops forgot to del print code

* use Device.DEFAULT instead

* removed device
2023-08-31 07:08:51 -07:00
nimlgen b5cf274da3
remove memory peak for quantized llama (#1720) 2023-08-30 16:32:30 -04:00
chenyu e4eb5d55c7
critical realize for unjitted llama (#1718) 2023-08-30 14:52:32 -04:00
George Hotz cd7ceed914 gpt2: print total instead of sync time 2023-08-30 10:59:42 -07:00
Roelof van Dijk 62536d6000
perf: use enumerate where possible (#1692)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-30 10:41:51 -07:00
Karan Handa a8aa13dc91
[ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
nimlgen 355b02dc3f
allow zerosized tensors (#1659)
* allow zerosized tensors

* works with numpy
2023-08-30 10:39:24 -07:00
Max Hahn f9cb31fdc2
added visitor pattern (#1669)
* added visitor pattern

* pylint bug workaround

* added tests, made abstract OpNode inherit from ABC

* fixed assert

* fix check of abstract classes in negative test

* remove assert False
2023-08-30 09:03:44 -07:00
George Hotz fdd7f282cb
Reenable tensor cores for self-hosted Mac CI (#1717)
* debug 5 matmul

* allow tensor cores in CI

* tensor cores on arm64

* put debug back
2023-08-30 07:53:04 -07:00
chenyu ac183568be
llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes to jitted function

shrink inside jitted too, 6.3ms

remove back reshapes, 5.5ms

isinstance -> __class__ 4.99ms

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0

* small cleanup

* include var_vals.keys() to st.key

* add comments

* llama small update

* everything jitted again, similar structure to gpt2

* fix typing

* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
Umut Zengin 1682e9a38a
Fix: Stable Diffusion index (#1713) 2023-08-30 00:21:10 -04:00
wozeparrot 2f768e386d
stable diffusion benchmark artifact (#1714) 2023-08-29 21:08:40 -04:00
George Hotz 0ea22bf249 remove DEBUG=1 from stable diffusion AMD since jit cache is fixed 2023-08-29 12:46:12 -07:00
George Hotz ab9b9ff3e2
pipefail benchmark (#1709) (#1710)
* feat: specify shell

* feat: specify shell for mac

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-08-29 08:15:02 -07:00
George Hotz aa7c98722b
sd timing (#1706) 2023-08-28 20:22:57 -07:00
nimlgen 8844a0a822
llvm jitted (#1652) 2023-08-28 20:22:44 -07:00
nimlgen 1c0449e190
add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
George Hotz f5f8b09c13
allow manual release (#1704) 2023-08-28 17:54:25 -07:00
George Hotz 715047a1e4
fix release publish (#1703) 2023-08-28 17:48:00 -07:00
Olivier Chafik ee6d8de2dc
Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583) 2023-08-28 15:11:40 -04:00
qazal 3515ba4f23
add dtypes test (#1682) 2023-08-28 08:12:15 -07:00
Roelof van Dijk 50f669e43b
[ready] perf: simpler Tensor init (#1679)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 22:18:03 -04:00
Roelof van Dijk b66f54e379
perf: avoid reshaping if not necessary (#1683)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:17:04 -04:00
Roelof van Dijk 328cf2e86a
perf: remove cast and revert back to isinstance (#1694)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:15:52 -04:00