George Hotz
c6d5d45a2b
Remove MemOp (#1750)
* start removing memop
* locals
* support both stores
* might be correct
* remove parens on shape ish
* fix metal ops
* render load and render store
* fix image
* maybe fix asm
* fix test uops
* revert asm
* remove memop itself
2023-09-04 09:58:33 -07:00
George Hotz
56abe04e4b
disable assembly (#1755)
2023-09-04 09:41:20 -07:00
chenyu
b8fde6bb0f
Test KOPT in CI (#1744)
* test kopt in ci
* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
George Hotz
ed194a1d3b
zero fold (#1748)
* add constant fold
* err, it's just zero folding
* self store fold + caching
* prints and more folds
* simpler winograd kernels
* remove childless uops
2023-09-03 13:48:11 -07:00
George Hotz
e17b1af160
UnaryOps.NEG (#1749)
2023-09-03 12:44:26 -07:00
George Hotz
9f1a54acee
pretty kernel in cstyle (#1746)
* pretty kernel in cstyle
* fix mem estimate
* that made it slower
* Revert "that made it slower"
This reverts commit faa4cd0187b1d17ddbb6ce3ce0e842904a9001b4.
2023-09-03 10:21:02 -07:00
George Hotz
e910e0e62c
folding mul by 0 (#1743)
* why doesn't this work
* zero mlop
* explicit fold in winograd
2023-09-03 09:04:12 -07:00
David Hou
3151d91f6e
3x3 winograd convs (#1675)
* winograd
* simplify local groups code
* comment
* respects self.opts.has_local
* always simplify ones
* make mypy happy
* move reshape, WINO flag
* wino flag, simple forward backward test for wino
* extra wino test
* merge oops
* comments
* axis_needs_valid -> axis_is_masked
* don't delete needs_valid (it's unused though)
* make linter happy
* make linter happy
* smaller test
* change number
* make wino tests very small
2023-09-03 07:29:43 -07:00
crankygrumpster
c8025c319c
Remove Token from abstractions.py (#1741)
* Remove Token from abstractions.py, update output string
* add dtype
2023-09-02 21:56:11 -07:00
geohotstan
e36148b1ce
Make __getitem__ TINYer (#1661)
2023-09-02 23:01:01 -04:00
Roelof van Dijk
60590cf8b5
perf: create buffer only when needed (#1684)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-02 17:43:29 -07:00
Yixiang Gao
66a6bbd029
codellama (#1702)
* add codellama with pre-downloaded weights
* add rope_theta, fix param
* fix test
* add 7B-Python
* add 7B-Instruct
* replace single quotes with double
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
chenyu
a2745819f6
faster gpt2 jit path and gpt2 in test_real_world (#1738)
2023-09-02 08:39:12 -07:00
George Hotz
89cd380bfc
add nvidia CI (#1737)
* add nvidia
* speed(nvidia)
2023-09-01 22:02:30 -07:00
George Hotz
91258aa67f
render const (#1736)
* render const
* remove constop
* fix llvm and webgpu
* disable consts in llvm again
* assembly special
* fix const rendering
* fix arm64
* imms are int
* fix ptx
* fix arm64
2023-09-01 19:01:43 -07:00
nimlgen
a96e54d8bb
search for grouped reduces (#1732)
2023-09-01 14:21:10 -07:00
George Hotz
cd844ec4b2
remove Token class (#1723)
* no fusion
* no float4 grouping
* mulacc fusion is fine. remove uop_alu
* fully remove get_grouped_maybe_float4
* removed that test
* that's not float4 anymore
* disable failing arm64
* metal ops pass tokenless
* fix wmma
* update test_uops with new style
* fix gep
* fix float4 store
* fix float4 store more
* cuda tests pass
* disable broadcast pow
* fix ptx
* reenable arm64
* bring cse back
* don't cache the acc
* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463
minor changes from prerender (#1734)
2023-09-01 10:04:47 -07:00
chenyu
f964b9e5ee
visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests
* comments
2023-09-01 09:47:45 -07:00
wozeparrot
bf05534c6e
hip multidevice (#1728)
* feat: hip multidevice support + p2p
* feat: default device
2023-09-01 06:46:13 -07:00
JaSpa99
024dd690fa
Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface
* increase tols
2023-09-01 06:45:08 -07:00
George Hotz
7780eb3c5a
minor dimensions (#1730)
2023-09-01 06:42:00 -07:00
George Hotz
5c403d43b9
New >3 indexing (#1729)
* move reindexing into linearizer
* get_grouped_dims
* don't limit for clang
2023-08-31 21:24:15 -07:00
George Hotz
e3a062ad17
real matvec test
2023-08-31 17:27:25 -07:00
George Hotz
453e437598
move stuff in the linearizer (#1726)
* move stuff in linearizer
* move stuff in linearizer
* minor
* fix opts import
2023-08-31 14:42:09 -07:00
George Hotz
c18a497dde
minor global dim cleanup (#1724)
2023-08-31 12:23:39 -07:00
geohotstan
94b1257f5e
Changed DEVICE to Device.DEFAULT in deep_deterministic_policy_gradient (#1715)
* added device in optim and deep
* oops forgot to del print code
* use Device.DEFAULT instead
* removed device
2023-08-31 07:08:51 -07:00
nimlgen
b5cf274da3
remove memory peak for quantized llama (#1720)
2023-08-30 16:32:30 -04:00
chenyu
e4eb5d55c7
critical realize for unjitted llama (#1718)
2023-08-30 14:52:32 -04:00
George Hotz
cd7ceed914
gpt2: print total instead of sync time
2023-08-30 10:59:42 -07:00
Roelof van Dijk
62536d6000
perf: use enumerate where possible (#1692)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-30 10:41:51 -07:00
Karan Handa
a8aa13dc91
[ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib
* safe convert dirnames to pathlib
* replace all os.path.join
* fix cuda error
* change main chunk
* Reviewer fixes
* fix vgg
* Fixed everything
* Final fixes
* ensure consistency
* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
nimlgen
355b02dc3f
allow zerosized tensors (#1659)
* allow zerosized tensors
* works with numpy
2023-08-30 10:39:24 -07:00
Max Hahn
f9cb31fdc2
added visitor pattern (#1669)
* added visitor pattern
* pylint bug workaround
* added tests, made abstract OpNode inherit from ABC
* fixed assert
* fix check of abstract classes in negative test
* remove assert False
2023-08-30 09:03:44 -07:00
George Hotz
fdd7f282cb
Reenable tensor cores for self-hosted Mac CI (#1717)
* debug 5 matmul
* allow tensor cores in CI
* tensor cores on arm64
* put debug back
2023-08-30 07:53:04 -07:00
chenyu
ac183568be
llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock
* idea
* move 2 reshapes to jitted function
shrink inside jitted too, 6.3ms
remove back reshapes, 5.5ms
isinstance -> __class__ 4.99ms
* think
revert ops_gpu.py
revert symbolic.py too
PYOPENCL_COMPILER_OUTPUT=1
* cleanup
* fix cache shape for conversational model
only reshape if start_pos > 0
* small cleanup
* include var_vals.keys() to st.key
* add comments
* llama small update
* everything jitted again, similar structure to gpt2
* fix typing
* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
Umut Zengin
1682e9a38a
Fix: Stable Diffusion index (#1713)
2023-08-30 00:21:10 -04:00
wozeparrot
2f768e386d
stable diffusion benchmark artifact (#1714)
2023-08-29 21:08:40 -04:00
George Hotz
0ea22bf249
remove DEBUG=1 from stable diffusion AMD since jit cache is fixed
2023-08-29 12:46:12 -07:00
George Hotz
ab9b9ff3e2
pipefail benchmark (#1709) (#1710)
* feat: specify shell
* feat: specify shell for mac
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-08-29 08:15:02 -07:00
George Hotz
aa7c98722b
sd timing (#1706)
2023-08-28 20:22:57 -07:00
nimlgen
8844a0a822
llvm jitted (#1652)
2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190
add cache collector (#1595)
* init cache collector
* add test_cache_collector.py
* switch GlobalCounters.cache to CacheCollector
* init jit models test
* jitted SD
* add debug msg to print loaded bufs count
* moved cache collctor to jit
* clearer SD
* no double device import
2023-08-28 19:59:55 -07:00
George Hotz
f5f8b09c13
allow manual release (#1704)
2023-08-28 17:54:25 -07:00
George Hotz
715047a1e4
fix release publish (#1703)
2023-08-28 17:48:00 -07:00
Olivier Chafik
ee6d8de2dc
Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583)
2023-08-28 15:11:40 -04:00
qazal
3515ba4f23
add dtypes test (#1682)
2023-08-28 08:12:15 -07:00
Roelof van Dijk
50f669e43b
[ready] perf: simpler Tensor init (#1679)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 22:18:03 -04:00
Roelof van Dijk
b66f54e379
perf: avoid reshaping if not necessary (#1683)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:17:04 -04:00
Roelof van Dijk
328cf2e86a
perf: remove cast and revert back to isinstance (#1694)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:15:52 -04:00