George Hotz
82fd932921
Lower schedule 2 ( #2561 )
* ls2
* fix types
* simpler
* cleaner
2023-12-01 20:25:49 -08:00
George Hotz
217cda81ba
hotfix: no metalgraph if there's weird ops
2023-12-01 19:32:55 -08:00
George Hotz
6733425095
lower schedule ( #2559 )
* lower schedule
* remove RAND, and don't put load in the JIT yet
* better fix for that test
2023-12-01 19:17:46 -08:00
Christopher Mauri Milan
077567f62d
Remove as_buffer for TORCH ( #2554 )
* remove as_buffer for torch
* enable torch zerocopy if on cpu
* remove as_buffer even on torch:cpu
2023-12-01 18:51:38 -08:00
chenyu
05a5357dd9
fix handcode_resnet50_opt.py ( #2558 )
2023-12-01 20:51:21 -05:00
chenyu
86fbd413f3
update test_real_world configs ( #2557 )
2023-12-01 20:03:52 -05:00
andresgit
00523d5656
New fix accessing elements created by padding ( #2529 )
* pad slice test cases, many failing
* fix failing test cases
check mask if we are outside the base buffer
also create a multi-view if in that case we reshape to an empty shape
* real_offset calculation more readable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-01 19:08:10 -05:00
George Hotz
bfdce1f0e7
hotfix: make openpilot test deterministic
2023-12-01 15:37:23 -08:00
George Hotz
9c306be282
better name for fast path
2023-12-01 15:32:47 -08:00
chenyu
67f4e03724
rewrite 0 size loadop into a CONST ( #2556 )
* rewrite 0 size loadop into a CONST
* check alloc size
* EMPTY is better
* Revert "EMPTY is better"
This reverts commit 574fe0f9ed28f1b97da5a81afdfd2cd5d9a94ff9.
* no ast is created
* fix test
2023-12-01 18:29:06 -05:00
George Hotz
4447188051
gate METAL_FAST_LOAD
2023-12-01 15:28:40 -08:00
chenyu
e9426f4fe4
simpler get_contraction ( #2552 )
* simpler get_contraction
* and test
2023-12-01 18:02:52 -05:00
George Hotz
eb595588bb
device.py cleanups for -1 line
2023-12-01 14:59:33 -08:00
George Hotz
f9b1de598f
hotfix: metal fastpath on sonoma
2023-12-01 14:55:34 -08:00
George Hotz
f5de21e753
fast path for copy ( #2548 )
* fast copy
* ruff first
* flat_mv on malloc
* order + webgpu test
2023-12-01 11:34:47 -08:00
wozeparrot
28183c7438
feat: reword ( #2549 )
2023-12-01 10:56:18 -08:00
George Hotz
4c984bba7e
bump version to 0.8.0, clean CI, remove requests ( #2545 )
* bump version to 0.8.0, clean CI, remove requests
* why was that even there
2023-12-01 10:42:50 -08:00
nimlgen
ff47be3a01
ruff check whitespaces ( #2547 )
2023-12-01 10:42:20 -08:00
George Hotz
8fd8399437
remove flake8 ( #2544 )
2023-12-01 09:48:41 -08:00
George Hotz
d8175a4380
simple fix ( #2543 )
2023-12-01 09:42:15 -08:00
qazal
04483f8187
refactor llvm consts ( #2537 )
2023-12-01 09:39:40 -08:00
nimlgen
badc97f824
hip & cuda to gpuctypes ( #2539 )
* cuda with gpuctypes
* hip gpuctypes
* graphs
* rename + linter happy
* use cpu_time_execution
* no ji in build_kernel_node_params
* remove hip_wrapper
* hip fix
* no arc
* small changes
* no clean module in cudacpu
2023-12-01 09:25:27 -08:00
qazal
0fb4ff30c8
share duplicate renders with cstyle ( #2538 )
2023-12-01 08:10:36 -08:00
chenyu
7fec966b5e
bye bye NOOP ( #2534 )
* bye bye NOOP
* SIN
* NEG
2023-11-30 23:10:35 -08:00
Joe Donovan
fa549d198d
Remove `type: ignore` comments ( #2533 )
* remove some type ignore comments and fix errors
* remove unnecessary get_args import
* revert triton changes
* remove changes not in tinygrad
2023-11-30 22:15:55 -08:00
George Hotz
12fa846122
zero copy ( #2531 )
* zero copy
* zero copy test
* loads coder in milliseconds
* zero copy for cpu and torch
* src_from_buffer is None
* SLOW_METAL_COPY there
2023-11-30 18:38:41 -08:00
Matthias Kronberg
5394a05b9d
Fix: Get item from ndarray before casting to int ( #2525 )
Directly casting is deprecated and will error in the future.
2023-11-30 18:34:31 -08:00
George Hotz
2c363b5f0b
new style device ( #2530 )
* cpu tests pass
* torch works
* works
* metal works
* fix ops_disk
* metal jit works
* fix openpilot
* llvm and clang work
* fix webgpu
* docs are rly broken
* LRU works on metal
* delete comment
* revert name to ._buf. LRU only on Compiled
* changes
* allocator
* allocator, getting closer
* lru alloc
* LRUAllocator
* all pass
* metal
* cuda
* test examples
* linearizer
* test fixes
* fix custom + clean realize
* fix hip
* skip tests
* fix tests
* fix size=0
* fix MOCKHIP
* fix thneed
* copy better
* simple
* old style metal copy
* fix thneed
* np reshape
* give cuda a device
2023-11-30 17:07:16 -08:00
chenyu
e56511b59a
more type annotation for tensor and lazy ( #2528 )
* more type annotation for tensor and lazy
* don't need that
2023-11-30 17:50:22 -05:00
Davi Silva
ddeec24fa8
Cleanup & fix llama.py ( #2524 )
* docs, cleanup crap
* comma AI
* fix 70B
* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
chenyu
7d26452305
call ruff with --preview ( #2522 )
some checks are ignored without --preview
2023-11-30 13:59:00 -05:00
chenyu
5db0cdfbd3
support list of ints (or other Tensorable) in tensor indices ( #2520 )
* support list of ints (or other Tensorable) in tensor indices
* enable some index test cases
2023-11-30 12:46:33 -05:00
chenyu
bd941a0df1
first version of test_indexing ( #2515 )
* first version of test_indexing
* move to test/imported
2023-11-30 00:03:59 -05:00
chenyu
d210f6a786
minor device.py cleanups ( #2510 )
2023-11-29 18:16:25 -05:00
qazal
370cfbb957
Cleanup vectorized hip renders ( #2497 )
* add typedefs and make_dtypen functions
use ext_vector_type for half16 kernels
* remove the old test_render because we just use whatever cstyle has
* align vectors
2023-11-29 14:02:12 -08:00
George Hotz
abfc99187d
cleanup realize ( #2505 )
* delete reallocs
* cleaner
* that's real
* less lines
2023-11-29 11:38:38 -08:00
George Hotz
3dedeaae74
rebalance tests ( #2504 )
* rebalance
* balance
* parallel apt-get for all
* .local/lib/python3.11/site-packages
* what is user doing
* is that path right
* Update test.yml
* okay where are you
* site-packages
2023-11-29 11:18:22 -08:00
George Hotz
065aff747e
make webgpu test reliable ( #2502 )
* remove retry that doesn't work
* fix cleanup
* process exit in cleanup
* add space
2023-11-29 10:02:24 -08:00
George Hotz
6707f2588e
use copyin ( #2500 )
* it's always copyin
* all RawBuffer are RawBufferCopyIn
* cleanups
* this fixes it
* requirements='C'
* more correct
2023-11-29 09:34:00 -08:00
George Hotz
947711a532
split metal and webgpu tests ( #2501 )
2023-11-29 09:32:09 -08:00
chenyu
3eb3c74675
metal ci tests everything ( #2499 )
* metal ci tests everything
* pretty good
* METAL
2023-11-29 12:04:37 -05:00
George Hotz
889acefe85
Support weird loads in Image ( #2498 )
* image support weird loads
* umm, that was always wrong
* openpilot compile fails with a weird error
* image test passes
* we have valids now
* clean that up
* no more required opts
* add fastvits test, fix bug
* minor cleanups
2023-11-29 08:30:46 -08:00
George Hotz
e333672675
realize cleanup ( #2496 )
* move that logic
* revert that change
* clean up transfer and asserts
* what's that junk
2023-11-28 21:08:39 -08:00
George Hotz
5629fc368c
Use Buffer.STORE at the end of ASTs ( #2494 )
* work
* store broken
* interpreteds work
* this passes
* symbolic cpu
* fix tests
* fix opt tests
* images fail
* fix InterpretedFlopCounter
* stupid hack for images
2023-11-28 20:11:37 -08:00
Liam
cf0c9096a9
Removing METAL Skips as CI works ( #2488 )
* Test metal CI
* remove metal and CI restrictions
* enable dtype tests for metal ci
2023-11-28 19:46:59 -08:00
Jake
5588922884
Update cuda_matmul.py ( #2495 )
2023-11-28 19:46:01 -08:00
George Hotz
cdc3b95729
if you don't appreciate a 15 second timeout, you get a 10 second timeout
2023-11-28 17:44:09 -08:00
George Hotz
d87a246439
move to new cached fetch ( #2493 )
* move to new cached fetch
* extra.utils is over
* loads
* bump download cache
* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz
ab5d14d4ba
MEM -> LOAD ( #2492 )
* MEM -> LOAD
* keep legacy working
2023-11-28 16:46:37 -08:00
chenyu
a739c6646e
fp16 in gpt2 attention ( #2491 )
* fp16 in gpt2 attention
* HALF
2023-11-28 19:27:03 -05:00