George Hotz
06e336bccb
mcts search ( #5598 )
...
* mcts search
* mcts cleanups
* mcts cleanup
* random shuffle children order
* mcts in handcode_opt
* src and remove_node
* debug 3 to print ast
* print the type
* mcts in extra
2024-07-19 21:38:39 -07:00
chenyu
b991097d41
move UPat and PatternMatcher from uopgraph.py to uops.py ( #5597 )
...
* move UPat and PatternMatcher from uopgraph.py to uops.py
towards instant UOps rewrite on UOp.alu
[run_process_replay]
* fix imports
2024-07-19 19:28:24 -04:00
Tobias Fischer
72da3fe7e6
added clip vision model ( #5595 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
P4ssenger
a1af5a79ad
remove obsolete code ( #5596 )
2024-07-19 18:12:03 -04:00
George Hotz
a02998472b
fix no locals behavior ( #5593 )
2024-07-19 14:35:09 -07:00
George Hotz
2e617ca59e
lowerer img index ( #5592 )
2024-07-19 14:22:02 -07:00
chenyu
3acd8559f4
doc: variable names in abstractions2.py ( #5591 )
2024-07-19 17:06:15 -04:00
chenyu
00c01f6f4d
correct IDIV dtype check error msg ( #5589 )
...
`dtypes.is_int` is not the same as `dtype == dtypes.int`
2024-07-19 16:36:47 -04:00
nimlgen
b1782e3fef
hcq refactor signal into class ( #5575 )
...
* hcq refactor signal into class
* fix amd
* amd do not use amd_signal_t
* cleanup
* signal setter
* fix linter
* docs
* more docs + types
* fix types
2024-07-19 23:23:05 +03:00
Francis Lata
2dc100c565
fix typo in runtime overview docs ( #5588 )
2024-07-19 22:00:15 +03:00
George Hotz
d0ab20a5e5
careful memory counting (with tests to specify behavior) ( #5587 )
2024-07-19 11:37:34 -07:00
chenyu
37dd233650
always reverse global dim ( #5586 )
...
* always reverse global dim
* one more test
2024-07-19 13:58:05 -04:00
George Hotz
10be05aae5
push contract through cast to fix test_float2_acc (try 2) ( #5585 )
...
* push contract through cast to fix test_float2_acc (try 2)
* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac
Revert "push contract through cast to fix test_float2_acc ( #5581 )" ( #5583 )
...
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
6bade4d419
save the uops in their own file ( #5582 )
2024-07-19 09:30:37 -07:00
George Hotz
ddda9420be
push contract through cast to fix test_float2_acc ( #5581 )
...
* push contract through cast to fix test_float2_acc
* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31
some limit_dims to limit global merging ( #5489 )
...
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
George Hotz
e04704faff
put acc first again ( #5580 )
2024-07-19 08:55:19 -07:00
chenyu
fc5b9f8dc9
Kernel.required_optimizations and Kernel.hand_coded_optimizations returns self ( #5576 )
...
[run_process_replay]
2024-07-19 10:55:14 -04:00
qazal
da34e1f617
scheduler refactors from the fuse_index branch ( #5579 )
...
* make simple_pads a safe set
* use is for comparing base
* 1 should continue
2024-07-19 16:23:31 +03:00
qazal
ecf88bb775
move assign_targets assignment ( #5578 )
2024-07-19 20:29:50 +08:00
George Hotz
0ad87021e2
move acc to end ( #5568 )
...
* move acc to end
* confirmed pictures are the same
* relax that
* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d
remove get_lazyop_info ( #5570 )
...
* don't use get_lazyop_info more
* keep that min
* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen
9d7edc9269
hcq rename HCQCompat -> HCQ ( #5577 )
2024-07-19 11:34:17 +03:00
chenyu
2b2f8ad18c
failed example of float2 acc no longer applies ( #5573 )
...
* failed example of float2 acc no longer applies
* # noqa: E501
2024-07-19 02:40:04 -04:00
chenyu
efccb1c3ba
swap global for size 3 too ( #5567 )
...
hc path resnet on green 10% faster
2024-07-18 23:31:15 -04:00
chenyu
abe29a05b0
swap first and last global in hcopt / hc tc path ( #5566 )
2024-07-18 18:54:44 -04:00
George Hotz
946da97820
swap action ( #5565 )
...
* swap action
* don't allow same action expressed differently
* oops, was reversed
* one line is fine
* only swap
2024-07-18 15:19:40 -07:00
qazal
e7a057c20f
retire replay_schedule ( #5563 )
2024-07-18 23:07:02 +03:00
qazal
50aba32ea8
hotfix: don't assert process replay in master. ( #5562 )
...
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre merge it's ok.
2024-07-18 22:05:00 +03:00
George Hotz
223d9283ee
fix float4 acc by moving contracts ( #5559 )
2024-07-18 11:30:16 -07:00
George Hotz
c41cd55556
remove vectorized alu in expander [run_process_replay] ( #5561 )
2024-07-18 11:27:40 -07:00
kormann
c951bc99af
fix abstractions2 printout ( #5557 )
2024-07-18 21:21:45 +03:00
George Hotz
a7fec05acc
fix broken store rule [run_process_replay] ( #5558 )
...
* remove unused store rule [run_process_replay]
* that should preserve behavior i think
2024-07-18 11:07:34 -07:00
chenyu
f5af98c450
failed test case that DEFINE_ACC no longer uses float4 ( #5555 )
...
* failed test case that DEFINE_ACC no longer uses float4
* line
2024-07-18 10:55:59 -07:00
George Hotz
923e0fe0b8
fix half4 folding ( #5556 )
2024-07-18 10:47:39 -07:00
chenyu
12e6771209
failed test case for unrolled half4 ( #5552 )
2024-07-18 13:05:52 -04:00
George Hotz
d1a7279605
indexing fold with casted bool ( #5551 )
...
* cast bool is where
* universal transform is wrong
2024-07-18 10:02:29 -07:00
qazal
fdfc0015a7
[run_process_replay] for opencl/openpilot ( #5009 )
...
* lil reset script
* find the prg
* use lower_schedule_item
* add process replay back
* cleanups
2024-07-18 19:42:33 +03:00
kormann
2c4add6844
pretty print lazy op per default ( #5505 )
...
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
nimlgen
c30092e56d
amd remove useless barrier ( #5550 )
2024-07-18 18:05:33 +03:00
nimlgen
4e9d2b1615
nv memory_barrier command ( #5548 )
2024-07-18 16:23:11 +03:00
qazal
6d7cd34250
more save_schedule tooling ( #5547 )
2024-07-18 15:59:53 +03:00
qazal
0ad1672d5f
fuse indexing (LazyOp creation) ( #5506 )
...
* bring FUSE_AS_ONE_KERNEL back
* operands need reshape?
* fused but arange didnt fold
* something deeply wrong
* yay, fused
* derive broadcasts
* s/input/reduce_input
* _fixup_ones proved a point
* this is what it takes
* down to 3 required reshapes:
1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape
* start real reshapes
* resolve shape in the edges pre lazyop
* outputs are the same shape
* rewrite1: just the reduce
* more correct
* fuse_as_one_kernel
* closer
* this passes
* dont rerun info
* dont need these
* not needed
2024-07-18 14:09:17 +03:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging ( #5529 )
2024-07-17 20:40:57 -07:00
chenyu
e569c927cf
remove Kernel.shape_offsets [run_process_replay] ( #5544 )
...
the only use case now can be further simplified
2024-07-17 23:16:47 -04:00
George Hotz
fa7e734b49
MetaOps.KERNEL ( #5543 )
2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d
add failing regression test for image ( #5540 )
...
* add failing regression test for image
* tg type
* simpler test
* don't realize image to image casts caused issue
* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging ( #5541 )
2024-07-17 17:11:52 -07:00
wozeparrot
8845a5dbfd
feat: begin immediate ( #5539 )
2024-07-17 16:11:21 -07:00