Commit Graph

5189 Commits

Author SHA1 Message Date
George Hotz 06e336bccb
mcts search (#5598)
* mcts search

* mcts cleanups

* mcts cleanup

* random shuffle children order

* mcts in handcode_opt

* src and remove_node

* debug 3 to print ast

* print the type

* mcts in extra
2024-07-19 21:38:39 -07:00
chenyu b991097d41
move UPat and PatternMatcher from uopgraph.py to uops.py (#5597)
* move UPat and PatternMatcher from uopgraph.py to uops.py

towards instant UOps rewrite on UOp.alu

[run_process_replay]

* fix imports
2024-07-19 19:28:24 -04:00
Tobias Fischer 72da3fe7e6
added clip vision model (#5595)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
P4ssenger a1af5a79ad
remove obsolete code (#5596)
2024-07-19 18:12:03 -04:00
George Hotz a02998472b
fix no locals behavior (#5593)
2024-07-19 14:35:09 -07:00
George Hotz 2e617ca59e
lowerer img index (#5592)
2024-07-19 14:22:02 -07:00
chenyu 3acd8559f4
doc: variable names in abstractions2.py (#5591)
2024-07-19 17:06:15 -04:00
chenyu 00c01f6f4d
correct IDIV dtype check error msg (#5589)
`dtypes.is_int` is not the same as `dtype == dtypes.int`
2024-07-19 16:36:47 -04:00
nimlgen b1782e3fef
hcq refactor signal into class (#5575)
* hcq refactor signal into class

* fix amd

* amd do not use amd_signal_t

* cleanup

* signal setter

* fix linter

* docs

* more docs + types

* fix types
2024-07-19 23:23:05 +03:00
Francis Lata 2dc100c565
fix typo in runtime overview docs (#5588)
2024-07-19 22:00:15 +03:00
George Hotz d0ab20a5e5
careful memory counting (with tests to specify behavior) (#5587)
2024-07-19 11:37:34 -07:00
chenyu 37dd233650
always reverse global dim (#5586)
* always reverse global dim

* one more test
2024-07-19 13:58:05 -04:00
George Hotz 10be05aae5
push contract through cast to fix test_float2_acc (try 2) (#5585)
* push contract through cast to fix test_float2_acc (try 2)

* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz 51892c8fac
Revert "push contract through cast to fix test_float2_acc (#5581)" (#5583)
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz 6bade4d419
save the uops in their own file (#5582)
2024-07-19 09:30:37 -07:00
George Hotz ddda9420be
push contract through cast to fix test_float2_acc (#5581)
* push contract through cast to fix test_float2_acc

* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu 3f590c3b31
some limit_dims to limit global merging (#5489)
only supports merging dims in a way that does not exceed the limit; no splitting yet
2024-07-19 12:17:46 -04:00
George Hotz e04704faff
put acc first again (#5580)
2024-07-19 08:55:19 -07:00
chenyu fc5b9f8dc9
Kernel.required_optimizations and Kernel.hand_coded_optimizations returns self (#5576)
[run_process_replay]
2024-07-19 10:55:14 -04:00
qazal da34e1f617
scheduler refactors from the fuse_index branch (#5579)
* make simple_pads a safe set

* use is for comparing base

* 1 should continue
2024-07-19 16:23:31 +03:00
qazal ecf88bb775
move assign_targets assignment (#5578)
2024-07-19 20:29:50 +08:00
George Hotz 0ad87021e2
move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz 2de82b8a5d
remove get_lazyop_info (#5570)
* don't use get_lazyop_info more

* keep that min

* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen 9d7edc9269
hcq rename HCQCompat -> HCQ (#5577)
2024-07-19 11:34:17 +03:00
chenyu 2b2f8ad18c
failed example of float2 acc no longer applies (#5573)
* failed example of float2 acc no longer applies

* # noqa: E501
2024-07-19 02:40:04 -04:00
chenyu efccb1c3ba
swap global for size 3 too (#5567)
hc path resnet on green 10% faster
2024-07-18 23:31:15 -04:00
chenyu abe29a05b0
swap first and last global in hcopt / hc tc path (#5566)
2024-07-18 18:54:44 -04:00
George Hotz 946da97820
swap action (#5565)
* swap action

* don't allow same action expressed differently

* oops, was reversed

* one line is fine

* only swap
2024-07-18 15:19:40 -07:00
qazal e7a057c20f
retire replay_schedule (#5563)
2024-07-18 23:07:02 +03:00
qazal 50aba32ea8
hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
If [run_process_replay] is green pre-merge, it's OK.
2024-07-18 22:05:00 +03:00
George Hotz 223d9283ee
fix float4 acc by moving contracts (#5559)
2024-07-18 11:30:16 -07:00
George Hotz c41cd55556
remove vectorized alu in expander [run_process_replay] (#5561)
2024-07-18 11:27:40 -07:00
kormann c951bc99af
fix abstractions2 printout (#5557)
2024-07-18 21:21:45 +03:00
George Hotz a7fec05acc
fix broken store rule [run_process_replay] (#5558)
* remove unused store rule [run_process_replay]

* that should preserve behavior i think
2024-07-18 11:07:34 -07:00
chenyu f5af98c450
failed test case that DEFINE_ACC no longer uses float4 (#5555)
* failed test case that DEFINE_ACC no longer uses float4

* line
2024-07-18 10:55:59 -07:00
George Hotz 923e0fe0b8
fix half4 folding (#5556)
2024-07-18 10:47:39 -07:00
chenyu 12e6771209
failed test case for unrolled half4 (#5552)
2024-07-18 13:05:52 -04:00
George Hotz d1a7279605
indexing fold with casted bool (#5551)
* cast bool is where

* universal transform is wrong
2024-07-18 10:02:29 -07:00
qazal fdfc0015a7
[run_process_replay] for opencl/openpilot (#5009)
* lil reset script

* find the prg

* use lower_schedule_item

* add process replay back

* cleanups
2024-07-18 19:42:33 +03:00
kormann 2c4add6844
pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
nimlgen c30092e56d
amd remove useless barrier (#5550)
2024-07-18 18:05:33 +03:00
nimlgen 4e9d2b1615
nv memory_barrier command (#5548)
2024-07-18 16:23:11 +03:00
qazal 6d7cd34250
more save_schedule tooling (#5547)
2024-07-18 15:59:53 +03:00
qazal 0ad1672d5f
fuse indexing (LazyOp creation) (#5506)
* bring FUSE_AS_ONE_KERNEL back

* operands need reshape?

* fused but arange didnt fold

* something deeply wrong

* yay, fused

* derive broadcasts

* s/input/reduce_input

* _fixup_ones proved a point

* this is what it takes

* down to 3 required reshapes:

1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape

* start real reshapes

* resolve shape in the edges pre lazyop

* outputs are the same shape

* rewrite1: just the reduce

* more correct

* fuse_as_one_kernel

* closer

* this passes

* dont rerun info

* dont need these

* not needed
2024-07-18 14:09:17 +03:00
wozeparrot 6ccb2390c3
feat: update_benchmark_staging (#5529)
2024-07-17 20:40:57 -07:00
chenyu e569c927cf
remove Kernel.shape_offsets [run_process_replay] (#5544)
the only use case now can be further simplified
2024-07-17 23:16:47 -04:00
George Hotz fa7e734b49
MetaOps.KERNEL (#5543)
2024-07-17 19:41:23 -07:00
George Hotz d3b098299d
add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot 218e157f00
benchmark on update_benchmark_staging (#5541)
2024-07-17 17:11:52 -07:00
wozeparrot 8845a5dbfd
feat: begin immediate (#5539)
2024-07-17 16:11:21 -07:00