qazal
e7f6b654ad
cleanup uop eq asserts for swizzle [run_process_replay] ( #6362 )
...
* cleanup uop eq asserts for swizzle [run_process_replay]
* more stuff
2024-09-05 13:36:36 +08:00
Oleg Rybalko
64f1384f5b
Einsum ellipsis support ( #6333 )
...
* working ellipsis expansion
* refactor
* fix commas in output
* add capital letters
* refactor
2024-09-05 10:08:55 +08:00
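The ellipsis expansion above can be sketched in plain Python: replace `...` in each operand's subscripts with fresh letters for the broadcast dimensions, ranking shorter operands against the tail of the batch letters. This is an illustrative helper written for this log, not tinygrad's actual implementation:

```python
def expand_ellipsis(formula: str, *ranks: int) -> str:
  """Rewrite an einsum formula, replacing '...' with explicit letters.
  `ranks` gives the number of dimensions of each input operand."""
  ins, out = formula.split("->")
  ins = ins.split(",")
  # letters not already used in the formula stand in for the broadcast dims
  free = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in formula]
  named = lambda s: len(s.replace("...", ""))
  n = max(r - named(s) for s, r in zip(ins, ranks))  # broadcast dim count
  batch = "".join(free[:n])
  # operands with fewer broadcast dims align against the tail of `batch`
  new_ins = [s.replace("...", batch[n - (r - named(s)):]) for s, r in zip(ins, ranks)]
  return ",".join(new_ins) + "->" + out.replace("...", batch)

assert expand_ellipsis("...ij,...jk->...ik", 4, 3) == "ABij,Bjk->ABik"
```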
nimlgen
326a77336e
qcom remove some tests skips ( #6353 )
2024-09-04 15:38:18 +03:00
qazal
99018a4aa1
minor schedule differ utils [run_process_replay] ( #6348 )
...
* minor schedule differ utils [run_process_replay]
* rm
2024-09-04 03:41:38 +08:00
nimlgen
3adb76894d
validate image=2 float16=1 openpilot benchmark ( #6346 )
...
* validate image=2 float16=1 openpilot
* linter
* linter2
2024-09-03 20:13:40 +03:00
qazal
2f00bf0c78
conv bw in one kernel with graph_rewrite ( #6330 )
...
* double reduce merger
* add test_fold_conv_relu_backward_ast_rewrite
* a correctness test to iterate on
* merge axes the other way around
* better
2024-09-03 03:53:53 +08:00
Vyacheslav Pachkov
4c33192a8b
add qcom runtime ( #5213 )
...
* qcom: driver init
* autogen stubs for msm_kgsl; also fixup ioctls to show numbers instead of _IOW macros
* autogen: add adreno commands and registers
* ops_qcom: QcomAllocator + signals
* fix EDEADLK in hwqueue, init timestamps, use opencl compiler for qcom
* qcom: we do not really need all these constants; input/output is enough
* qcom: perfctr for CS (do not really need all the rest)
* qcom: HALFREGFOOTPRINT and FULLREGFOOTPRINT are set to be around max
* qcom: explicitly set instruction len based on the shader size
* ops_qcom: Program init
extracts shader from open cl binary
sets input/output buffers
allocates stack
sets cs mode
runs shader
* use data64_le from helpers
* ops_qcom: use fill_kernargs for filling i/o buffers
* ops_qcom: add QcomCopyQueue just for api & set kernargs_args_offset
* new signals & fix exec
* add QCOM to the list of supported devices
* correct QcomComputeQueue._wait using CP_WAIT_REG_MEM
* fix exec, synchronize before copyout
* correct setting num_units for ST_SHADER
* fix gpu hangs on sigs with CP_MEM_WRITE, it is uncached mem anyway
* extract offsets to kernel arguments from opencl binary
* extract constants values and offsets from opencl binary
* handle KGSL_MEMFLAGS_USE_CPU_MAP correctly
* align kernel name to 4 bytes when skipping kernel opencl struct
* skip to consts directly using an offset from opencl binary header
* fix alloc
* get halfreg and fullreg from opencl bin
* set unmultiplied global sizes as kernel group in HLSQ_CS_NDRANGE
* parse prg offset from open cl binary
* save loc with HLSQ_CS_CNTL. set this with HLSQ_CONTROL_2_REG
* support for vals in _fill_kernargs
* support 16-bit constants
* use KGSL_CONTEXT_NO_FAULT_TOLERANCE for contexts
this helps to not fall down when executing big kernels
/* Don't time out if the context has disabled it */
if (drawobj->context->flags & KGSL_CONTEXT_NO_FAULT_TOLERANCE)
return;
* minor changes of _exec
* QCOMRenderer
* disable HCQGraph for demo. TODO: support HCQ update api
* support HCQ
- remove copy queue
- add updates
- add strides for buffs and vars for QCOM
* bufs_stride
* clean ups
* linter
* call super().__init__(value) in QcomSignal
* disable=unused-import
* mypy
* type ignore when queue is on the device
* fix
* query gpu_id.
Will be useful for selecting commands e.g. CP_EVENT_WRITE vs
CP_EVENT_WRITE7
* working timestamps
* free context after device is done
* move gpu stack to the device
* reserve some space with lib_gpu for gpu to write to
this fixes test_interpolate_bilinear
* exclude tests that fail with GPU=1 on qualcomm
* lint
* unmap mem in _gpu_free
* ctxt priority and preemption policy
* remove old qcom
* pass size to self.device.allocator.free
* skip tests only on qcom
* use kgsl and adreno defines instead of numeric vals
* use allocator for allocating lib_gpu
* update to QcomArgsState from master
* intermediate commit while conquering images
* enable image tests on qcom
* fix shader disasm size, dump textures stuff
* working images
* allow signals to be 0
* set branchstack from OpenCL binary
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* set shared memory size from OpenCL binary
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* update images in QcomArgsState & less loc for images
* set stack sizes from OpenCL binary
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* stack allocation based on OpenCL binary
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* better autogen for kgsl and adreno. no more bitshifts
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* cleanup commit for parse cl lib
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* dont forget actual generated files
* refactor + less loc
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* device.py back
* lint
* ruff
* timestamp divisor
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* fix tex fmt & round global size
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* dtypes
* 19.2MHz
* -1 loc in _update_exec
* remove noqa
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-09-02 19:35:47 +03:00
George Hotz
406ec8240e
hotfix: lin_fail_41 passes on my M3 Max
2024-08-31 11:46:46 -07:00
Roelof van Dijk
ad4b3b457f
bump limit for test_llama_embedding_opt ( #6332 )
2024-08-31 10:03:43 -04:00
George Hotz
72939901fc
hotfix: ebs print kernel names
2024-08-29 21:20:36 -07:00
George Hotz
365babe391
precompute early_reject [run_process_replay] ( #6327 )
...
* precompute early_reject [run_process_replay]
* features for ebs
* fix ocelot cache
2024-08-29 18:26:24 -07:00
George Hotz
385904526f
remove more rules [run_process_replay] ( #6326 )
...
* remove more rules [run_process_replay]
* disable invalid test
* ptx needs that str
2024-08-29 16:27:10 -07:00
qazal
539654fbe1
graph_rewrite complexity tests [run_process_replay] ( #6317 )
2024-08-29 22:39:08 +03:00
qazal
07942ef361
Proposal: Better UOps.SWIZZLE ( #6309 )
...
* better UOps.SWIZZLE
* test_swizzle_rewrite
* add it to docs
* show a diff
* a lil more verbose
* two teeny notes
* hotfix: sink
2024-08-29 15:39:48 +03:00
qazal
dd4e5f1c8d
process replay rewrite ( #6284 )
...
* process replay rewrite
p2
* start some unittests + exceptions and exits
* shebang
* remove extra kernel init
2024-08-29 15:08:27 +03:00
pedro
7de4eac8f7
add support and tests for nearest modes in interpolate, adapt uint8 bilinear to torch implementation ( #6308 )
...
* add `nearest` mode to interpolate
matching pytorch `nearest`, which is known to be buggy
+ relevant TestOps
* add `nearest-exact` mode to interpolate
matching pytorch `nearest-exact`
+ relevant TestOps
* fix uint8 bilinear interpolation
by matching custom torch implementation
* implement uint8 lerp with torch interpolation trick
without converting it to float
2024-08-28 21:59:51 -07:00
qazal
ec34d9ee36
start benchmarking ast graph rewrite ( #6297 )
...
* ast_rewrite to ctx var
* add external_benchmark_ast
* refactor to asts
* track lazybuffers
* more work
* record checkpoint
* cleanup
2024-08-27 18:18:44 +03:00
Max-We
ab2714423b
Add einsum tests ( #6286 )
...
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-26 09:09:25 -07:00
chenyu
b76f0c875e
lazy const fold idiv 1 ( #6285 )
2024-08-26 10:29:59 -04:00
chenyu
af7c04ff57
Tensor.__floordiv__ ( #6283 )
...
support Tensor.__floordiv__ and friends
2024-08-26 09:43:40 -04:00
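"Tensor.__floordiv__ and friends" refers to Python's operator protocol: `a // b` dispatches to `__floordiv__` on the left operand, falling back to `__rfloordiv__` on the right. A toy wrapper class (not tinygrad's Tensor) shows the mechanics:

```python
class Num:
  """Toy wrapper illustrating the dunder methods behind `a // b`."""
  def __init__(self, v): self.v = v
  def __floordiv__(self, other):   # handles self // other
    return Num(self.v // (other.v if isinstance(other, Num) else other))
  def __rfloordiv__(self, other):  # handles other // self when other isn't a Num
    return Num(other // self.v)

assert (Num(7) // 2).v == 3
assert (7 // Num(2)).v == 3
assert (Num(-7) // 2).v == -4  # floor division rounds toward -inf
```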
qazal
d2f8eeed2e
make [compare_schedule] the default [run_process_replay] ( #6273 )
...
* make [compare_schedule] the default
* capture ctx
* logging
* set capture to false
2024-08-26 21:40:03 +08:00
CaltropHungerton
002f60b4c3
fix intel wmma flop counting, add flop counting tests for different tensor cores ( #6192 )
...
* fix wmma flop counting on intel, add count tests
* half
* add half gemm
* Update test.yml
* one test
* Update test_uops_stats.py
* Update test_uops_stats.py
* Update test_uops_stats.py
* smaller matrix, use unittest skipUnless decorator
2024-08-25 18:37:05 -07:00
qazal
f0cc8ca5f2
generic st_fixup in scheduler graph rewrite [compare_schedule] ( #6278 )
2024-08-25 11:02:17 +03:00
gswangg
3cf507ae7f
remove extra.ops and LazyOp support from Kernel ( #6267 )
...
* remove extra.ops and BufferOps
* remove extra.ops and LazyOp support in Kernel
2024-08-24 16:44:38 +03:00
qazal
ccb05d8baa
fixup neg tests [run_process_replay] ( #6268 )
2024-08-24 16:35:43 +03:00
gswangg
ea76b93814
migrate test_linearizer_dumb.py to UOp AST ( #6241 )
...
* add imports and update test_unmerged_ifs to UOp AST
* test_max_simplify_and_cancel
* test_expander_new_srcs
* test_llama_embedding
* test_unaligns_idxs
* test_unrolled_float4_align
* test_upcasted_stores_out_of_order
* remove LazyOp
* remove extra/ops and replace ReduceOps.SUM with BinaryOps.ADD
2024-08-24 16:27:29 +03:00
gswangg
e44653e25a
migrate test_linearizer_failures.py to UOp AST ( #6240 )
...
* add imports and update test_failure_1 to UOp AST
* update test_failure_2 with UOp AST
* update test_failure_3
* test_failure_5
* test_failure_6
* test_failure_7
* test_failure_8
* test_failure_9
* test_failure_10
* test_failure_11
* test_failure_12
* test_failure_12_multireduce
* uncomment skip and migrate test_failure_13
* test_failure_14
* test_failure_15
* test_failure_16
* test_failure_17
* test_failure_18
* test_failure_19
* test_failure_20
* test_failure_21
* test_failure_22
* test_failure_23
* test_failure_24
* test_failure_25
* test_failure_26
* test_failure_27
* test_failure_28
* test_failure_29
* test_failure_30
* test_failure_31
* test_failure_32
* test_failure_33
* test_failure_34
* test_failure_36
* test_failure_37
* test_failure_38
* test_failure_39
* test_failure_40
* test_failure_41
* test_failure_42
* test_failure_43
* test_failure_44
* test_failure_45
* test_failure_46
* test_failure_47
* test_failure_48
* test_failure_49
* test_failure_50
* remove LazyOp
* reskip test_failure_22
* remove extra/ops
* replace ReduceOps with BinaryOps
* fixup that import
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-24 16:26:58 +03:00
gswangg
1dc6040877
migrate test_search.py to UOp AST ( #6245 )
...
* add imports and update test_kernel_count with UOp AST
* test_filter_global_buffer
* remove LazyOp
* remove extra.ops and ReduceOps
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-24 16:13:53 +03:00
qazal
ae23540d6e
refresh process replay schedule ref in reset.py ( #6265 )
2024-08-24 16:12:51 +03:00
gswangg
7be5eede71
migrate test_linearizer_overflows.py to UOp AST ( #6244 )
...
* add imports, remove ConstBuffer, and update test_overflow_1 with UOp AST
* test_overflow_2
* test_overflow_3
* test_overflow_4
* test_overflow_5
* test_overflow_6
* test_overflow_7
* TestLinearizerOverflowAlt::test_overflow_1
* TestLinearizerOverflowAlt::test_overflow_2
* remove LazyOp
* remove extra.ops
* remove ReduceOps
2024-08-24 16:10:29 +03:00
chenyu
943ab97d24
fix Tensor.prod for multitensor ( #6264 )
2024-08-24 08:52:24 -04:00
qazal
bcb2f1caa3
init REDUCE_AXIS with BinaryOps ( #6256 )
...
* REDUCE_AXIS arg with BinaryOps
* more work in kernel.py
fixup sops.gz
* fix TestGraphRewriteEfficiency
2024-08-24 11:28:41 +03:00
chenyu
da5cf11859
fix acc init value for MUL ( #6263 )
2024-08-23 23:19:44 -04:00
George Hotz
26498b322e
add BEAM to external_benchmark_schedule.py
2024-08-23 18:10:46 -07:00
George Hotz
53a73038e3
hotfix: TestGraphRewriteEfficiency.test_create_many_uops
2024-08-23 15:51:57 -07:00
chenyu
590c0922b6
Tensor.prod ( #6250 )
...
* Tensor.prod
a new reduce op!
* onnx ReduceProd
2024-08-23 10:06:32 -04:00
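As a new reduce op, `prod` has the semantics of a multiplicative fold whose accumulator starts at the identity 1 (the earlier "fix acc init value for MUL" entry is about exactly that initial value). A plain-Python sketch of those semantics:

```python
from functools import reduce
import operator

def prod(xs):
  # product reduce: the accumulator must start at the
  # multiplicative identity 1, so an empty reduce yields 1
  return reduce(operator.mul, xs, 1)

assert prod([2, 3, 4]) == 24
assert prod([]) == 1
```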
qazal
78d6bd8b41
start graph rewrite in the scheduler ( #6248 )
...
* start graph rewrite in the scheduler
* test: enable it
* test timings
* only fails in multi reduce
* more isolated tests
2024-08-23 13:15:55 +03:00
George Hotz
238896ca02
loooking into graph rewrite speed ( #6239 )
...
* loooking into graph rewrite speed
* track, replace is slow
* if all same, no permutations [run_process_replay]
* types so compile works
* no implied comprehension
* TRACK_MATCH_STATS=2
2024-08-22 13:17:55 -07:00
chenyu
e745e16441
remove UnaryOps.NEG ( #6238 )
...
* Remove UnaryOps.NEG
generated new dataset with
```
time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
* fix that
2024-08-22 14:21:39 -04:00
nimlgen
6c4ddd6260
hcq skip tests when no multidev ( #6235 )
...
* hcq skip tests when no multidev
* linter
* a bit higher timeout
2024-08-22 18:27:16 +03:00
chenyu
08539f08b0
fix UOp repr with Variable in arg ( #6236 )
2024-08-22 11:06:33 -04:00
chenyu
3fc8203475
remove NEG from handwritten ast in tests ( #6234 )
...
* remove NEG from handwritten ast in tests
* test_linearizer_failures
2024-08-22 09:06:59 -04:00
chenyu
1c5ef5b793
format test_linearizer_failure ( #6231 )
...
made it easier to remove NEG
2024-08-21 21:10:56 -04:00
nimlgen
78c94abe9c
raise time limit for ci in test_profile_multidev_transfer ( #6227 )
2024-08-21 22:42:03 +03:00
gswangg
c74b318458
migrate test_linearizer.py to UOp AST, pt. 2 ( #6228 )
2024-08-21 22:16:11 +03:00
George Hotz
c3168952f0
wip: tracking pattern matcher [run_process_replay] ( #6225 )
...
* wip: tracking pattern matcher
* better
* proper dedup
* timing
* early reject
* mergable match stats
* TrackedPatternMatcher
* fix TrackedPatternMatcher
* cleanups
* clean that too
* remove early_reject
* Revert "remove early_reject"
This reverts commit dc2aef14b8f5da58f5ec9566daf252513cac394c.
* total
* sort by time
* match_stats cleanup
2024-08-21 11:57:26 -07:00
chenyu
a666450e4d
UOp pattern x + x -> x * 2 ( #6224 )
...
* UOp pattern x + x -> x * 2
now there's no NEG, with this it covers all kinds of a*x+b*x
* can remove x-x
2024-08-21 12:06:19 -04:00
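The `x + x -> x * 2` rule is a graph-rewrite on a UOp tree. A minimal stand-in (a hypothetical `Op` node type, not tinygrad's UOp) shows the shape of such a bottom-up rewrite:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
  op: str
  src: tuple = ()
  arg: object = None

def rewrite(u: Op) -> Op:
  # rewrite children first, then try the rule at this node
  u = Op(u.op, tuple(rewrite(s) for s in u.src), u.arg)
  if u.op == "ADD" and len(u.src) == 2 and u.src[0] == u.src[1]:
    return Op("MUL", (u.src[0], Op("CONST", arg=2)))  # x + x -> x * 2
  return u

x = Op("VAR", arg="x")
assert rewrite(Op("ADD", (x, x))) == Op("MUL", (x, Op("CONST", arg=2)))
```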
chenyu
c9a9631818
no UnaryOps.NEG in generated UOp patterns ( #6209 )
...
* no UnaryOps.NEG in generated UOp patterns
removed pattern `x * (-1) -> -x` and `x != True`
* those are fine because NEG became CMPNE and True
* fix sd validation L2 norm
2024-08-21 11:08:22 -04:00
qazal
3b8cc5a3e0
more multireduce tests prep for neg removal [run_process_replay] ( #6220 )
2024-08-21 12:45:24 +03:00
qazal
f03e5a4b3b
test_multireduce const has a shape ( #6218 )
2024-08-21 11:02:45 +03:00
George Hotz
2c42e9c2c6
faster rewrite, no folder in expand/reduce [run_process_replay] ( #6216 )
...
* faster rewrite, no folder in expand/reduce [run_process_replay]
* is removing the expander there okay
* parens
* don't reconstruct exact match uop
* fast do_reduce
* expand pyint
* most of the parents gains with less lines
2024-08-20 23:36:58 -07:00
George Hotz
16f420f7a7
split full_graph_rewrite and linearize_uop [run_process_replay] ( #6215 )
...
* split full_graph_rewrite and linearize_uop
* fix tests
* graph rewrite in test uops
* add types
2024-08-20 20:12:33 -07:00
George Hotz
9faf205601
CIFAR trainer + various bugfixes / improvements ( #6146 )
...
* move cifar into datasets
* support for pathlib Tensors, tar_extract, and fetch gunzip
* too early for Device.DEFAULT
* simpler hlb_cifar + .to(None) is default
* new compiler failure, start beautiful_cifar
* beautiful cifar runs but is broken
* jit train step
* cleaner
* std_mean, not mean_std
* more correct
* fast indexing
* don't print that
* torch load broken
* add eval
* nicer bar
* decorators are the way to do this
* bounds check the load
* a few ops
* batchnorm bugfix, if track_running_stats is False, use online estimate
* full timing
* fix fusion
* unneeded realize
* master tensor
2024-08-20 16:58:46 -07:00
madt2709
4bb98d8882
Fix track_running_stats in batchnorm ( #6200 )
...
* Fix track_running_stats in batchnorm
* Fix linter
* Update test_fold_conv_batchnorm_notrain to keep allowed at 1
* Add test_fold_conv_batchnorm_notrain_no_running_stats
* Save 1 line
2024-08-20 14:01:22 -07:00
George Hotz
a5d79688db
fix indexing out of bounds ( #6208 )
...
* fix indexing out of bounds
* 5 ops per access is fine
2024-08-20 11:34:56 -07:00
chenyu
4451bcaf95
update test_arange test_llama_embedding_opt ( #6207 )
...
non-CI uses a larger embedding, still the same order of magnitude
2024-08-20 13:58:43 -04:00
qazal
074cf780dd
add option to only benchmark schedule [run_process_replay] ( #6204 )
2024-08-20 16:51:27 +03:00
gswangg
0e6f057eae
migrate test_linearizer.py to UOP AST (pt. 1) ( #6150 )
...
* migrate test_multioutput to UOP AST
* inline buf declarations
* migrate test_multireduce to UOp AST
* update test_mid_dim_multireduce to UOp AST
* update test_triple_multireduce with UOp AST
* make global definitions more concise
* update test_double_reduce_multireduce with UOp AST
* update test_multireduce_with_parallel with UOp AST
* update test_multiout_multireduce to UOp AST
* make gidx style consistent across updated tests
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-20 10:02:20 +03:00
chenyu
10330a41c7
add CMPNE tests in test_uops ( #6196 )
...
fixed the output_dtype for CMPNE and match the tests for CMPLT
2024-08-19 19:41:21 -04:00
chenyu
21d6739237
remove UnaryOps.NEG from lazy.py ( #6193 )
...
* remove UnaryOps.NEG from lazy.py
* neg is no longer unary
2024-08-19 18:41:28 -04:00
Gabe Caldwell
bdd6325f31
default num_classes value for one_hot ( #6182 )
...
* num_classes=-1
If num_classes set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.
* num_classes desc
comment to explain num_classes default and what that means.
* replacing ' with `
2024-08-19 12:07:14 -07:00
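The inference rule described above is simple to state in plain Python: with `num_classes=-1`, the class count becomes one more than the largest value in the input. A sketch of the semantics (not tinygrad's tensor-based implementation):

```python
def one_hot(indices, num_classes=-1):
  # num_classes=-1 infers one more than the largest class value
  if num_classes == -1: num_classes = max(indices) + 1
  return [[1 if i == c else 0 for c in range(num_classes)] for i in indices]

assert one_hot([0, 2]) == [[1, 0, 0], [0, 0, 1]]
assert one_hot([1], num_classes=4) == [[0, 1, 0, 0]]
```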
Alessandro Benetti
9328248610
support for std_mean and cross_entropy ( #6181 )
...
* support for std_mean and cross_entropy (#3 )
* Cross entropy and std mean support
* remove extra examples
2024-08-19 12:06:44 -07:00
Max-We
53b20afa3f
Write tar_extract ( #6180 )
...
* Add tar_extract
* Add tar_extract tests
* Fix dtype for initialization from path
* Tests for path initialization
* rm print
---------
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-19 12:06:17 -07:00
Eitan Turok
8556d0c642
Support `gunzip` in `fetch` ( #6176 )
...
* init
* update
* clean
* add type
* clean
* fix import order
* shorten variable names
2024-08-19 12:04:40 -07:00
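Conceptually, a gunzip option on fetch just decompresses the downloaded bytes before handing them back. The stdlib covers both the one-shot and streaming forms; here is a round-trip entirely in memory (illustrating the idea, not tinygrad's `fetch` signature):

```python
import gzip, io

payload = b"hello tinygrad"
compressed = gzip.compress(payload)
# one-shot: decompress the whole body at once
assert gzip.decompress(compressed) == payload
# streaming: wrap a file-like response object
assert gzip.GzipFile(fileobj=io.BytesIO(compressed)).read() == payload
```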
samm393
5d742f7fe3
Missing features from rearrange ( #6184 )
...
* fixes and tests
* typo in test
2024-08-19 11:19:07 -07:00
qazal
2242ff84be
type verify intermediate UOps [run_process_replay] ( #6140 )
...
* type verify intermediate UOps [run_process_replay]
* merge asserts
* variable const
2024-08-19 20:59:01 +03:00
qazal
478145cb8e
lowering error in diff_schedule is fine [run_process_replay] ( #6185 )
2024-08-19 20:51:12 +03:00
chenyu
00578a021b
re:6125 switch real_size to use uops [run_process_replay] ( #6138 )
...
* switch real_size to use uops [run_process_replay]
* enough to pass
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2024-08-19 13:20:24 -04:00
qazal
e28d29641f
more scheduler process replay tooling [run_process_replay] ( #6178 )
2024-08-19 15:35:51 +03:00
chenyu
b36a7273c6
RUF018 assignment-in-assert [run_process_replay] ( #6172 )
...
assertions should not have side effects or `-O` breaks them.
initially just wanted to fix the one in rearrange, but it also made some long lines less long
2024-08-19 00:34:52 -04:00
chenyu
9c60a27ece
lower float64 sin fuzzer threshold ( #6173 )
...
139216373.71875 failed
https://github.com/tinygrad/tinygrad/actions/runs/10446960642/job/28925156240
2024-08-19 00:25:42 -04:00
samm393
fd7c84c1c8
Rearrange ( #6106 )
...
* rearrange and tests
* tidy
* whitespace
* remove line
* -5 lines
* test fix
* static -> instance
* fix () & add more tests
* remove flags
* -1 line
* match einops
* whitespace
* repeated names
2024-08-18 20:22:28 -07:00
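For the pure-permutation subset of einops patterns (no axis grouping or splitting), matching the einops syntax reduces to parsing the pattern into a permutation of input axes. A hypothetical parser for just that subset:

```python
def parse_rearrange(pattern: str):
  """Parse a simple einops-style pattern like 'b c h w -> b h w c'
  into a permutation of input axes (no grouping/splitting)."""
  lhs, rhs = (side.split() for side in pattern.split("->"))
  assert sorted(lhs) == sorted(rhs), "simple patterns must only permute axes"
  return tuple(lhs.index(name) for name in rhs)

assert parse_rearrange("b c h w -> b h w c") == (0, 2, 3, 1)
```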
chenyu
2de174677a
threefry touchup [run_process_replay] ( #6169 )
...
also why is test_gc testing _rng_counter is allocated??
2024-08-18 23:01:24 -04:00
David González Martínez
724e408736
add support for retain_graph in backward ( #6145 )
...
* add support for retain_graph in backward
* fix: dont accumulate grad on non-leaf tensors
* fix order
* fix: do not delete grad on leafs
* fix linter
* fix: can't exactly match torch behaviour internally
* allow numerical room for test
* refactor
2024-08-18 16:08:31 -07:00
wozeparrot
0c5189de25
threefry half ( #6154 )
2024-08-18 15:23:12 -07:00
Timmy
e3d14d1ccc
Lowerer Multireduce Grouping ( #6097 )
...
* grouping changes to codegen
* linters + tests
* fix identical store issue on PTX
* comment in grouping multireduce tests
* cleaning up diff
* cleaning up diff
* comments
* linters
* hotfix: dont change kernels
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-18 19:57:51 +03:00
qazal
1ba83cc7fa
split test_sgd_4convs_fuse [run_process_replay] ( #6158 )
2024-08-18 18:35:42 +03:00
qazal
be6dda4093
hotfix: more lazyop rename to uop [run_process_replay] ( #6157 )
2024-08-18 17:28:44 +03:00
George Hotz
17a043edad
tensor inference ( #6156 )
...
* tensor inference
* test is even better name
2024-08-18 00:19:28 -07:00
chenyu
f7950fc2b6
add E275 missing-whitespace-after-keyword linting rule ( #6149 )
...
requires space after keywords like `assert`, `not`, `return`, `else`
2024-08-17 16:44:34 -04:00
George Hotz
88edc2902d
axis_is_masked with graph_rewrite [run_process_replay] ( #6144 )
2024-08-17 10:28:49 -07:00
qazal
5a266d5d0c
type verify ImageDType and PtrDType [run_process_replay] ( #6137 )
...
* type verify ImageDType and PtrDType [run_process_replay]
* fix tests
2024-08-17 16:37:07 +03:00
qazal
d1d41130cd
use membufs in ImageDType checks [run_process_replay] ( #6136 )
...
* use membufs in ImageDType checks
* set by key [run_process_replay]
2024-08-17 16:17:46 +03:00
qazal
d9ce664350
add test_verify_ast [run_process_replay] ( #6134 )
2024-08-17 14:14:30 +03:00
George Hotz
3a2d724cb2
extra matcher from renderer [run_process_replay] ( #6130 )
...
* extra matcher from renderer
* cache_pm [run_process_replay]
2024-08-16 23:53:11 -07:00
George Hotz
5048066e79
st_arg, never -1 [run_process_replay] ( #6128 )
2024-08-16 22:46:56 -07:00
George Hotz
d9cb45af09
only axis is masked [run_process_replay] ( #6123 )
2024-08-16 21:01:17 -07:00
George Hotz
94aa5f11b5
Revert "use vmax for real_size [run_process_replay] ( #6120 )" ( #6122 )
...
This reverts commit a6e3211444.
2024-08-16 20:33:19 -07:00
George Hotz
a6e3211444
use vmax for real_size [run_process_replay] ( #6120 )
...
* use vmax for real_size [run_process_replay]
* axis is masked
2024-08-16 20:17:23 -07:00
George Hotz
912f01ed4b
UOpGraph -> linearize_uop [run_process_replay] ( #6119 )
2024-08-16 19:48:39 -07:00
George Hotz
89c7989659
no shapetracker in ops [run_process_replay] ( #6117 )
2024-08-16 17:23:27 -07:00
George Hotz
74ee9febec
remove iter from uopgraph ( #6110 )
...
* remove iter from uopgraph
* linearize returns uops
* fix tests
* linearize in linearize
* tests fix
* touchup
* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6
merge uops with ops ( #6111 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
d5e3217076
hotfix: scheduler differ ( #6115 )
...
* hotfix: scheduler differ
* add the test back
* track keys
2024-08-16 23:34:49 +03:00
qazal
c23d44c779
AST is UOp ( #6030 )
...
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12c97e3466679884266b247cf9df46bc.
* fix benchmark
* remove extra dedup
2024-08-16 22:09:00 +03:00
CaltropHungerton
38fb1e14a2
Intel XMX Tensor Core Support ( #5622 )
...
* fixed xmx demo
* i think i'm invoking the DPAS but it's slow
* compiler build arg to stop register spilling, indicated where to fix flop counter
* don't mind this
* do NOT mind me
* do not mind me
* do not view
* i will add bf16 later
* in process of figuring out tc fields
* we figured out the fields!!!
* added check for cl device vendor, added separate IntelRenderer
* remove tc thread_local_aliases
* cleaning debris before draft pr
* edits for linter
* deduping and checking device extensions
* i will find more line reductions in other places
* before merge upstream
* double grf size in compiler to fix register spilling (bandaid), device checking changes
* tc python emulation
* fixed emulation
* tests for emulated intel tensor core
* TC=0, 1 working on upstream, fixed perf
* test
* debris
* check for specialized cl device when we canonicalize device
* bf16 support, tc=3 test added
* address tests
* revert half2 loads on intel tc, cleanup
* linter
* fold_expanded revert
* lint, whitespace fix
* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too
* make line shorter, no need for noqa E501
* removed device intel
* fix python emulation
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
George Hotz
553ae9ebc0
bilinear interp uint8 fails ( #6103 )
...
* new test for e2e compile failures
* fix bug
* bilinear interp uint8 fails
* better tests
2024-08-15 19:34:39 -07:00
George Hotz
c850e03758
new test for e2e compile failures ( #6101 )
...
* new test for e2e compile failures
* fix bug
2024-08-15 18:56:22 -07:00
chenyu
9ef82e1f2b
UOp pattern DEFINE_VAR with min==max is also CONST ( #6095 )
...
* UOp pattern DEFINE_VAR with min==max is also CONST
* fix tests
2024-08-15 12:09:44 -04:00
qazal
4d38fec8c1
rename lazyops to parents [run_process_replay] ( #6091 )
2024-08-15 17:27:32 +03:00
chenyu
5accfe26a0
rewrite bool ADD to OR and MUL to AND ( #6084 )
...
* rewrite bool ADD to OR and MUL to AND
fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.
only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure
* fold those, and fix tests
* only for bool
* move dtypes.bool
2024-08-15 10:11:57 -04:00
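The rewrite above rests on a boolean identity: once the result is viewed as a bool again, `+` behaves like OR and `*` like AND. Checked exhaustively in plain Python:

```python
# on booleans, ADD acts as OR and MUL as AND -- the identity the rewrite uses
for a in (False, True):
  for b in (False, True):
    assert bool(a + b) == (a or b)
    assert bool(a * b) == (a and b)
```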
chenyu
df03dca6e3
move % inside UOp mod_folding and remove deprecated tests ( #6085 )
...
[run_process_replay]
2024-08-14 23:25:10 -04:00
qazal
2bf7b56485
minor test fixups from the AST is UOp diff ( #6081 )
...
* add assert_equiv_uops cache
* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00
George Hotz
64563abc90
add LSTMCell to nn ( #6080 )
...
* add LSTMCell to nn
* lstmcell works with no input on first
* fix no bias 0
* simpler
2024-08-14 12:08:42 -07:00
chenyu
6b3112d525
fix qcom process_replay for kernel diff ( #6079 )
...
* debug why qcom process_replay does not run
skipping the wrong exception?
* um-hum
* get_step_times was parsed incorrectly
* cleanup
2024-08-14 15:05:49 -04:00
chenyu
2fe9d62451
increase test_recursive_add time from 1s to 2s ( #6078 )
...
flaky https://github.com/chenyuxyz/tinygrad/actions/runs/10392144818/job/28776666700
2024-08-14 13:52:02 -04:00
samm393
2dc586ffe5
Shape change bitcast for more dtypes ( #6047 )
...
* bitcast & tests
* use to_dtype
* put disk tensor tests back
* tests
* bitmask
* no bitmask
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-14 10:03:34 -07:00
qazal
83a2543c74
spec for in order LOAD/STORE indexing ( #6073 )
...
* test_unaligns_idxs
* spec for in order LOAD/STORE indexing
* test UOps.SPECIAL
* check for supports_float4
2024-08-14 19:18:00 +03:00
chenyu
5048f9a4d5
test linearizer failure 49 ( #6074 )
...
with UOP_IS_SYMBOLIC=1 on METAL, it breaks store fusion and leaves A+B and B+A as two different UOps
2024-08-14 11:29:10 -04:00
qazal
30035df5a4
add metal process replay back ( #6068 )
...
test this new one
2024-08-14 12:29:56 +03:00
chenyu
1782e4f64d
use div folding to do lt folding ( #6065 )
2024-08-13 16:59:05 -04:00
chenyu
e3af273fa1
touchup cl_errors ( #6058 )
...
* touchup cl_errors
* update test
2024-08-13 13:06:59 -04:00
qazal
9145ad52ff
revert UOps eq, this needs to be isolated in realize.py ( #6063 )
...
This reverts commit dccca7f227.
2024-08-13 18:02:34 +03:00
Tobias Fischer
6e3eb50fd1
added fix and reg tests ( #6060 )
2024-08-12 21:00:48 -04:00
qazal
dccca7f227
test: uop and lazyop have the same compare ( #6053 )
...
* test: uop and lazyop have the same compare
* typings
* self.assert_equiv_uops -> assertEqual
* hash dtype
* test nop too
* TestPatternMatcher never used this compare anyway
* nop eq and ne tests
2024-08-13 00:33:19 +03:00
chenyu
3f2d24a6ec
test_failure_48 for wrong truncation in idx on NV ( #6055 )
...
also added `RAWAST` to print pre-modified AST in DEBUG=3
2024-08-12 16:17:42 -04:00
chenyu
6ed9711898
UOps pattern (x%c)+(x//c)*c = x ( #6051 )
...
pretty cool that this is very easy to write now
2024-08-12 14:58:48 -04:00
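The pattern is exactly Python's divmod invariant, `x == (x // c) * c + x % c`, which holds for any nonzero `c`, negatives included. A quick exhaustive check:

```python
# the identity folded by the UOp pattern: (x % c) + (x // c) * c == x
for x in range(-50, 50):
  for c in (1, 2, 3, 7, -4):
    assert (x % c) + (x // c) * c == x
```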
ignaciosica
777d6b3349
Fix compile error for max with inline const ( #5840 )
2024-08-12 23:40:39 +08:00
ignaciosica
164ca5632e
split tensor core tests ( #6041 )
2024-08-12 09:42:02 -04:00
chenyu
7ce716b3a0
bigint -> pyint [run_process_replay] ( #6040 )
...
it's a python int. priority should be higher than bool, but we are not using it in type promo now.
2024-08-12 09:12:23 -04:00
Timmy
a00994b423
Lowerer Multireduce Uopgraph ( #6007 )
...
* uopgraph changes
* fixing for non-reducing ranges
* multireduce tests
* linters
* linters
* removing comments
* removing arg[1]
* linters
* prettier
* linters
* more linters
* use any instead of intersection
2024-08-12 15:16:07 +03:00
qazal
7d1f118731
use assertIs in test_schedule ( #6035 )
...
* use self.assertIs in test_schedule
* test_lazybuffer
2024-08-11 19:19:18 +03:00
qazal
b918e3c255
cache assert_equiv_uops ( #6033 )
2024-08-11 12:17:05 +03:00
George Hotz
1b3443902c
don't use tgmath with clang ( #6029 )
...
* don't use tgmath with clang
* fix tests
* nostdlib for clang
* needs ffreestanding on OSX
2024-08-10 13:58:19 -07:00
chenyu
5820940d98
more relax rtol for test_arange_fuse_grouped_children ( #6027 )
...
one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462
2024-08-10 16:10:03 -04:00
chenyu
10374a2741
relax rtol for test_arange_fuse_grouped_children ( #6026 )
...
flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023
2024-08-10 15:49:11 -04:00
George Hotz
cf7d3c1eb8
fix tests locally on metal ( #6025 )
...
* remove contiguous child, it was breaking tests locally
* hmm, it's still needed
* include NOOPT in method cache key
2024-08-10 12:36:22 -07:00
chenyu
e6c7c3e499
update pylint path to check indent/space for all ( #6022 )
...
also fixed many errors. it was not checking nested dirs. exclude autogen for now.
can we use ruff for this?
2024-08-10 14:41:09 -04:00
George Hotz
cfb04c67d1
run unit tests separate from others (and only once) ( #6020 )
...
* run unit tests separate from others
* ignore unit tests elsewhere
2024-08-10 11:17:56 -07:00
uuuvn
ee3b015407
ELF loader strtab fix and tests ( #6011 )
...
* ELF loader strtab fix and tests
* ruff
* typos
* only one test
2024-08-10 10:13:16 -07:00
Jun Zhang
54e176fb4f
Ignore non-computational backends when overwriting the default ( #5770 )
2024-08-10 09:23:29 -07:00
qazal
3ef2788c4f
hotfix: run the entire test_conv_bw schedule ( #6014 )
2024-08-10 17:55:41 +03:00
qazal
0e62076cf5
more process replay cleanups ( #6013 )
...
* more process replay cleanups
* comma benchmark missing
2024-08-10 17:29:10 +03:00
chenyu
63a8bc29d4
addition divisor in UOp div_folding ( #6002 )
...
in addition to trying the gcd of all terms, also try the least common divisor of all MULs
2024-08-09 20:09:05 -04:00
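The gcd side of div_folding can be sketched symbolically: if every coefficient in a sum shares a common factor with the divisor, each term divides through exactly. A toy example on a dict of coefficients (hypothetical representation, not tinygrad's UOp form):

```python
import math
from functools import reduce

terms = {"x": 4, "y": 8}  # represents 4*x + 8*y
divisor = 4
# common factor of the divisor and every coefficient
g = reduce(math.gcd, terms.values(), divisor)
assert g == 4
# (4*x + 8*y) // 4 folds to 1*x + 2*y
folded = {k: v // g for k, v in terms.items()}
assert folded == {"x": 1, "y": 2}
```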
chenyu
5961faa4be
minor change to UOp div_fold ( #6004 )
...
remove an unnecessary gcd and swap the quo rem order, minimize diff for divisor pr
2024-08-09 17:09:59 -04:00
qazal
7373b05ee8
assert conv bw reduceops merge [compare_schedule] ( #6001 )
...
* assert conv bw reduceops merge [compare_schedule]
* diff with ref_commit_hash
2024-08-09 19:29:56 +03:00
qazal
b67d521a07
assert test_conv_bw correctness ( #6000 )
...
* assert test_conv_bw correctness
* reorder half
* metal and clang still red
2024-08-09 18:30:36 +03:00
qazal
a833f1a735
scheduler process replay with [compare_schedule] ( #5997 )
2024-08-09 16:58:22 +03:00
qazal
24c7c41ce0
diff LazyBuffer schedules in process replay ( #5996 )
...
* start diff printing
* this should be 2
* add to process_replay.py
* enable schedule capture
* arange diff is process replay
2024-08-09 14:16:43 +03:00
chenyu
1f1eb46af6
more failed simplified UOp div test case ( #5992 )
...
this speculative div was handled by "divisor" in symbolic.
2024-08-08 18:39:25 -04:00
chenyu
c3e1ae2535
add failed simplified UOp div test case ( #5990 )
...
more cases!
2024-08-08 17:37:48 -04:00
nimlgen
38d5eecc68
hcq profiler support args ( #5989 )
...
* hcq profiler support args
* bytes -> _bytes
* fix
* add test
* mypy
* not f strings
* precision
2024-08-09 00:18:36 +03:00
qazal
45b1761175
smaller test_llama_embedding + assert correctness ( #5986 )
...
* smaller test_llama_embedding in CI
* test correctness
2024-08-08 22:11:29 +03:00
Timmy
8c99bdab08
More Multireduce Tests ( #5968 )
...
* multireduce tests
* linters
* more linters
* more linters
* seeing how it works with parallel
2024-08-08 22:04:08 +03:00
gswangg
df44a4e861
Make vectorization of CONST explicit ( #5322 )
...
* remove test_const_vectorize_fold
* remove const folding UPat for VECTORIZE
* refactor cstyle render_const
* remove calls to dtype.scalar() in render_const
* add assert
* add vectorized const to UOp.const
* add UPat GEP-VECTORIZE-CONST -> CONST
* render_vectorize for DEFINE_ACC in cstyle
* add back missing render_cast in render_const
* generate vectorized consts as UOps for DEFINE_ACC
* update asserts for DEFINE_ACC with VECTORIZE src
* add UPats for PHI with VECTORIZE src
* use prev rendered vectorize in DEFINE_ACC render
* update DEFINE_ACC in python runtime
* update vectorized DEFINE_ACC in PTXRenderer
* rebase DEFINE_ACC changes on lowerer
* verbose rewrite of bad UPats
* simplify UOps.CONST implementation in ops_python
* update sum_collapse UPats for DEFINE_ACC-VECTORIZE
* revert linearizer to TOT
* fix DEFINE_ACC implementation in ops_python
* simplify DEFINE_ACC in cstyle
* Fix linter error
* support VECTORIZE in fold gated load/store UPat
* support VECTORIZE in other fold gated load UPats
* rewrite VECTORIZE in UPat for no input DEFINE_ACC
* simplify DEFINE_ACC render in cstyle
* make VECTORIZE rules more concise
* add more vectorize fold tests
* inline VECTORIZE-CONSTs in cstyle render
* revert VECTORIZE/GEP rule refactor
* revert cstyle render_const refactor
* inline VECTORIZE-CONSTs in cstyle render
* implicitly vectorized const rendering -> explicit
* WMMA VECTORIZE CONST process replay hacks
* VECTORIZE CONST NAN process_replay hacks
* more VECTORIZE CONST NAN hacks
* cleanup process_replay hacks
* isnan() -> not isfinite() cstyle VECTORIZE CONST
* tweak isnan and isfinite checks VECTORIZE CONST
* tweak for positive vs negative infinity VECTORIZE CONST
* add assert to PTX CONST render
* process_replay VECTORIZE CONST render parity for PTX STORE
* vmin/vmax for VECTORIZE'd CONST
* update WMMA folding rules
* add tests for WMMA VECTORIZE fold
* hack for cstyle half4 CONST zero process_replay parity
* revert PTX backend changes
* add back minimal DEFINE_ACC PTX change
* remove cstyle process_replay hacks
* remove dead code in PTX CONST render
* cleanup vmin/vmax logic for VECTORIZE'd CONSTs
* update vectorize fold tests to use DEFINE_VAR
* fix long line formatting in test
* remove unwanted merge artifact
* more vmin/vmax cleanup
* remove unnecessary asserts
* yet more vmin/vmax cleanup
* get rid of explicit VECTORIZE CONST logic in _min_max
* reuse CONST instead of creating a new one
* remove unneeded cast
* handle DType correctly in sconst
* improve readability of tests
* save a line
* save another line
* tuplize pats in src
* remove GEP-VECTORIZE pats
* add vec +0 fold
* HACK: fold only vec8 +0
* remove vectorized ALU fold hack
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-08 20:59:05 +03:00
chenyu
62c77a2831
trim const in UOp div_folding ( #5982 )
...
simplify `(4*x+4*y+7)//16` to `(x+y+1)//4`.
fixed `GPU=1 UOP_IS_SYMBOLIC=1 IMAGE=2 python -m pytest test/test_ops.py -k conv`
2024-08-08 12:49:05 -04:00
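The trim-const folding above can be spot-checked in plain Python. This is a minimal sketch of the idea, not tinygrad's actual `div_folding` code: trim the constant down to the nearest multiple of the gcd of the coefficients and divisor, then divide everything through by that gcd.

```python
from math import gcd

def fold_div(coeffs, const, divisor):
    # sketch: (sum(c*x_i) + const) // divisor with g = gcd(coeffs, divisor).
    # trimming const down to a multiple of g is safe because the dropped
    # remainder (< g) can never push the numerator across a multiple of divisor.
    g = gcd(*coeffs, divisor)
    trimmed = const - const % g
    return [c // g for c in coeffs], trimmed // g, divisor // g

# (4*x + 4*y + 7) // 16  ->  (x + y + 1) // 4
assert fold_div([4, 4], 7, 16) == ([1, 1], 1, 4)

# numeric spot-check of the identity from the commit message
for x in range(50):
    for y in range(50):
        assert (4*x + 4*y + 7) // 16 == (x + y + 1) // 4
```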
qazal
e6d41b0ce7
hotfix: adjust test_backward_pass_diamond_model thresholds ( #5981 )
2024-08-09 00:20:53 +08:00
nimlgen
183c4c91a3
fix non-jitted transfers in profile ( #5980 )
...
* fix transfers in profile
* fix linter
* sync to be sure everything is recorded
2024-08-08 17:58:08 +03:00
George Hotz
c5baa3d66b
hotfix: don't run OOM test in CI
2024-08-07 22:19:29 -07:00
chenyu
859d0e4709
UOp simplify `(x+c0)*c1 -> x*c1+c0*c1` ( #5973 )
2024-08-07 21:25:22 -04:00
wozeparrot
97d708252a
remove realize from threefry ( #5969 )
2024-08-07 15:08:49 -07:00
George Hotz
bf8ec23b00
hotfix: contiguous on precompute_freqs_cis
2024-08-07 14:40:56 -07:00
nimlgen
8d8704af2d
fix amd exec_update for locals ( #5966 )
2024-08-07 21:02:56 +03:00
tyoc213
0c4e9dbe71
retrieve defined opencl error codes ( #5792 )
2024-08-07 10:46:24 -07:00
qazal
d6f4a61c42
graph LBScheduleItem [run_process_replay] ( #5960 )
...
* add toposort key to LBScheduleItem
* use dedup
* graph LBScheduleItem
* make that comment beautiful again
* diff_schedule utils
* update fuzz_schedule
2024-08-07 19:59:11 +03:00
qazal
7677361d90
test pushing through different expands in 1 kernel ( #5963 )
...
* test pushing through different expands in 1 kernel
* realize eye
* back to test_example_matmul
2024-08-07 19:33:18 +03:00
qazal
39dda3d042
rename prescheduled items to lsi [run_process_replay] ( #5959 )
...
* rename to lsi
* fuzz_schedule more typings
* rename fuzz_schedule
2024-08-07 14:31:50 +03:00
qazal
728b7e189e
diff_schedule tests [run_process_replay] ( #5958 )
...
* diff_schedule tests [run_process_replay]
* ok to run serial
2024-08-07 13:50:27 +03:00
chenyu
a7163b80d8
lower test_transcendental fuzz test threshold for sin float64 ( #5956 )
2024-08-07 02:04:37 -04:00
chenyu
fa3a36e576
fancier UOp div gcd folding ( #5953 )
...
combine and cancel the remaining const based on the gcd of the other terms, like SumNode.
2024-08-07 02:04:25 -04:00
chenyu
aa7fd7ef74
Use `(-self).lt(-x+1)` for `UOp.ge` ( #5955 )
...
matched symbolic and fixed UOP_IS_SYMBOLIC=1 arange folding
2024-08-07 01:31:27 -04:00
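The `ge` rewrite above relies on an integer identity that is easy to spot-check: for integers, `a >= b` is equivalent to `-a < -b + 1`. A quick standalone check (a sketch of the identity, not tinygrad code):

```python
# For integers: a >= b  <=>  -a <= -b  <=>  -a < (-b) + 1.
# This is the identity behind implementing UOp.ge as (-self).lt(-x+1).
for a in range(-10, 11):
    for b in range(-10, 11):
        assert (a >= b) == (-a < -b + 1)
```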
George Hotz
658d58784b
embedding doesn't cast ( #5952 )
...
* embedding doesn't cast
* test the right thing
* too much annoying with that test
2024-08-06 17:49:14 -07:00
wozeparrot
30d0cb2a82
fix: fix transcendental flakyness on exp float with 9.96875 ( #5951 )
2024-08-06 17:32:13 -07:00
George Hotz
3a0515ea22
hotfix: process_replay/diff_schedule.py to LBScheduleItem
2024-08-06 17:01:05 -07:00
chenyu
aee737bd9e
divide by gcd in UOp div folding ( #5949 )
...
* divide by gcd in UOp div folding
`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version
* only factor out const
* don't apply for unsigned
* don't need that if
* space
2024-08-06 20:00:57 -04:00
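The gcd division above is exact as rationals, so the floor is unchanged. A minimal sketch of the rule (not tinygrad's actual implementation), using the `(6x+6y)//16 -> (3x+3y)//8` example from the commit:

```python
from math import gcd

def div_by_gcd(coeffs, divisor):
    # divide numerator coefficients and divisor by their common gcd;
    # e.g. g = gcd(6, 6, 16) = 2 turns (6x+6y)//16 into (3x+3y)//8
    g = gcd(*coeffs, divisor)
    return [c // g for c in coeffs], divisor // g

assert div_by_gcd([6, 6], 16) == ([3, 3], 8)
for x in range(100):
    for y in range(100):
        assert (6*x + 6*y) // 16 == (3*x + 3*y) // 8
```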
George Hotz
6d1fdcfce2
don't reduce the same thing in a vector ( #5950 )
...
* don't reduce the same thing over and over
* cleaner way to write it that doesn't loop
2024-08-06 16:59:15 -07:00
qazal
d5d7f4e7b8
more TestIndexing correctness asserts [run_process_replay] ( #5948 )
...
* use torch in test_mnist_val
* more asserts
2024-08-07 01:50:42 +03:00
chenyu
794796256c
UOp.const_factor [run_process_replay] ( #5945 )
...
* UOp.const_factor [run_process_replay]
simplify mod and div folding
* test does not work now
2024-08-06 18:18:29 -04:00
George Hotz
73d4d51845
add LBScheduleItem type [run_process_replay] ( #5944 )
...
* add LBScheduleItem type [run_process_replay]
* minor cleanups
* fix
* fix fuzz tests
* add group cache type
2024-08-06 14:49:40 -07:00
qazal
7b6496f2e6
fix the reduceops cache breaking beautiful_mnist ( #5938 )
...
* fix the reduceops cache breaking beautiful_mnist
* test_sparse_categorical_crossentropy_simple
* starting tests
* atol from test_nn
* test_sparse_categorical_crossentropy_alt
* dont use torch
2024-08-07 00:02:54 +03:00
George Hotz
1417cc8df1
can reenable that test now ( #5914 )
2024-08-06 13:38:21 -07:00
chenyu
489575c3be
more UOp sum div with gcd tests ( #5936 )
...
* more UOp sum div with gcd tests
* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8
Float4 support for CLANG ( #5915 )
...
* float4 support on clang
* skip linearizer tests that require locals
* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9
show timings for DIFF_ARANGE=1 ( #5935 )
...
* show timings for DIFF_ARANGE=1
* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b
diff fused arange schedules with ARANGE_DIFF=1 ( #5934 )
...
* diff fused arange schedules with ARANGE_DIFF=1
* better llama diff
2024-08-06 16:52:26 +03:00
qazal
3d4742dd2e
override output shape in fused assign ( #5930 )
...
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
2024-08-06 13:28:50 +03:00
chenyu
09b7722637
UOp generic div folding ( #5896 )
2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d
test arange with all opts ( #5923 )
...
* test arange with all opts
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
5d17f54e3c
fast mnist indexing ( #5921 )
...
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494
make the arange test check correctness [run_process_replay] ( #5920 )
2024-08-05 13:41:06 -07:00
George Hotz
8d1c884e78
capture the const pattern in both directions ( #5919 )
...
* capture the const pattern in both directions
* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
42f599870c
unroll arange is broken ( #5918 )
...
* unroll arange is broken
* fix unrolled arange
* one more test
2024-08-05 12:15:07 -07:00
qazal
70949ea7e6
test cstyle compile error for max with inline const ( #5838 )
...
* test_failure_46
* GPU=1 fails too
* add test_renderer
* add failing platforms
* nv too
* assert return value
2024-08-05 19:02:16 +03:00
qazal
e0c6520138
check arange fusing with VIEW and COPY ( #5912 )
...
* check arange fusing with VIEW and COPY
* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34
hcq copy queue is optional ( #5909 )
...
* hcq copy queue is optional
* one more
* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b
remove unused reduce rules + improve unparented ( #5908 )
...
* remove unused reduce rules [run_process_replay]
* this work
* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf
remove useless reduce cases [run_process_replay] ( #5907 )
...
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba7061f8697a308aacdc3442fa922a77f5.
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
George Hotz
be8958e26b
use CONTRACT before REDUCE ( #5903 )
...
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
2024-08-04 16:17:33 -07:00
chenyu
4a65010de8
remove CUDACPU flag in tests [run_process_replay] ( #5902 )
...
no longer used
2024-08-04 16:06:38 -04:00
qazal
aad9234e52
test fused precompute_freqs_cis ( #5900 )
...
* test_precompute_freqs_cis
* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7
support using str to specify dtype ( #5897 )
...
* support using str to specify dtype
in Tensor creation and args into `cast` and `bitcast`, and acc_dtype
* more tests
2024-08-04 12:56:28 -04:00
qazal
4c5ef2cc4f
setitem with arange fusion 1 ( #5898 )
2024-08-04 16:09:21 +03:00
chenyu
da61dea1b2
simple failed UOp sub symbolic test case ( #5894 )
2024-08-03 14:27:23 -04:00
qazal
56ef9e453e
pad reduceops to the max of each dimension ( #5889 )
...
* early verify
* pad reduceops to the max of each dim
* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a
indexing fusion 2 ( #5888 )
...
* arange fusion
* kernels that fuse
* tests
2024-08-03 13:13:39 +03:00
qazal
af59b2eea9
tests from the indexing fusion branch ( #5886 )
2024-08-03 11:56:48 +03:00
chenyu
d5de44340e
UOp add mod folding ( #5862 )
...
* UOp add mod folding
* that passes now
2024-08-02 18:31:46 -04:00
chenyu
41bbd3f4c1
update UOp mod reduction patterns ( #5883 )
...
prepare generic mod folding, also some test changes from mod folding pr
2024-08-02 17:43:40 -04:00
wozeparrot
acadccf344
comma benchmark ( #5518 )
2024-08-02 14:36:54 -07:00
Elias Wahl
4a114756f6
New BERT dataloader ( #5881 )
...
* One file == One topic
* update test
* new dataloader
* update train script
* get index is faster
2024-08-02 15:12:23 -04:00
nimlgen
2777784b91
add dependency viewer to hcq profiler ( #5874 )
...
* hcq profiler support deps
* clean up
* cleaner
* cleanup
* revert this
* linter
* mypy
* add test
* sync is strange, need to take the end
* linter + test
2024-08-02 22:07:01 +03:00
George Hotz
23e8c39288
get program fields in __post_init__ [run_process_replay] ( #5878 )
...
* get program fields in __post_init__ [run_process_replay]
* remove print
2024-08-02 09:57:12 -07:00
qazal
8611fa6c99
apply opts.extra_matcher in process replay [run_process_replay] ( #5877 )
2024-08-02 18:07:58 +03:00
qazal
2a791f7924
fuzz uops is simpler with List[UOp] [run_process_replay] ( #5875 )
...
* remove from fuzz_uops
* update fuzz_uops.py
* add to realize.py
2024-08-02 17:28:15 +03:00
George Hotz
877e0b4ba0
define global only has the index [run_process_replay] ( #5869 )
...
* define global only has the index [run_process_replay]
* fix that linearizer test
* fix ptx
* stupid ptx fix
2024-08-01 19:01:15 -07:00
chenyu
f27f949a5d
Revert "revert some UOp IDIV bound ( #5863 )" ( #5871 )
...
This reverts commit 0c8d202348.
2024-08-01 21:38:31 -04:00
chenyu
df138bc558
Revert "revert a mod pattern ( #5864 )" ( #5870 )
...
This reverts commit 5c8de2d044.
2024-08-01 20:44:26 -04:00
chenyu
1b0314d9ef
Revert "remove one more UOp mod pattern ( #5865 )" ( #5868 )
...
This reverts commit b03b8e18c2.
2024-08-01 20:28:35 -04:00
George Hotz
d73bc85ba9
UOpGraph not in renderer or Program [run_process_replay] ( #5867 )
...
* UOpGraph not in renderer or Program [run_process_replay]
* fix some tests
* fix ptx
2024-08-01 16:20:30 -07:00
chenyu
b392b8edc3
increase atol and rtol test_gemm_fp16 ( #5866 )
...
* increase atol and rtol test_gemm_fp16
made it pass with NOOPT which has larger accumulated error
* revert that
2024-08-01 19:09:58 -04:00
chenyu
b03b8e18c2
remove one more UOp mod pattern ( #5865 )
...
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu
5c8de2d044
revert a mod pattern ( #5864 )
...
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
George Hotz
2d3c7e4d4e
some TestPickleJIT tests ( #5860 )
...
* some TestPickleJIT tests
* hotfix: print which opencl device we are using
2024-08-01 12:39:59 -07:00
chenyu
0c8d202348
revert some UOp IDIV bound ( #5863 )
...
* revert some UOp IDIV bound
breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI
* those are correct
* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz
53fcac9e80
hotfix: increase time on flaky NV test
2024-08-01 10:20:07 -07:00
qazal
26d0265d66
test schedule of LazyBuffers [run_process_replay] ( #5859 )
2024-08-01 19:06:29 +03:00
David Hou
eb91423cb4
MLB support reshape for uneven shards ( #5804 )
...
* cleaner uneven reshape
* update test
2024-08-01 02:36:03 -07:00
David González Martínez
0f09b94c43
add failing test for second order derivatives ( #5772 )
...
* add failing test
* fix lint
* fix bad merge
* fix again
* fix test
* more minimal
2024-08-01 02:34:47 -07:00
George Hotz
9d05dfb6f4
move JIT graphing into CapturedJit ( #5852 )
...
* move JIT graphing into CapturedJit
* better
* _jit_cache
* clear inputs cleanup
* test_pickle_jit with graph + cleanup
* 0 is fine to start
* support None in bufs
* alloc real buffers
* cleaner
2024-07-31 20:48:17 -07:00
chenyu
0ec732b494
test lin fail 47 for UOP_IS_SYMBOLIC ( #5853 )
...
failed arange example with UOP_IS_SYMBOLIC
2024-07-31 23:09:22 -04:00
George Hotz
c6a8395f1b
CapturedJit is fun to pickle [run_process_replay] ( #5851 )
...
* CapturedJit is fun to pickle
* export input replace
2024-07-31 17:23:01 -07:00
George Hotz
72621d9e7c
count the specials in uops [run_process_replay] ( #5848 )
...
* count the specials in uops [run_process_replay]
* cleanups
2024-07-31 14:53:18 -07:00
chenyu
c2ffcf6887
remove the wrong mod UOp pattern ( #5847 )
...
don't think we are hitting it because of the stride construction, and it's wrong and not needed
2024-07-31 16:24:25 -04:00
qazal
8174c438a3
pad test_failure_45 ( #5846 )
2024-07-31 23:08:48 +03:00
George Hotz
8672a9db3f
add test to validate lazyops dims ( #5845 )
2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568
fix UOp ALU bound ( #5844 )
...
* fix UOp ALU bound
root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized
* it can be nan...
2024-07-31 15:19:31 -04:00
nimlgen
f768935be8
add RING_ALLREDUCE_THRESHOLD ( #5835 )
...
* add RING_ALLREDUCE_THRESHOLD
* benchmark
* fixes
* fix n_gpus
* unused import
* remove debug=2
2024-07-31 16:13:09 +03:00
chenyu
2e087ca8e4
UOp bound for div negative number ( #5808 )
2024-07-31 02:10:23 -04:00
qazal
bcbd925001
hcopts failing test for fused arange kernel ( #5815 )
...
* add failure_43
* n 45
2024-07-31 09:02:44 +03:00
qazal
ed556c260e
UOps.IF rules more tests ( #5831 )
...
* init tests
* split tests
* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
David Hou
492a696d14
allow specify splits in shard, handle multiple different splits in MLB.e ( #5599 )
...
* allow specify splits in shard, handle multiple different splits in MLB.e
* line width
* linter
* don't use Device in docstring
* specify size of shards instead of boundaries
* adjust docstring for specify size of shards instead of boundaries
* don't allow splits on symbolic axis?
* just allow sint in splits_to_bounds
* add message for assert
* bounds instead of splits to save lines
* fix types
* reduce diff
* fix
* tuple
* golf :(
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
chenyu
c3da458bc3
UOp if min==max folds to CONST ( #5828 )
...
* UOp if min==max folds to CONST
* fix test
2024-07-30 22:14:22 -04:00
George Hotz
e6879035a0
work to make GEMV fast ( #5824 )
...
* work to make GEMV fast
* half8 cast
* align struct
* fix amd
* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
02f0be03f2
tests on UOp div negative number and arange opts ( #5825 )
2024-07-30 20:06:57 -04:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
qazal
03d866b84f
UOps.IF with rewrite rules ( #5812 )
...
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0
unify negative shape creation to raise ValueError ( #5817 )
...
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a
Add check for negative dimension in view ( #5790 )
...
* add check for negative dimension in view
* add negative dim tests
* move check to tensor level
* fix error message
* move check to view create
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
Francis Lata
ce61be16f1
clean up how preprocessed folder is defined ( #5813 )
2024-07-30 12:35:26 -04:00
qazal
5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures ( #5553 )
...
* skips
* opts.device
* benchmarks
* add to test_linearizer_failures
* remove hardcoded ones
* linter
* skip cpu
2024-07-30 00:37:32 +03:00
samm393
573e0f9a48
remove float division from idiv in python_alu ( #5777 )
...
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
samm393
2c94316bd2
ull literal support and test ( #5789 )
...
* ull literal support and test
* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers ( #5767 )
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
e7a14f398e
more uop_symbolic tests for divmod pairs ( #5785 )
2024-07-28 21:27:06 -04:00
George Hotz
76d191ab94
move consts to end of add ( #5783 )
...
* move consts to end of add
* better
* fix infinite loop
2024-07-28 17:38:57 -07:00
chenyu
71a64d8252
UOps.MUL bound when one is negative ( #5781 )
...
* UOps.MUL bound when one is negative
also one more distribute_mul rule
* don't always expand
2024-07-28 19:02:47 -04:00
qazal
b775db6b60
high-level benchmark timing diff ( #5776 )
...
* high level timings
benchmark times
fix defs
* use the name map
* skip last task
2024-07-28 23:42:57 +03:00
chenyu
600a39771d
fix Tensor.arange if (stop-start) and step have different signs ( #5775 )
2024-07-28 14:34:10 -04:00
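The arange fix above comes down to the element-count rule: when `(stop - start)` and `step` have different signs the range must be empty. A hedged sketch of that rule (the function name and formula here are illustrative, not tinygrad's actual code):

```python
import math

def arange_len(start, stop, step):
    # standard arange element count: ceil((stop-start)/step), clamped at 0.
    # when (stop-start) and step have different signs the quotient is
    # negative, so the max() yields an empty range.
    return max(0, math.ceil((stop - start) / step))

assert arange_len(0, 10, 2) == 5
assert arange_len(10, 0, 2) == 0    # different signs -> empty
assert arange_len(0, 10, -2) == 0   # different signs -> empty
assert arange_len(10, 0, -3) == 4   # 10, 7, 4, 1
```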
David González Martínez
d0fd84e617
feat: allow passing gradient to .backward() to compute vjp ( #5771 )
...
* feat: allow passing gradient to .backward() to compute vjp
* fix
* refactor
* fix trailing whitespace
2024-07-28 11:13:18 -07:00
qazal
e0e7293b0a
make process replay unique in retries [run_process_replay] ( #5773 )
2024-07-28 20:44:15 +03:00
qazal
95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] ( #5760 )
...
* merge vectorize/gep rules [run_process_replay]
* assert dtypes
* src=
* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu
bfbd7c5461
more generic UOp mul mod folding ( #5765 )
2024-07-27 20:20:35 -04:00
chenyu
80c6475757
update test_uop_symbolic to test UOp min and max ( #5764 )
...
covers #5750 , #5748 , #5741
2024-07-27 19:53:21 -04:00
nimlgen
ed1d784077
test profiler timer sync across devs ( #5751 )
...
* test profiler timer sync across devs
* more correct
* typo
2024-07-27 16:47:37 +03:00
qazal
3e49d86c01
process replay diffs 3 things now ( #5731 )
...
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
qazal
57b4a8e98d
assert process replay asserts ( #5737 )
...
* assert process replay asserts
* one ci job is fine
* test: Revert "separate process replay main loop (#5734 )"
This reverts commit 94d578396f.
* mac sed needs that
* Revert "test: Revert "separate process replay main loop (#5734 )""
This reverts commit e4ad7684d5472a64841a66b43bc1db7c9bbbf9e8.
* disable process replay capture
* save time
* amd is tiny
* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz
f8972ace38
test flops (and allow wide ALU in UOps) [run_process_replay] ( #5749 )
...
* flops test in external_test_speed_theoretical.py
* test speed theo
* min SZMAX
* allow wide ALU for things that support it
* needed for mypy
2024-07-26 21:07:28 -07:00
George Hotz
2fde2d2914
hotfix: external_test_speed_theoretical works on 24GB
2024-07-26 18:41:52 -07:00
George Hotz
829262a5ee
add external_test_speed_theoretical
2024-07-26 17:45:22 -07:00
kormann
a5ede535ef
NOp field name [run_process_replay] ( #5742 )
...
* rm def name
* add field name
2024-07-26 18:45:59 -04:00
George Hotz
c50e374bb6
multiple locals + get_kernel_modifier + fix valid ( #5739 )
...
* multiple locals + get_kernel_modifier + fix valid
* fix test pattern matcher
2024-07-26 15:10:10 -07:00
chenyu
dc7483ee6f
UOp simple div folding ( #5740 )
...
made UOp.divides return the Optional[quotient] and used it for simple div folding
2024-07-26 17:14:32 -04:00
chenyu
671259417f
reuse UOp `__repr__` for NOp ( #5738 )
2024-07-26 16:59:55 -04:00
kormann
b0c1dba299
named UOp class "NOP" [run_process_replay] ( #5728 )
...
* NOP
* fix const + simplify compile
* rm VAR for NOOP
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz
4df46eac67
clean up tensor cores [run_process_replay] ( #5736 )
...
* clean up tensor cores [run_process_replay]
* remove tuple(wmma_sz), self.opts.device
* remove tls, leave DEVICE
2024-07-26 13:21:23 -07:00
qazal
94d578396f
separate process replay main loop ( #5734 )
...
* separate process replay main loop
* [run_process_replay]
* add kernel_changed
* test with [run_process_replay]
* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
chenyu
a4e9ebc68a
update test_uop_symbolic ( #5733 )
...
enabled more passed tests
2024-07-26 13:46:09 -04:00
chenyu
2cc55a3095
UOp simple mul add div fold ( #5726 )
2024-07-25 22:00:30 -04:00
chenyu
5521b6d437
UOp simple mul-add-lt fold ( #5721 )
2024-07-25 20:49:38 -04:00
qazal
1b53207b4f
revert isolated dags scheduling ( #5724 )
2024-07-25 19:45:12 -04:00
chenyu
845b0d1c9d
UOp more generic div folding ( #5722 )
...
old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c`
new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`
2024-07-25 17:49:14 -04:00
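The old/new rules above can be spot-checked directly: `x // c` folds to a constant exactly when the bounds of `x` land in the same quotient bucket. A minimal sketch of the predicate (not tinygrad's actual code):

```python
def can_fold_div(vmin, vmax, c):
    # new rule: x // c folds to a constant when c > 0 and the floor
    # quotients at both bounds agree (the old rule only covered the
    # quotient-zero case 0 <= vmin <= vmax < c)
    return c > 0 and vmin // c == vmax // c

assert can_fold_div(0, 3, 4)        # old rule also catches this: folds to 0
assert can_fold_div(8, 11, 4)       # new rule: folds to 2
assert can_fold_div(-4, -1, 4)      # folds to -1 (floor division)
assert not can_fold_div(3, 5, 4)    # crosses a multiple of 4: can't fold
```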
chenyu
a82815262c
more test_pattern_matcher fixups ( #5714 )
2024-07-25 14:12:21 -04:00
chenyu
05e02ddfb3
fixup test_pattern_matcher ( #5712 )
2024-07-25 13:48:52 -04:00
qazal
9ceb3a3d1f
beautiful_mnist -4.3% kernels ( #5709 )
...
* add is_complete
* partially delete forced_realized
* p2
* start
* refactor to can_group
* remove steps
* _get_inputs is nicer
* fix the cache
* cache is dict now
* rename to group
2024-07-25 20:30:49 +03:00
kormann
1e2eac755d
Fix repr upat ( #5705 )
...
* test
* fix
* x fix
* simpler
* rm extra space
2024-07-25 12:05:48 -04:00
qazal
1c992de257
hotfix: compare_schedule defaults to false ( #5707 )
2024-07-25 17:08:28 +03:00
qazal
489cda827a
more scheduler process replay tooling ( #5706 )
...
* more scheduler process replay tooling
* refactor to compare_schedule
2024-07-25 15:47:18 +03:00
qazal
4e070a2c89
start work on indexing fusion ( #5590 )
...
* start base
* the views add up
base reduceop st:
ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),))
top st:
ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))
merged buf.st+st:
ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))
* p1
* some cleanups
* more cleanups
* one kernel
* more
* late fuse arange
* less lines
* more work
* fix st strides 1
* update test_schedule, start argmax
* test_tiny_argmax
* add FUSE_ARANGE
* more cleanup
* add utils
* reduce merging
* fix axis and fold if needed
* more fusion
* need to figure this out
* now fixing all of these
* todos+save a line
* ready for p1
2024-07-25 13:23:38 +03:00
nimlgen
08f47d7dc3
more info on failure 41 ( #5704 )
2024-07-25 12:14:28 +03:00
nimlgen
69d4f474d8
amd resnet pf ( #5703 )
2024-07-25 11:21:22 +03:00
chenyu
46e1151c02
UOp more generic mul -> mod folding ( #5698 )
2024-07-24 21:41:25 -04:00
chenyu
66a9c372af
UOp mod reduction ( #5697 )
2024-07-24 20:36:00 -04:00
chenyu
8648fb2636
UOp vmin/vmax on ADD ( #5689 )
2024-07-24 19:09:42 -04:00
chenyu
85710e86cb
UOps div folding ( #5690 )
...
#5689 , with just div folding and new test cases
2024-07-24 14:21:44 -04:00
chenyu
a7a77dfd83
UOp mul lt fold ( #5677 )
2024-07-24 02:49:25 -04:00
chenyu
4e85761d40
UOp mod folding ( #5668 )
2024-07-24 00:10:47 -04:00
George Hotz
053550c3f3
remove MERGE opt, cleanup wmma upcast ( #5669 )
...
* remove MERGE opt, cleanup wmma upcast
* upcast first
* fix broken vectorize folding rule
2024-07-23 20:43:42 -07:00
chenyu
3060e0be4f
add vmin vmax of SPECIAL ( #5670 )
...
* add vmin vmax of SPECIAL
folded stuff like (-1 < gidx0)
* flaky
2024-07-23 22:55:54 -04:00
George Hotz
fa14f7b4fd
switch contract arg to match expand arg [run_process_replay] ( #5667 )
...
* switch contract arg to match expand arg [run_process_replay]
* support multiaxis contract too, it's easy
* cancel contract/expand
2024-07-23 18:08:33 -07:00
George Hotz
a85493bdbe
multiaxis contract test
2024-07-23 15:09:15 -07:00
George Hotz
e3f00ac77d
Fix cuda tc emu test ( #5663 )
...
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
* fix test emulated CUDA tensor cores
* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
chenyu
16c27ae400
update UOp.SPECIAL arg spec [run_process_replay] ( #5661 )
...
* update UOp.SPECIAL arg spec [run_process_replay]
from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable
* fix ptx
2024-07-23 16:58:12 -04:00
chenyu
01fe00e055
skip test_failure_39 in CI ( #5660 )
...
took more than 2 minutes in ci metal, it's basically the same as test_failure_37 but 20X bigger
2024-07-23 14:47:05 -04:00
chenyu
199b3bf02b
simple UOp lt/ge folding ( #5657 )
...
works if lhs is a DEFINE_VAR.
folds trivial x < -math.inf now, need to change SPECIAL to use DEFINE_VAR to fold more
2024-07-23 14:11:05 -04:00
qazal
b0fc5a4c6f
start scheduler process replay ( #5656 )
2024-07-23 20:02:51 +03:00
chenyu
e210c87b4a
uop mod-mod simplification ( #5650 )
2024-07-23 12:33:55 -04:00
nimlgen
1384f08cd4
hcq profile tests ( #5654 )
...
* profile tests
* fixes
* remove linter
2024-07-23 18:40:33 +03:00
qazal
5f394fc9c6
more work toward non-blocking process replay ( #5653 )
...
* non-blocking process replay
* more actionable
* test it
* revert the test
* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
qazal
7cb67e6fb2
merge gated stores spec ( #5652 )
...
* test_unmerged_ifs should merge ifs
* test_tiny_gate_store
* test_merge_ifs_alt
* assert assert asserts
2024-07-23 18:53:27 +08:00