4685 Commits

Author SHA1 Message Date
Nicklas Boman 6e86472cd6
fix typing for test to run in py38 (#4930) 2024-06-12 13:22:30 -04:00
chenyu 1326f29e24
fix Tensor.gather shape checking criteria (#4932)
it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`
2024-06-12 13:10:14 -04:00
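Note on #4932: the criterion in the commit message can be sketched as a standalone shape check. This is an illustration only, not tinygrad's code; `gather_shapes_ok` is a hypothetical helper, and the rule that both shapes must have the same number of dimensions is an assumption taken from the usual gather convention.

```python
def gather_shapes_ok(self_shape, index_shape, dim):
    """Hypothetical helper: True if index_shape is acceptable for a gather along dim.

    Rule from the commit message: self_shape[d] >= index_shape[d] is required
    for every d != dim; the gathered dimension itself is unconstrained.
    """
    ndim = len(self_shape)
    # Assumption: both shapes need the same, non-zero number of dimensions.
    if ndim == 0 or ndim != len(index_shape):
        return False
    dim = dim % ndim  # accept negative dims such as -1
    return all(s >= i
               for d, (s, i) in enumerate(zip(self_shape, index_shape))
               if d != dim)


# index is larger than self along dim=1 only, which is allowed
print(gather_shapes_ok((3, 4), (2, 10), dim=1))
# index is larger than self along dim 0, which is not the gathered dim
print(gather_shapes_ok((3, 4), (5, 2), dim=1))
```

Checking every `d`, including `dim`, would wrongly reject the first call above.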
qazal 898430c004
more typing in linearizer uoping utils (#4929)
* type check everything

* idxs will be uops
2024-06-12 11:00:02 -04:00
George Hotz 828c98d5c4 add slides from code europe to docs 2024-06-12 14:35:08 +02:00
George Hotz 9a3c1e4a17
fix mul div failure (#4928) 2024-06-12 13:58:46 +02:00
George Hotz 11a03cbbf5
don't use uops.add while constructing (#4913)
* don't use uops.add while constructing

* rebase

* bugfixes

* have to use BFS

* prove it's late

* simpler uop symbolic test (why we did this)

* use dict, not set
2024-06-12 13:31:34 +02:00
qazal d894acbb50
remove hardcoded -1s referencing late reduce (#4926) 2024-06-12 04:50:15 -04:00
qazal b833a112ba
allocate shared memory per block (#4924)
* define temp

* use idx

* cleaner [run_process_replay]
2024-06-12 03:43:10 -04:00
George Hotz ca4ccddcd6 docsfix: nn.Tensor -> Tensor 2024-06-12 09:18:32 +02:00
wozeparrot 3d13c23bfa
llama3 `--download_model` (#4922) 2024-06-11 22:59:59 -07:00
chenyu f902af4f0b
increase metal ci test timeout to 20 minutes (#4920)
make it less annoying for now
2024-06-11 18:45:51 -04:00
chenyu fdbb4305cb
skip unsupported dtype in fuzz_linearizer (#4917)
resolve issues in #4887. dataset generated from ubuntu but metal does not support double
2024-06-11 18:18:21 -04:00
qazal 7f3d9e6d94
revert hsa autogen removal (#4914)
* Revert "only install comgr in AMD CI (#4909)"

This reverts commit 7f03420d05.

* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
nimlgen 58cf6eaba9
add missing dir level for amd mockgpu (#4911) 2024-06-11 18:35:04 +02:00
chenyu b886d250fb
improve test_dropout_on_shard (#4912)
tested some basic property, also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
qazal 7f03420d05
only install comgr in AMD CI (#4909)
* test

* delete hsa autogen
2024-06-11 06:19:33 -04:00
George Hotz 35e53c0809
add sharded arange test (#4908) 2024-06-11 10:58:33 +02:00
chenyu 798ea61377
widen test_ops [low, high] and more strict atol (#4906)
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol if it's unnecessarily larger than default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu 97b05f567e
revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
qazal 8b5bcf309a
process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
George Hotz 9715a7193a
replace set with dedup (#4901) 2024-06-10 18:20:38 +02:00
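Note on #4901: one common reason to prefer an order-preserving dedup over `set` is deterministic iteration order. The commit message gives no motivation, so that reading is an assumption. The helper below is a minimal sketch that relies on `dict` preserving insertion order (Python 3.7+); the helper in the repository may differ.

```python
def dedup(items):
    """Remove duplicates while keeping the first occurrence of each item, in order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(items))


values = ["b", "a", "b", "c", "a"]
print(dedup(values))        # order of first appearance is kept
print(len(set(values)))     # a set has the same members but no defined order
```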
chenyu c8cd637236
test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= reducing n
2024-06-10 12:11:39 -04:00
chenyu c0fb7eee09
cleanup lazy const fold for binary (#4900)
removed pylint: disable=possibly-used-before-assignment
[run_process_replay]
2024-06-10 10:46:58 -04:00
nimlgen 5bf1f7d4d3
nv better error messages for ioctls (#4899) 2024-06-10 16:01:50 +03:00
George Hotz b9f26eedc9 hotfix: import datasets in nn init 2024-06-10 11:33:50 +02:00
chenyu b56ae5606c
cosmetic changes to uop _match (#4897)
minor cleanup before fixing two level match
[run_process_replay]
2024-06-09 18:29:42 -04:00
SnakeOnex b1db2d0094
tqdm replacement (#4846)
* tqdm replacement almost

* formatting

* formatting

* imports

* line len

* fix

* removed set description :(

* removed set description :(

* fix

* fix

* green check?

* rewrote as class, fixed several bugs

* types spacing

* removed imports

* fix

* iterable

* typing

* mypy disagreement

* imports

* more e2e tests vs tqdm

* removed seed setting

* robustness against time.sleep() flakiness

* flaky fix

* automatic bar closing when count==total

* cleanup

* clang error with tqdm

* tqdm back

* use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program)

* back to shutil

* unit_scale + unit_scale test

* custom unit to tests

* pretty

* clean

* removed flaky test

* less test iters

* empty line

* remove disable
2024-06-09 23:46:03 +02:00
qazal 05d7ab774f
set tensor core opt options in Renderer (#4896) 2024-06-09 14:12:41 -04:00
George Hotz f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
qazal 1dde829e34
UOps.IF* to graph spec (#4894) 2024-06-09 07:00:12 -04:00
George Hotz b9afb0d577
test uop as symbolic (#4870)
* start work

* more tests passing

* more tests passing

* more

* 34 failures

* expect the failures

* remove broken rule

* render is fine in just the test

* simplify and put in test
2024-06-09 12:15:11 +02:00
nimlgen 654a8b9ef7
retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
chenyu e33efd6a3d
test cases for multitensor adds const (#4892)
Tested const remained const in ast. Removed the TODO in _to_const_val too
2024-06-08 22:57:48 -04:00
chenyu a3ec4234df
expand broadcast functions a bit (#4891)
taking some good stuff from the #4886. I think `from_, to` is more readable than `sh, s` too
[run_process_replay]
2024-06-08 20:16:54 -04:00
wozeparrot 2849d0a2a1
fix copying to clipboard on a non secure context (#4890) 2024-06-08 16:51:47 -07:00
nimlgen 6327b50e51
amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
nimlgen d24e57c615
amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd45c63a8b0c444c9601eef0c3a718e62.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
wozeparrot 6c24eda522
feat: tinychat (#4869) 2024-06-08 12:05:45 -07:00
Brennan Kinney 9445946cae
docs: Update referenced yaml in `yolov8.py` (#4871)
YAML files have since been relocated.
2024-06-08 15:05:00 -04:00
Roelof van Dijk 794fecf8e3
perf: faster element deletion during matching (#4882)
* perf: faster deletion

* fix: leave the tuple init
2024-06-08 15:16:35 +02:00
Roelof van Dijk 0eebb8e998
fix: _free should not return (#4880) 2024-06-08 14:45:06 +02:00
Roelof van Dijk 1785a70e77
fix: else-return on runtime (#4881)
* fix: add init file

* fix: no else-return

* fix: remove file again
2024-06-08 14:44:24 +02:00
qazal 1e3325f369
raise assert [run_process_replay] (#4879) 2024-06-08 08:31:44 -04:00
qazal d19f39d4dd
unbind Variable pre LazyOp (#4873)
* early unbind

* assert ConstType is correct
2024-06-08 08:16:38 -04:00
George Hotz 9c30889ce9
[run_process_replay] faster and simpler match function (#4876) 2024-06-08 14:08:30 +02:00
Roelof van Dijk aadab3e3da
fix: pylint will not lint folders without __init__.py (#4875)
* fix: add __init__.py

* fix: no-else-return

* fix: redefined-builtin

* fix: unused-variable

* fix: possibly-used-before-assignment
2024-06-08 14:00:24 +02:00
Szymon Ożóg 1680a4bcb8
Remove unused and internal variables (#4862) 2024-06-07 23:05:38 +02:00
Roelof van Dijk 15e5a4fb26
fix: variable defined in assert breaks -O (#4866) 2024-06-07 21:36:24 +03:00
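Note on #4866: `python -O` strips `assert` statements, so any name bound inside one is never defined. The functions below are hypothetical and only illustrate this class of bug; they are not the code changed in the commit.

```python
def pop_checked_fragile(stack):
    # Under `python -O` this whole line is removed, so `top` is never bound
    # and the return below raises NameError (and the pop never happens).
    assert (top := stack.pop()) is not None
    return top


def pop_checked(stack):
    # Do the work outside the assert; the assert only checks the result.
    top = stack.pop()
    assert top is not None, "stack held a None entry"
    return top


print(pop_checked([1, 2, 3]))
```

Both versions behave the same without `-O`, which is why this kind of bug tends to go unnoticed until an optimized run.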
chenyu 3a20cff7c2
expand ShapeTracker.invert a bit (#4864)
removed a type cast and it can early return now

[run_process_replay]
2024-06-07 14:26:02 -04:00
nimlgen 688b14c933
do not sleep immediately in amd's wait_signal (#4859)
* that was slow python in hlb

* wait actively for 5s

* just this

* revert this back

* fix
2024-06-07 16:33:46 +03:00