tinygrad

Commit Graph

Author	SHA1	Message	Date
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
chenyu	078a2603d5	set metal fast math default to 0 (disabled) (#3370 ) * set metal fast math default to 0 (disabled) It's a correctness fix because we use inf and nan. Let's see how slow it is * skip failed onnx tests * tmp DISABLE_COMPILER_CACHE=1 in metal benchmark * Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark" This reverts commit 22267df38099acbf949aefdb6a5911ebc3a31984.	2024-02-14 11:42:33 +01:00
Francis Lam	668324d92b	wmma: protect TC locals from modification and use only LOCAL (#3379 ) also remove unnecessary upcast_dim from tensor_core and calculate it from the dimensions and thread sizes	2024-02-13 10:19:35 +01:00
Francis Lam	f1ad01fd91	test_linearizer_failures: add new linearizer compile failure on METAL (#3380 )	2024-02-12 20:28:34 -05:00
George Hotz	2e60012bcf	move create schedule and delete old API (#3377 ) * move create schedule and delete old API * fix test multitensor	2024-02-12 18:10:45 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	0f6cde243d	import from wino_cleanup (#3374 )	2024-02-12 16:26:50 +01:00
Jyotirmaya Mahanta	b6a2600c86	fix merging condition in merge_dims (#3363 ) * fix merging condition in merge_dims * add tests * set contiguous after mask is canonicalized * minor fix	2024-02-12 11:50:26 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
chenyu	f798b60338	add METAL_FAST_MATH env var to disable metal fast math (#3369 ) * env var METAL_FAST_MATH to disable fastmath for metal use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE * failed onnx test with fast math METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu	2024-02-11 04:26:09 -05:00
chenyu	1156a27619	cleanup atol in test_ops (#3368 ) removed the explicit set value if it's the same as default 1e-6, or higher but can be set to default.	2024-02-10 19:44:44 -05:00
Francis Lam	ddb22a60c8	linearizer: fix up edge case bugs in UNROLL opt (#3362 ) Fully UNROLLing the first_reduce should not change the number of local_dims. Fully UNROLLing a GROUP dim should reduce the number of group_for_reduces by one. Also changed group_for_reduces to be a count as the axis number isn't used anywhere (they are always the first reduce dims).	2024-02-10 11:49:25 +01:00
andresgit	28ba1c5406	fix Tensor.randint ignoring kwargs (#3350 ) * fix Tensor.randint ignoring kwargs * randint kwargs fix	2024-02-09 17:12:16 +01:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354 ) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
chenyu	c151131d1b	update onnx tests that no longer fail on CI (#3353 ) was debugging fast math and turned out it passed on CI now. more like a bug in CI	2024-02-08 21:19:00 -05:00
chenyu	7c1c6efee5	exclude half with PYTHON in test_dtype.is_dtype_supported (#3351 ) half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.	2024-02-08 20:10:25 -05:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
Francis Lam	2266152b28	linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340 ) Fixed test_tensor_core_opts to test all the TCs. Added commented out failing tests in test_color_shapes_with_local.	2024-02-08 16:12:58 +01:00
chenyu	b110c4a7b8	explicitly set input low and high in test_ops (#3347 ) easier to set `(low, high)` than figuring out a,b for `(x+a)*b`. this pr kept the same input ranges	2024-02-08 04:11:45 -05:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
chenyu	02636ff62d	re-enable test_reduce_0d_default int test case in test_dtype (#3336 )	2024-02-07 05:30:14 -05:00
chenyu	ca66be6a70	add failed Tensor.pow test cases (#3334 ) tried refactoring pow and found some bugs	2024-02-07 04:28:24 -05:00
chenyu	d9ef8e25b3	fix Tensor.var with 0 in reduce dim. (#3324 ) fix when correction is too big. it seems to only work when input size is 0 though. torch can output -inf in var when correction is too big, which does not make sense.	2024-02-05 20:59:13 -05:00
Obada Khalili	ee25f73283	Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318 ) * fix Tensor.mean to compute the mean correctly when 0-length axes are selected * add a regression test * rename sum variable to sum_t to avoid conflict with built-in function * refactor Tensor.mean to have fewer lines	2024-02-05 01:40:37 -05:00
chenyu	97275101e9	fix safetensor load uint32 and uint64 (#3315 ) the correct keys are U32 and U64.	2024-02-04 10:46:27 -05:00
Yoshinori Sano	edb74897b2	support safe load bf16 (#3310 ) * support safe load bf16 * fix lint error E501 * add test for loading safetensors * key should be BOOL * fix lint	2024-02-04 10:08:39 -05:00
chenyu	d459956966	move TestGetContraction to test_helpers (#3313 ) also cleaned long lines in test_shapetracker and enabled the line length check	2024-02-04 06:05:01 -05:00
Obada Khalili	b4ea0e18e3	Fix dot product on buffers with zero strides (#3303 ) * skip mulacc opt if all the src buffers of mul op are const buffers * add noqa directive for long test * unskip MULACC opt * ensure that a_axes at least includes summation axes in order to perform np.einsum correctly * add regression test for mulacc op * compute a_slices using a_axes * refactor helper of function to retrieve axes and slices for nonzero strides as well as summation axes * include a regression test that uses and to test the behaviour indirectly	2024-02-04 05:15:06 -05:00
chenyu	30a3288c4a	touchup canonicalize empty mask (#3308 ) empty list -> None. also added env SEED for fuzz_shapetracker_math	2024-02-03 21:05:10 -05:00
David Hou	aebaab011f	faster wino compile by catting consts across data expand dim (#3293 ) * PoC faster wino compile by catting consts across data expand dim * fix fusions * faster + golf it * noqa 501 * implicit broadcast * Revert "implicit broadcast" This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666. * shorter * shorter * oops * 216 upcasts is probably fine * wino kernel count test * test winograd number of sts * specify device for apply_matrix mat elements	2024-02-02 03:47:45 -05:00
Felix Wu	021eea3a52	fix UnboundLocalError when running Compiler with DISABLE_COMPILER_CACHE (#3296 )	2024-02-01 21:12:33 -05:00
chenyu	9196b11dfb	test_ops sinh/cosh/asinh/acosh/atanh (#3294 ) some have numerical issues at large input similar to sigmoid	2024-02-01 03:10:11 -05:00
Francis Lam	927f2dd24d	wmma: add HIP FP16 to FP16 tensor core (#3287 ) * wmma: add HIP FP16 to FP16 tensor core * test: fix test_tensor_core to use separate tolerances for half	2024-01-31 23:00:51 -05:00
chenyu	18e854cdbf	shrink MLB on sharded axis (#3255 ) * shrink MLB on sharded axis use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training. draft version in https://github.com/chenyuxyz/tinygrad/pull/109 * SYNCBN flag * test unclean shrinks * UnsyncedBatchNorm reuses BatchNorm * more robust pad arg check * better types * more tests! * 6 gpus in benchmark * disable slow GPUS=6 benchmark	2024-01-31 21:48:25 -05:00
chenyu	a3652e6ddc	minor cleanups to test_ops (#3290 ) - removed noop a=0 - fixed integer div test - added test for both python expression and Tensor method call - reordered for consistency and added some spaces	2024-01-31 19:01:25 -05:00
chenyu	7816c3b692	onnx update for trilu and argmax (#3283 ) * support 0 in shape for tril and triu * select_last_index for ArgMax and ArgMin * pass **kwargs	2024-01-30 18:39:16 -05:00
qazal	5b46b0ff3d	Simple RDNA3 emulator (#2974 ) * mockhip->hipcpu * allocate buffers * launch a kernel read_asm api * run remu in CI * remu 0.0.2, real test ops * simple driver * 0.0.3, all test_ops * run the latest emulator * 9 minutes is way too long, drop backprop in CI * bring back the backward pass * Revert "bring back the backward pass" This reverts commit 3781e1bc56fc06b424e7c7bed1224f819247fb8f. * Print slowest tests * emulated device directly in ops_hip * fix ruff, override mypy for specific rules * test in the same code path - hip backend env variables - install packages and verify autogen - run certain tests - remove the other hip tests path - verify Device.DEFAULT * remove the emulated hip in extra --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-30 10:39:28 -08:00
George Hotz	247a8a2a6c	add canonicalization to View.create (#3280 ) * Reapply "take merge views from corsix branch" (#3278) This reverts commit `d298916232`. * reintroduce merge views * update second any * isinstance -> not * 25% less same but unequal	2024-01-30 10:26:48 -08:00
George Hotz	d8f6280ffb	hotfix: add CHECK_NEQ to fuzz_shapetracker_math	2024-01-30 10:07:54 -08:00
George Hotz	09f2952dc3	reintroduce merge views in update benchmark (#3279 ) * Reapply "take merge views from corsix branch" (#3278) This reverts commit `d298916232`. * reintroduce merge views	2024-01-30 09:47:20 -08:00
George Hotz	d298916232	Revert "take merge views from corsix branch" (#3278 )	2024-01-30 09:34:28 -08:00
George Hotz	b57a16aa89	take merge views from corsix branch (#3273 ) * take merge views from corsix branch * better DEBUG * max views * remove view.py change * Revert "remove view.py change" This reverts commit f3025f4f393b4b9a9a1ac89ea488d82de448b78c. * only allow filter on non symbolic * oops, correct fix * comment to explain	2024-01-30 09:25:16 -08:00
George Hotz	6a4a5dc79d	fix pad 0 size (#3277 ) * fix pad 0 size * put in view, not pad * test was wrong	2024-01-30 08:58:10 -08:00
Francis Lam	861d5ac224	wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250 ) will correctly handle any permutation of optops after the TC one	2024-01-29 11:51:57 -08:00
George Hotz	085dc87bed	winograd should be 4 kernels (#3268 )	2024-01-28 09:21:26 -08:00
George Hotz	9e17378b60	Fix metal tests (#3266 ) * small fixes for tests on mac * remove device from TensorCore	2024-01-27 18:09:42 -08:00
Hristo Georgiev	3ae811af21	tests for Tensor init data dtype and resulting dtype (#3247 ) Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>	2024-01-27 00:13:42 -08:00
George Hotz	3c728d1082	compiler support (#3260 ) * compiler support * revert that * fix tests	2024-01-26 23:36:40 -08:00
Francis Lam	4273aabe31	extra/gemm: add a simple_conv.py along with correctness check (#3236 ) * extra/gemm: add a simple_conv.py along with correctness check The goal is to easily test tensor core triggering situations * test: add tests for acc_dtype handling and fixed typing	2024-01-26 19:06:57 -08:00
George Hotz	473935125a	use comgr to compile (#3248 ) * use comgr to compile * fast * bfloat16 * move comgr to its own file * cleaner style * comgr in new place * comgr free + dtype cleanup	2024-01-26 18:27:49 -08:00

1377 Commits