tinygrad

Commit Graph

Author	SHA1	Message	Date
kormann	f5dd25d376	enable whisper batch for long sequences (#6458 ) * long batch +test * long batch +test * cleanup * rollback syntactic changes --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-09-17 00:42:10 -04:00
George Hotz	d3b098299d	add failing regression test for image (#5540 ) * add failing regression test for image * tg type * simpler test * don't realize image to image casts caused issue * simple pad	2024-07-17 17:27:18 -07:00
wozeparrot	90f0e2fc49	db in wal mode (#5388 )	2024-07-12 20:43:36 -07:00
chenyu	a0dbe20dbd	skip some redundant and slow tests in ci (#5416 )	2024-07-12 14:43:13 -04:00
chenyu	322c37e621	use helpers.JIT in llama and gpt2 examples (#5350 ) * use helpers.JIT in llama and gpt2 examples replaced getenv("JIT"), effectively made gpt2 default jit * fix test_gpt2	2024-07-09 15:04:43 -04:00
chenyu	9a2a82a77f	test stable diffusion unet in ci (#5268 ) unet is parameterized now so can test a smaller one is ci	2024-07-02 21:37:52 -04:00
Tobias Fischer	9a25ee0b9a	pixed unet call params (#5262 )	2024-07-02 12:40:27 -04:00
Tobias Fischer	8c9c1cf62f	Pulled CLIP and UNet into Seperate Files (#5253 ) * pulled clip and unet into seperate files * reference cleanup, lru cache fix * better pool indexing	2024-07-01 22:33:01 -04:00
chenyu	e2c5054bdd	update resnet.load_from_pretrained (#5040 )	2024-06-18 16:29:22 -04:00
chenyu	6bbbeb93ac	skip a few clang test that took > 30 seconds in CI (#4126 ) * skip slow CLANG test test_train_cifar * skip those too * and that * only CI * one more	2024-04-10 02:00:34 -04:00
George Hotz	f916aadaea	external that test	2024-03-29 19:35:50 -07:00
reddyn12	9b5e15db6e	Mamba Implementation (#3456 ) * first commit * state back to orig * mamba comparisions * rm file * rename file * use Tensor.einsum and mke default model 370M * Cleaned code and made a comparision test * Simplyfy pull request. Only has 1 mamba implementation now. * Update prompt * rm whitespaces * last space * remove Einops dependency * rm unused code * add tests * rm print statement * rm imports * skip CLANG * Update skipIf description * skip model test in CI and add CLANG fix * rm Device import * don't be stupid * Fix conv assign When the prompt is too short, the logic for conv_state assign messes up. This can be fixed when padding the tokenized array to min length of 4. I padded using the empty string token, but idk if proper practice is to use the PAD token * fix p1 * temp * fix jit import --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-28 17:49:12 -07:00
George Hotz	150ea2eb76	create engine folder and move code (#3948 ) * retry * older tf * that	2024-03-26 20:38:03 -07:00
chenyu	a2d3cf64a5	move is_dtype_supported to test.helpers (#3762 ) * move is_dtype_supported to test.helpers updated all places that check if float16 is supports * fix tests	2024-03-15 14:33:26 -04:00
chenyu	922f8319cb	Run test_real_world in METAL test (#3760 ) * clean up test_real_world * skip that * JIT=2 for metal * all device	2024-03-15 13:56:52 -04:00
George Hotz	41f0a25b53	lazy.py: cache consts (#3577 ) * lazy.py: cache consts * add regression test * always always cache const * bump by 1	2024-03-02 03:50:05 -08:00
xarkes	28a8b72024	Remove Interpreted device & remaining CPU/TORCH ref (#3423 ) * Remove Interpreted device & remaining CPU/TORCH ref * Oops * supports_device was useful * Fix doc wording --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-16 00:30:21 -05:00
George Hotz	93eceef727	remove cpu prereqs (#3410 )	2024-02-15 13:45:06 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	9e17378b60	Fix metal tests (#3266 ) * small fixes for tests on mac * remove device from TensorCore	2024-01-27 18:09:42 -08:00
chenyu	f88506e630	move gpt2/llama sampling inside the model call (#3013 ) * move gpt2/llama sampling inside the model call * argmax uses one more kernel	2024-01-04 17:01:50 -05:00
Yixiang Gao	8a63f26a0f	make LR scheduler work with multigpu (#3011 ) * add a failing test for LR scheduler when using multigpu * fix calculation order and unnecessary tensor created for float * min_lr is no longer tensor	2024-01-04 12:10:56 -08:00
chenyu	ae112c9dbe	fix some long lines in tests (#3006 ) * fix some long lines in tests * better	2024-01-03 23:53:33 -05:00
Yixiang Gao	84eb6dd32a	skip GPU cause opencl on intel can't compile half	2024-01-03 07:07:21 -08:00
Yixiang Gao	73879b50ad	only need to check the min_lr for the nan bug	2024-01-03 07:00:50 -08:00
Yixiang Gao	99f8740c60	running half in CI CPU is slow	2024-01-02 18:44:35 -08:00
Yixiang Gao	781690fd99	how long it takes on CI CPU without the lr scheduler	2024-01-02 18:33:48 -08:00
Yixiang Gao	dd00bcb9c0	fix whitespace	2024-01-02 18:16:33 -08:00
Yixiang Gao	841487cad9	add half test with using hyp from benchmarks	2024-01-02 18:14:30 -08:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964 ) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
chenyu	1fb815e77e	hotfix fix coder. RMSNorm cannot have float16 input (#2932 ) * hotfix fix coder. RMSNorm cannot have float16 input * update real world test due to new kernels * more type casts	2023-12-25 02:28:11 -05:00
chenyu	50cfb1fb3a	update onnx model links (#2908 ) updated in https://github.com/onnx/models/pull/644	2023-12-22 00:19:41 -05:00
chenyu	73cadfbb3c	Remove pytest markers (#2831 ) * remove pytest marker * fix some, skip some * tweak * fix * skip slow * skip more	2023-12-18 18:53:28 -05:00
chenyu	0723f26c80	dtypes.default_float and dtypes.default_int (#2824 )	2023-12-18 12:21:44 -05:00
George Hotz	877c78b4ce	lazy tests (#2796 ) * tests * mini sd is very mini	2023-12-16 08:24:21 -08:00
George Hotz	c6eb618013	tests from new lazy branch (#2774 ) * tests from new lazy branch * fix lin 11 * that was needed * doesn't fail * mark * meant that * llvm passes	2023-12-14 23:06:39 -08:00
Shawn Hagler	51afe938f1	update onnx model links (#2737 )	2023-12-12 19:11:11 -08:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734 ) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
George Hotz	4164d0ebbd	multitensor start (#2676 ) * multitensor work * early gen fixes the tests * atol for flaky test	2023-12-07 17:07:05 -08:00
chenyu	fd21eced74	reduce gpt2 kernel count in test_real_world (#2663 )	2023-12-06 21:57:04 -05:00
George Hotz	232ed2af3f	more test cleanups (#2631 ) * more test cleanups * move test example back	2023-12-05 16:17:57 -08:00
qazal	4380ccb169	Non fp32 math (#2264 ) * `global_load` and `global_store` using buffer dtype * `UOps.PHI` in all dtypes * `UOps.ALU` in all dtypes * `UOps.CONST` & `UOps.DEFINE_ACC` in all dtypes * -- endof implementation -- +tiny lint changes * these tests require the fp16 extention you can run them locally to confirm they're green: (GPT2 test is broken in master for mac, see [this](https://discord.com/channels/1068976834382925865/1069001075828469790/1177993277958533261) `GPU=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_max_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_min_float16_cpu test/models/test_real_world.py::TestRealWorld::test_llama test/models/test_real_world.py::TestRealWorld::test_gpt2 test/models/test_whisper.py test/test_specific_conv.py::TestSpecific::test_big_vec_mul` skip the new test_linearizer_failures in CI GPU because of the fp16 extention This passes on a real GPU since the extention is available: `GPU=1 python3 -m pytest test/test_linearizer_failures.py::TestLinearizerFailures::test_failure_8` see CI logs [here](https://github.com/tinygrad/tinygrad/actions/runs/6996590597/job/19032641427#step:14:644) * these tests fail in CI due to segfaults and CPU crashes To confirm they're green locally, you can run the following commands: 1. For the tests skipped in test_ops.py (note: CLANG is very slow) `for var in GPU CUDA CLANG; do export $var=1; for test in test/test_ops.py::TestOps::test_slice_fancy_indexing_no_dim_collapse test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_collapse_int test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_none test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_and_collapse; do python3 -m pytest $test; done; unset $var; done` 2. 
For the ONNX tests skipped in CLANG: ``` CLANG=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_ai_onnx_ml_array_feature_extractor_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_0_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_1_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_none_no_weight_negative_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_negative_indices_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_no_weight_reduction_mean_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_cpu \ 
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_mean_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_mean_weight_negative_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_mean_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_sum_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_none_no_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_sum_weight_high_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_mean_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_expanded_cpu ``` 3. 
The LLVM test I skipped here is already [skipped in master for all backends](https://github.com/tinygrad/tinygrad/blob/master/test/external/external_test_onnx_backend.py#L186), I just made it more specific `LLVM=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu` * Revert "these tests fail in CI due to segfaults and CPU crashes" This reverts commit 15db57014381a4449d563526ac6c870e36257658. * merge with cleanup-vectorized-hip-renders * barely working HIP P1, ALU ops need a refactor? * manage the fact that in HIP [half2 is actually an unsigned int vec](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L59)`) and half is a totally different __half that [has an unsigned int element in it](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L50)`) but can't be accessed [because it's private](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L86)`). If you just do this: ``` half2 val0 = // ... half val1 = // ... ``` then you can't do: ``` val0.x + val1 // error: use of overloaded operator '+' is ambiguous (with operand types 'unsigned short' and 'half' (aka '__half')) ``` * update the sign definition to avoid division by zero in all dtypes * diff cleanup p1: why were these in the diff anyways * less hacky HIP, enable CIFAR fp16 benchmark, test ops for HIP in CI! add ALU ops overloads for HIP this will make HIP max work handle mod Revert "handle mod" This reverts commit 370fd4b3fbe99b6ae8cc293d005b106628205933. update max to use hmax add HIP GEP render logic enable CIFAR fp16 benchmark test ops for HIP back to store as float because this only works for float4 grouping right now test_ops for hip!! 
always sign * back to the sign we had before because we cant do a backward pass on a Less node * remove old hacks HIP compiling test_ops in CI takes ~9 mins, not doing it for now new HIP ALUs * reduce accs done right * refactor to function * no device hacks hacks p2 the other way * LLVM ALU ops half, float and double are all float update max * update test_uops, cmplt is always a bool in the real linearizer. assertAlmostEqual is wrong when ret is bool * cleanup LLVM wrong code * dummy change for the CUDA install glitch --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-12-03 13:45:49 -08:00
chenyu	86fbd413f3	update test_real_world configs (#2557 )	2023-12-01 20:03:52 -05:00
chenyu	67f4e03724	rewrite 0 size loadop into a CONST (#2556 ) * rewrite 0 size loadop into a CONST * check alloc size * EMPTY is better * Revert "EMPTY is better" This reverts commit 574fe0f9ed28f1b97da5a81afdfd2cd5d9a94ff9. * no ast is created * fix test	2023-12-01 18:29:06 -05:00
chenyu	7d26452305	call ruff with --preview (#2522 ) some checks are ignored without --preview	2023-11-30 13:59:00 -05:00
George Hotz	d87a246439	move to new cached fetch (#2493 ) * move to new cached fetch * extra.utils is over * loads * bump download cache * bump timeout	2023-11-28 17:36:55 -08:00
Christopher Mauri Milan	7f01dd04f0	Apply ruff linting rules to tests (#2473 ) * everything except F821 * enable F821 with noqa * dumb fix * fix remaining imports and (former) lambdas * replace _ with noqa to avoid gc	2023-11-27 21:24:06 -08:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
George Hotz	7170a9a057	coder.py can write and run code (#2439 ) * wip mistral * coder * touchups * cleanups * mistral cleanups * clean up cache create * download the weights, fix tests * fix llama loading * global fixup * clean up all * move llama model * cleanups * Revert "cleanups" This reverts commit a71c5d59eb86290634a258704d8bab2378b8d63d. * fine, leave it	2023-11-25 12:27:54 -08:00
George Hotz	8ff2e13550	From teeny (#2426 ) * changes from teenygrad work * support not supporting ImageDType/PtrDType * fixups from teeny	2023-11-24 12:50:56 -08:00

121 Commits