tinygrad

Commit Graph

Author	SHA1	Message	Date
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354 ) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
George Hotz	b385234961	oops, change to 3.12 (#3357 )	2024-02-09 12:21:06 +01:00
George Hotz	7726eef464	ops_python: add image support (#3356 ) * ops_python: add image support * uops tests in their own CI * fix ci	2024-02-09 12:02:06 +01:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
chenyu	d8ad9e5660	verify eval acc for hlb_cifar training (#3344 ) set to 93% to reduce flakiness for now	2024-02-07 19:19:59 -05:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
chenyu	3a7c1eb383	add winograd hlb_cifar10 back to tinybox benchmark (#3300 ) * add winograd hlb_cifar10 back to tinybox benchmark * LATEWINO * use wino for the full run to save benchmark time	2024-02-02 04:29:56 -05:00
chenyu	18e854cdbf	shrink MLB on sharded axis (#3255 ) * shrink MLB on sharded axis use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training. draft version in https://github.com/chenyuxyz/tinygrad/pull/109 * SYNCBN flag * test unclean shrinks * UnsyncedBatchNorm reuses BatchNorm * more robust pad arg check * better types * more tests! * 6 gpus in benchmark * disable slow GPUS=6 benchmark	2024-01-31 21:48:25 -05:00
qazal	5b46b0ff3d	Simple RDNA3 emulator (#2974 ) * mockhip->hipcpu * allocate buffers * launch a kernel read_asm api * run remu in CI * remu 0.0.2, real test ops * simple driver * 0.0.3, all test_ops * run the latest emulator * 9 minutes is way too long, drop backprop in CI * bring back the backward pass * Revert "bring back the backward pass" This reverts commit 3781e1bc56fc06b424e7c7bed1224f819247fb8f. * Print slowest tests * emulated device directly in ops_hip * fix ruff, override mypy for specific rules * test in the same code path - hip backend env variables - install packages and verify autogen - run certain tests - remove the other hip tests path - verify Device.DEFAULT * remove the emulated hip in extra --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-30 10:39:28 -08:00
chenyu	34c7621556	HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270 ) used HIP instead of GPU and disabled slow CLANG	2024-01-28 22:05:26 -05:00
George Hotz	0aad8d238b	rebuild ocelot (#3259 ) * rebuild * strip trailing whitespace	2024-01-26 18:46:36 -08:00
George Hotz	03a6bc59c1	move autogen to runtime/autogen (#3254 )	2024-01-26 12:44:19 -08:00
George Hotz	a3869ffd46	move gpuctypes in tree (#3253 ) * move gpuctypes in tree * fix mypy * regex exclude * autogen sh * mypy exclude * does that fix it * fix mypy * add hip confirm * verify all autogens * build clang2py * opencl headers * gpu on 22.04	2024-01-26 12:25:03 -08:00
chenyu	bc92c4cc32	onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252 ) * onnx Einsum, CumSum, DepthToSpace, SpaceToDepth Einsum inner product and `...` are not supported * --durations=20	2024-01-26 10:47:53 -05:00
George Hotz	aa0d1b6330	hotfix: don't use noqa: E702 that's just dumb	2024-01-24 20:01:00 -08:00
chenyu	2088937206	run full hlb_cifar training in tinybox ci (#3145 ) * run full hlb_cifar training in tinybox ci single gpu ~89 seconds * time that	2024-01-15 23:59:20 -05:00
chenyu	e078e2d060	add half @ half to mac benchmark (#3103 )	2024-01-12 16:38:41 -05:00
chenyu	93e3f952aa	use BEAM=2 instead of BEAM=4 in cuda ci gpt2 (#3089 ) BEAM=2 is faster and less search time. investigating why BEAM2+BEAM4 is slower than BEAM2 alone	2024-01-11 13:21:06 -05:00
chenyu	7f9590d357	hotfix disable flaky mac runner wino cifar (#3087 )	2024-01-11 11:57:05 -05:00
jxdv	ef3aa6d7fb	update gh actions (#3033 ) * update checkout actions * update upload artifact * update setup python --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-09 17:52:22 -08:00
chenyu	1d730b8853	remove ACCUM_FP32 in simple_matmul.py (#3045 ) * remove ACCUM_FP32 in simple_matmul.py accumate for half inputs is always in float * move test llama compile speed to metal	2024-01-08 17:37:57 -05:00
George Hotz	50754f1494	add caches there (#3042 ) * add caches there * no curl	2024-01-08 13:02:16 -08:00
George Hotz	c5a941d466	webgl backend in extra (#3041 ) * WebGL WIP * 84% of ops passing test * tests passing 100% * Cleanup, refactor * Shave off some lines * Work on dtypes * TestOps at 100% again * Efficient net shaders compile in browser webgl2 * Compile all efficientnet shaders in browser * Create empty textures for tensor buffers * Run program. Up next weight loading * Exported WebGL model working * Add tests, refactor * Explicit cast alu for GLSL * Fix CI tests * WebGL efficientnet demo * Compile and run yolov8 in browser * Fix imports * Simplify yolo compile * Fix boolbool and cast cmplt to float More tests * Do std tests pass on CI? * Skip std tests on CI * Remove explicit_cast_alu hack, and solve it in code_for_op * Move to new dtype-less alloc api * Remove local size hack: optimize local_size only if device has local * Remove glsl.py, and move content to cstyle * dont_use_locals in opts * Fix dtype tests * type_map in CStyleLanguage * Make core changes smaller, cleaner, refactor export_model and demo * Skip pad_slice * Simplify: render_const, render_conditional * solve bool alu for other binops, cleaner ops_webgl * Fix noopt hack * Remove some skipIfs * WebGL image hack * type_names is a better name * global_max * Fix dtype import * Fix type_names -> type_map * Fix lint * Remove webgpu, back to 5k lines (#3040) * remove webgpu * max 5000 lines * revert those to master * retain that cstyle --------- Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>	2024-01-08 09:29:13 -08:00
George Hotz	8cbcd1b342	Remove webgpu, back to 5k lines (#3040 ) * remove webgpu * max 5000 lines	2024-01-08 09:10:07 -08:00
George Hotz	60abc62a3f	fast hip read (#3014 ) * fast hip read * hip read faster * fix tests * to_mv * simplify * bump to 6k lines	2024-01-05 10:33:13 -08:00
chenyu	2b6670d2ea	separate entry for HALF hlb_cifar10 in benchmark (#3010 )	2024-01-04 13:24:10 -05:00
George Hotz	a0c7cb2564	hotfix: create weights dir in local tg checkout	2024-01-03 14:14:33 -08:00
George Hotz	fc36a7d669	tinygrad weights	2024-01-03 14:09:28 -08:00
George Hotz	0be0f2f745	remove stable diffusion test on tinymac	2024-01-03 13:18:24 -08:00
George Hotz	753a7ecc05	Hip driver (#2992 ) * start hip driver * fix hip llama * make HIP default if we can * don't change those	2024-01-03 12:53:47 -08:00
Yixiang Gao	ea3bc2f509	remove wino benchmark for now	2024-01-03 10:46:43 -08:00
Yixiang Gao	5663dd46b6	Merge branch 'master' of github.com:tinygrad/tinygrad into cifar_fp16	2024-01-03 10:11:46 -08:00
Yixiang Gao	7f1802cd50	update benchmark	2024-01-03 09:09:34 -08:00
George Hotz	f494b9d463	simple multitensor API (#2903 ) * simple multitensor API * test multitensor * mt work * new api * copies * all but data parallel * allreduce there * works, but axis sharded * fix all mt tests * features/multi * work * backprop * fix tests * tests passing * mt progress * cleanups * less lines * tensor cleanup * save more lines * mypy passes * fix tests * skip for cuda too * bump download cache	2024-01-02 17:49:44 -08:00
George Hotz	dbe4a1a914	switch CI to tiny8 (#2984 ) * switch CI to tiny8 * no copyin for disk * Revert "no copyin for disk" This reverts commit eb46b7e93da4a650d8125020c38f44d1f8f2c86e. * rocm 6 broke llama * rename it	2024-01-02 16:40:25 -08:00
Yixiang Gao	54cdba57e7	mend	2024-01-02 14:21:06 -08:00
Yixiang Gao	26303d181b	re-enable half cifar benchmarks	2024-01-02 14:16:35 -08:00
George Hotz	17f0c3006b	hotfix: do stable diffusion first on mac	2024-01-01 15:38:25 -08:00
chenyu	e53b96fdbb	fix TC=2 tensor core op test (#2951 ) * print DEBUG for TC=2 in CI * enable TC=2 * no need to check src type * LOAD has side effect * don't push any local buffer * update comment * and BARRIER	2023-12-29 21:39:49 -05:00
George Hotz	7da2325dc7	get_lazyops() -> lazyops (#2884 ) * get_lazyops() -> lazyops * don't compare empty mem	2023-12-20 18:04:49 -08:00
George Hotz	1765849937	new lazy, benchmark (#2878 ) * lazy rewrite, try 2 * min fix tests * pass contig test * put broken pads back * move that to realize * no contig child fixes array packing * so wrong * now that's correct * base children * fix bind issues * disable to_image_idx * fix tests * that failure shouldn't break other tests * more fixes * fix torch * skip failing tests in CI * 1e-7 * half is broken * 1e-6 margin of error	2023-12-20 14:33:21 -08:00
George Hotz	ca59054463	fix shapetracker math (#2861 ) * proper test * all st math good now * fix real_strides bug	2023-12-19 22:17:34 -08:00
chenyu	1231ec5a02	run the sz.py line count at the end of linter ci (#2857 )	2023-12-19 16:33:12 -05:00
George Hotz	6617dcf095	move graph to runtime, check line count with sz.py (#2842 ) * move graph to runtime, check line count with sz.py * oops, didn't save * dtype aliases * restore comment, REALCOUNT	2023-12-18 20:30:06 -08:00
George Hotz	80f53245e8	shapetracker add and invert (#2828 ) * invert (broken) * decent invert * shapetracker invert works * plus is meh, invert is good * support invert mask * a few more invert tests * shapetracker math invert test	2023-12-18 16:03:27 -08:00
chenyu	73cadfbb3c	Remove pytest markers (#2831 ) * remove pytest marker * fix some, skip some * tweak * fix * skip slow * skip more	2023-12-18 18:53:28 -05:00
chenyu	4e2a92cee1	run HALF GPT2 in nvidia benchmark in addition to HALF/BEAM (#2811 ) easier to separate the issue between HALF and BEAM when it failed	2023-12-17 02:24:55 -05:00
George Hotz	051402625e	remove pushing contig + fix linearizer bug (#2798 ) * remove that logic * fix test, move LOADs * fix repeat issue on LLVM * with_phi	2023-12-16 09:36:31 -08:00
George Hotz	c6eb618013	tests from new lazy branch (#2774 ) * tests from new lazy branch * fix lin 11 * that was needed * doesn't fail * mark * meant that * llvm passes	2023-12-14 23:06:39 -08:00
chenyu	a044125c39	validate stable diffusion for seed 0 (#2773 ) * validate stable diffusion for seed 0 the closest false positive i can get is with the setup and one less step. dist = 0.0036 same setup with fp16 has dist=5e-6. so setting validation threshold to 1e-4 should be good * run with --seed 0	2023-12-15 00:07:09 -05:00
Ahmed Harmouche	4b01839774	support vals on WebGPU, run more tests (#2668 ) * Vals on webgpu, run more tests * Skip slow tests, run symbolic ops tests * Balance out tests	2023-12-07 16:45:21 -08:00
George Hotz	00d9eda961	FROM -> COPY, move vars_from_ast (#2675 )	2023-12-07 16:32:30 -08:00
Ahmed Harmouche	50dcd532d5	Get all WEBGPU test_ops passing (#2646 ) * Get all WEBGPU tests passing * Custom render store is not needed in wgsl	2023-12-06 07:40:37 -08:00
chenyu	229ada5fe5	Gpt2 benchmark with HALF and BEAM (#2636 ) * benchmark gpt2 with half and beam * BEAM=4 * optional validation * green is good * we care	2023-12-05 22:15:16 -05:00
George Hotz	35b5e95097	parallel beam search (#2610 ) * better print * fix beam search with vars * cleanups * parallel is not default * restore that * bugfix * cleanups * bugfix	2023-12-05 10:09:45 -08:00
George Hotz	bbeba8ec85	use default dict for external_model_benchmark (#2592 ) * device default * Device.DEFAULT * half max for cuda * CUDA_INCLUDE_PATH * closer to working * cuda fixups * Update ops_cuda.py	2023-12-03 15:25:43 -08:00
George Hotz	bc012f26b9	hotfix, disable model inference benchmark on NVIDIA	2023-12-03 13:52:41 -08:00
qazal	4380ccb169	Non fp32 math (#2264 ) * `global_load` and `global_store` using buffer dtype * `UOps.PHI` in all dtypes * `UOps.ALU` in all dtypes * `UOps.CONST` & `UOps.DEFINE_ACC` in all dtypes * -- endof implementation -- +tiny lint changes * these tests require the fp16 extention you can run them locally to confirm they're green: (GPT2 test is broken in master for mac, see [this](https://discord.com/channels/1068976834382925865/1069001075828469790/1177993277958533261) `GPU=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_max_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_min_float16_cpu test/models/test_real_world.py::TestRealWorld::test_llama test/models/test_real_world.py::TestRealWorld::test_gpt2 test/models/test_whisper.py test/test_specific_conv.py::TestSpecific::test_big_vec_mul` skip the new test_linearizer_failures in CI GPU because of the fp16 extention This passes on a real GPU since the extention is available: `GPU=1 python3 -m pytest test/test_linearizer_failures.py::TestLinearizerFailures::test_failure_8` see CI logs [here](https://github.com/tinygrad/tinygrad/actions/runs/6996590597/job/19032641427#step:14:644) * these tests fail in CI due to segfaults and CPU crashes To confirm they're green locally, you can run the following commands: 1. For the tests skipped in test_ops.py (note: CLANG is very slow) `for var in GPU CUDA CLANG; do export $var=1; for test in test/test_ops.py::TestOps::test_slice_fancy_indexing_no_dim_collapse test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_collapse_int test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_none test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_and_collapse; do python3 -m pytest $test; done; unset $var; done` 2. For the ONNX tests skipped in CLANG: ``` CLANG=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_ai_onnx_ml_array_feature_extractor_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_0_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_1_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_none_no_weight_negative_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_negative_indices_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_no_weight_reduction_mean_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_mean_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_mean_weight_negative_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_mean_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_sum_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_none_no_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_sum_weight_high_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_mean_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_expanded_cpu ``` 3. The LLVM test I skipped here is already [skipped in master for all backends](https://github.com/tinygrad/tinygrad/blob/master/test/external/external_test_onnx_backend.py#L186), I just made it more specific `LLVM=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu` * Revert "these tests fail in CI due to segfaults and CPU crashes" This reverts commit 15db57014381a4449d563526ac6c870e36257658. * merge with cleanup-vectorized-hip-renders * barely working HIP P1, ALU ops need a refactor? * manage the fact that in HIP [half2 is actually an unsigned int vec](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L59)`) and half is a totally different __half that [has an unsigned int element in it](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L50)`) but can't be accessed [because it's private](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L86)`). If you just do this: ``` half2 val0 = // ... half val1 = // ... ``` then you can't do: ``` val0.x + val1 // error: use of overloaded operator '+' is ambiguous (with operand types 'unsigned short' and 'half' (aka '__half')) ``` * update the sign definition to avoid division by zero in all dtypes * diff cleanup p1: why were these in the diff anyways * less hacky HIP, enable CIFAR fp16 benchmark, test ops for HIP in CI! add ALU ops overloads for HIP this will make HIP max work handle mod Revert "handle mod" This reverts commit 370fd4b3fbe99b6ae8cc293d005b106628205933. update max to use hmax add HIP GEP render logic enable CIFAR fp16 benchmark test ops for HIP back to store as float because this only works for float4 grouping right now test_ops for hip!! always sign * back to the sign we had before because we cant do a backward pass on a Less node * remove old hacks HIP compiling test_ops in CI takes ~9 mins, not doing it for now new HIP ALUs * reduce accs done right * refactor to function * no device hacks hacks p2 the other way * LLVM ALU ops half, float and double are all float update max * update test_uops, cmplt is always a bool in the real linearizer. assertAlmostEqual is wrong when ret is bool * cleanup LLVM wrong code * dummy change for the CUDA install glitch --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-12-03 13:45:49 -08:00
chenyu	1ac958a058	update pytest marks and CI test filters (#2587 ) * remove pytest marks * test more stuff * fine revert some * add that mark back * skip that * hmm LLVM does not work on ubuntu * too slow on CUDA CI * dup test	2023-12-03 15:20:44 -05:00
George Hotz	5068e99d18	refactor to remove extra kernel params (#2563 ) * refactor to have compiled kernel * bugfixes * docs/beautiful.py * revert that * fix tests	2023-12-02 00:32:25 -08:00
George Hotz	27481b9206	Switch ops_gpu -> gpuctypes (#2532 ) * ops_gpu is go * fix size 0 * fix image, and add more tests * nerf openpilot test, doesn't test thneed * run the schedule * better * oops, new inputs * delete pyopencl * Update ops_gpu.py	2023-12-01 22:30:21 -08:00
George Hotz	4c984bba7e	bump version to 0.8.0, clean CI, remove requests (#2545 ) * bump version to 0.8.0, clean CI, remove requests * why was that even there	2023-12-01 10:42:50 -08:00
George Hotz	8fd8399437	remove flake8 (#2544 )	2023-12-01 09:48:41 -08:00
George Hotz	d8175a4380	simple fix (#2543 )	2023-12-01 09:42:15 -08:00
George Hotz	2c363b5f0b	new style device (#2530 ) * cpu tests pass * torch works * works * metal works * fix ops_disk * metal jit works * fix openpilot * llvm and clang work * fix webgpu * docs are rly broken * LRU works on metal * delete comment * revert name to ._buf. LRU only on Compiled * changes * allocator * allocator, getting closer * lru alloc * LRUAllocator * all pass * metal * cuda * test examples * linearizer * test fixes * fix custom + clean realize * fix hip * skip tests * fix tests * fix size=0 * fix MOCKHIP * fix thneed * copy better * simple * old style metal copy * fix thneed * np reshape * give cuda a device	2023-11-30 17:07:16 -08:00
chenyu	7d26452305	call ruff with --preview (#2522 ) some checks are ignored without --preview	2023-11-30 13:59:00 -05:00
George Hotz	3dedeaae74	rebalance tests (#2504 ) * rebalance * balance * parallel apt-get for all * .local/lib/python3.11/site-packages * what is user doing * is that path right * Update test.yml * okay where are you * site-packages	2023-11-29 11:18:22 -08:00
George Hotz	065aff747e	make webgpu test reliable (#2502 ) * remove retry that doesn't work * fix cleanup * process exit in cleanup * add space	2023-11-29 10:02:24 -08:00
George Hotz	947711a532	split metal and webgpu tests (#2501 )	2023-11-29 09:32:09 -08:00
chenyu	3eb3c74675	metal ci tests everything (#2499 ) * metal ci tests everything * pretty good * METAL	2023-11-29 12:04:37 -05:00
George Hotz	889acefe85	Support weird loads in Image (#2498 ) * image support weird loads * umm, that was always wrong * openpilot compile fails with a weird error * image test passes * we have valids now * clean that up * no more required opts * add fastvits test, fix bug * minor cleanups	2023-11-29 08:30:46 -08:00
Liam	cf0c9096a9	Removing METAL Skips as CI works (#2488 ) * Test metal CI * remove metal and CI restrictions * enable dtype tests for metal ci	2023-11-28 19:46:59 -08:00
George Hotz	d87a246439	move to new cached fetch (#2493 ) * move to new cached fetch * extra.utils is over * loads * bump download cache * bump timeout	2023-11-28 17:36:55 -08:00
chenyu	28a67106ca	enable symbolic ops tests for hip (#2485 )	2023-11-27 22:33:41 -08:00
Davi Silva	136dbd8b36	HIP CI that compiles (to RDNA3) but doesn't have to run (#2482 ) * hip amd compilation * gate the test properly * cleanup unused import * remove superfluous numpy conversion * add SpeedyNet tests (f32 [passes] & f16 [fails]) * make CI verbose (error log from hip compiler) * test the real ops_hip * Merge branch 'tinygrad:master' into ci/hip-compilation * fix CI * cleanup * really fix CI * Fix CI Three: the refixening --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-27 21:17:06 -08:00
George Hotz	acbe6d1b53	Revert "HIP compilation on CI targeting RDNA3 (#2459 )" (#2481 ) This reverts commit `d275ff930a`.	2023-11-27 20:41:21 -08:00
Davi Silva	d275ff930a	HIP compilation on CI targeting RDNA3 (#2459 ) * hip amd compilation * gate the test properly * cleanup unused import * remove superfluous numpy conversion * add SpeedyNet tests (f32 [passes] & f16 [fails]) * make CI verbose (error log from hip compiler) * test the real ops_hip * Merge branch 'tinygrad:master' into ci/hip-compilation * fix CI * cleanup * really fix CI	2023-11-27 20:33:11 -08:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
andresgit	259a869fc1	Fix UnicodeDecodeError when debugging on Intel APU (#2421 ) * test DEBUG=5 * print prg if NVIDIA, fixes error on Intel APU	2023-11-25 12:30:50 -08:00
George Hotz	857d440ea7	fail means fail (#2391 ) * flip order * cleanup and comment out failing test	2023-11-24 08:27:39 -08:00
George Hotz	1f4231a8f9	global pipefail	2023-11-24 08:03:49 -08:00
George Hotz	095e2ced61	add name support to fetch (#2407 ) * add name support * use fetch in gpt2 * remove requests from main lib, networkx also optional * umm, keep that assert * updates to fetch * i love the walrus so much * stop bundling mnist with tinygrad * err, https * download cache names * add DOWNLOAD_CACHE_VERSION * need env. * ugh, wrong path * replace get_child	2023-11-23 14:16:17 -08:00
Francis Lata	6d672785db	Update Whisper to use fetch helper (#2401 ) * update whisper to use new fetch helper * simplify file opening * update name * update key name to "downloads-cache"	2023-11-23 12:59:59 -08:00
George Hotz	66c75f30c6	remove triton (#2396 )	2023-11-23 07:40:59 -08:00
George Hotz	8656eebb42	jit doesn't use named tensors (#2393 ) * jit doesn't use named tensors * move to compile2 * remove broken single root junk * explicit float32 * skip slow test	2023-11-23 00:13:18 -08:00
mmmkkaaayy	08d09eb666	Enable whisper test in CI for more backends (#2355 )	2023-11-18 17:52:50 -05:00
chenyu	8e22c0d95c	everything can jit now (#2338 )	2023-11-16 23:54:57 -05:00
George Hotz	1d5501594e	force rebuild of ocelot (#2334 ) * force rebuild of ocelot * SzymonOzog gpuocelot * delete that * downgrade that * non parallel * force rebuild * use llvm * nauto * less mem maybe * print test * helper_test_exception skip CUDACPU * helper_test_exception * shippable	2023-11-16 20:44:14 -08:00
chenyu	163b2bc26a	wgpu.utils._device -> wgpu.utils.device (#2330 ) * wgpu.utils._device -> wgpu.utils.device * can i do this? * no need to specify metal	2023-11-16 12:52:13 -05:00
forcefieldsovereign	b64738e1d6	Remove AS_STRIDED from shapetracker (#2216 ) * very close * remove comment * negative strides working * almost everything passes * calculate offset with list comprehension * some cleanup * got disk load working * review suggestions * fix after merge * overlap working * did it * clean * fixed disk load * lint * mypy * removed as_strided * trying without simplify * added back simplify * make sure expanding to smaller shape * cleanup * removed comment * removed env file * trying whisper test again * onnx test sqlite issue * working on test * finished test * eliminate unnecessary shrink-then-pad * don't shrink buffer * added strides check * added to ci under linters * switch issue * allow symbolic stride * removed .env * isinstance * adjust strides for double expand * cleanup * needed to add type hint for mypy * set pythonpath	2023-11-15 15:50:17 -05:00
mmmkkaaayy	91546225f4	Add cache step for model weights in CI, re-enable whisper test (#2307 )	2023-11-14 21:16:04 -08:00
George Hotz	01f8781c26	fix CI (#2300 ) * might work * might work 2 * might work 3 * sneak that in to llama too * pin them all	2023-11-14 11:02:59 -08:00
George Hotz	38b7f5a7fd	less phi, proper phi (#2241 ) * less phi, proper phi * disable flaky whisper test	2023-11-08 16:13:43 -08:00
George Hotz	c60c3b467a	clean up symlinking in benchmark (#2219 ) * clean up symlinking * make torch deterministic	2023-11-05 16:46:05 -08:00
George Hotz	8ba7ced7f9	extract const if it's const (#2193 ) * extract const if it's const * fix if statement * fast math issue * fix graphing and casting * disable flaky copyout test	2023-10-31 18:52:35 -07:00
George Hotz	a27c9f9de5	openpilot compile2 (#2189 ) * try compile2 * pass to thneed * fix tanh onnx	2023-10-31 11:08:58 -07:00
Akshay Kashyap	018bd29e37	Enable Multi-Output Export (#2179 ) * Enable Multi-Output Export * Add test * Update examples and lint * fix padding * test ops * dummy commit to rerun test * revert cuda lint * Enforce tuple/list of tensors * subscripted generics * put back webgpu test * Re-enable WebGPU Efficientnet test	2023-10-30 18:42:26 -07:00
chenyu	6c58bf3e9c	in time_linearizer, allocate a scratch buffer if output buffer is also input (#2152 ) * in time_linearizer, allocate a scratch buffer if output buffer is also input * move scratch buffer creation outside search	2023-10-28 07:17:41 -10:00
chenyu	0ca0e9ee5e	exclude ast with variables from beam search (#2140 ) * exclude ast with variables from beam search * test that * add to CI	2023-10-25 16:35:29 -04:00
Szymon Ożóg	a52b420fb3	switch ocelot back to main repo (#2147 ) * return to ocelot main branch * cd before checkout	2023-10-25 15:14:26 -04:00
George Hotz	12dd165d38	add WINO/HALF/HIP to AMD benchmark	2023-10-25 13:22:45 -04:00
Francis Lam	bf3490cdf9	wmma: refactor tensor cores using existing local dims (#2097 ) * wmma: refactor tensor cores using existing local dims * optimizer: fix bad rebase and break after one late local --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-10-25 13:10:46 -04:00
George Hotz	abeba8f1fc	optimization: get actions in CI (#2125 ) * get actions in CI * actually run the test * pythonpath	2023-10-20 12:22:01 -07:00
George Hotz	4526891db7	parallel apt (#2111 )	2023-10-18 14:49:00 -07:00
George Hotz	15da96f393	print test durations and add speed (#2107 ) * print test durations * decrease sizes to increase speed * faster * GPU/CLANG onnx in seperate runner * test split, move ONNX CPU CI * simpler tests * simpler uops test * faster * less cuda apt * running ninja install * apt install * split fancy indexing	2023-10-18 13:46:42 -07:00
George Hotz	e2a1c2aaa6	force ruff reinstall	2023-10-18 11:40:46 -07:00
George Hotz	0d2b3a9d33	full path for ruff	2023-10-18 11:27:49 -07:00
George Hotz	8940c89d13	tests: remove 2 runners, make cache reliable (#2106 ) * remove 2 runners * device.DEFAULT printing * explain rebuild * disable ocelot rebuild * try again to fix workflow * this? fix cache hash * force no rebuild * fix pylint	2023-10-18 11:10:41 -07:00
George Hotz	b3afe0106b	typo, src printing, and no verbose on triton (#2105 )	2023-10-18 09:44:36 -07:00
George Hotz	881fd7c141	add mops to graph, refactor IMAGE (#2100 ) * add mops to graph, refactor IMAGE * no reshape pushing * add todo * fix openpilot model alt * push reshapes reduces kernels in new op * IMAGE=2 is a first class citizen now	2023-10-17 21:27:51 -07:00
Szymon Ożóg	4bef1591f0	Disable ocelot cache + fix matvec in triton (#2010 ) * Revert "disable flaky triton test" This reverts commit `1e15fdaee7`. * Update test.yml * check if has shared for matvec * disable ocelot cache for triton * disable ocelot cache * disable ocelot cache * pass shared to triton uops tests * temporary debugs for CI crash * Revert "temporary debugs for CI crash" This reverts commit fee3ea96c818e83c19b935c2f8482e0ccc91a542. * Revert "triton isn't tested, and allows this refactor (#2007)" This reverts commit `dea8bb0938`. * add runtime_args to every renderer, move triton local size override to runtime args * Add binary to args, correct type returned * update to new loops * Update test.yml	2023-10-17 10:33:32 -07:00
geohotstan	5ed630204b	Add ONNX to CI for other backends (#2069 ) * some cleanup * move continue back * more more more * added to CI * try * try intentionally break some tests * wtf * del True for test * yay tests broke, now pls no break * try AGAIN * gahy * lol * try * move over constant * moved over MORE * move shrink over * trailing lines * try CUDA CI * try again * boom * oops * improved comments * try: disable some flags and disable CUDA * try breaking tests * traceback has too much info so add --tb=no * revert forced CI failure * add comments and del unused imports * oooooooo using regular debug try enable tb * intentionally break tests * added tb back. Maybe not too verbose * strip whitespcae * missed something * Shape op int32 -> int64 * oops missed something * add some types * get rid of crazy 1 liners in pad op * actually test Split this time LOL * strip that whitespace	2023-10-17 09:33:54 -07:00
George Hotz	5a4a62ecae	Disable logging in early compile2 and lower kernel counts (#2090 ) * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`. * gate behind OPT >= 4 * disable_logging in schedule * simple * from master * more images * revert that * 206 kernels	2023-10-16 20:15:24 -07:00
George Hotz	d0aaf7d83b	Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 )"" This reverts commit f22a7cf6561fd3843b7e0c1d77a72a39a127bcd8.	2023-10-16 17:47:00 -07:00
George Hotz	5e24dc5a95	limit metal buffers and revert the 207 fix (try 2) (#2088 ) * limit metal buffers * look at the base, not the srcs * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`. * add a test for that	2023-10-16 14:52:16 -07:00
George Hotz	e8fcd2f3db	Revert "limit metal buffers and revert the 207 fix (#2087 )" This reverts commit `2fb10f6a19`.	2023-10-16 14:32:22 -07:00
George Hotz	2fb10f6a19	limit metal buffers and revert the 207 fix (#2087 ) * limit metal buffers * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`.	2023-10-16 14:26:32 -07:00
George Hotz	c36d306606	KOPT is over, BEAM is upstream (#2071 ) * create cache for q learning * make linter happy * global beam * where it belongs * bugfix * ditch the kopt, use the beam * faster lin and DEBUG=2 okay * remove kopt, move search to features	2023-10-16 09:46:03 -07:00
mmmkkaaayy	91168a28c4	whisper: make file transcription work, add basic CI test (#2042 )	2023-10-13 17:13:35 -07:00
George Hotz	924ecc4d6a	Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 ) This reverts commit `63869c62fc`.	2023-10-13 12:01:55 -07:00
Amrit Sahu	63869c62fc	openpilot kernel fix from 209 to 207 (#2006 ) * Fix openpilot kernel from 209 to 206 1. Use push_movement_ops conditions in _movement_op. Don't push PAD or check if the ops are safe to be pushed with PAD 2. Don't push if all the op.buffers are realized * change ALLOWED_KERNEL_COUNT to 206 for openpilot * don't push through sourceless buffers * change the tests to adjust kernel counts for new behaviour * restore pushing of movement ops through childless buffer * don't push EXPAND, causes OOM * allow push of intermediate movement ops * adding new test behaviour * modifying external_test_opt for new behaviour * restore old tests * Reenable push of EXPAND and introduce new tests I was wrong intially thinking EXPAND can cause OOM and hence I had disabled it. Since it is 0 stride and doesn't allocate memory its cool * Don't push EXPAND above LoadOps LB. This is causing OOM * Push should be decided on movement root of bufs To check if ast.op.buffers is sourceless/ realized go the the movement root and then decide if pushing should be done or not * refactor for readability * use .base instead * don't push expand, bad memory/compute consumption * restrict push of reshape, seeing improvement * push reshape if unary without further check * disable PAD solves convnext kernel count increase * reenable test_cache_binaryop_transpose * small nit	2023-10-13 11:59:15 -07:00
qazal	0e2e041faf	CI for using tinygrad as an external pkg (#2019 ) * create workflow * unify with test.yml	2023-10-08 10:50:48 -07:00
Vidhan Bhatt	94b21c41a7	ci: use `mypy.ini` (#1993 )	2023-10-06 01:45:28 -07:00
George Hotz	2d0c1037b1	Fix up latest openpilot model (#1976 ) * fix gemv triggering for gemm * fixup_openpilot * external test issues	2023-10-05 05:24:28 -07:00
Ahmed Harmouche	fb4d830a2a	Fix cast error in render_load in wgsl (#1956 ) * Fix cast error in wgsl * User render_cast intead of introducing new method * Make it shorter * Add back webgpu tests: efficientnet and dtypes	2023-10-04 02:29:14 -07:00
George Hotz	6a79d4044a	unrealized consts everywhere (#1963 ) * unrealized consts everywhere * don't import device from lazy * Device isn't in Lazy * same issue * disable jit random	2023-10-04 01:48:10 -07:00
George Hotz	6a4ec4776e	fix CI (#1953 ) * this work * unauth * update in all places	2023-10-02 02:58:58 -07:00
Francis Lam	f445e056ed	wmma: add test and tensor core shape (#1925 )	2023-09-28 18:04:28 -07:00
Yixiang Gao	10f0dc0c85	keep only one comment from git action bot (#1936 )	2023-09-28 20:24:53 -04:00
wozeparrot	70671d9625	fix test_collectives (#1934 ) * fix: fix test_collectives.py * feat: reenable test_collectives	2023-09-28 11:02:22 -07:00
George Hotz	adab724caa	schedule2, keep the tests working with small changes (#1932 ) * lazy cleanups * ast functions take in LazyOps * op instead of self.op * _base for mops * fix contiguous * start schedule * test_schedule * fix openpilot * more tests * bugfix and test skip * work * make sure things get freed * fix zerosized tensors * fix failing test * fix ceil and friends * fix openpilot * disable training * disable test collectives	2023-09-28 09:14:43 -07:00
George Hotz	1e15fdaee7	disable flaky triton test	2023-09-23 14:59:36 +08:00
Szymon Ożóg	58296c079d	Make Triton work again (#1547 ) * Move ops_triton to runtime and remove errors from deprecated code * Remove deprecated AST Kernel * Remove deprecated buffer * Add TritonProgram * Triton Buffer * Use RawCUDABuffer * triton_compile * Added new parameter * pass _buf to program * remove deprecated include * Added triton tests * Deprecated includes removed * remove double print * Disable float4 support * Disable float4 support * variable load fix * Track local size * Add pycuda to triton dependencies * Merge test.yml * install cuda packages for testing * merge double package install * remove emulated from triton tests * upscale local index to power of 2 and add masking * cuda envs * Add TernaryOps * ConstOp loading * proper function name * remove deprecated variables * get global program from name * const ops match local shape * Enable test_nn * remove deprecated import * fix linter error * Add wait logic * Add local size override * accumulate local shapes instead of using max shape * Merge triton tests into global tests * fix envs in testing * Old testing routine * split file into renderer and program * remove print and starting whitespace * pretty ptx print on debug 5 * linter errors * ignore triton saturation tests * ignore test example * remove pytorch cpu extra index * Add triton to existing testing routine * use triton tests * disable cuda backend in triton tests * use cudacpu in tests * print used device * Print device default * Remove print * ensure we are running triton backend * update variable signatures * update dtypes for load * infinity render fixed * limit global size * negative infinity now properly rendered * split chain with parentheses for and node * Add option to disable shared memory, disable for triton * missing import * Properly index and mask conditional load * use mask only if not loading a block pointer * nan support * fix symbolic tests to include chain split * proper masking for stores * Implemented bool dtype * Add mod * fix loads for variables with valid range * merge triton with cuda runtime * merge from master * run triton tests with cuda * Correct target when running from triton * conftest with triton compiler config * use triton nightly * verbose tests for triton * capture stdout * fix function depth when exiting multiple loops * add render valid function for readabilty * fix mask for local loops * add _arg_int32 datatype * fix dims for conditional loads * enable non float stores * correct variable dtypes * fix type for arg_int32 * remove junk * Added get max function for range based var.max * remove deprecated code * Fix triton ptxas path * Fix testing for CI * clamp local size by max local size instead of always running max * Disable matmul test in triton cpu * rerun tests * Disable broken test in triton cpu * whitespace removed * rerun tests again * Disable TestSymbolicOps for triton * update to new uops * linter fix * ignore test/extra * linting fix * Update tinygrad/renderer/triton.py Co-authored-by: Gijs Koning <gijs-koning@live.nl> * remove deprecated line * quotes type fix * linter * Remove unnecesary lines * UnaryOps.NEG * dont define constants * Linting fix * Disable tests that are broken in ocelot * remove trailing whitespace * reduce line count * linting fix * update to new uast * New looping style * Update to new uast * make AST runner work with triton * linting fix * set renderer var for testing * disable local for ocelot * reenable all tests for ocelot * Pass shared to cuda * Don't group if the backend doesn't support shared mem * use working gpuocelot branch * enable all tests * enable local for ocelot * cleanup * Update test.yml * update cache key * reenable test symbolic and extra * Update test.yml * Revert "Update test.yml" (rerun tests) This reverts commit 98c0630ee5da4379e5c6b2437a5145fe87058c35. * Revert "fix symbolic tests to include chain split" This reverts commit 22a9a4c9cd14d23735e6540c8d90ee005ac4ea17. * Revert "split chain with parentheses for and node" This reverts commit 7499a7004ef4db785d0cd05cf292fdeff65ca90d. * use global size from linearizer * rename newvar to dtype to match other renderers * join program start lines * simplify code that adds axis to local dims * assign r[u] in ssa * We no longer need to replace target in src * we no longer need to cast indices to int by hand * Update triton.py(rerun tests) * Update triton.py(rerun tests) * Update triton.py(rerun tests) --------- Co-authored-by: Gijs Koning <gijs-koning@live.nl> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-09-23 14:17:12 +08:00
Umut Zengin	3987280daf	Fix VALIDHACKS for Images and make it default (#1832 ) * valid hacks * valid hacks * valid hacks * new method * new method * handtune * is gate load breaking? * lint ruff less junk new approach? maybe this? * Make it more clear * Make it more clear * Will deal with the linter later * hack for linter * subs the idx but dont touch the valid * Updated the mod rules * lint hack * I believe bug fix lets see * Mod Node left * revert * Maybe this wont break? * revert * implemented "handtuned garbage" * revert and use VALIDHACKS * Lets see the CI * still broken? * currently its jungle * maybe this jungle ? * This works for everything somehow * Added test for symbolic * lint * final touch * This still works * lint * midway clean * less garbage * lint * final form * Slow but working way * lint and other stuff * lint * mypy * Make sure CI test Openpilot valid checks * test if CI break * Convert back * refactor * refactor * Managed to reduce openpilot time from 30 secs to 5 secs * Refactor * Substitute a node with variable * flake8 * Comment and refactor * More comprehensive mod * refactor * bug fix * More shave off * remove not sure part	2023-09-23 07:34:43 +08:00
Yixiang Gao	84ab47a90a	add branch up-to-date check (#1879 )	2023-09-20 12:41:51 -04:00
Yixiang Gao	18ec5a9e09	add comment bot to CI (#1873 )	2023-09-16 12:22:06 -04:00
wozeparrot	c870764940	Revert "add line changes diff bot to CI (#1863 )" (#1870 )	2023-09-15 16:56:42 -04:00
Yixiang Gao	789c84a7a3	add line changes diff bot to CI (#1863 )	2023-09-15 16:29:58 -04:00
chenyu	29ac8293d7	run gpt2 in CI (#1866 )	2023-09-15 04:37:02 +08:00
chenyu	9e9ea20784	Fix view, CI cpu test with python 3.8 (#1845 )	2023-09-10 22:37:58 -04:00
George Hotz	0e3e2bac13	amd wino: upload results	2023-09-09 13:57:14 -07:00
George Hotz	6f95c5f284	winograd speed test for AMD (#1826 )	2023-09-09 13:56:33 -07:00
George Hotz	0f2bd10d00	add winograd CIFAR to mac tests (#1825 ) * add winograd CIFAR to mac tests * symlink already done	2023-09-09 13:45:24 -07:00
Pavol Rusnak	52a92bf95d	use class Foo: instead of class Foo(): (#1797 ) * use class Foo: instead of class Foo(): * add ruff linter, copy settings from .flake8 to ruff.toml	2023-09-06 12:20:25 -07:00
George Hotz	fb1cc6bf4b	llama jit is default, print tok/sec (#1774 ) * llama jit is default, print tok/sec * jit not default in CI	2023-09-05 10:12:16 -07:00
nimlgen	f863c12610	test kopt correctness (#1756 ) * test kopt correctness * bump BUDGET to 20 * kopt hooks as setUp/tearDown	2023-09-04 10:55:00 -07:00
George Hotz	56abe04e4b	disable assembly (#1755 )	2023-09-04 09:41:20 -07:00
chenyu	b8fde6bb0f	Test KOPT in CI (#1744 ) * test kopt in ci * getenv takes dtype from default	2023-09-03 14:37:20 -07:00
George Hotz	89cd380bfc	add nvidia CI (#1737 ) * add nvidia * speed(nvidia)	2023-09-01 22:02:30 -07:00
George Hotz	fdd7f282cb	Reenable tensor cores for self-hosted Mac CI (#1717 ) * debug 5 matmul * allow tensor cores in CI * tensor cores on arm64 * put debug back	2023-08-30 07:53:04 -07:00
wozeparrot	2f768e386d	stable diffusion benchmark artifact (#1714 )	2023-08-29 21:08:40 -04:00
George Hotz	0ea22bf249	remove DEBUG=1 from stable diffusion AMD since jit cache is fixed	2023-08-29 12:46:12 -07:00
George Hotz	ab9b9ff3e2	pipefail benchmark (#1709 ) (#1710 ) * feat: specify shell * feat: specify shell for mac Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-08-29 08:15:02 -07:00
George Hotz	aa7c98722b	sd timing (#1706 )	2023-08-28 20:22:57 -07:00
George Hotz	f5f8b09c13	allow manual release (#1704 )	2023-08-28 17:54:25 -07:00
George Hotz	715047a1e4	fix release publish (#1703 )	2023-08-28 17:48:00 -07:00
chenyu	b5d700adae	update openpilot supercombo.onnx to 0.9.4 (#1681 ) * update openpilot supercombo.onnx to 0.9.4 * update tests for the new model * comment out comma models from external_model_benchmark	2023-08-26 19:16:08 -04:00
Roelof van Dijk	89b529c07f	[ready] ci: add py38 to linters (#1674 ) * ci: add py38 to linters * fix: run linters only on py38 --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-26 09:34:15 -04:00
George Hotz	a6d842af7a	move device to ops (#1646 ) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
Roelof van Dijk	1900acda09	[READY] ci: setup venv cache (#1475 ) * ci: cache installed packages * ci: trigger jobs * ci: fix hashfiles argument --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-20 18:43:16 -07:00
George Hotz	012ee7d162	not worth the speed (#1584 ) * not worth the speed * no slots * uops comments * bump to python 3.11 for speed * add critical slots back	2023-08-20 10:24:58 -07:00
George Hotz	ad7d26c393	fix __launch_bounds__ and benchmark TC MATMUL (#1575 ) * fix * benchmark matmul	2023-08-19 10:54:39 -07:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353 ) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
Roelof van Dijk	84e6693915	fix: apt-get to apt, no recommends, clean up (#1571 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-18 13:48:59 -07:00
Ethan Sorrell	cb62911f6b	PTX Reintegration and Passing Tests (#1512 ) * move assembly, assembly_ptx * successful but broken rendering of ptx asm * clear ins before render asm * slightly less broken :') * we needed thread syncs * fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half * Fix runtime_args for gpuocelot * our casts were flipped on both ends * more casting * add ternary where op * dealing with storing/loading bool * add test for casting to bool from negative * Fix args.valid on ConstOp * add to CI, TODO: fix runtime_args for test_uops * fix placement of runtime_args to work with lazy.Device * undo ci changes so I can push * fix lints * start cleanup and fix things we broke fixing lints * add checks for PTX specifc asm instructions * revert added test -- doesn't pass on llvm * skip tests for underflow,overflow * another fix for how we're setting runtime args * Less broken cleanup * add to CI * add more env variables for ci test * fix ci to install pycuda for ptx * ci: copy cuda test command * cleanup * assert to make sure we're actually running ptx in ci * remove test assert * move is_ptx arg * move assembly, assembly_ptx back to extras * fix imports * initial merge fixes * clear registers, fix UOps.LOAD with invalid value * draft merge fixes * remove prints * quick lint and merge fixes * cleanup * remove PTXProgram wrapper * final cleanup * temp change for ci rerun * ci rerun * rollback ISA version	2023-08-16 16:20:20 -07:00
chenyu	11dd9b1741	symbolic codegen and exec (#1552 ) * symbolic codegen and exec * fix and add test * no sketchy * merge_dicts type * dtypes._arg_int32	2023-08-16 14:43:41 -07:00
wozeparrot	074c467020	hotfix for broken ci (#1559 )	2023-08-16 13:52:03 -04:00
Steven Anderson	93a36c3659	Arm (#1421 ) * testing new memops * better debugging * testing padded conv * branching with load * refactoring a bit * first try * fixing bugs * fixing some * eq * eq2 * do not use x's * working * fixing imm * getting things working * refactor * pow not working * working except one * refactor: one store mem * refactor: global load * refactor: imm * refactor: cleaning * fixing big offsets * refactor with ci * try ci * typo * another typo * ubuntu default * forgot git * do i need git? * missing packages * adding python-dev * with cache? * buildx action * buildx name issue? * maybe now? * python3 * newline warning * maybe now * i actually need this * ci should work now * improved caching * fixing cache * maybe now it will cache * this * testing cache * trying again * load * missing platform * caching gha * testing cache * full testing * typo * now? * why * adding checkout back * bad formatting * fixing convention issues * supporting python * adding CI flag * testing all * better comments * adding debugging * takes 12x longer * does it output progress now? * ignore models for speed * fixing merge * excluding conv_transpose2d * only 2 test cuz is to slow * another approach * let's see * faster duh * my bad * T_T * typo * sup * with output? * comment test * comment test * comment test * :? * no comment * with cache * back to normal * testing that ci works * back to passing * trying again * does it create another entry * does it create another entry? * build local * hey * Revert "excluding conv_transpose2d" This reverts commit cc7348de03033e032f47d69caff174e2f1a7bfea. * does it cache if done before? * does it cache? * done * adding test ops * bad formatting * no need for this * working static mem * sum 1d * add ndim * better reg import * fix stack * back to np * working except for softmax * 5 failing * no pogress * remove keystone * remove keystone * testops passing * cleanups * more cleanup * typo * ci * ci2 * cond import * ci3 * ci4 * ci4 * ci5 * ci5 * ci6 * aligment * test all * correct test * err read_unmapped * passing test * ignore for speed * ignore for speed * ci7 * cleanup * remove docker * fixing merge * fixing bugs * add skipload for const ops * comments * First merge to master: Renderer * fix emulation * passing all tests arm64 * cleaning * fix handcoded binary * cleaning * fix errs * fix runtime arg binary * clean git diff * fix and clean * fixing metal test * cleaning * fix metal test * ci ~8 min * fix pylint and clang * cache the files in ops_clang --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-14 19:29:30 -07:00
wozeparrot	29d5801387	distributed collectives (#1519 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device * feat: allreduce * feat: test * feat: need contiguous * feat: test in ci * feat: exit with correct code * feat: don't need that * feat: opencl wait_for just doesn't work * feat: synchronize on out * feat: try? * feat: try again? * feat: add extra realizes * feat: print * feat: seed * feat: tol * feat: test ones and zeros * feat: remove print * feat: are you just flaky * feat: seperate scatter and gather? * feat: just try synchronizing * feat: remove print again * feat: bring back difference * feat: no sync * feat: revert that * feat: back to wait_for * fix: typo	2023-08-11 10:22:07 -07:00
wozeparrot	7e7c9001e9	distributed world (#1481 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device	2023-08-10 10:00:51 -07:00
George Hotz	e3c6c0c6db	add GPT2 example (#1511 ) (#1514 ) * add gpt2 to examples * some cleanup * fixes * argparse + scaled_dot_product_attention * add timing * add to benchmark Co-authored-by: YassineYousfi <yassine.y10@gmail.com>	2023-08-10 09:09:47 -07:00
wozeparrot	351684395c	dont run on fork (#1510 )	2023-08-09 13:06:45 -04:00
wozeparrot	88e2e0c8a3	Revert "don't try to run benchmark on forks" (#1508 )	2023-08-09 12:59:49 -04:00
wozeparrot	65b65b760b	don't try to run benchmark on forks (#1507 )	2023-08-09 12:59:19 -04:00
Roelof van Dijk	aa83a9e910	ci: fix gpuocelot build cache (#1474 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 14:00:04 -07:00
Roelof van Dijk	e2cf0f322e	[READY] ci: missing n=auto (#1486 ) * ci: missing n=auto * fix: add to commented test --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 07:37:24 -07:00
George Hotz	5fdd248617	don't download cifar (#1472 )	2023-08-06 21:38:59 -07:00
George Hotz	d78fb8f4ed	add stable diffusion and llama (#1471 ) * add stable diffusion and llama * pretty in CI * was CI not true * that * CI=true, wtf * pythonpath * debug=1 * oops, wrong place * uops test broken for wgpu * wgpu tests flaky	2023-08-06 21:31:51 -07:00
Diogo	d7d1011f1e	Add WEBGPU tests to CI (#1463 ) * webgpu tests * assert device is webgpu * missed env set * exclude failing ci tests * ignore test file * changed acc for adam test	2023-08-06 10:32:01 -07:00
George Hotz	486a9dbfd9	speed v torch (#1464 ) * speed v torch * always print * change print * torch speed tee * all exposed	2023-08-06 09:32:33 -07:00
George Hotz	2ab282bfec	run on update_benchmark too (#1460 ) * run on update_benchmark too * amd inference test * name it better * add 10 CIFAR training steps	2023-08-06 08:58:37 -07:00
George Hotz	943b227cb1	only on push to master	2023-08-06 00:10:07 -07:00
George Hotz	2274e3e757	Fix benchmark (#1454 ) * do benchmarking * system * artifact * go * name artifact * only on push	2023-08-05 23:44:36 -07:00
George Hotz	bf21aec81f	do benchmarking (#1451 ) * do benchmarking * system * artifact * go * name artifact	2023-08-05 23:35:01 -07:00
George Hotz	67781fcf5d	fix fail fast in CI	2023-08-05 10:24:24 -07:00
wozeparrot	ab9e4a2e93	Make cuda CI a bit more consistent (#1403 ) * feat: use fast-apt-mirror * feat: use in more places	2023-08-02 07:38:22 -07:00
Diogo	4dc8595069	simple exporting models (#1344 ) * unified exporting * json exporting * ignore more * simplified buffer export * added dtypes * added assert * swift example * fix tests * linter * remove whitespace * fixed tests * remove swift example * remove unintended changes * allow callable models to be used * whitespace * more readable json export * name change * whitespace * whitespace	2023-08-01 09:35:48 -07:00
George Hotz	f27df835a6	delete dead stuff (#1382 ) * delete bpe from repo * remove yolo examples * Revert "remove yolo examples" This reverts commit cd1f49d4662a5565726ae1fa7bf3f6a3e3985965. * no windows	2023-07-31 11:17:49 -07:00
George Hotz	37fa7e96fb	Revert "update editorconfig, enforce via CI (#1343 )" (#1380 ) This reverts commit `da2efecbe2`.	2023-07-31 10:35:50 -07:00
Pavol Rusnak	da2efecbe2	update editorconfig, enforce via CI (#1343 ) * update editorconfig to set unix-style newlines and trim whitespace * add editorconfig github action to the CI * fix whitespace	2023-07-30 18:44:30 -07:00
chenyu	ab80ea0d38	use ubuntu for clang ci test (#1368 )	2023-07-28 20:51:25 -04:00
waifairer	d89fb729e5	flake8 (#1323 ) * flake8: Ignore frequent violations, correct infrequent ones * Ignore some rules in test * Reorder test ignores * Lint test + main * EOF indent * Include all E71,E72 errors * Test the failing case in CI * Revert "Test the failing case in CI" This reverts commit 110add0a70f5a619d07631269104e84f908af6b9. * Push to test! This reverts commit f317532779a0e1ac8401e2474fd5c6c8695c08e9. * ok back to passing This reverts commit ba5052685f93f83e06152cdc696b9e26131d8ab7. * Prove that CI fails when formatting is incorrect. * Fix formatting * Remove duplicitous E117 rule * Use flake8 config for precommit --------- Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-24 11:19:58 -04:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? * remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
Jacob Pradels	b112edd2c3	Add pylint trailing whitespace rule (#1314 )	2023-07-21 13:37:55 -04:00
chenyu	a5f5330d91	Add Fuzz Test symbolic / shapetracker to CI. (#1278 ) * Fuzz test symbolic and shapetracker This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8. * mess again * no tail * test shapetracker too * Revert mess and enable all tests * removed leftover	2023-07-19 09:05:45 -07:00
chenyu	c96bf395df	Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265 ) * Enable JIT test * really test metal * Skip some device	2023-07-18 11:40:37 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
Roelof van Dijk	d0e21a7398	ci: don't install recommended packages for GPU (#1215 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-07-11 15:38:49 -07:00
George Hotz	beb4d3ab01	Tensor Cores 2: Local Buffers Edition (#1057 ) * local buffers * work * works * invert_strides * work * non tc * fix shapetracker bug * stride priority * touchups * gate tensor cores * tensor core conv * cleanups * bug fixes * fix metal_matmul * fast tensor cores * more speed * buffer selection bug fix * fix CI maybe * ugh, CI is set to true, not 1 * tc allowed * add_gl_dimension * split out padding conv tests * does padding add fail * test_padded_conv2d_1x1 * skip metal ci stuff * more strict on yellow * float2 * strip parens * fix float2 * touch up * dtype * strip parens * no alias * bugfix * cast float2 and test tensor core ops * oops, don't hardcode 4	2023-07-09 09:06:00 -07:00
George Hotz	7151382364	Refactor load/store before tensor cores (#1193 ) * minor cleanups * render_const * now that's a nice refactor * clean up vload/vstore * clean up render_load * debugs there * dumb * err, this? * const float4 * what's failing * bugfix * statement includes semicolon * bugfix	2023-07-08 15:54:58 -07:00
George Hotz	d9c1d81e99	Revert "feat: cancel previous workflow runs on new commits (#1184 )" (#1194 ) This reverts commit `d66a0c285d`.	2023-07-08 11:26:13 -07:00
George Hotz	52600d532e	add 20 minute timeout	2023-07-07 23:02:28 -07:00
wozeparrot	d66a0c285d	feat: cancel previous workflow runs on new commits (#1184 )	2023-07-07 22:55:35 -07:00
foreign-sub	574cbda979	Quickstart (#1015 ) * fix quickstart md * add quickstart to ci	2023-06-29 13:26:58 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00
George Hotz	70c07dfea5	5k line max (#1064 )	2023-06-27 10:53:18 -07:00
George Hotz	0f281e7b18	touchups	2023-06-25 15:24:26 -07:00
George Hotz	c8fbdeb48e	test speed llama (#1046 ) * test speed llama * oops, put it back * uses the real device codegen * just do it on the mac * pp * is faster? * Revert "is faster?" This reverts commit 42db542010906dd62376c0e419416978d03d3d62. * disable docker again for less load on CI	2023-06-25 15:22:56 -07:00
Jacky Lee	5d16cc283f	Docker fix (#1039 ) * Docker test * Remove extra installs * Don't run full test * No need for testing dependencies	2023-06-25 10:38:58 -07:00
cloud11665	264b1e5f48	cache gpuocelot build in cuda CI (#1032 )	2023-06-22 17:42:12 -07:00
cloud11665	2407690d82	add cuda on cpu tests (#1020 )	2023-06-22 14:15:50 -07:00
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011 ) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
George Hotz	0d4c4f4e9e	metal ci attempt (#1010 ) * metal ci attempt * skip failing ops tests * skip in the ops test * no dtype test	2023-06-19 09:23:55 -07:00
Diogo	6b1280f01c	fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956 ) * prelu and where casting * typing for safe_numpy * optional * get rid of tracing in ci * cleanup and resolved layernorm issues * removed debug print	2023-06-08 16:09:19 -07:00
kposborne2	00360da05b	Update broken `docs/abstractions.py` for changed ops, and add to CI (#930 ) * fix and add to ci * still have those * ocd * update other doc	2023-06-04 19:21:20 -07:00
George Hotz	a3feee29c5	make tests faster + add onnx (#815 ) * search one dir, disable slow * onnx tests * fast rnnt test	2023-05-27 08:53:32 -07:00
George Hotz	faf80418b7	pyopencl by default since GPU is default (#802 )	2023-05-25 17:48:18 -07:00
George Hotz	03b38864db	fix batchnorm at training (#753 ) * e2e testing * min failure * no affine on bn, still fails * why did i think i could detach that? * allow more kernels for bn * some test issue i don't understand	2023-04-19 08:01:04 -07:00
George Hotz	dbc99c243b	why did that test break?	2023-04-18 17:08:38 -07:00
George Hotz	b12b60af20	fix binop, other tests failure (#723 ) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Cyril Roumégous	3f08613a2a	apply flake8 E203 rule (#684 )	2023-03-11 11:35:16 -08:00
George Hotz	1826ff6b89	dtypes nice and clean (#673 ) * add dtype class * dtypes * buffers are lazy * dtype is tracked by lazybuffer and GenericShape * fix types in llvm * llvm store * dtype tests * fix tests maybe * fix flop counter * fix CI * CI fix and check format * fix dtype and dtype check * fix custom test * fix test graph	2023-03-10 16:56:07 -08:00
George Hotz	5dc227dba6	fix bug in ENABLE_METHOD_CACHE and enable for llvm	2023-03-06 07:43:40 -08:00
George Hotz	50012f679b	move get_contraction to shapetracker	2023-03-06 06:42:57 -08:00
George Hotz	7a1d96fd76	No negative (#632 ) * behavior is correct without VALIDHACKS * simple div and mod * fix tests * no negative variables * alt form is correct * still correct * bug in mulnode * at least validhacks works now * cleanups * test validhacks, and to_image_idx * cache compare key * tests and __neg__	2023-03-03 16:48:14 -08:00
George Hotz	999b44c274	fix external test + speed	2023-03-03 06:46:16 -08:00
George Hotz	459488bba2	fix linter (#630 ) * fix linter * no imports okay * explicit bases * disable in pylintrc	2023-03-02 20:06:20 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
George Hotz	3c8da6bd03	add typing	2023-02-28 10:54:46 -08:00
George Hotz	d584bae5c0	fine, openpilot can have 197 kernels	2023-02-27 11:48:36 -08:00
George Hotz	c9252d38b2	mypy cache breaks if you sometimes check untyped defs, no checking tests for now	2023-02-27 09:57:33 -08:00
George Hotz	e74779f19d	typing fixup	2023-02-27 09:52:04 -08:00
George Hotz	edc8fbfff2	woah, why isn't OPT=2	2023-02-27 08:03:31 -08:00
George Hotz	f4ee7d2cad	back to 196 kernels	2023-02-25 18:25:34 -08:00
George Hotz	6e98a172a0	fix broken contiguous	2023-02-25 17:41:49 -08:00
George Hotz	a44e8e4385	discard children on mop shuffle, 200 -> 196 kernels	2023-02-25 10:51:07 -08:00
George Hotz	758515dcc0	conv2d is an hlop (#589 ) * conv2d is an hlop * shorter conv * KOPT=-1 * alt imp * MULACC * smarter mulacc * pop conv * 7x7 -> 5x5 * didn't fix, that's not going to work * this is faster and matches old behavior * oh, non lazy just won't work with mulacc * mulacc in torch * bool types were creeping in * optimizer is actually better with hlop conv * fix pushing permutes issue * refactor einsum_mulacc * fix up readme * update readme * _image_conv2d * fix bias addition location * pushing permutes gets back to 200 kernels * conv cleanup * disable hlop conv * don't hide that in helpers	2023-02-23 17:52:31 -08:00
George Hotz	628ce067a1	add tests to mypy	2023-02-22 07:07:38 -08:00
George Hotz	714bf4b108	clang backend (#572 ) * start clang backend * mostly working * no group for reduce w clang * it compiles * compiles * a11y * minor fixups * formatting * add a test * rename test	2023-02-20 18:18:18 -08:00
James Roberts	0d405fd5bc	Parallelize CI tests (#535 )	2023-02-06 15:27:44 -06:00
George Hotz	90529d3750	tests are 20% faster (#529 ) * pytorch CPU * no cache, it's slower * pytorch cpu for real * remove double onnx	2023-02-06 09:56:14 -06:00
George Hotz	6eb0e6a650	shuffle deps: always tqdm, make linting category	2023-02-06 09:27:01 -06:00
George Hotz	1d80639646	make linter test install testing deps	2023-02-06 09:21:48 -06:00
George Hotz	60bb64811c	merge mypy into linters, no useless package update	2023-02-06 09:14:00 -06:00
Martin Loretz	97f0a82be7	Cache pip packages in github actions (#522 ) * Cache pip dependencies in github actions * Add setup.py as cache-dependency-path * Test caching * Test caching * Upgrade setup python action * Test caching * Remove setup.py from cache-dependency-path * Don't remove cache-dependency-path * Don't cache linter package's * Test caching * Test caching * Test caching * Upgrade actions/checkout to v3	2023-02-03 20:04:20 -08:00
George Hotz	e313c8af20	update openpilot tests from OPENCL to GPU	2023-01-24 14:05:20 -08:00
George Hotz	49c6e6d472	Latest attempt to add image (#462 ) * add image * load + store + boring stuff: * image tests pass * thneed print GFLOPS * op conv test * more debugging * hack for multiview image * shapetracker creates less views * disable image tests * working better * ugh, lkey not key * print in DEBUG, and allow views * works * simple padding conv2d * use index for image * that was bad code * debug print * fix types * less lines * save lines	2023-01-12 17:36:30 -08:00
George Hotz	27211103ae	docker: no -it	2023-01-09 12:49:59 -08:00
George Hotz	d6e86a29a8	docker: forgot to checkout code	2023-01-09 12:48:03 -08:00
George Hotz	73ce9a771e	that fix it	2023-01-09 12:46:33 -08:00
George Hotz	bfd4f4e35c	testdocker	2023-01-09 12:41:52 -08:00
George Hotz	5e07d4669d	the speedy chonker is going to replace the old chonker (#432 ) * bringing back reshape and permute * done with E701 * 4x4 works in generic way * max and sum not vectorizing... * special case single float * support comparing to MPS * improve matmul speed, consider generic principles * GlobalCounter * fix op tracking * faster * comment that out for now * err, it needs that * fix minor issues * fix global_mem	2022-11-11 18:34:24 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431 ) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
Liam	8dc28dd733	Create python-publish.yml (#163 )	2022-11-08 08:45:01 -08:00
George Hotz	d02f8f9bc0	can we lose the lines with E701 still there?	2022-10-28 08:36:03 -07:00
George Hotz	ef62db3186	cleanups, remove E701	2022-10-28 08:28:56 -07:00
George Hotz	b65b70812a	Exec AST (#404 ) * working exec ast * exec_ast is staticmethod * GenericExecAST * fold that sometimes * ExplicitExecAST * exec_ast for GPU * gpu working * get_lazyop_shape * now gpubuffer is ExplicitExecAST * dedup * add a type * RESHAPE in opencl code * fix linter * that too for linter * cleanups * remove dead code * GenericShape is less lines * add ALLOWED_KERNEL_COUNT to tests * fix mypy * that's gotta be recursive * fix opencl shape processing * remove unneeded lambda	2022-10-28 08:27:03 -07:00
George Hotz	3b9b7eda48	remove run_thneed dead code	2022-10-20 17:24:18 -07:00
YassineYousfi	ae0f9b17df	openpilot: new models and onnx ops (#401 ) * ngrl stuff * fngrl * fix typo in compile script * workflow dispatch * new models in tests * dont need to up this threshold Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>	2022-10-20 11:49:19 -07:00
George Hotz	8382c51c12	always MATMUL, test the ops in OPENCL	2022-10-01 13:31:29 -04:00
George Hotz	e737513c52	external_test_opt	2022-09-28 23:29:41 -04:00
George Hotz	f215534a64	1100 lines, but sane linter rules	2022-09-06 13:47:45 -07:00
George Hotz	fa33ba716e	flip that	2022-08-28 11:27:33 -07:00
George Hotz	227937d6d2	fix test maybe	2022-08-22 09:50:14 -07:00
George Hotz	11626053b0	run_thneed with test	2022-08-22 09:45:46 -07:00
George Hotz	0aca848d10	enable the openpilot test	2022-08-21 12:05:48 -07:00
George Hotz	a8734df030	add openpilot tests to tinygrad	2022-08-21 12:03:37 -07:00
George Hotz	ccd539e93e	maybe that's a better way to do this	2022-08-20 08:25:12 -07:00
George Hotz	b7d782b921	hmm, with the new reduce, we have to opt 3 for memory usage	2022-08-20 08:16:15 -07:00
George Hotz	b132de677d	tinygrad.nn (#367 ) * tinygrad.nn * flake8 * working on pylint * more pylint * more pylint * pylint passes * networkx * mypy can't infer that type * junk	2022-08-18 07:41:00 -07:00
George Hotz	ef1100fdff	touchups	2022-07-19 09:30:06 -07:00
George Hotz	ef4afdb5d2	tests maybe	2022-07-18 08:24:14 -07:00
George Hotz	a2c4bcf313	disable opencl tests	2022-07-18 08:17:21 -07:00
George Hotz	762e859089	testopencl	2022-07-17 11:56:40 -07:00
George Hotz	608e2431f7	test opencl, commit to removing the crap conv code from GPU	2022-07-17 11:55:37 -07:00
George Hotz	34f43ea10e	LAZY and CLCACHE are defaults	2022-07-04 13:09:15 -07:00
George Hotz	d7aad46758	test lazy also, make TestMNIST faster	2022-07-03 15:19:19 -07:00
Nicklas Boman	64d986bc8b	add mypy to ci testing (#353 )	2022-07-03 15:11:35 -07:00
George Hotz	2ee85812f7	intel opencl (#342 ) * intel opencl * run clinfo * that fix it? * meh * think it's the same * basekit fix * it wasn't basekit * more minimal * no clinfo	2022-06-20 19:25:55 -07:00
Jacky Lee	81664baf64	Fix OpenCL installation (#301 )	2022-01-06 10:35:48 -05:00
George Hotz	b0511f9392	Revert "does pybind fix CI?" This reverts commit `d128e4fcae`.	2021-12-30 14:13:58 -05:00
George Hotz	d128e4fcae	does pybind fix CI?	2021-12-30 14:11:39 -05:00
George Hotz	6bee5bdb7d	add torch tests	2021-10-30 18:58:45 -07:00
George Hotz	7472a7ebe2	not forcing 3.9 for a stupid type	2021-10-30 16:52:40 -07:00
George Hotz	f193eeed25	bump all to python 3.9	2021-10-30 16:15:41 -07:00
George Hotz	ac8afd24fa	refactor accel	2021-10-30 16:10:59 -07:00
Dinesh Kumar Gnanasekaran	2146860307	fixed OpenCL installation while running tests (#262 ) Co-authored-by: dinesh <dinesh-GDK>	2021-06-12 11:14:21 -07:00
George Hotz	6842ad9ec8	minor cleanups, yolo work	2021-01-03 08:14:16 -08:00
Liam	ebd72ff437	Test split (#231 ) * Split tests Split tests into "Test CPU" and "Test GPU". Add test flag "TEST_DEVICES" which is a comma separated list of devices: CPU,GPU,ANE * Run tests based on provided TEST_DEVICES flag By default will run all "CPU,GPU,ANE" * fix bad quote * Revert changes and use GPU=1 This is done through setting the default Tensor Device to Device.CPU of GPU=1 is set. Run GPU tests: GPU=1 pytest -s -v	2021-01-01 09:19:03 -05:00
iainwo	56d44637f3	fixed pylint, formatted python files iwth cblack on localhost (#204 ) * fixed pylint, formatted python files iwth cblack on localhost * Revert "fixed pylint, formatted python files iwth cblack on localhost" This reverts commit 07e2b88466fa53399ad78d962ffb2ad55bc45344. * dedented 4-spaces added linter Co-authored-by: Iain Wong <iainwong@outlook.com>	2020-12-17 14:37:31 -08:00
Mufeed VH	e6a5c6c93e	Added indentation linter (#187 ) * Added indentation linter * pylint package latest	2020-12-12 17:15:09 -08:00
Liam	e79cda6dad	Add pyopencl to dependency installs (#174 ) * Add pyopencl to dependency installs OpenCL was not actually being tested as pyopencl was not installed. * Reduce installation to 1 liner	2020-12-10 09:24:08 -08:00
George Hotz	df64658a2c	weee, opencl tests in CI	2020-11-10 10:04:45 -08:00
George Hotz	d47a128812	pocl	2020-11-10 10:02:13 -08:00
George Hotz	c05401a9ca	sudo maybe	2020-11-10 09:53:49 -08:00
George Hotz	09bc8eddfe	clinfo	2020-11-10 09:51:38 -08:00

... 4 5 6 7 8 ...

566 Commits