Commit Graph

4422 Commits

Author SHA1 Message Date
chenyu 7afca52796
replace pow in LAMB by tracking b1**t and b2**t per step (#4582)
* replace pow in LAMB by tracking b1**t and b2**t per step

* remove t, add [self.b1_t, self.b2_t] to return

* adam has one less kernel
2024-05-14 13:08:22 -04:00
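The trick in the commit above: b1**t and b2**t each grow by a constant factor per step, so a running product kept in optimizer state replaces a pow kernel with a single multiply. A minimal sketch of the idea, reusing the b1_t/b2_t names from the commit message; this is not tinygrad's actual LAMB code:

```python
class BiasCorrection:
    """Tracks b1**t and b2**t as running products instead of calling pow."""
    def __init__(self, b1=0.9, b2=0.999):
        self.b1, self.b2 = b1, b2
        self.b1_t, self.b2_t = 1.0, 1.0   # b1**0 and b2**0

    def step(self):
        # one multiply per step replaces a pow(b, t) kernel
        self.b1_t *= self.b1
        self.b2_t *= self.b2
        return 1 - self.b1_t, 1 - self.b2_t  # Adam/LAMB bias-correction terms
```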
nimlgen 9b02aef45a
remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg 5eb81ff764
Fix speed compare script (#4581)
* Fix speed compare script

* Update speed_compare_cuda_ptx.py

* Update speed_compare_cuda_ptx.py

* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen 2131556c2c
amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* disable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan 089eeec271
setitem in-place operator tests (#4577)
* tests and error

* rename to in-place

* add a note

* more comments

* more comments

* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu 0fa57b8ce9
raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this; this first step properly raises an error

* NotImplementedError
2024-05-13 18:56:47 -04:00
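A hedged sketch of the behavior added above, assuming tinygrad's Tensor API of this period (the exact setup is illustrative):

```python
from tinygrad import Tensor

t = Tensor.zeros(4, 4).contiguous()
v = Tensor.ones(4, requires_grad=True)
t[0] = v  # setitem with a requires_grad source now raises NotImplementedError
```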
Filip Brzek f7d08bd454
feat: add acc_dtype to einsum (#4571) 2024-05-13 14:02:07 -04:00
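The new acc_dtype lets the einsum reduction accumulate at a different precision than its inputs; a hedged usage sketch (assumes the top-level Tensor/dtypes imports):

```python
from tinygrad import Tensor, dtypes

a = Tensor.rand(8, 16, dtype=dtypes.float16)
b = Tensor.rand(16, 4, dtype=dtypes.float16)
# accumulate the contraction in float32 to limit half-precision error
c = Tensor.einsum("ij,jk->ik", a, b, acc_dtype=dtypes.float32)
```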
Szymon Ożóg d97d5a7689
Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowing it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
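For context on the PR above: a gated load reads memory only when its gate predicate holds and otherwise yields an alt value; the title suggests the optimization avoids doing the index arithmetic when the gate is false. One plausible reading of the intent, sketched in Python rather than PTX:

```python
def gated_load(mem, gate, compute_idx, alt):
    # Before: compute_idx() ran unconditionally, even for gated-off lanes.
    # After: the index calculation sits under the gate, mirroring PTX
    # predicated execution (@p ld.global ...).
    return mem[compute_idx()] if gate else alt
```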
qazal c67b70ca67
small scheduler refactor (#4569)
* outputs

* consistent

* more style

* doesn't need a tuple
2024-05-13 10:47:39 +03:00
qazal 77aa8659f5
use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal b0fa97e176
assert error detail in test_assign (#4567)
* use regex assert

* that shouldn't raise
2024-05-13 09:56:05 +03:00
chenyu 25ec40ca93
cleanup dtype of tensor creation from list (#4566) 2024-05-13 02:47:41 -04:00
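A hedged sketch of the inference rules being cleaned up (the exact defaults depend on the build's dtypes.default_int/default_float):

```python
from tinygrad import Tensor

print(Tensor([1, 2, 3]).dtype)       # all ints -> default int dtype
print(Tensor([1.0, 2, 3]).dtype)     # any float promotes the list to default float
print(Tensor([True, False]).dtype)   # bools stay dtypes.bool
```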
qazal 4e1135a0bc
assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz b660f60125
all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz 02327b8adf
simple stuff from new_uops branch (#4563) 2024-05-12 22:18:05 -07:00
ziereis f53a23d21e
Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot d7670f8141
quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
ziereis bcee4743ce
fix error message (#4556)
* fix error message

* typo

* add suggestion to fix error

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu 01a0c1a948
slightly faster nf4 llama (#4542) 2024-05-12 14:24:42 -04:00
qazal 4c232dc0ae
refactor LoadOps scheduling (#4553)
* refactor

* op -> lop
2024-05-12 12:59:24 +03:00
qazal 3da152f0fe
scheduler docs 2 (#4551)
* docs

* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot e07c7668b3
nf4 llama (#4540) 2024-05-11 22:22:34 -07:00
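NF4 (the 4-bit "normal float" format from QLoRA) stores each weight as a 4-bit index into a 16-entry codebook plus a per-block scale, so dequantization is a table lookup times the scale. A sketch with a placeholder codebook; the real NF4 levels are quantiles of a normal distribution, and this is not tinygrad's implementation:

```python
import numpy as np

CODE = np.linspace(-1.0, 1.0, 16).astype(np.float32)  # placeholder, not real NF4 levels

def nf4_dequant(packed: np.ndarray, scales: np.ndarray, block: int = 64) -> np.ndarray:
    # packed: uint8, two 4-bit indices per byte; scales: one float32 per block
    idx = np.stack([packed >> 4, packed & 0xF], axis=-1).reshape(-1)
    return (CODE[idx].reshape(-1, block) * scales[:, None]).reshape(-1)
```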
George Hotz 7a26bdac65
move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz 508e8a6666
add cpu objdump to LLVM/CLANG (#4537) 2024-05-11 14:28:44 -07:00
chenyu bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ (#4534) 2024-05-11 14:36:56 -04:00
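"getenv-able" here means the knob is read from the environment via tinygrad's getenv helper; a hedged sketch (the default value below is an assumption, not the script's actual one):

```python
from tinygrad.helpers import getenv

EVAL_STEP_FREQ = getenv("EVAL_STEP_FREQ", 1000)  # e.g. EVAL_STEP_FREQ=500 python bert.py
```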
George Hotz 328b083e66 lil profiling script 2024-05-11 11:02:44 -07:00
chenyu da10cf0be1
extra/threefry.py for mem usage (#4533)
for now it needs 8N memory to generate N random numbers
2024-05-11 13:46:44 -04:00
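Threefry is a counter-based RNG: output i is a pure function of (key, counter i), which is what makes it parallel and reproducible, and why intermediate buffers (the 8N above) appear. A toy counter-based generator to illustrate the shape of the computation; the mixing below is made up and is not threefry:

```python
import numpy as np

def counter_rng(key: int, n: int) -> np.ndarray:
    ctr = np.arange(n, dtype=np.uint64)                    # one counter per output
    x = ctr * np.uint64(0x9E3779B97F4A7C15) ^ np.uint64(key)
    x ^= x >> np.uint64(33)
    x *= np.uint64(0xFF51AFD7ED558CCD)                     # toy avalanche mix
    x ^= x >> np.uint64(33)
    return (x >> np.uint64(40)).astype(np.float64) / (1 << 24)  # uniform [0, 1)
```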
chenyu 8a0fb3d765
delete old extra/autopad.py (#4532) 2024-05-11 13:06:10 -04:00
chenyu 04a4980a51
touchup bert script (#4531)
small adjustments: remove a duplicated training setting and stop the script once the target is hit
2024-05-11 13:02:02 -04:00
qazal 4871476a1e
move copy kernel out of schedule ordering (#4530)
* delete from sorting

* move the logic
2024-05-11 14:44:44 +03:00
qazal 2fb564c125
multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal 3cba22920f
test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal b3d9fd48d0
infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz 2f970a4fc2
all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot d2c347fc74
faster gather for bert (#4526) 2024-05-10 22:28:48 -07:00
George Hotz 922e6e056a hotfix: fix docs 2024-05-10 21:51:35 -07:00
George Hotz 347a3acb37
add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard-coded the bert model config instead of looking it up from a file
2024-05-11 00:18:36 -04:00
chenyu 7fab8c9e17
add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work; the order of the variable inputs is not deterministic?

* skip
2024-05-10 23:19:55 -04:00
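A hedged sketch of what these tests exercise, assuming the Variable API from tinygrad.shape.symbolic as of this era: a tensor reshaped to a bound symbolic dimension, whose mean divides by the variable rather than a constant:

```python
from tinygrad import Tensor
from tinygrad.shape.symbolic import Variable  # module path is an assumption for this era

i = 5
vi = Variable("i", 1, 10).bind(i)        # symbolic dim bound to a concrete value
a = Tensor.rand(3, i).reshape(3, vi)     # shape (3, i) with i symbolic
print(a.mean(axis=1).numpy())            # the reduce divides by symbolic i
```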
George Hotz 827058f030
update tests get_runner (#4522) 2024-05-10 20:09:22 -07:00
George Hotz a0448ff595
use copy kernel in schedule (#4520)
* use copy kernel in schedule

* imports
2024-05-10 15:30:33 -07:00
chenyu b15e2309bd
verbose error message in getitem (#4519)
* verbose error message in getitem

still hard to understand, but at least it prints what it's trying to expand

* sure

* :
2024-05-10 17:25:41 -04:00
George Hotz d438d5698d
bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
qazal a2b707a3eb
scheduler comments 1 (#4515) 2024-05-10 20:44:28 +03:00
George Hotz 4eef1ee9bf
move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz 7c630a9a53 hotfix: fix llama spacing + fix hcq 2024-05-10 15:10:13 +00:00
George Hotz 58e7256ce9
restore hcq graph (#4513)
* Reapply "hcq graph (#4380)" (#4512)

This reverts commit 06c1e7498e.

* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz 06c1e7498e
Revert "hcq graph (#4380)" (#4512)
This reverts commit 84a2e2b8c1.
2024-05-10 07:18:09 -07:00
nimlgen 84a2e2b8c1
hcq graph (#4380)
* start hcq graph

* hack-fix sync on amd

* nv

* fix nv

* multigraph

* fixes

* temp fix for graph

* this is not needed

* fix

* cleaner

* linter

* fix none

* faster cuda copy

* faster amd copy

* temp nv fixes

* alloc on gpu

* exp: faster amd

* Revert "exp: faster amd"

This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.

* revert, unrelated

* not in this pr

* linter
2024-05-10 07:15:12 -07:00
qazal 2b7ab60584
dfs fusion (#4491)
* use continue

* simplify

* flip

* track r

* derive forced_realize

* scheduler needs comments
2024-05-10 17:00:48 +03:00