tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
Pasan Perera	b6102ba4ac	added CUDA and PTX to env_vars.md (#1009 )	2023-06-19 08:47:44 -07:00
Casey Primozic	651d6ea457	Minor improvements + cleanup to `ops_gpu.py` (#1006 ) * Minor improvements + cleanup to `ops_gpu.py` * Add some previously undocumented environment variables from `ops_gpu.py` to `env_vars.md` * Update debug print for OpenCL to print the devices that will be used post-filtering with `CL_EXCLUDE` * Remove a couple unused or superfluous variables and assignments * Use `fromimport` shorthand to shave off a couple precious LOC * Couple small whitespace changes to clean things up * Revert change to ordering of OpenCL devices * Small refactor for OpenCL context creation	2023-06-18 21:26:40 -07:00
sehaj	775287ed91	Add yolov8 implementation (#806 ) * added SPPF module from yolov8 * added conv_block, bottleneck modules * cleaned modules * c2f example * spf changes * C2f * fixed and tested bottleneck * improved detect class * tested spf and conv * checked c2f * DFL structure * fixed dfl * added dist2bbox function * added dist2bbox function * added and tested make_anchors function for the head * keeping functions above * creating the detection head * fixing head * untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou * head works * structure fixx * added darknet (backbone) * yolov8 neck, and intialize bias function while detection * fixed spacing * yolov8 class, init bias, and fixed c2f * forward pass almost working * fixed net structure * init bias not needed, forward pass working * load weights boilerplate * load weights done? * all variants loading! * post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested) * fix scale_boxes * box_iou fixed and tested * created the pre nms function * fix nms * fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked * added letterbox and pre_tranform for pre_process function * fixed letterbox, pre_transform and added preprocess function * custom NMS done, integrated prepare_boxes and nms, improved box_iou * added postprocess function till parsing * added draw_bounding_boxes_and_save function * testing full flow * using fetch for class names * fixed make_anchors + all tinygrad now * added command line arguments, weight downloading * single image for now only * made draw boxes more efficient * made NMS functions efficient * made compute_transform better * v8 working now, inference is done * prints objects detected in console now * fixed image loading (pre processing) * batch post processing * created initial tests * fixes bounding box thickness AND added get_detected_classes_with_frequency function * cleaning for testing * two tests * added url option for image, removed need for specifiying arguments * tests complete, but lots on things are printed on screen by ultralytics * remove parse arguments * fixed weight location * fixed colours of classes, and black font when high brightness * minor changes * TODOs for later * removed use of torch, using .npz weights * fixed tests * one path for fetch * preprocess now in tinygrad, plus test fix for that * updated tests * fix tests * no class labels needed * Add files via upload * Update showcase.md * Update showcase.md * added safe tensors as weights, and tests fix for that * safe tensors test * using safe_load * using tinygrad functions now to load weights * update tests --------- Co-authored-by: r3sist-uniq <amanmatreja@gmail.com> Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>	2023-06-16 18:55:19 -07:00
John Moore	45bc040a63	Fix typo (#978 )	2023-06-13 15:15:45 -07:00
Nicklas Boman	5c7248c72d	imagenet download and prepare (#928 ) Changing if not exist to the exist_ok=True parameter and adding a variable check if you want to download training data also adding variable to env_vars.md	2023-06-08 12:55:33 -07:00
George Hotz	df40a9c238	EXP+LOG -> EXP2+LOG2 (#954 ) * EXP+LOG -> EXP2+LOG2 * update docs	2023-06-08 10:57:31 -07:00
Timothy Lindblom	a149f12a5b	Replaced broken link to /tests with /test (#939 )	2023-06-06 10:29:09 -07:00
kposborne2	00360da05b	Update broken `docs/abstractions.py` for changed ops, and add to CI (#930 ) * fix and add to ci * still have those * ocd * update other doc	2023-06-04 19:21:20 -07:00
wozeparrot	091bd65a68	feat: quick doc fixups (#923 )	2023-06-04 11:03:57 -07:00
wozeparrot	e9c1ae3825	Add a quick start guide (#900 ) * feat: initial quick start guide * fix: fix link * feat: add note about jit * feat: add note about load/store ops * feat: add link to discord * feat: add note about saving and loading models * fix: correct code for saving and loading * feat: overhaul docs * fix: fix link * feat: wording * feat: add link to discord * feat: contributing guidelines * feat: make contributing section more doc focused * feat: add link to env_vars from readme * fix: wording * feat: move community to bottom * feat: showcase * feat: linebreak * feat: redesigned header * feat: tweaks * feat: tweaks * feat: badge for lines of code * feat: move installation instructions to repo readme * feat: readme overhaul number 2 * feat: move visualization to quick start guide * feat: readme 2 electric boogaloo * fix: grammar * fix: formatting * feat: no ugly line * feat: add line back * feat: new load method * feat: split adding accelerator docs out * feat: showcase whisper * feat: smaller tweaks * feat: bring back oneliner	2023-06-04 08:51:20 -07:00
George Hotz	791530045d	Refactor LoadOps (#910 ) * test * work * upd test * loadops * cleanups * real ones * remove LazyNumpyArray * fix assign test * remove range * np.require * llama uses arange kernels * no caching consts * fix enet * torch load support * tests cleanup * fix shufflenet * fix image * fix torch_load test	2023-06-03 09:40:43 -07:00
Nicklas Boman	0e9e0fd718	document environment variables (#887 )	2023-06-01 13:11:17 -07:00
George Hotz	1e56aced05	add changeable DEBUG (#816 )	2023-05-27 13:28:25 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
George Hotz	3a8af99adb	i understand ClassVar now	2023-03-15 09:00:25 -07:00
Pasan Perera	df48753692	fixed the import error for latest changes in master (#705 )	2023-03-15 08:59:42 -07:00
George Hotz	54f499b623	Move rawbuffer (#697 ) * move GlobalCounters to helpers * that's not part of the public api * move InterpretedBuffer * remove fromCPU from devicebuffer	2023-03-13 22:30:36 -07:00
George Hotz	cbc5a7222a	symbolic is now a 6/10 due to the infinite loop. do better.	2023-03-13 00:07:59 -07:00
George Hotz	c594a0a835	fix flip bug, add new unit tests	2023-03-12 23:55:31 -07:00
George Hotz	ce1564b05e	fix shapetracker test	2023-03-12 22:33:25 -07:00
George Hotz	153cce0f7e	tutorial	2023-03-12 22:31:46 -07:00
George Hotz	8d16ebaea7	we have docs:	2023-03-12 19:05:44 -07:00
George Hotz	0ba6179de7	stable diffusion in readme	2022-09-05 18:51:56 -07:00
George Hotz	81c9438ea1	keepdim avoids reshapes	2022-06-05 15:56:42 -07:00
George Hotz	7a3fe34db1	GPU llops	2022-06-05 13:49:39 -07:00
George Hotz	2097d814f6	Sum doesn't need to save the tensor	2022-06-05 12:04:51 -07:00
George Hotz	fc6597a6d9	only resnet18, it's too slow otherwise	2021-10-30 16:48:39 -07:00
George Hotz	2075fdeb4f	FPGA Based Accelerator for Tinygrad (#258 ) * ops_risk * risk sim * guessing is for winners * minor * better * matmal with risk * conv doesn't work * closer * conv2d works * ops_risk * opt2 works * opt1 may not be possible * opt1 is a mulacc * arty * attosoc example building on mac * minor * riscv assembler * gucci gang * we got C code * not a scam * hello * make risk mergeable into master * unop support	2021-06-07 17:45:09 -07:00
George Hotz	1ae0e88627	nvidia notes	2021-05-26 14:27:00 -07:00
Skosh	78aa147b39	[WIP] YOLO working on tinygrad! (#245 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…	2021-04-25 18:06:52 -07:00
NeuralLink	1a1c63a08b	Gan is real...Look what tiny just generated! (#192 ) * mode collapse solved * info add * delete unnecessary imports * readme	2020-12-13 20:23:12 -08:00
=	6b44a7f729	adds beautiful and meaningful logo	2020-10-26 18:12:49 +01:00


233 Commits