Commit Graph

44 Commits

Author SHA1 Message Date
chenyu 940b6fd21a
Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274)
This reverts commit ab645317c9.
2023-07-19 10:51:06 -07:00
David Hou 56ee97b37f
dedup kernel args v2 (#1272)
* new version

* fix abstractions

* try remove test

* Revert "try remove test"

This reverts commit 2fc18a9f8ed180540baf73d32b568262709822f1.

* assert_allclose

* minimize the test

* minimize the test

* minimize the test

* minimize the test

* Revert "minimize the test"

This reverts commit e0c092959636109f745d1c8a73f2db90c75fe3c1.

* Revert "minimize the test"

This reverts commit 88240551b13403b21a81765043d5736103a49293.

* Revert "minimize the test"

This reverts commit 78328a7ce27328c8bf9a325ae017cc2a4d98f65b.

* Revert "minimize the test"

This reverts commit 989523fded4319b13db047e45ad8c35c861a36aa.

* skip test inside body

* oops

* oops
2023-07-18 20:03:42 -07:00
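Deduplicating kernel arguments means passing each distinct buffer to the generated kernel only once, even when it backs several of the op's inputs, and remapping argument indices accordingly. A minimal, hypothetical sketch of that idea (not the repo's actual implementation):

```python
# hypothetical illustration: keep one copy of each distinct buffer, remap indices
def dedup_args(bufs):
  seen, uniq, remap = {}, [], []
  for b in bufs:
    if id(b) not in seen:
      seen[id(b)] = len(uniq)
      uniq.append(b)
    remap.append(seen[id(b)])
  return uniq, remap

a, b = object(), object()
uniq, remap = dedup_args([a, b, a])
assert len(uniq) == 2 and remap == [0, 1, 0]
```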
Adrian Kretz 5a8ad57163
Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this, METAL will try to use a ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
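WHERE takes three operands (condition, x, y), which is why FusedOps became TernaryOps. A short usage sketch at the Tensor level, assuming the condition.where(x, y) form tinygrad exposes:

```python
from tinygrad.tensor import Tensor

cond = Tensor([1.0, 0.0, 1.0])
x = Tensor([10.0, 10.0, 10.0])
y = Tensor([-1.0, -1.0, -1.0])
# element-wise select: take x where cond is truthy, otherwise y
print(cond.where(x, y).numpy())  # expected: [10. -1. 10.]
```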
chenyu ab645317c9
Fix constant folding for Tensor([3]) (#1227)
* Fix constant folding for Tensor([3])

* Remove duplicated prod import

* load in the same device

* better numpy

* add constant fold shape test cases

* improve tests
2023-07-11 14:01:32 -07:00
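A hedged sanity check of what the fix targets: a single-element Tensor([3]) should behave exactly like the scalar 3, so it can be folded into the kernel as a constant instead of being loaded from a separate buffer.

```python
import numpy as np
from tinygrad.tensor import Tensor

x = Tensor.ones(4)
# both forms should give the same result; the list form is the one the fix folds
assert np.allclose((x * Tensor([3])).numpy(), (x * 3).numpy())
```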
George Hotz 2952b8e7a8
Fix up abstractions.py to include the Linearizer (#1177)
* fix up docs

* remove pow, add sqrt
2023-07-07 18:33:51 -07:00
terafo aa60feda48
Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Eli Frigo 801564f31b
Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
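With POW dropped as a primitive, sqrt/rsqrt become llops and a general pow is composed at the hlop level. A sketch of the identities involved, assuming the usual Tensor methods:

```python
from tinygrad.tensor import Tensor

x = Tensor([1.0, 4.0, 9.0])
print(x.sqrt().numpy())   # [1. 2. 3.] -- sqrt is now a primitive
print(x.rsqrt().numpy())  # 1/sqrt(x)
# for positive x, a general pow can be built from exp/log: x**y == exp(y * log(x))
print((x.log() * 0.5).exp().numpy())  # matches x.sqrt()
```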
tricky-labyrinth fd98f6cffa
Small fix to abstractions.py so it runs on Windows without throwing an AttributeError (#1109)
Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>
2023-07-03 13:44:49 -07:00
Reza Rezvan 8ae9a054ae
Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
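A minimal sketch of the post-refactor training-step pattern, assuming get_parameters now lives in tinygrad.nn.state (its current home) rather than on nn.optim:

```python
from tinygrad.tensor import Tensor
from tinygrad import nn
from tinygrad.nn.state import get_parameters
from tinygrad.nn.optim import SGD

model = nn.Linear(4, 2)
opt = SGD(get_parameters(model), lr=0.01)  # the optimizer marks these params as requiring grad
Tensor.training = True                     # keep optimizer state updates happy
loss = model(Tensor.ones(1, 4)).sum()
opt.zero_grad()
loss.backward()
opt.step()
```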
foreign-sub 574cbda979
Quickstart (#1015)
* fix quickstart md

* add quickstart to ci
2023-06-29 13:26:58 -07:00
ernie 4d703be6d7
fix typo (#1065) 2023-06-27 10:56:54 -07:00
George Hotz 18892242b0
global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Pasan Perera b6102ba4ac
added CUDA and PTX to env_vars.md (#1009) 2023-06-19 08:47:44 -07:00
Casey Primozic 651d6ea457
Minor improvements + cleanup to `ops_gpu.py` (#1006)
* Minor improvements + cleanup to `ops_gpu.py`

 * Add some previously undocumented environment variables from `ops_gpu.py` to `env_vars.md`
 * Update debug print for OpenCL to print the devices that will be used post-filtering with `CL_EXCLUDE`
 * Remove a couple unused or superfluous variables and assignments
 * Use `fromimport` shorthand to shave off a couple precious LOC
 * Couple small whitespace changes to clean things up

* Revert change to ordering of OpenCL devices

* Small refactor for OpenCL context creation
2023-06-18 21:26:40 -07:00
sehaj 775287ed91
Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and initialize bias function for detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_transform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixed bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
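Several of the post-processing helpers named above (xywh2xyxy, box_iou, NMS) are standard box manipulations. A hedged numpy illustration of the xywh2xyxy conversion (the repo version operates on tinygrad Tensors):

```python
import numpy as np

def xywh2xyxy(boxes: np.ndarray) -> np.ndarray:
  # (center_x, center_y, width, height) -> (x1, y1, x2, y2)
  out = np.empty_like(boxes)
  out[..., 0] = boxes[..., 0] - boxes[..., 2] / 2
  out[..., 1] = boxes[..., 1] - boxes[..., 3] / 2
  out[..., 2] = boxes[..., 0] + boxes[..., 2] / 2
  out[..., 3] = boxes[..., 1] + boxes[..., 3] / 2
  return out

print(xywh2xyxy(np.array([[50., 50., 20., 10.]])))  # [[40. 45. 60. 55.]]
```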
John Moore 45bc040a63
Fix typo (#978) 2023-06-13 15:15:45 -07:00
Nicklas Boman 5c7248c72d
imagenet download and prepare (#928)
Changed the "if not exists" check to the exist_ok=True parameter and added a variable to control whether training data is also downloaded;
added the variable to env_vars.md
2023-06-08 12:55:33 -07:00
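The exist_ok change swaps an explicit existence check for the standard library pattern; the path below is hypothetical:

```python
import os
os.makedirs("extra/datasets/imagenet", exist_ok=True)  # no error if the directory already exists
```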
George Hotz df40a9c238
EXP+LOG -> EXP2+LOG2 (#954)
* EXP+LOG -> EXP2+LOG2

* update docs
2023-06-08 10:57:31 -07:00
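The base-e ops are rewritten in terms of the base-2 primitives via exp(x) = exp2(x / ln 2) and log(x) = log2(x) · ln 2. A quick check with the Tensor-level helpers, assuming exp2/log2 are exposed (as they are in current tinygrad):

```python
import math
from tinygrad.tensor import Tensor

x = Tensor([0.5, 1.0, 2.0])
exp_via_exp2 = (x * (1 / math.log(2))).exp2()  # exp(x) = exp2(x / ln 2)
log_via_log2 = x.log2() * math.log(2)          # log(x) = log2(x) * ln 2
print(exp_via_exp2.numpy(), x.exp().numpy())
print(log_via_log2.numpy(), x.log().numpy())
```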
Timothy Lindblom a149f12a5b
Replaced broken link to /tests with /test (#939) 2023-06-06 10:29:09 -07:00
kposborne2 00360da05b
Update broken `docs/abstractions.py` for changed ops, and add to CI (#930)
* fix and add to ci

* still have those

* ocd

* update other doc
2023-06-04 19:21:20 -07:00
wozeparrot 091bd65a68
feat: quick doc fixups (#923) 2023-06-04 11:03:57 -07:00
wozeparrot e9c1ae3825
Add a quick start guide (#900)
* feat: initial quick start guide

* fix: fix link

* feat: add note about jit

* feat: add note about load/store ops

* feat: add link to discord

* feat: add note about saving and loading models

* fix: correct code for saving and loading

* feat: overhaul docs

* fix: fix link

* feat: wording

* feat: add link to discord

* feat: contributing guidelines

* feat: make contributing section more doc focused

* feat: add link to env_vars from readme

* fix: wording

* feat: move community to bottom

* feat: showcase

* feat: linebreak

* feat: redesigned header

* feat: tweaks

* feat: tweaks

* feat: badge for lines of code

* feat: move installation instructions to repo readme

* feat: readme overhaul number 2

* feat: move visualization to quick start guide

* feat: readme 2 electric boogaloo

* fix: grammar

* fix: formatting

* feat: no ugly line

* feat: add line back

* feat: new load method

* feat: split adding accelerator docs out

* feat: showcase whisper

* feat: smaller tweaks

* feat: bring back oneliner
2023-06-04 08:51:20 -07:00
George Hotz 791530045d
Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
Nicklas Boman 0e9e0fd718
document environment variables (#887) 2023-06-01 13:11:17 -07:00
George Hotz 1e56aced05
add changeable DEBUG (#816) 2023-05-27 13:28:25 -07:00
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
George Hotz 3a8af99adb i understand ClassVar now 2023-03-15 09:00:25 -07:00
Pasan Perera df48753692
fixed the import error for latest changes in master (#705) 2023-03-15 08:59:42 -07:00
George Hotz 54f499b623
Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz cbc5a7222a symbolic is now a 6/10 due to the infinite loop. do better. 2023-03-13 00:07:59 -07:00
George Hotz c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz ce1564b05e fix shapetracker test 2023-03-12 22:33:25 -07:00
George Hotz 153cce0f7e tutorial 2023-03-12 22:31:46 -07:00
George Hotz 8d16ebaea7 we have docs: 2023-03-12 19:05:44 -07:00
George Hotz 0ba6179de7 stable diffusion in readme 2022-09-05 18:51:56 -07:00
George Hotz 81c9438ea1 keepdim avoids reshapes 2022-06-05 15:56:42 -07:00
George Hotz 7a3fe34db1 GPU llops 2022-06-05 13:49:39 -07:00
George Hotz 2097d814f6 Sum doesn't need to save the tensor 2022-06-05 12:04:51 -07:00
George Hotz fc6597a6d9 only resnet18, it's too slow otherwise 2021-10-30 16:48:39 -07:00
George Hotz 2075fdeb4f
FPGA Based Accelerator for Tinygrad (#258)
* ops_risk

* risk sim

* guessing is for winners

* minor

* better

* matmul with risk

* conv doesn't work

* closer

* conv2d works

* ops_risk

* opt2 works

* opt1 may not be possible

* opt1 is a mulacc

* arty

* attosoc example building on mac

* minor

* riscv assembler

* gucci gang

* we got C code

* not a scam

* hello

* make risk mergeable into master

* unop support
2021-06-07 17:45:09 -07:00
George Hotz 1ae0e88627 nvidia notes 2021-05-26 14:27:00 -07:00
Skosh 78aa147b39
[WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
NeuralLink 1a1c63a08b
GAN is real... Look what tiny just generated! (#192)
* mode collapse solved

* info add

* delete unnecessary imports

* readme
2020-12-13 20:23:12 -08:00
= 6b44a7f729 adds beautiful and meaningful logo 2020-10-26 18:12:49 +01:00