Commit Graph

2202 Commits

Author SHA1 Message Date
Sayantan Das e829e0e718
Update CONTRIBUTING.md (#1008) 2023-06-18 22:09:03 -07:00
George Hotz d84c600e5d contributing 2023-06-18 21:48:18 -07:00
Casey Primozic 651d6ea457
Minor improvements + cleanup to `ops_gpu.py` (#1006)
* Minor improvements + cleanup to `ops_gpu.py`

* Add some previously undocumented environment variables from `ops_gpu.py` to `env_vars.md`
 * Update debug print for OpenCL to print the devices that will be used post-filtering with `CL_EXCLUDE`
 * Remove a couple unused or superfluous variables and assignments
 * Use `fromimport` shorthand to shave off a couple precious LOC
 * Couple small whitespace changes to clean things up

* Revert change to ordering of OpenCL devices

* Small refactor for OpenCL context creation
2023-06-18 21:26:40 -07:00
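
For the `CL_EXCLUDE` variable this commit adds to `env_vars.md`, a minimal usage sketch, assuming it takes a comma-separated list of OpenCL device names to filter out and that the OpenCL backend is selected with `GPU=1`:

```python
import os
# Assumptions: CL_EXCLUDE is a comma-separated list of device names to drop,
# and it must be set before tinygrad initializes its OpenCL context.
os.environ["CL_EXCLUDE"] = "Intel(R) UHD Graphics 630"  # hypothetical device name
os.environ["GPU"] = "1"  # select the OpenCL backend

from tinygrad.tensor import Tensor  # import after env vars are set
print((Tensor.ones(2, 2) + 1).numpy())  # runs on the post-filtering devices
```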
George Hotz 5428b5d774
good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
Yann Huynh ccb51ff5b0
"Fixed argument passing in example yolov8" (#1004)
"Fixed argument passing in example yolov8"
2023-06-18 14:29:39 -07:00
George Hotz b14b7bc749 don't make HIP the default...it's slower 2023-06-18 19:11:39 +00:00
Alex Wang 3d63c71e27
HIP backend (#750)
* llama works for HIP backend

* Use hipMemcpyAsync; Less lines of code

* Remove unused code

* Refactor

* Add comments; hipDeviceSynchronize

* HIP over GPU; Remove PyHIP dependency

* Cleanups

* Fix mypy check

* Merge master; Dump assembly code
2023-06-18 11:35:57 -07:00
Casey Primozic 805eef10dd
Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script, adapted slightly for tensorflow
2023-06-18 10:57:45 -07:00
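
A minimal sketch of such a GEMM benchmark, not the PR's script; `N`, the iteration count, and the dtype are arbitrary choices here:

```python
import time
import tensorflow as tf

N = 4096
a = tf.random.uniform((N, N), dtype=tf.float32)
b = tf.random.uniform((N, N), dtype=tf.float32)

for _ in range(5):
    st = time.monotonic()
    c = tf.matmul(a, b)
    _ = c.numpy()  # force execution before reading the clock
    dt = time.monotonic() - st
    print(f"{2 * N**3 / dt * 1e-9:.2f} GFLOPS")  # a GEMM is ~2*N^3 FLOPs
```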
George Hotz c690eeaca9
flip mulacc to save a line (#997) 2023-06-17 16:47:55 -07:00
Diogo d2b837c1d9
Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
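
A hedged usage sketch of the new ops, assuming the `Tensor` API of this era:

```python
from tinygrad.tensor import Tensor

t = Tensor([-1.5, -0.5, 0.5, 1.5])
print(t.floor().numpy())  # [-2. -1.  0.  1.]
print(t.ceil().numpy())   # [-1. -0.  1.  2.]
```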
sehaj 775287ed91
Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and initialize bias function for detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_transform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
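
A hedged sketch of the weight loading this PR settled on (safetensors via tinygrad's own helpers); the import path and the weight filename below are assumptions:

```python
from tinygrad.state import safe_load  # assumed era path; later moved to tinygrad.nn.state

state_dict = safe_load("yolov8n.safetensors")  # hypothetical weight file
for name, tensor in state_dict.items():
    print(name, tensor.shape)
```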
George Hotz fe71282ba1
faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz ba56ee6020
RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
George Hotz dca084f227 minor == to is touchups 2023-06-15 17:11:12 -07:00
blake 041d96083c
clang rt for msvc (#986)
* added platform config for clang runtime and tempfile dir for xplatform /tmp

* flake8 lint

* mypy lint

* pythonic?

* python?

* return darwin cflags

* <lines

* lint;
2023-06-15 17:06:44 -07:00
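
The cross-platform temp-dir idea in miniature (a sketch, not the PR's code): `tempfile.gettempdir()` resolves to `/tmp` on Linux/macOS and `%TEMP%` on Windows, replacing a hard-coded `/tmp`:

```python
import pathlib
import tempfile

# hypothetical artifact name; the point is the portable directory lookup
cache_path = pathlib.Path(tempfile.gettempdir()) / "tinygrad_clang_cache.so"
print(cache_path)
```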
George Hotz 039f0d372f
delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Yahya Lmallas 804c45b5fc
FIX: Can't pickle local object (#979)
_early_exec_process is a local function defined within the scope of another function; it should be global
2023-06-14 12:32:17 -07:00
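
The failure mode in miniature: pickle serializes functions by qualified name, so a function defined inside another function cannot be found at load time, while a module-level one can:

```python
import pickle

def make_local():
    def local_fn(x):  # nested: not importable by name, so not picklable
        return x + 1
    return local_fn

def global_fn(x):  # module-level: picklable by qualified name
    return x + 1

pickle.dumps(global_fn)  # fine
try:
    pickle.dumps(make_local())
except (AttributeError, pickle.PicklingError) as e:
    print("Can't pickle local object:", e)
```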
Rayan Hatout 2d567ef688
Optimizations in tensor.py (#974)
* optimizations in tensor.py

* make mypy happy

* revert split of Function class
2023-06-14 08:44:35 -07:00
Diogo 0629791cbd
F64 support (#976)
* initial commit

* added osx check for opencl

* added llvm f64 conversions

* typo in llvmir

* more tests and modified unsupported error

* fixed linting error

* added pragma fp64

* simplified exclusion for OSX

* fixed device check and also added it to cast func

* added ifdef check for fp16 in ops_gpu

* Revert "added ifdef check for fp16 in ops_gpu"

This reverts commit 92de754d48cba19c04ef20b3d4a1c3003046a9d0.

* f64 prekernel signature match f16

* moved condition to buffer init
2023-06-13 21:31:31 -07:00
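
A hedged usage sketch of fp64 tensors; this requires a backend with double support (the PR excludes OpenCL on OSX), and the `dtypes` import path is era-specific:

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: later versions moved this module

t = Tensor([1.0, 2.0, 3.0], dtype=dtypes.float64)
print(t.dtype, (t * t).numpy())
```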
John Moore 45bc040a63
Fix typo (#978) 2023-06-13 15:15:45 -07:00
George Hotz 80e665bddb a couple new tests 2023-06-13 12:36:05 -07:00
George Hotz ba4eadb04c
PTX assembly support (#977)
* ptx assembly

* all ops tests pass

* fix tests
2023-06-13 12:31:42 -07:00
Rayan Hatout 727416201f
Shapetracker optimizations (#966)
* optimizations in shapetracker.py

* revert micro-optimizations in assertions

* make mypy happy

* list comp instead of map in get_unsafe_resize_offset

* list comp instead of map in get_unsafe_resize_offset
2023-06-12 18:13:21 -07:00
cloud11665 5f13e7c3cf
cuda: fix fp16, uint8, int64, half4 codegen (#968)
* cuda: add uchar, int64 typedefs

* cuda: fix float16 codegen

* fuck it, half4 stub. llama time!

* inline fp16 half4, revert changes to CStyleLanguage

* add inline just in case

* remove half4 operators

* use dict
2023-06-12 11:15:44 -07:00
Steven Anderson e54b6c5e7f
One hot (#972)
* passing with 1d indices

* passing all tests

* cleanup

* using safe_numpy for scalar
2023-06-12 10:13:29 -07:00
Diogo 613c74ca9f
maintain input tensor dtype (#969) 2023-06-12 10:12:47 -07:00
Diogo 2d4370b487
Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
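
A hedged usage sketch; the diagonal-offset argument's name has varied across versions, so only the no-argument form is shown:

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(3, 3)
print(t.triu().numpy())  # zeros below the main diagonal
print(t.tril().numpy())  # zeros above the main diagonal
```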
George Hotz 48e9461197 broken tests for #862 and #942 2023-06-09 22:02:59 -07:00
George Hotz c62c64f0b7
remove GeNode (#965) 2023-06-09 21:48:56 -07:00
George Hotz 2c324d0685
fix metal uaf (#964) 2023-06-09 21:28:06 -07:00
Steven Anderson c0e558b77c
Test nllloss (#958)
* works but slow

* works with NC and NCd1 but it's still slow

* refactor

* support for k dimensions

* without numpy
2023-06-09 09:00:29 -07:00
Diogo 6b1280f01c
fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
Nicklas Boman 5c7248c72d
imagenet download and prepare (#928)
Changes the "if not exists" check to the `exist_ok=True` parameter, adds a variable to control whether training data is also downloaded, and documents the variable in `env_vars.md`
2023-06-08 12:55:33 -07:00
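
The pattern in miniature: `exist_ok=True` replaces the explicit existence check (and avoids the race between check and create):

```python
import os

# before: if not os.path.exists(path): os.makedirs(path)
os.makedirs("datasets/imagenet/train", exist_ok=True)  # hypothetical path
```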
George Hotz df40a9c238
EXP+LOG -> EXP2+LOG2 (#954)
* EXP+LOG -> EXP2+LOG2

* update docs
2023-06-08 10:57:31 -07:00
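
The identities behind the change: `exp` and `log` reduce to `exp2`/`log2` plus one constant multiply, so backends only need codegen for the base-2 ops:

```python
import math

LOG2_E = 1 / math.log(2.0)  # log2(e)

def exp(x): return 2.0 ** (x * LOG2_E)    # exp(x) = exp2(x * log2(e))
def log(x): return math.log2(x) / LOG2_E  # log(x) = log2(x) / log2(e)

assert abs(exp(1.0) - math.e) < 1e-12
assert abs(log(math.e) - 1.0) < 1e-12
```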
Diogo 666d151f8a
Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output matches input
2023-06-07 19:44:30 -07:00
cloud11665 e8a23d4331
there is a better way to do that! (#950) 2023-06-06 15:23:30 -07:00
SnakeOnex 990fc40219
made seed None by default -> numpy picks a random seed (#946)
* made seed None by default -> numpy picks a random seed

* fixed _seed type

* set the seed to unix timestamp

* make filetype int only
2023-06-06 13:06:23 -07:00
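
What `seed=None` means on the numpy side: entropy is pulled from the OS, so runs differ unless a seed is fixed:

```python
import numpy as np

np.random.seed(None)  # seeded from OS entropy: different every run
print(np.random.rand())
np.random.seed(1337)  # fixed seed: reproducible
print(np.random.rand())
```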
Timothy Lindblom a149f12a5b
Replaced broken link to /tests with /test (#939) 2023-06-06 10:29:09 -07:00
M4tthewDE 664d6cc7e5
Implement onnx MeanVarianceNormalization (#943) 2023-06-06 10:28:19 -07:00
Diogo 3bb38c3518
limit split to 1 due to windows path containing : (#944) 2023-06-06 10:27:54 -07:00
Steven Anderson 079ea217a3
fix test_pow_type - autocasting for Pow with inputs of diff type (#937) 2023-06-05 15:22:35 -07:00
George Hotz 76ab379f9b readme updates 2023-06-05 12:20:14 -07:00
cloud11665 43ea1614b0
fix inf/nan codegen (#935)
* fix inf/nan codegen

* remove nasty oneliner, fix -inf

* inf/nan const mul/div tests
2023-06-05 11:24:09 -07:00
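
The problem in miniature (a sketch, not tinygrad's actual renderer): `repr(float('inf'))` is `inf`, which is not a valid C literal, so constant rendering needs a special case:

```python
import math

def render_const(x: float) -> str:
    if math.isnan(x): return "NAN"
    if math.isinf(x): return "-INFINITY" if x < 0 else "INFINITY"
    return f"{x}f"

print(render_const(float("inf")), render_const(float("-inf")), render_const(float("nan")))
# INFINITY -INFINITY NAN
```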
Filip Dimitrovski 78460034ff
Initial ellipsis support when slicing Tensors (#843)
* Initial ellipsis support when slicing Tensors

* Better comments in ellipsis slicing

* Formatting
2023-06-05 07:52:49 -07:00
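
A hedged usage sketch: `...` expands to "all remaining dimensions", matching numpy semantics:

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(2, 3, 4)
print(t[..., 0].shape)  # (2, 3)
print(t[0, ...].shape)  # (3, 4)
```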
M4tthewDE 70f12fdb57
Fix wrong op version being used if versions equal (#934) 2023-06-05 07:45:10 -07:00
Steven Anderson 79613eb83e
Test min (#932)
* fix __neg__ defaulting to float32 due to 0.0

* fixed __neg__ always defaulting to float32

* fixed openpilot (OpenCL) Test
2023-06-05 00:03:30 -07:00
kposborne2 00360da05b
Update broken `docs/abstractions.py` for changed ops, and add to CI (#930)
* fix and add to ci

* still have those

* ocd

* update other doc
2023-06-04 19:21:20 -07:00
Tom Edwards 5bbcbd145c
Add cumsum with n-dim inputs (#922)
* add cumsum with n-dim inputs, over arbitrary axis + relevant tests

* increased rtol for cumsum test

* move test_cumsum into test_ops

* skip arange test for images as relies on cumsum

* Fix typo

* rewrite cumsum to work with images
2023-06-04 16:55:23 -07:00
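
A hedged usage sketch of cumsum over a non-default axis:

```python
from tinygrad.tensor import Tensor

t = Tensor([[1, 2, 3], [4, 5, 6]])
print(t.cumsum(axis=1).numpy())  # rows: [1 3 6] and [4 9 15]
```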
wozeparrot 091bd65a68
feat: quick doc fixups (#923) 2023-06-04 11:03:57 -07:00
George Hotz fbf17f0031 intel benchmark matmul gets 60 TFLOPS? 2023-06-04 17:01:50 +00:00