Commit Graph

2077 Commits

Author SHA1 Message Date
Yahya Lmallas fd66d1ca00
fix Tensor.manual_seed() defaulting to the wrong type (#1168)
* fix Tensor.manual_seed() defaulting to the wrong type: None, while it should be int

* remove those tests
2023-07-07 10:42:48 -07:00
Stan 9b6e57eccd
helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception; previously, if an exception was thrown before the process was started, it would hang the thread

* Made `_early_exec_process` catch any Exception

Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example, a TypeError for an argument passed to `subprocess.check_output`

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE removed that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, my bad

* Extracted a class for tests of early exec

* Normalize line endings, Windows uses \r\n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
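The hang described above can be illustrated with a minimal sketch (this is not the actual helpers.py code; `run_checked` is a hypothetical name): if the worker thread lets an exception escape before putting anything on the queue, the consumer blocks on `q.get()` forever.

```python
import subprocess
import sys
from queue import Queue
from threading import Thread

def run_checked(cmd, out_q: Queue):
    # Catch any Exception (e.g. a TypeError from a bad argument to
    # subprocess.check_output) and report it on the queue; otherwise
    # the consuming thread would block on out_q.get() forever.
    try:
        out_q.put(("ok", subprocess.check_output(cmd)))
    except Exception as e:
        out_q.put(("err", e))

q = Queue()
Thread(target=run_checked, args=([sys.executable, "-c", "print('hi')"], q)).start()
status, payload = q.get()  # always returns, even on failure
```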
Kunwar Raj Singh 8391648822
Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducible

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* partial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where I calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
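The mixup augmentation added above can be sketched in scalar form (a minimal illustration with stand-in values, not the hlb_cifar10.py implementation): two samples and their labels are blended with the same coefficient drawn from a Beta distribution.

```python
import random

def mixup(x1, y1, x2, y2, alpha: float = 1.0):
    # Blend two samples and their labels with one coefficient lam drawn
    # from Beta(alpha, alpha); scalars stand in for image/label tensors.
    lam = random.betavariate(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x, y = mixup(0.0, 0.0, 1.0, 1.0)
```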
Barath c5aea13a65
Fix evaluation stage in examples/transformer.py when using CUDA (#1150)
* make test data a contiguous array

* standardise contiguous arrays for all input data in CUDA ops

* swap to x.ravel
2023-07-06 18:07:10 -07:00
Rayan Hatout 9975f24452
Fold expand preceding reduce if the reduction is on the same axis as the expansion (#1134)
* fold expands that precede a reduce if the reduction is on the same axis as the expansion

* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization

* add a test case to make sure we don't fold reduce-expand-reduce on different axes
2023-07-06 13:41:05 -07:00
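The identity behind this fold can be checked numerically (a NumPy sketch of the algebra, not the tinygrad optimization code): summing along the axis an expand created is equivalent to multiplying the original tensor by the expansion factor.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])      # shape (3, 1)
expanded = np.broadcast_to(x, (3, 4))    # expand along axis 1
# Reducing along the same axis the expand created is just a multiply,
# so the expand + sum pair can be folded away:
assert np.allclose(expanded.sum(axis=1, keepdims=True), x * 4)
```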
cheeetoo f109af3cbb
Don't save parents unless needed (#1142)
* don't save parents unless requires grad

* keep del ctx since idk
2023-07-05 18:11:57 -07:00
Eli Frigo 801564f31b
Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
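The shape of the pow rewrite can be sketched in scalar form (a hypothetical illustration, not the tensor.py code): the exponents a SQRT llop covers are special-cased, and the general case falls back to exp/log.

```python
import math

def tensor_pow(x: float, y: float) -> float:
    # Hypothetical scalar sketch: handle the exponents sqrt/rsqrt cover
    # directly, fall back to exp(y * log(x)) for the general case (x > 0).
    if y == 0.5:
        return math.sqrt(x)
    if y == -0.5:
        return 1.0 / math.sqrt(x)  # rsqrt
    return math.exp(y * math.log(x))
```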
cloud11665 b7369ffcff
add ptx formatter + syntax highlighter (#1128) 2023-07-05 17:56:09 -07:00
Reza Rezvan d1356cac27
Fix: Jacobian tests [WIP] (#1126)
* Fix: Jacobian tests; num_jacobian is either bugged or not accurate enough;

* Fix: Jacobian tests;

* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
nimlgen d363d25ee2
fix imports for examples/transformer.py (#1136) 2023-07-05 08:15:13 -07:00
Mehmet Kuzucu c3173ff281
Add return statement to the train function (#1135)
add a return statement to the train function in order to provide access to the losses and accuracies lists
2023-07-05 08:13:38 -07:00
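The change above amounts to this pattern (a hypothetical sketch with stand-in metric values, not the actual example code): collect per-step metrics and return them so callers can inspect the training curves.

```python
def train(steps: int):
    # Hypothetical sketch: collect per-step metrics and return them so
    # callers can access the losses and accuracies after the loop ends.
    losses, accuracies = [], []
    for step in range(steps):
        losses.append(1.0 / (step + 1))      # stand-in for a real loss
        accuracies.append(1.0 - losses[-1])  # stand-in for real accuracy
    return losses, accuracies

losses, accuracies = train(3)
```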
wozeparrot 981d4980c4
feat: reword contributing (#1131) 2023-07-04 22:17:47 -07:00
George Hotz 793a670187
from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
George Hotz 2f968f8547 ignore cloudpickle type for local mypy 2023-07-04 13:51:20 -07:00
George Hotz 87d21ea979 examples: simple conv bn 2023-07-04 13:50:26 -07:00
Reza Rezvan 535224ac20
Remove float64 (#1101)
* Refactor: Remove float64

* Refactor: Remove unused imports

* Refactor: Remove float64

* Refactor: Remove float64

* Refactor: Exclude float64 onnx backend

* Add: Skip jacobian and gradcheck tests;
2023-07-04 08:40:51 -07:00
Daniel Hipke b4ce23e4b8
Make cross_process use cloudpickle (#1118)
* fix syntax issues in imagenet_download.py

* use cloudpickle in cross_process to make it work in Python 3.9+

* add cross_process test

* prevent unpickling on every function call

* add cloudpickle to setup.py

* add support for args/kwargs
2023-07-04 00:47:34 -07:00
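The limitation motivating the switch is easy to demonstrate (a minimal stdlib-only sketch; cloudpickle itself is not imported here): standard pickle serializes functions by reference, so a lambda cannot be pickled, whereas cloudpickle serializes functions by value.

```python
import pickle

# Standard-library pickle serializes functions by reference, so a lambda
# defined at call time cannot be pickled -- the reason cross_process
# switched to cloudpickle, which serializes functions by value.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_picklable = True
except Exception:
    lambda_picklable = False
```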
George Hotz c709dec8b5 gelu: weird test was broken for metal 2023-07-04 00:43:54 -07:00
George Hotz daf8e1942f sigmoid: test large positive also and add note 2023-07-04 00:18:31 -07:00
Kunwar Raj Singh 9e6067378f
Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113)
* Add failing sigmoid test

* update more tests

* add mlop for sigmoid

* add back test

* math.log(math.e) = 1

* remove divides

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-04 00:14:22 -07:00
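The derivative the new mlop implements can be sketched in scalar form (an illustration of the math, not the tinygrad mlop code): the backward pass reuses the forward output and needs no divides.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_backward(x: float, grad_output: float) -> float:
    # A dedicated mlop can reuse the forward result in the backward pass:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), with no divides.
    s = sigmoid(x)
    return grad_output * s * (1.0 - s)
```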
Daniel Hipke d58a9603ab
Create COCO data directory if it doesn't exist. (#1114)
* Create COCO data directory if it doesn't exist.

* update paths to support windows
2023-07-03 18:15:53 -07:00
Anselm Coogan a22aad7d32
Use generators instead of lists in `any`s and `all`s (#1111)
* Use generators in any(...) instead of lists for better best-case performance

* Use generators in all(...) instead of lists

* enable R1729 in .pylintrc

* revert import sorting

---------

Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
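The win here comes from short-circuiting, which a small sketch makes concrete: a generator expression lets `any()` stop at the first `True`, while a list comprehension materializes every element first.

```python
# A generator expression lets any()/all() short-circuit without first
# building the whole list, improving the best case.
nums = range(10_000_000)

# List version builds all ten million booleans before any() reads one:
#   any([n > 5 for n in nums])
# Generator version stops as soon as the first True is produced:
found = any(n > 5 for n in nums)
```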
tricky-labyrinth fd98f6cffa
Small fix to abstractions.py so it runs on Windows without throwing an AttributeError (#1109)
Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>
2023-07-03 13:44:49 -07:00
Mike Ovyan 651d080594
[perf] Replace more list comprehension with * (#1106)
* [perf] Replace more list comprehension with *

* comeback

* final fix?

* blind me

* kill me

* ?

* rev

* [none]
2023-07-03 10:49:23 -07:00
Frank Pinnola 2071e53da8
Handle broadcast flag on gemm (#1103) 2023-07-02 22:15:07 -07:00
Taras Tsugrii cbb5c655e5
[tensor][perf] Replace list comprehension with *. (#1102)
It's more concise, idiomatic and faster:
```
In [8]: %timeit [1 for _ in range(100)]
2.12 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [9]: %timeit [1] * 100
515 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
2023-07-02 18:34:23 -07:00
David Hou 363fbfc2e4
do not emit loop end code for global+local loops in assembly kernel (#1100) 2023-07-02 18:33:57 -07:00
Reza Rezvan 8ae9a054ae
Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo 10f1aeb144
fixed broken link (#1097) 2023-07-02 15:06:59 -07:00
Rob Grossman c8ddc34368
include missing queue in thneed load (#1095) 2023-07-02 12:33:59 -07:00
nmarwell26 12ce68c1ee
Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code, only helper code for vgg7; the old name was confusing to new users trying to understand the examples. (#1086) 2023-07-01 12:04:28 -07:00
Rob Grossman 2533a992e7
remove unused imports in models (#1088) 2023-07-01 12:04:19 -07:00
geohotstan 575f75f613
hello (#1084) 2023-07-01 01:29:35 -07:00
foreign-sub 574cbda979
Quickstart (#1015)
* fix quickstart md

* add quickstart to ci
2023-06-29 13:26:58 -07:00
Roelof van Dijk 542b2d93a5
Perf/cache string ops (#1078)
* perf: remove extra function, include in cached getitem

* perf: only calculate hash once per node

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-29 13:23:11 -07:00
George Hotz e234bf2298 hip matmul: add K support 2023-06-28 19:54:33 +00:00
George Hotz 0e93b9642a hip matmul 2023-06-28 19:21:01 +00:00
Jacky Lee 754e54ebb9
Fix Tensor ceil and floor for whole numbers (#1071)
* Works on non-special numbers

* Test different cases
2023-06-27 23:22:17 -07:00
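The whole-number edge case can be illustrated in scalar form (a hypothetical sketch, not the tinygrad Tensor code): building floor from truncation needs a -1 correction for negative non-integers, and whole numbers must not receive it.

```python
def floor_via_trunc(x: float) -> float:
    # Floor built from truncation: trunc rounds toward zero, so negative
    # non-integers need a -1 correction -- but whole numbers must not get
    # it, which is exactly the edge case being fixed.
    t = float(int(x))  # truncate toward zero
    return t - 1.0 if (x < 0 and x != t) else t
```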
George Hotz 1f5d45ca8c imagenet loader minor cleanups 2023-06-28 05:08:09 +00:00
George Hotz 6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
George Hotz 9fabdbd054
speed (#1070) 2023-06-27 20:28:57 -07:00
George Hotz d16c16ec28
new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
ernie 4d703be6d7
fix typo (#1065) 2023-06-27 10:56:54 -07:00
George Hotz 70c07dfea5
5k line max (#1064) 2023-06-27 10:53:18 -07:00
George Hotz c8d87eb8d4 strip whitespace 2023-06-27 10:11:43 -07:00
Rayan Hatout 23648538fa
fix folding of float4 add/mul (#1060) 2023-06-26 20:59:29 -07:00
George Hotz a98e361da0 torch speed test, add add 2023-06-26 18:55:27 -07:00
George Hotz 3e33befc1d
realize hotspots (#1059)
* realize hotspots

* no str check

* minor changes

* make this an assert

* faster and more readable

* nicer self.buffers

* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
George Hotz 2977fb17f6
various touchups (#1058)
* op isn't optional

* barrier + named local buffers

* end global and local loop together to avoid useless if statement

* better comments
2023-06-26 15:41:23 -07:00
George Hotz f265e8523a
movement ops aren't really ops (#1056) 2023-06-26 15:01:28 -07:00