tinygrad/setup.py

#!/usr/bin/env python3

import os
from setuptools import setup

directory = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(directory, 'README.md'), encoding='utf-8') as f:
  long_description = f.read()

setup(name='tinygrad',
      version='0.6.0',
      description='You like pytorch? You like micrograd? You love tinygrad! <3',
      author='George Hotz',
      license='MIT',
      long_description=long_description,
      long_description_content_type='text/markdown',
      packages=['tinygrad', 'tinygrad.codegen', 'tinygrad.nn', 'tinygrad.runtime', 'tinygrad.shape'],
      classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License"
      ],
      install_requires=['numpy', 'requests', 'pillow', 'tqdm', 'networkx', 'pyopencl', 'PyYAML'],
      python_requires='>=3.8',
      extras_require={
        'llvm': ["llvmlite"],
        'cuda': ["pycuda"],
        'triton': ["triton>=2.0.0.dev20221202"],
        'webgpu': ["wgpu"],
        'metal': ["pyobjc-framework-Metal", "pyobjc-framework-Cocoa", "pyobjc-framework-libdispatch"],
        'linting': [
          "flake8",
          "pylint",
          "mypy",
          "pre-commit",
        ],
        'testing': [
          "torch",
          "pytest",
          "pytest-xdist",
          "onnx",
          "onnx2torch",
          "opencv-python",
          "tabulate",
          "safetensors",
          "types-PyYAML",
          "cloudpickle",
        ],
      },
      include_package_data=True)