* start
* fix err 93
* gpu
* ioctl mappings
* alloc like cuda
* semaphores
* wait for semaphores value
* start ops_nv
* very simple kernels work
* init several gpus
* qmd dumper
* dirty, but most kernels work
* almost all test_ops
* progress, more tests, stable
* test_ops passes, gpt2 works
but with big fifo, wrap of fifo doesn't work, i think it's something coherency related
* need better sync
* fix sync
* alloc2
* all tests pass!
* cleanup 1
* cleanup
* multigpu, simple transfer
* fix sync
* correct init
* nv_gpu autogen + sync bug fix
* clean extra/nv_gpu_driver
* p2p
* clean up
* remove old gen
* small fixes
* cleanup
* cleanup 2
* small fixes
* bigger queue size
* cleanups
* wait
* fixed signals for devs
* fix hang + parallel beam
* small fixes
* detect when local memory is big in kernel
* correct assert
* small fixes
* correct tls size est
* one va space
* less lines
* shorter
* save 2 lines
* save some lines
* remove type ignores
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* kfd driver wip
* cleanups
* kfd almost ready to ring doorbell
* ding dong?
* issues with signals
* something
* works
* ops kfd
* add amd_signal_t
* works...sometimes
* program runs
* _gpu_alloc cleanup
* cleanups
* work
* header + enable profiling (#3959)
* header + enable profiling
* just cleaner
* measure
* only local time domain
* remove old comments
* fix with master
* elf parsing (#3965)
* elf parsing
* fix kernels with private
* not used
* clean up
* clean up 2
* add flags
* kfd sdma (#3970)
* working sdma
* remove driver, shorter
* all commands we might need
* svm
* kfd remove hardcoded values (#4007)
* remove hardcoded values
* match above line
* 7k lines + revert hsa
* update that from origin
* fix sdma reg gen
* not the updated SDMA
* compiler_opts
* don't require kfd_ioctl
* get ioctls from python
* get ioctls from python
* remove build_sdma_command
* merge into 64-bit fields
* shorter
* fix property spelling and off by one
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04