CUDA Memory Operators
- Tensor new_managed_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)
  Allocate an at::Tensor with unified managed memory (UVM), then set its preferred storage location to CPU (host memory) and establish mappings on the CUDA device to the host memory.
  - Parameters:
    - self – The input tensor
    - sizes – The target tensor dimensions
  - Returns: A new tensor backed by UVM
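For illustration, here is a minimal C++ sketch of allocating a UVM-backed tensor. It assumes an FBGEMM-enabled build that exposes these operators in the fbgemm_gpu namespace (the namespace is an assumption, not part of the signature above); the prototype tensor only supplies the dtype and device for the new allocation.

```cpp
// Minimal sketch, assuming an FBGEMM-enabled build that declares these
// operators in the fbgemm_gpu namespace (assumption).
#include <ATen/ATen.h>

at::Tensor make_uvm_buffer() {
  // The prototype tensor supplies the dtype and CUDA device; its own
  // contents and shape are not used for the new allocation.
  at::Tensor proto = at::empty(
      {0}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0));

  // Allocate a 1024 x 256 float tensor in unified managed memory,
  // preferring host storage with mappings established on the device.
  return fbgemm_gpu::new_managed_tensor(proto, {1024, 256});
}
```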
- Tensor new_managed_tensor_meta(const Tensor &self, const std::vector<std::int64_t> &sizes)
  Placeholder operator for the Meta dispatch key.
  - Parameters:
    - self – The input tensor
    - sizes – The target tensor dimensions
  - Returns: A new empty tensor
- Tensor new_host_mapped_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)
  Allocate an at::Tensor with host-mapped memory.
  - Parameters:
    - self – The input tensor
    - sizes – The target tensor dimensions
  - Returns: A new tensor backed by host-mapped memory
- Tensor new_unified_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes, bool is_host_mapped)
  Allocate an at::Tensor with either unified managed memory (UVM) or host-mapped memory.
  - Parameters:
    - self – The input tensor
    - sizes – The target tensor dimensions
    - is_host_mapped – If true, allocate host-mapped memory; otherwise, allocate UVM
  - Returns: A new tensor backed by UVM or host-mapped memory, depending on the value of is_host_mapped
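As a sketch (same namespace assumption as above), the is_host_mapped flag selects the allocation path at runtime:

```cpp
#include <ATen/ATen.h>

// Sketch: pick the backing memory per call (fbgemm_gpu namespace assumed).
at::Tensor alloc_buffer(const at::Tensor& proto, bool host_mapped) {
  // host_mapped == true  -> host-mapped allocation
  // host_mapped == false -> UVM allocation
  return fbgemm_gpu::new_unified_tensor(proto, {1024, 256}, host_mapped);
}
```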
- Tensor new_unified_tensor_meta(const Tensor &self, const std::vector<std::int64_t> &sizes, bool is_host_mapped)
  Placeholder operator for the Meta dispatch key for new_unified_tensor.
  - Parameters:
    - self – The input tensor
    - sizes – The target tensor dimensions
    - is_host_mapped – If true, allocate host-mapped memory; otherwise, allocate UVM
  - Returns: A new tensor backed by UVM or host-mapped memory, depending on the value of is_host_mapped
- Tensor new_vanilla_managed_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)
  Allocate an at::Tensor with unified managed memory (UVM), but allow its preferred storage location to be automatically managed.
  - Parameters:
    - self – The input tensor
    - sizes – The target tensor dimensions
  - Returns: A new tensor backed by UVM
- bool uvm_storage(const Tensor &self)
  Check if a tensor is allocated with UVM (either a CPU or GPU tensor).
  - Parameters:
    - self – The input tensor
  - Returns: true if the tensor is allocated with UVM, otherwise false
- bool is_uvm_tensor(const Tensor &self)
  Check if a tensor is allocated with UVM, but is not a CPU tensor.
  - Parameters:
    - self – The input tensor
  - Returns: true if the tensor is a non-CPU tensor allocated with UVM, otherwise false
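Taken together, the two predicates distinguish three cases; a minimal sketch (same namespace assumption as above):

```cpp
#include <ATen/ATen.h>

// Sketch: classify a tensor's backing storage.
void describe_storage(const at::Tensor& t) {
  if (fbgemm_gpu::is_uvm_tensor(t)) {
    // A non-CPU (CUDA) tensor whose storage is UVM.
  } else if (fbgemm_gpu::uvm_storage(t)) {
    // A CPU tensor whose storage is UVM, e.g. the result of uvm_to_cpu().
  } else {
    // An ordinary tensor not backed by UVM.
  }
}
```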
- Tensor uvm_to_cpu(const Tensor &self)
  Convert a UVM tensor to a CPU tensor.
  - Parameters:
    - self – The input tensor
  - Returns: A new tensor that is effectively the input moved from UVM to CPU
- Tensor uvm_to_device(const Tensor &self, const Tensor &prototype)
  Create a new UVM tensor that shares the same device and UVM storage with prototype.
  - Parameters:
    - self – The input tensor
    - prototype – The target tensor whose device and UVM storage will be shared with the new tensor
  - Returns: A new tensor that shares the same device and UVM storage with prototype
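Since uvm_to_device() shares the prototype's storage rather than copying it (uvm_to_cpu_clone() below is the copying counterpart), a UVM tensor can be re-viewed between CPU and CUDA cheaply; a sketch (same assumptions as above):

```cpp
#include <ATen/ATen.h>

// Sketch: re-view a UVM tensor on the host, then back on the device.
void round_trip(const at::Tensor& uvm_buf) {
  at::Tensor cpu_view = fbgemm_gpu::uvm_to_cpu(uvm_buf);
  // ... read or mutate cpu_view on the host ...
  at::Tensor cuda_view = fbgemm_gpu::uvm_to_device(cpu_view, uvm_buf);
  // cuda_view shares uvm_buf's device and UVM storage.
}
```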
- void uvm_cuda_mem_advise(const Tensor &self, int64_t cuda_memory_advise)
  Call cudaMemAdvise() on a UVM tensor's storage. The cudaMemoryAdvise enum is available on the Python side in the fbgemm_gpu.uvm namespace; see the documentation there for valid values.
  See also: the CUDA documentation for the cudaMemoryAdvise enum.
  - Parameters:
    - self – The input tensor
    - cuda_memory_advise – The cudaMemoryAdvise enum value, as an integer
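On the C++ side, the raw cudaMemoryAdvise values from cuda_runtime.h can be passed as the integer argument; a sketch (same namespace assumption as above):

```cpp
#include <ATen/ATen.h>
#include <cuda_runtime.h>
#include <cstdint>

// Sketch: mark the tensor's pages read-mostly so the driver may keep
// read-only copies near each accessing processor.
void advise_read_mostly(const at::Tensor& uvm_buf) {
  fbgemm_gpu::uvm_cuda_mem_advise(
      uvm_buf, static_cast<int64_t>(cudaMemAdviseSetReadMostly));
}
```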
- void uvm_cuda_mem_prefetch_async(const Tensor &self, std::optional<Tensor> device_t)
  Call cudaMemPrefetchAsync() on a UVM tensor's storage to prefetch memory to a destination device.
  See also: the CUDA documentation for cudaMemPrefetchAsync().
  - Parameters:
    - self – The input tensor
    - device_t – [OPTIONAL] The tensor whose device will be the prefetch destination
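Prefetching ahead of the first kernel access avoids on-demand page migration; a sketch (same assumptions as above), where the destination device is taken from the optional tensor argument:

```cpp
#include <ATen/ATen.h>

// Sketch: prefetch the UVM pages to CUDA device 0 before kernel launches.
void prefetch_to_device0(const at::Tensor& uvm_buf) {
  at::Tensor dst = at::empty(
      {0}, at::TensorOptions().device(at::kCUDA, 0));
  fbgemm_gpu::uvm_cuda_mem_prefetch_async(uvm_buf, dst);
}
```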
- void uvm_mem_advice_dont_fork(const Tensor &self)
  Call madvise(..., MADV_DONTFORK) on a UVM tensor's storage. This is a workaround for an issue where the UVM kernel driver un-maps UVM storage pages from the page table on fork, causing a slowdown on the next CPU access.
  See also: the Linux man page for madvise().
  - Parameters:
    - self – The input tensor
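Given the fork-time behavior described above, the call belongs before any fork(), e.g. before spawning worker processes; a sketch (same assumptions as above):

```cpp
#include <ATen/ATen.h>

// Sketch: exclude the UVM pages from child page tables so the driver's
// mappings in the parent survive a subsequent fork().
void protect_before_fork(const at::Tensor& uvm_buf) {
  fbgemm_gpu::uvm_mem_advice_dont_fork(uvm_buf);
  // ... fork worker processes after this point ...
}
```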
- Tensor uvm_to_cpu_clone(const Tensor &self)
  Copy a UVM tensor's contiguous storage (uvm_storage(t) is true) into a new CPU tensor. The copy operation uses a single-threaded memcpy().
  - Parameters:
    - self – The input tensor
  - Returns: A new CPU tensor containing the data copied from the UVM tensor
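Unlike uvm_to_cpu(), the clone is an independent copy; a sketch (same assumptions as above):

```cpp
#include <ATen/ATen.h>

// Sketch: take a host-side snapshot that no longer aliases the UVM pages.
at::Tensor snapshot_host(const at::Tensor& uvm_buf) {
  return fbgemm_gpu::uvm_to_cpu_clone(uvm_buf);  // independent copy
}
```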
- Copy a tensor's contents to shared memory. This can be useful for forcing the initialization state of GPU memory, which is relevant for testing.
  - Parameters:
    - self – The input tensor
- Copy NaN values into a GPU's shared memory. This is useful for debugging or testing.
  - Parameters:
    - self – The input tensor