Functions for Fast Memory manipulation with Pentium-class processors.
#include <fast_mem.h>
Static Public Member Functions | |
| static void | precache (const void *src, uint nbytes) |
| Fast precaching of memory in L1 cache using SSE or MMX where available (NB: the other methods do not perform this availability check). nbytes should not exceed 4K. | |
| static void * | memcpySSE (void *dst, const void *src, size_t nbytes) |
| Fast memcpy using SSE instructions: prefetchnta and movntq. | |
| static void | precacheSSE (const void *src, uint nbytes) |
| Fast precaching of memory in L1 cache using MMX/SSE instructions: movq and prefetchnta. Result is typically 880 MB/s (probably lower in practice because of overhead). | |
| static void | precacheMMX (const void *src, uint nbytes) |
| Fast precaching of memory in L1 cache using MMX instructions only: movq. Result is typically 720 MB/s (probably lower in practice because of overhead). | |
Static Public Attributes | |
| static void *(* | memcpy )(void *dts, const void *src, size_t nbytes) = findBestmemcpy () |
| This is a function pointer that points to the best memcpy function available, depending on the OS and processor. | |
Functions for Fast Memory manipulation with Pentium-class processors.
From http://www.sgi.com/developers/technology/irix/resources/asc_cpu.html
Definition at line 42 of file fast_mem.h.
| void * NLMISC::CFastMem::memcpySSE | ( | void * | dst, | |
| const void * | src, | |||
| size_t | nbytes | |||
| ) | [static] |
Fast memcpy using SSE instructions: prefetchnta and movntq.
Can be called only if both SSE and MMX are supported. NB: the copy proceeds in blocks of 4K through the L1 cache. Result is typically 420 MB/s instead of 150 MB/s.
Definition at line 203 of file fast_mem.cpp.
References memcpy.
Referenced by NLMISC::findBestmemcpy().
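The block-wise strategy described above can be sketched in portable C++. This is only an illustration, not the NeL implementation: the names `blockwiseCopy`, `kBlockBytes` and `kCacheLine` are hypothetical, the 32-byte cache line is an assumption, and plain loads plus `std::memcpy` stand in for the `prefetchnta` and `movntq` instructions named in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical constants: 4K comes from the description above,
// the cache-line size is an assumption and varies by processor.
constexpr std::size_t kBlockBytes = 4096;
constexpr std::size_t kCacheLine = 32;

// Illustrative stand-in for a block-wise copy (NOT CFastMem::memcpySSE).
void *blockwiseCopy(void *dst, const void *src, std::size_t nbytes)
{
    assert(dst != nullptr && src != nullptr);
    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    volatile unsigned char sink = 0; // keeps the touch loop from being optimized away

    while (nbytes > 0)
    {
        const std::size_t chunk = nbytes < kBlockBytes ? nbytes : kBlockBytes;

        // Pass 1: read one byte per cache line so the block is pulled into cache.
        for (std::size_t i = 0; i < chunk; i += kCacheLine)
            sink = s[i];

        // Pass 2: copy the block that was just touched.
        std::memcpy(d, s, chunk);

        d += chunk;
        s += chunk;
        nbytes -= chunk;
    }
    (void)sink;
    return dst;
}
```

Like the standard `memcpy`, the sketch returns `dst` and assumes the two ranges do not overlap.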
| void NLMISC::CFastMem::precache | ( | const void * | src, | |
| uint | nbytes | |||
| ) | [static] |
Fast precaching of memory in L1 cache using SSE or MMX where available (NB: the other methods do not perform this availability check). nbytes should not exceed 4K.
Definition at line 216 of file fast_mem.cpp.
Referenced by NL3D::CShadowSkin::applySkin(), NL3D::CRayMesh::fastIntersect(), NL3D::CTextureDLM::modulateAndfillRect565(), NL3D::CTextureDLM::modulateAndfillRect8888(), and NL3D::CTextureDLM::modulateConstantAndfillRect().
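Because nbytes should stay within 4K, a caller working on a larger buffer would presumably precache and process it one chunk at a time. The sketch below shows that pattern under stated assumptions: `precacheStandIn` is a hypothetical placeholder for CFastMem::precache (it merely touches memory), `processInChunks` is an invented helper, and `unsigned int` stands in for NeL's `uint`.

```cpp
#include <cassert>
#include <cstddef>

// Limit taken from the "nbytes should not exceed 4K" note above.
constexpr std::size_t kMaxPrecacheBytes = 4096;

// Hypothetical stand-in for CFastMem::precache: touches one byte per
// assumed 32-byte cache line instead of issuing MMX/SSE instructions.
inline void precacheStandIn(const void *src, unsigned int nbytes)
{
    assert(nbytes <= kMaxPrecacheBytes);
    const volatile unsigned char *p = static_cast<const volatile unsigned char *>(src);
    for (unsigned int i = 0; i < nbytes; i += 32)
    {
        unsigned char touched = p[i];
        (void)touched;
    }
}

// Precaches and then processes a buffer in chunks of at most 4K.
// Returns the number of chunks handled.
template <typename ProcessFn>
std::size_t processInChunks(const void *src, std::size_t nbytes, ProcessFn process)
{
    const unsigned char *p = static_cast<const unsigned char *>(src);
    std::size_t chunks = 0;
    while (nbytes > 0)
    {
        const std::size_t chunk = nbytes < kMaxPrecacheBytes ? nbytes : kMaxPrecacheBytes;
        precacheStandIn(p, static_cast<unsigned int>(chunk)); // warm the cache first
        process(p, chunk);                                    // then do the real work
        p += chunk;
        nbytes -= chunk;
        ++chunks;
    }
    return chunks;
}
```

A 10000-byte buffer is handled as three chunks (4096, 4096 and 1808 bytes), so no single precache call exceeds the limit.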
| void NLMISC::CFastMem::precacheMMX | ( | const void * | src, | |
| uint | nbytes | |||
| ) | [static] |
Fast precaching of memory in L1 cache using MMX instructions only: movq. Result is typically 720 MB/s (probably lower in practice because of overhead).
Prefer precacheSSE() when it is available, since it is faster. nbytes should not exceed 4K.
Definition at line 212 of file fast_mem.cpp.
| void NLMISC::CFastMem::precacheSSE | ( | const void * | src, | |
| uint | nbytes | |||
| ) | [static] |
Fast precaching of memory in L1 cache using MMX/SSE instructions: movq and prefetchnta. Result is typically 880 MB/s (probably lower in practice because of overhead).
nbytes should not exceed 4K.
Definition at line 208 of file fast_mem.cpp.
| void *(* NLMISC::CFastMem::memcpy)(void *dts, const void *src, size_t nbytes) | ( | void * | dts, | |
| const void * | src, | |||
| size_t | nbytes | |||
| ) | = findBestmemcpy () [static] |
This is a function pointer that points to the best memcpy function available, depending on the OS and processor.
In the best case it points to memcpySSE(); in the worst case it points to the libc memcpy(). Simply use it this way: CFastMem::memcpy(dst, src, size);
Referenced by NL3D::CVertexBuffer::copyVertices(), NL3D::DrawDot(), NLMISC::CMemStream::fill(), NL3D::CVertexBuffer::fillBuffer(), NL3D::CIndexBuffer::fillBuffer(), NL3D::CCoarseMeshManager::flushRender(), NLMISC::CBufFIFO::front(), NLSOUND::CContextSoundContainer< NbJoker, UseRandom, Shift >::init(), memcpySSE(), NLMISC::CBufFIFO::push(), NLSOUND::IBuffer::readWav(), NLMISC::CBufFIFO::resize(), NLMISC::CMemStream::serialBuffer(), and NLMISC::CMemStream::serialSeparatedBufferOut().
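The dispatch pattern described here, a static function pointer initialized once by a selector function, can be sketched as follows. Everything in the block is hypothetical: `FastMemSketch`, `findBestCopy`, `hasFastPath`, `fastCopy` and `libcCopy` are illustration names, and the capability probe simply returns false rather than querying the processor as findBestmemcpy() presumably does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using CopyFn = void *(*)(void *, const void *, std::size_t);

// Hypothetical capability probe; a real one would query the CPU and OS.
static bool hasFastPath() { return false; }

// Hypothetical optimized routine; here it just forwards to std::memcpy.
static void *fastCopy(void *dst, const void *src, std::size_t n)
{
    return std::memcpy(dst, src, n);
}

// Fallback wrapper around the libc routine.
static void *libcCopy(void *dst, const void *src, std::size_t n)
{
    return std::memcpy(dst, src, n);
}

// Selector: runs once, during static initialization of the pointer below.
static CopyFn findBestCopy()
{
    CopyFn best = hasFastPath() ? &fastCopy : &libcCopy;
    assert(best != nullptr);
    return best;
}

struct FastMemSketch
{
    static CopyFn copy; // called like a normal function: FastMemSketch::copy(dst, src, n)
};

CopyFn FastMemSketch::copy = findBestCopy();
```

Resolving the choice once at start-up keeps each call site down to a single indirect call, with no per-call capability test.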