While debugging some DSP code yesterday I came across a performance oddity: adding more code lowered the performance of an unrelated function.
By itself this is not *that* odd. It happens if the size of your code is larger than your first level code-cache and different functions start to kick each other out of the cache. However, in my little toy program this was unlikely. I had only around 20kb of code and the code-cache is 32kb in size.
Better safe than sorry, I thought, and took a look at how the caches are configured. Big and pleasant surprise: two of them were running at half their maximum size for no good reason.
In my case after DSP-boot I got:
Level 1 Data-Cache: 32k
Level 1 Code-Cache: 16k
Level 2 Cache: 32k

However, the maximum possible cache sizes for the BeagleBoard are:

Level 1 Data-Cache: 32k (no change)
Level 1 Code-Cache: 32k (16kb larger)
Level 2 Cache: 64k (32kb larger)
So 48kb of valuable cache has been left unused. Changing the cache sizes is easy:
    #include <bcache.h>

    // and somewhere at the start of main()
    BCACHE_Size size;
    size.l1dsize = BCACHE_L1_32K;
    size.l1psize = BCACHE_L1_32K;
    size.l2size = BCACHE_L2_64K;
    BCACHE_setSize(&size);
That still leaves you the 48kb of L1DSRAM for single-cycle access and 32kb of L2RAM to talk with the video accelerators. Oh, and it gave a noticeable performance boost.
Btw, it's very possible that this only applies to the DspLink configuration that I am using.
It turned out that the reason for the smaller cache sizes is the default DspLink configuration. You can override this by adding the following lines to your project's TCF file. Just put them somewhere between utils.importFile("dsplink-omap3530-base.tci"); and prog.gen():

    prog.module("GBL").C64PLUSL2CFG = "64k";
    prog.module("GBL").C64PLUSL1DCFG = "32k";
    prog.module("GBL").C64PLUSL1PCFG = "32k";
    var IRAM = prog.module("MEM").instance("IRAM");
    IRAM.len = 32768;
This will configure the OMAP3530 DSP with:
L2 Cache: 64kb
L1 Data-Cache: 32kb
L1 Code-Cache: 32kb
L1DSRAM: 48kb
IRAM (L2 RAM): 32kb