While debugging some DSP code yesterday I came across a performance oddity: adding more code lowered the performance of an unrelated function.
By itself this is not *that* odd. It happens if the size of your code is larger than your first level code-cache and different functions start to kick each other out of the cache. However, in my little toy program this was unlikely. I had only around 20kb of code and the code-cache is 32kb in size.
Better safe than sorry, I thought, and took a look at how the caches are configured. Big and pleasant surprise: two of them were running at half the maximum size for no good reason.
In my case after DSP-boot I got:
Level 1 Data-Cache 32k
Level 1 Code-Cache 16k
Level 2 Cache 32k
However, the maximum possible cache sizes for the BeagleBoard are
Level 1 Data-Cache 32k (no change)
Level 1 Code-Cache 32k (16kb larger)
Level 2 Cache 64k (32kb larger)
So 48kb of valuable cache has been left unused. Changing the cache sizes is easy:
#include <bcache.h>

// and somewhere at the start of main()
BCACHE_Size size;
size.l1dsize = BCACHE_L1_32K;  // level 1 data cache
size.l1psize = BCACHE_L1_32K;  // level 1 code cache
size.l2size = BCACHE_L2_64K;   // level 2 cache
BCACHE_setSize(&size);
That still leaves you the 48kb of L1DSRAM for single cycle access and 32kb of L2RAM to talk with the video accelerators. Oh – and it gave a noticeable performance boost.
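The 48kb figure is simply the sum of the differences between the boot-time and the maximum cache sizes. Here is a minimal host-side sketch of that arithmetic in plain C. It is not DSP code: the `cache_cfg_kb` type and the `unused_cache_kb` function are made up for illustration and have nothing to do with the DSP/BIOS `BCACHE_Size` structure.

```c
#include <assert.h>

/* Cache sizes in kB. Hypothetical helper type for this sketch only;
 * it is NOT the DSP/BIOS BCACHE_Size structure. */
typedef struct {
    unsigned l1d_kb;  /* level 1 data cache */
    unsigned l1p_kb;  /* level 1 code cache */
    unsigned l2_kb;   /* level 2 cache */
} cache_cfg_kb;

/* Sizes seen after DSP boot, as listed in the post. */
const cache_cfg_kb boot_cfg = { 32u, 16u, 32u };

/* Maximum sizes, as listed in the post. */
const cache_cfg_kb max_cfg = { 32u, 32u, 64u };

/* Returns how many kB of cache "cur" leaves unused relative to "max". */
unsigned unused_cache_kb(const cache_cfg_kb *cur, const cache_cfg_kb *max)
{
    /* A current size above the maximum would make the subtraction wrap. */
    assert(cur->l1d_kb <= max->l1d_kb);
    assert(cur->l1p_kb <= max->l1p_kb);
    assert(cur->l2_kb <= max->l2_kb);

    return (max->l1d_kb - cur->l1d_kb)
         + (max->l1p_kb - cur->l1p_kb)
         + (max->l2_kb - cur->l2_kb);
}
```

Calling `unused_cache_kb(&boot_cfg, &max_cfg)` adds up 0 + 16 + 32 and returns 48.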
Btw, it’s very possible that this only applies to the DspLink configuration that I am using.
Update:
It turned out that the reason for the smaller cache sizes is the default DspLink configuration. You can override this by adding the following lines to your project's TCF file. Just put them somewhere between utils.importFile("dsplink-omap3530-base.tci"); and prog.gen():
prog.module("GBL").C64PLUSL2CFG = "64k";
prog.module("GBL").C64PLUSL1DCFG = "32k";
prog.module("GBL").C64PLUSL1PCFG = "32k";
var IRAM = prog.module("MEM").instance("IRAM");
IRAM.len = 32768;
This will configure the OMAP3530 DSP with:
L2-Cache 64kb
L1 Data-Cache 32kb
L1 Code-Cache 32kb
L1DSRAM 48kb
IRAM (L2 RAM) 32kb
Nils,
I am in the same situation. My .map gives this:
MEMORY CONFIGURATION

name        origin    length    used      unused    attr  fill
----------  --------  --------  --------  --------  ----  --------
CACHE_L2    10808000  00008000  00000000  00008000  RWIX
CACHE_L1P   10e04000  00004000  00000000  00004000  RWIX
L1DSRAM     10f04000  00004000  00000382  00003c7e  RWIX
CACHE_L1D   10f10000  00008000  00000000  00008000  RWIX
But L1DSRAM is only 16k…
Regards,
Guillaume
Comment by Guillaume — February 7, 2010 @ 5:17 pm
Hi.
It seems like the .tcf file is simply wrong. The L1DSRAM is 48kb. Take a look at the OMAP3530 Technical Reference Manual, chapter 2.4.4 (L3 Interconnect View of the IVA2.2 Subsystem Memory Space). L1DSRAM goes from 0x10f04000 to 0x10f0ffff, directly followed by the 32kb of L1D-cache.
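If you want to double-check the address arithmetic, here is a small host-side sketch in plain C. The helper names `range_len` and `bytes_to_kb` are made up for illustration; the addresses are the ones quoted above and in the map file.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of an address range with inclusive bounds. */
uint32_t range_len(uint32_t first, uint32_t last)
{
    assert(last >= first);
    return last - first + 1u;
}

/* Converts a byte count to whole kB (1 kB = 1024 bytes here). */
uint32_t bytes_to_kb(uint32_t bytes)
{
    return bytes / 1024u;
}
```

`range_len(0x10f04000, 0x10f0ffff)` gives 0xC000 bytes, which `bytes_to_kb` turns into 48. The length of 00004000 from the map file is only 16 kB. Also, 0x10f04000 + 0xC000 is 0x10f10000, exactly the origin of CACHE_L1D in the map.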
Cheers,
Nils
Comment by Nils — February 7, 2010 @ 5:53 pm
Hi,
I’m trying to change the dsplink-omap3530-base.tci file, but I always get a compilation failure (like "MEM segment IRAM: overlaps with another segment or cache configuration").
I haven’t figured out which other definition conflicts with this.
Guillaume
Comment by Guillaume — February 7, 2010 @ 6:13 pm
Hi Guillaume,
How about this? http://torus.untergrund.net/code/dsplink-omap3530-base.tci
The tconf run works, but I haven’t tested whether the DSP binaries compiled with this script work. The map file looks good though.
Cheers,
Nils
Comment by Nils — February 7, 2010 @ 6:36 pm
Just verified that it works. Not only in theory but in practice. Yay!
Comment by Nils — February 7, 2010 @ 7:28 pm
Hi,
I am trying to implement an image processing algorithm on the BeagleBoard, and I am planning to use its DSP to do the job. Since I am a beginner with the BeagleBoard, I have no idea how to write code for the DSP and run it on the board. Could you please tell me the steps that I should follow to get this done?
I have a BeagleBoard C4 running Embinux Android Eclair.
Comment by Joseph — February 23, 2011 @ 7:45 am
Hi,
Could you please let me know which modifications one can make to the cache configuration on the BeagleBoard? I am interested in cache optimization, specifically for set-top boxes, but do not have enough information to go ahead. I already have a BeagleBoard.
Any inputs or pointers will be welcome.
Thanks,
Abhijeet
Comment by Abhijeet — December 9, 2011 @ 3:43 pm