hilbert-space

December 18, 2009

ARM NEON Optimization. An Example

Filed under: Beagleboard,OMAP3530 — Nils @ 8:15 pm

Since there is so little information about NEON optimizations out there I thought I’d write a little about it.

Some weeks ago someone on the beagle-board mailing list asked how to optimize a color-to-grayscale conversion for images. I haven’t done much pixel processing with ARM NEON yet, so I gave it a try. The results I got were quite spectacular, but more on this later.

For the color-to-grayscale conversion I used a very simple scheme: a weighted average of the red, green and blue components. This conversion ignores the effect of gamma but works well enough in practice. I also decided not to do proper rounding. It’s just an example after all.

First a reference implementation in C:

void reference_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int n)
{
  int i;
  for (i=0; i<n; i++)
  {
    int r = *src++; // load red
    int g = *src++; // load green
    int b = *src++; // load blue 

    // build weighted average:
    int y = (r*77)+(g*151)+(b*28);

    // undo the scale by 256 and write to memory:
    *dest++ = (y>>8);
  }
}
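The reference above truncates instead of rounding. As a minimal sketch of what rounding to nearest might look like, one could add half of the scale factor (128) before the shift. The function name gray_rounded is made up for illustration and is not part of the original code. Since the three weights sum to 256, the largest possible sum is 255*256 + 128 = 65408, which still fits into 16 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): same weights as
   reference_convert, but rounds to nearest instead of truncating. */
uint8_t gray_rounded(uint8_t r, uint8_t g, uint8_t b)
{
    /* 77 + 151 + 28 == 256, so y <= 255 * 256 + 128 = 65408 */
    unsigned y = r * 77u + g * 151u + b * 28u + 128u;

    assert(y <= 65535u);        /* still fits into a 16 bit lane */
    return (uint8_t)(y >> 8);   /* undo the scale by 256 */
}
```

For a pure blue value of 5 the truncating version yields 0 (140 >> 8), while this variant yields 1 (268 >> 8).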

Optimization with NEON Intrinsics

Let’s start optimizing the code using the compiler intrinsics. Intrinsics are nice to use because they behave just like C functions but compile to a single assembler instruction. At least in theory, as I’ll show you later.

Since NEON works in 64 or 128 bit registers it’s best to process eight pixels in parallel. That way we can exploit the parallel nature of the SIMD-unit. Here is what I came up with:

void neon_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int n)
{
  int i;
  uint8x8_t rfac = vdup_n_u8 (77);
  uint8x8_t gfac = vdup_n_u8 (151);
  uint8x8_t bfac = vdup_n_u8 (28);
  n/=8;

  for (i=0; i<n; i++)
  {
    uint16x8_t  temp;
    uint8x8x3_t rgb  = vld3_u8 (src);
    uint8x8_t result;

    temp = vmull_u8 (rgb.val[0],      rfac);
    temp = vmlal_u8 (temp,rgb.val[1], gfac);
    temp = vmlal_u8 (temp,rgb.val[2], bfac);

    result = vshrn_n_u16 (temp, 8);
    vst1_u8 (dest, result);
    src  += 8*3;
    dest += 8;
  }
}

Let’s take a look at it step by step:

First off I load my weight factors into three NEON registers. The vdup.8 instruction does this and also replicates the byte into all 8 bytes of the NEON register.

    uint8x8_t rfac = vdup_n_u8 (77);
    uint8x8_t gfac = vdup_n_u8 (151);
    uint8x8_t bfac = vdup_n_u8 (28); 

Now I load 8 pixels at once into three registers.

    uint8x8x3_t rgb  = vld3_u8 (src);

The vld3.8 instruction is a specialty of the NEON instruction set. With NEON you can not only do loads and stores of multiple registers at once, you can de-interleave the data on the fly as well. Since I expect my pixel data to be interleaved the vld3.8 instruction is a perfect fit for a tight loop.

After the load, I have all the red components of 8 pixels in the first loaded register. The green components end up in the second and blue in the third.
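To make the de-interleaving concrete, here is a scalar model of what a single vld3.8 does to 24 consecutive bytes. This is only an illustration in plain C, not NEON code; the function name and the three plain arrays standing in for the D registers are assumptions of mine:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of "vld3.8 {d0-d2}, [src]": 8 interleaved RGB pixels
   (24 bytes) are split into three 8-byte "registers".
   Illustrative only; the name and signature are not from the post. */
void deinterleave_rgb8(const uint8_t *src,
                       uint8_t r[8], uint8_t g[8], uint8_t b[8])
{
    assert(src != NULL);
    for (size_t lane = 0; lane < 8; lane++) {
        r[lane] = src[3 * lane + 0];   /* every 3rd byte, offset 0 */
        g[lane] = src[3 * lane + 1];   /* every 3rd byte, offset 1 */
        b[lane] = src[3 * lane + 2];   /* every 3rd byte, offset 2 */
    }
}
```

The hardware instruction does all of this in one go, which is why it fits interleaved pixel data so well.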

Now calculate the weighted average:

    temp = vmull_u8 (rgb.val[0],      rfac);
    temp = vmlal_u8 (temp,rgb.val[1], gfac);
    temp = vmlal_u8 (temp,rgb.val[2], bfac);

vmull.u8 multiplies each byte of the first argument with each corresponding byte of the second argument. Each result becomes a 16 bit unsigned integer, so no overflow can happen. The entire result is returned as a 128 bit NEON register pair.

vmlal.u8 does the same thing as vmull.u8 but also adds the content of another register to the result.

So we end up with just three instructions for weighted average of eight pixels. Nice.
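Why can no overflow happen? A scalar model of one 16 bit lane shows the arithmetic. This is a sketch in plain C under the weights used above; the function name is hypothetical. The weights sum to 256, so the worst case is 255*256 = 65280, which is below 65536:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of vmull.u8 followed by two vmlal.u8.
   Hypothetical helper for illustration, not part of the original code. */
uint16_t weighted_sum_lane(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t acc = (uint32_t)r * 77u;   /* vmull.u8: 8 bit * 8 bit -> 16 bit */
    acc += (uint32_t)g * 151u;          /* vmlal.u8: multiply and accumulate */
    acc += (uint32_t)b * 28u;           /* vmlal.u8: multiply and accumulate */

    /* worst case: 255 * (77 + 151 + 28) = 65280, fits into 16 bits */
    assert(acc <= UINT16_MAX);
    return (uint16_t)acc;
}
```

With weights that sum to more than 256 the accumulation could wrap around in the 16 bit lane, so this property depends on the chosen constants.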

Now it’s time to undo the scaling of the weight factors. To do so I shift each 16 bit result to the right by 8 bits, which is equivalent to a division by 256. ARM NEON has lots of instructions to do the shift, but a “narrow” variant exists as well. This one does two things at once: it does the shift and afterwards converts the 16 bit integers back to 8 bit by dropping the high byte of each shifted result. We get back from the 128 bit register pair to a single 64 bit register.

    result = vshrn_n_u16 (temp, 8);
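As a rough scalar model of the narrowing shift (plain C for illustration; the function name is my own and the arrays stand in for the Q and D registers), each 16 bit lane is shifted right by 8 and only the low byte is kept. Because a 16 bit value shifted by 8 always fits into 8 bits, nothing is lost in the narrowing step:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of "vshrn.i16 d, q, #8": shift each 16 bit lane right
   by 8 and keep the low 8 bits. Illustrative name, not from the post. */
void shift_right_narrow8(const uint16_t in[8], uint8_t out[8])
{
    assert(in != NULL && out != NULL);
    for (size_t lane = 0; lane < 8; lane++) {
        out[lane] = (uint8_t)(in[lane] >> 8);
    }
}
```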

And finally store the result.

    vst1_u8 (dest, result);
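One caveat of the loop as written: n/=8 silently drops up to seven trailing pixels when n is not a multiple of eight. A possible way to handle any length is to finish the remainder with the scalar formula. The following is a sketch under that assumption; the name convert_any_length is hypothetical, and the NEON path is only compiled when the compiler defines __ARM_NEON, so the same file also builds (as pure scalar code) on other machines:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper: NEON for blocks of 8 pixels, scalar for the rest. */
void convert_any_length(uint8_t *dest, const uint8_t *src, size_t n)
{
    size_t i = 0;

    assert(dest != NULL && src != NULL);

#if defined(__ARM_NEON)
    const uint8x8_t rfac = vdup_n_u8(77);
    const uint8x8_t gfac = vdup_n_u8(151);
    const uint8x8_t bfac = vdup_n_u8(28);

    /* same steps as neon_convert, 8 pixels per iteration */
    for (; i + 8 <= n; i += 8) {
        uint8x8x3_t rgb  = vld3_u8(src + 3 * i);
        uint16x8_t  temp = vmull_u8(rgb.val[0], rfac);
        temp = vmlal_u8(temp, rgb.val[1], gfac);
        temp = vmlal_u8(temp, rgb.val[2], bfac);
        vst1_u8(dest + i, vshrn_n_u16(temp, 8));
    }
#endif

    /* scalar tail (or the whole image when NEON is not available) */
    for (; i < n; i++) {
        unsigned y = src[3 * i] * 77u + src[3 * i + 1] * 151u
                   + src[3 * i + 2] * 28u;
        dest[i] = (uint8_t)(y >> 8);
    }
}
```

Both paths truncate in the same way, so the output should match reference_convert byte for byte.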

First Results:

How do the reference C function and the NEON-optimized version compare? I did a test on my OMAP3 Cortex-A8 CPU on the beagle-board and got the following timings:

C-version:       15.1 cycles per pixel.
NEON-version:     9.9 cycles per pixel.

That’s only a speed-up of factor 1.5. I expected much more from the NEON implementation. It processes 8 pixels with just 6 instructions after all. What’s going on here? A look at the assembler output explained it all. Here is the inner-loop part of the convert function:

 160:   f46a040f        vld3.8  {d16-d18}, [sl]
 164:   e1a0c005        mov     ip, r5
 168:   ecc80b06        vstmia  r8, {d16-d18}
 16c:   e1a04007        mov     r4, r7
 170:   e2866001        add     r6, r6, #1      ; 0x1
 174:   e28aa018        add     sl, sl, #24     ; 0x18
 178:   e8bc000f        ldm     ip!, {r0, r1, r2, r3}
 17c:   e15b0006        cmp     fp, r6
 180:   e1a08005        mov     r8, r5
 184:   e8a4000f        stmia   r4!, {r0, r1, r2, r3}
 188:   eddd0b06        vldr    d16, [sp, #24]
 18c:   e89c0003        ldm     ip, {r0, r1}
 190:   eddd2b08        vldr    d18, [sp, #32]
 194:   f3c00ca6        vmull.u8        q8, d16, d22
 198:   f3c208a5        vmlal.u8        q8, d18, d21
 19c:   e8840003        stm     r4, {r0, r1}
 1a0:   eddd3b0a        vldr    d19, [sp, #40]
 1a4:   f3c308a4        vmlal.u8        q8, d19, d20
 1a8:   f2c80830        vshrn.i16       d16, q8, #8
 1ac:   f449070f        vst1.8  {d16}, [r9]
 1b0:   e2899008        add     r9, r9, #8      ; 0x8
 1b4:   caffffe9        bgt     160

Note the store at offset 168? The compiler decides to write the three registers onto the stack. After some useless memory accesses from the GPP side the compiler reloads them (offsets 188, 190 and 1a0) into exactly the same physical NEON registers.

What do all the ordinary integer instructions do? I have no idea. Lots of memory accesses target the stack for no good reason. There is definitely no shortage of registers anywhere. For reference: I used the GCC 4.3.3 (CodeSourcery 2009q1 lite) compiler.

NEON and assembler

Since the compiler can’t generate good code I wrote the same loop in assembler. In a nutshell I just took the intrinsic based loop and converted the instructions one by one. The loop-control is a bit different, but that’s all.

convert_asm_neon:

      # r0: Ptr to destination data
      # r1: Ptr to source data
      # r2: Iteration count:

      push        {r4-r5,lr}
      lsr         r2, r2, #3

      # build the three constants:
      mov         r3, #77
      mov         r4, #151
      mov         r5, #28
      vdup.8      d3, r3
      vdup.8      d4, r4
      vdup.8      d5, r5

  .loop:

      # load 8 pixels:
      vld3.8      {d0-d2}, [r1]!

      # do the weighted average:
      vmull.u8    q3, d0, d3
      vmlal.u8    q3, d1, d4
      vmlal.u8    q3, d2, d5

      # shift and store:
      vshrn.u16   d6, q3, #8
      vst1.8      {d6}, [r0]!

      subs        r2, r2, #1
      bne         .loop

      pop         { r4-r5, pc }

Final Results:

Time for some benchmarking again. How does the hand-written assembler version compare? Well, here are the results:

  C-version:       15.1 cycles per pixel.
  NEON-version:     9.9 cycles per pixel.
  Assembler:        2.0 cycles per pixel.

That’s roughly a factor of five over the intrinsic version and 7.5 times faster than my not-so-bad C implementation. And keep in mind: I didn’t even optimize the assembler loop.

My conclusion: If you want performance out of your NEON unit stay away from the intrinsics. They are nice as a prototyping tool. Use them to get your algorithm working and then rewrite the NEON-parts of it in assembler.

Btw: Sorry for the ugly syntax-highlighting. I’m still looking for a nice wordpress plug-in.

92 Comments »

  1. Your post is very interesting. I was reading up about Neon intrinsics. And I came across your post. This information is very useful. BTW I have a few doubts/comments.

    1. Where are the src and dst pointers pointing to? Is it DDR, or some faster memory like L3 memory in OMAP3? Can you achieve 2 cycles per pixel if the data is in DDR and the A8 accesses it through the Dcache? My guess was that, if the data is in DDR, Dcache misses would be the gating factor and would result in worse performance than 2 cycles per pixel. Can you please clarify?

    2. Can you try declaring the following variables outside the loop.

    uint16x8_t temp;
    uint8x8x3_t rgb = vld3_u8 (src);
    uint8x8_t result;

    I am wondering if declaring the variables inside the loop will cause the compiler to do strange things like reading/writing from the stack every iteration. Only a guess. It isn’t necessarily right. Have you tried declaring them outside the loop?

    3. Have you tried any other compiler? TI and ARM have A8 compilers that support NEON intrinsics. I have heard the TI compiler is pretty good.

    Thanks and Regards
    Ranjith

    Comment by Ranjith Parakkal — January 11, 2010 @ 12:45 pm

  2. Hi Ranjith,

    The two pointers point to DDR memory. Source is 192kb and dest 64kb in size (256*256 pixels each). No internal memory has been used, and the memory blocks are much larger than the cache. Also, as far as I know, internal or tightly coupled memory does not exist on the ARM side of the OMAP3530.

    I’ll try to move the variable declaration out of the loop this evening. It shouldn’t make a difference, but well… You’ll never know until you’ve tried. For trying the TI-compiler: That’s a good idea.

    Comment by admin — January 11, 2010 @ 7:28 pm

  3. Nils,

    Thanks a LOT for your reply. I will wait for you to try moving the declaration outside the loop and post the results here .. ;)

    Just thinking out loud on how you have achieved 2 cycles per pixel.
    Since you are processing 8 pixels every iteration, the following code should take 16 cycles on average.

    .loop
    vld3.8 {d0-d2}, [r1]! — ?? cycle

    # do the weight average:
    vmull.u8 q3, d0, d3 — one cycle
    vmlal.u8 q3, d1, d4 — one cycle
    vmlal.u8 q3, d2, d5 — one cycle

    # shift and store:
    vshrn.u16 d6, q3, #8 — one cycle
    vst1.8 {d6}, [r0]! — ?? cycle

    subs r2, r2, #1 — one cycle
    bne .loop — ?? cycle

    The vector arithmetic operations and the subs operation should account for 5 cycles out of the 16, assuming each of them is single-cycle when pipelined. So that leaves about 11 cycles for the branch and the vector load and store operations. I think this should more or less account for the cache misses. I don’t know much about ARM’s cache structure. Let me go read up on it.

    BTW there is an L3 memory on OMAP3, which is not closely coupled with A8, but may still give better performance than cached-DDR.

    Comment by Ranjith Parakkal — January 12, 2010 @ 10:18 am

  4. Hi,

    I need some help for using the Neon instructions of Cortex-A8 in my application.
    I am writing a image algorithm where I want to use NEON instructions of Cortex A8.
    I have tested the program using RVDS, where I have used init.s and init_cache.s taken from Examples of RVDS. We had also given one scatter file for placement of stack and heap while linking in RVDS.
    I want to run my program on Beagle Board running Linux.
    My questions are:
    1. Do we need to use init.s and init_cache.s in my program to run it on Beagle board?
    2. Do we need scatter file to run the program on Beagle board? If yes, how to give scatter file using gnu ld.
    3. When we removed ’1′ and ’2′, the linker gives error “uses VFP registers”.

    Please help.

    Mike

    Comment by Mike — January 12, 2010 @ 12:53 pm

  5. Hi Ranjith,

    It’s not that simple to count cycles in mixed ARM and NEON code. The NEON pipeline is very special. In a nutshell the NEON unit sits logically behind the ARM unit and lags 5 cycles behind execution. It also has its own instruction queue and only executes one instruction per cycle (in practice).

    In the code example the following will happen: The instruction decoder decodes two instructions per cycle. The ARM-pipeline can execute two instructions per cycle. NEON instructions are treated as a NOPs so to say. So all the NEON instructions flow through the ARM pipeline and do nothing. After the 5 cycles they get stuffed into the NEON instruction queue.

    The NEON-unit will now start to execute the queued instructions. It can only do one instruction per cycle (not exactly true, there are some pipeline possibilities, but I doubt my code uses any). Anyway, in practice the speed is just half of the ARM-unit.

    This has the following effect on timing: During the first two or three iterations of the conversion function the ARM unit generates NEON instructions much faster than the NEON unit can execute them. The queue fills up fast. Once the queue is full, the NEON unit will always be able to execute while the ARM unit is mostly waiting for a free queue entry. All other ARM instructions will execute in parallel with the NEON unit, and the NEON unit will never run out of work.

    So effectively all ARM instructions that are mixed between the NEON-code are free. The entire performance is dominated by just the NEON-timing.

    We end up with the six instructions executing within 16 cycles. I haven’t measured, but I guess that the load instruction accounts for at least 50% of the time (due to cache misses) and the data processing and store account for the other half.

    It would be fun to run the code on zero wait-state memory. Unfortunately I don’t know about any of such memory, even on the L3. Well – I could abuse the internal memory of the DSP but I don’t know how fast access from the ARM is.

    Comment by Nils — January 12, 2010 @ 4:14 pm

  6. Hi Mike.

    I’ve never worked with RVDS, but if you use init.s and init_cache.s you’re compiling your program for use without any OS, e.g. bare-bones or boot-loader style.

    Since you want to execute the code on the BeagleBoard using Linux you need an executable in ELF-format.

    Check the manual how to generate such an output-file. I’m sure it’s documented somewhere.

    Comment by Nils — January 12, 2010 @ 4:20 pm

  7. Hi,

    Thanks for your reply.
    I built my application without init.s, init_cache.s and the scatter file, using -mfpu=neon. It built successfully.
    When I ran my application on the beagleboard, it exited with “Illegal Instruction”, which I suspect happens when the first NEON instruction is encountered. We are using beagleboard revision B4 and kernel 2.6.22.18.

    Could you tell me how to solve this issue?

    Please help.

    Mike

    Comment by Mike — January 13, 2010 @ 6:44 am

  8. Hi Mike.

    Have you tried debugging the executable with gdb to find out which instruction triggers the “Illegal Instruction” fault? That could give some clues about what is going wrong. Maybe there is still some code in your application that tries to access the hardware directly? Moves to the co-processor that configure the cache and the like will trigger an illegal instruction fault.

    You may get better answers on the beagleboard mailing list. I suggest you try asking there. I’ve never worked with RVDS, and I don’t have any problems compiling my code using GCC on the PC or the beagleboard.

    Link to the beagleboard community: http://groups.google.com/group/beagleboard

    I’m using kernel 2.6.29 by the way. 2.6.22 is rather old. How come you’re still using such an old kernel?

    Comment by Nils — January 13, 2010 @ 5:49 pm

  9. Hi Nils,

    How did you get the assembler output generated? Did you use any specific compiler flag to generate assembler output.
    I am using Cross Compiler arm-none-linux-gnueabi-gcc to compile my programs and test them on OMAP Zoom 2 platform.

    I tried using -S option but it does not generate any assembler output file.

    Thanks and Best Regards,
    Venkat

    Comment by Venkat — January 21, 2010 @ 2:42 pm

  10. Hi Nils,

    I have one more question. How did you build an executable using hand coded assembler file? Could you please let me know the steps?

    Thanks and Best Regards,
    Venkat

    Comment by Venkat — January 21, 2010 @ 5:04 pm

  11. Hi Venkat,

    I generated the assembler output using the gnu disassembler.

       arm-none-linux-gnueabi-objdump -d yourfile.o 

    will do the trick!

    To compile a raw assembler file do the same as you do with your .c files. E.g.

        arm-none-linux-gnueabi-gcc -c yourfile.s 

    will generate a yourfile.o object file. I've uploaded the file onto my webspace, so you can use it as a boilerplate: rgb_to_gray.s

    Cheers,
    Nils

    Comment by Nils — January 21, 2010 @ 10:03 pm

  12. Nils (and everyone),

    I’m an open source advocate at TI for OMAP. A couple of our engineers pointed me to this article. We are working with ARM and CodeSourcery to improve the quality of ARM compilers. We are pushing everything directly to gcc, and we hope to have some real progress by the end of 2010.

    We’ll keep you in the loop on the progress.

    Cheers,

    Chris P

    Comment by Chris — March 8, 2010 @ 3:00 pm

  13. Hi Chris,

    That’s great news..

    I have followed the gcc-dev mailing list for two years now and have developed a feeling for how long or difficult some changes are and how gcc works on the inside. If you don’t mind I would like to contact you privately and give some additional insights into what goes wrong inside gcc when it comes to performant ARM code. There are a couple of low-hanging fruits that would give instant performance improvements for Cortex-A8 code.

    Also – since you’ve contacted me via the blog: Don’t miss Mans blog on ARM-code: http://www.hardwarebug.org He is _very_ competent when it comes to ARM and NEON. I’m sure he would be glad to share some of his insights with you as well.

    Cheers,
    Nils

    Comment by Nils — March 8, 2010 @ 4:50 pm

  14. The inefficient code generated by the vldX where X>1 intrinsics is a known problem in gcc:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43118

    Comment by Samuel — March 15, 2010 @ 1:13 pm

  15. Hi,

    I want to understand how the NEON pipeline works. I have read the Cortex-A8 Technical Reference Manual but still want to understand instruction-level scheduling.

    Thanks
    Saurabh

    Comment by Saurabh — March 26, 2010 @ 6:21 am

  16. Hi, just a speculative post under an old blog entry in case you happen to know the answer to this… :-)

    I’ve been poring over various RealView documents about NEON, trying to understand the similarities and differences from Intel SSE-x. Am I right in saying that there’s no equivalent to SSE’s _mm_movemask_epi8 (which basically takes a SIMD vector and generates a standard integer bitmask based upon a predicate (in this case hardwired to be “top-bit set”) applied to each vector element)? There doesn’t seem to be but maybe I’m missing one hidden away.

    This is probably an instruction that’s not that important for multimedia generation, but I’m interested in image analysis where one often wants to see if a “thing” is in some “computationally defined” set (eg, is an RGB pixel within some box in RGB space) and use SIMD parallelisation. Without this kind of instruction, you can do the testing in parallel but then you’ve got to build a mask of results by manually extracting each vector element to a scalar and build the bitmap, so it seems an odd thing to leave out of an instruction set design.

    Cheers,
    Orthochronous

    Comment by Orthochronous — March 27, 2010 @ 6:01 pm

  17. Hi Orthochronous,

    An instruction like this seems to be missing. The best thing to simulate it looks roughly like this:

    : d1 = input
    : d2 = mask (128, 64, 32, 16, 8, 4, 2, 1)

    vand d1, d1, d2
    vpadd.i8 d1, d1, d1
    vpadd.i8 d1, d1, d1
    vpadd.i8 d1, d1, d1

    That fills each byte of d1 with the sum of all elements of d1. Since we’ve masked out the bits before addition no overflow can happen and we end up with a mask similar to movemask. The result will be stored in each byte of d1, but you can just extract the last byte later.

    The same trick can be extended to 128 bits as well.
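    The steps above can be modelled in plain C to check the idea. This is only a scalar sketch, not NEON code: the function names are made up, and it assumes every input byte is either 0x00 or 0xFF, as a NEON compare would produce. Note that with the mask order given above, lane 0 ends up in bit 7 of the result, which is the reverse of what SSE's movemask does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of "vpadd.i8 d1, d1, d1": adjacent bytes are added
   pairwise; because both operands are the same register, the four
   sums appear twice. Illustrative helper, not from the comment. */
static void pairwise_add8(uint8_t v[8])
{
    uint8_t t[8];
    for (int i = 0; i < 4; i++) {
        t[i]     = (uint8_t)(v[2 * i] + v[2 * i + 1]);
        t[i + 4] = t[i];
    }
    memcpy(v, t, sizeof t);
}

/* Hypothetical name: builds a bitmask from 8 compare-result bytes. */
uint8_t movemask8_model(const uint8_t in[8])
{
    static const uint8_t bit[8] = {128, 64, 32, 16, 8, 4, 2, 1};
    uint8_t v[8];

    for (int i = 0; i < 8; i++) {
        assert(in[i] == 0x00 || in[i] == 0xFF);  /* compare results only */
        v[i] = in[i] & bit[i];                   /* vand d1, d1, d2 */
    }
    pairwise_add8(v);   /* after three rounds every byte holds the */
    pairwise_add8(v);   /* sum of all eight masked bytes           */
    pairwise_add8(v);
    return v[7];        /* extract the last byte */
}
```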

    Comment by Nils — March 27, 2010 @ 10:22 pm

  18. Hi,

    Thanks for the insight. I’d never have thought of this approach, so it’s very useful.

    Comment by Orthochronous — March 29, 2010 @ 6:10 am

  19. I see some performance overhead when I add NEON intrinsics to my array operation code.

    The Code fragment shown below

    1. C Code Fragment

    kvalue = SHIFTR(*Residue++, 6) + *Predicted++;
    *Original++ = (byte) CLIPS(255, kvalue);
    kvalue = SHIFTR(*Residue++, 6) + *Predicted++;
    *Original++ = (byte) CLIPS(255, kvalue);
    kvalue = SHIFTR(*Residue++, 6) + *Predicted++;
    *Original++ = (byte) CLIPS(255, kvalue);
    kvalue = SHIFTR(*Residue++, 6) + *Predicted++;
    *Original++ = (byte) CLIPS(255, kvalue);

    2. Neon Specific Code

    /* Predicted Array is unsigned char type so a type cast activity done by assign it to Integer AryPred*/
    AryPred[0] = *Predicted++; AryPred[1] = *Predicted++; AryPred[2] = *Predicted++; AryPred[3] = *Predicted++;
    neonResidue = vld1q_s32(Residue);
    /* Code for SHIFTR */
    neonResidue = vaddq_s32(neonResidue, addconst);
    neonResidue = vshrq_n_s32(neonResidue, 6);

    neonPredict = vld1q_s32(AryPred);
    addedResult = vaddq_s32(neonResidue, neonPredict);
    *Original++ = (byte) CLIPS(255, vgetq_lane_s32(addedResult, 0));
    *Original++ = (byte) CLIPS(255, vgetq_lane_s32(addedResult, 1));
    *Original++ = (byte) CLIPS(255, vgetq_lane_s32(addedResult, 2));
    *Original++ = (byte) CLIPS(255, vgetq_lane_s32(addedResult, 3));
    Residue += 4;

    #define CLIPS(iF, iS)(iS > 0 ? (iS < iF ? iS : iF): (0 int
    Residue-> int
    Predicted->unsigned char
    Original->unsigned char

    The following data types are used to define the above array operations with Neon Operations

    AryPred -> int
    addedResult, neonPredict, neonResidue ->int32x4_t

    What is the reason for the overhead when I repeat the code 4 times? Is there a memory alignment issue, or are there extra pipeline stalls?
    Is there any performance gain if we change the data type of each array to be the same (neglecting the chance of overflow)?

    Rgds
    eDave

    Comment by eDave — May 6, 2010 @ 12:33 pm

  20. Nice post,

    I’ve also made the same observations when using the compiler vs hand-optimized neon assembler. C functions that contain inline asm often produce very unpredictable and unwanted results when ‘optimized’ with both -O2 and -O3. Even -Os still suffers slightly in performance compared to a hand-optimized assembly routine. In short… we’ll get there, just not yet :P

    Comment by Christopher Friedt — June 6, 2010 @ 10:51 am

  21. Nils,
    would you please tell me how to measure the timing of the code in a Linux environment?
    It seems that you execute the code on the BeagleBoard under Linux as an ELF executable.
    How do you make sure the timing is not affected by other tasks in Linux?

    Thanks,
    Shawn

    Comment by shawn — October 20, 2010 @ 10:13 am

  22. Hello,
    how did you measure or estimate the number of cycles each of the variant takes without looking at the assembly output?

    Comment by jani — December 21, 2010 @ 12:10 pm

  23. HI,
    I am very new to Cortex-A9 and NEON coding.
    I have written NEON code for a C function and executed it on a beagle board; the C version performs better than the NEON code.

    what could be the problem?
    please help me in this regard
    my compiler options are
    -O3 -march=armv7-a -ftree-vectorize -mvectorize-with-neon-quad -fprefetch-loop-arrays -mfloat-abi=softfp -mfpu=neon -funroll-loops

    Thank you

    Comment by srini — March 30, 2011 @ 2:11 pm

  24. Just by removing pipeline bubbles, your code can run 2 times faster.

    http://pulsar.webshaker.net/ccc/result.php?lng=fr&sample=3

    Comment by Etienne SOBOLE — May 14, 2011 @ 7:37 pm

  25. hi, mike
    I can’t enable NEON in vs2008 with Compact 7, do you know how to do this?
    Also, I can’t compile your sample NEON code successfully in vs2008 with Compact 7,
    Do you know how to enable NEON function?

    Comment by Michael — May 25, 2011 @ 11:41 am

  26. Hi,
    I’m new to assembly programming and I tried to understand by converting each intrinsic into ARM assembly. I cannot find any corresponding assembly instructions for “src += 8*3; dest += 8;” in rgb_to_gray.s (although the assembly file works as expected when executed)

    Can someone kindly throw some light on it.

    Comment by srinivas — June 4, 2011 @ 9:23 pm

  27. The vld and vst instructions increment the pointers as a side-effect. That’s what the ‘!’ char does in the instruction.

    Comment by Nils — June 5, 2011 @ 11:19 am

  28. Thanks Nils :)

    Comment by srinivas — June 6, 2011 @ 1:32 pm

  29. Hi everybody,

    I am new in the field of playing with assembly language and thus challenging the optimization of professional Neon Compilers. Here are my questions?
    1)Suppose if I have a C Code (e.g. fft.c), then how can I view its Neon optimized assembly code using Real View Development System? A command line syntax can also work, but for windows only.
    2)Is there any general guideline for converting C codes into their corresponding Neon assembly counterparts?

    Pardon me for my unprofessional way of asking questions, but it would be very gracious of all of you people if anyone can answer my queries ASAP.

    Thank You

    Comment by Ankit — July 1, 2011 @ 11:50 am

  30. Hi all!

    I have a question on the timings discussed here:
    I took all implementations from here and from “pulsar” (incl. the “16 pixels at a time” version)
    and compared the runtime on a 600 MHz ARM with NEON and 256k L2 cache.
    When I made sure that none of the image data had been used before (i.e. it is not cached), I indeed got the same ordering of execution times
    for the different implementations, however the gain was significantly lower than the values discussed here.
    It was something about the “pulsar 16 pixel”-implementation (compare comment 24 on this page) being 15% faster than the “fastest” implementation on this page, for instance.
    (whereas Etienne claims it to be twice as fast)

    Using a small image size and repeating the calculation (=L2 cache should work perfectly),
    the differences in the execution time compare better to the expectation mentioned here.

    Am I doing anything wrong?

    Comment by Tom — August 26, 2011 @ 5:44 pm

  31. As said earlier, the bug was filed in gcc bugzilla.
    I tested 4.6.1; the generated assembly was much better but still suboptimal.
    latest 4.6 snapshot: same result.

    but latest 4.7 snapshot generates optimal code.

    tested with android on a samsung galaxy s2: the c neon code is 4 times better than the C reference

    Comment by sophana — October 5, 2011 @ 11:04 am

  32. [...] An example using NEON for optimization at Hilbert-Space.de, [...]

    Pingback by Mac OSX and Assembly Programming | Alauda Projects — April 30, 2012 @ 11:21 am

  33. [...] ARM NEON Optimization. An Example [...]

    Pingback by Maximizing performance grayscale color conversion using NEON and cv::parallel_for « Computer Vision Talks — November 6, 2012 @ 9:04 pm

  34. Hi Nils,

    It’s great reading your blogs. I’ve got a typical problem here. I thought you would be able to answer it better, I’d be more than happy even if you could have a look at this and give me some pointers. Here’s my problem on SO: http://stackoverflow.com/questions/14869693/neon-vld-consuming-more-cycles-than-what-is-expected

    Regards,
    nguns

    Comment by nguns — February 19, 2013 @ 9:05 am

  41. [...] is a complete example program, build on the code from this post: http://pastebin.com/YFg3ssST. It uses a few of the NEON intrinsics so it is a good test for our [...]

    Pingback by From zero to cross-compiling for pandaboard | Stan Manilov — October 3, 2013 @ 3:18 pm

  44. Hi

    This post is really helpful. I am new to NEON intrinsics, and right now I have to optimize code like the following pseudo-code:

    for (;;)
      if (a > b + c) {
        a = b + c;
      } else if (a < b - c) {
        a = b - c;
      }

    How can I express this with NEON intrinsics? It seems that we cannot process 8 elements in parallel in such a case, can we?

    Comment by Wu Xiaoyong — December 9, 2013 @ 10:06 am
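
A possible approach to the question in comment 44: the two branches together clamp a into the range [b - c, b + c]. Assuming c is never negative, they can be replaced by a branch-free minimum followed by a maximum, which NEON does on eight 16-bit lanes at once. The sketch below is only an illustration. The element type (int16_t), the array layout and the function name clamp_around are assumptions, since the comment does not specify them. It falls back to plain C when NEON is not available.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: clamp every a[i] into [b[i]-c[i], b[i]+c[i]].
   Assumes c[i] >= 0 and that b[i]+c[i] and b[i]-c[i] fit into int16_t. */
void clamp_around (int16_t *a, const int16_t *b, const int16_t *c, int n)
{
  int i = 0;
  assert (n >= 0);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  /* eight elements per iteration */
  for (; i + 8 <= n; i += 8)
  {
    int16x8_t va = vld1q_s16 (a + i);
    int16x8_t vb = vld1q_s16 (b + i);
    int16x8_t vc = vld1q_s16 (c + i);
    int16x8_t hi = vaddq_s16 (vb, vc);   /* upper bound b + c */
    int16x8_t lo = vsubq_s16 (vb, vc);   /* lower bound b - c */

    va = vminq_s16 (va, hi);             /* if (a > b + c) a = b + c; */
    va = vmaxq_s16 (va, lo);             /* if (a < b - c) a = b - c; */
    vst1q_s16 (a + i, va);
  }
#endif

  /* scalar tail, also the complete fallback without NEON */
  for (; i < n; i++)
  {
    int hi = b[i] + c[i];
    int lo = b[i] - c[i];
    if (a[i] > hi)      a[i] = (int16_t) hi;
    else if (a[i] < lo) a[i] = (int16_t) lo;
  }
}
```

With a negative c the min/max pair no longer matches the if/else chain, so that case would need separate handling.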

  49. Thanks for your post. I want to try your code, but I get some compile errors. Could you share your definitions for uint8x8_t, uint8x8x3_t, etc.? I have enabled the NEON switch in the compiler.

    Comment by Robert — January 17, 2014 @ 6:24 pm
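
Regarding comment 49: types such as uint8x8_t and uint8x8x3_t are normally not defined by hand. They are declared in the compiler's arm_neon.h header. With GCC for 32-bit ARM the header is only usable when NEON is enabled, typically with something like -mfpu=neon -mfloat-abi=softfp (the exact flags depend on the toolchain, so treat these as an assumption). The small smoke test below is a sketch. HAVE_NEON and add8_u8 are hypothetical names, and the plain C branch only exists so the file also builds on a PC.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>   /* declares uint8x8_t, uint8x8x3_t, vld3_u8, ... */
#define HAVE_NEON 1     /* hypothetical helper macro */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical smoke test: add two groups of eight bytes (wrapping at 256). */
void add8_u8 (uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
  assert (dst && a && b);
#if HAVE_NEON
  uint8x8_t va = vld1_u8 (a);
  uint8x8_t vb = vld1_u8 (b);
  vst1_u8 (dst, vadd_u8 (va, vb));
#else
  int i;
  for (i = 0; i < 8; i++)
    dst[i] = (uint8_t) (a[i] + b[i]);
#endif
}
```

If this file compiles but the code from the post does not, the include or the NEON compiler switch is the most likely culprit.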

  65. Hi

    Just like to say that I tried compiling the example with clang-3.0, and the generated assembly looks optimal:

    neon_convert:
        cmp       r2, #8
        bxlt      lr
        asr       r3, r2, #31
        vmov.i8   d16, #0x97
        vmov.i8   d17, #0x4D
        add       r2, r2, r3, lsr #29
        vmov.i8   d18, #0x1C
        mov       r3, #0
        asr       r2, r2, #3
    .LBB0_1:
        vld3.8    {d20, d21, d22}, [r1]!
        add       r3, r3, #1
        cmp       r3, r2
        vmull.u8  q12, d21, d16
        vmlal.u8  q12, d20, d17
        vmlal.u8  q12, d22, d18
        vshrn.i16 d19, q12, #8
        vst1.8    {d19}, [r0]!
        blt       .LBB0_1
        bx        lr

    Comment by sophana — August 12, 2014 @ 4:01 pm
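
To read the listing in comment 65 against the C code from the post: the three vmov.i8 immediates are the weights in hexadecimal (0x4D = 77 for red, 0x97 = 151 for green, 0x1C = 28 for blue), and the compiler merely multiplies green first. A scalar model of what one of the eight lanes computes might look like this. The function name gray_lane is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a single lane of the loop above. */
uint8_t gray_lane (uint8_t r, uint8_t g, uint8_t b)
{
  /* vmull.u8 / vmlal.u8: widen to 16 bit and accumulate */
  unsigned y = g * 0x97u    /* 151, held in d16 */
             + r * 0x4Du    /*  77, held in d17 */
             + b * 0x1Cu;   /*  28, held in d18 */

  /* the weights sum to 256, so y never exceeds 255 * 256 = 65280 */
  assert (y <= 0xFFFFu);

  /* vshrn.i16 #8: shift right by 8 and narrow to 8 bit */
  return (uint8_t) (y >> 8);
}
```

Because the weights add up to exactly 256, the 16-bit accumulator in q12 cannot overflow.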
