Messages from 156100

Article: 156100
Subject: Re: legacy Xilinx software
From: Jon Elson <jmelson@wustl.edu>
Date: Fri, 22 Nov 2013 16:57:07 -0600
Links: << >>  << T >>  << A >>
matt.lettau@gmail.com wrote:


> 
> I have no issues running the command line 10.1 tools on Win7pro 64 bit.
WebPack won't let you install 10.1 on a 64-bit OS (I think either Win or
Linux) due to US export restrictions at the time.  I was able to simply
copy the whole file tree over, and synthesis, etc. worked, but the ISE
simulator didn't.  Probably related to the C compiler version.

> I've no use for any of the GUI components, so I can't vouch for that. I
> don't install though. I keep a zip file that is the installation folder of
> the 10.1 tools from long ago.
OK, so then you did about the same.

ISE WebPack 13.4 does install fine on 64-bit Linux, and works except
for the download cable driver.  I have a workaround with xc3sprog, but
I think I know how to make iMPACT work, eventually.  I still use their
GUI, but admit it could be better.  But, for the relatively simple
projects I do, it seems to work OK.

Jon

Article: 156101
Subject: Re: microZed adventures
From: Tim Wescott <tim@seemywebsite.really>
Date: Fri, 22 Nov 2013 17:43:23 -0600
Links: << >>  << T >>  << A >>
On Fri, 22 Nov 2013 11:16:48 -0800, John Larkin wrote:

> On Fri, 22 Nov 2013 12:20:21 -0600, Tim Wescott
> <tim@seemywebsite.really> wrote:
> 
>>On Fri, 22 Nov 2013 08:57:12 -0800, John Larkin wrote:
>>
>>> We're into this signal processing project, using a microZed/ZYNQ thing
>>> as the compute engine.
>>> 
>>> After a week or so of work by an FPGA guy and a programmer, we can now
>>> actually read and write an FPGA register from a C program, and wiggle
>>> a bit on a connector pin. Amazingly, the uZed eval kit does not
>>> include a demo of this, and the default boot image does not configure
>>> the FPGA!
>>> 
>>> We're using their build tools to embed the FPGA config into the boot
>>> image. We'd really like to be able to have a C program read a
>>> bitstream file and reconfigure the FPGA, but we haven't been able to
>>> figure that out.
>>> 
>>> If we run a C program that wiggles a pin as fast as it can, we can do
>>> a write to the FPGA register about every 170 ns. Without any attempts
>>> at optimization (like dedicating the second ARM core to the loop) we
>>> see stutters (OS stealing our CPU) that last tens or hundreds of
>>> microseconds, occasionally a full millisecond. That might get worse if
>>> we run TCP/IP sessions or host web pages or something, so dedicating
>>> the second ARM to realtime stuff would be good.
>>
>>There's not nearly enough information there, but if you're serious about
>>real time you don't just throw a bag of unknown software at something
>>and expect it to work.  Operating systems don't steal CPU time --
>>programmers steal CPU time, sometimes by choosing the wrong OS.
> 
> It's looking that the Linux that comes with the uZed will work fine for
> the current application. Is Linux "a bag of unknown software"?
> 
> We need to do TCP/IP stuff, and manage waveform data files, run BIST,
> things like that, and using Linux sure makes that part easy. A lot of
> the functionality could be in ARM code or might be in the FPGA, but I'm
> moving as much as possible into the ARM. C is a lot easier to code and
> compile than VHDL.

From the perspective of someone who needs an RTOS, Linux is a bag of very 
dubious software, indeed.

The fact that you were trying to work around Linux certainly explains the 
large and variable delays.

I agree that Linux makes the "complicated" stuff easier.  If you really 
have a need for real-time, and you don't really need that other processor 
for your "big box" stuff, then put an RTOS on it and run your timing-
critical stuff there.

Or, research the state of real time Linux -- it kind of dropped off my 
radar screen about five years ago, so either it's a done deal and no 
one's excited about it any more, or it never quite worked right and 
everyone who was excited about it is now too embarrassed to speak up.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com


Article: 156102
Subject: Re: microZed adventures
From: John Larkin <jlarkin@highlandtechnology.com>
Date: Fri, 22 Nov 2013 16:30:40 -0800
Links: << >>  << T >>  << A >>
On Fri, 22 Nov 2013 17:43:23 -0600, Tim Wescott
<tim@seemywebsite.really> wrote:

>On Fri, 22 Nov 2013 11:16:48 -0800, John Larkin wrote:
>
>> On Fri, 22 Nov 2013 12:20:21 -0600, Tim Wescott
>> <tim@seemywebsite.really> wrote:
>> 
>>>On Fri, 22 Nov 2013 08:57:12 -0800, John Larkin wrote:
>>>
>>>> We're into this signal processing project, using a microZed/ZYNQ thing
>>>> as the compute engine.
>>>> 
>>>> After a week or so of work by an FPGA guy and a programmer, we can now
>>>> actually read and write an FPGA register from a C program, and wiggle
>>>> a bit on a connector pin. Amazingly, the uZed eval kit does not
>>>> include a demo of this, and the default boot image does not configure
>>>> the FPGA!
>>>> 
>>>> We're using their build tools to embed the FPGA config into the boot
>>>> image. We'd really like to be able to have a C program read a
>>>> bitstream file and reconfigure the FPGA, but we haven't been able to
>>>> figure that out.
>>>> 
>>>> If we run a C program that wiggles a pin as fast as it can, we can do
>>>> a write to the FPGA register about every 170 ns. Without any attempts
>>>> at optimization (like dedicating the second ARM core to the loop) we
>>>> see stutters (OS stealing our CPU) that last tens or hundreds of
>>>> microseconds, occasionally a full millisecond. That might get worse if
>>>> we run TCP/IP sessions or host web pages or something, so dedicating
>>>> the second ARM to realtime stuff would be good.
>>>
>>>There's not nearly enough information there, but if you're serious about
>>>real time you don't just throw a bag of unknown software at something
>>>and expect it to work.  Operating systems don't steal CPU time --
>>>programmers steal CPU time, sometimes by choosing the wrong OS.
>> 
>> It's looking that the Linux that comes with the uZed will work fine for
>> the current application. Is Linux "a bag of unknown software"?
>> 
>> We need to do TCP/IP stuff, and manage waveform data files, run BIST,
>> things like that, and using Linux sure makes that part easy. A lot of
>> the functionality could be in ARM code or might be in the FPGA, but I'm
>> moving as much as possible into the ARM. C is a lot easier to code and
>> compile than VHDL.
>
>From the perspective of someone who needs an RTOS, Linux is a bag of very 
>dubious software, indeed.

It looks like the Linux speed is fine for this application.

>
>The fact that you were trying to work around Linux certainly explains the 
>large and variable delays.
>
>I agree that Linux makes the "complicated" stuff easier.  If you really 
>have a need for real-time, and you don't really need that other processor 
>for your "big box" stuff, then put an RTOS on it and run your timing-
>critical stuff there.
>
>Or, research the state of real time Linux -- it kind of dropped off my 
>radar screen about five years ago, so either it's a done deal and no 
>one's excited about it any more, or it never quite worked right and 
>everyone who was excited about it is now too embarrassed to speak up.

There is a huge advantage to running the Linux that comes installed in
the uZed. No Linux to recompile!

If we have realtime problems (which I don't think we will) the second
ARM core is available. We could ask Linux to run the signal processing
app on it, or even run the app bare metal on the second core. So we
have a bailout if we ever need it.



-- 

John Larkin         Highland Technology, Inc

jlarkin at highlandtechnology dot com
http://www.highlandtechnology.com

Precision electronic instrumentation
Picosecond-resolution Digital Delay and Pulse generators
Custom laser drivers and controllers
Photonics and fiberoptic TTL data links
VME thermocouple, LVDT, synchro   acquisition and simulation
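
For reference, the usual userspace shortcut for hitting a PL register from a
C program on the Zynq (if the build isn't already using a proper UIO driver)
is to map the register block through /dev/mem.  A minimal sketch; the
physical base address below is just a common default for the GP0 AXI master
port and is an assumption, not something taken from this thread:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PL_REG_BASE 0x43C00000u  /* assumed AXI GP0 address of the PL registers */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map one page of PL physical address space into this process. */
    volatile uint32_t *reg = (volatile uint32_t *)
        mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, PL_REG_BASE);
    if (reg == MAP_FAILED) { perror("mmap"); return 1; }

    /* Wiggle bit 0 of register 0 as fast as the AXI write path allows. */
    for (long i = 0; i < 10000000; i++) {
        reg[0] = 1;
        reg[0] = 0;
    }

    munmap((void *)reg, 4096);
    close(fd);
    return 0;
}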

Article: 156103
Subject: Re: Granularity of components for FPGA synthesis?
From: Kip Ingram <kip@liberty.kipingram.com>
Date: Fri, 22 Nov 2013 18:51:19 -0600
Links: << >>  << T >>  << A >>
* Jon Elson <jmelson@wustl.edu> wrote:
> Kip Ingram wrote:
>
>
>> Hi Jurgen.  Sorry to join the party late.  I tend to buck the
>> conventional wisdom on things like this.  I did my first programmable
>> logic design back in the 1980's when we used PALs (22V10, etc.)
>> Generally you took *total* control of your design.  You could use
>> equations, but even when I did I liked knowing exactly what was going on
>> with each and every fuse.
> With even the smallest modern FPGAs having 500K + "fuses", ie. configuration
> bits, it would be a very daunting task to know what they all do!
> Also, for design security and manufacturer's protection of their
> internal IP, they are getting VERY secretive of the internals.
>
> Jon

Agreed on both counts.  Regarding the first one, though, I don't mean to
imply that I don't want to have access at all to modern tools.  In a
really large design, well beyond my ability to hold in my mind at the
"detail level," I'd definitely use modern tools.  But I'd still like to
be able to do the performance-critical bits under full manual control -
a lot like using an assembler to do the most critical bits of a software
design and then wrapping that in compiled code.

And I think they already reveal the parts of the IP that I'd need to know
in order to do what I discuss, via detailed drawings of CLBs and
things like that.  It's mostly the interconnect network that they don't
document very well.

Anyway, I'm just pipe-dreaming.  I've already accepted that the world
just hasn't evolved the way I'd have preferred.  To paraphrase Connor
Macleod on that point - "in lots of different ways."

-- Kip

Article: 156104
Subject: Re: microZed adventures
From: Allan Herriman <allanherriman@hotmail.com>
Date: 23 Nov 2013 05:29:03 GMT
Links: << >>  << T >>  << A >>
On Fri, 22 Nov 2013 08:57:12 -0800, John Larkin wrote:

> We're using their build tools to embed the FPGA config into the boot
> image. We'd really like to be able to have a C program read a bitstream
> file and reconfigure the FPGA, but we haven't been able to figure that
> out.

I can confirm that it is quite possible to reconfigure the FPGA from a C 
program after the OS is running.

You removed the FPGA config step from the bootloader, and now it doesn't 
work?

My first guess would be that the (so-called) voltage translation buffers 
between the ARM and the FPGA have not been enabled.  There's a magic 
register somewhere that you need to write to before anything will work.

Get Rob to look through the (massive) TRM one more time...

Allan

Article: 156105
Subject: Re: Mill: FPGA version?
From: jhallen@TheWorld.com (Joseph H Allen)
Date: Sat, 23 Nov 2013 15:12:28 +0000 (UTC)
Links: << >>  << T >>  << A >>
In article <l6ocds$h6s$1@dont-email.me>,
Ivan Godard  <ivan@ootbcomp.com> wrote:

>So in the case of both a conventional bypass and the belt, each
>functional unit's source has a multiplexer to select from each possible
>slot result output. So each slot source has a quick path thru the result
>select mux for latency 1 results, and nearly a full pipe stage path thru
>the mux for all the rest.

On an FPGA the "all the rest" bypass MUXing is too expensive to do in one
stage.  The way I know to deal with this is to spread the bypass muxing
across a pipeline which feeds the function unit.  This bypass pipeline is
the same length as the datapath pipeline.  The FU results are broadcast to
all of these pipeline stages- if a request needs that particular result,
it's MUXed in.  With one FU, you only need 2:1 MUXes.  With N FUs, you need
(N+1):1 MUXes in each stage (unless there is clustering..).

Anyway, on an FPGA you have SRAM, so you definitely want to use it for the
register file, but you are stuck with 2 addressed ports per SRAM.  So I'm
envisioning that each FU has its own 2-port register file, plus copies so
other FUs can read results (older than in the bypass).  In other words, one
write port, many read ports is OK.

That you can make a compiler for the implied temporal-location write
addressing tells me that you should also be able to make one for a more
conventional design with explicit physical write addressing, but
with the restriction that each FU can only write to 1/N dedicated locations
in the register file.

With 4 FUs, perhaps the load unit can only write to registers 0, 4, 8, 12,
etc., ALU1 can only write to registers 1, 5, 9, 13, etc., and so on.  It
seems this is a reasonable architectural restriction.  I'm wondering now if
there were any other machines like this (well, except for the FP / integer
split).

I also like that load completions get the most recent store results to avoid
aliasing... but I thought other systems did this also (basically bypassing
the dcache).  Hmm, now that I think about it, it's not fully solving the alias
problem.  You are not free to move a load completion earlier than any store
that you can't prove is alias-hazard free.

>-- 
>Arthur Kahlich
>Chief Engineer
>Out-of-the-Box Computing

Thanks!

-- 
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Article: 156106
Subject: Re: microZed adventures
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Sat, 23 Nov 2013 09:08:44 -0800
Links: << >>  << T >>  << A >>
On 23 Nov 2013 05:29:03 GMT, Allan Herriman <allanherriman@hotmail.com> wrote:

>On Fri, 22 Nov 2013 08:57:12 -0800, John Larkin wrote:
>
>> We're using their build tools to embed the FPGA config into the boot
>> image. We'd really like to be able to have a C program read a bitstream
>> file and reconfigure the FPGA, but we haven't been able to figure that
>> out.
>
>I can confirm that it is quite possible to reconfigure the FPGA from a C 
>program after the OS is running.
>
>You removed the FPGA config step from the bootloader, and now it doesn't 
>work?

My guys are building the whole boot image using the Xilinx tools, which combine
the OS and the FPGA and boot-load it all, and that works. My customer would like
to be able to upload separate files that are the application program and the
FPGA image, so I'd like to figure out how to do that. 

>
>My first guess would be that the (so-called) voltage translation buffers 
>between the ARM and the FPGA have not been enabled.  There's a magic 
>register somewhere that you need to write to before anything will work.
>
>Get Rob to look through the (massive) TRM one more time...

Rob has been busy on a couple of other projects, and I'm busy on the signal
processing architecture and the host board hardware design, so Paul (my embedded
programmer) and Blaine (our FPGA consultant) have been doing the serious work on
this. The secondary FPGA load thing hasn't been first priority... talking to an
FPGA register was, and now the signal processing is the next step.

Someone should write a book, Zed For Dummies. I'd buy one.

If anybody knows the details of reloading the FPGA live, I'd appreciate any
hints or references that I could pass on to the boys.


-- 

John Larkin                  Highland Technology Inc
www.highlandtechnology.com   jlarkin at highlandtechnology dot com   

Precision electronic instrumentation
Picosecond-resolution Digital Delay and Pulse generators
Custom timing and laser controllers
Photonics and fiberoptic TTL data links
VME  analog, thermocouple, LVDT, synchro, tachometer
Multichannel arbitrary waveform generators
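
For what it's worth, on the Xilinx-provided Zynq kernels of that era the PL
can usually be reprogrammed at run time simply by writing the raw bitstream
to the character device exposed by the devcfg driver.  A minimal C sketch;
the /dev/xdevcfg path and the need for a header-less .bin (rather than .bit)
bitstream are assumptions about the stock Xilinx kernel, not details
confirmed anywhere in this thread:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Usage: fpgaload design.bin
 * Streams the bitstream into the PL configuration interface. */
int main(int argc, char **argv)
{
    char buf[4096];
    ssize_t n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s bitstream.bin\n", argv[0]);
        return 1;
    }

    int in = open(argv[1], O_RDONLY);
    int out = open("/dev/xdevcfg", O_WRONLY);
    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, n) != n) {  /* the devcfg driver programs the PL as we write */
            perror("write");
            return 1;
        }
    }

    close(out);
    close(in);
    return 0;
}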

Article: 156107
Subject: Re: Mill: FPGA version?
From: Ivan Godard <ivan@ootbcomp.com>
Date: Sat, 23 Nov 2013 11:29:48 -0800
Links: << >>  << T >>  << A >>
On 11/23/2013 7:12 AM, Joseph H Allen wrote:
> In article <l6ocds$h6s$1@dont-email.me>,
> Ivan Godard  <ivan@ootbcomp.com> wrote:
>
>> So in the case of both a conventional bypass and the belt, each
>> functional unit's source has a multiplexer to select from each possible
>> slot result output. So each slot source has a quick path thru the result
>> select mux for latency 1 results, and nearly a full pipe stage path thru
>> the mux for all the rest.
>
> On an FPGA the "all the rest" bypass MUXing is too expensive to do in one
> stage.  The way I know to deal with this is to spread the bypass muxing
> across a pipeline which feeds the function unit.  This bypass pipeline is
> the same length as the datapath pipeline.  The FU results are broadcast to
> all of these pipeline stages- if a request needs that particular result,
> it's MUXed in.  With one FU, you only need 2:1 MUXes.  With N FUs, you need
> (N+1):1 MUXes in each stage (unless there is clustering..).
>
> Anyway, on an FPGA you have SRAM, so you definitely want to use it for the
> register file, but you are stuck with 2 addressed ports per SRAM.  So I'm
> envisioning that each FU has its own 2-port register file, plus copies so
> other FUs can read results (older than in the bypass).  In other words, one
> write port, many read ports is OK.
>
> That you can make a compiler for the implied temporal location write
> addressing, tells me that you should also be able to make one for a more
> conventional design where there is explicit physical write addressing, but
> with the restriction that each FU can only write to 1/N dedicated locations
> in the register file.
>
> With 4 FUs perhaps the load unit only can write to register 0, 4, 8, 12,
> etc.  ALU1 can only write to register 1, 5, 9, 13, etc..  and so on.  It
> seems this is a reasonable architectural restriction.  I'm wondering now if
> there were any other machines like this this (well except for FP / integer
> split).

I think that I can reply to this without putting my foot in it (Art is 
even more swamped than I am).

Your suggestions would be appropriate were we trying to actually make a 
usable CPU in an FPGA, but that's not our plan; the Mill FPGA is purely 
a (faster than software) simulator for a regular hardware Mill. Its 
purpose is to test out correctness and to speed up software development, 
and we don't care how fast it runs. Consequently, the FPGA will work 
*exactly* the same way that the corresponding chip Mill will, using the 
same compilers. If that means that each Mill clock requires five, or 50,
FPGA stages and the bypass requires multiple SRAMS and muxes, then so
be it. It's still going to be 10,000 times as fast as software sim.


> I like also that load completons get the most recent store results to avoid
> aliasing..  but I thought other systems did this also (basically bypass the
> dcache).  Hmm, now that I think about it it's not fully solving the alias
> problem.  You are not free to move a load completion earlier than any store
> that you can't prove is alias hazard free.

Load retires (completions) retain the same ordering vis a vis stores as 
program order. Only load issue is moved on a Mill. Of course, when the 
compiler can prove the absence of aliasing then it can also move load 
retire too, but that's the compiler; the hardware does not do moves, and 
views a compiler-revised order as "program order". The Mill is not 
responsible for compiler bugs :-)

Article: 156108
Subject: Re: microZed adventures
From: hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray)
Date: Sat, 23 Nov 2013 21:15:42 -0600
Links: << >>  << T >>  << A >>
In article <KYidnUTnKrGGcRLPnZ2dnUVZ5r6dnZ2d@giganews.com>,
 Tim Wescott <tim@seemywebsite.really> writes:

>Or, research the state of real time Linux -- it kind of dropped off my 
>radar screen about five years ago, so either it's a done deal and no 
>one's excited about it any more, or it never quite worked right and 
>everyone who was excited about it is now too embarrassed to speak up.

http://lwn.net/Articles/572740/
The future of realtime Linux
November 6, 2013

Lots of info.  This may be the important piece:

While the financial industry was once a hotbed for interest in
realtime, that seems to have cooled somewhat. The traders are
more interested in throughput and are willing to allow Linux to
miss some latency deadlines in the interests of throughput, Hart
said. The embedded use cases for realtime seem to be where the
"action" is today, Gleixner and others said, but there has been
little effort to fund realtime development coming from that community.


-- 
These are my opinions.  I hate spam.
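
Whatever the current state of the RT patch set, a good fraction of the
tens-to-hundreds-of-microseconds stutters mentioned upthread can usually be
removed on a stock kernel just by locking memory, pinning the critical loop
to one core and giving it a real-time scheduling class.  A minimal sketch of
that setup (error handling kept terse; the priority and core number are
arbitrary choices, not recommendations from the article):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 80 };
    cpu_set_t set;

    /* Keep the whole process resident so page faults don't add latency. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");

    /* Pin to CPU 1, leaving CPU 0 for Linux housekeeping and interrupts. */
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        perror("sched_setaffinity");

    /* SCHED_FIFO: we run until we block or a higher-priority task appears. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    /* ... the time-critical register-poking loop would go here ... */
    return 0;
}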


Article: 156109
Subject: Re: Interface Xilinx KC705 to BeagleBone?
From: david.middleton@gmail.com
Date: Mon, 25 Nov 2013 15:49:10 -0800 (PST)
Links: << >>  << T >>  << A >>
Hi Pete,

We are just prototyping a similar board to talk to an AD9361.

Are you progressing with the plan to interface through the GPMC, and if so what are your software resources?

We are currently developing drivers, with the intention of releasing a VHDL open core and a GPMC device driver, but at this stage we are as vapour as everyone else (which is to say we are a reasonable way from being manufacturable).

David

On Monday, 23 July 2012 06:27:45 UTC+10, pfraser  wrote:
> I'm playing with the idea of interfacing a BeagleBone
> (cheap dual ARM Cortex A8 board) to a Xilinx KC705
> Kintex development board. This will give me much more
> CPU processing power than a microblaze could.
>
> I thought I could probably do it with a passive interface
> because the Kintex can deal with 3.3 Volt I/O.
>
> I'd probably use a Xilinx 105 debug board on the FMC
> HPC connector, and hand build an interface board between
> the debug board and the BeagleBone.
>
> That would leave the LPC connector free for an Avnet
> HDMI input board (I'm playing around with some video
> processing / measurement ideas).
>
> I would then develop a Angstrom Linux driver for
> the TI GPMC interface to the Kintex.
>
> Anybody see any flaws in this plan? Any advice?
> Anybody done some / all of this already, and prepared
> to share so that I don't need to re-invent the wheel?
>
> Thanks
>
> Pete



Article: 156110
Subject: Re: FPGA Cryptosystem
From: alb <alessandro.basili@cern.ch>
Date: Wed, 27 Nov 2013 16:26:48 +0100
Links: << >>  << T >>  << A >>
Hi youngejoe,

On 11/22/2013 5:42 PM, youngejoe wrote:
[]
> Thanks for your reply. The issue I'm having is FPGA knowledge tbh. In
> research mode, still trying to develop a good, real concept.

I would recommend focusing on the real concept. Your friend Google
has lots of hits for 'cryptography on fpga' (look at the scholarly
articles section for state-of-the-art research in the field).

While your motivation can be anything, your research goal should be
meaningful, but I believe your advisor can guide you here.

[]
> With FPGAs, for the above mentioned project, would I would require
> two development boards connected to PC1 and PC2? My initial concept
> was to use 2 FPGAs resembling 2 flash drives.

I would not even consider hardware for the time being. Doing cryptography
on a stored message is one thing; doing cryptography 'on the fly' is
another, where performance may matter. So stick to the functionality
first and get what you want; you will immediately see whether your code
fits into a specific target, and then it is just a matter of buying a demo
board (research institutes typically get large discounts).

> 
> PGP is very interesting - employing both asymmetric and symmetric
> encryption as well as digital signatures etc. I like the idea behind
> it. As its software based, would it be difficult resembling it in
> hardware, say on FPGAs?

An algorithm sits on a piece of paper; the way it is implemented may
vary a great deal.

> The main guideline I'm trying to stick is an FPGA based cryptosystem.

If the aim of the project is to learn a bit of FPGA, then I would go for
something simpler; if the aim is to learn a little bit of cryptography,
then I would rather do it in software.

If the aim is to push cryptography further with the help of some
hardware support, then I would really invest time in learning what other
people/groups are focusing on, what the current challenges are, and which
among them intrigues you the most.

HTH,

Al

p.s.: as a matter of fact, there are lots of 'crypto cores' on
opencores.org, so you may also look at those.

Article: 156111
Subject: LCD test on Spartan 3E FPGA
From: Tung Thanh Le <ttungl@gmail.com>
Date: Wed, 27 Nov 2013 07:44:13 -0800 (PST)
Links: << >>  << T >>  << A >>
Hi,
   I got a problem that I cannot understand how to display on the LCD of
Spartan 3E FPGA. Then, how to get the inputs and outputs of a 16-bit
Ripple Carry Adder to show on LCD? Please any one who has known/dealt
with this problem, just let me know. I appreciate that. Thanks.
Best regards,
Tung  

Article: 156112
Subject: Re: LCD test on Spartan 3E FPGA
From: Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid>
Date: Wed, 27 Nov 2013 17:00:37 +0100
Links: << >>  << T >>  << A >>
In comp.arch.fpga,
Tung Thanh Le <ttungl@gmail.com> wrote:
> Hi,
>    I got a problem that I cannot understand how to display on the LCD of Spartan 3E FPGA. Then, how to get the inputs and outputs of a 16-bit Ripple Carry Adder  to show on LCD? Please any one who has known/dealt with this problem, just let me know. I appreciate that. Thanks.

To my knowledge there is no Spartan 3E FPGA with built-in LCD. So you
probably have a board with a Spartan, an LCD and possibly other stuff.
To get some more meaningful answers, you'll have to tell us some
details of the board you are using. What kind of LCD, how is it
connected to the Spartan, that sort of info.

It would also be appreciated if you tell us more about the context of
your question. Homework? Commercial? Hobby? Other?
  


-- 
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

There has been an alarming increase in the number of things you know
nothing about.

Article: 156113
Subject: Re: LCD test on Spartan 3E FPGA
From: Tung Thanh Le <ttungl@gmail.com>
Date: Wed, 27 Nov 2013 08:13:31 -0800 (PST)
Links: << >>  << T >>  << A >>
On Wednesday, November 27, 2013 10:00:37 AM UTC-6, Stef wrote:
> In comp.arch.fpga,
> Tung Thanh Le <ttungl@gmail.com> wrote:
> > Hi,
> >    I got a problem that I cannot understand how to display on the LCD
> > of Spartan 3E FPGA. Then, how to get the inputs and outputs of a 16-bit
> > Ripple Carry Adder to show on LCD? Please any one who has known/dealt
> > with this problem, just let me know. I appreciate that. Thanks.
>
> To my knowledge there is no Spartan 3E FPGA with built-in LCD. So you
> probably have a board with a Spartan, an LCD and possibly other stuff.
> To get some more meaningfull answers, you'll have to tell us some
> details of the board you are using. What kind of LCD, how is it
> connected to the Spartan, that sort of info.
>
> It would also be appreciated if you tell us more about the context of
> your question. Homework? Commercial? Hobby? Other?
>
> --
> Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)
>
> There has been an alarming increase in the number of things you know
> nothing about.

Hi Stef,
   Thank you for your reply. I am using Spartan 3E Starter Kit with LCD
control. The thing that I am not familiar with this kind of indirect
control LCD through its KCPSM3. This is the material links, please take a
look at this and please let me know how to cope with the LCD. Thanks.
https://www.dropbox.com/s/0nhp177502ka4ir/Spartan-3EManual%20-%20FPGA%20starter%20kit%20board%20user%20guide.pdf
https://www.dropbox.com/s/tkxtx3p3mqr0ook/s3esk_startup.pdf
Best regards,
Tung

Article: 156114
Subject: Re: LCD test on Spartan 3E FPGA
From: Tung Thanh Le <ttungl@gmail.com>
Date: Wed, 27 Nov 2013 08:18:05 -0800 (PST)
Links: << >>  << T >>  << A >>
On Wednesday, November 27, 2013 10:13:31 AM UTC-6, Tung Thanh Le wrote:
> On Wednesday, November 27, 2013 10:00:37 AM UTC-6, Stef wrote:
> > In comp.arch.fpga,
> > Tung Thanh Le <ttungl@gmail.com> wrote:
> > > Hi,
> > >    I got a problem that I cannot understand how to display on the LCD
> > > of Spartan 3E FPGA. Then, how to get the inputs and outputs of a 16-bit
> > > Ripple Carry Adder to show on LCD? Please any one who has known/dealt
> > > with this problem, just let me know. I appreciate that. Thanks.
> >
> > To my knowledge there is no Spartan 3E FPGA with built-in LCD. So you
> > probably have a board with a Spartan, an LCD and possibly other stuff.
> > To get some more meaningfull answers, you'll have to tell us some
> > details of the board you are using. What kind of LCD, how is it
> > connected to the Spartan, that sort of info.
> >
> > It would also be appreciated if you tell us more about the context of
> > your question. Homework? Commercial? Hobby? Other?
> >
> > --
> > Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)
> >
> > There has been an alarming increase in the number of things you know
> > nothing about.
>
> Hi Stef,
>    Thank you for your reply. I am using Spartan 3E Starter Kit with LCD
> control. The thing that I am not familiar with this kind of indirect
> control LCD through its KCPSM3. This is the material links, please take a
> look at this and please let me know how to cope with the LCD. Thanks.
> https://www.dropbox.com/s/0nhp177502ka4ir/Spartan-3EManual%20-%20FPGA%20starter%20kit%20board%20user%20guide.pdf
> https://www.dropbox.com/s/tkxtx3p3mqr0ook/s3esk_startup.pdf
> Best regards,
> Tung

In addition, it actually is my homework, but the deadline is over and I
still have not figured it out yet. I want to know how to control the LCD
through the board with sequence data, not only some characters as in its
tutorial, so it has turned into my hobby now :). Thanks for the help.
Best regards,
Tung

Article: 156115
Subject: Re: FPGA Cryptosystem
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 27 Nov 2013 17:12:36 +0000 (UTC)
Links: << >>  << T >>  << A >>
alb <alessandro.basili@cern.ch> wrote:
> Hi youngejoe,
 
> On 11/22/2013 5:42 PM, youngejoe wrote:
> []
>> Thanks for your reply. The issue I'm having is FPGA knowledge tbh. In
>> research mode, still trying to develop a good, real concept.
 
> I would recommend in focusing on the real concept. Your friend Google
> has lots of hits for 'cryptography on fpga' (look at the scholarly
> articles section for state-of-the-art research in the field).

The main advantage of the FPGA is speed. There are some algorithms
that are slow in software, but can be be very fast in appropriate
hardware, which can be implemented on an FPGA.

Some of my favorite FPGA problems involve generating configuration
data specific to the instance at hand. In cryptographic terms, including
the key in the hardware.

Many encryption algorithms are based on bit manipulation that is hard
to do in software. With an FPGA, you can program the specific bit
operations into the hardware.
 
> While your motivation can be anything, your research goal should be
> meaningful, but I believe your advisor can guide you here.
 
> []
>> With FPGAs, for the above mentioned project, would I would require
>> two development boards connected to PC1 and PC2? My initial concept
>> was to use 2 FPGAs resembling 2 flash drives.
 
> I would not even consider hardware for the time being. One thing is
> doing cryptography on a message, one thing would be doing cryptography
> 'on the fly', so performances may matter. So stick to the functionality
> first and get what you want, you will immediately see if your code fits
> or not into a specific target and then is just a matter of buying a demo
> board (typically research institutes have large discounts).

Well, the other thing that is done with FPGAs is to do brute force
attacks on systems designed to be too slow to attack in software. 

I haven't followed it so closely, but I believe that DES is easy to
break now with an FPGA; one fix is triple-DES. The window between
an FPGA break and people moving on to better encryption algorithms is
fairly small, but it is sometimes worth doing.
 
>> PGP is very interesting - employing both asymmetric and symmetric
>> encryption as well as digital signatures etc. I like the idea behind
>> it. As its software based, would it be difficult resembling it in
>> hardware, say on FPGAs?

Hard to say. I believe PGP was designed to be implemented in software.
In some cases, a hardware (FPGA) implementation of an algorithm is
very different from the software one. My favorite FPGA architecture
is the linear systolic array.  You might look that up and start
thinking about algorithms that it makes sense for.
 
> An algorithm sits on a piece of paper, the way it is implemented may
> vary a great deal.
 
>> The main guideline I'm trying to stick is an FPGA based cryptosystem.
 
> If the aim of the project is to learn a bit of FPGA than I would go for
> something simpler, if the aim is to learn a little bit of cryptography
> than I would rather do it on software.

Sounds about right. You could implement an existing algorithm that is
too slow for a given application now in software. That would be mostly
learning about hardware and FPGA, and not so much about cryptography.
 
> If the aim is to push cryptography further with the help of some
> hardware support than I would really invest time in knowing what other
> people/group are focusing on, what are the current challenges and which
> among them intrigues you the most.

That is generally true in research, and especially in this case.

Many encryption algorithms now require some deep math to understand. 

You could just implement an existing algorithm with a larger key, that
is, larger than anyone thought about doing in software. (But as
processors get faster, software key size also increases.)

As noted above, that mostly teaches you about FPGA and not so much about
encryption. (Well, it might teach you about encryption, but won't
advance the field much.) 

-- glen
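
To make the point above about bit manipulation concrete: a generic bit
permutation costs software a shift, a mask and an OR per output bit, while
in an FPGA the same permutation is nothing but routing.  A small C sketch
with an arbitrary 16-bit permutation table (not taken from any real cipher):

#include <stdint.h>
#include <stdio.h>

/* Output bit i takes the value of input bit perm[i]. */
static const uint8_t perm[16] = { 7, 0, 12, 3, 15, 4, 9, 1,
                                 14, 5, 11, 2, 13, 6, 10, 8 };

static uint16_t permute(uint16_t in)
{
    uint16_t out = 0;
    for (int i = 0; i < 16; i++)            /* one shift/mask/or per bit */
        out |= (uint16_t)(((in >> perm[i]) & 1u) << i);
    return out;
}

int main(void)
{
    printf("0x%04x\n", permute(0x00ff));
    return 0;
}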

Article: 156116
Subject: Re: LCD test on Spartan 3E FPGA
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 27 Nov 2013 17:17:26 +0000 (UTC)
Links: << >>  << T >>  << A >>
Stef <stef33d@yahooi-n-v-a-l-i-d.com.invalid> wrote:
> In comp.arch.fpga,
> Tung Thanh Le <ttungl@gmail.com> wrote:

>>    I got a problem that I cannot understand how to display on the 
>> LCD of Spartan 3E FPGA. Then, how to get the inputs and outputs of 
>> a 16-bit Ripple Carry Adder  to show on LCD? Please any one who has 
>> known/dealt with this problem, just let me know. I appreciate that. 
>> Thanks.
 
> To my knowledge there is no Spartan 3E FPGA with built-in LCD. So you
> probably have a board with a Spartan, an LCD and possibly other stuff.
> To get some more meaningfull answers, you'll have to tell us some
> details of the board you are using. What kind of LCD, how is it
> connected to the Spartan, that sort of info.

I believe that there is a board named after the Spartan 3E that it
contains. The name of the company selling the board would narrow
it down a little bit.
 
> It would also be appreciated if you tell us more about the context of
> your question. Homework? Commercial? Hobby? Other?

As far as I know, LCDs need to be continuously refreshed, but the
usual boards have a display with built-in control logic. I believe,
then, to use that logic you have to send the data serially (maybe
one digit at a time) to the display device. That usually means a 
latch and a shift register.

-- glen

Article: 156117
Subject: Re: LCD test on Spartan 3E FPGA
From: Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid>
Date: Wed, 27 Nov 2013 22:53:39 +0100
Links: << >>  << T >>  << A >>
In comp.arch.fpga,
Tung Thanh Le <ttungl@gmail.com> wrote:
> On Wednesday, November 27, 2013 10:00:37 AM UTC-6, Stef wrote:
>> In comp.arch.fpga,
>> 
>> Tung Thanh Le <ttungl@gmail.com> wrote:
>> 
>> > Hi,
>> 
>> >    I got a problem that I cannot understand how to display on the LCD of Spartan 3E FPGA. Then, how to get the inputs and outputs of a 16-bit Ripple Carry Adder  to show on LCD? Please any one who has known/dealt with this problem, just let me know. I appreciate that. Thanks.
>> 
>> 
>> 
>> To my knowledge there is no Spartan 3E FPGA with built-in LCD. So you
>> probably have a board with a Spartan, an LCD and possibly other stuff.
>> To get some more meaningfull answers, you'll have to tell us some
>> details of the board you are using. What kind of LCD, how is it
>> connected to the Spartan, that sort of info.
>> 
>> 
>> It would also be appreciated if you tell us more about the context of
>> your question. Homework? Commercial? Hobby? Other?
 
>    Thank you for your reply. I am using Spartan 3E Starter Kit with LCD control. The thing that I am not familiar with this kind of indirect control LCD through its KCPSM3. This is the material links, please take a look at this and please let me know how to cope with the LCD. Thanks.
> https://www.dropbox.com/s/0nhp177502ka4ir/Spartan-3EManual%20-%20FPGA%20starter%20kit%20board%20user%20guide.pdf

OK you're using the Spartan-3E starter kit and want to use its on-board
LCD.

There is lots of info on the LCD and how to interface to it in chapter 5;
what are you missing?

> https://www.dropbox.com/s/tkxtx3p3mqr0ook/s3esk_startup.pdf

This is a complete example of using the LCD with the picoblaze softcore.
If you intend to use picoblaze, just use the example and build from there.
If you don't want to use picoblaze, but do everything directly in logic,
there is still a lot of valuable information in that document. But you 
have to replace the code parts with logic (state machine) that performs
the transfers described.

And for your counter question, you'll have to add the counter and logic
to encode the value in the format you wish to show on the LCD.

-- 
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

Harp not on that string.
		-- William Shakespeare, "Henry VI"

Article: 156118
Subject: Free VHDL Testbench library for logging/reporting and checking. A
From: espen.tallaksen@bitvis.no
Date: Thu, 28 Nov 2013 07:47:20 -0800 (PST)
Links: << >>  << T >>  << A >>
If you are making VHDL testbenches you should be writing proper log
messages. You should also make your result-checkers properly report
mismatches and also allow positive acknowledge. Equally important - your
testbench should time out with a good message when waiting too long for an
event to happen. And wouldn't it be nice to report a summary of all notes,
warnings, errors, etc. at the end of your simulation, or perhaps stop on
the fifth error?

All this is supported in a free and open source (and well documented) VHDL
testbench library from Bitvis.
http://www.bitvis.no/products/bitvis-utility-library

It is my personal opinion that ANYONE making VHDL testbenches should use
this kind of library. It really makes you far more efficient, and it helps
everybody understand both your testbench and the transcript/log from your
simulations.
It is extremely easy to use, and you can watch a free webinar or download a
PowerPoint file for a brief presentation on this library. A quick reference
and an example testbench for a simple interrupt controller are also
provided.

You may download it all from our website. No registration required.
www.bitvis.no

Article: 156119
Subject: Re: FPGA Cryptosystem
From: Thomas Stanka <usenet_nospam_valid@stanka-web.de>
Date: Fri, 29 Nov 2013 01:41:21 -0800 (PST)
Links: << >>  << T >>  << A >>
On Friday, November 22, 2013 5:42:15 PM UTC+1, youngejoe wrote:
> PGP is very interesting - employing both asymmetric and symmetric
> encryption as well as digital signatures etc. I like the idea behind it.
> As its software based, would it be difficult resembling it in hardware,
> say on FPGAs?

PGP is a piece of software that required a lot of effort to complete. A HW
transfer will be a comparably large task.

RSA is one of the algorithms implemented in PGP. It is an example of a not
so trivial task to bring to HW.

Nevertheless, I was part of a students' group doing this task ~15 years ago
(a chip for symmetric and asymmetric encryption, digital signatures, etc.).
From what I remember, RSA required a good bunch of math to bring an
efficient implementation to HW, even if you limit keys to 1024 bits.
Implementing just one encryption algorithm might be a good bunch of work,
but if you think of key management, efficient IO transfer etc. you have
several projects.

regards Thomas


Article: 156120
Subject: Verilog! How to work with modules?
From: beginner <kristo.godari@gmail.com>
Date: Fri, 29 Nov 2013 02:40:04 -0800 (PST)
Links: << >>  << T >>  << A >>
Hello!
1. Sorry for my poor English.
2. I have the following diagram to implement in Verilog: http://elf.cs.pub.ro/ac/wiki/_media/teme/tema2/tema_2_top_module_instance.png
I want to make a synchronous sequential circuit that processes grayscale images:
a) make a blur filter  b) make a 90 degree rotation  c) make a horizontal flip
I have searched and found the algorithms that I will use.
I do not know how to access the image module from the process module.

My code looks like this:
module process(
	input clk,						 
	input [1:0] op,					
	input [7:0] in_pix,					
        output [5:0] in_row, in_col, 	
	output [5:0] out_row, out_col,  
	output out_we,					
	output [7:0] out_pix,			
	output done						
	);	

 always @(posedge clk) begin
 if(op==0)begin
 //here i will implement blur algorithm
 end else if(op==1)begin	
 //here i will implement fliping algorithm
 end else begin
   //here i will implement rotation algorithm
   // I do not know how to access elements of the image module!
   // I want to extract elements from the matrix so I can rotate it and
   // create a new, rotated matrix.
 end
end
endmodule

module image(
	input clk,			 
	input[5:0] row,		
	input[5:0] col,		
	input we,			
	input[7:0] in,		
	output[7:0] out		
    );

  reg[7:0]  data[63:0][63:0];
  assign out = data[row][col];	
  always @(posedge clk) begin
  if(we)
    data[row][col] <= in;
  end
endmodule

Article: 156121
Subject: Use of hardware adders with long words to perform multiple additions in parallel
From: Wojciech M. Zabolotny <wzab@ise.pw.edu.pl>
Date: Sat, 30 Nov 2013 21:38:32 +0000 (UTC)
Links: << >>  << T >>  << A >>
I was solving a problem where I needed to calculate, every clock, a sum of multiple values
each encoded on a small number of bits (a latency of a few clocks is allowed).

A natural solution seemed to be a binary tree of adders, consisting of N levels,
where on each level I calculate a sum of two values.
E.g. assuming that I have to calculate a sum of 8 values, I can calculate:

On the 1st level:
Y0 = X0 + X1, Y1=X2+X3, Y2=X4+X5, Y3=X6+X7 (4 adders)

On the 2nd level: 
V0 = Y0+Y1, V1=Y2+Y3 (2 adders)

On the 3rd level:
Z = V0+V1  (1 adder)

If each level is equipped with a pipeline register, I get a short critical path,
the final latency is equal to 3 clocks, new values may be entered every clock,
and the result is available every clock. The whole design uses 7 adders.

However, modern FPGAs are equipped with adders using long words. E.g. the Xilinx 7 series
chips use adders with 25-bit input operands.
If we assume that the input values are encoded on only 5 bits, we can significantly
reduce resource consumption.
Let's encode the input words X0..X7 onto the bits of the 1st-level operands as follows:
A(4 downto 0)=X0; A(5)='0';
A(10 downto 6)=X2; A(11)='0';
A(16 downto 12)=X4; A(17)='0';
A(22 downto 18)=X6; A(23)='0';

B(4 downto 0)=X1; B(5)='0';
B(10 downto 6)=X3; B(11)='0';
B(16 downto 12)=X5; B(17)='0';
B(22 downto 18)=X7; B(23)='0';

Then on the first layer we can perform all calculations using only single adder:
C=A+B, and sub-sums are encoded as follows:
C(5 downto 0)= X0+X1=Y0; C(11 downto 6)=X2+X3=Y1; C(17 downto 12)=X4+X5=Y2; C(23 downto 18)=X6+X7=Y3

On the 2nd level we work with 6-bit values (7-bit after addition of leading '0'), so we can perform
up to 3 additions in a single adder (but we need only 2)

D(5 downto 0)=Y0; D(6)='0'; D(12 downto 7)=Y2; D(13)='0';
E(5 downto 0)=Y1; E(6)='0'; E(12 downto 7)=Y3; E(13)='0';

After addition:
F=D+E we get:
F(6 downto 0)=Y0+Y1=V0 ; F(13 downto 7)=Y2+Y3=V1;

The final addition may be performed in the standard way.
Please note that now we needed only 3 adders!

The interesting problem is how we should organize our adder tree for different word lengths
in the adders, different lengths of the input values, and different numbers of input values.
I've prepared a simple tool, written in Python, which automatically generates the appropriate
VHDL code.
The sources have been posted on the usenet alt.sources group:
Subject: Generator of VHDL code for parallel adder (using long-word hardware adders to perform multiple additions in parallel)
Google archive: https://groups.google.com/forum/#!topic/alt.sources/8DqGgELScDM

However, I'm interested: can it be done in pure VHDL?
I hope that the above idea will be useful for someone...

-- 
Regards,
Wojciech M. Zabolotny
wzab@ise.pw.edu.pl

My GPG/PGP keys:
standard: B191 ACF0 7909 83FA 3F9B  450C 407E 3C4B 4569 D119
confidential: 2BF3 F90F 6EA8 7D35 59FD  5080 78ED 33DE 1312 D8F8
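
A quick way to convince oneself that the guard bits above really do keep the
sub-sums independent is to try the same packing on an ordinary CPU word.
A small C sanity check of the arithmetic (nothing FPGA-specific here):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack four 5-bit values into 6-bit lanes, one '0' guard bit per lane. */
static uint32_t pack(const unsigned x[4])
{
    return (uint32_t)x[0]       | (uint32_t)x[1] << 6 |
           (uint32_t)x[2] << 12 | (uint32_t)x[3] << 18;
}

int main(void)
{
    const unsigned xa[4] = { 31,  7, 19,  2 };   /* X0, X2, X4, X6 */
    const unsigned xb[4] = {  5, 31, 11, 30 };   /* X1, X3, X5, X7 */

    uint32_t c = pack(xa) + pack(xb);   /* one wide add = four 5-bit adds */

    for (int i = 0; i < 4; i++) {
        unsigned y = (c >> (6 * i)) & 0x3f;      /* extract sub-sum Yi */
        assert(y == xa[i] + xb[i]);              /* lanes never interfere */
        printf("Y%d = %u\n", i, y);
    }
    return 0;
}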

Article: 156122
Subject: Re: Use of hardware adders with long words to perform multiple additions in parallel
From: Allan Herriman <allanherriman@hotmail.com>
Date: 01 Dec 2013 00:40:26 GMT
Links: << >>  << T >>  << A >>
On Sat, 30 Nov 2013 21:38:32 +0000, Wojciech M. Zabolotny wrote:

> I was solving a problem, when I needed to calculate every clock a sum of
> multiple values encoded on a small number of bits (the latency of a few
> clocks is allowed).
> 
> A natural solution seemed to be a binary tree of adders, consisting of N
> levels,
> when on each level I calculate a sum of two values.
> E.g. assuming, that I have to calculate a sum of 8 values, I can
> calculate:
> 
> On the 1st level:
> Y0 = X0 + X1, Y1=X2+X3, Y2=X4+X5, Y3=X6+X7 (4 adders)
> 
> On the 2nd level:
> V0 = Y0+Y1, V1=Y2+Y3 (2 adders)
> 
> On the 3rd level:
> Z = V0+V1  (1 adder)
> 
> If each level is equipped with a pipeline register, I can get a short
> critical path,
> and the final latency is equal to 3 clocks, the new values may be
> entered every clock,
> and the result is availeble every clock. The whole design uses 7 adders.
> 
> However modern FPGAs are equipped with adders using long words. E.g. the
> Xilinx family 7 chips use adders with 25 bit input operands.
> If we assume, that the input values are encoded only at 5 bits, we can
> significantly reduce consumption of resources.
> Lets encode input words X0..X7 on bits of operands on the 1st level as
> follows:
> A(4 downto 0)=X0; A(5)='0';
> A(10 downto 6)=X2; A(11)='0';
> A(16 downto 12)=X4; A(17)='0';
> A(22 downto 18)=X6; A(23)='0';
> 
> B(4 downto 0)=X1; B(5)='0';
> B(10 downto 6)=X3; B(11)='0';
> B(16 downto 12)=X5; B(17)='0';
> B(22 downto 18)=X7; B(23)='0';
> 
> Then on the first layer we can perform all calculations using only
> single adder:
> C=A+B, and sub-sums are encoded as follows:
> C(5 downto 0)= X0+X1=Y0; C(11 downto 6)=X2+X3=Y1; C(17 downto
> 12)=X4+X5=Y2; C(23 downto 18)=X6+X7=Y3
> 
> On the 2nd level we work with 6-bit values (7-bit after addition of
> leading '0'), so we can perform up to 3 additions in a single adder (but
> we need only 2)
> 
> D(5 downto 0)=Y0; D(6)='0'; D(12 downto 7)=Y2; D(13)='0';
> E(5 downto 0)=Y1; E(6)='0'; E(12 downto 7)=Y3; E(13)='0';
> 
> After addition:
> F=D+E we get:
> F(6 downto 0)=Y0+Y1=V0 ; F(13 downto 7)=Y2+Y3=V1;
> 
> The final addition may be performed in a standard way.
> Please note, that now we had to use only 3 adders!
> 
> The interesting problem is, how we should organize our adders' tree for
> different lengths of word in the adders, different lengths of the input
> values and different number of the input values.
> I've prepared a simple tool, written in Python, which automatically
> generates the appropriate VHDL code.
> The sources have been posted on the usenet alt.sources group:
> Subject: Generator of VHDL code for parallel adder (using long-word
> hardware adders to perform multiple additions in parallel)
> Google archive:
> https://groups.google.com/forum/#!topic/alt.sources/8DqGgELScDM
> 
> However I'm interested if it can be done in pure VHDL?
> I hope that the above idea will be useful for someone...


Not a direct answer to your questions... have you researched Wallace 
Trees?  Like most solutions to computer arithmetic, it dates from about 
half a century ago.
Wallace tried to find a way of making a faster hardware multiplier.  Part 
of that involves adding a large number of binary numbers, and Wallace's 
solution allowed for a shorter critical path in the adder tree.

http://en.wikipedia.org/wiki/Wallace_tree

BTW, yes, it can be done in pure VHDL.  Expect to use a lot of if-
generate and for-generate constructs, possibly with functions to work out 
the ranges of the for-generates.

A few years back I wrote a "popcount" (i.e. population count) module in 
VHDL.  It had an input width that was configurable, and added up the 
number of '1' bits in that input word.  E.g. a 512 bit input would result 
in a 10 bit output word.  The whole thing worked in only a modest number 
of levels of logic and was moderately fast (180MHz after PAR in the logic 
of the day) because I used a Wallace Tree.  The first stages (which were 
adding six 1 bit numbers together) were implemented directly in six-input 
LUTs.  The second (and perhaps third) stages were implemented using 
similar LUTs.  I only switched to using behavioural (synthesising to 
ripple carry) adders once the words became wider a few stages into the 
tree.

Regards,
Allan
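
Allan's popcount example has a neat software analogue: the classic SWAR
popcount is itself a tree of small additions packed side by side in one
machine word, with the lane width doubling at each level, which is
essentially the same packing trick discussed in this thread.  A C sketch:

#include <stdint.h>
#include <stdio.h>

/* Tree-of-adders popcount: at each level, adjacent lanes are added in
 * place, and the lane width doubles (1 -> 2 -> 4 -> 8 -> 16 -> 32 bits). */
static unsigned popcount32(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1)  & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2)  & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4)  & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8)  & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);
    return x;
}

int main(void)
{
    printf("%u\n", popcount32(0xF0F01234u));   /* prints 13 */
    return 0;
}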

Article: 156123
Subject: Re: Use of hardware adders with long words to perform multiple additions in parallel
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sun, 1 Dec 2013 02:21:29 +0000 (UTC)
Links: << >>  << T >>  << A >>
Wojciech M. Zabolotny <wzab@ise.pw.edu.pl> wrote:
> I was solving a problem, when I needed to calculate every clock a 
> sum of multiple values encoded on a small number of bits 
> (the latency of a few clocks is allowed).
 
> A natural solution seemed to be a binary tree of adders, consisting of N levels,
> when on each level I calculate a sum of two values.
> E.g. assuming, that I have to calculate a sum of 8 values, I can calculate:
 
> On the 1st level:
> Y0 = X0 + X1, Y1=X2+X3, Y2=X4+X5, Y3=X6+X7 (4 adders)
 
> On the 2nd level: 
> V0 = Y0+Y1, V1=Y2+Y3 (2 adders)
 
> On the 3rd level:
> Z = V0+V1  (1 adder)

The traditional solution to this problem is the carry save adder.

It isn't always so obvious that the traditional solutions are still
optimal in FPGAs with built-in carry logic, but I think it still works.

-- glen
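
For readers who haven't met it, a carry-save (3:2) stage simply turns three
operands into two without letting any carry ripple; repeating it reduces N
operands to two, and only the very last addition needs a real carry chain.
A minimal C illustration of the idea:

#include <stdint.h>
#include <stdio.h>

/* One carry-save stage: per-bit sums and per-bit carries, no propagation. */
static void csa(uint32_t a, uint32_t b, uint32_t c,
                uint32_t *sum, uint32_t *carry)
{
    *sum   = a ^ b ^ c;
    *carry = ((a & b) | (a & c) | (b & c)) << 1;
}

int main(void)
{
    uint32_t s1, c1, s2, c2;

    /* Reduce four operands (8 + 13 + 21 + 6 = 48) with two CSA stages. */
    csa(8, 13, 21, &s1, &c1);
    csa(s1, c1, 6, &s2, &c2);

    /* A single carry-propagate addition finishes the job. */
    printf("%u\n", s2 + c2);   /* prints 48 */
    return 0;
}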

Article: 156124
Subject: Re: Use of hardware adders with long words to perform multiple additions in parallel
From: wzab01@gmail.com
Date: Sun, 1 Dec 2013 04:00:09 -0800 (PST)
Links: << >>  << T >>  << A >>
On Sunday, 1 December 2013 01:40:26 UTC+1, Allan Herriman wrote:
> On Sat, 30 Nov 2013 21:38:32 +0000, Wojciech M. Zabolotny wrote:
> []
>
> Not a direct answer to your questions... have you researched Wallace
> Trees?  Like most solutions to computer arithmetic, it dates from about
> half a century ago.
> Wallace tried to find a way of making a faster hardware multiplier.  Part
> of that involves adding a large number of binary numbers, and Wallace's
> solution allowed for a shorter critical path in the adder tree.
>
> http://en.wikipedia.org/wiki/Wallace_tree
>
> BTW, yes, it can be done in pure VHDL.  Expect to use a lot of if-
> generate and for-generate constructs, possibly with functions to work out
> the ranges of the for-generates.
>
> A few years back I wrote a "popcount" (i.e. population count) module in
> VHDL.  It had an input width that was configurable, and added up the
> number of '1' bits in that input word.  E.g. a 512 bit input would result
> in a 10 bit output word.  The whole thing worked in only a modest number
> of levels of logic and was moderately fast (180MHz after PAR in the logic
> of the day) because I used a Wallace Tree.  The first stages (which were
> adding six 1 bit numbers together) were implemented directly in six-input
> LUTs.  The second (and perhaps third) stages were implemented using
> similar LUTs.  I only switched to using behavioural (synthesising to
> ripple carry) adders once the words became wider a few stages into the
> tree.
>
> Regards,
> Allan

Thanks a lot for the suggestions.
I have not described it, but in my design I have yet another constraint:
I need to squeeze as many of those summing systems into a single FPGA as
possible, while keeping the latency at the lowest possible level.

That's why I tried to fully utilize the hardware features of my FPGA, and
reuse adders to perform multiple operations in parallel.

In fact my problems start at the level where "the words become wider"
(quoting from your post).

Regards,
Wojtek


