Messages from 62650

Article: 62650
Subject: Re: Building the 'uber processor'
From: "mikegw" <mikegw20@hotmail.spammers.must.die.com>
Date: Tue, 4 Nov 2003 19:34:05 +1100


"John Williams" <jwilliams@itee.uq.edu.au> wrote in message
news:bo6l2c$u39$1@bunyip.cc.uq.edu.au...
> Hi Mike,
>
> mikegw wrote:
> > a) Will a FPGA co-processor board(s) offer a speed improvement in running
> > our simulation jobs over using a 'traditional' cluster (mosix/Beowulf)?
> > Bearing in mind that ours will be the only job on the machine so can we
> > reconfigure our FPGA boards to speed calculation?
>
> To parallel what Jon said earlier - the biggest gotcha that seems to
> bite people is IO bandwidth.  It's not necessarily hard to develop
> highly pipelined FPGA designs that will crunch your numbers at 100M
> sample/sec, but can you keep it busy?
>
As we will be stepping time, the data (particle information position etc...)
will be the output of the previous 'step'.  The only bit that might be messy
is to calculate the relative distances between particles.
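
As a rough illustration of that 'messy' part, here is a minimal Python sketch of the all-pairs relative-distance calculation for one time step. The function name `pairwise_distances` and the (x, y, z) tuple layout are assumptions made for the example; they are not taken from the actual simulation code.

```python
import math
from itertools import combinations

def pairwise_distances(positions):
    """Return {(i, j): distance} for every unordered pair of particles.

    positions is a list of (x, y, z) tuples (an assumed layout).
    """
    distances = {}
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        distances[(i, j)] = math.dist(p, q)
    return distances

# Three particles give 3 * 2 / 2 = 3 pairs.
example = pairwise_distances([(0, 0, 0), (3, 4, 0), (0, 0, 12)])
```

The work grows as n(n-1)/2 pairs per step, which is why this part tends to dominate. Each pair is independent of the others, which is what makes it a candidate for a parallel pipeline.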

I think that these devices might be the way to go.  To me it seems odd that
we seem to be taking a step back to the old analogue computer days when you
'built' your program.



> I read of an interesting approach a while ago - do a search for
> Pilchard, it's an FPGA coprocessor board developed at a Hong Kong
> university.  Basically it fits in the standard PC memory module form
> factor, with custom Linux drivers to access it.  The bandwidth on the
> memory bus is much greater than on PCI.

I took a look, and it seems fairly interesting.  Given my particular data
set I might be on the wrong track thinking of an accelerator card.  Maybe a
stand-alone device would suit better: the input is uploaded and the device
is left to run the job on its own.

So much to learn.........

Mike

Article: 62651
Subject: Prototyping board with 4+ MB SRAM?
From: H. Peter Anvin <hpa@zytor.com>
Date: 4 Nov 2003 01:40:27 -0800

Hello,

Does anyone happen to know of a stock FPGA prototyping board with (a)
onboard oscillator, (b) Ethernet and (c) at least 4 MB of SRAM?

I have a need for such a board in a configuration which needs to
support a very large range of input frequencies, hence I would prefer
using SRAM; however, most boards seem to have no more than 1 MB SRAM
and the rest SDRAM... which would have to be supported as an
asynchronous clock domain in order to work correctly at the low end of
the frequency range.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

Article: 62652
Subject: Re: Vendor supplied symbol/part models?
From: Brian Drummond <brian@shapes.demon.co.uk>
Date: Tue, 04 Nov 2003 09:54:08 +0000

On Mon, 03 Nov 2003 14:09:07 +1000, John Williams
<jwilliams@itee.uq.edu.au> wrote:

>Martin Euredjian wrote:

>> I wish chip vendors would agree upon a component description
>> language/database format of some sort.  These files could be published and
>> CAD data very easily derived from them.  That would be very useful.
>
>Some existing standard like EDIF might be able to support this already.

I think the BSDL files for Xilinx parts come close, having a lot of the
required information in machine-digestible form.

- Brian

Article: 62653
Subject: Re: Building the 'uber processor'
From: dont@agora.rdrop.com (Don Taylor)
Date: 4 Nov 2003 04:03:53 -0600

"mikegw" <mikegw20@hotmail.spammers.must.die.com> writes:
>As we will be stepping time, the data (particle information position etc...)
>will be the output of the previous 'step'.  The only bit that might be messy
>is to calculate the relative distances between particles.

>I think that these devices might be the way to go.  To me it seems odd that
>we seem to be taking a step back to the old analogue computer days when you
>'built' your program.

This is getting away from hardware, and you haven't said how much
expertise you have there to use on the problem, but I remember a
series of books published by MIT press in the '90s.  Each was the
summary of a different phd thesis.  One of those described break-
throughs in the simulation of many body problems that led to orders
of magnitude increase in speed for running the simulation.  I don't
know whether those results would apply in your case or not.

It seems to be a general rule that hardware can speed up a problem
by k-fold, where k is a modestly small number usually.  But finding
a better algorithm can speed up a problem by n-fold, where n is the
number of items you have to deal with.  With both you might get k*n.
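
To make the algorithmic side concrete: if the interaction has a finite cutoff radius (an assumption, and it may not hold for your forces), binning the particles into cells means each particle is tested only against particles in neighbouring cells rather than against all n. The sketch below is a 2-D toy with hypothetical function names. It is not the method from the thesis mentioned above, just one well-known example of the kind of algorithmic change that removes a factor of n.

```python
import math
from collections import defaultdict
from itertools import combinations

def brute_force_pairs(points, cutoff):
    """All pairs closer than cutoff, testing every one of the n(n-1)/2 pairs."""
    return {(i, j) for (i, p), (j, q) in combinations(enumerate(points), 2)
            if math.dist(p, q) < cutoff}

def cell_list_pairs(points, cutoff):
    """Same result, but only testing points in the same or adjacent cells."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        # Cell edge length equals the cutoff, so interacting points are
        # always in the same cell or in one of the 8 neighbouring cells.
        cells[(math.floor(x / cutoff), math.floor(y / cutoff))].append(i)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(points[i], points[j]) < cutoff:
                            pairs.add((i, j))
    return pairs
```

For a roughly uniform particle density the number of distance tests per particle stays bounded as n grows, so the total work grows roughly linearly in n instead of quadratically. Hardware can then be applied on top of that.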

>So much to learn.........

Someone once said to me "it takes six or eight years to really learn
something well, and you don't have very many six or eights, so don't
you go waste one."  Now I realize I really should have understood what
he meant then.

Article: 62654
Subject: Re: Building the 'uber processor'
From: Philip Freidin <philip@fliptronics.com>
Date: Tue, 04 Nov 2003 10:13:50 GMT

On Mon, 3 Nov 2003 16:05:52 +1100, "mikegw" <mikegw20@hotmail.spammers.must.die.com> wrote:
>Hello all,

Hi Mike,

>Firstly I would like to say that other than knowing what a FPGA is on a most
>basic level my knowledge about the subject is nil.  I am looking at this
>from an application that needs a solution.  I have seen about the place add
>on boards for PC's that act as co-processors.  This is the interesting bit
>to me.  Our research group is looking into building a computer (cluster
>perhaps)  for calculation of particle dynamics, similar to CFD in
>application.  Our programs are in C/C++ running on Linux ( any flavour will
>do).

So that we may better help you, please answer the following questions:

Is the arithmetic Floating Point (FP) or Integer?

   If mixed, what is the ratio of the two?
    (i.e. 10000 integer ops to every floating point op)
	(If the ratio is greater than 100000:1, could you do the integer
         stuff in the FPGAs, and the FP in a host X86 processor?)

   If floating point:
     Does it need to be IEEE FP (i.e. identical to a software execution
          on the same data set)
     OR
     (Floating point with N bits of mantissa, M bits of exponent, X guard bits,
      etc...)

     What is the ratio of Mult, Div, Add, Sub, Sqrt, Sin, Cos, Exp, Log, ...
	(Are integer approximations usable?)

For integer operations, how many bits of precision are needed?
Is this precision required all the way through the algorithm, or can the
precision be adjusted at each step?

How many arithmetic/logic ops per data item?

What is the data set size needed before calculations can start
   (i.e. 20 3D points, 10 scan lines, a 512 by 512 2D set, ...)

Can the calculations be partitioned into multiple identical sets that
perform the same operation on different parts of the total data set?

   If partitioning is possible, how much communication (number of data
   items) needs to pass between the separate calculation
   clusters?  How often does this need to happen (what is the
   inter-processor bandwidth)?

How much local data is created while calculations take place?
   (What bandwidth is needed to support it?)

How much table/look up data is required by the algorithm?
   (What bandwidth is needed to support it?)

Can data be thought of as a continuous stream in and out, or is
it 1 big chunk that must all arrive, then calculate till done, then
spit out a result (what is size of input chunk and output gems). Is there
a constant flow of chunks (Size, arrival rate, expected FP/Int ops per
chunk?)
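
Several of the precision questions above can be explored in software before any hardware exists. A minimal sketch, assuming Python is acceptable for experiments: the hypothetical helper `round_to_mantissa_bits` rounds a value to N significant binary digits, so a reduced-precision format can be emulated inside an existing simulation. It ignores exponent range, guard bits and any rounding mode other than round-to-nearest.

```python
import math

def round_to_mantissa_bits(value, bits):
    """Round value to `bits` significant binary digits (exponent range ignored)."""
    if value == 0.0 or not math.isfinite(value):
        return value
    # value = mantissa * 2**exponent with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(value)
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

# Relative error introduced by keeping only 8 mantissa bits of 0.1
approx = round_to_mantissa_bits(0.1, 8)
relative_error = abs(approx - 0.1) / 0.1
```

Applying the same rounding to each intermediate result of the real code and comparing against a full double-precision run gives a first indication of how many mantissa bits the algorithm can tolerate.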

Since you want an Über processor, do you have an Über hardware designer?
(It takes considerable effort to create one of these, especially if what
you start with is an Über software designer. It is an order of magnitude
easier to get a HW engineer to write passable SW than it is to get a SW
programmer to design passable HW.)

Are you aware that SW is basically written for sequential execution, or
extremely chunky parallelism (threads)? Hardware design (for Über
processors) typically requires ultra parallelism (100s to 1000s of
operations running in parallel), which means that your algorithms will
have to be totally rearranged to match such application-specific hardware.
Although this is daunting, there are hundreds of real-life systems that
have done this (i.e. the answer to your basic question, "does it make sense
to consider FPGAs to create an application-specific co-processor?", is YES).
Implementing these successful systems was never achieved by just taking
the SW (C/C++ for example) and re-crafting as hardware. You will need to
go back to the basics of the algorithm's intent, then design for the
extreme parallelism that the FPGAs offer. This is not always possible, as
discussed by others who have answered your original question.

Are you thinking of a single co-processor board in a PC or something more
like a Beowulf cluster with each node having its own accelerator board?

There are many more such questions, but this would be a good start.  

>My questions are
>
>a) Will a FPGA co-processor board(s) offer a speed improvement in running
>our simulation jobs over using a 'traditional' cluster (mosix/Beowulf)?
>Bearing in mind that ours will be the only job on the machine so can we
>reconfigure our FPGA boards to speed calculation?

Can't answer this without far more information from you. See above :-)

Note that your:
   "so can we reconfigure our FPGA boards to speed calculation?"

is no trivial thing. The design of the hardware may take many months
to do even if you have an Über hardware designer.

>b) Can anyone recommend a good book that I can read and hopefully be able to
>ask more informed questions?

There is an annual conference held in Napa, California, where all the people
that do this type of thing meet. It is the IEEE FCCM conference. You
would be well served by looking at the titles of the proceedings for the
last 7 years at http://www.fccm.org/ . You can probably get copies of the
proceedings from the IEEE for way too much money.

>Cheers

Happiness to you too.

>Mike

Philip

Philip Freidin
Fliptronics

Article: 62655
Subject: Re: Building the 'uber processor'
From: Mario Trams <Mario.Trams@informatik.tu-chemnitz.de>
Date: Tue, 04 Nov 2003 11:15:01 +0100

mikegw wrote:

> Just so I understand you,   if I want to "realise" my c code in a FPGA
> array,  I can upload the code, data and the processing array.  Run it and
> download the data?
>
> The code (not actually mine I am just seeing if this is all possible) is
> basically applying an equation on a data set looping for all particles for
> each time step.  The tricky bit (in at least the programming sense) is to
> constantly calculate the relative positions of each particle to calculate
> their effect on each other.

Mike,

Surely, you might put something like a processor into an FPGA where
you can download your code and data. But you will very likely not 
gain very much from this as you are still stuck with your 
"program code execution" paradigm.

Depending on the application, you might get a little gain by
placing a very special processor into the FPGA that is optimised
for your application. DSPs are a good example here. They have
special features that make them very fast for some algorithms.
This would also require a special compiler that compiles the
code (that you want to reuse) optimised for your special
processor. But you would probably need to code many things in
assembly language anyway, because there is no direct translation
possible from a high-level language to a special machine
feature. As far as I know, this is the same for DSPs.

However, you will achieve a real speed-up by throwing the
processor concept overboard and thinking purely in terms of
distributed state machines. This is a completely different thing
compared to implementing an algorithm in some language.
First of all, you have to be an experienced digital designer to do
that. (Btw, the same is true for designing a special CPU, of course.)

Regards,
Mario

Article: 62656
Subject: Re: Prototyping board with 4+ MB SRAM?
From: Philip Freidin <philip@fliptronics.com>
Date: Tue, 04 Nov 2003 10:17:09 GMT

On 4 Nov 2003 01:40:27 -0800, H. Peter Anvin <hpa@zytor.com> wrote:
>Hello,
>
>Does anyone happen to know of a stock FPGA prototyping board with (a)
>onboard oscillator, (b) Ethernet and (c) at least 4 MB of SRAM?
>
>
>	-hpa

I would start looking here:

   http://www.fpga-faq.com/FPGA_Boards.shtml



===================
Philip Freidin
philip@fliptronics.com
Host for WWW.FPGA-FAQ.COM

Article: 62657
Subject: Re: Vendor supplied symbol/part models?
From: rjd@transtech-dsp.com (rob d)
Date: 4 Nov 2003 03:02:09 -0800

John Williams <jwilliams@itee.uq.edu.au> wrote in message news:<bo4a46$sbg$1@bunyip.cc.uq.edu.au>...
> Hi folks,
> 
> Doing a board design with a 456 pin Xilinx FPGA, I find myself in the 
> laborious and potentially error-prone process of building a symbol, 
> footprint and part model from scratch.   I am aware that commercial part 
> libraries are available, but we are a university department and don't 
> have those sort of $$$ to throw around for small-run custom designs.
> 
> Anyway it seems to me that it would be in vendors' interests (Xilinx in 
> this case) to provide verified symbol and footprint models for major 
> design tools (Mentor, Protel etc)?  A quick search of the Xilinx web 
> site didn't turn up anything.
> 
> Is there some point I'm missing here, or are my expectations unreasonable?
> 
> Regards,
> 
> John


Some time ago I needed a symbol for a 1152-pin Virtex-II which easily
showed which pins were not used on smaller devices. Xilinx don't even
give the data; you've got to superimpose every size you want before
you even start up the schematic tool.

I posted to this newsgroup about it. A thread didn't get going, but I
had direct correspondence from the Xilinx guy documenting SPARTAN n
(can't remember which). Turns out that he had done it properly
already.

If you are lucky you are using Spartan n and you can use the Excel
version of the pin table.
If you are very lucky you can import Excel into your symbol editor. I
use ORCAD; it isn't documented, but you can easily paste (shift insert)
into the symbol editor.

Rob

Article: 62658
Subject: Re: Xilinx - Multi Volt Interfacing
From: "Lockie" <biglockie@hotmail.com>
Date: Tue, 4 Nov 2003 22:17:51 +1100

Sandeep,
Thanks for your information,

Firstly, I forgot to mention the device is a XC95288XL, and my reason for not
wanting to use a strong pull-up resistor was the amount of heating caused to
the device (which I'm advised is well within tolerance).

Secondly, I seem to have some conflicting information in relation to the
5V issue.  I received a link to the Virtex information on Xilinx's web site;
this mentions the device driving a 5V load, and accepting a 5V
input, but not in both directions.
It is unfortunate, but most of the devices in my system are 5V and require
bi-directional support (CPU, LCD panel, RTC, etc).
I use about 120 of the I/Os for 5V interfacing, and if I use a 1K pull-up
resistor (so that I can still meet timing requirements), I find myself using
600mA+ of current just for the I/Os.  I know that each pin is rated to
10mA, which is what I base my calcs on, but don't you think that seems a
little excessive?
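
For reference, the 600mA figure is what falls out of 5V across 1K (5mA per pin) on 120 pins that are all low at the same moment. A small Python sketch of that arithmetic, assuming the low-level output voltage is negligible and that every pin is driven low simultaneously (both worst-case simplifications; the function name is made up for the example):

```python
def pullup_current_ma(supply_v, resistor_ohms, pins_low):
    """Worst-case current (mA) through pull-ups on pins that are driven low."""
    per_pin_ma = supply_v / resistor_ohms * 1000.0
    return per_pin_ma, per_pin_ma * pins_low

per_pin, total = pullup_current_ma(5.0, 1000.0, 120)
print(f"{per_pin:.1f} mA per pin, {total:.0f} mA total")
```

Under those assumptions each pin sinks about half of the 10mA rating, and the total only reaches 600mA when the whole bus is low at once.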

I'm now looking into the interfacing devices.
Once again, thanks for your help.
Lockie.


"Sandeep Kulkarni" <sandeep@insight.memec.co.in> wrote in message
news:bo75gf$17dhsb$1@ID-199516.news.uni-berlin.de...
> Hello Lockie,
>
> I do not know whether your previous implementation was on the XC95288 or
> the XC95288XL family; the thing here is that the XL devices have 5V tolerant
> I/O, and you can interface them straight to the FPGA I/O. A pull-up is
> required in the case of 5V CMOS only.
> But in the case of Spartan2E, the I/Os are not 5V tolerant, and thus you
> cannot directly interface it to a 5V device. You need to have a series
> resistor or a buffer in between. In your case, as you want to interface the
> FPGA to the CPU, which I presume to be 5V CMOS, you can only use
> a level translator or bidirectional buffer, e.g. from IDT, www.idt.com.
> If you don't want to use an external buffer and would rather use the pull-up
> resistor arrangement, you will need to use the Spartan2 family instead,
> which has 5V tolerant I/O.
>
> Regards
> Sandeep
> "Lockie" <biglockie@hotmail.com> wrote in message
> news:3fa5fdfc@dnews.tpgi.com.au...
> > Hey All,
> >
> > I'm using a XC2S300E and a 5V CPU.  The XC2S300E implements a simple memory
> > interface to the CPU.
> >
> > My question is related to using the 5V CPU with the 3.3V XC2S I/O pins.  In
> > a past design I've used a XC95288 (CPLD), and had to use strong pull-up
> > resistors (<1K to 5V) to meet the timing requirements of the bus, along with
> > floating the Xilinx I/O pins to implement a bi-directional interface.
> >
> > In this new design it's not suitable to use such a "dodgy" method of
> > interfacing.
> >
> > Can anyone suggest any possible solutions I could try?
> >
> > Thanks in advance.
> > Lockie.
> >
> >
>
>

Article: 62659
(removed)

Article: 62660
Subject: Re: Building the 'uber processor'
From: news@sulimma.de (Kolja Sulimma)
Date: 4 Nov 2003 04:49:25 -0800

> Thanks
> 
> Just so I understand you,   if I want to "realise" my c code in a FPGA
> array,  I can upload the code, data and the processing array.  Run it and
> download the data?
Yes. But you are likely to spend a lot of effort designing the
processing array.

> The code (not actually mine I am just seeing if this is all possible) is
> basically applying an equation on a data set looping for all particles for
> each time step.  The tricky bit (in at least the programming sense) is to
> constantly calculate the relative positions of each particle to calculate
> their effect on each other.

I guess that if you post the equation (maybe a simplified version),
the precision you need and the number of elements in a typical data
set you will get a pretty good estimate from this group about how well
this can be solved in FPGAs.

Kolja Sulimma

Article: 62661
Subject: DCM recover after interruption of input clock
From: "wolfgang" <wolfgang.hofmann@arcs.ac.at>
Date: Tue, 4 Nov 2003 13:50:00 +0100

hey!

i'm using a virtex2 device for implementing a lvds channel link receiver. To
recover the bitclock from the incoming clock line, i use a dcm with
feedback on clk0.

constant lvds_clk_m             : integer := 7 ;
constant lvds_clk_d             : integer := 2 ;
constant lvds_phase_mode        : string  := "FIXED" ;
constant lvds_phase_value       : integer := 45 ; -- phase shift value for place and route
constant lvds_phase_value_udsim : integer := 45 ; -- phase shift value for unit delay simulation

attribute CLKIN_PERIOD          of dcm_3_5_lvds_clk: label is "30" ;
attribute CLKOUT_PHASE_SHIFT    of dcm_3_5_lvds_clk: label is lvds_phase_mode ;
attribute PHASE_SHIFT           of dcm_3_5_lvds_clk: label is lvds_phase_value ;
attribute CLKFX_DIVIDE          of dcm_3_5_lvds_clk: label is lvds_clk_d ;
attribute CLKFX_MULTIPLY        of dcm_3_5_lvds_clk: label is lvds_clk_m ;
attribute DUTY_CYCLE_CORRECTION of dcm_3_5_lvds_clk: label is "TRUE" ;
attribute DFS_FREQUENCY_MODE    of dcm_3_5_lvds_clk: label is "LOW" ;


the incoming clock is a 33 MHz clk with 57% to 43% dutycycle.
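
for reference, the numbers these settings imply can be worked out as below. this is only a sketch of the arithmetic in python: it assumes CLKFX = CLKIN * CLKFX_MULTIPLY / CLKFX_DIVIDE and that a fixed PHASE_SHIFT value counts in steps of 1/256 of the CLKIN period, which is how i read the virtex2 documentation. the function names are made up for the example.

```python
def dcm_clkfx_mhz(clkin_mhz, multiply, divide):
    """Synthesised CLKFX frequency: CLKIN * CLKFX_MULTIPLY / CLKFX_DIVIDE."""
    return clkin_mhz * multiply / divide

def dcm_fixed_phase_ns(phase_shift, clkin_period_ns):
    """Fixed phase shift in ns, assuming steps of 1/256 of the CLKIN period."""
    return phase_shift / 256 * clkin_period_ns

clkfx = dcm_clkfx_mhz(33.0, 7, 2)      # 33 MHz input, M = 7, D = 2
shift = dcm_fixed_phase_ns(45, 30.0)   # PHASE_SHIFT = 45, CLKIN_PERIOD = 30 ns
```

with a 33 MHz input that gives a 115.5 MHz clkfx (3.5 times the input), and a phase shift of roughly 5.3 ns.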

when i disconnect the clock line and plug it in again, the dcm doesn't lock
again. i tried to reset the dcm at the falling edge of the locked pin to
ensure a defined startup, but this seems to have no effect.

now my question: what can i do to reset the dcm after an interruption of
the input clock clkin? is there a specific sequence of actions i have to
execute?

regards

wolfgang

Article: 62662
Subject: Re: Building the 'uber processor'
From: "kryten_droid" <kryten_droid@ntlworld.com>
Date: Tue, 4 Nov 2003 13:01:29 -0000


"mikegw" <mikegw20@hotmail.spammers.must.die.com> wrote in message
news:bo5bfr$ad4$1@tomahawk.unsw.edu.au...
>
> Just so I understand you,
> if I want to "realise" my c code in a FPGA array,
> I can upload the code, data and the processing array.
> Run it and download the data?

No.

Short answer:

C/Pascal/etc compile to machine code instructions to run on a
general-purpose processor,
only one executes at a time.

VHDL/Verilog compile to a description of many specific-purpose hardware
processes,
all executing at once.


Longer answer:

Microprocessors execute a single conceptual process at a time.

In the real world there are many processes running concurrently.

Conventional micros and software require blocks of sequential instructions.

Occam was a language to describe processing in terms of communicating
sequential processes.
These could then be farmed out over multiple processors and done in
parallel.
The transputer was designed in tandem with occam, optimised for this
programming model and communication between processors.

My old tutor said that hardware engineers grasped these concepts much
faster, because they are already comfortable with thinking in terms of many
things happening at once in hardware. Software engineers had to unlearn
their usual sequential thinking.

In the past, the general-purpose microprocessor was a great alternative to
single-purpose machines.
The latter could be much faster but took ages to design, build and
modify.

FPGA chips change that balance of power.

Like occam, VHDL and Verilog allow you to describe processing in terms of
communicating sequential processes
(occam has been used as a hardware description language).

However, instead of creating machine-code instructions to perform a process,
they create descriptions of hardware to do all these processes. The 'fitter'
then fits the design into particular makes of FPGA.

I can see that conventional programmers would love to be able to just chuck
their old C programs into an FPGA and have it run faster, but I feel this is
not sensible (although Handel C seems to be trying it). No pain, no gain.

I didn't find VHDL all that hard to pick up. In fact it is quite liberating
to throw off the shackles of conventional software design. Instead of
getting a single micro to rapidly poll, process and toggle dozens of
real-time inputs and outputs, I can now simply declare dozens of independent
hardware processors.

Benefits depend on the problem you want to solve. You can beat
microprocessors easily at some tasks but not others. Ideal tasks are simple
and easily scaled up, like a systolic processor for finding matches in DNA
sequences, or sifting keys for the enigma machine. The wartime machine
weighed tons, used kilowatts, and clocked at 5 kHz. It would beat many
modern chips, which shows the advantage of customised hardware. You might be
able to make an equivalent weighing grammes, using milliwatts, and clocked
at 50 MHz! I wonder if the government kept the 95% of Enigma messages that
they didn't have time to crack? I'm sure military historians would be
interested in the contents...

Article: 62663
Subject: Re: Shannon Entropy for Black Holes
From: "John Smith" <someone@microsoft.com>
Date: Tue, 4 Nov 2003 13:16:10 -0000


"Jerry Avins" <jya@ieee.org> wrote in message
news:bo69ks$a3e$1@bob.news.rcn.net...
> John Smith wrote:
>
> > "Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
> > news:ZzCob.71991$Fm2.57178@attbi_s04...
> >
> >>I read an article in "Scientific American" about how much information can be
> >>compressed into a certain volume, and apparently all objects have a Shannon
> >>entropy in addition to the thermodynamic entropy.  Also, black holes have a
> >>Shannon entropy that is based on the surface area of the event horizon.  I
> >>was totally lost.   Can anybody else explain how Shannon's information
> >>theory applies to black holes?
> >>-Kevin
> >>
> >>
> >
> >
> > For the ignorant (me): what is Entropy?
> >
> > Rich
> >
> http://www.2ndlaw.com/ will be a good start. Note that if S is entropy, Q
> the amount of heat -- BTU, Calories -- and T absolute temperature,
>                             S = Integral(dQ/T).
> Simplifying: heat, like water, runs downhill, and unless something like a
> waterwheel or a heat engine extracts energy when it does, some of what
> had been available energy is permanently lost. The water or heat is all
> still there, and so is the energy -- just not available. Lost available
> energy shows up as increased entropy.
>
> Two Laws of Thermodynamics have been stated thus:
>
> You can't get something for nothing. Water had to be pumped up before it
> ran down to turn the wheel.
>
> You can't even break even. (The second law is about entropy.) Because of
> inevitable inefficiencies -- friction or moving heat across a temperature
> gradient, entropy will increase, and you won't get all of the energy out.
>
> Let's leave the Third Law for some other time.
>
> Jerry
> -- 
> Engineering is the art of making what you want from things you can get.
> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
>

Aggghhhh.... I'm gonna be on 2ndlaw.com, and its sister, I may be a while...

Thanks for the answers guys. 'Lost' energy (non-recoverable energy) is my
summary. Correct?

Thanks again
JS

Article: 62664
Subject: Re: Building the 'uber processor'
From: "Ron Huizen" <rhuizen@bittware.com>
Date: Tue, 4 Nov 2003 08:59:43 -0500

To add more support to the IO bandwidth being one of the major issues, one
thing that I see often getting overlooked when people start clustering
machines with regular networking is the overhead of just running the network
connections.  There was an interesting article in EE Times sometime last
year (don't recall which one) showing how much of a GHz Pentium it took just
to run a 1 Gb Ethernet connection.  If I recall, it was on the order of 50%
of the processor, assuming of course you were keeping the Ether busy.  Of
course, just plugging in PCI boards has the same issue if all the data has
to move on the PCI bus, as the bus itself becomes the bottleneck.

If you are serious about building a monster machine out of multiple
processors, don't overlook the data movement aspect.
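
A quick back-of-the-envelope check shows how fast the bus becomes the limit. The sketch below is in Python and takes the 100M sample/sec pipeline rate from John's post quoted below. The 4-byte sample size, and counting one input plus one output stream, are assumptions for illustration only; 133 MByte/sec is the theoretical peak of 32-bit, 33 MHz PCI, and sustained rates are lower.

```python
PCI_32BIT_33MHZ_PEAK_MB_S = 133.0   # theoretical peak of 32-bit, 33 MHz PCI

def required_mb_per_s(samples_per_s, bytes_per_sample, streams=2):
    """Bandwidth needed to stream data in and results out (streams=2)."""
    return samples_per_s * bytes_per_sample * streams / 1e6

# 100 Msample/s pipeline with 32-bit samples (sample size is an assumption)
need = required_mb_per_s(100e6, 4)

# Sample rate the bus could feed at its theoretical peak, same assumptions
bus_limited_rate = PCI_32BIT_33MHZ_PEAK_MB_S * 1e6 / (4 * 2)
```

Under these assumptions the pipeline wants about 800 MByte/sec while the bus can feed it at best around 17M sample/sec, so the pipeline would sit idle most of the time unless the data stays local to the board.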

Now, it just so happens that the architectures we use on our boards have IO
capabilities that scale with system size, and that isn't a coincidence, as
our customers build large multiprocessor systems out of them.  The
underlying support for this is inherent in the SHARC and TigerSHARC
processors from Analog Devices, which have a built in IO Processor for
moving data into and out of the DSP's large internal memory so the core can
number crunch while data movement happens in the background.  These DSPs
also have multiple high speed point to point interconnects called link ports
(the TigerSHARC 101S has four 250 MByte/sec links as well as its 64 bit 100
MHz external bus) which can be used for shipping data around. We also use
large FPGAs and connect them to the DSPs using these links for the data
flows.

While some will argue that the best approach is a bunch of GHz PCs, and
others will say use traditional DSP, and yet more will say FPGA, there is no
one magic approach that applies to all systems.  Usually some combination of
these processing types will get the job done, it's a matter of deciding
which parts of your system are better served by which. And this of course is
dependent on the type of number crunching you need and the associated data
movement requirements.

-----
Ron Huizen
BittWare

"John Williams" <jwilliams@itee.uq.edu.au> wrote in message
news:bo6l2c$u39$1@bunyip.cc.uq.edu.au...
> Hi Mike,
>
> mikegw wrote:
> > a) Will a FPGA co-processor board(s) offer a speed improvement in
running
> > our simulation jobs over using a 'traditional' cluster (mosix/Bewoulf)?
> > Bearing in mind that ours will be the only job on the machine so can we
> > reconfigure our FPGA boards to speed calculation?
>
> To parallel what Jon said earlier - the biggest gotcha that seems to
> bite people is IO bandwidth.  It's not necessarily hard to develop
> highly pipelined FPGA designs that will crunch your numbers at 100M
> sample/sec, but can you keep it busy?
>
> I read of an interesting approach a while ago - do a search for
> Pilchard, it's an FPGA coprocessor board developed at a Hong Kong
> university.  Basically it fits in the standard PC memory module form
> factor, with custom Linux drivers to access it.  The bandwidth on the
> memory bus is much greater than on PCI.
>
> Regards,
>
> John
>

Article: 62665
Subject: Re: DCM recover after interruption of input clock
From: "Alvin Andries" <Alvin_Andries.dontusethispart@nowhere.agilent.remove_this_too.com>
Date: Tue, 4 Nov 2003 16:05:18 +0100


"wolfgang" <wolfgang.hofmann@arcs.ac.at> wrote in message
news:bo879p$dip$1@newsreader1.netway.at...
> hey!
>
> i'm using a virtex2 device for implementing a lvds channel link receiver. To
> recover the bitclock from the incoming clock line, i use a dcm with
> feedback on clk0.
>
> constant lvds_clk_m             : integer := 7 ;
> constant lvds_clk_d             : integer := 2 ;
> constant lvds_phase_mode        : string  := "FIXED" ;
> constant lvds_phase_value       : integer := 45 ; -- phase shift value for place and route
> constant lvds_phase_value_udsim : integer := 45 ; -- phase shift value for unit delay simulation
>
> attribute CLKIN_PERIOD          of dcm_3_5_lvds_clk: label is "30" ;
> attribute CLKOUT_PHASE_SHIFT    of dcm_3_5_lvds_clk: label is lvds_phase_mode ;
> attribute PHASE_SHIFT           of dcm_3_5_lvds_clk: label is lvds_phase_value ;
> attribute CLKFX_DIVIDE          of dcm_3_5_lvds_clk: label is lvds_clk_d ;
> attribute CLKFX_MULTIPLY        of dcm_3_5_lvds_clk: label is lvds_clk_m ;
> attribute DUTY_CYCLE_CORRECTION of dcm_3_5_lvds_clk: label is "TRUE" ;
> attribute DFS_FREQUENCY_MODE    of dcm_3_5_lvds_clk: label is "LOW" ;
>
>
> the incoming clock is a 33 MHz clk with 57% to 43% dutycycle.
>
> when i disconnect the clock line and plug it in again, the dcm doesn't lock
> again. i tried to reset the dcm at the falling edge of the locked pin to
> ensure a defined startup, but this seems to have no effect.
>
> now my question: what can i do to reset the dcm after an interruption of
> the input clock clkin? is there a specific sequence of actions i have to
> execute?
>
> regards
>
> wolfgang
>

Hi,

The point with the DCMs is that you must reset them when the clock returns,
or they will fail to lock again. To make life a bit more fun: the lock pin
won't give a valid signal when the clock is removed! So there is no way to
use it for detecting a lost clock. My solution was a small module that
detects whether the clock is running. The only constraint with this approach
is that you need another clock that's guaranteed to work all the time!

Summary:
- Clock works
- Clock dies
- Clock returns
- Reset DCM
- Wait for "locked" signal to become active
- All is running fine again now
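
The sequence above can be sketched as a small behavioural model. This is Python rather than HDL, and it is only an illustration of the idea, not the module I actually used: a watchdog running on the always-present clock notes whether it has seen CLKIN edges recently, and pulses the DCM reset when edges reappear after a dead period. The function name, the per-tick edge flag and the timeout of 4 ticks are assumptions for the example.

```python
def dcm_reset_trace(clkin_edge_seen, timeout=4):
    """Model a clock watchdog, sampled once per tick of a free-running clock.

    clkin_edge_seen[t] is True if a CLKIN edge was observed during tick t.
    Returns a list of booleans: True on ticks where the DCM reset is pulsed.
    """
    idle_ticks = 0
    clock_dead = False
    resets = []
    for edge in clkin_edge_seen:
        reset = False
        if edge:
            if clock_dead:          # clock has just returned: pulse reset
                reset = True
                clock_dead = False
            idle_ticks = 0
        else:
            idle_ticks += 1
            if idle_ticks >= timeout:
                clock_dead = True   # no edges for `timeout` ticks: clock died
        resets.append(reset)
    return resets

# Clock works, dies for 6 ticks, then returns
trace = dcm_reset_trace([True] * 3 + [False] * 6 + [True] * 3)
```

A real design would also hold the reset for the minimum time the data sheet asks for and may want to wait until the returning clock is stable; those details are left out here.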

Regards,
Alvin.

Article: 62666
Subject: Re: Xilinx - Multi Volt Interfacing
From: "Nial Stewart" <nial@spamno.nialstewart.co.uk>
Date: Tue, 4 Nov 2003 15:14:45 -0000


Lockie <biglockie@hotmail.com> wrote in message
news:3fa78ae6@dnews.tpgi.com.au...
> Sandeep,
> Thanks for your information,
>
> Firstly, I forgot to mention the device is a XC95288XL, and my reason for not
> wanting to use a strong pull-up resistor was the amount of heating caused to
> the device (which I'm advised is well within tolerance).
>
> Secondly, I seem to have some conflicting information in relation to the
> 5V issue.  I received a link to the Virtex information on Xilinx's web site;
> this mentions the device driving a 5V load, and accepting a 5V
> input, but not in both directions.
> It is unfortunate, but most of the devices in my system are 5V and require
> bi-directional support (CPU, LCD panel, RTC, etc).
> I use about 120 of the I/Os for 5V interfacing, and if I use a 1K pull-up
> resistor (so that I can still meet timing requirements), I find myself using
> 600mA+ of current just for the I/Os.  I know that each pin is rated to
> 10mA, which is what I base my calcs on, but don't you think that seems a
> little excessive?
> I'm now looking into the interfacing devices.
> Once again, thanks for your help.
> Lockie.
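
The 600mA figure quoted above can be reproduced with a short calculation.
This is a rough sketch only: it assumes every one of the 120 pins is driven
low at the same moment and that the low level is 0 V, which is the worst
case rather than a typical one. The function name pullup_load is invented
for the example.

```python
def pullup_load(n_pins, v_pullup, r_pullup):
    """Worst-case pull-up load when every pin is driven low at once.

    Returns (current per pin in A, total current in A, total power in W).
    The low output level is taken as 0 V, which slightly overestimates.
    """
    i_pin = v_pullup / r_pullup
    i_total = i_pin * n_pins
    p_total = v_pullup * i_total
    return i_pin, i_total, p_total


# Figures from the post: 120 I/Os, 1K pull-ups to 5 V.
i_pin, i_total, p_total = pullup_load(n_pins=120, v_pullup=5.0, r_pullup=1000.0)
```

Under these assumptions each pin sinks about 5 mA, which is inside the 10mA
rating mentioned above, and the total comes to about 0.6 A.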

What are the Vih levels of your 5V devices?

If they are all < 3V you could probably use IDT's QuickSwitch devices.
These are bidirectional voltage clamps which restrict the voltage
reaching your FPGA, depending on the bias level on the control
pins.

There's an app note, AN11 from memory, on the IDT site which explains
their use.

Hope this helps,

Nial Stewart


------------------------------------------------
Nial Stewart Developments Ltd
FPGA and High Speed Digital Design
www.nialstewartdevelopments.co.uk

Article: 62667
Subject: Re: DCM recover after interruption of input clock
From: Austin Lesea <Austin.Lesea@xilinx.com>
Date: Tue, 04 Nov 2003 07:40:01 -0800

Wolfgang,

There is a status bit, CLKIN_STOPPED that is there just for these events.  If
you have LOCKED go low, OR CLKIN_STOPPED go high, you will need to reset.
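
As a minimal sketch, the condition reads as follows. It is written in
Python for illustration and assumes both flags have already been sampled
into your own logic as plain booleans; the function name dcm_needs_reset is
made up.

```python
def dcm_needs_reset(locked: bool, clkin_stopped: bool) -> bool:
    """True when a reset is called for: LOCKED low OR CLKIN_STOPPED high."""
    return (not locked) or clkin_stopped


# Truth table over (locked, clkin_stopped).
table = {
    (locked, stopped): dcm_needs_reset(locked, stopped)
    for locked in (False, True)
    for stopped in (False, True)
}
```

A real design would also have to avoid holding reset while the DCM is still
acquiring lock after a reset; that is not modelled here.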

Austin

wolfgang wrote:

> hey!
>
> i'm using a virtex2 device for implementing a lvds channel link receiver. To
> recover the bitclock from the incoming clock- line, i use a dcm with
> feedback on clk0.
>
> constant lvds_clk_m             : integer := 7 ;
> constant lvds_clk_d             : integer := 2 ;
> constant lvds_phase_mode        : string  := "FIXED" ;
> constant lvds_phase_value       : integer := 45 ;  -- phase shift value for place and route
> constant lvds_phase_value_udsim : integer := 45 ;  -- phase shift value for unit delay simulation
>
> attribute CLKIN_PERIOD        of dcm_3_5_lvds_clk: label is "30" ;
> attribute CLKOUT_PHASE_SHIFT    of dcm_3_5_lvds_clk: label is
> lvds_phase_mode ;
> attribute PHASE_SHIFT           of dcm_3_5_lvds_clk: label is
> lvds_phase_value ;
> attribute CLKFX_DIVIDE          of dcm_3_5_lvds_clk: label is lvds_clk_d ;
> attribute CLKFX_MULTIPLY        of dcm_3_5_lvds_clk: label is lvds_clk_m ;
> attribute DUTY_CYCLE_CORRECTION of dcm_3_5_lvds_clk: label is "TRUE" ;
> attribute DFS_FREQUENCY_MODE    of dcm_3_5_lvds_clk: label is "LOW";
>
> the incoming clock is a 33 MHz clk with a 57% to 43% duty cycle.
>
> when i disconnect the clock line and plug it in again, the dcm doesn't lock
> again. i tried to reset the dcm at the falling edge of the locked pin to
> ensure a defined startup, but this seems to have no effect.
>
> now my question: what can i do to reset the dcm after an interruption of
> input clock clkin? is there a specific sequence of actions i have to
> execute?
>
> regards
>
> wolfgang

Article: 62668
Subject: Re: Shannon Entropy for Black Holes
From: Austin Lesea <Austin.Lesea@xilinx.com>
Date: Tue, 04 Nov 2003 07:42:32 -0800

John,

Nope.  1st law says that energy is conserved.  Can not lose it.

Austin


John Smith wrote:

> "Jerry Avins" <jya@ieee.org> wrote in message
> news:bo69ks$a3e$1@bob.news.rcn.net...
> > John Smith wrote:
> >
> > > "Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message
> > > news:ZzCob.71991$Fm2.57178@attbi_s04...
> > >
> > >>I read an article in "Scientific American" about how much information can
> > >>be compressed into a certain volume, and apparently all objects have a
> > >>Shannon entropy in addition to the thermodynamic entropy.  Also, black
> > >>holes have a Shannon entropy that is based on the surface area of the
> > >>event horizon.  I was totally lost.   Can anybody else explain how
> > >>Shannon's information theory applies to black holes?
> > >>-Kevin
> > >
> > > For the ignorant (me): what is Entropy?
> > >
> > > Rich
> > >
> > http://www.2ndlaw.com/ will be a good start. Note that if S is entropy, Q
> > the amount of heat -- BTU, Calories -- and T absolute temperature,
> >                             S = Integral(dQ/T).
> > Simplifying: heat, like water, runs downhill, and unless something like a
> > waterwheel or a heat engine extracts energy when it does, some of what
> > had been available energy is permanently lost. The water or heat is all
> > still there, and so is the energy -- just not available. Lost available
> > energy shows up as increased entropy.
> >
> > Two Laws of Thermodynamics have been stated thus:
> >
> > You can't get something for nothing. Water had to be pumped up before it
> > ran down to turn the wheel.
> >
> > You can't even break even. (The second law is about entropy.) Because of
> > inevitable inefficiencies -- friction or moving heat across a temperature
> > gradient -- entropy will increase, and you won't get all of the energy out.
> >
> > Let's leave the Third Law for some other time.
> >
> > Jerry
> > --
> > Engineering is the art of making what you want from things you can get.
> > ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
> >
>
> Aggghhhh.... I'm gonna be on 2ndlaw.com, and its sister, I may be a while...
>
> Thanks for the answers guys. 'Lost' energy (non-recoverable energy) is my
> summary. Correct?
>
> Thanks again
> JS

Article: 62669
Subject: Re: Vendor supplied symbol/part models?
From: Leon Heller <aqzf13@dsl.pipex.com>
Date: Tue, 04 Nov 2003 16:42:32 +0000

John Williams wrote:

> Hi folks,
> 
> Doing a board design with a 456 pin Xilinx FPGA, I find myself in the 
> laborious and potentially error-prone process of building a symbol, 
> footprint and part model from scratch.   I am aware that commercial part 
> libraries are available, but we are a university department and don't 
> have those sort of $$$ to throw around for small-run custom designs.
> 
> Anyway it seems to me that it would be in vendors' interests (Xilinx in 
> this case) to provide verified symbol and footprint models for major 
> design tools (Mentor, Protel etc)?  A quick search of the Xilinx web 
> site didn't turn up anything.
> 
> Is there some point I'm missing here, or are my expectations unreasonable?

Xilinx has Excel spreadsheets with pinouts for many of their chips; they 
help a lot.

The Pulsonix software I use has a part generator import facility that
removes a lot of the hard work when working with large chips. I've
written a Perl script that generates an input file from text extracted
from a PDF file into an Excel spreadsheet; this saves even more time.
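
A rough idea of what such a conversion involves, sketched in Python rather
than Perl. Everything here is hypothetical: the input layout (one
"<pin number> <pin name>" pair per line), the sample pin names and the
comma-separated output are invented for illustration and are not the real
Pulsonix import format or the script described above.

```python
def parse_pinout(text):
    """Parse lines of '<pin number> <pin name>' into (number, name) tuples."""
    pins = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue                  # skip anything that is not exactly two fields
        pins.append((parts[0], parts[1]))
    return pins


def to_part_lines(pins):
    """Emit one 'name,number' line per pin (made-up output format)."""
    return [f"{name},{number}" for number, name in pins]


SAMPLE = """Hypothetical pinout excerpt from a datasheet

A1 GND
A2 IO_L01N_0
B1 VCCO_0
"""

pins = parse_pinout(SAMPLE)
lines = to_part_lines(pins)
```

Text extracted from a PDF usually needs more cleanup than this, so a real
script would want stricter checks on each field.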

Leon

Article: 62670
Subject: Re: Xilinx - Multi Volt Interfacing
From: Peter Alfke <peter@xilinx.com>
Date: Tue, 04 Nov 2003 10:03:15 -0800

Regarding 95288XL:

Input:
The pins are 5-V tolerant, so you can drive a full-swing 5 V signal into
the input.

Output:
When used as an output, it obviously only drives up to the 3.3 V rail.
If your 5-V device uses "TTL" thresholds, it only requires a Vih of 2.4
V, so there is no problem. If your 5-V device has "CMOS" thresholds, you
need a pull-up resistor, and you also need to 3-state the CPLD outputs
(otherwise the pull-up transistor will conduct current backwards and
clamp the pin to Vcc anyhow.)

You may want to read about a circuit trick that speeds up the pull-up
and has been very successful in FPGAs. This is from TechXclusives "Six
Easy Pieces" on the Xilinx website, where you will also find the simple
schematic:

" 5. Driving a 5V Signal from a 3.3V Output

When a CMOS-level 5V input is driven, the output High voltage from a
3.3V device is marginal. If the 3.3V output is 5V tolerant, a pull-up
resistor to 5V can pull the output that is in a 3-state condition all
the way to 5V. The problem is the slow rise time of tens or hundreds of
nanoseconds, which is caused by the capacitive load. This circuit
greatly reduces the rise time by keeping the active pull-up engaged
until the output voltage has passed the threshold voltage of ~1.6V.
Slowing down the internal input signal and 2-input AND gate will speed
up the rise time even more. "
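
To put rough numbers on the rise time, here is a simple RC estimate in
Python. All component values are assumptions, not taken from the article: a
1K pull-up to 5 V, a 50 pF load, and a CMOS input-high level of 3.5 V. The
"assisted" case is a simplification that treats the active pull-up as
bringing the pin to about 1.6 V instantly and leaves the rest to the
resistor.

```python
import math


def rise_time(r_ohms, c_farads, v_start, v_target, v_pullup=5.0):
    """Time for an RC pull-up to move the pin from v_start to v_target."""
    return r_ohms * c_farads * math.log((v_pullup - v_start) / (v_pullup - v_target))


R, C = 1000.0, 50e-12                     # assumed: 1K pull-up, 50 pF load
t_passive = rise_time(R, C, 0.0, 3.5)     # resistor does all the work from 0 V
t_assisted = rise_time(R, C, 1.6, 3.5)    # resistor takes over at about 1.6 V
```

With these values the passive rise takes about 60 ns, in line with the
"tens or hundreds of nanoseconds" mentioned above. Keeping the active
pull-up on for longer raises the handoff voltage and shortens the remaining
RC portion.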

I have never implemented this in a CPLD, but it might work there also.

Peter Alfke


Article: 62671
Subject: I/O on current FPGAs - deserialise first ??
From: Thomas Womack <twomack@chiark.greenend.org.uk>
Date: 04 Nov 2003 18:20:15 +0000 (GMT)

Two common peripheral interfaces are Firewire (400Mbps or 800Mbps) and
USB2 (480Mbps).  These are serial, so you've got incredibly high bit
rates on the incoming pins; significantly higher than the clock rates
of reasonable FPGAs.

Do there exist chips to convert an 800Mbps serial stream to a 50MHz
stream of 16-bit words, and what are they called? I imagine it's not
impractical to hook a couple of those and a couple of SRAMs to a
single FPGA, stream in the signal and then read out little bits of
it if you need to look in the stream for control signals.
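
The arithmetic behind that question, plus a toy model of the conversion,
can be sketched in Python. This only illustrates the idea of packing serial
bits into 16-bit words; it says nothing about Firewire or USB2 framing, and
the MSB-first bit order is an assumption.

```python
def word_rate_mhz(bit_rate_mbps, word_bits=16):
    """Parallel word rate for a given serial bit rate."""
    return bit_rate_mbps / word_bits


def deserialize(bits, word_bits=16):
    """Pack a serial bit sequence (MSB first, an assumption) into words.

    Trailing bits that do not fill a whole word are dropped.
    """
    words = []
    for i in range(0, len(bits) - word_bits + 1, word_bits):
        w = 0
        for b in bits[i:i + word_bits]:
            w = (w << 1) | b
        words.append(w)
    return words


rate = word_rate_mhz(800)                             # 800 Mbps into 16-bit words
words = deserialize([1] + [0] * 15 + [0] * 15 + [1])  # two 16-bit words
```

At 800 Mbps the word rate comes out at 50 MHz, which matches the figures in
the question.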

The proposed application is wire-speed video capture to a Firewire
disc from a Firewire or USB2 camera, using an FPGA in the middle to do
the trivial things like dark-frame subtraction.

Tom

Article: 62672
Subject: Silly ML300 question...
From: nweaver@ribbit.CS.Berkeley.EDU (Nicholas C. Weaver)
Date: Tue, 4 Nov 2003 18:42:45 +0000 (UTC)

Does someone already have a nice skeleton project file for the ML300
demoboard that has all the pins assigned/IO Banks set, but nothing on
the inside?

-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 62673
Subject: help with 120MHz comparator
From: eastwood132@yahoo.com (Ted Lechman)
Date: 4 Nov 2003 11:04:05 -0800

I'm trying to convert a 120MHz 1V p-p AC-coupled sine wave to LVTTL
(3.3V) format for use as a 120MHz clock signal to an FPGA.
1. PECL comparator - I'm trying to avoid using PECL comparators,
because I would have to convert the PECL output to an LVTTL swing,
which just brings me back around to the original problem.
2. "Normal" comparators - I've looked through Linear Tech and other
places, and the max spec is 100MHz. Do you know of any LVTTL
comparators with specs closer to 120MHz?
3. Transistor - There are many high-frequency RF transistors with
excellent small-signal gain around their bias points but lousy
large-signal behaviour. Do you know of any discrete transistors that
will switch fully at 120MHz rates?
Thanks

Article: 62674
Subject: Re: I/O on current FPGAs - deserialise first ??
From: Peter Alfke <peter@xilinx.com>
Date: Tue, 04 Nov 2003 11:12:05 -0800

You can use the Multi-Gigabit Transceivers (MGTs) on any of the
Virtex-II Pro devices. They provide/accept LVDS signals between 622 Mbps
and 3.125 Gbps.
What is your method for ensuring input transitions? The MGTs perform
transparent 8B10B as an option, but if you do not like that, you must
somehow guarantee input transitions for data recovery. There must be
many alternatives at your relatively slow rate.
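
A quick check of the rates involved, as a Python sketch. The range limits
are the ones quoted above, and the 10-bits-per-8 overhead is what 8B10B
coding implies. Whether the actual Firewire or USB2 signalling can be fed
to an MGT is a separate question that this arithmetic does not answer. The
function names are invented for the example.

```python
MGT_MIN_MBPS, MGT_MAX_MBPS = 622.0, 3125.0   # range quoted in this post


def line_rate_8b10b(payload_mbps):
    """8B10B sends 10 line bits for every 8 payload bits."""
    return payload_mbps * 10 / 8


def in_mgt_range(rate_mbps):
    """True if the serial rate falls inside the quoted MGT range."""
    return MGT_MIN_MBPS <= rate_mbps <= MGT_MAX_MBPS


# Rates mentioned in the question: Firewire 400/800 Mbps, USB2 480 Mbps.
checks = {rate: in_mgt_range(rate) for rate in (400.0, 480.0, 800.0)}
encoded = line_rate_8b10b(800.0)
```

Of the three rates only 800 Mbps falls inside the quoted range, and
carrying 800 Mbps of payload with 8B10B would need a 1000 Mbps line rate.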
Peter Alfke
====================
Thomas Womack wrote:
> 
> Two common peripheral interfaces are Firewire (400Mbps or 800Mbps) and
> USB2 (480Mbps).  These are serial, so you've got incredibly high bit
> rates on the incoming pins; significantly higher than the clock rates
> of reasonable FPGAs.
> 
> Do there exist chips to convert an 800Mbps serial stream to a 50MHz
> stream of 16-bit words, and what are they called? I imagine it's not
> impractical to hook a couple of those and a couple of SRAMs to a
> single FPGA, stream in the signal and then read out little bits of
> it if you need to look in the stream for control signals.
> 
> The proposed application is wire-speed video capture to a Firewire
> disc from a Firewire or USB2 camera, using an FPGA in the middle to do
> the trivial things like dark-frame subtraction.
> 
> Tom
