Messages from 7700

Article: 7700
Subject: Re: bidirectional bus problem
From: z80@ds.com (Peter)
Date: Sat, 04 Oct 1997 08:05:43 GMT


>It's certainly possible: heck mere telephones accomplish this.  A telephone
>is a two-wire device (there is no separate ground) and is bidirectional- yet
>there are still such things as telephone line repeaters that don't latch up
>or oscillate.  Why can't your company's FPGAs do it?  They must be inferior
>products :-) :-) :-)

:)

A telephone is an *analog* device, which uses signal cancellation to
do it.

FPGAs have a fundamental limitation here, in that all I/O is buffered.
The only way to do a straight bus which goes right inside the chip is
an ASIC, and even then it is hard - most ASIC vendors don't like doing
it.


Peter.

Return address is invalid to help stop junk mail.
E-mail replies to z80@digiXYZserve.com but
remove the XYZ.

Article: 7701
Subject: Re: Xilinx license idiocy
From: NgOsSmPiAtMh@passport.ca (Gregory Smith)
Date: Sat, 04 Oct 1997 15:46:32 GMT

In article <3432a2a2.111139@news.netcomuk.co.uk>, z80@ds.com (Peter) wrote:
>I congratulate you on posting this information.
>
>Xilinx seem to forget their software is already dongled in a
>completely un-hackable way: a Xilinx FPGA/CPLD.
>
>Why dongle on top of that?
>
>A Xilinx support engineer told me once it is done to limit their tech
>support workload. But Xilinx have often said that software profits are
>only a very small part of their overall profits. In that situation, it
>is sound business to give the stuff away FOR FREE.

The 'free software' argument makes a lot of sense. However,
I have found -- mostly -- that free software tends to be awful. I got
free PALASM from AMD once and it was a waste of 3 hours to
install it, try it, and remove it. I think that the decision-makers
can't justify big development $ for something that they're not
going to charge for directly. That's the root of the problem.

Another question - why not give away the software free, and
then charge to make support available? Unfortunately, with the
way software is these days, those who are calling in for support
are to some extent doing the beta testing... I hate paying
for the privilege of being able to report bugs, but I hate even
more paying *extra* for it... I still think this is a good approach
though, and very easily regulated: post the stuff on the Web.
Post the bug reports in an effective search engine.
Charge for hotline support, under a number of plans; credit the
charges if a bug was found which had not been posted.

Greg

Article: 7702
Subject: Re: Xilinx license idiocy
From: "Richard B. Katz" <stellere_nospam@erols.com>
Date: 4 Oct 1997 17:40:21 GMT

Gregory Smith <NgOsSmPiAtMh@passport.ca> wrote in article
<615rcr$2tt$1@reader1.ftn.net>...
> In article <3432a2a2.111139@news.netcomuk.co.uk>, z80@ds.com (Peter) wrote:

<snip>

> >A Xilinx support engineer told me once it is done to limit their tech
> >support workload. But Xilinx have often said that software profits are
> >only a very small part of their overall profits. In that situation, it
> >is sound business to give the stuff away FOR FREE.

interesting.  but what percentage of their r&d budget is it?  if it is
significant and is 'breaking even' then it might not be sound business to
give it away free as i'm sure the software guys enjoy getting paid :-). 
unless, of course, it attracts more customers who can't pay for it and they
make more profits elsewhere (henry position).

> 
> The 'free software' argument makes a lot of sense. However,
> I have found -- mostly -- that free software tends to be awful. I got
> free PALASM from AMD once and it was a waste of 3 hours to
> install it, try it, and remove it. I think that the decision-makers
> can't justify big development $ for something that they're not
> going to charge for directly. That's the root of the problem.
> 
> Another question - why not give away the software free, and
> then charge to make support available? unfortunately, with the
> way software is these days, those who are calling in for support
> are to some extent doing the beta testing... I hate paying
> for the privilege of being able to report bugs, but I hate even
> more paying  *extra* for it... I still think this is a good approach
> though, and very easily regulated: post the stuff on the Web.
> Post the bug reports in an effective search engine.
> Charge for hotline support, under a number of plans; credit the
> charges if a bug was found which had not been posted.
> 
> Greg
> 

i would go a step further; rather than just post the bug reports, QUICKLY
post patches on the internet for downloading; i know that viewlogic does
this to some extent.  and along with each patch, document what the bugs are
that are being fixed.  it seems that a lot of software releases are fairly
major with new features and involve major shipments (cutting cd's, printing
documents, packing in boxes, shipping, etc.).  using the www or ftp sites,
cae vendors would be able to get out maintenance releases quickly and
cheaply and save engineers using the stuff hassles and not force them to
wait 6-9 months for the next release.

for support, i've seen some companies charge per call, per hour, etc.  i'd
rather pay a flat fee for support say once per year.  this makes it easier
to budget and implement in companies.  for instance, some organizations
have to write a purchase request and find money and get it signed off by
10,000 managers.  i don't think anyone wants to do that when they're
calling for support because at that time, by definition, they are usually
in trouble.

-------------------------------------------------------------
rk

"there's nothing like real data to screw up a great theory" 
- me (modified from original, slightly more colorful version)
--------------------------------------------------------------

Article: 7703
Subject: Altera MAX+PLUS 2 timing backannotation problem
From: muzok@pacbell.net (muzo)
Date: Sat, 04 Oct 1997 20:23:47 GMT

I've synthesised my microcontroller in Verilog using Exemplar Leonardo, P&R'ed it
in Max+plus 2 8.0 and generated a .vo file for back-annotation. The netlist
simulates at over 25 MHz in Silos, but the real 10K100 runs at only 15 MHz. Are the
timing values in Altera .vo files dependable? I compiled for the fastest part
and am using the same speed chip, but I'm still seeing differences between the
netlist and silicon in terms of timing. Any ideas?

thanks

muzo

WDM & NT Kernel Driver Development Consulting <muzok@pacbell.net>

Article: 7704
Subject: Re: Xilinx license idiocy
From: z80@ds.com (Peter)
Date: Sun, 05 Oct 1997 08:30:52 GMT


>The 'free software' argument makes a lot of sense. However,
>I have found -- mostly -- that free software tends to be awful. I got
>free PALASM from AMD once and it was a waste of 3 hours to
>install it, try it, and remove it. I think that the decision-makers
>can't justify big development $ for something that they're not
>going to charge for directly. That's the root of the problem.

That's because PALASM is not AMD-specific. It can be used for almost
any old PAL/GAL.

There is no motivation.

Peter.


Article: 7705
Subject: Re: Xilinx license idiocy
From: z80@ds.com (Peter)
Date: Sun, 05 Oct 1997 08:30:53 GMT


>i would go a step further; rather than just post the bug reports, QUICKLY
>post patches on the internet for downloading; i know that viewlogic does
>this to some extent.  and along with each patch, document what the bugs are
>that are being fixed.  

Very few s/w firms do this - no idea why. It would be SO easy.



Peter.


Article: 7706
Subject: Xilinx xc9500 JTAG programming.
From: Bill Lenihan <lenihan3we@earthlink.net>
Date: Mon, 06 Oct 1997 00:16:17 -0700

Has anyone been running into problems programming Xilinx 95108 or 95216
Flash CPLDs? We are using the Xilinx PC-serial-port-based XChecker cable
and their EZTAG software to download JEDEC files via the JTAG interface
to CPLDs already soldered to the PWB, and we've run into many problems.
Specifically, cases where you can program the device once, but not
thereafter? (No, we aren't enabling the design protection or design
security features.)

The integrity of the JTAG interface seems OK, and we can almost always
read the Device ID, so the chip is being recognized by the software. We
are starting to suspect that the CPLDs' non-JTAG I/O must be held in
a certain static state and/or not dynamically changed while the JTAG
program operation is being performed (i.e., other chips or module I/O
connected to the CPLD must not be toggling).
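If it helps narrow down where programming goes wrong, the IDCODE the software reads back can be sanity-checked against the standard IEEE 1149.1 field layout. A small Python sketch follows; the example value is hypothetical (not a real part number), and treating an all-0s or all-1s read as a broken scan chain is a common heuristic, not something from the post.

```python
XILINX_MFR = 0x049  # Xilinx's JEDEC manufacturer code (IDCODE low 12 bits read 0x093)


def decode_idcode(idcode):
    """Split a 32-bit IEEE 1149.1 IDCODE into its standard fields."""
    if idcode in (0x00000000, 0xFFFFFFFF):
        # Heuristic: a stuck-low or floating-high TDO usually reads like this.
        raise ValueError("all-0s/all-1s read: suspect a broken scan chain")
    if not idcode & 1:
        raise ValueError("bit 0 of a valid IDCODE is always 1")
    return {
        "version": (idcode >> 28) & 0xF,       # bits 31..28
        "part": (idcode >> 12) & 0xFFFF,       # bits 27..12
        "manufacturer": (idcode >> 1) & 0x7FF, # bits 11..1 (JEDEC code)
    }


fields = decode_idcode(0x12345093)  # hypothetical IDCODE
print({k: hex(v) for k, v in fields.items()})
# {'version': '0x1', 'part': '0x2345', 'manufacturer': '0x49'}
```

Reading the IDCODE several times in a row and checking that every read decodes identically is a cheap way to tell an intermittent chain from a device that simply refuses to reprogram.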

Your xc95xxx CPLD programming problems/solutions are hereby solicited.

-- 
=====================================================================
William Lenihan                            lenihan3we@earthlink.net

    "The greatest barrier to communication is the delusion that
     it has already occurred."       -- Peter Cummings
=====================================================================

Article: 7707
Subject: Re: FPGA multiprocessors
From: Charles Sweeney <CharlesSweeney@compuserve.com>
Date: Mon, 06 Oct 1997 09:56:54 +0100

Jan Gray wrote:
> 
> This just in from our paper designs department: the XC4062XL and XC4085XL
> are sooo big...
> 
> The J32 (www3.sympatico.ca/jsgray/homebrew.htm) (a 32-bit RISC in half a
> XC4010) processor's datapath, if redesigned for XC4000XL, should fit nicely
> in 16 rows by 8-9 columns of CLBs.  This got me thinking:
> 
> 16x9    datapath
> 16x5    (guess) control logic
> 16x6    16-entry by 4-word-line instruction cache
> 16x2    page mode DRAM controller (also reqs. 40-50 IOBs)
> ----------------------------------
> 16x22   integrated 32-bit RISC processor (32-bit instructions)
> 8x22    integrated 16-bit RISC processor (16-bit instructions)
> 
> Assuming careful floorplanning, it should be possible to place six 32-bit
> processor tiles, or twelve 16-bit processor tiles, in a single 56x56
> XC4085XL with space left over for interprocessor interconnect.  Also the
> number of processor tiles can be doubled if we eschew the I-cache and
> simplify the microarchitecture -- though performance would greatly suffer.
> 
> Jan Gray
> Redmond, WA

It's good to see you planning to take advantage of the parallelism
offered by FPGAs, but why constrain your software to have to run in a
particular microprocessor architecture? Why not go further and compile
your programs directly into the hardware of the FPGA? Handel-C does
exactly that; please see our web site below.

Charles
-- 

Charles Sweeney, Engineering Director, Embedded Solutions Ltd
Tel/fax +44 1235 510456   <http://www.embedded-solutions.ltd.uk/>
Email CharlesSweeney@compuserve.com or
csweeney@embedded-solutions.ltd.uk
6 Main Road, East Hagbourne, Didcot, Oxfordshire. OX11 9LJ. UK.

Article: 7708
Subject: Re: Wanted: cheap way to learn VHDL
From: "Steven K. Knapp" <sknapp @ optimagic.com>
Date: 6 Oct 1997 16:21:53 GMT

There are a few tutorials listed on The Programmable Logic Jump Station at
'http://www.optimagic.com/tutorials.html'.
-- 
Steven Knapp
OptiMagic, Inc.
E-mail:  sknapp @ optimagic.com
Programmable Logic Jump Station:  http://www.optimagic.com

Brad Eckert <brad4ellie@worldnet.att.net> wrote in article
<60vg69$g9d@bgtnsc02.worldnet.att.net>...
| What's the cheapest way for me to teach myself VHDL?  What are the free
| resources on the net?
| 
| -- Brad Eckert
|

Article: 7709
Subject: Re: FPGA multiprocessors
From: Jack Greenbaum <spamfilt@greenbaum.us.com>
Date: 06 Oct 1997 09:50:51 -0700

"Jan Gray" <jsgray@acm.org.nospam> writes:
> Assuming careful floorplanning, it should be possible to place six 32-bit
> processor tiles, or twelve 16-bit processor tiles, in a single 56x56
> XC4085XL with space left over for interprocessor interconnect. 

You might be interested in another view of single chip multiprocessors.

Patt et al., "One Billion Transistors, One Uniprocessor, One Chip",
IEEE Computer, Sept 1997, pp 51-57.

They argue against multiple processors on a single chip, because it
makes what is already an I/O bound system even worse. Just because you
can put multiple processors on a die doesn't mean you can feed them
instructions and data.

Jack Greenbaum -- jack at greenbaum.us.com

Article: 7710
Subject: Re: bidirectional bus problem
From: Peter Alfke <peter@xilinx.com>
Date: Mon, 06 Oct 1997 10:33:18 -0700

Joseph,
were you trying to be funny ?

Yes, the analog phone uses only two wires for a bidirectional
conversation, and signaling, and detecting Busy, and metering coins, and
releasing coins, and ( in Europe ) indicating the cost of the call.
These are wonderful things our great-great-grandfathers invented in the
analog realm. It gets much more complicated when you throw in gain.
Whenever there is a repeating amplifier in the telephone circuit, the
two directions must first be separated. When you have a long-distance
connection, there really are two channels, one from A to B, the other
one from B to A. And they are totally separate, carried by microwaves,
satellites, or glass fibres. They only meet again in the Central Offices
serving the two end points.

Making a digital connection with inserted amplifiers truly bidirectional
without detecting the direction of data flow and turning selected
amplifiers off is, IMHO, impossible.

But there is always the Nobel prize...

Peter Alfke

Article: 7711
Subject: Re: bidirectional bus problem
From: jhallen@world.std.com (Joseph H Allen)
Date: Mon, 6 Oct 1997 19:48:15 GMT

In article <3435f2b0.217213456@news.netcomuk.co.uk>, Peter <z80@ds.com> wrote:

>>It's certainly possible: heck mere telephones accomplish this.  A
>>telephone is a two-wire device (there is no separate ground) and is
>>bidirectional- yet there are still such things as telephone line repeaters
>>that don't latch up or oscillate.  Why can't your company's FPGAs do it? 
>>They must be inferior products :-) :-) :-)

>A telephone is an *analog* device, which uses signal cancellation to
>do it.

There's no reason this can't be done in the digital world- in fact it works
better because in the digital world you have noise margins.  All you need is
a few resistors:

Tx --+--|>-----+------R------+
     |         |             |
     |         R             |
     |         |             |
     |         +-a         b-+--|>-- Rx
     |         |             |
     |         R             |
     |         |             |
     +--|>o----+----+  +-----+
                    |  |
                    |  |
                    line
                To other end
          (duplicate above circuit)

Where: Tx is data to transmit
       Rx is received data
       |>   is a non-inverting buffer
       |>o  is an inverting buffer
       R    is a resistor.  All Rs are of the same value

       All buffers must be CMOS.

       An improved circuit can be made by replacing the receive buffer with
       a Schmitt trigger, or better with a differential receiver with
       hysteresis (such as an RS-422 line receiver) between points a and b.

       The circuit works best when R is chosen to match the impedance of the
       line (100ohms for twisted pair).

* Note that if you don't connect the line, the bridge will be unbalanced and
  Rx will receive whatever is sent on Tx.  This can be useful.
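A minimal idealized DC model in Python shows why the bridge cancels the local transmitter. It assumes (my assumptions, not part of the post) rail-to-rail buffers with voltages measured from mid-supply, and a matched line modelled as a resistance R in series with a Thevenin source carrying the far end's data; reflections and line transients are ignored, and the function names are hypothetical.

```python
# Idealized DC model of the resistor bridge (hybrid) described above.
# Assumptions, not from the post: rail-to-rail CMOS buffers; voltages are
# measured from mid-supply, so the non-inverting output P is +V for a 1 and
# -V for a 0, and the inverting output N is always -P. A connected, matched
# line is modelled as resistance R in series with a Thevenin source far_e
# that carries the far end's data. Reflections and transients are ignored.

V = 2.5  # hypothetical half-supply swing, volts


def drive(bit):
    """Buffer outputs (P, N) for one transmit bit."""
    p = V if bit else -V
    return p, -p


def bridge_nodes(tx_bit, far_e, line_connected=True):
    """Return the voltages at bridge nodes a and b."""
    p, n = drive(tx_bit)
    a = (p + n) / 2  # P-R-a-R-N divider: always mid-supply
    if not line_connected:
        b = p  # no current through the top R, so b simply follows P
    else:
        # KCL at b: (p - b)/R = (b - n - far_e)/R  ->  b = (p + n + far_e)/2
        b = (p + n + far_e) / 2
    return a, b


def receive(tx_bit, far_bit, line_connected=True):
    """Differential receiver between a and b (e.g. an RS-422 receiver)."""
    far_e = V if far_bit else -V  # hypothetical far-end signal on the line
    a, b = bridge_nodes(tx_bit, far_e, line_connected)
    return int(b > a)


for tx in (0, 1):
    for far in (0, 1):
        print(f"tx={tx} far={far} rx={receive(tx, far)} "
              f"rx_unconnected={receive(tx, far, line_connected=False)}")
```

Because R appears on both sides of the KCL equation it drops out: the local transmitter cancels as long as the line looks like the same R as the other three arms, which is why the post recommends matching R to the line impedance.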

An interesting possibility for a network is for each node in the network
to have two of the above circuits, A and B, where A's Rx is connected to B's
Tx.  B's Rx goes to the network controller, which passes any packet not
destined for the node as well as any packet originating from the node to A's
Tx.  The network controller should drop any received packets which were sent
by the node.

To complete the network between three nodes (1, 2 and 3) say, you just
string them together: n.c.---A1B-----A2B-----A3B---n.c.

Because of *, the signal bounces at the unconnected ends and you get a ring
network without an actual physical ring.
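The routing behaviour described above can be traced with a short Python sketch. It assumes nodes numbered 1..n from left to right, that a packet destined for a node is consumed there, and that the hard-wired A-Rx to B-Tx path forwards traffic rightward without involving any controller; `ring_route` is a hypothetical name.

```python
def ring_route(n_nodes, src, dst):
    """Controllers that see a packet sent by node `src` (nodes 1..n_nodes).

    A controller transmits only out of its A side (leftward) and receives
    only from its B side. Rightward traffic is the hard-wired A-Rx -> B-Tx
    path, which no controller sees.
    """
    seen = []
    node = src
    while True:
        if node > 1:
            node -= 1  # lands on the B side of the neighbour to the left
        else:
            # Node 1's A side is unconnected: the echo re-enters A1's Rx,
            # rides the hard-wired path rightward through every node, and
            # echoes off the unconnected right end into the last controller.
            node = n_nodes
        seen.append(node)
        if node == dst:
            return seen, "delivered"
        if node == src:
            return seen, "dropped by sender"


print(ring_route(3, 2, 3))   # ([1, 3], 'delivered')
print(ring_route(3, 2, 99))  # ([1, 3, 2], 'dropped by sender')
```

Every controller sees each packet once, in ring order, which is the "ring without a physical ring" property.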

I tried this once with RS-422 line drivers and receivers.  It works fine.

-- 
/*  jhallen@world.std.com (192.74.137.5) */               /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Article: 7712
Subject: Re: High Speed FPGAs
From: gah@u.washington.edu (G. Herrmannsfeldt)
Date: 6 Oct 1997 22:51:31 GMT

I would guess that you need an external PLL built from whatever can run
that fast, followed by a shift register that can run that fast and collect
enough bits so that slower logic can complete the task.

For example, a three-bit shift register that can shift at 1 GHz, followed
by a latch that captures its output at 333.333 MHz and feeds the rest of
the logic at that frequency.  Why three?  To minimize the cost of
higher-speed parts, and to show that the high-speed section doesn't have to
follow byte boundaries.  A different divisor may well optimize the
cost; you can decide that yourself.
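A behavioural sketch of the 1:3 example in Python, assuming MSB-first bits and that word alignment is already known (a real receiver would need some framing to find the boundaries). It models only the logic, not the analog PLL or any timing.

```python
SERIAL_HZ = 1_000_000_000  # 1 GHz shift clock, from the example
WIDTH = 3                  # bits collected per parallel word


def deserialize(bits, width=WIDTH):
    """Shift bits in MSB-first; latch the register every `width` shift clocks."""
    mask = (1 << width) - 1
    shift_reg = 0
    words = []
    for count, bit in enumerate(bits, start=1):
        shift_reg = ((shift_reg << 1) | bit) & mask  # fast side
        if count % width == 0:
            words.append(shift_reg)  # slow-side latch, clocked at SERIAL_HZ / width
    return words


print(SERIAL_HZ / WIDTH)                # parallel clock, about 333.33 MHz
print(deserialize([1, 0, 1, 1, 1, 0]))  # [5, 6]
```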

I do believe that you need a good analog PLL, though, to do this.

Then again, I could be wrong.

-- glen

Article: 7713
Subject: How fast can fully pipelined XC4000 logic go?
From: gah@u.washington.edu (G. Herrmannsfeldt)
Date: 6 Oct 1997 22:58:36 GMT

I am working on a design using XC4000 logic, which I want to be as
fast as possible.  The basic design is a pipelined systolic array
processor, so it is already well pipelined.

What I want to do is add more pipeline stages, so I can run even faster.

I have designs that have only one or two CLBs' worth of logic between
latch stages.  The CLB will be arithmetic logic, such as 16 bit adders or
comparators.

How fast can I go with something like XC4013E or even an EX part, such
as XC4028EX?

I am looking for a little more than what is in the Xilinx databook.

thanks,

-- glen

Article: 7714
Subject: Re: Need help for Xilinx Demo Board
From: gavin@cypher.co.nz (Gavin Melville)
Date: Tue, 07 Oct 1997 00:11:37 GMT

On Thu, 02 Oct 1997 08:47:01 -0700, davidtle@SoCA.com wrote:

>I got an old version of the Xilinx demo board, XC40XX-PC84 REV. 2 ASSEMBLY #
>0430454, last weekend at the ACP computer show. Could someone please tell
>me where to get documentation for this demo board?
>
>Thanks a lot for your help.

The schematic is in the "hardware and peripherals guide" that comes
with XACT.    I can FAX if you don't have it.
--
Gavin Melville
gavin@cypher.co.nz

Article: 7715
Subject: Re: bidirectional bus problem
From: Ray Andraka <no_spam_randraka@ids.net>
Date: Mon, 06 Oct 1997 20:28:31 -0400

Peter Alfke wrote:
> 
> Your problem IMHO is either trivial or impossible.
> 
> If you know the direction of data flow, then you just activate either
> the IBUF or the OBUF to connect the internal bidirectional bus to the
> external one.
> If you don't know whether the source is inside the chip or outside, it
> is either impossible or it needs somebody smarter than I am.
> 
> The problem is that you go through an amplifier whenever you enter or
> exit the chip, and that amplifier is in your way, since it would just
> latch up.
> But you never know, somebody may give you an answer...
> 
> Peter Alfke, Xilinx Applications

Assuming there are drivers both inside the Xilinx and outside it (if
there weren't, there would be no need for a bidirectional interface),
then somewhere in your Xilinx design there is logic, or there are signals
from outside, that determine when the internal drivers need to drive.  The
controls for the bidirects on the chip periphery should be derived from
those controls, no?  In other words, all the information to determine data
direction is already available on the chip.  Decoding it fast enough
to be useful may be a different story.  Another point: if speed is at
all an issue, you are often better off keeping the internal portion as
separate input and output busses or logic.
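Ray's point can be sketched as a behavioural Python model of one bit of an internal tristate bus tied to a bidirectional pin: the OBUF enable is simply the OR of the internal driver enables, and when none is active the IBUF path carries the external value in. The function name and interface are hypothetical, and the model says nothing about how fast that decode can be made.

```python
def resolve_pad(internal_drivers, pad_in):
    """One bit of an internal tristate bus tied to a bidirectional pad.

    internal_drivers: list of (enable, value) pairs for on-chip drivers.
    pad_in: value an external device drives on the pin; it only matters
    when no internal driver is enabled.
    Returns (internal_bus_value, obuf_enable).
    """
    active = [value for enable, value in internal_drivers if enable]
    if len(active) > 1:
        raise ValueError("bus contention: more than one internal driver enabled")
    if active:
        # An internal driver owns the bus, so the same enable turns the OBUF on.
        return active[0], True
    # No internal driver: the OBUF is off and the IBUF feeds the pin inward.
    return pad_in, False


print(resolve_pad([(False, 0), (True, 1)], pad_in=0))  # (1, True)
print(resolve_pad([(False, 1), (False, 1)], pad_in=0))  # (0, False)
```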

-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 7716
Subject: Re: FPGA multiprocessors => vs. uniprocessors
From: "Jan Gray" <jsgray@acm.org.nospam>
Date: 7 Oct 1997 05:30:04 GMT

Jack Greenbaum <spamfilt@greenbaum.us.com> wrote in article
<ljg1qehm38.fsf@greenbaum.us.com>...
> "Jan Gray" <jsgray@acm.org.nospam> writes:
> > Assuming careful floorplanning, it should be possible to place six 32-bit
> > processor tiles, or twelve 16-bit processor tiles, in a single 56x56
> > XC4085XL with space left over for interprocessor interconnect. 
> 
> You might be interested in another view of single chip multiprocessors.
> 
> Patt et al., "One Billion Transistors, One Uniprocessor, One Chip",
> IEEE Computer, Sept 1997, pp 51-57.
> 
> They argue against multiple processors on a single chip, because it
> makes what is already an I/O bound system even worse. Just because you
> can put multiple processors on a die doesn't mean you can feed them
> instructions and data.
> 
> Jack Greenbaum -- jack at greenbaum.us.com
> 

This month's IEEE Computer was certainly a blockbuster.
With all respect to Dr. Patt and the U.Mich guys, whose
earlier HPS ideas are proven and shipped in nearly every
high-end microprocessor, and whose present paper
is most intriguing, I'm afraid I was somewhat more convinced
by the Hammond et al. paper "A Single-Chip Multiprocessor"
in the same issue.  In particular, I guess I don't yet
believe that branch predictors can get as good as they
say, especially on real world spaghetti object-oriented code.
But who knows where compiler technology, especially dynamic
recompilation, will take us in the years to come.

Also, Patt et al.'s statement is:
	"In our view, if the design point is performance, the
	implementation of choice is a very high-performance
	uniprocessor on each chip, with chips interconnected
	to form a shared memory multiprocessor."

This does not appear to be a statement about throughput or
about price/performance.  Indeed, today, processor implementations
are usually compared using single threaded benchmarks.  But
I think this will change in the next decade.  For example, my
real job is writing infrastructure for transaction processing
for distributed COM objects, and we can certainly keep many
many threads busy with useful work.
  (//www.microsoft.com/transaction)
  (//www.microsoft.com/backoffice/scalability/billion.htm)

Anyway, back to my posting about multiprocessor FPGAs.
I wrote it not because I seriously think it's the best way
to use FPGAs, but because I thought it remarkable that
FPGAs are now large enough to host half a dozen or a dozen
modest simple RISC processors.

And although I pointed out it was a paper design, I did
account for providing adequate instruction/data bandwidth
to each processor.  In the case of the 6-way 32-bit RISC
multiprocessor, there were 6 small I-caches and six separate
memory ports to DRAM (could be SRAM, I suppose).
The XC4085XL has enough IOBs and pins to do this.

(Certainly if Xilinx hurries up and licenses the RAMBUS interface
there will be pins and bandwidth galore.)
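As a rough check of the tile numbers from the original post quoted earlier in the thread (16-row tiles built from 9+5+6+2 columns, a 56x56 CLB XC4085XL, 40-50 IOBs per DRAM controller), a few lines of Python do the arithmetic. This is naive grid packing that ignores routing congestion and real placement constraints.

```python
CLB_ROWS = CLB_COLS = 56  # XC4085XL CLB array, as given in the thread

# Column budget per 16-row tile, from the earlier post.
TILE_COLS = {"datapath": 9, "control (guess)": 5, "I-cache": 6, "DRAM ctrl": 2}


def max_tiles(tile_rows, tile_cols):
    """Upper bound on tiles, ignoring routing and interconnect space."""
    return (CLB_ROWS // tile_rows) * (CLB_COLS // tile_cols)


cols = sum(TILE_COLS.values())
print(cols, max_tiles(16, cols))   # 22 6   (32-bit tiles)
print(max_tiles(8, cols))          # 14     (16-bit tiles; twelve leaves room)
spare = CLB_ROWS * CLB_COLS - 6 * 16 * cols
print(spare)                       # 1024 CLBs left for interconnect
iobs = [6 * n for n in (40, 50)]
print(iobs)                        # [240, 300] IOBs for six DRAM ports
```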

Also, FPGA RISC processors are of course relatively slow.
A straightforward 3- or 4-stage pipelined implementation
of a single-issue machine should go 40 MHz or so.
A more deeply pipelined microarch. could approach 80 MHz.
(In today's XC4000XLs we are unlikely to significantly
exceed 100 MHz because that is approximately the
register file (distributed RAM block) cycle rate.)

Their slowness means their bandwidth needs are more modest.
And their issue penalty for cache misses is also less severe.
So you don't need big caches, just a fairly efficient
memory cycle -- SRAM, page mode EDO, or open-bank
SDRAM accesses.

Now, the multiprocessor I originally described had (say)
six separate memory banks, each private to a processor.
A more useful and more readily programmed machine
would provide a single shared memory.  I'm thinking
of a design where (say) 4 processors contend for 2 or 4
address-interleaved banks of memory.  You use the
center part of the FPGA as a 4x2 or maybe 4x4 x32
crossbar switch so that several processors
can simultaneously access separate memory banks.
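A sketch of the address-interleaved crossbar idea in Python, assuming 32-bit word interleaving and fixed-priority arbitration (lower processor number wins, losers retry next cycle). Both choices are mine, not part of the post, and the names are hypothetical.

```python
WORD_BYTES = 4  # assumed 32-bit word interleaving


def bank_of(addr, n_banks):
    """Consecutive words map to consecutive banks."""
    return (addr // WORD_BYTES) % n_banks


def crossbar_cycle(requests, n_banks):
    """One arbitration cycle of a processors x banks crossbar.

    requests: {processor_id: byte_address}. Lower-numbered processors win
    conflicts (fixed priority); losers would retry on the next cycle.
    Returns {processor_id: bank} for the requests granted this cycle.
    """
    granted, busy = {}, set()
    for proc in sorted(requests):
        bank = bank_of(requests[proc], n_banks)
        if bank not in busy:
            busy.add(bank)
            granted[proc] = bank
    return granted


print(crossbar_cycle({0: 0x00, 1: 0x04, 2: 0x08, 3: 0x10}, 4))
# {0: 0, 1: 1, 2: 2}  (processor 3 collides with processor 0 on bank 0)
```

Fixed priority keeps the sketch short but can starve the highest-numbered processor; a round-robin pointer per bank would be the obvious refinement.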

Of course, I haven't simulated this, so don't take it too
seriously.

Finally, let's discuss where we can go with a fast uniprocessor
on a large FPGA.  I have given this some thought over the
years.

One big challenge is register file design.  The custom guys don't
blink (much) at producing 1-cycle 8-read 4-write register files for
their wide issue superscalars.  But given today's Xilinx distributed
RAMs this is unachievable.  In my processors I do 1-cycle
2-read 1-write register files using 2 copies of the 1-read 1-write
distributed RAM.  But doing 1-cycle fully arbitrary n-read 2-write
reg files is damn hard.  Instead it is better to move to an LIW
ISA with multiple independent register files.  For instance a
2-issue LIW would have instructions like:
	op1 dest1 src1a src1b    op2 dest2 src2a src2b
where dest1 is retired into reg file 1 and dest2 into reg file 2.
With a few more copies of reg file 1 and reg file 2 we can allow
some or all the source operands to read from either reg file.
(For instance we can build dual 3-read 1-write reg files with
six words of distributed RAM.)
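The register-file replication trick and the LIW retire rule can be modelled in Python. This is a behavioural sketch under my own assumptions: 16 registers per file, 32-bit results, and read ports handed out in operand order; `ReplicatedRegFile` and `issue_liw` are hypothetical names. Two 3-read files use six 1R1W copies in total, in the spirit of the dual 3-read 1-write example.

```python
import operator

MASK32 = 0xFFFFFFFF
OPS = {"add": operator.add, "sub": operator.sub}


class ReplicatedRegFile:
    """n-read, 1-write register file built from n copies of a 1R1W RAM."""

    def __init__(self, n_read, n_regs=16):
        self.copies = [[0] * n_regs for _ in range(n_read)]

    def write(self, reg, value):
        for ram in self.copies:  # the single write port updates every copy
            ram[reg] = value

    def read(self, port, reg):
        return self.copies[port][reg]  # each read port owns one copy


def issue_liw(rf1, rf2, slot1, slot2):
    """Execute one bundle: op1 dest1 src1a src1b | op2 dest2 src2a src2b.

    Sources are (file, reg) pairs and may name either file while that file
    still has a free read port; dest1 retires to file 1, dest2 to file 2.
    """
    files = {1: rf1, 2: rf2}
    next_port = {1: 0, 2: 0}

    def fetch(src):
        f, reg = src
        port = next_port[f]
        next_port[f] += 1  # IndexError in read() if the file runs out of ports
        return files[f].read(port, reg)

    # All operands are read before either result is written back.
    results = [OPS[op](fetch(a), fetch(b)) & MASK32 for op, _, a, b in (slot1, slot2)]
    rf1.write(slot1[1], results[0])
    rf2.write(slot2[1], results[1])
    return results


rf1, rf2 = ReplicatedRegFile(3), ReplicatedRegFile(3)  # six RAM copies in total
rf1.write(1, 10)
rf2.write(2, 5)
result = issue_liw(rf1, rf2, ("add", 3, (1, 1), (2, 2)), ("sub", 4, (1, 1), (2, 2)))
print(result)  # [15, 5]
```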

Another challenge is ALU delay.  Including clock-to-out,
ALU latency, and setup time and interconnect, etc.,
this can be >20 ns.  Speeding this up requires either
putting a register in the middle of the adder (not good) or
duplicating the adder for even and odd cycles (good) with a two-cycle
adder delay.  Using this technique you can probably clock
a processor at 66 or 80 MHz.
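A cycle-level Python sketch of the even/odd adder scheme, assuming a two-cycle adder latency: each adder accepts new operands only every other cycle, yet one add still issues per cycle. For scale, a single-cycle path of just over 20 ns caps the clock below 50 MHz (1/20 ns = 50 MHz), which is why giving each add two cycles helps. Names are hypothetical.

```python
def interleaved_adds(operand_pairs, latency=2):
    """Issue one add per cycle to two adders used on alternate cycles.

    Adder k = t % 2 captures operands at cycle t and has `latency` cycles to
    finish, so each adder sees a new operation only every other cycle.
    Returns (ready_cycle, adder, sum) tuples.
    """
    free_at = [0, 0]
    out = []
    for t, (x, y) in enumerate(operand_pairs):
        k = t % 2
        if free_at[k] > t:
            raise RuntimeError("adder reused before it finished")
        free_at[k] = t + latency
        out.append((t + latency, k, (x + y) & 0xFFFFFFFF))
    return out


print(interleaved_adds([(1, 2), (3, 4), (5, 6)]))
# [(2, 0, 3), (3, 1, 7), (4, 0, 11)]
```

With a latency of 3 the third add would find adder 0 still busy, which is why two adders pair naturally with a two-cycle delay.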

Put these ideas together and one can certainly see a
66 MHz 2 issue LIW in a XC4013E and perhaps a 4 issue
VLIW in a XC4036XL.  But for the latter you need a very
good optimizing compiler.

Cheers,
Jan Gray
Redmond, WA

Article: 7717
Subject: Re: FPGA multiprocessors
From: "Jan Gray" <jsgray@acm.org.nospam>
Date: 7 Oct 1997 05:30:17 GMT

Charles Sweeney <CharlesSweeney@compuserve.com> wrote in article
<3438A7D6.2431@compuserve.com>...
> Jan Gray wrote:
> > Assuming careful floorplanning, it should be possible to place six 32-bit
> > processor tiles, or twelve 16-bit processor tiles, in a single 56x56
> > XC4085XL with space left over for interprocessor interconnect.  Also the
> > number of processor tiles can be doubled if we eschew the I-cache and
> > simplify the microarchitecture -- though performance would greatly suffer.
> 
> It's good to see you planning to take advantage of the parallelism
> offered by FPGAs, but why constrain your software to have to run in a
> particular microprocessor architecture? Why not go further and compile
> your programs directly into the hardware of the FPGA? Handel-C does
> exactly that; please see our web site below.

Good question.

The trite answer is since designing processor ISAs and microarchitectures
for FPGA implementations is my research interest, that's my hammer
in search of nails.  FPGA multiprocessors are now possible -- but it
remains to be seen if they are actually useful!

The other answer is that I don't preclude a modest custom
datapath per processor (and such datapaths could be designed
from source code by tools such as Handel-C).  So I think an FPGA
multiprocessor is the preferred solution for problems which:
1. are amenable to n-way "outer loop" parallelism and
2. involve too much irregular computation for custom datapath only and
3. involve enough inner loop regular computation that an FPGA
custom datapath is faster/cheaper than a general purpose processor
or multiprocessor built of same.
(Whether such problems exist and are important remains to be seen.)

As for your question "why not go further and compile your
programs directly into the hardware of the FPGA?" :-

There will always be very regular signal processing applications,
regular in computation, regular in operand fetch and result store,
and relatively simple in the computation kernel, for which a custom
datapath compiled to an FPGA is a good solution.

But there are also other computations which are either
too irregular or too large to practically implement in an FPGA
datapath, even in a time-multiplexed (reconfiguration) manner.

The "outer loops" and "outer function calls" of these
computations are best done in a general purpose processor,
even as you move the inner loop(s) to a custom datapath.
Indeed, the inner loops may constitute only a few percent
of the total text of the source code of the computation.

To help these large "dusty deck" applications take advantage
of custom datapaths, it must be extremely convenient to
interface the custom stuff to the general purpose processor.

For some problems where even the irregular computation
is a critical path, especially those involving floating-point,
it probably makes sense to choose a fast, cheap
commercial off-the-shelf microprocessor.
Of course there are penalties here.  Cost of processor.
Less integration.  Board real-estate costs.  "Representation
domain crossing" costs.  Relatively slow communication
between processor and FPGA.  Cost of FPGA resources
spent interfacing to processor.

But for problems where the irregular computation is
not the critical path, the now modest overhead (10-20%)
of an embedded general purpose CPU enables an
interesting integrated "system on chip" hybrid:
embedded processor, on-chip bus, on-chip custom
datapaths and peripherals.

In theory, you could compile your dusty deck C, C++,
Java, FORTRAN, Scheme, etc. and run it immediately
on your FPGA CPU.  Then automatically (profile driven)
or through explicit directives, you can compile the inner
loops to a custom datapath.  This can either be manifest
as an on-chip command oriented coprocessor, or in some
cases as new instructions.  The latter has the potential
advantage of very high custom operation issue rates
(today, 66 MHz) and access to processor register
file, etc.

Given this approach, even if your dusty deck app stores
its data in such advanced data structures (sarcasm)
as a linked list (/sarcasm), it can still potentially take
advantage of a custom datapath.  This is much less
feasible if your registers or operand(s) are microseconds
away on the non-embedded host processor.

For example, the unused logic in
    //www3.sympatico.ca/jsgray/sld021.htm
was reserved for the Gouraud rendering instructions described 
in the last paragraph in:
    //www3.sympatico.ca/jsgray/render.txt

Of course, embedded processor in programmable logic is just
one point on the CPU/custom datapath spectrum.  See also
the BRASS research
  //http.cs.berkeley.edu/Research/Projects/brass
and my old essay on FPGA PC coprocessors
  //www3.sympatico.ca/jsgray/coproc.txt

Jan Gray
Redmond, WA

Article: 7718
Subject: Re: FPGA multiprocessors
From: "Jan Gray" <jsgray@acm.org.nospam>
Date: 7 Oct 1997 06:06:31 GMT

In article <EHHn69.GA1@world.std.com>, jhallen@world.std.com (Joseph H Allen) wrote:
> Jan Gray <jsgray@acm.org.nospam> wrote:
> >This just in from our paper designs department: the XC4062XL and XC4085XL
> >are sooo big...
> >
> >The J32 (www3.sympatico.ca/jsgray/homebrew.htm) (a 32-bit RISC in half a
> >XC4010) processor's datapath, if redesigned for XC4000XL, should fit nicely
> >in 16 rows by 8-9 columns of CLBs.  This got me thinking:
> 
> I took a look at your web site... you should try to make your processor MIPS
> R3000 compatible (minus the multiply, divide, variable endianness,
> coprocessor, trap handling and MMU).  The R3000 is clean and simple, so you
> might be able to do it without too much work.  I've made hand-held R3000
> computers before- it's easy to get GNU C to generate ROMable code and so on.
> It would be very nice if there were a low-power fully static version of the
> R3000 for handheld applications, and you could probably get a lot of money
> for it if you wanted.  With the R3041 from IDT for example, you have to turn
> the power off to the microprocessor and it's a big mess.  You can get 4
> rechargeable AA cells to last for about 3 weeks on standby, but only for an
> hour if the processor is on continually.  Generally it would be cool to have
> a C compiler for your processor.  You could try getting the MIPS linux port
> to run...

Thanks for your comments.  For compilers, I was looking at lcc, but
GNU would be better in that it also provides a C++ front end.
As for hosting an OS, you will notice J32 has no MMU.  I have
some promising schemes for reasonable MMUs for FPGA
implementation along the lines of 1 bit per 16 KB page write
protection (only) and some related software conventions.  Either
that or slow sequential (nonassoc) TLB entry lookup.
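To make the first scheme concrete, here is a minimal behavioral sketch, assuming a 32-bit address space, 16 KB pages and one write-protect bit per page. The class and method names are hypothetical and not part of J32; the point is that a store check needs only a shift and a single bit lookup, with no associative match.

```python
PAGE_SHIFT = 14                      # 16 KB pages: 2**14 bytes
NUM_PAGES = 1 << (32 - PAGE_SHIFT)   # 2**18 pages in a 32-bit address space


class WriteProtectBitmap:
    """Hypothetical model: one write-protect bit per 16 KB page."""

    def __init__(self):
        # 2**18 bits = 32 KB of state; a set bit means the page is read-only
        self.bits = bytearray(NUM_PAGES // 8)

    def set_protect(self, addr, protected):
        page = (addr & 0xFFFFFFFF) >> PAGE_SHIFT
        mask = 1 << (page & 7)
        if protected:
            self.bits[page >> 3] |= mask
        else:
            self.bits[page >> 3] &= ~mask & 0xFF

    def store_faults(self, addr):
        """True if a store to addr should trap (page is write-protected)."""
        page = (addr & 0xFFFFFFFF) >> PAGE_SHIFT
        return bool((self.bits[page >> 3] >> (page & 7)) & 1)
```

Note that the whole table is 2^18 bits (32 KB), so in a real design its size matters as much as the lookup logic.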

I am quite familiar with the MIPS architecture.  I've been a fan ever since
John Mashey posted his old MIPS Performance Briefs to comp.arch
in the late '80s.  And I bought MIPS at their IPO.  Alas.  The R4000
had such promise but they took too long to get it out and ARC died.
But I digress.

The J32 is designed to be as close to MIPS as possible but yet
achieve good performance in half an XC4010.  In particular there are
no PC-relative branch instructions because the jump instructions
are adequate and adding another adder, mux, and bus would add
at least 20% to the datapath area.

I don't recall why I chose to use condition codes instead of slt
and bne etc.  But two years ago it seemed like a good idea.

If we eliminate J32 condition codes then it would be
straightforward to write a MIPS to J32 cross assembler.
It would map multiplies, exotic shifts, floating point, etc.
into calls to emulation routines.
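As an illustration of such an emulation routine, here is a minimal shift-and-add sketch of MIPS `multu` (unsigned 32x32 multiply with the 64-bit result split into HI and LO), using only shifts, adds and bit tests. The function name is hypothetical, and a real routine would be written in J32 assembly rather than Python.

```python
MASK32 = 0xFFFFFFFF


def emulate_multu(rs, rt):
    """Hypothetical emulation of MIPS multu by shift-and-add.

    Returns (hi, lo): the upper and lower 32 bits of the 64-bit product.
    """
    multiplicand = rs & MASK32
    multiplier = rt & MASK32
    product = 0
    for _ in range(32):
        if multiplier & 1:          # add when the current multiplier bit is set
            product += multiplicand
        multiplicand <<= 1          # next bit position is worth twice as much
        multiplier >>= 1
    return (product >> 32) & MASK32, product & MASK32
```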

BTW, if I recall correctly, MIPS has patents on lwl lwr etc.

Jan Gray
Redmond, WA

Article: 7719
Subject: Re: FPGA multiprocessors
From: Achim Gratz <gratz@ite.inf.tu-dresden.de>
Date: 7 Oct 1997 10:01:38 +0200
Links: << >> << T >> << A >>

"Jan Gray" <jsgray@acm.org.nospam> writes:

> The trite answer is since designing processor ISAs and
> microarchitectures for FPGA implementations is my research interest,
> that's my hammer in search of nails.  FPGA multiprocessors are now
> possible -- but it remains to be seen if they are actually useful!

When trying to put multiprocessors on a single chip, I think you'll
find that the only way to keep them fed is to get their data from the
same chip (preferably close to the processing element), which limits
the usefully exploitable algorithms tremendously.  That is not to say
they don't exist, only that the bandwidth problem becomes worse and any
algorithm that is already bandwidth limited will see no benefit.  So
what needs to be done is find new applications and algorithms or dig
out the old ones that were able to work from and to tape.
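The bandwidth argument can be put in numbers with a simple roofline-style bound (a standard model, not something taken from the post): throughput is the lesser of aggregate compute and memory bandwidth times operations per byte. The function name and the figures in the usage note are illustrative assumptions only.

```python
def attainable_ops_per_sec(n_pes, ops_per_pe, bandwidth_bytes, ops_per_byte):
    """Roofline-style bound: throughput is capped by compute or by memory.

    n_pes           -- number of on-chip processing elements
    ops_per_pe      -- peak operations per second of one element
    bandwidth_bytes -- bytes per second the chip can move in and out
    ops_per_byte    -- operations performed per byte moved (data reuse)
    """
    compute_bound = n_pes * ops_per_pe
    memory_bound = bandwidth_bytes * ops_per_byte
    return min(compute_bound, memory_bound)
```

Assuming 66 M operations per second per processor and 100 MB/s of off-chip bandwidth at one operation per byte, going from one processor to two raises the bound from 66 M to 100 M, but four processors still reach only 100 M. Raising reuse of on-chip data to four operations per byte lifts the cap to 400 M, above the 264 M that four processors can deliver.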

Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW:    http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gratz@ite.inf.tu-dresden.de
Phone:  +49 351 463 - 8325

Article: 7720
Subject: Re: Need help for Xilinx Demo Board
From: Frank Gilbert <gilbert@informatik.uni-kl.de>
Date: Tue, 07 Oct 1997 14:15:52 +0100
Links: << >> << T >> << A >>

davidtle@SoCA.com wrote:
> 
> I got old version of xilinx demo board, XC40XX-PC84 REV. 2 ASSEMBLY #
> 0430454, last weekend at ACP computer show. Please, some one can show
> me where to get documentation about this Demo board.

Dear David,

you need an old version of the "XACT Hardware & Peripherals Guide". We
have a printed copy; it's from April 1994. The older boards are
not described in the current documentation.

Email me, if you can't find it online.

Frank

____________________________________________________________________

Frank Gilbert                       | University of Kaiserslautern
mailto:gilbert@informatik.uni-kl.de | Center for Microelectronics (ZMK)
                                    | Erwin-Schroedinger-Strasse
                                    | 67663 Kaiserslautern

Article: 7721
Subject: Re: Wanted: cheap way to learn VHDL
From: timolmst@cyberramp.net
Date: Tue, 07 Oct 1997 13:16:19 GMT
Links: << >> << T >> << A >>

"Brad Eckert" <brad4ellie@worldnet.att.net> wrote:

>What's the cheapest way for me to teach myself VHDL?  What are the free
>resources on the net?

>-- Brad Eckert

Check out www.aldec.com for a free vhdl tutorial.


Tim Olmstead
webmaster of the CP/M Unofficial web page
http://cdl.uta.edu/cpm

Article: 7722
Subject: Re: bidirectional bus problem
From: brian@shapes.demon.co.uk (Brian Drummond)
Date: Tue, 07 Oct 1997 14:01:24 GMT
Links: << >> << T >> << A >>

jhallen@world.std.com (Joseph H Allen) wrote:

>In article <3435f2b0.217213456@news.netcomuk.co.uk>, Peter <z80@ds.com> wrote:
>
>>>It's certainly possible: heck mere telephones accomplish this.  A
>>>telephone is a two-wire device (there is no seperate ground) and is
>>>bidirectional- yet there are still such things as telephone line repeaters
>>>that don't latch up or oscillate.  Why can't your company's FPGAs do it? 
>>>They must be inferior products :-) :-) :-)
>
>>A telephone is an *analog* device, which uses signal cancellation to
>>do it.
>
>There's no reason this can't be done in the digital world- in fact it works
>better because in the digital world you have noise margins.  All you need is
>a few resistors:
>
>Tx --+--|>-----+------R------+
>     |         |             |
>     |         R             |
>     |         |             |
>     |         +-a         b-+--|>-- Rx
>     |         |             |
>     |         R             |
>     |         |             |
>     +--|>o----+----+  +-----+
>                    |  |
>                    |  |
>                    line
>                To other end
>          (duplicate above circuit)
[...]
>I tried this once with Rs-422 line driver and receivers.  It works fine.

Of course this works because you are returning to the analog domain
(more strictly, because you never left it ;) - those resistors are, among
other things, a D/A converter by another name. You need at least three
signalling levels on the line.
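An idealized model shows why the hybrid needs three line levels and how each receiver recovers the far end's bit by cancelling its own contribution. It assumes symmetric drives of -1/+1 and a line that simply sums both ends' drives, and it ignores the skew and imbalance discussed below; the function names are hypothetical.

```python
def drive(bit):
    """Map a logic level to a symmetric drive: 0 -> -1, 1 -> +1."""
    return 1 if bit else -1


def line_level(local_bit, remote_bit):
    """Idealized shared line: the sum of both ends' drives (-2, 0 or +2)."""
    return drive(local_bit) + drive(remote_bit)


def receive(local_bit, level):
    """Subtract our own drive from the line, then slice what remains."""
    return 1 if level - drive(local_bit) > 0 else 0
```

Each end sees only the line level and its own transmitted bit, which is exactly what the resistor network in the diagram makes available to the receiver.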

More importantly, any real world implementation will have to account for
skew between the two legs of the transmit circuit, in the presence of
line imbalance and other effects - resulting in glitches on the receiver
output. But with those constraints - yes it'll work.

- Brian.

Article: 7723
Subject: Re: FPGA multiprocessors => vs. uniprocessors
From: "Jan Gray" <jsgray@acm.org.nospam>
Date: 7 Oct 1997 15:18:55 GMT
Links: << >> << T >> << A >>

Me again.  I wrote:
> Put these ideas together and one can certainly see a
> 66 MHz 2 issue LIW in a XC4013E and perhaps a 4 issue
> VLIW in a XC4036XL.  But for the latter you need a very
> good optimizing compiler.

It can be done, but I think I chose the wrong parts.  First,
for this speed we need each half LIW processor
to get an I-cache slice or at least a loop buffer.

This widens the datapath from 2x16x13 (say) to
2x16x20 CLBs and forces you up into the larger
XC4000XL parts.
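Reading 2x16x13 as two halves of 16 CLB rows by 13 columns each (an interpretation, consistent with the 16-row J32 datapath quoted earlier), the arithmetic shows why the part choice changes. The XC4013E has a 24x24 CLB array:

```latex
\begin{aligned}
2 \times 16 \times 13 &= 416 \text{ CLBs} \;<\; 24 \times 24 = 576 \text{ CLBs (XC4013E)} \\
2 \times 16 \times 20 &= 640 \text{ CLBs} \;>\; 576 \text{ CLBs}
\end{aligned}
```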

Jan Gray
Redmond, WA

Article: 7724
Subject: Re: Help: ABEL program for ISPLSI1000 series.
From: Tom Bowns <bowns@data-io.com>
Date: Tue, 7 Oct 1997 15:21:09 GMT
Links: << >> << T >> << A >>

Bulent UNALMIS wrote:

> I want to use IOC registers for registered inputs.


Use the "property" statement in the ABEL-HDL source, and
create a buried register node to "become" the input register.
The ispDS+ fitter will take care of translating the buried
register node into the input register, if the correct 
property is used.

The example below sets up an input in1 to be an input register by
creating an IOC register from a buried node, then feeding the
IOC register with a single input. The fitter will do the rest:


   module mydes
   
   in1, in2, ioclk   pin;
   q1                pin;
   inreg             node istype 'reg';

   plsi property 'REGTYPE inreg IOC';   "node inreg is IOC register


   Equations

   inreg := in1;            "this makes in1 the IOC register.
   inreg.clk = ioclk;       "use IOCLK to feed IOC register clock.

   q1 = inreg & in2;

   end

      



-Tom Bowns
 Synario Design Automation
 Systems Integration Engineering
