Messages from 19425

Article: 19425
Subject: fpga cost
From: elynum@my-deja.com
Date: Tue, 21 Dec 1999 04:24:30 GMT
Links: << >>  << T >>  << A >>
Does anyone know if it is possible to get
free samples of FPGAs?


Sent via Deja.com http://www.deja.com/
Before you buy.
Article: 19426
Subject: Re: JamPlayer and 10K10
From: "Thomas Bornhaupt" <Thomas.Bornhaupt@t-online.de>
Date: Tue, 21 Dec 1999 10:39:55 +0100
Hi Michael

I work on two machines: a Win95 PII 400 MHz and a DR-DOS 486 66 MHz. The
target computer is the 486, so I use only the 16-bit version of Jam.exe. I
now have different versions: Jam 1.2 (JAM102), Jam 2.0 (JAM200) and Jam 2.12
(JAM212).

On Win95 the DOS box closes immediately after the program terminates, so I
cannot read the error statement. On the DR-DOS machine I can see the error,
and I redirect it into a text file:

> jam212 -v -dDO_CONFIGURE=1 -p378 hicen1.jam >t.txt
-------------
Jam (Stapl) Player Version 2.12
Copywrite (C) 1997-1999 Altera Corporation
CRC matched: CRC value = 5B21
NOTE "CREATOR" = "POF to JAM converter Version 9.3 7/23/1999"
NOTE "DEVICE" = "EPF10K10"
NOTE "FILE" = "hicentotal.sof"
NOTE "USERCODE" = "0FF100DD"
NOTE "JAM_VERSION" = "1.0"
NOTE "ALG_VERSION" = "2.3"
Device #1 IDCODE is 010100DD
configuring FLEX device(s)...
Error on line 411: undefined symbol.   <<-----------------
Program terminated.
Elapsed time = 00:00:02
------------

All versions of JAM???.EXE produce the same error on line 411.

From the Altera distributor in Munich I got a Jam file generated with MAX+plus
9.1. With that file I get "Error on line 405".

I also tested the 32-bit version of Jam.exe. There I get a memory protection
error:
--------
JAM caused an invalid page fault
in module JAM.EXE at 0167:00414dc7.
Registers:
EAX=00000000 CS=0167 EIP=00414dc7 EFLGS=00010246
EBX=00000073 SS=016f ESP=0065fb30 EBP=0065fb34
ECX=ffffffff DS=016f ESI=0041f03e FS=69df
EDX=00000010 ES=016f EDI=00000010 GS=0000
Bytes at CS:EIP:
--------

Now I have no idea what I can do to make it work!

Michael Stanton <mikes@magtech.com.au> wrote in message
news:385ED38E.24214D76@magtech.com.au...
> Hi Thomas
>
> We have never had to alter any lines inside the Jam source file and have
always
> been able to use the .jam file produced by Max+Plus II.
>
> The following is the DOS command line we use to program a FLEX 10K30A as
part of
> a three device JTAG chain :
>
> jam -v -dDO_CONFIGURE=1 -p378 cpld_top.jam
>
> We are using Jam.exe ver 1.2 and Max+Plus II 9.3 and have a ByteBlasterMV
> connected to a standard PC printer port (LPT1 at 378h) via a 2m long
D25M-D25F
> extension cable.
>
> There are two versions of the jam.exe ; 16-bit-DOS and Win95-WinNT. Have
you
> tried each version ?
>
> Can't think of anything else to try, - hope it works out for you !
>
> Regards, Michael
>
>
> Thomas Bornhaupt wrote:
>
> > Hi Michael,
> >
> > thank you for your tipps. But it doesnot work.
> >
> > It seemt to me, that the MAX+plus (9.3) genarates wrong JAM or JBC
files.
> >
> > I testet Jam.EXE 1.2 with the -dDO_CONFIGURE. But the Chip is not
> > programmed.
> >
> > Inside of the JAM-File (Language 1.1) i found this line
> >
> > BOOLEAN DO_CONFIGURE = 0;
> >
> > So i set it to
> >
> > BOOLEAN DO_CONFIGURE = 1;
> >
> > Starting JAM.EXE i got a syntax-error in line 440!
> >
> > Also i tested JAM.EXE 2.2. Here you have the option -aCONFIGURE. This is
the
> > Action out of the JAM-file (STAPL Format):
> >
> > ACTION CONFIGURE = PR_INIT_CONFIGURE, PR_EXECUTE;
> >
> > And now I got an exception. The Dosbox went direcly away and a pure
> > dosmachine hang up with an EMM386 error.
> >
> > regards
> > Thomas Bornhaupt
>
>
>

Article: 19427
Subject: Re: fpga cost
From: rk <stellare@nospam.erols.com>
Date: Tue, 21 Dec 1999 05:45:41 -0500
elynum@my-deja.com wrote:

> Anyone know if it is possible that I could
> get free samples of fpgas.

i would say that it is possible - of course, it depends on your
negotiating skills, how much it looks like you might actually buy
something, etc., etc.

one company that i know of does have a policy of giving out free
devices, quicklogic:

     With QuickLogic’s new
     WebASIC program, you can
     receive programmed FPGA and
     ESP devices at no cost within
     24-48 hours of sending us your
     design data via the Internet.

while we have been talking about free software for years (sort of, for
some here), this is the first i've seen of a company making free hardware
a matter of policy.

good luck!

----------------------------------------------------------------------
rk                               The world of space holds vast promise
stellar engineering, ltd.        for the service of man, and it is a
stellare@erols.com.NOSPAM        world we have only begun to explore.
Hi-Rel Digital Systems Design    -- James E. Webb, 1968

Article: 19428
Subject: Re: Necessary to 'synchronise' an asynchronous FSM reset?
From: micheal_thompson@my-deja.com
Date: Tue, 21 Dec 1999 14:25:55 GMT
Hi Bob

Thanks for this suggestion.
I wonder too whether, as a further precaution, it might be a good idea to
release the reset on the opposite clock edge to the one used by the FSMs -
assuming of course that they all use the same edge. Otherwise, depending on
the consistency of the skew between clk and sync_reset across a device/board
(as both are high-fanout signals), this approach might even make things
worse?

regds
Mike

In article <385e4a2b.75584735@nntp.best.com>,
  bob@nospam.thanks (Bob Perlman) wrote:
> My policy is to give every FSM an asynchronous reset and a synchronous
> reset.  The asynchronous reset puts the FSM in the right state even in
> the absence of a clock, which is important if the FSM is controlling,
> say, internal or external TriStates that might otherwise contend.  The
> synchronous reset works around the problem you mentioned (by the way,
> 'slim and none' is just another phrase for, 'sooner or later, for
> sure').  I do one-hot FSMs exclusively, and I apply the synchronous
> reset only to the initial state FF of the FSM; I use it to (a) hold
> that FF set and (b) gate off that FF's output to any other state FF.
>
> I create the synchronous reset with a pipeline of 3 or 4 FFs, all of
> which get a global reset.  A HIGH is fed to the D of the first FF, and
> gets propagated to the end of the chain after reset is released.  The
> output of the last FF is inverted to produce the active HIGH
> synchronous reset.  For devices that support global sets, you can just
> set all the FFs, feed a LOW into the first FF, and dispense with the
> inverter at the end.  It's important to clock this FF chain with the
> same clock used for the FSM, of course.
>
> There are other ways to work around this problem, such as adding extra
> do-nothing states after the initial states in a one-hot, or making
> sure that the FSM won't transition out of the initial state until a
> few cycles after the asynch reset has been released.  These work, too.
> The method I've described is easy to do in either schematics or HDL
> and, if desired, allows you to easily synchronize the startup of
> multiple FSMs.
>
> Take care,
> Bob Perlman
>
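
Bob's FF-chain reset generator is easy to model cycle by cycle. The following
Python sketch is a behavioural model of the scheme as quoted above (not HDL,
and the 4-stage depth is just one of the values he suggests):

```python
# Behavioural model of the synchronous-reset generator described above:
# a chain of FFs, all cleared by the global (asynchronous) reset.
# A constant HIGH feeds the first FF; after reset is released it ripples
# to the end of the chain, and the inverted last-FF output is the
# active-HIGH synchronous reset.

def sync_reset_sequence(stages=4, cycles=8):
    """Return the sync_reset value seen on each clock after the
    asynchronous reset is released."""
    chain = [0] * stages          # global reset clears every FF
    out = []
    for _ in range(cycles):
        chain = [1] + chain[:-1]  # shift a constant HIGH down the chain
        out.append(1 - chain[-1]) # invert last FF: active-HIGH sync reset
    return out

print(sync_reset_sequence())      # [1, 1, 1, 0, 0, 0, 0, 0]
```

With 4 stages the synchronous reset stays asserted for three clocks after the
asynchronous reset is released, which is exactly the hold-off the FSMs need.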


Article: 19429
Subject: Re: fpga cost
From: Ray Andraka <randraka@ids.net>
Date: Tue, 21 Dec 1999 09:29:43 -0500
And once you get it, I guess you'll need free PWB layout, fab and
assembly?  Most of the modern packaging is not well suited for hobbyist
work - fine pitch quad flat packs and  ball grid arrays take special
techniques to mount on the board.

rk wrote:

> elynum@my-deja.com wrote:
>
> > Anyone know if it is possible that I could
> > get free samples of fpgas.
>
> i would say that it is possible - of course, it depends on your
> negotiating skills, how much it looks like you might actually buy
> something, etc., etc.
>
> one company that i know of does have a policy of giving out free
> devices, quicklogic:
>
>      With QuickLogic’s new
>      WebASIC program, you can
>      receive programmed FPGA and
>      ESP devices at no cost within
>      24-48 hours of sending us your
>      design data via the Internet.
>
> while we have been talking about free software for years (sort of here
> for some) this is the first that i've seen of making it policy for free
> hardware.
>
> good luck!
>
> ----------------------------------------------------------------------
> rk                               The world of space holds vast promise
> stellar engineering, ltd.        for the service of man, and it is a
> stellare@erols.com.NOSPAM        world we have only begun to explore.
> Hi-Rel Digital Systems Design    -- James E. Webb, 1968



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19430
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: vdiepeve@cs.uu.nl (Vincent Diepeveen)
Date: 21 Dec 1999 15:01:33 GMT
In <385E7128.4335A25A@ids.net> Ray Andraka <randraka@ids.net> writes:


Thanks Ray,

If I start it in a simple way, say like this:

  instead of writing all the difficult search stuff in the FPGA,
  I only write the evaluation in the FPGA. This can of course be
  parallelized to a large extent, so it's smart to do it in the FPGA.

  If this can be done in under 1 usec, then that would be great.
  Even 2 usec is acceptable.
  If it needs more than 10 usec, then that would suck bigtime.

  Basically it must be able to evaluate 200,000 times a second.

I bet that this is technically a lot simpler than writing the whole search,
with its big amount of memory, in the FPGA.

My evaluation is gigantic though. First I generate a data structure with a
lot of information which is later used everywhere in the evaluation, so
that's at least some extra clocks.

If I'm not mistaken, in an FPGA the eval at 50 MHz may need 100 clocks in
total and still deliver 500k evaluations a second, right?

What kind of components would I need for this?

How about a PCI card - are those already available?


  

>
>Vincent Diepeveen wrote:
>
>> On Sat, 18 Dec 1999 12:50:33 -0500, Ray Andraka <randraka@ids.net>
>> wrote:
>>
>> >
>> >
>> >Dann Corbit wrote:
>> >
>> >> "Ray Andraka" <randraka@ids.net> wrote in message
>> >> news:385B1DEE.7517AAC7@ids.net...
>> >> > The chess processor as you describe would be sensible in an FPGA.  Current
>> >> > offerings have extraordinary logic densities, and some of the newer FPGAs
>> >> have
>> >> > over 500K of on-chip RAM which can be arranged as a very wide memory.
>> >> Some of
>> >> > the newest parts have several million 'marketing' gates available too.
>> >> FPGAs
>> >> > have long been used as prototyping platforms for custom silicon.
>> >>
>> >> I am curious about the memory.  Chess programs need to access at least tens
>> >> of megabytes of memory.  This is used for the hash tables, since the same
>> >> areas are repeatedly searched.  Without a hash table, the calculations must
>> >> be performed over and over.  Some programs can even access gigabytes of ram
>> >> when implemented on a mainframe architecture.  Is very fast external ram
>> >> access possible from FPGA's?
>> >
>> >This is conventional CPU thinking.  With the high degree of parallelism in the
>>
>> No this is algorithmic speedup design.
>>
>
>What I meant by this is that just using the FPGA to accelerate the CPU algorithm
>isn't necessarily going to give you all the FPGA is capable of doing.  You need to
>rethink some of the algorithm to optimize it to the resources you have available in
>the FPGA.  The algorithm as it stands now is at least somewhat tailored to a cpu
>implementation.  It appears your thinking is jsut using the FPGA to speed up the
>inner loop, where what I am proposing is to rearrange the algorithm so that the FPGA
>might for example look at the whole board state on the current then next move.  In a
>CPU based algorithm, the storage is cheap and the computation is expensive.  In an
>FPGA, you have an opportunity for very wide parallel processes (you can even send a
>lock signal laterally across process threads).  Here the processing is generally
>cheaper than the storage of intermediate results.  The limiting factor is often the
>I/O bandwidth, so you want to rearrange your algorithm to tailor it to the quite
>different limitations of the FPGA.
>
>> Branching factor (time multiplyer to see another move ahead)
>> gets better with it by a large margin.
>>
>> So BF in the next formula gets better
>>
>>   # operations in FGPA   =  C *  (BF^n)
>>       where n is a positive integer.
>>
>> >FPGA and the large amount of resources in some of the more recent devices, it
>> >may very well be that it is more advantageous to recompute the values rather
>> >than fetching them.  There may even be a better approach to the algorithm that
>> >just isn't practical on a conventional CPU.  Early computer chess did not use
>> >the huge memories.  I suspect the large memory is more used to speed up the
>> >processing rather than a necessity to solving the problem.
>>
>> Though  #operations used by deep blue was incredible compared to
>> any program of today at world championship 1999 many programs searched
>> positionally deeper (deep blue 5 to 6 moves ahead some programs
>> looking there 6-7 moves ahead).
>>
>> This all because of these algorithmic improvements.
>>
>> It's like comparing bubblesort against merge sort.
>> You need more memory for merge sort as this is not in situ but
>> it's O (n log n). Take into account that in computergames the
>> option to use an in situ algorithm is not available.
>>
>> >> > If I were doing such I design in an FPGA however, I would look deeper to
>> >> see
>> >> > what algorithmic changes could be done to take advantage of the
>> >> parallelism
>> >> > offered by the FPGA architecture.  Usually that means moving away from a
>> >> > traditional GP CPU architecture which is limited by the inherently serial
>> >> > instruction stream.  If you are trying to mimic the behavior of a CPU, you
>> >> would
>> >> > possibly do better with a fast CPU, as you will get be able to run those
>> >> at a
>> >> > higher clock rate.  The FPGA gains an advantage over CPUs when you can
>> >> take
>> >> > advantage of parallelism to get much more done in a clock cycle than you
>> >> can
>> >> > with a CPU.
>> >>
>> >> The ability to do many things at once may be a huge advantage.  I don't
>> >> really know anything about FPGA's, but I do know that in chess, there are a
>> >> large number of similar calcutions that take place at the same time.  The
>> >> more things that can be done in parallel, the better.
>> >
>> >Think of it as a medium for creating a custom logic circuit.  A conventional CPU
>> >is specific hardware optimized to perform a wide variety of tasks, none
>> >especially well.  Instead we can build a circuit the specifically addresses the
>> >chess algorithms at hand.  Now, I don't really know much about the algorithms
>> >used for chess.  I suspect one would look ahead at all the possibilities for at
>> >least a few moves ahead and assign some metric to each to determine the one with
>> >the best likely cost/benefit ratio.  The FPGA might be used to search all the
>> >possible paths in parallel.
>>
>> My program allows parallellism. i need bigtime locking for this, in
>> order to balance the parallel paths.
>>
>> How are the possibilities in FPGA to press several of the same program
>> at one cpu, so that inside the FPGA there is a sense of parallellism?
>>
>> How about making something that enables to lock within the FPGA?
>>
>> It's not possible my parallellism without locking, as that's the same
>> bubblesort versus merge sort story, as 4 processors my program gets
>> 4.0 speedup, but without the locking 4 processors would be a
>> lot slower than a single sequential processor.
>>
>> >> > That said, I wouldn't recommend that someone without a sound footing in
>> >> > synchronous digital logic design take on such a project.  Ideally the
>> >> designer
>> >> > for something like this is very familiar with the FPGA architecture and
>> >> tools
>> >> > (knows what does and doesn't map efficiently in the FPGA architecture),
>> >> and is
>> >> > conversant in computer architecture and design and possibly has some
>> >> pipelined
>> >> > signal processing background (for exposure to hardware efficient
>> >> algorithms,
>> >> > which are usually different than ones optimized for software).
>> >> I am just curious about feasibility, since someone raised the question.  I
>> >> would not try such a thing by myself.
>> >>
>> >> Supposing that someone decided to do the project (however) what would a
>> >> rough ball-park guestimate be for design costs, the costs of creating the
>> >> actual masks, and production be for a part like that?
>> >
>> >The nice thing about FPGAs is that there is essentially no NRE or fabrication
>> >costs.  The parts are pretty much commodity items, purchased as generic
>> >components.  The user develops a program consisting of a compiled digital logic
>> >design, which is then used to field customize the part.  Some FPGAs are
>> >programmed once during the product manufacturer (one time programmables include
>> >Actel and Quicklogic).  Others, including the Xilinx line, have thousands of
>> >registers that are loaded up by a bitstream each time the device is powered up.
>> >The bitstream is typically stored in an external EPROM memory, or in some cases
>> >supplied by an attached CPU.  Part costs range from under $5 for small arrays to
>> >well over $1000 for the newest largest fastest parts.
>>
>> How about a program that's having thousands of chessrules and
>> incredible amount of loops within them and a huge search,
>>
>> So the engine & eval only equalling 1.5mb of C source code.
>>
>> How expensive would that be, am i understaning here that
>> i need for every few rules to spent another $1000 ?
>
>It really depends on the implementation.   The first step in finding a good FPGA
>implementation is repartitioning the algorithm.  This ground work is often the
>longest part of the FPGA design cycle, and it is a part that is not even really
>acknowledged in the literature or by the part vendors.  Do the system work up front
>to optimize the architecture for the resoucrces you have available, and in the end
>you will wind up with something much better, faster, and smaller than anything
>arrived at by simple translation.
>
>At one extreme, one could just us the FPGA to instantiate custom CPUs with a
>specialized instruction set for the chess program.  That approach would likely net
>you less performance than an emulator for the custom CPU running on a modern
>machine.  The reason for that is the modern CPUs are clocked at considerably higher
>clock rates than a typical FPGA design is capable of, so even if the emulation takes
>an average of 4 or 5 cycles for each custom instruction, it will still keep up with
>or outperform the FPGA.  Where the FPGA gets its power is the ability to do lots of
>stuff at the same time.   To take advantage of that, you usually need to get away
>from an instruction based processor.
>
>
>
>>
>>
>> >The design effort for the logic circuit you are looking at is not trivial.  For
>> >the project you describe, the bottom end would probably be anywhere from 12
>> >weeks to well over a year of effort depending on the actual complexity of the
>> >design, the experience of the designer with the algorithms, FPGA devices and
>> >tools.
>>
>> I needed years to write it in C already...
>>
>> Vincent Diepeveen
>> diep@xs4all.nl
>>
>> >> --
>> >> C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
>> >>  "The C-FAQ Book" ISBN 0-201-84519-9
>> >> C.A.P. Newsgroup   http://www.dejanews.com/~c_a_p
>> >> C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm
>>
>> >--
>> >-Ray Andraka, P.E.
>> >President, the Andraka Consulting Group, Inc.
>> >401/884-7930     Fax 401/884-7950
>> >email randraka@ids.net
>> >http://users.ids.net/~randraka
>
>
>
>--
>-Ray Andraka, P.E.
>President, the Andraka Consulting Group, Inc.
>401/884-7930     Fax 401/884-7950
>email randraka@ids.net
>http://users.ids.net/~randraka
>
>
--
          +----------------------------------------------------+
          |  Vincent Diepeveen      email:  vdiepeve@cs.ruu.nl |
          |  http://www.students.cs.ruu.nl/~vdiepeve/          |
          +----------------------------------------------------+
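
The branching-factor formula quoted in this thread, # operations = C * (BF^n),
is easy to put numbers to. A toy Python calculation (C, BF and the depth here
are illustrative values, not measurements from any program):

```python
# Toy node-count model from the quoted formula: operations = C * BF**n,
# where BF is the effective branching factor and n the search depth.
def operations(bf, depth, c=1.0):
    return c * bf ** depth

# Improving the effective branching factor buys far more than raw speed:
# halving BF (say 6 -> 3) at depth 8 shrinks the tree by a factor of 2**8.
print(operations(6, 8) / operations(3, 8))  # 256.0
```

This is why the thread argues that algorithmic improvements (a smaller
effective BF) let software search deeper than brute force, even against
hardware with a far higher raw node rate.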
Article: 19431
Subject: AMD FLASH ?
From: Bonio Lopez <bonio.lopezNOboSPAM@gmx.ch.invalid>
Date: Tue, 21 Dec 1999 07:06:39 -0800
Hi friends,
I am trying to use the AM29LV800B,
but the ready/busy pin is always 0 (after reset too).
Does anybody have an idea what the cause could be (or know the name of an
appropriate newsgroup)?
I would also like to know whether somebody has used this part in a design.



* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!

Article: 19432
Subject: Re: AMD FLASH ?
From: "Lutz Kleberhoff" <l.kleberhoff@mkc-gmbh.de>
Date: Tue, 21 Dec 1999 16:31:44 +0100
Hi Bonio,

I use the 29F800B chip. The Ready/Busy pin is open-collector (as I recall).
Did you connect a pull-up?

Regards,

Lutz

Bonio Lopez wrote in message
<3fc5848e.be18dcdb@usw-ex0102-009.remarq.com>...
>Hi friends,
>am trying to use am29lv800b,
>but redy/bysy pin is ever 0 (after reset too).
>Have anybody ideas what could be a cause (or the name of appropriate
>newsgroop).
>I would like also to know, have somebody used this part in designs.
>
>
>
>* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network
*
>The fastest and easiest way to search and participate in Usenet - Free!
>


Article: 19433
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: Ray Andraka <randraka@ids.net>
Date: Tue, 21 Dec 1999 10:34:46 -0500
That would be one possible partition.  You can probably pipeline the evaluation so that
you have several evaluations in progress at once.  That way you can do better than the
one evaluation per 100 clocks in the case you mention below.  Again, I'd have to sit
down and noodle over the algorithms to get a really good partition and implementation.
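
The throughput arithmetic behind pipelining the evaluation is simple. A
back-of-the-envelope sketch in Python, using the 50 MHz clock and 100-clock
latency figures from the post quoted below (illustrative numbers, not
measurements):

```python
# Evaluation throughput at a given clock rate.
# Unpipelined: one evaluation occupies the logic for all `latency` clocks.
# Fully pipelined: a new evaluation can start every clock, so latency no
# longer limits throughput (it only delays the first result).
def evals_per_second(clock_hz, latency_clocks, pipelined=False):
    return clock_hz if pipelined else clock_hz / latency_clocks

print(evals_per_second(50_000_000, 100))                  # 500000.0
print(evals_per_second(50_000_000, 100, pipelined=True))  # 50000000
```

In practice a design lands between the two extremes (a new evaluation every
few clocks), but even partial pipelining beats the one-per-100-clocks figure.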

For hardware, your best bet would probably be to buy one of the commercially available
boards out there.  Many have PCI interfaces, some of which use the FPGA for the PCI
interface itself.  Check out www.optimagic.com for a pretty comprehensive listing of
available boards.  You'll want to partition the algorithm between the processor and the
FPGA before you make a final selection of the board, so that you can be sure you have
the right amount of external memory, and the right connections to it, for the
application.  There are nearly as many board architectures as there are boards.

Vincent Diepeveen wrote:

> In <385E7128.4335A25A@ids.net> Ray Andraka <randraka@ids.net> writes:
>
> Thanks Ray,
>
> if i start it in a simple way, say in next way:
>
>   instead of writing all the difficult search stuff in FPGA,
>   i only write the evaluation in FPGA. This can be done in parallel
>   of course to a big extend.
>   So it's smart to do it in fpga.
>
>   If this can be done under 1 usec, then that would be great.
>   Even 2 usec is acceptible.
>   If it's needing more than 10 usec then that would suck bigtime.
>
>   Basically it must be able to evaluate 200,000 times a second.
>
> I bet that this is technical a lot simpler than writing the whole search with
> big amountof memory in fpga.
>
> My evaluation is gigantic though. First i generate a datastructure
> with a lot of information, which is later used everywhere in the evaluation,
> so that's at least some clocks extra.
>
> If i'm not mistaken in FPGA the eval at 50Mhz may need in totally 100 clocks
> to still deliver 500k evaluations a second, right?
>
> What kind of components would i need for this?
>
> How about a PCI card, are those already available?
>
>
>
> >
> >Vincent Diepeveen wrote:
> >
> >> On Sat, 18 Dec 1999 12:50:33 -0500, Ray Andraka <randraka@ids.net>
> >> wrote:
> >>
> >> >
> >> >
> >> >Dann Corbit wrote:
> >> >
> >> >> "Ray Andraka" <randraka@ids.net> wrote in message
> >> >> news:385B1DEE.7517AAC7@ids.net...
> >> >> > The chess processor as you describe would be sensible in an FPGA.  Current
> >> >> > offerings have extraordinary logic densities, and some of the newer FPGAs
> >> >> have
> >> >> > over 500K of on-chip RAM which can be arranged as a very wide memory.
> >> >> Some of
> >> >> > the newest parts have several million 'marketing' gates available too.
> >> >> FPGAs
> >> >> > have long been used as prototyping platforms for custom silicon.
> >> >>
> >> >> I am curious about the memory.  Chess programs need to access at least tens
> >> >> of megabytes of memory.  This is used for the hash tables, since the same
> >> >> areas are repeatedly searched.  Without a hash table, the calculations must
> >> >> be performed over and over.  Some programs can even access gigabytes of ram
> >> >> when implemented on a mainframe architecture.  Is very fast external ram
> >> >> access possible from FPGA's?
> >> >
> >> >This is conventional CPU thinking.  With the high degree of parallelism in the
> >>
> >> No this is algorithmic speedup design.
> >>
> >
> >What I meant by this is that just using the FPGA to accelerate the CPU algorithm
> >isn't necessarily going to give you all the FPGA is capable of doing.  You need to
> >rethink some of the algorithm to optimize it to the resources you have available in
> >the FPGA.  The algorithm as it stands now is at least somewhat tailored to a cpu
> >implementation.  It appears your thinking is jsut using the FPGA to speed up the
> >inner loop, where what I am proposing is to rearrange the algorithm so that the FPGA
> >might for example look at the whole board state on the current then next move.  In a
> >CPU based algorithm, the storage is cheap and the computation is expensive.  In an
> >FPGA, you have an opportunity for very wide parallel processes (you can even send a
> >lock signal laterally across process threads).  Here the processing is generally
> >cheaper than the storage of intermediate results.  The limiting factor is often the
> >I/O bandwidth, so you want to rearrange your algorithm to tailor it to the quite
> >different limitations of the FPGA.
> >
> >> Branching factor (time multiplyer to see another move ahead)
> >> gets better with it by a large margin.
> >>
> >> So BF in the next formula gets better
> >>
> >>   # operations in FGPA   =  C *  (BF^n)
> >>       where n is a positive integer.
> >>
> >> >FPGA and the large amount of resources in some of the more recent devices, it
> >> >may very well be that it is more advantageous to recompute the values rather
> >> >than fetching them.  There may even be a better approach to the algorithm that
> >> >just isn't practical on a conventional CPU.  Early computer chess did not use
> >> >the huge memories.  I suspect the large memory is more used to speed up the
> >> >processing rather than a necessity to solving the problem.
> >>
> >> Though  #operations used by deep blue was incredible compared to
> >> any program of today at world championship 1999 many programs searched
> >> positionally deeper (deep blue 5 to 6 moves ahead some programs
> >> looking there 6-7 moves ahead).
> >>
> >> This all because of these algorithmic improvements.
> >>
> >> It's like comparing bubblesort against merge sort.
> >> You need more memory for merge sort as this is not in situ but
> >> it's O (n log n). Take into account that in computergames the
> >> option to use an in situ algorithm is not available.
> >>
> >> >> > If I were doing such I design in an FPGA however, I would look deeper to
> >> >> see
> >> >> > what algorithmic changes could be done to take advantage of the
> >> >> parallelism
> >> >> > offered by the FPGA architecture.  Usually that means moving away from a
> >> >> > traditional GP CPU architecture which is limited by the inherently serial
> >> >> > instruction stream.  If you are trying to mimic the behavior of a CPU, you
> >> >> would
> >> >> > possibly do better with a fast CPU, as you will get be able to run those
> >> >> at a
> >> >> > higher clock rate.  The FPGA gains an advantage over CPUs when you can
> >> >> take
> >> >> > advantage of parallelism to get much more done in a clock cycle than you
> >> >> can
> >> >> > with a CPU.
> >> >>
> >> >> The ability to do many things at once may be a huge advantage.  I don't
> >> >> really know anything about FPGA's, but I do know that in chess, there are a
> >> >> large number of similar calcutions that take place at the same time.  The
> >> >> more things that can be done in parallel, the better.
> >> >
> >> >Think of it as a medium for creating a custom logic circuit.  A conventional CPU
> >> >is specific hardware optimized to perform a wide variety of tasks, none
> >> >especially well.  Instead we can build a circuit the specifically addresses the
> >> >chess algorithms at hand.  Now, I don't really know much about the algorithms
> >> >used for chess.  I suspect one would look ahead at all the possibilities for at
> >> >least a few moves ahead and assign some metric to each to determine the one with
> >> >the best likely cost/benefit ratio.  The FPGA might be used to search all the
> >> >possible paths in parallel.
> >>
> >> My program allows parallellism. i need bigtime locking for this, in
> >> order to balance the parallel paths.
> >>
> >> How are the possibilities in FPGA to press several of the same program
> >> at one cpu, so that inside the FPGA there is a sense of parallellism?
> >>
> >> How about making something that enables to lock within the FPGA?
> >>
> >> It's not possible my parallellism without locking, as that's the same
> >> bubblesort versus merge sort story, as 4 processors my program gets
> >> 4.0 speedup, but without the locking 4 processors would be a
> >> lot slower than a single sequential processor.
> >>
> >> >> > That said, I wouldn't recommend that someone without a sound footing in
> >> >> > synchronous digital logic design take on such a project.  Ideally the
> >> >> designer
> >> >> > for something like this is very familiar with the FPGA architecture and
> >> >> tools
> >> >> > (knows what does and doesn't map efficiently in the FPGA architecture),
> >> >> and is
> >> >> > conversant in computer architecture and design and possibly has some
> >> >> pipelined
> >> >> > signal processing background (for exposure to hardware efficient
> >> >> algorithms,
> >> >> > which are usually different than ones optimized for software).
> >> >> I am just curious about feasibility, since someone raised the question.  I
> >> >> would not try such a thing by myself.
> >> >>
> >> >> Supposing that someone decided to do the project (however) what would a
> >> >> rough ball-park guestimate be for design costs, the costs of creating the
> >> >> actual masks, and production be for a part like that?
> >> >
> >> >The nice thing about FPGAs is that there is essentially no NRE or fabrication
> >> >costs.  The parts are pretty much commodity items, purchased as generic
> >> >components.  The user develops a program consisting of a compiled digital logic
> >> >design, which is then used to field customize the part.  Some FPGAs are
> >> >programmed once during product manufacture (one-time programmables include
> >> >Actel and Quicklogic).  Others, including the Xilinx line, have thousands of
> >> >registers that are loaded up by a bitstream each time the device is powered up.
> >> >The bitstream is typically stored in an external EPROM memory, or in some cases
> >> >supplied by an attached CPU.  Part costs range from under $5 for small arrays to
> >> >well over $1000 for the newest largest fastest parts.
> >>
> >> How about a program that has thousands of chess rules, an incredible
> >> number of loops within them, and a huge search,
> >>
> >> so the engine & eval alone equal 1.5 MB of C source code?
> >>
> >> How expensive would that be?  Am I understanding here that
> >> I need to spend another $1000 for every few rules?
> >
> >It really depends on the implementation.   The first step in finding a good FPGA
> >implementation is repartitioning the algorithm.  This ground work is often the
> >longest part of the FPGA design cycle, and it is a part that is not even really
> >acknowledged in the literature or by the part vendors.  Do the system work up front
> >to optimize the architecture for the resources you have available, and in the end
> >you will wind up with something much better, faster, and smaller than anything
> >arrived at by simple translation.
> >
> >At one extreme, one could just use the FPGA to instantiate custom CPUs with a
> >specialized instruction set for the chess program.  That approach would likely net
> >you less performance than an emulator for the custom CPU running on a modern
> >machine.  The reason for that is that modern CPUs are clocked at considerably higher
> >clock rates than a typical FPGA design is capable of, so even if the emulation takes
> >an average of 4 or 5 cycles for each custom instruction, it will still keep up with
> >or outperform the FPGA.  Where the FPGA gets its power is the ability to do lots of
> >stuff at the same time.   To take advantage of that, you usually need to get away
> >from an instruction based processor.
> >
> >
> >
> >>
> >>
> >> >The design effort for the logic circuit you are looking at is not trivial.  For
> >> >the project you describe, the bottom end would probably be anywhere from 12
> >> >weeks to well over a year of effort depending on the actual complexity of the
> >> >design, the experience of the designer with the algorithms, FPGA devices and
> >> >tools.
> >>
> >> I needed years to write it in C already...
> >>
> >> Vincent Diepeveen
> >> diep@xs4all.nl
> >>
> >> >> --
> >> >> C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
> >> >>  "The C-FAQ Book" ISBN 0-201-84519-9
> >> >> C.A.P. Newsgroup   http://www.dejanews.com/~c_a_p
> >> >> C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm
> >>
> >> >--
> >> >-Ray Andraka, P.E.
> >> >President, the Andraka Consulting Group, Inc.
> >> >401/884-7930     Fax 401/884-7950
> >> >email randraka@ids.net
> >> >http://users.ids.net/~randraka
> >
> >
> >
> >--
> >-Ray Andraka, P.E.
> >President, the Andraka Consulting Group, Inc.
> >401/884-7930     Fax 401/884-7950
> >email randraka@ids.net
> >http://users.ids.net/~randraka
> >
> >
> --
>           +----------------------------------------------------+
>           |  Vincent Diepeveen      email:  vdiepeve@cs.ruu.nl |
>           |  http://www.students.cs.ruu.nl/~vdiepeve/          |
>           +----------------------------------------------------+



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19434
Subject: Re: AMD FLASH ?
From: Bonio Lopez <bonio.lopezNOboSPAM@gmx.ch.invalid>
Date: Tue, 21 Dec 1999 07:47:42 -0800
Links: << >>  << T >>  << A >>
I will see.  I think it could be the cause of such Re/nBY behavior.  (Thanks
for the hint.)
And now the second problem I can't understand:
if reset = '1' (working mode), the data bus gets the same values as the
address bus during a read operation

(nCE='0', nOE='0', nWE='1', reset='1').
I have absolutely no idea what it could be.
Maybe you can point out my mistake?



* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!

Article: 19435
Subject: Re: Speed grade
From: "Joel Kolstad" <Joel.Kolstad@USA.Net>
Date: Tue, 21 Dec 1999 07:59:02 -0800
Links: << >>  << T >>  << A >>
rk <stellare@nospam.erols.com> wrote in message
news:385EE12E.F6315E86@nospam.erols.com...
> perhaps another one ...
>
> no self-clearing structures (f-f output to its clear).  find another way
to make
> that pulse! i've seen this too many times!

I've used a form of this when interfacing an FPGA to a DSP's bus (both run
off different clocks).  The DSP's clock registers a write strobe in flip
flop #1 (while the data bus contents are registered in a bunch of other
registers).  This is followed by two flip flops, #2 and #3 (#2 acting as a
synchronizer, more or less) clocked by the FPGA's clock.  The output of #3
goes to the FPGA's state machines, etc., and also goes back to the
asynchronous reset input on 'flop #1.  (Note that the DSP is slow compared to
the FPGA; the output of 'flop #3 fires long before the DSP gets around to
sending another write strobe.)

Is this dangerous (it doesn't seem so to me)?  And what's a better
alternative?

Thanks...

---Joel Kolstad



Article: 19436
Subject: Re: Speed grade
From: bob@nospam.thanks (Bob Perlman)
Date: Tue, 21 Dec 1999 16:06:51 GMT
Links: << >>  << T >>  << A >>
Hi - 

On Fri, 17 Dec 1999 09:41:50 -0800, Peter Alfke <peter@xilinx.com>
wrote:

<stuff snipped>
>To the buyer, this may look like a bargain, getting a Porsche for the
>price of a VW. But it can get the design in trouble if "dirty asynchronous
>tricks" were used.
>Always assume that the part you buy tomorrow may be faster than the one
>you bought yesterday.  That's why we are advocating synchronous design
>methods...

And it's a good thing to advocate.  Many problems in FPGAs can be
traced to undisciplined asynchronous design techniques whose principal
advantage is that they make for good anecdotes.

I thought I'd mention one place where speed increases cause problems
in synchronous systems: hold times.  

Most FPGA vendors guarantee internal hold margins by design.  If
you're sending a signal from one flip-flop to another inside an FPGA
and both flops are driven by the same (global) clock, reliable data
transfer is guaranteed, even if the part gets faster.  What's *not*
guaranteed is reliable clocking from chip to chip (FPGA to FPGA, FPGA
to something else, or something else to FPGA).  What happens if, say,
you've got an FPGA from a -X speed bin driving another FPGA from a -X
speed bin, and the first FPGA is actually a much faster part that's
been restamped -X because the vendor had an over-abundance of fast
parts?  Maybe you have a hold time problem, and maybe you don't.

How do you get around this?  Well, here are a few ideas:

a) First and foremost, calculate hold time margins for all board-level
interfaces.  Many designers calculate setup margins, but overlook hold
margins.  I don't know why this is.  Maybe they tried calculating them
once and got results that were too depressing.

b) Get serious about reducing board-level clock skew.  This includes
using low-skew clock buffers and matching clock trace lengths.

c) Use receiving devices whose input hold time is 0.  This doesn't
eliminate hold problems (clock skew can still get you), but it sure
helps.  (Aside: some bus interface parts have *terrible* hold times.
My theory is that every few years, the companies that make such parts
round up all the people who know anything about setup/hold time issues
and fire them.)

d)  When you calculate hold time margins, you'll need to estimate the
minimum delay for the clock-to-data-output path in the driving device.
When making this estimate, make sure you assume that the driving
device is from the fastest available speed grade.  If, for example,
you use a GateMeister -3 FPGA to drive a signal, and the fastest
available speed grade for that part is a -7, calculate a minimum delay
based on the -7 speed grade.  (I use 20 or 25% of max, depending on
how conservative I'm feeling.  Of course, if the vendor gives me a
guaranteed minimum for the -7, I use that.)
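Item (d) reduces to a few lines of arithmetic.  A sketch follows; the function name and every figure in it are made-up placeholders for a hypothetical part, not vendor data:

```python
# Hedged sketch of the board-level hold-margin arithmetic described above.
# Hold margin = min clock-to-out of the driver + min trace delay
#               - receiver hold time - worst-case clock skew.
# The function and all figures are illustrative assumptions, not vendor data.

def hold_margin_ns(tco_min, trace_min, rx_hold, clk_skew):
    """Positive result: the interface has hold margin; negative: trouble."""
    return tco_min + trace_min - rx_hold - clk_skew

# Per the rule of thumb above, estimate the minimum clock-to-out as
# 25% of the *fastest available* speed grade's maximum:
tco_max_fastest = 4.0            # ns, assumed -7 datasheet maximum
tco_min = 0.25 * tco_max_fastest

margin = hold_margin_ns(tco_min, trace_min=0.3, rx_hold=0.5, clk_skew=0.4)
print(f"{margin:+.2f} ns")       # +0.40 here; a negative value means trouble
```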

The problem I've described above has been around for ages, and not
just for FPGAs; it's everywhere.  Nor do I think it's unreasonable for
vendors to put faster parts in slower speed grades.  If customers want
to make a speed/cost tradeoff, vendors must be given a way of reliably
producing parts that they can sell in each grade, and re-marking parts
becomes a necessity.   

Bob Perlman

-----------------------------------------------------
Bob Perlman
Cambrian Design Works
Digital Design, Signal Integrity
http://www.best.com/~bobperl/cdw.htm
Send e-mail replies to best<dot>com, username bobperl
-----------------------------------------------------
Article: 19437
Subject: Re: fpga cost
From: "Joel Kolstad" <Joel.Kolstad@USA.Net>
Date: Tue, 21 Dec 1999 08:09:04 -0800
Links: << >>  << T >>  << A >>
rk <stellare@nospam.erols.com> wrote in message
news:385F5A54.FB6044B1@nospam.erols.com...
> one company that i know of does have a policy of giving out free
> devices, quicklogic:

...because they're fuse-based devices!  This is one of the ways they
compete with the Xilinx and Alteras of the world, where even "I don't need
no stinkin' simulation" designers can eventually get a design up and running
"crash and burn" style since they're just reprogramming SRAM all day.  The
other way they compete is on speed, of course.  (I'm a little surprised that
the Xilinx hardwire devices aren't a lot faster than the fastest FPGA speed
grade available.)

---Joel Kolstad



Article: 19438
Subject: Re: Necessary to 'synchronise' an asynchronous FSM reset?
From: bob@nospam.thanks (Bob Perlman)
Date: Tue, 21 Dec 1999 16:22:56 GMT
Links: << >>  << T >>  << A >>
Mike - 

On Tue, 21 Dec 1999 14:25:55 GMT, micheal_thompson@my-deja.com wrote:

>Hi Bob
>
>Thanks for this suggestion.
>I wonder too as a further precaution might it be a good idea to release
>the reset on the opposite edge to that used for the FSM's - assuming of
>course that they all use the same edge - as otherwise, depending on the
>consistency of the skew between clk and sync_reset across a device/
>board (as they are high fanout signals) this approach might even make
>things worse?
>
>regds
>Mike

Using opposite-edge clocking would increase hold time margin, but at
the expense of setup margin.  If both the synchronous reset generator
circuit and the FSM are inside the same FPGA, this isn't a good
tradeoff.  As a rule, FPGA vendors guarantee that a signal clocked
from one FF to another FF inside an FPGA will not have a hold problem,
provided that the two FFs are clocked by the same global clock.  In
other words, increasing hold margin in such a situation doesn't buy us
anything.

However, if the sync reset signal goes to a lot of places, there's
always the potential for a setup margin problem.  I have two ways of
getting around such problems:

 - I feed the sync reset to as few loads as possible (that's why I
feed sync reset only to the initial state FF and its state transition
logic).

 - The last FF in the sync reset generator can be duplicated.  Instead
of a single FF that feeds all sync reset destinations, you have N FFs
whose D inputs all get the same signal from the previous FF.  Each of
these FFs generates the same sync reset signal.  The FPGA's
place/route tool has the freedom to place these FFs close to their
destinations.

If you're distributing synchronous reset at a board level, then there
may be enough clock skew to justify the opposite-edge approach;
calculating the hold margin would tell you.  I've never tried
distributing such a reset among multiple chips; I distribute the async
reset, then generate a synchronous reset on each FPGA.
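The synchronous-reset generator described in this thread (a short FF pipeline, cleared by the async reset, that shifts in a constant once reset is released) can be modeled behaviorally.  A software sketch, not HDL:

```python
# Behavioral model (software sketch, not HDL) of the synchronous-reset
# generator discussed in this thread: a chain of flip-flops, all cleared
# by the asynchronous reset, shifting in a constant '1' once the async
# reset is released.  Inverting the last stage gives an active-HIGH reset
# that de-asserts synchronously with the FSM clock.

class SyncResetGen:
    def __init__(self, stages=4):
        self.ff = [0] * stages           # FF chain, cleared by async reset

    def async_reset(self):
        self.ff = [0] * len(self.ff)

    def clock(self):
        """One rising edge of the FSM clock; returns the sync reset."""
        self.ff = [1] + self.ff[:-1]     # shift a constant '1' down the chain
        return 1 - self.ff[-1]           # inverted last stage

gen = SyncResetGen(stages=4)
gen.async_reset()
print([gen.clock() for _ in range(6)])   # [1, 1, 1, 0, 0, 0]
```

With 4 stages the synchronous reset stays asserted for three clocks after the asynchronous reset releases, giving the FSMs a clean, clock-aligned start.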

Take care,
Bob Perlman
  
>
>In article <385e4a2b.75584735@nntp.best.com>,
>  bob@nospam.thanks (Bob Perlman) wrote:
>> My policy is to give every FSM an asynchronous reset and a synchronous
>> reset.  The asynchronous reset puts the FSM in the right state even in
>> the absence of a clock, which is important if the FSM is controlling,
>> say, internal or external TriStates that might otherwise contend.  The
>> synchronous reset works around the problem you mentioned (by the way,
>> 'slim and none' is just another phrase for, 'sooner or later, for
>> sure').  I do one-hot FSMs exclusively, and I apply the synchronous
>> reset only to the initial state FF of the FSM; I use it to (a) hold
>> that FF set and (b) gate off that FF's output to any other state FF.
>>
>> I create the synchronous reset with a pipeline of 3 or 4 FFs, all of
>> which get a global reset.  A HIGH is fed to the D of the first FF, and
>> gets propagated to the end of the chain after reset is released.  The
>> output of the last FF is inverted to produce the active HIGH
>> synchronous reset.  For devices that support global sets, you can just
>> set all the FFs, feed a LOW into the first FF, and dispense with the
>> inverter at the end.  It's important to clock this FF chain with the
>> same clock used for the FSM, of course.
>>
>> There are other ways to work around this problem, such as adding extra
>> do-nothing states after the initial states in a one-hot, or making
>> sure that the FSM won't transition out of the initial state until a
>> few cycles after the asynch reset has been released.  These work, too.
>> The method I've described is easy to do in either schematics or HDL
>> and, if desired, allows you to easily synchronize the startup of
>> multiple FSMs.
>>
>> Take care,
>> Bob Perlman
>>
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.

-----------------------------------------------------
Bob Perlman
Cambrian Design Works
Digital Design, Signal Integrity
http://www.best.com/~bobperl/cdw.htm
Send e-mail replies to best<dot>com, username bobperl
-----------------------------------------------------
Article: 19439
Subject: M1 timings
From: Christof Paar <christof@ece.wpi.edu>
Date: Tue, 21 Dec 1999 14:30:50 -0500
Links: << >>  << T >>  << A >>
Just a brief question: How reliable are the timing results which the
M1 P&R tool (on Unix) provides for XC4000 family designs? In particular,
how likely is it that the maximum critical path delay can be met in an
actual design?

Thanks,

Christof

! WORKSHOP ON CRYPTOGRAPHIC HARDWARE AND EMBEDDED SYSTEMS (CHES 2000) !
!                   WPI, August 17 & 18, 2000                         !
!          http://www.ece.wpi.edu/Research/crypt/ches                 !

***********************************************************************
                 Christof Paar,  Assistant Professor
          Cryptography and Information Security (CRIS) Group
      ECE Dept., WPI, 100 Institute Rd., Worcester, MA 01609, USA
fon: (508) 831 5061    email: christof@ece.wpi.edu   
fax: (508) 831 5491    www:   http://ee.wpi.edu/People/faculty/cxp.html
***********************************************************************

Article: 19440
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: husby@fnal.gov (Don Husby)
Date: Tue, 21 Dec 1999 19:33:12 GMT
Links: << >>  << T >>  << A >>
vdiepeve@cs.uu.nl (Vincent Diepeveen) wrote:
> if i start it in a simple way, say in next way:
> 
>   instead of writing all the difficult search stuff in FPGA,
>   i only write the evaluation in FPGA. This can be done in parallel
>   of course to a big extent.
>   So it's smart to do it in fpga.

I think the search stuff is the easier part.  It's basically recursive
and decomposable:
  Assume you represent a board state as a large vector; for example,
4 bits can represent the 12 unique pieces (plus empty), so a 64x4 vector can
represent the entire board state.  (Probably less.)  If you create a search
module that operates on a single square (probably a lookup table or
Altera CAM), then it can work serially on a square at a time.  This
can probably be implemented as a fairly small state machine.  It would
be possible to fit many of these on a single chip and have them run
in parallel.

The recursive structure can be implemented by feeding the results to
FIFOs which feed back to the input.  Something like this:

       {=          = Evaluator = FIFO }
       {= Parallel = Evaluator = FIFO }          Lookup   Feed
Start =>= search   = Evaluator = FIFO => Merge > Cached > back to
       {= Modules  = Evaluator = FIFO }          Path     Start
       {=          = Evaluator = FIFO }

Or you can arrange the data flow in a tree structure that
mimics the search tree.  The processing rate is likely to be
limited by the data path, but a rate of 12.5 MHz per output
tree branch seems achievable (a 64-bit wide bus at 50 MHz).

If the evaluator is the bottleneck, and we assume an evaluator can
be pipelined to process a board state in an average of 500 ns, then you
would need only 6 of these to keep up with the 12.5 MHz path.

The cache will also be a bottleneck, since to be most effective, it
should be shared by all branches.  You'd probably want to construct a
multiport cache by time sharing it among several branches.  A cache
cycling at 100 MHz could service 8 branches running at 12.5 MHz.
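The board-state vector suggested above is easy to sketch in software.  The piece-code values here are hypothetical; any assignment of 13 of the 16 four-bit codes works:

```python
# Sketch of the board encoding suggested above: 4 bits per square
# (6 piece types x 2 colors = 12 codes, plus an empty code) packed
# into one 256-bit integer for the 8x8 board.  Piece code values here
# are hypothetical; any assignment of 13 of the 16 codes works.

EMPTY = 0
W_PAWN, B_KING = 1, 12               # illustrative codes from 1..12

def pack(board):
    """board: list of 64 per-square codes (a1..h8) -> 256-bit state."""
    state = 0
    for sq, piece in enumerate(board):
        state |= (piece & 0xF) << (4 * sq)
    return state

def piece_at(state, sq):
    return (state >> (4 * sq)) & 0xF

board = [EMPTY] * 64
board[0] = W_PAWN                    # a1, illustrative placement
board[63] = B_KING                   # h8
s = pack(board)
print(s.bit_length() <= 256, piece_at(s, 63))   # True 12
```

A single 256-bit vector like this is exactly the kind of object the per-square search modules above could stream through in parallel.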


--
Don Husby <husby@fnal.gov>             http://www-ese.fnal.gov/people/husby
Fermi National Accelerator Lab                          Phone: 630-840-3668
Batavia, IL 60510                                         Fax: 630-840-5406
Article: 19441
Subject: Re: fpga cost
From: rk <stellare@nospam.erols.com>
Date: Tue, 21 Dec 1999 15:57:58 -0500
Links: << >>  << T >>  << A >>
Ray Andraka wrote:

> And once you get it, I guess you'll need free PWB layout, fab and
> assembly?  Most of the modern packaging is not well suited for hobbyist
> work - fine pitch quad flat packs and  ball grid arrays take special
> techniques to mount on the board.

for most fpgas in plastic, the cost of the devices is rather cheap.  the
design software can get a bit pricey, although that is variable (student
editions, some companies have "lite" versions for free downloads, etc.).  my
software investment is fairly high, it's worth considerably more than the
computer (over an order of magnitude more) and i'll be increasing the amount
of software soon (warning to salescritters - this is not an invitation to
call! :-).

a good quality pcb for these chips is required as most modern devices have
rather fast edge rates and/or large numbers of i/o's that can switch with
less noise margins - note that many devices can no longer switch from 0V ->
5V with a switching threshold set at approximately 2.5V; perhaps we'll be
seeing and using more differential i/o's.  in my opinion it's worth the $ to
get a good layout by an experienced layout guy and a solid multi-layer
board.  of course, this, combined with the software, dwarfs the cost of most
fpga devices.  heck, even the sockets/adapters get relatively expensive, with
many of them running over $100 each.

for assembly, i still use a fair amount of small fpgas in the plcc84; the
sockets for these make good contact and are easy for even me to solder onto a
pcb.  for bga, i'll be trying out a semi-easy-to-solder socket soon; this is
not for a high-speed application.

for surface mount of the popular pqfp or bga packages, i would definitely
agree, not a job for the hobbyist; a job for the experienced technician with
the right experience and equipment.

rk

Article: 19442
Subject: Re: Speed grade
From: rk <stellare@nospam.erols.com>
Date: Tue, 21 Dec 1999 18:54:10 -0500
Links: << >>  << T >>  << A >>
Joel Kolstad wrote:

> rk <stellare@nospam.erols.com> wrote in message
> news:385EE12E.F6315E86@nospam.erols.com...
> > perhaps another one ...
> >
> > no self-clearing structures (f-f output to its clear).  find another way to
> make
> > that pulse! i've seen this too many times!
>
> I've used a form of this when interfacing an FPGA to a DSP's bus (both run
> off different clocks).  The DSP's clock registers a write strobe in flip
> flop #1 (while the data bus contents are registered in a bunch of other
> registers).  This is followed by two flip flops, #2 and #3 (#2 acting as a
> synchronizer, more or less) clocked by the FPGA's clock.  The output of #3
> goes to the FPGA's state machines, etc., and also goes back to the
> asynchronous reset input on 'flop #1.  (Note that the DSP is slow compared to
> the FPGA; the output of 'flop #3 fires long before the DSP gets around to
> sending another write strobe.)
>
> Is this dangerous (doesn't seem to me to be...)?  ...and what's a better
> alternative?
>
> Thanks...

hi,

well, i was really thinking of something else.  (at day job) i recently had to
interface to a microprocessor running over a bus ... and the bus was run too
fast to guarantee that i could clock in the data early in the project ... and in
the worst case, a signal from the microprocessor would go to another board and
then over to me.  in other words, although it was a "synchronous system" i had
to treat all incoming signals as asynchronous and sync them.  if i understand
correctly what you described, i did much the same thing.  after synchronization,
the pulses (both to my state machines and those clearing out the initial flop)
had a width determined by the period of the clock in my chip.  now, the
evil-asynchronous-logic-nazis saw this and then started to whine that i had a
signal going into an asynchronous clear and got all upset based on that fact
alone.  since i can guarantee that all pulses were well formed independent of
any propagation delay, and the metastable state resolution time that i gave it
was quite generous and controlled by the frequency of a crystal oscillator,
the circuit was good(*), in my opinion.

the situation that i was describing above is when the clearing pulse width is
determined by gate and routing delays.  in the worst case that i have seen, two
flip-flops had their outputs NANDed and the output of the NAND went to another
sub-circuit and to the clears of both flops <rk takes a minute to heave>.  that
i consider a dangerous circuit.

perhaps i could have worded the original post better, something like
"asynchronously self-clearing structure".

(*) actually, the system was a disgrace and the circuit a local patch to make
up for poor, or more correctly, non-existent system design.  no one knows why
the bus was run at high speed as there was nothing high speed going on and we
were shooting for absolute minimum power.  the contractor who did this strategically
placed pads for R's and C's throughout the entire system to tune things up and
get the clock edges in the "right place."  <i feel sick again just typing
this>.  i again note that this was at day job!

have a good evening,

----------------------------------------------------------------------
rk                               The world of space holds vast promise
stellar engineering, ltd.        for the service of man, and it is a
stellare@erols.com.NOSPAM        world we have only begun to explore.
Hi-Rel Digital Systems Design    -- James E. Webb, 1968

Article: 19443
Subject: Re: Speed grade
From: Jamie Lokier <spamfilter.dec1999@tantalophile.demon.co.uk>
Date: 22 Dec 1999 02:21:30 +0100
Links: << >>  << T >>  << A >>
Bob Perlman writes:
> I thought I'd mention one place where speed increases cause problems
> in synchronous systems: hold times.  

> What happens if, say,
> you've got an FPGA from a -X speed bin driving another FPGA from a -X
> speed bin, and the first FPGA is actually a much faster part that's
> been restamped -X because the vendor had an over-abundance of fast
> parts?  Maybe you have a hold time problem, and maybe you don't.

I've been designing for the Altera Flex10KE devices.
They specify a /minimum/ external clock-to-output delay of 2ns, and an
external hold time of 0ns.

So you have up to 2ns of skew to play with between devices.  Cool huh?
We need it :-)

-- Jamie
Article: 19444
Subject: Re: Speed grade
From: Ray Andraka <randraka@ids.net>
Date: Tue, 21 Dec 1999 20:32:40 -0500
Links: << >>  << T >>  << A >>
A better method is to toggle a flag flip-flop each time you write the DSP
register (i.e., you have one extra bit on the register which is loaded with its
inverted output).  Then take that flag, synchronize it to the FPGA domain, and
then use a change in state to generate a write pulse in the FPGA clock domain.
You can minimize the latency hit if you design the write pulse state machine
(gray code, 4 states) so that the flag input is only sensed by one flip-flop.
The way you are doing it can get you into trouble if the DSP comes in and sets
the flop near the time you do the reset.  This one works as long as the FPGA is
a little faster than the DSP (the smaller the differential, the less margin you
have, though).

 valid: process( GSR, clk )
    variable state : std_logic_vector(1 downto 0);  -- gray-coded, four states
    variable sync  : std_logic;    -- synchronizing register for 'toggle'
    begin
        if GSR='1' then
            sync  := '0';
            state := "00";
        elsif clk'event and clk='1' then
            -- Gray coding: only one state bit changes per transition,
            -- so only one flip-flop ever senses the asynchronous flag.
            case state is
                when "00" =>                  -- wait for the flag to go high
                    if sync='1' then
                        state := "01";
                    else
                        state := "00";
                    end if;
                    wp <= '0';
                when "01" =>                  -- rising edge seen: pulse wp
                    state := "11";
                    wp <= '1';
                when "11" =>                  -- wait for the flag to go low
                    if sync='0' then
                        state := "10";
                    else
                        state := "11";
                    end if;
                    wp <= '0';
                when "10" =>                  -- falling edge seen: pulse wp
                    state := "00";
                    wp <= '1';
                when others =>
                    null;
            end case;
            -- Sample 'toggle' after it has been used, so the variable
            -- infers a register between the flag and the state logic.
            sync := toggle;
        end if;
    end process;
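A behavioral model of the same gray-coded machine (a software sketch, not a substitute for the VHDL) shows that each DSP write, i.e. each change of the toggle flag, yields exactly one write pulse:

```python
# Behavioral model of the 4-state gray-code write-pulse machine above
# (a software sketch, not a substitute for the VHDL).  'toggle' flips
# once per DSP write; each flip should yield exactly one wp pulse.

def step(state, sync):
    """One FPGA clock: returns (next_state, wp) for the current cycle."""
    if state == "00":                         # wait for flag high
        return ("01", 0) if sync else ("00", 0)
    if state == "01":                         # rising edge seen: pulse
        return ("11", 1)
    if state == "11":                         # wait for flag low
        return ("10", 0) if not sync else ("11", 0)
    return ("00", 1)                          # "10": falling edge, pulse

state, toggle, pulses = "00", 0, 0
for write in range(4):      # four DSP writes; FPGA clock runs 10x faster
    toggle ^= 1
    for _ in range(10):
        state, wp = step(state, toggle)
        pulses += wp
print(pulses)               # 4 -- one write pulse per DSP write
```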


Joel Kolstad wrote:

> rk <stellare@nospam.erols.com> wrote in message
> news:385EE12E.F6315E86@nospam.erols.com...
> > perhaps another one ...
> >
> > no self-clearing structures (f-f output to its clear).  find another way
> to make
> > that pulse! i've seen this too many times!
>
> I've used a form of this when interfacing an FPGA to a DSP's bus (both run
> off different clocks).  The DSP's clock registers a write strobe in flip
> flop #1 (while the data bus contents are registered in a bunch of other
> registers).  This is followed by two flip flops, #2 and #3 (#2 acting as a
> synchronizer, more or less) clocked by the FPGA's clock.  The output of #3
> goes to the FPGA's state machines, etc., and also goes back to the
> asynchronous reset input on 'flop #1.  (Note that the DSP is slow compared to
> the FPGA; the output of 'flop #3 fires long before the DSP gets around to
> sending another write strobe.)
>
> Is this dangerous (doesn't seem to me to be...)?  ...and what's a better
> alternative?
>
> Thanks...
>
> ---Joel Kolstad



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19445
Subject: Re: M1 timings
From: Phil Hays <Spampostmaster@sprynet.com>
Date: Tue, 21 Dec 1999 18:08:22 -0800
Links: << >>  << T >>  << A >>
Christof Paar wrote:

> Just a brief question: How reliable are the timing results which the
> M1 P&R tool (on Unix) provides for XC4000 family designs? In particular,
> how likely is it that the maximum critical path delay can be met in an
> actual design.

I'm not quite sure what you are asking:  are you asking "if the Xilinx toolset
reports 15 ns maximum for a path, how likely is that path going to be more than 
15 ns in a real part"?  Answer is: unlikely.  Xilinx seems to do a good job.  
Aiding this is that the XC4000 parts are not cutting edge, they have been in 
use for years.  I have had fairly good experience with XC4000 family parts.

Or are you asking for real quality statistics from a list of major users of 
Xilinx parts?

Or are you asking "the Xilinx toolset reports 15.2 ns maximum and I want to
run it at 15 ns period, what are my odds of it working?"  Answer is: not as
likely as the first, but still pretty good.  Usually parts at nominal 
conditions (temperature and voltage) are faster than the xdelay number.  How 
much faster?  How lucky do you feel?  I do not suggest this for any type of 
critical application: life support, machine controls, or anything else that
might cause harm if the device malfunctions.

Or are you asking "the Xilinx toolset reports 15 ns, and my circuit will
not work if the delay is faster than 7 ns"?  The answer to this is that your
circuit is likely to fail sooner or later.  It might work when you are checking
it out, but failure when showing it to someone important is guaranteed by one 
variant of Murphy's Law.  Never count on minimum delays.  Keep your clock skew
as low as possible.  Never draw to an inside straight.


-- 
Phil Hays
"Irritatingly,  science claims to set limits on what 
we can do,  even in principle."   Carl Sagan
Article: 19446
Subject: Re: fpga cost
From: Ray Andraka <randraka@ids.net>
Date: Tue, 21 Dec 1999 21:48:58 -0500
Links: << >>  << T >>  << A >>
Rich,

I'm surprised you've had good luck with the PLCC 84 sockets and FPGAs.  I've had
some very bad experiences with those.  They seem OK the first time you put a chip
in.  Remove and replace the chip once or twice and let the games begin.

rk wrote:

> Ray Andraka wrote:
>
> > And once you get it, I guess you'll need free PWB layout, fab and
> > assembly?  Most of the modern packaging is not well suited for hobbyist
> > work - fine pitch quad flat packs and  ball grid arrays take special
> > techniques to mount on the board.
>
> for most fpgas in plastic, the cost of the devices is rather cheap.  the
> design software can get a bit pricy, although that is variable (student
> editions, some company have "lite" versions for free downloads, etc.).  my
> software investment is fairly high, it's worth considerably more than the
> computer (over an order of magnitude more) and i'll be increasing the amount
> of software soon (warning to salescritters - this is not an invitation to
> call! :-).
>
> a good quality pcb for these chips is required as most modern devices have
> rather fast edge rates and/or large numbers of i/o's that can switch with
> less noise margins - note that many devices can no longer switch from 0V ->
> 5V with a switching threshold set at approximately 2.5V; perhaps we'll be
> seeing and using more differential i/o's.  in my opinion it's worth the $ to
> get a good layout by an experienced layout guy and a solid multi-layer
> board.  of course, this, combined with the software, dwarfs the cost of most
> fpga devices.  heck, even the sockets/adapters get relatively expensive, with
> many of them running over $100 each.
>
> for assembly, i still use a fair amount of small fpgas in the plcc84; the
> sockets for these make good contact and are easy for even me to solder onto a
> pcb.  for bga, i'll be trying out a semi-easy-to-solder socket soon; this is
> not for a high-speed application.
>
> for surface mount of the popular pqfp or bga packages, i would definitely
> agree, not a job for the hobbyist; a job for the experienced technician with
> the right experience and equipment.
>
> rk



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19447
Subject: Re: M1 timings
From: Ray Andraka <randraka@ids.net>
Date: Tue, 21 Dec 1999 21:56:12 -0500
Links: << >>  << T >>  << A >>
The output from the timing analyzer is real.  The summary results from PAR
are ballpark, but they do usually seem to err on the conservative side.

Christof Paar wrote:

> Just a brief question: How reliable are the timing results which the
> M1 P&R tool (on Unix) provides for XC4000 family designs? In particular,
> how likely is it that the maximum critical path delay can be met in an
> actual design?
>
> Thanks,
>
> Christof
>
> ! WORKSHOP ON CRYPTOGRAPHIC HARDWARE AND EMBEDDED SYSTEMS (CHES 2000) !
> !                   WPI, August 17 & 18, 2000                         !
> !          http://www.ece.wpi.edu/Research/crypt/ches                 !
>
> ***********************************************************************
>                  Christof Paar,  Assistant Professor
>           Cryptography and Information Security (CRIS) Group
>       ECE Dept., WPI, 100 Institute Rd., Worcester, MA 01609, USA
> fon: (508) 831 5061    email: christof@ece.wpi.edu
> fax: (508) 831 5491    www:   http://ee.wpi.edu/People/faculty/cxp.html
> ***********************************************************************



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19448
Subject: Re: JamPlayer and 10K10
From: steve (Steve Rencontre)
Date: Wed, 22 Dec 1999 03:00 +0000 (GMT Standard Time)
Links: << >>  << T >>  << A >>
In article <385ED38E.24214D76@magtech.com.au>, mikes@magtech.com.au 
(Michael Stanton) wrote:

> Hi Thomas
> 
> We have never had to alter any lines inside the Jam source file and 
> have always
> been able to use the .jam file produced by Max+Plus II.
> 
> The following is the DOS command line we use to program a FLEX 10K30A 
> as part of
> a three device JTAG chain :
> 
> jam -v -dDO_CONFIGURE=1 -p378 cpld_top.jam
> 
> We are using Jam.exe ver 1.2 and Max+Plus II 9.3 and have a 
> ByteBlasterMV
> connected to a standard PC printer port (LPT1 at 378h) via a 2m long 
> D25M-D25F
> extension cable.
> 
> There are two versions of the jam.exe ; 16-bit-DOS and Win95-WinNT. 
> Have you
> tried each version ?

A while back, the Jam player had some nasty 16-bit limitations which 
prevented the DOS version from programming most 10Ks. That may have 
changed now, but the solution then was not to use it! Altera publicly 
promised to fix it, but privately said don't hold your breath.

For FLEX10K devices, Jam is a useless waste of effort. The bit stream is 
just clocked in with no fancy algorithm, and a simple program that reads 
the .SOF file[s] is orders of magnitude faster and smaller.
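[Editor's note] Steve's point that the FLEX10K bit stream is "just clocked in" can be sketched in a few lines. The helper below expands configuration bytes into the serial bit sequence, assuming the LSB-first ordering used by Altera's passive-serial scheme (an assumption worth checking against the device data sheet); the actual pin wiggling for a ByteBlaster-style cable, and the parsing of the .SOF container around the raw payload, are hardware- and format-specific and only indicated in comments.

```python
def config_bits(config_bytes):
    """Expand raw configuration bytes into the serial bit stream.

    Passive-serial configuration shifts each byte out LSB first on
    DATA0, one bit per rising edge of DCLK (assumption: LSB-first
    ordering per Altera passive serial; verify against the data sheet).
    """
    for byte in config_bytes:
        for i in range(8):
            yield (byte >> i) & 1

# The pin wiggling itself is cable-specific; for a ByteBlaster-style
# adapter on a PC printer port it would look roughly like this
# (set_data0, pulse_dclk and sof_payload are hypothetical helpers):
#
#   for bit in config_bits(sof_payload):
#       set_data0(bit)   # drive DATA0 with the bit
#       pulse_dclk()     # rising edge of DCLK latches it
```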

--
Steve Rencontre, Design Consultant
http://www.rsn-tech.demon.co.uk
Article: 19449
Subject: Re: M1 timings
From: Ray Andraka <randraka@ids.net>
Date: Tue, 21 Dec 1999 22:10:18 -0500
Links: << >>  << T >>  << A >>
The timing numbers from the timing analyzer are real for worst case delays.
The numbers reported in the PAR summary are ballpark, but they seem to
generally be conservative.  Run timing analyzer and use those times as the
worst case numbers, as they are more accurate.  Keep in mind that the
numbers reported are worst case delays over temperature, voltage and
process.  Realistically, you won't see those delays.  Use them as a bounding
box, and your design will be fine.  If you feel lucky and you know (for
instance) that the temperature will always be 70F, you can push them
beyond.  How much?  who knows.  You could characterize your part to find out
*for that part* if so inclined.  At nominal voltage and room temperature you
can see delays that are less than half of the worst case delays.  Also, be
aware that slower speed grades may be faster parts marked as slow parts.
All this goes to further the argument against depending on logic delays to
make the design work.
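[Editor's note] Ray's bounding-box advice reduces to simple arithmetic: treat the timing analyzer's worst-case path delay as the minimum clock period. A minimal sketch (the roughly-half-of-worst-case figure at nominal conditions is Ray's observation above, not a datasheet number):

```python
def max_clock_mhz(worst_case_path_ns):
    """Maximum safe clock rate if the worst-case path must fit in one period."""
    return 1000.0 / worst_case_path_ns

# Worst-case numbers bound the design over temperature, voltage and
# process, so a design clocked at this rate is safe on any part.
worst = 20.0                  # ns, from the timing analyzer
print(max_clock_mhz(worst))   # prints 50.0
# At nominal voltage and room temperature the real delays can be
# roughly half the worst case, but designing to that is gambling
# on the particular part in the socket.
```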

Christof Paar wrote:

> Just a brief question: How reliable are the timing results which the
> M1 P&R tool (on Unix) provides for XC4000 family designs? In particular,
> how likely is it that the maximum critical path delay can be met in an
> actual design?
>
> Thanks,
>
> Christof
>
> ! WORKSHOP ON CRYPTOGRAPHIC HARDWARE AND EMBEDDED SYSTEMS (CHES 2000) !
> !                   WPI, August 17 & 18, 2000                         !
> !          http://www.ece.wpi.edu/Research/crypt/ches                 !
>
> ***********************************************************************
>                  Christof Paar,  Assistant Professor
>           Cryptography and Information Security (CRIS) Group
>       ECE Dept., WPI, 100 Institute Rd., Worcester, MA 01609, USA
> fon: (508) 831 5061    email: christof@ece.wpi.edu
> fax: (508) 831 5491    www:   http://ee.wpi.edu/People/faculty/cxp.html
> ***********************************************************************



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka



