Messages from 19450

Article: 19450
Subject: Re: fpga cost
From: rk <stellare@nospam.erols.com>
Date: Tue, 21 Dec 1999 22:45:27 -0500
Links: << >>  << T >>  << A >>
Ray Andraka wrote:

> Rich.
>
> I'm surprised you've had good luck with the PLCC 84 sockets and FPGAs.  I've had
> some very bad experiences with those.  They seem OK the first time you put a chip
> in.  Remove and replace the chip once or twice and let the games begin.

hi ray,

here's a few notes.  so far i haven't had any problems with the plcc sockets when
handled correctly.  probing on the top, i learned a few years ago, seems to be BAD as
it takes some of the springiness out of the little contacts on the side of the
package.  so i don't do that anymore; i probe on the bottom of the board instead.
i'd prefer to get on the device itself, but i found you have to be very careful to do
that.  also, i got a very good quality extraction tool; it wasn't cheap.  they come
in individual sizes for each socket size.  i don't like the cheaper, universal ones;
i think they stress the sockets.

most of the parts that i put in these sockets go in once and stay in.  i have operated
them for years, with plenty of shipping and knocking around, and have had zero socket
reliability problems.  in fact, i don't even worry about it.  a few i have pulled in
and out a number of times ... perhaps 3-4x, to read out the checksum (otp) or the
program (5192), to verify that i had the right programming file to make more boards
or more copies of the device from the uploaded program.  i haven't had any problems
there at all.

there are zif sockets for the plcc84, too, that are used on the programming adapter.
i've had quite a few parts pass through the machine with zero problems at all, from one
otp manufacturer.  they have a nice alignment, the lead pitch isn't too fine, and it
doesn't take much care to handle them.  from another manufacturer, non-volatile
reprogrammable, i have had a lot of problems with the zif socket on their programming
adapter; the fit is sloppier and alignment is tough.  i bought the adapter new and
haven't used it very much at all, perhaps 15-20 times, where the other manufacturer's
zif socket was used probably 200x, with no sign of wear.

one part i did have to do many burns on to get the design right <yeah, i'm embarrassed,
but here's a good excuse>.  i had to interface to another device where, well, the
description was wrong and so was the schematic.  a very non-fun situation.  so i had
to build up my circuit, run it, then based on the response figure out what the
circuit i was interfacing to was doing, and then check with the supplier.  ugghh!!!
there were a lot of things like inverted polarities, incorrect counts coming from
counters, etc., etc.  in any event, to get this one working it took about a dozen
tries until i got the thing figured out.  normally i don't care about otp since
things usually work right away and the cost is low, about $25 ea.  in any event, the
labor was high (it was a looooooooooooong night and a short deadline).  i've used
that board for about 3 1/2 years now, lots of shipping and other forms of abuse,
never had a problem; works like a champ.

otoh, the pqfp and their zif sockets are a headache at best.  a few times in and out,
the leads get spread a bit, and the alignment of the part to the socket is poor, too
much slop.  this is a pain when using that socket in a test board where, by design,
you have a lot of parts moving in and out.  i like to align those things under a low
power microscope to ensure i have good contact between all the leads and the little
pins that sit down on them.  on the road i bring a big magnifying glass i borrow from
junior.  another problem with that packaging technology is that you have to handle
the parts very, very carefully as the leads get bent very, very easily.  cheap for
the manufacturers, but a real pain to work with in a socket.  for test applications
you have no choice.  for circuit applications, i'm just soldering right down to the
board and betting i won't screw up the design (by nature of the job it's using an otp
device).  these zif sockets are unreliable with shipping and stuff; i've had poor
luck with that.

i did do one board with pbga, but the parts didn't show up so it hasn't been used
yet.  the socket for that board was worrisome; it had little pins on the bottom and
it bolted to the board for a press fit.  for a newer device, i got a socket that
solders to the board with pins under each corresponding ball.  we'll see how that one
works in a month or so; i sent the board out for design/layout a few weeks ago.  but
i do like that better than the contraption that i was gonna try earlier.  the pbga
seems to be a good fit in these sockets.  what i don't like is that i can't get on
the actual pin of the device with a scope probe.

anyways,

have a good evening,

rk



Article: 19451
Subject: Re: State machine ok with binary encoding but unstable with one hot encoding
From: eml@riverside-machines.com.NOSPAM
Date: Wed, 22 Dec 1999 11:12:53 GMT
Links: << >>  << T >>  << A >>
On Tue, 14 Dec 1999 10:48:35 +0100, "Marc Battyani"
<Marc_Battyani@csi.com> wrote:

>I don't understand why the following state machine is ok when I use binary
>state encoding (with safest mode) but oscillates when I use one-hot encoding
>(with safest mode also).

firstly, there's a couple of problems with your code. your type
declaration for 'State' is incomplete:

>type State is (Waiting, StartDataRead, InDataRead);

and you're assigning to a type rather than a signal inside your code:

>        State <= Waiting;

second, the if-else issue is a red herring. any synthesiser that
requires the else's that you eventually put in your code:

> when InAddrRead =>
>    if nAddrStb = '1' then
>        State <= Waiting;
>    else
>        State <= InAddrRead;
>    end if;

is simply broken, and fpga express certainly doesn't have this
problem. if it did, the vast majority of FSMs out there wouldn't
work.

the only time when you might want to worry about else's is in a
*combinatorial* process (and yours is clearly clocked, not
combinatorial). if it's possible for a combinatorial process to be
run, and for a particular signal not to be assigned to during the
execution, then you might want to put in an else (or a default
assignment) to ensure that there is an assignment to that signal (to
prevent memory being inferred for that signal).
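
a minimal sketch of that two-process pattern, using the state names from the
quoted declaration (the clock and input names, and the transitions themselves,
are invented here for illustration, not Marc's actual machine):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_sketch is
    port (Clk, Go : in std_logic);
end entity;

architecture rtl of fsm_sketch is
    type State_t is (Waiting, StartDataRead, InDataRead);
    signal CurrentState, NextState : State_t := Waiting;
begin
    -- combinatorial next-state logic: the default assignment at the
    -- top guarantees NextState is driven on every path through the
    -- process, so no latch is inferred even though the Waiting branch
    -- has no explicit else
    comb: process (CurrentState, Go)
    begin
        NextState <= CurrentState;          -- default: hold state
        case CurrentState is
            when Waiting =>
                if Go = '1' then
                    NextState <= StartDataRead;
                end if;
            when StartDataRead =>
                NextState <= InDataRead;
            when InDataRead =>
                NextState <= Waiting;
        end case;
    end process;

    -- clocked process: a signal not assigned on some path simply keeps
    -- its value until the next edge; that is a flip-flop, not a latch,
    -- which is why the missing else's were a red herring
    seq: process (Clk)
    begin
        if rising_edge(Clk) then
            CurrentState <= NextState;
        end if;
    end process;
end architecture;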

if there aren't any other typos in your process, then i suspect that
your synth's 'safe' mechanism is broken. this isn't covered by the
language definition, and is synth-specific - if you really need a safe
machine, how about coding it by hand?
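
hand-coding a safe machine might look something like this sketch: making the
one-hot register an explicit std_logic_vector means the illegal-state decode
is yours, not the synthesiser's (the entity and signal names are invented, and
some tools may still need an attribute to keep the recovery logic):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity safe_onehot is
    port (Clk, Go : in std_logic);
end entity;

architecture rtl of safe_onehot is
    -- explicit one-hot encoding, one bit per state
    constant WAITING : std_logic_vector(2 downto 0) := "001";
    constant START   : std_logic_vector(2 downto 0) := "010";
    constant INREAD  : std_logic_vector(2 downto 0) := "100";
    signal state : std_logic_vector(2 downto 0) := WAITING;
begin
    process (Clk)
    begin
        if rising_edge(Clk) then
            if state = WAITING then
                if Go = '1' then
                    state <= START;
                end if;
            elsif state = START then
                state <= INREAD;
            elsif state = INREAD then
                state <= WAITING;
            else
                -- any of the five illegal (non-one-hot) patterns is
                -- forced back to a legal state on the next clock
                state <= WAITING;
            end if;
        end if;
    end process;
end architecture;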

evan

Article: 19452
Subject: Re: M1 timings
From: eml@riverside-machines.com.NOSPAM
Date: Wed, 22 Dec 1999 11:13:58 GMT
Links: << >>  << T >>  << A >>
On Tue, 21 Dec 1999 14:30:50 -0500, Christof Paar
<christof@ece.wpi.edu> wrote:

>Just a brief question: How reliable are the timing results which the
>M1 P&R tool (on Unix) provides for XC4000 family designs?

my first post to comp.arch.fpga was a detailed version of this
question, a couple of years ago. i got every imaginable reply: some
saying that the numbers were always right, or always wrong, some
saying that they were generally right, some saying that it depended on
how new the device and the speed file were, some who'd missed the
point, some who assumed that i couldn't design, and so on. i even got
into an email discussion with somebody (not from xilinx) about
neocad's timing extractor. the algorithm was inaccurate for sub-0.5
micron designs, they said, particularly for fast clocks, and it needed
to be fixed. presumably it's been fixed now.

so, the real answer is: who knows? we certainly don't. but don't worry
about it, or you might end up wasting valuable hours, or years, on
usenet.

evan
 
Article: 19453
Subject: EDIF and VITAL
From: Walter Soto Encinas Jr <soto@icmc.sc.usp.br>
Date: Wed, 22 Dec 1999 11:34:22 -0200
Links: << >>  << T >>  << A >>
Hi

	I am doing back-annotation of a VHDL design to do accurate
gate-level simulation with Synopsys VSS. I got some error messages in the
flip-flop instances:

**Error: vhdlsim,260: 
    (SDF File: addr.sdf Line: 1820) generic
    /ADDR/COUNT_LIN/COUNT_3_MID_1_FF/THOLD_CLR_C_noedge_posedge is not
    declared.

	I suppose these messages are about some VITAL timing parameter not present
in the VITAL libraries of the FF. Looking into the VHDL VITAL model of the FF, I
couldn't find the parameter exactly as it appears above, but something very similar:

entity FDC is
   generic(
      TimingChecksOn: Boolean := True;
      InstancePath: STRING := "*";
      Xon: Boolean := False;
      MsgOn: Boolean := True;
      tpd_CLR_Q                  :  VitalDelayType01 := (0.718 ns, 0.659 ns);
      tpd_C_Q                    :  VitalDelayType01 := (0.718 ns, 0.658 ns);
      tsetup_D_C                 :  VitalDelayType := 1.300 ns;
      thold_D_C                  :  VitalDelayType := 0.200 ns;
      trecovery_CLR_C            :  VitalDelayType := 2.325 ns;
 ---> thold_CLR_C                :  VitalDelayType := 0.000 ns;
      tpw_C_posedge              :  VitalDelayType := 4.000 ns;
      tpw_CLR_posedge            :  VitalDelayType := 4.000 ns;
      tpw_C_negedge              :  VitalDelayType := 4.000 ns;
      tpw_CLR_negedge            :  VitalDelayType := 4.000 ns;
      tipd_D                     :  VitalDelayType01 := (0.000 ns, 0.000 ns);
      tipd_C                     :  VitalDelayType01 := (0.000 ns, 0.000 ns);
      tipd_CLR                   :  VitalDelayType01 := (0.000 ns, 0.000 ns));

	Can someone advise me how to correct this error? Should I change the SDF
file? How? I guess SDF is in EDIF format. A typical FF in the SDF is:

	
(CELL
        (CELLTYPE "FDC")
        (INSTANCE COUNT_LIN/COUNT_8_FIM_1_FF)
        (TIMINGCHECK
                (SETUP D (posedge C) (1.338:1.338:1.338))
                (HOLD D (posedge C) (0.200:0.200:0.200))
                (WIDTH (posedge C) (1.658:1.658:1.658))
                (WIDTH (negedge C) (1.479:1.479:1.479))
                (RECOVERY (negedge CLR) (posedge C) (1.300:1.300:1.300))
                (HOLD CLR (posedge C) (0.200:0.200:0.200))
                (WIDTH (posedge CLR) (0.863:0.863:0.863))
                (WIDTH (negedge CLR) (0.001:0.001:0.001))
        )
        (DELAY
                (ABSOLUTE
                        (DEVICE Q (0.215:0.215:0.215) (0.298:0.298:0.298))
                )
        )
)

	Thanks in advance!

-- 
|                                       Walter Soto Encinas Jr          |
|                                            PhD  Student               |
|                                             IFSC / USP                |
|                                               Brazil                  |
Article: 19454
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: diep@xs4all.nl (Vincent Diepeveen)
Date: Wed, 22 Dec 1999 13:57:42 GMT
Links: << >>  << T >>  << A >>
On Tue, 21 Dec 1999 19:33:12 GMT, husby@fnal.gov (Don Husby) wrote:

>vdiepeve@cs.uu.nl (Vincent Diepeveen) wrote:
>> if i start it in a simple way, say in next way:
>> 
>>   instead of writing all the difficult search stuff in FPGA,
>>   i only write the evaluation in FPGA. This can be done in parallel
>>   of course to a big extent.
>>   So it's smart to do it in fpga.
>
>I think the search stuff is the easier part.  It's basically recursive
>and decomposable:
>  Assume you represent a board state as a large vector, for example
>4 bits can represent the 12 unique pieces, so a 64x4 vector can represent
>the entire board state.  (Probably less).   If you create a search
>module that operates on a single square (probably a lookup table or
>Altera CAM), then it can work serially on a square at a time.  This
>can probably be implemented as a fairly small state machine.  It would
>be possible to fit many of these on a single chip and have them run
>in parallel.

In 1980 i guess you would have been right about search; in 1999
it's immensely tough to put search in hardware. It's very complicated,
and it's important to use a lot of RAM (preferably a few hundred
megabytes) with it and store all kinds of intermediate results.

Parallelizing my program in software took me 6 months.

Apart from this, search is less interesting to put in hardware;
it eats less than 5% of system time. the other 5% goes to
lookups in the big global RAM.

The most important thing is to put the evaluation in hardware.
That only pays off if it can be done 'simply', with an evaluation
time of about 2 usec or less
(knowing that communication over the PCI bus is 1 usec).

Discussing the difference between a search from 1999 and a search of
1980 makes little sense, but to get an idea you could ftp crafty
at ftp.crafty.cis.uab.edu   /pub/hyatt
and take a look at the huge C source needed for the search.

>The recursive structure can be implemented by feeding the results to
>FIFOs which feed back to the input.  Something like this:
>
>       {=          = Evaluator = FIFO }
>       {= Parallel = Evaluator = FIFO }          Lookup   Feed
>Start =>= search   = Evaluator = FIFO => Merge > Cached > back to
>       {= Modules  = Evaluator = FIFO }          Path     Start
>       {=          = Evaluator = FIFO }
>
>Or you can arrange the data flow in a tree structure that
>mimics the search tree.  The processing rate is likely to be
>limited by the data path, but a rate of 12.5MHz per output
>tree branch seems achievable (A 64-bit wide bus at 50MHz).
>
>If the evaluator is the bottleneck, and we assume an evaluator can
>be pipelined to process a board state in an average 500ns, then you
>would need only 6 of these to keep up with the 12.5MHz  path.
>
>The cache will also be a bottleneck, since to be most effective, it
>should be shared by all branches.  You'd probably want to construct a
>multiport cache by time sharing it among several branches.  A cache
>cycling at 100 MHz could service 8 branches running at 12.5 MHz.
>
>
>--
>Don Husby <husby@fnal.gov>             http://www-ese.fnal.gov/people/husby
>Fermi National Accelerator Lab                          Phone: 630-840-3668
>Batavia, IL 60510                                         Fax: 630-840-5406

Article: 19455
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: diep@xs4all.nl (Vincent Diepeveen)
Date: Wed, 22 Dec 1999 14:02:45 GMT
Links: << >>  << T >>  << A >>
On Tue, 21 Dec 1999 10:34:46 -0500, Ray Andraka <randraka@ids.net>
wrote:

>That would be one possible partition.  You can probably pipeline the evaluation so that
>you might have several evaluations in progress at once.  That way you can get more than
>the 100 clocks per evaluation in the case you mention below.  Again, I'd have to sit down
>and noodle over the algorithms to get a really good partition and implementation.

Thanks Ray,

I am interested in getting an eval under 2 usec from the FPGA. Whether
that's a single eval or more doesn't interest me much, as getting more
boards at the same time hardly speeds up the search.

If it takes more than 2 usec to get a bunch of positions or a single
position, then it's not smart to put my chess program in hardware
anyway.

>For hardware, your best bet would probably be to buy one of the commercially available
>boards out there.  Many have PCI interfaces, some of which use the FPGA for the PCI.
>Check out www.optimagic.com for a pretty comprehensive listing of available boards.

Thanks for the link, i'll check it out!

>You'll want to partition the algorithm between the processor and the FPGA before you make
>a final selection of the board so that you make sure you have the right amount of and
>connections to external memory for the application.  There are nearly as many board
>architectures as there are boards.

>Vincent Diepeveen wrote:

>> In <385E7128.4335A25A@ids.net> Ray Andraka <randraka@ids.net> writes:
>>
>> Thanks Ray,
>>
>> if i start it in a simple way, say in next way:
>>
>>   instead of writing all the difficult search stuff in FPGA,
>>   i only write the evaluation in FPGA. This can be done in parallel
>>   of course to a big extent.
>>   So it's smart to do it in fpga.
>>
>>   If this can be done under 1 usec, then that would be great.
>>   Even 2 usec is acceptable.
>>   If it's needing more than 10 usec then that would suck bigtime.
>>
>>   Basically it must be able to evaluate 200,000 times a second.
>>
>> I bet that this is technically a lot simpler than writing the whole search with
>> a big amount of memory in fpga.
>>
>> My evaluation is gigantic though. First i generate a datastructure
>> with a lot of information, which is later used everywhere in the evaluation,
>> so that's at least some clocks extra.
>>
>> If i'm not mistaken, in FPGA the eval at 50MHz may need in total 100 clocks
>> to still deliver 500k evaluations a second, right?
>>
>> What kind of components would i need for this?
>>
>> How about a PCI card, are those already available?
>>
>>
>>
>> >
>> >Vincent Diepeveen wrote:
>> >
>> >> On Sat, 18 Dec 1999 12:50:33 -0500, Ray Andraka <randraka@ids.net>
>> >> wrote:
>> >>
>> >> >
>> >> >
>> >> >Dann Corbit wrote:
>> >> >
>> >> >> "Ray Andraka" <randraka@ids.net> wrote in message
>> >> >> news:385B1DEE.7517AAC7@ids.net...
>> >> >> > The chess processor as you describe would be sensible in an FPGA.  Current
>> >> >> > offerings have extraordinary logic densities, and some of the newer FPGAs
>> >> >> have
>> >> >> > over 500K of on-chip RAM which can be arranged as a very wide memory.
>> >> >> Some of
>> >> >> > the newest parts have several million 'marketing' gates available too.
>> >> >> FPGAs
>> >> >> > have long been used as prototyping platforms for custom silicon.
>> >> >>
>> >> >> I am curious about the memory.  Chess programs need to access at least tens
>> >> >> of megabytes of memory.  This is used for the hash tables, since the same
>> >> >> areas are repeatedly searched.  Without a hash table, the calculations must
>> >> >> be performed over and over.  Some programs can even access gigabytes of ram
>> >> >> when implemented on a mainframe architecture.  Is very fast external ram
>> >> >> access possible from FPGA's?
>> >> >
>> >> >This is conventional CPU thinking.  With the high degree of parallelism in the
>> >>
>> >> No this is algorithmic speedup design.
>> >>
>> >
>> >What I meant by this is that just using the FPGA to accelerate the CPU algorithm
>> >isn't necessarily going to give you all the FPGA is capable of doing.  You need to
>> >rethink some of the algorithm to optimize it to the resources you have available in
>> >the FPGA.  The algorithm as it stands now is at least somewhat tailored to a cpu
>> >implementation.  It appears your thinking is just using the FPGA to speed up the
>> >inner loop, where what I am proposing is to rearrange the algorithm so that the FPGA
>> >might for example look at the whole board state on the current then next move.  In a
>> >CPU based algorithm, the storage is cheap and the computation is expensive.  In an
>> >FPGA, you have an opportunity for very wide parallel processes (you can even send a
>> >lock signal laterally across process threads).  Here the processing is generally
>> >cheaper than the storage of intermediate results.  The limiting factor is often the
>> >I/O bandwidth, so you want to rearrange your algorithm to tailor it to the quite
>> >different limitations of the FPGA.
>> >
>> >> Branching factor (time multiplyer to see another move ahead)
>> >> gets better with it by a large margin.
>> >>
>> >> So BF in the next formula gets better
>> >>
>> >>   # operations in FGPA   =  C *  (BF^n)
>> >>       where n is a positive integer.
>> >>
>> >> >FPGA and the large amount of resources in some of the more recent devices, it
>> >> >may very well be that it is more advantageous to recompute the values rather
>> >> >than fetching them.  There may even be a better approach to the algorithm that
>> >> >just isn't practical on a conventional CPU.  Early computer chess did not use
>> >> >the huge memories.  I suspect the large memory is more used to speed up the
>> >> >processing rather than a necessity to solving the problem.
>> >>
>> >> Though  #operations used by deep blue was incredible compared to
>> >> any program of today at world championship 1999 many programs searched
>> >> positionally deeper (deep blue 5 to 6 moves ahead some programs
>> >> looking there 6-7 moves ahead).
>> >>
>> >> This all because of these algorithmic improvements.
>> >>
>> >> It's like comparing bubblesort against merge sort.
>> >> You need more memory for merge sort as this is not in situ but
>> >> it's O (n log n). Take into account that in computergames the
>> >> option to use an in situ algorithm is not available.
>> >>
>> >> >> > If I were doing such I design in an FPGA however, I would look deeper to
>> >> >> see
>> >> >> > what algorithmic changes could be done to take advantage of the
>> >> >> parallelism
>> >> >> > offered by the FPGA architecture.  Usually that means moving away from a
>> >> >> > traditional GP CPU architecture which is limited by the inherently serial
>> >> >> > instruction stream.  If you are trying to mimic the behavior of a CPU, you
>> >> >> would
>> >> >> > possibly do better with a fast CPU, as you will be able to run those
>> >> >> at a
>> >> >> > higher clock rate.  The FPGA gains an advantage over CPUs when you can
>> >> >> take
>> >> >> > advantage of parallelism to get much more done in a clock cycle than you
>> >> >> can
>> >> >> > with a CPU.
>> >> >>
>> >> >> The ability to do many things at once may be a huge advantage.  I don't
>> >> >> really know anything about FPGA's, but I do know that in chess, there are a
>> >> >> large number of similar calculations that take place at the same time.  The
>> >> >> more things that can be done in parallel, the better.
>> >> >
>> >> >Think of it as a medium for creating a custom logic circuit.  A conventional CPU
>> >> >is specific hardware optimized to perform a wide variety of tasks, none
>> >> >especially well.  Instead we can build a circuit that specifically addresses the
>> >> >chess algorithms at hand.  Now, I don't really know much about the algorithms
>> >> >used for chess.  I suspect one would look ahead at all the possibilities for at
>> >> >least a few moves ahead and assign some metric to each to determine the one with
>> >> >the best likely cost/benefit ratio.  The FPGA might be used to search all the
>> >> >possible paths in parallel.
>> >>
>> >> My program allows parallelism. i need bigtime locking for this, in
>> >> order to balance the parallel paths.
>> >>
>> >> What are the possibilities in FPGA to put several copies of the same program
>> >> on one chip, so that inside the FPGA there is a sense of parallelism?
>> >>
>> >> How about making something that enables locking within the FPGA?
>> >>
>> >> My parallelism is not possible without locking; that's the same
>> >> bubblesort versus merge sort story: with 4 processors my program gets a
>> >> 4.0 speedup, but without the locking 4 processors would be a
>> >> lot slower than a single sequential processor.
>> >>
>> >> >> > That said, I wouldn't recommend that someone without a sound footing in
>> >> >> > synchronous digital logic design take on such a project.  Ideally the
>> >> >> designer
>> >> >> > for something like this is very familiar with the FPGA architecture and
>> >> >> tools
>> >> >> > (knows what does and doesn't map efficiently in the FPGA architecture),
>> >> >> and is
>> >> >> > conversant in computer architecture and design and possibly has some
>> >> >> pipelined
>> >> >> > signal processing background (for exposure to hardware efficient
>> >> >> algorithms,
>> >> >> > which are usually different than ones optimized for software).
>> >> >> I am just curious about feasibility, since someone raised the question.  I
>> >> >> would not try such a thing by myself.
>> >> >>
>> >> >> Supposing that someone decided to do the project (however) what would a
>> >> >> rough ball-park guestimate be for design costs, the costs of creating the
>> >> >> actual masks, and production be for a part like that?
>> >> >
>> >> >The nice thing about FPGAs is that there is essentially no NRE or fabrication
>> >> >costs.  The parts are pretty much commodity items, purchased as generic
>> >> >components.  The user develops a program consisting of a compiled digital logic
>> >> >design, which is then used to field customize the part.  Some FPGAs are
>> >> >programmed once during product manufacture (one time programmables include
>> >> >Actel and Quicklogic).  Others, including the Xilinx line, have thousands of
>> >> >registers that are loaded up by a bitstream each time the device is powered up.
>> >> >The bitstream is typically stored in an external EPROM memory, or in some cases
>> >> >supplied by an attached CPU.  Part costs range from under $5 for small arrays to
>> >> >well over $1000 for the newest largest fastest parts.
>> >>
>> >> How about a program that has thousands of chess rules and an
>> >> incredible amount of loops within them and a huge search,
>> >>
>> >> So the engine & eval alone equalling 1.5mb of C source code.
>> >>
>> >> How expensive would that be?  am i understanding here that
>> >> i need to spend another $1000 for every few rules ?
>> >
>> >It really depends on the implementation.   The first step in finding a good FPGA
>> >implementation is repartitioning the algorithm.  This ground work is often the
>> >longest part of the FPGA design cycle, and it is a part that is not even really
>> >acknowledged in the literature or by the part vendors.  Do the system work up front
>> >to optimize the architecture for the resources you have available, and in the end
>> >you will wind up with something much better, faster, and smaller than anything
>> >arrived at by simple translation.
>> >
>> >At one extreme, one could just use the FPGA to instantiate custom CPUs with a
>> >specialized instruction set for the chess program.  That approach would likely net
>> >you less performance than an emulator for the custom CPU running on a modern
>> >machine.  The reason for that is the modern CPUs are clocked at considerably higher
>> >clock rates than a typical FPGA design is capable of, so even if the emulation takes
>> >an average of 4 or 5 cycles for each custom instruction, it will still keep up with
>> >or outperform the FPGA.  Where the FPGA gets its power is the ability to do lots of
>> >stuff at the same time.   To take advantage of that, you usually need to get away
>> >from an instruction based processor.
>> >
>> >
>> >
>> >>
>> >>
>> >> >The design effort for the logic circuit you are looking at is not trivial.  For
>> >> >the project you describe, the bottom end would probably be anywhere from 12
>> >> >weeks to well over a year of effort depending on the actual complexity of the
>> >> >design, the experience of the designer with the algorithms, FPGA devices and
>> >> >tools.
>> >>
>> >> I needed years to write it in C already...
>> >>
>> >> Vincent Diepeveen
>> >> diep@xs4all.nl
>> >>
>> >> >> --
>> >> >> C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
>> >> >>  "The C-FAQ Book" ISBN 0-201-84519-9
>> >> >> C.A.P. Newsgroup   http://www.dejanews.com/~c_a_p
>> >> >> C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm
>> >>
>> >> >--
>> >> >-Ray Andraka, P.E.
>> >> >President, the Andraka Consulting Group, Inc.
>> >> >401/884-7930     Fax 401/884-7950
>> >> >email randraka@ids.net
>> >> >http://users.ids.net/~randraka
>> >
>> >
>> >
>> >--
>> >-Ray Andraka, P.E.
>> >President, the Andraka Consulting Group, Inc.
>> >401/884-7930     Fax 401/884-7950
>> >email randraka@ids.net
>> >http://users.ids.net/~randraka
>> >
>> >
>> --
>>           +----------------------------------------------------+
>>           |  Vincent Diepeveen      email:  vdiepeve@cs.ruu.nl |
>>           |  http://www.students.cs.ruu.nl/~vdiepeve/          |
>>           +----------------------------------------------------+
>
>
>
>--
>-Ray Andraka, P.E.
>President, the Andraka Consulting Group, Inc.
>401/884-7930     Fax 401/884-7950
>email randraka@ids.net
>http://users.ids.net/~randraka
>
>

Article: 19456
Subject: XC4000E
From: elynum@my-deja.com
Date: Wed, 22 Dec 1999 15:08:26 GMT
Links: << >>  << T >>  << A >>
I have a Xilinx XC4000E 84 pin PLCC fpga.  I need more I/O.  I've
already used all the regular I/O.  Can I use pins like pin 13, which
is I/O PGCK?  How many other I/O can I use, and is there a special
configuration involved?


Sent via Deja.com http://www.deja.com/
Before you buy.
Article: 19457
Subject: Re: XC4000E
From: Etienne Racine <etienne@cae.ca>
Date: Wed, 22 Dec 1999 11:05:42 -0500
Links: << >>  << T >>  << A >>
Hi,

elynum@my-deja.com wrote:

> I have an Xilinx XC400E 84 pin PLCC fpga.  I need more I/O.  I've
> already used all the regular I/O.  Can I use the pins like pin 13 which
> is I/O PGCK?  How many other I/O can I use and is there a special
> configuration involved?

If you don't need boundary-scan and are not already using them, there are
four more pins you can use: TCK, TMS, TDI and TDO. If I recall correctly,
the first three may be used as unrestricted I/Os, but TDO has to be used as
an output-only pin.

You must make sure not to have the BSCAN symbol instantiated in your
design, and make sure you do *NOT* toggle signals on those pins before your
FPGA is configured.

Regards,

Etienne.
--
      ______ ______
*****/ ____// _  \_\*************************************************
*   / /_/_ / /_/ / /       Etienne Racine, Hardware Designer        *
*  / ____// __  /_/           Visual Systems Engineering            *
* / /_/_ / / /\ \ \              CAE Electronics Ltd.               *
*/_____//_/_/**\_\_\*************************************************


Article: 19458
Subject: Re: Speed grade
From: "Keith Jasinski, Jr." <jasinski@mortara.com>
Date: Wed, 22 Dec 1999 11:32:07 -0600
Links: << >>  << T >>  << A >>
The only thing that should be tied to the set or reset of a flip-flop is the
reset pin of the device (or whatever soft-reset circuit you use).  If you
need to set or reset a FF, use a mux in front of the device and clock it in
as a synchronous reset.

AMI Technologies (www.amis.com) will give you a very good tutorial on
synchronous practices.  They show you why you shouldn't do something, as
well as "If you were going to do this, do this instead" examples...
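
The "mux in front of the device" amounts to a synchronous clear, which might
look something like this sketch (the entity and port names are illustrative,
not from any particular design):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_clear_ff is
    port (Clk, Clr, D : in  std_logic;
          Q           : out std_logic);
end entity;

architecture rtl of sync_clear_ff is
begin
    -- the clear is sampled by the clock like any other input:
    -- synthesis implements it as a mux on the D input rather than
    -- driving the flip-flop's asynchronous clear pin, so a glitch on
    -- Clr between clock edges cannot corrupt the state
    process (Clk)
    begin
        if rising_edge(Clk) then
            if Clr = '1' then
                Q <= '0';
            else
                Q <= D;
            end if;
        end if;
    end process;
end architecture;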

--
Keith F. Jasinski, Jr.
kfjasins@execpc.com
Hal Murray <murray@pa.dec.com> wrote in message
news:83mo3p$6ko@src-news.pa.dec.com...
>
> > no self-clearing structures (f-f output to its clear).  find another
> > way to make that pulse! i've seen this too many times!
>
> I'd go much farther than that.  Any pulse that isn't a multiple
> of the clock frequency (made by clocking a FF in a FSM) is asking
> for troubles.
>
> How about a list of reasonable ways to make a shorter pulse?
> (and/or things to keep in mind when you do)
>
> I think one of the Xilinx ap-notes mentions at least one.  It
> may be a very old one.
>
> --
> These are my opinions, not necessarily my employers.


Article: 19459
Subject: Re: Speed grade
From: "Keith Jasinski, Jr." <jasinski@mortara.com>
Date: Wed, 22 Dec 1999 11:37:13 -0600
Links: << >>  << T >>  << A >>
The reason you need to do this (what you're doing is good) is that the
metastability risk goes down every time you add a FF to the chain.  When you
have asynchronous boundaries between clock domains, the incoming signal may
be mid-rise when the clock edge of the other domain hits.  This can cause
the output of the FF to sit at mid-voltage (metastability).  Since the next
flip-flop is clocked by the same clock as the one that went metastable, its
output should be OK (but maybe not).  Every time you add a FF to the chain,
you exponentially reduce the risk of the metastable condition reaching your
functional circuit.

For the record, I usually use 2 FFs in the path between two clock domains,
and I am not aware of a failure.  In a really critical area, I would use 3.
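
For reference, the 2-FF chain described above is tiny in VHDL; a minimal
sketch (names illustrative):

```vhdl
-- Hypothetical 2-FF synchronizer.  The first stage may go metastable
-- when it samples async_in mid-transition; the second stage gives it a
-- full clock period to settle before anything downstream sees it.
synchronize: process(clk_b)
begin
    if rising_edge(clk_b) then
        meta_ff <= async_in;   -- do not use meta_ff anywhere else
        sync_ff <= meta_ff;    -- only sync_ff feeds the functional logic
    end if;
end process synchronize;
```

Placing the two flip-flops physically close together (same CLB if possible)
maximizes the settling time available to the first stage.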

My 2 cents for what it's worth...

--
Keith F. Jasinski, Jr.
kfjasins@execpc.com
Joel Kolstad <Joel.Kolstad@USA.Net> wrote in message
news:s5v8qnm3rg353@corp.supernews.com...
> rk <stellare@nospam.erols.com> wrote in message
> news:385EE12E.F6315E86@nospam.erols.com...
> > perhaps another one ...
> >
> > no self-clearing structures (f-f output to its clear).  find another
> > way to make that pulse! i've seen this too many times!
>
> I've used a form of this when interfacing an FPGA to a DSP's bus (both run
> off different clocks).  The DSP's clock registers a write strobe in flip
> flop #1 (while the data bus contents are registered in a bunch of other
> registers).  This is followed by two flip flops, #2 and #3 (#2 acting as a
> synchronizer, more or less) clocked by the FPGA's clock.  The output of #3
> goes to the FPGA's state machines, etc., and also goes back to the
> asynchronous reset input on 'flop #1.  (Note that the DSP is slow compared
> to the FPGA; the output of 'flop #3 fires long before the DSP gets around
> to sending another write strobe.)
>
> Is this dangerous (doesn't seem to me to be...)?  ...and what's a better
> alternative?
>
> Thanks...
>
> ---Joel Kolstad
>
>
>


Article: 19460
Subject: Re: fpga cost
From: steenl@pal.ECE.ORST.EDU (Steen Larsen)
Date: 22 Dec 1999 18:19:56 GMT
Links: << >>  << T >>  << A >>
I will second Rich's comments on sockets.  Additionally though, PCBs seem to be
becoming cost-effective alternatives to sockets when prototyping.  I am
doing a PCI card around a 10K30E 208QFP.  Rather than getting a PCI protocard,
a 208QFP -> 100-mil PGA converter, plus wirewrap/signal-integrity worries, it
seems better to lay out the card, putting the 2.5V regulator where I want it,
etc.  For 6 15" boards at www.pcbexpress.com the board cost is $100.  True,
they are only prototype solutions, but they seem an economical route where I
can place test pads where I want.

This is not an endorsement for pcbexpress, I am just a happy customer.

-steen
Article: 19461
Subject: Re: XC4000E
From: Ray Andraka <randraka@ids.net>
Date: Wed, 22 Dec 1999 13:49:40 -0500
Links: << >>  << T >>  << A >>
Look at the data sheet.  It describes all the dual use pins and the
conditions where they are not a regular I/O.  PGCLK, and the other clock
pins are regular I/Os that have additional connections directly to the
global clock buffers.  There are also a number of IO pins that are used for
configuration and become a normal IO after configuration is complete.  You
can use those as long as you are careful about how you drive them during
configuration.  Be especially careful about the JTAG clock and enable pins,
as the wrong activity on them can screw up the configuration (there is a
xilinx app note dealing with that).  All of these pins become normal I/O
after configuration except in some cases where you instantiate specific
components in the design (for example, the 4 JTAG pins become a JTAG
interface if you use the BSCAN primitive).  You can also use the M[2:0]
pins, but be aware that they are not like regular IOBs (no flip-flops, and
unidirectional) and that there are some tool issues with them (IIRC they
don't show up in the floorplanner, and they aren't covered by the timing
constraints)

elynum@my-deja.com wrote:

> I have an Xilinx XC400E 84 pin PLCC fpga.  I need more I/O.  I've
> already used all the regular I/O.  Can I use the pins like pin 13 which
> is I/O PGCK?  How many other I/O can I use and is there a special
> configuration involved?
>
> Sent via Deja.com http://www.deja.com/
> Before you buy.



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19462
Subject: JOBS: Open Positions at Triscend: The Configurable System-on-Chip Company
From: "Danesh Tavana" <danesh@triscend.com>
Date: Wed, 22 Dec 1999 11:03:22 -0800
Links: << >>  << T >>  << A >>
Triscend is leading the era of user-configurable semiconductors--the
Configurable System-on-Chip. This CSoC device, combined with powerful and
intuitive software, enables designers to "instantly" create a customized
processing platform for embedded system applications that face intense
time-to-market pressure.

Located in Mountain View, California, Triscend is dedicated to advancing the
state-of-the-art in semiconductor IC and software design to fundamentally
change how embedded systems are designed.

Triscend has many openings and opportunities; see our web site at
www.triscend.com, then send us an email (including the HR Code) with your
resume to careers@triscend.com, or fax us a resume (including the HR
Code) at 650-934-9393.

Some of the key positions we are currently hiring include:

Software Algorithm Developer: Design and develop algorithms for technology
mapping, placement, or routing of Triscend's Configurable System on Chip
devices. Participate in new CSoC architecture research.

Requirements: BSCS or BSEE, with 2 to 5 or more years industry
object-oriented experience developing mapping, placement, routing algorithms
for FPGAs or ASIC using Java or C++.  Experience developing algorithms or
supporting code for placement, routing, or technology mapping is a plus.
Must have strong interest in data modeling and compute-intensive software.

Software Algorithm Developer: Design and develop interfaces between Triscend
software and other CAE software, including schematic capture, logic
synthesis, and logic simulation.

Requirements: BSCS or BSEE, with 2 or more years industry object-oriented
experience using Java or C++. Must have experience developing software
related to schematic capture, logic synthesis, or logic simulation. Good
familiarity with EDIF, Verilog or VHDL a plus.





Article: 19463
Subject: Re: State machine ok with binary encoding but unstable with one hot encoding
From: "Marc Battyani" <Marc_Battyani@csi.com>
Date: Wed, 22 Dec 1999 22:02:32 +0100
Links: << >>  << T >>  << A >>
<eml@riverside-machines.com.NOSPAM> wrote in message
news:3860a092.4919168@news.dial.pipex.com...
> On Tue, 14 Dec 1999 10:48:35 +0100, "Marc Battyani"
> <Marc_Battyani@csi.com> wrote:
>
> >I don't understand why the following state machine is ok when I use
binary
> >state encoding (with safest mode) but oscillate when I use one hot
encoding
> >(with safest mode also).
>
> firstly, there's a couple of problems with your code. your type
.../...

It was just a partial excerpt of the code, edited (badly) for clarity, and it
does not compile alone.

> second, the if-else issue is a red herring. any synthesiser that
> requires the else's that you eventually put in your code:
>
> > when InAddrRead =>
> >    if nAddrStb = '1' then
> >        State <= Waiting;
> >    else
> >        State <= InAddrRead;
> >    end if;
>
> is simply broken, and fpga express certainly doesn't have this
> problem. if it did, the vast majority of FSM's out there wouldn't
> work.

It's an interesting point, because that's the only modification I made
and it works well now.
If you have another idea of what the problem could have been, I would be
interested to hear it.

.../...

> if there aren't any other typos in your process, then  i suspect that
> your synth's 'safe' mechanism is broken. this isn't covered by the
> language definition, and is synth-specific - if you really need a safe
> machine, how about coding it by hand?

I put the safe mode "on" just to be sure to always be in a known state.

Marc Battyani




Article: 19464
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: ahuramazda@my-deja.com
Date: Wed, 22 Dec 1999 21:25:25 GMT
Links: << >>  << T >>  << A >>

> I think the search stuff is the easier part.  It's basically recursive
> and decomposable:
>   Assume you represent a board state as a large vector, for example
> 4 bits can represent the 12 unique pieces, so a 64x4 vector can represent
> the entire board state.  (Probably less).   If you create a search
> module that operates on a single square (probably a lookup table or
> Altera CAM), then it can work serially on a square at a time.  This
> can probably be implemented as a fairly small state machine.  It would
> be possible to fit many of these on a single chip and have them run
> in parallel.

There is more than the pieces to keep tabs on.  If the last move was a
pawn move two squares forward, that makes a special type of pawn move
possible.  And if the king or rooks have moved they cannot be used for
making a castling move, so they can be seen as separate states.  And if
the king is attacked by the opponent's pieces, or moves through one or
more attacked squares, castling is not allowed.  All this makes the
state space bigger; the number of bits grows to something like 64*8,
which is still quite doable.  The big cruncher is that one has to keep
tabs on the fifty previous states, because of a rule that stipulates a
draw if fifty moves are made without a capture or a pawn move.  That can
be fixed by a counter that counts the latest occurrence of such a move,
but you would still have to cache the positions since such a move,
because if a position is reached a third time the game can be called a
draw, so you need to make a lookup in the move history to correctly know
the game value.  The thing is that the move history is not possible to
compute at a reached node; you can reach a node by many routes, and
there need not be any identical positions between the start and end
position.  The naive state space for each node is 64*8*50 + 1 (for
color), i.e. 3.1 kByte; there are compression techniques but they depend
on dynamic data structures.  You can alternatively store a hash value of
preferably 64 bits; then the state needs 464 bytes + 1 bit.  And
besides, any naive parallelization implementation without sufficient
(big, fast, costly) communication and synchronization is doomed to
achieve a speedup of the square root of the number of nodes or less.  A
state-impaired chess machine, Deep Blue Jr. (it was a demo), was not too
impressive some time ago when playing a software program on a slow
laptop.

Regards DAn









Sent via Deja.com http://www.deja.com/
Before you buy.
Article: 19465
Subject: Working @ Home
From: "Xanatos" <deletemeaoe_londonfog@hotmail.com>
Date: Wed, 22 Dec 1999 21:58:25 GMT
Links: << >>  << T >>  << A >>
Hello fellow fpga'ers,

I'm fairly new to the FPGA/ASIC scene (and of course, the tools).

I just wanted to post a question regarding working at home.  There are a
few people where I work who are able to do some work at home in some
capacity; obviously, place-and-route tools and synthesis tools are probably
out of the question.  But what I am interested in is RTL simulators and
waveform viewers that I could use at home.  I can dial into work and access
the Unix workstations to run, for example, VCSI, and download a dump file to
my home computer.  I also know there are some freebie waveform viewers out
there as well.

Just wanted to hear what tools some of you use at home for working (I know
it can be blasphemy sometimes, but hey, it's better than sticking around
the office for 15+ hours).  Even if I had to pay 300 bucks to get a decent
tool set at home, it would save me more in the long run.  I have a Linux
server and a Windows box at home that I can use; the Windows machine is 450
MHz with 128 Megs of RAM, and the Linux box is a Pentium Pro 200 with 128
Megs of RAM.  What I'm getting at is that I have a Unix-flavor machine and a
Windows machine that I could use.

Oh, and I'm primarily using Verilog as a language.  I do use VHDL sometimes,
so if it is a waveform viewer or simulator for a particular language, please
let me know.

I've taken a look at some of the links in the verilog FAQ, but again, I want
to get an opinion from other designers.

Thanks....

Xanatos






Article: 19466
Subject: Bi-directional 3-State Buffer
From: khKim <dreamer@palgong.kyungpook.ac.kr>
Date: Thu, 23 Dec 1999 13:33:29 +0900
Links: << >>  << T >>  << A >>
I am using an Altera FLEX 10K.
How do I make a bi-directional 3-state buffer using VHDL?



Article: 19467
Subject: PCI slot 3.3V pins.
From: Mahboob Ahmed <m.ahmed@ieee.org>
Date: Thu, 23 Dec 1999 04:42:14 GMT
Links: << >>  << T >>  << A >>
Most of the PCI slots in PCs and Sun workstations I have checked do not
provide 3.3V in the PCI slot, except on the new Intel boards, which do
have a 3.3V supply.  Why do common PCs and workstations not provide the
3.3V supply in the PCI slot as specified in the PCI spec, and why is it
disabled?  If a PCI card needs 3.3V in a 5V 32-bit slot, what should be
done to enable the dedicated 3.3V supply pins?


Sent via Deja.com http://www.deja.com/
Before you buy.
Article: 19468
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: mushh@slip.net (Dave Decker)
Date: Thu, 23 Dec 1999 05:59:32 GMT
Links: << >>  << T >>  << A >>
There is at least some agreement that the best thing for the FPGA to
do initially is a stand-alone position evaluation.  Let the micro keep
track of the 50 past positions and all that.

What if the FPGA just did eval?

How complex is the eval algorithm? Remember this is an FPGA group, not
a chess group, so most of us need to be told what a representative
algorithm is. Any pointers to just the eval algorithm in 'C' or pseudo
code? 

Exactly how many points for what probably doesn't change the FPGA
complexity or speed very much; just the number of things that have
to be considered.

Let's see...
I'll get some points for having a piece or pawn on a square. (easy)

Probably get more points for being on generally more desirable
squares, like near the center. (easy)

Probably get more points for attacking opposition pieces. (examine all
squares where piece could move)

Probably get more points for particular formations, and having castled
etc. (look for same)

Now, we subtract points for all the same stuff that's true about the
other side.

What else?

The eval algorithm doesn't have to judge stuff like pins and threats,
does it?  All that next-move stuff will be picked up by the eval of the
next move, right?

I'm sure the weightings will need to change depending on the game's
stage of development.  The nice thing about FPGAs is that the whole
configuration, not just the weights, could be changed on entering the mid
game and again when entering the end game.
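
The easy parts of that list (material plus square desirability) could be
sketched as a serial evaluator in VHDL.  This is purely illustrative, not
taken from any real engine: piece_value and center_bonus are assumed to be
small lookup ROMs, and sq/score are assumed to be unsigned(5 downto 0) and
signed(15 downto 0) signals.

```vhdl
-- Hypothetical sketch: visit one square per clock, accumulating a
-- signed score.  Opponent pieces carry negative values in the lookup
-- table, which implements the "subtract the same stuff for the other
-- side" step.  Attack and formation terms would be further pipeline
-- stages on the same scan.
eval_scan: process(clk)
begin
    if rising_edge(clk) then
        if start = '1' then
            sq    <= to_unsigned(0, 6);    -- square counter, 0..63
            score <= to_signed(0, 16);
        elsif sq /= 63 then
            score <= score + piece_value(board(to_integer(sq)))
                           + center_bonus(to_integer(sq));
            sq    <= sq + 1;
        end if;
    end if;
end process eval_scan;
```

At one square per clock, a full static eval of this sort takes 64 cycles;
replicating the scanner, as suggested earlier in the thread, trades area for
evals per second.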

Thanks,

Dave Decker


Article: 19469
Subject: Crossing clock domain boundaries[ was Speed grade]
From: Ray Andraka <randraka@ids.net>
Date: Thu, 23 Dec 1999 01:47:02 -0500
Links: << >>  << T >>  << A >>
In an off-line discussion, I realized I goofed this up a little (the version
below still works, but it is more awkward than what I've been using).  For
some reason I was thinking gray code, but the machine I use is a binary count
sequence, which has the advantage of requiring no decode to get the write
pulse.  The toggle input still only affects one flip-flop, so where you have
slow clocks you can get away without the synchronizing flip-flop between the
toggle FF and the state machine.  I've corrected the state assignments below.

Ray Andraka wrote:

> A better method is to toggle a flag flip flop each time you write the DSP
> register (ie you have one extra bit on the register which is loaded with its
> inverted output).  Then take that flag, synchronize it to the FPGA domain, and
> then use a change in state to generate a write pulse in the FPGA clock domain.
> You can minimize the latency hit if you design the write pulse state machine
> (gray code 4 states) so that the flag input is only sensed by one flip-flop.
> The way you are doing it, can get you into trouble if the DSP comes in and sets
> the flop near the time you do the reset.  This one works as long as the FPGA is
> a little faster than the DSP (the smaller the differential, the less margin you
> have though)
>
>  valid: process( GSR, clk)
>     variable state:std_logic_vector(1 downto 0);
>     variable sync:std_logic;
>     begin
>         if GSR='1' then
>             sync:='0';
>             state:="00";
>         elsif clk'event and clk='1' then
>             sync:=toggle;
>             case state is
>                 when "00" =>
>                     if sync='1' then
>                         state:="01";
>                     else
>                         state:="00";
>                     end if;
>                     wp<='0';
>                 when "01" =>
>                     state:="10";
>                     wp<='1';
>                 when "10" =>
>                     if sync='0' then
>                         state:="11";
>                     else
>                         state:="10";
>                     end if;
>                     wp<='0';
>                 when "11" =>
>                     state:="00";
>                     wp<='1';
>                 when others=>
>                     null;
>             end case;
>         end if;
>     end process;
>
> Joel Kolstad wrote:
>
> > rk <stellare@nospam.erols.com> wrote in message
> > news:385EE12E.F6315E86@nospam.erols.com...
> > > perhaps another one ...
> > >
> > > no self-clearing structures (f-f output to its clear).  find another
> > > way to make that pulse! i've seen this too many times!
> >
> > I've used a form of this when interfacing an FPGA to a DSP's bus (both run
> > off different clocks).  The DSP's clock registers a write strobe in flip
> > flop #1 (while the data bus contents are registered in a bunch of other
> > registers).  This is followed by two flip flops, #2 and #3 (#2 acting as a
> > synchronizer, more or less) clocked by the FPGA's clock.  The output of #3
> > goes to the FPGA's state machines, etc., and also goes back to the
> > asynchronous reset input on 'flop #1.  (Note that the DSP is slow compared to
> > the FPGA; the output of 'flop #3 fires long before the DSP gets around to
> > sending another write strobe.)
> >
> > Is this dangerous (doesn't seem to me to be...)?  ...and what's a better
> > alternative?
> >
> > Thanks...
> >
> > ---Joel Kolstad
>
> --
> -Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930     Fax 401/884-7950
> email randraka@ids.net
> http://users.ids.net/~randraka



--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka


Article: 19470
Subject: Re: Bi-directional 3-State Buffer
From: "Mark van de Belt" <mvandebelt@roaxNOSPAM.nl>
Date: Thu, 23 Dec 1999 10:37:16 +0100
Links: << >>  << T >>  << A >>
khKim wrote in message <3861A619.174A619@palgong.kyungpook.ac.kr>...
>I am using FLEX10K of altera.
>How do I make the Bi-directional 3-State Buffer using VHDL?
>
>
You should define the port as INOUT.  Then set the associated signal to 'Z'
whenever you want the pin tri-stated, and read from it as an input at that
time.  Always drive the signal to 'Z' when you don't want any output!  As
soon as you want to use it as an output, drive it to '0' or '1'.  The
listing below gives an example of a RAM cell with the read and write actions
described:

library ieee;
use ieee.std_logic_1164.all;

entity ram is
port (
  data  : inout std_logic_vector(7 downto 0);
  n_wr  : in  std_logic;
  n_oe  : in  std_logic
);
end ram;

architecture ram_arch of ram is

signal  dataregister  : std_logic_vector(7 downto 0);

begin

-- -------------------------------------------------------------------------
-- Read cycle: drive the register onto the bus (data acts as an OUTPUT)
--
read_data: process(n_oe, dataregister)
begin
    if n_oe = '0' then
        data <= dataregister;
    else
        data <= (others => 'Z');
    end if;
end process read_data;

-- -------------------------------------------------------------------------
-- Write cycle: latch the bus into the register (data acts as an INPUT)
--
write_data: process(n_wr)
begin
    if falling_edge(n_wr) then
        dataregister <= data;
    end if;
end process write_data;

end ram_arch;
I hope this helps,
Mark van de Belt


Article: 19471
Subject: Re: Global buffer insertion (Synplify/Flex10K)
From: Thomas Rathgen <trathgen@gmx.de>
Date: Thu, 23 Dec 1999 10:59:50 +0100
Links: << >>  << T >>  << A >>
Hi, 

I don't know anything about Synplify, but it's quite easy to do in
MAX+PLUS II.
All you need to do is give your global net a name (I call my clock
nets clk, clk2, ...).
Thereafter insert

**
LOGIC_OPTIONS
BEGIN
	|clk2 :	GLOBAL_SIGNAL = ON;
END;
**

into your .acf file.  It should then be recognised correctly as a global
net.

Notes:
1. Global nets can *NOT* feed any combinatorial logic.  If one does, it
will be demoted to a normal net.
2. If you use global nets, the corresponding inputs (meaning dedicated
inputs or clock inputs) cannot be used.

Hope Synplify will know what to do ...


Tom
Article: 19472
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: vdiepeve@cs.uu.nl (Vincent Diepeveen)
Date: 23 Dec 1999 11:06:14 GMT
Links: << >>  << T >>  << A >>
In <83rfjv$821$1@nnrp1.deja.com> ahuramazda@my-deja.com writes:


>> I think the search stuff is the easier part.  It's basically recursive
>> and decomposable:
>>   Assume you represent a board state as a large vector, for example
>> 4 bits can represent the 12 unique pieces, so a 64x4 vector can
>> represent the entire board state.  (Probably less).   If you create a
>> search module that operates on a single square (probably a lookup
>> table or Altera CAM), then it can work serially on a square at a time.
>> This can probably be implemented as a fairly small state machine.  It
>> would be possible to fit many of these on a single chip and have them
>> run in parallel.
>There is more than the pieces to keep tabs on.  If the last move was a
>pawn move two squares forward, that makes a special type of pawn move
>possible.  And if the king or rooks have moved they cannot be used for
>making a castling move, so they can be seen as separate states.  And if
>the king is attacked by the opponent's pieces, or moves through one or
>more attacked squares, castling is not allowed.  All this makes the
>state space bigger; the number of bits grows to something like 64*8,
>which is still quite doable.  The big cruncher is that one has to keep
>tabs on the fifty previous states, because of a rule that stipulates a
>draw if fifty moves are made without a capture or a pawn move.  That
>can be fixed by a counter that counts the latest occurrence of such a
>move, but you would still have to cache the positions since such a
>move, because if a position is reached a third time the game can be
>called a draw, so you need to make a lookup in the move history to
>correctly know the game value.  The thing is that the move history is
>not possible to compute at a reached node; you can reach a node by many
>routes, and there need not be any identical positions between the start
>and end position.  The naive state space for each node is 64*8*50 + 1
>(for color), i.e. 3.1 kByte; there are compression techniques but they
>depend on dynamic data structures.  You can alternatively store a hash
>value of preferably 64 bits; then the state needs 464 bytes + 1 bit.
>And besides, any naive parallelization implementation without
>sufficient (big, fast, costly) communication and synchronization is
>doomed to achieve a speedup of the square root of the number of nodes
>or less.  A state-

Well, Deep Blue perhaps had square-root speedup (though something like
20% to 30% out of n processors was claimed), but my program gets a
factor of n out of n CPUs if n is not too big.

The simple things are, for example, repetition (it must allow 1024
moves) and details from each iteration (my program doesn't use
recursion, of course; that doesn't work very well when searching in
parallel, so instead it uses double iteration).

The allocated datastructure for 4 processors is 2.5 mb RAM,
and it needs shared memory too.

Compare that with the 80s :)

>impaired chess machine, Deep Blue Jr. (it was a demo), was not too
>impressive some time ago when playing a software program on a slow
>laptop.

Yes, i nearly beat it myself too, but it was doing time management
in the USA, not locally, and only 5 minutes for the whole game was
allowed, so somewhere around move 60 in a won position i blundered
when my time was running out; the interface was pathetic and other
levels were not possible.  My chess rating?  2254; i'm playing
national master class, but i don't have any chess title like
international master or FIDE master.  The organisation didn't allow
me to play my chess program against Deep Blue.  Deep Blue Junior
(the screen CLEARLY mentioned it was Deep Blue Junior, in several
colors) played somewhere far off in the library where there was no
power.  Only laptops could play there.  Even if i had managed to get
a monitor there, some power, and an internet connection to the quad
Xeon, i would have forfeited anyway while operating it.
They clearly don't like to play other programs.

Deep Blue (the 1997 version that played Kasparov)
is an old design without hash tables inside the
processors, using old search forms and a pathetic
form of parallelism.  Each general-purpose processor
controls and communicates with 16 chess processors.
Each processor gets 2 to 2.5 million positions a second,
but effectively the designer, Hsu, estimates
it gets 200 million nodes a second.

Deep Blue searched 5 to 6 moves (10 to 11 ply) deep.

To give some details: at least 10 programs in the
1999 world championships would have outsearched Deep Blue
if they ever played against it, though those 10 programs
objectively get about 1000 times fewer nodes a second
(3 of them were on supercomputers, so they got
millions of nodes a second).

IBM stock went up 22% after Deep Blue won that last game,
thanks to a major beginner's blunder by Kasparov in the opening
of the last game, after reaching 2.5-2.5 playing a kind of
coffee-house chess (play by Kasparov far below my level).

It was the only game Deep Blue didn't play
badly (no room to make bad moves there); Kasparov's blunders were
too big.

We will obviously never hear from Deep Blue again.  There are
too many reasons to hide it.  The only reason it is getting so much
praise from a few scientific people is its incredibly aggressive
anti-Kasparov tuning (which is dubious, objectively seen),
which quickly makes an impression when playing a human.

Please let's not get into discussions too much about deepblue
here, it's better to take that to rec.games.chess.computer
or www.icdchess.com/ccc/.

Perhaps more interesting is to mention what technology was used for the
chess processors (480 of them) of Deep Blue: 0.6 micron technology.

>Regards DAn

Vincent Diepeveen
diep@xs4all.nl
--
          +----------------------------------------------------+
          |  Vincent Diepeveen      email:  vdiepeve@cs.ruu.nl |
          |  http://www.students.cs.ruu.nl/~vdiepeve/          |
          +----------------------------------------------------+
Article: 19473
Subject: Re: Dumb question springing from a discussion about chess on a chip...
From: vdiepeve@cs.uu.nl (Vincent Diepeveen)
Date: 23 Dec 1999 11:16:37 GMT
Links: << >>  << T >>  << A >>
In <3861ac0f.494461296@news.slip.net> mushh@slip.net (Dave Decker) writes:

>There is at least some agreement that the best thing for the FPGA to
>do initially, is a stand alone position evaluation. Let the micro keep
>track of 50 past positions and all that. 
>
>What if the FPGA just did eval?
>
>How complex is the eval algorithm? Remember this is an FPGA group, not
>a chess group, so most of us need to be told what a representative
>algorithm is. Any pointers to just the eval algorithm in 'C' or pseudo
>code? 
>
>Exactly how many points for what, probably doesn't change the FPGA
>complexity, or speed very much. Just, the number of things that have
>to be considered.
>
>Let's see...
>I'll get some points for having a piece or pawn on a square. (easy)
>
>Probably get more points for being on generally more desirable
>squares, like near the center. (easy)
>
>Probably get more points for attacking opposition pieces. (examine all
>squares where piece could move)
>
>Probably get more points for particular formations, and having castled
>etc. (look for same)
>
>Now, we subtract points for all the same stuff that's true about the
>other side.
>
>What else?
>
>The eval algorithm doesn't have to judge stuff like pins and threats
>does it? All that next move stuff will be picked up by the eval of the

Of course it must judge pins and potential dangers, and a couple of
thousand other things.

Basically, patterns are easy, also in software, as you can skip a lot of
them if a certain condition tests false; but the big problem is the slow
scanning code: how much territory do i occupy?  How active is my
position?  Does my king have a safe spot?  Is a majority pawn attack
coming?

Just scanning for majorities already eats up quite some time!

How many open files do i have, and what patterns in that direction
apply?

These are all simple things; there are a lot of tougher things.

Human researcher De Groot estimated that a master-class
human on average has 100,000 patterns which he considers.

HOWEVER, those are human patterns, with a human interpretation that
is more flexible than anything else.

A single pattern for a human already leads to A LOT of subrules for
a computer, as the computer doesn't apply a pattern flexibly.

The same research of course applies to the game of GO, draughts,
shogi, chinese chess and some other games that are a bit more complex
than 4 in a row is.

>next move, right?
>
>I'm sure the weightings will need to change depending on the game's
>stage of development. The nice thing about FPGAs are that the whole
>configuration, not just the weights, could be changed on entering mid
>game and again when entering the end game.

development, holes in the position, outposts, material left on the
board; it is all integrated with everything else.  And i still didn't
mention that attacks on squares and pieces must be pregenerated
everywhere, too.

You can see a simple evaluation function in gnuchess.  Crafty is not
a good example to read the evaluation from, as that has been rewritten
to a bit-oriented datastructure that's impossible to work with.

Now obviously the bigger this evaluation is, the more suited it is to
hardware.  The good thing about putting an evaluation in hardware would
be that you can do some pretty things, like massively collecting the
scores of the different patterns and then writing meta code for that.

This is very hard to do in a competing software environment, where
rudely collecting things is punished by major slowdowns of the
evaluation.

Vincent Diepeveen

>Thanks,
>Dave Decker

--
          +----------------------------------------------------+
          |  Vincent Diepeveen      email:  vdiepeve@cs.ruu.nl |
          |  http://www.students.cs.ruu.nl/~vdiepeve/          |
          +----------------------------------------------------+
Article: 19474
Subject: Re: fpga cost
From: rk <stellare@nospam.erols.com>
Date: Thu, 23 Dec 1999 06:44:54 -0500
Links: << >>  << T >>  << A >>
Steen Larsen wrote:

> I will second Rich's comments on sockets.  Additionally though, PCBs seem to be
> becoming cost effective solutions to sockets when prototyping stuff.  I am
> doing a PCI card around a 10K30E 208QFP.  Rather than get a PCI protocard,
> 208QFP -> 100milPGA converter, plus wirewrap/sig-int worries, it seems better
> to layout the card, putting the 2.5V regulator where I want it, etc.  For 6
> 15" boards at www.pcbexpress.com the board cost is $100.  True, they are only
> prototype solutions, but seem an economical route where I can place test pads
> where I want.
>
> This is not an endorsement for pcbexpress, I am just a happy customer.

actually, for just about any reasonable circuit, i would go with a pcb and
power/ground planes.  wire-wrap with fast edge rates (some fpgas even two years ago
had output transition times of 1 ns or so, some even measured sub-nanosecond) or
high output counts is often a false economy.

for any high performance design i would tend to stay away from most sockets.
depending on the package type, of course, it just complicates the problem of
maintaining signal quality.  the cqfp type devices and the sockets available for
them come to mind, as they come with long leads (before they are trimmed and bent)
that are contacted at the ends by the sockets that i have seen.

for low to medium speed applications, say up to about 20 MHz or so, i have had good
luck with the plcc84.  these are easy to work with, and assembling the plcc84
socket (w/ 0.1" pin spacings on the bottom) is easier for me than mounting the
plcc84 directly on the board.  typically, the output signals from devices in this
socket, which may travel 6-8" on average (please, no flames about metric/SI units,
although that's a bit of a sore point these days), are basically those of 54HCxx-type
devices: transition times of 6-8 ns or so, which are a bit easier to handle than
the most recent devices.  i have a drawer full of old devices that are useful for
applications where i need a few thousand gates of logic, which pops up fairly
frequently.

for high-speed applications or where there are many outputs driving busses, i
almost (well, not me, but a good technician) always just solder the part onto the
surface mount board and bet that i got the design right (using an otp technology
because of the application).  one project that i am on (day job) mandated the use
of sockets on the board (building one unit and shipping it, no prototypes) for the
cqfp256 package; the results were simply a nightmare.  the problems associated with
the socket (cqfp -> pga) seemed to be endless, and we wound up getting rid of them,
damaging the boards in the process - hand-removing the large j-lead base, even
with good techs, resulted in lots of lifted pads as well as various other
problems that damaged other devices on the board.  of course, that was
typical of that whole project, another story for another newsgroup.

anyways, just a few of my experiences,

have a good day,

----------------------------------------------------------------------
rk                               The world of space holds vast promise
stellar engineering, ltd.        for the service of man, and it is a
stellare@erols.com.NOSPAM        world we have only begun to explore.
Hi-Rel Digital Systems Design    -- James E. Webb, 1968



