Messages from 24100

Article: 24100
Subject: Re: Pad trireg in XLA FPGA
From: rickman <spamgoeshere4@yahoo.com>
Date: Wed, 26 Jul 2000 18:21:12 -0400

I remember a thread either here or in comp.lang.vhdl that was started by
someone who was asking about open source tools. Instead of addressing
his questions, everyone tried to tell him why open source tools were
impossible or the wrong idea or that he had no reason to want such a
thing. I think this is one of many reasons that open source tools would
be a *great* idea. This is a bug that would be fixed if the tools were
open source. 

I wonder why Xilinx does not think this is an issue worth bothering
with? Is this OE register not available in Virtex? Or does the tool
support it in Virtex and not the XLA chips?

Someone from Xilinx posted that it had to do with the fact that Xilinx
is composed of divisions that are true to the term. They do not work
together if they decide not to. The only place I have ever worked where
this was tolerated was within the Federal Government. Every company I
worked for would crack heads together when this happened. 

Andy Peters wrote:
> 
> Isidro Urriza wrote in message <8lmcib$dcn$1@news.unizar.es>...
> >Hello All,
> >
> >Does anyone know how to use the trireg registers in an XLA FPGA?
> >
> >XLA pads include a flip-flop to register the control signal of the tristate
> >output buffer.
> >I've modeled this register in VHDL, synthesized it with FPGA Express, and
> >mapped it with the "-pr b" option, but the register is always placed in an
> >internal CLB, not in the trireg pad register.
> 
> That's a known issue.  You need to follow the instructions in XAPP123:
> 
> http://www.xilinx.com/xapp/xapp123.pdf
> 
> Unfortunately, this process is convoluted.  One would hope that the P+R
> software would be able to handle this nifty feature, but it doesn't.  And the
> word I got from one of the Xilinx apps engineers is that this issue will
> never be fixed.
> 
> -- a
> -----------------------------------------
> Andy Peters
> Sr Electrical Engineer
> National Optical Astronomy Observatories
> 950 N Cherry Ave
> Tucson, AZ 85719
> apeters (at) noao \dot\ edu
> 
> "A sufficiently advanced technology is indistinguishable from magic"
>      --Arthur C. Clarke

-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24101
Subject: Re: Variable shifting
From: rickman <spamgoeshere4@yahoo.com>
Date: Wed, 26 Jul 2000 18:37:16 -0400

I am still not following where you are finding some optimization. 

If you have an 8 bit barrel shifter, you have 1 bit which needs to
accept all 8 inputs, 1 which needs 7 inputs... and the last is either
the 1 input or a zero. For that first output which depends on all 8
inputs, you need an 8 input mux. You can implement this in a single
Virtex CLB using the F6 mux. If you do it in 2 input muxes (4 input
LUTs) you will need 7 muxes which use nearly 2 CLBs. 

Of course you can save a little on each bit as you work toward the other
end which only uses one input. But with a larger word width, I would
expect that the mux 4 and mux 8 would save you a great deal. 
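To put numbers on the per-bit approach above, here is a small Python sketch. It counts 2:1 muxes when every output bit gets its own mux tree and nothing is shared between bits. It assumes a logical left shift, where output bit j can come from input bits 0..j. It ignores the extra gating that forces zeros in. The function names are illustrative only.

```python
def mux2_tree_size(k: int) -> int:
    """Number of 2:1 muxes in a tree that selects one of k inputs."""
    return max(k - 1, 0)


def unshared_shifter_mux2_count(n: int) -> int:
    """2:1 muxes for an n-bit left shifter with a separate mux tree per output bit.

    Output bit j (LSB = 0) can receive any of input bits 0..j, so it needs a
    (j + 1)-input mux. Zero-fill gating is not counted (an assumption).
    """
    return sum(mux2_tree_size(j + 1) for j in range(n))


if __name__ == "__main__":
    print(mux2_tree_size(8))               # widest output bit of an 8-bit shifter: 7
    print(unshared_shifter_mux2_count(8))  # whole 8-bit shifter: 28
```

Without sharing, the total is n(n-1)/2. It therefore grows quadratically with word width, which is the cost the merged tree discussed later in the thread avoids.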

Are you saying that you share muxes between bits? If so, I would like to
see how to do this. 



"Gilbert H. Herbeck" wrote:
> 
> rickman wrote:
> 
> > I can't say that I know what you mean when you say "an optimal merged
> > tree." Certainly you would want to connect the muxes in a tree, but what
> > further optimizations can you do unless you can restrict the range of
> > the select inputs?
> >
> > I would take exception to the statement "There's no savings using 4
> > input muxes in this architecture". To implement a 4 input mux using 2
> > input muxes in 4 input LUTs requires 3 LUTs. By using the F5 mux you can
> > save one LUT. Similarly (after looking at the data sheet) I see that
> > there is a F6 mux which will allow you to implement an 8 input mux using
> > just 4 LUTs vs. 7 LUTs by your method.
> >
> > Hmmmm... why do I feel like I am missing something important?
> 
> Rickman, you and Aki just haven't come across a barrel shifter before.
> It's not that complicated, but if you don't know what it is you can make
> some bad assumptions.  I would make Ray's statement much more
> strongly.  Not only do 4-1 muxes not save you anything, they *COST*
> you.
> 
> Take an 8 bit shifter as an example.  The proper implementation
> is 3 banks of 2-1 muxes.  That makes 3 2-1 muxes per bit (log2(8)=3).
> If you choose to use 4-1 muxes, the best you can do is 1 4-1 mux +
> 1 2-1 mux per bit.
> 
> Compare a 4-1 mux -vs- 2 2-1 muxes for your technology.
> 8-1 muxes make things even worse.  And as the shifter gets
> larger, the % penalty gets even worse.
> 
> Lastly, the stages of the shifter can be ordered to exploit the
> delay skew in the shift amount.  In this case the original poster
> said that the shift signal was coming from an adder.  Therefore
> use the shift bits from LSB to MSB and the adder delay will be
> mostly in parallel with the shifter itself.
> 
> Gil

-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.



Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24102
Subject: Re: Pad trireg in XLA FPGA
From: "Andy Peters" <apeters.Nospam@nospam.noao.edu.nospam>
Date: Wed, 26 Jul 2000 17:09:42 -0700

rickman wrote in message <397F6458.8D010C4D@yahoo.com>...
>I remember a thread either here or in comp.lang.vhdl that was started by
>someone who was asking about open source tools. Instead of addressing
>his questions, everyone tried to tell him why open source tools were
>impossible or the wrong idea or that he had no reason to want such a
>thing. I think this is one of many reasons that open source tools would
>be a *great* idea. This is a bug that would be fixed if the tools were
>open source.


Oh, I don't think this is a bug.  A bug is something that doesn't work as
expected.  This is a case of something existing in the hardware technology
that the software doesn't take advantage of.  I wonder how many people even
realized that this register existed?

>I wonder why Xilinx does not think this is an issue worth bothering
>with? Is this OE register not available in Virtex? Or does the tool
>support it in Virtex and not the XLA chips?


This OE register does exist in the Virtex parts.  I believe the toolset DOES
support it but I've not done a Virtex design so I don't really know.  As for
why the tools don't support it for the XLA parts, I would imagine that
Xilinx prefers to spend their time improving the tools for the newer parts;
at the same time, they're gently discouraging use of the 4K parts in favor
of Spartan or Virtex.  Hence, the "do not fix" directive.

Another detail, of course, is that the synthesis vendors need to be able to
support this tri-state register feature.  I mean, FPGA Express is so stupid
that it doesn't realize that the IOB tristate enables have a mux in front of
'em to select the polarity!  Yup -- if you write code for an active-high
output enable, the goddamn tool inverts the OE in a CLB, and that inverted
signal is what drives all of your output enables.

Hey, Synopsys, how 'bout this: instead of working on stuff like "incremental
synthesis," which I couldn't really care less about, howsabout doing a
better job of understanding the chips' architectures, and thus taking
advantage of the neat-o features that Xilinx thoughtfully put in there for
us?

Oh, I get it, it's the Microsoft plan.  Consider: Xilinx includes FPGA
Express with the tools, and Synopsys (possibly correctly) assumes that the
average designer creating average designs (i.e., those that don't "push the
envelope," as Ray would say) won't really need the extra strength that
Synplicity has, or more likely, the designer won't be able to justify (to
the Boss) the cost of "another tool that does what something you already
have does."

I mean, would you actually go out and PURCHASE a copy of MS Office, if it
wasn't pre-loaded onto your computer?

>Someone from Xilinx posted that it had to do with the fact that Xilinx
>is composed of divisions that are true to the term. They do not work
>together if they decide not to. The only place I have ever worked where
>this was tolerated was within the Federal Government. Every company I
>worked for would crack heads together when this happened.

I have no idea about that.


-- a
-----------------------------------------
Andy Peters
Sr Electrical Engineer
National Optical Astronomy Observatories
950 N Cherry Ave
Tucson, AZ 85719
apeters (at) noao \dot\ edu

"A sufficiently advanced technology is indistinguishable from magic"
     --Arthur C. Clarke

Article: 24103
Subject: Re: Xilinx "MUX_OP not inferred" error.
From: "Andy Peters" <apeters.Nospam@nospam.noao.edu.nospam>
Date: Wed, 26 Jul 2000 17:12:39 -0700

K.Orthner wrote in message <8lman2$uu@inf-gw.inf.furukawa.co.jp>...

>I've gone through the EDIF file a little bit  (Anybody have a tool to make
>reading EDIF files easier?),

Don't read the EDIF files.  Use the schematic viewer that's included with
FPGA Express.  You'll have to run the tool standalone (i.e., not from the
Xilinx Project Manager).


-- a
-----------------------------------------
Andy Peters
Sr Electrical Engineer
National Optical Astronomy Observatories
950 N Cherry Ave
Tucson, AZ 85719
apeters (at) noao \dot\ edu

"A sufficiently advanced technology is indistinguishable from magic"
     --Arthur C. Clarke

Article: 24104
Subject: Re: Variable shifting
From: "Gilbert H. Herbeck" <gilherbeck@home.com>
Date: Thu, 27 Jul 2000 00:17:16 GMT

I don't know a CLB from a BLC, ... So I will let you decide if this
architecture makes sense for your technology (or FPGAs in general).
But I will try to explain it again.

For this 8-bit case, there are 3 stages of 2:1 mux.
The first selects a shift of 0 or 1 of the input.
The second selects a shift of 0 or 2 of the first stage output.
The third selects a shift of 0 or 4 of the second stage output.
(And so on for larger than 8-bit).
As you have already mentioned, there are 0's coming in
from one side and these muxes will be optimized further.
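The Python sketch below roughly illustrates the three-stage structure described above. Each stage is one 2:1 mux per output bit. That mux either passes the previous stage through or takes the bit 2**k positions below it. The sketch assumes a logical left shift with zeros filling from the LSB end; the thread does not say which direction the original poster needed. The helper names are illustrative, not from any tool.

```python
def log_shift_left(bits: list[int], amount: int) -> list[int]:
    """Logical left shift built from layers of 2:1 muxes (bits[0] is the LSB).

    Layer k is controlled by bit k of `amount` and shifts by 0 or 2**k.
    Layers are applied LSB first, the order suggested when the shift
    amount comes from an adder.
    """
    n = len(bits)
    layers = (n - 1).bit_length()  # ceil(log2(n)) for n >= 2
    for k in range(layers):
        distance = 1 << k
        select = (amount >> k) & 1
        # One 2:1 mux per output bit: keep bits[i] or take bits[i - distance];
        # zeros enter where i - distance falls off the low end.
        bits = [
            (bits[i - distance] if i >= distance else 0) if select else bits[i]
            for i in range(n)
        ]
    return bits


def to_bits(value: int, n: int) -> list[int]:
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    # 8-bit case: 3 layers of 8 muxes each
    print(from_bits(log_shift_left(to_bits(0b00010111, 8), 3)))  # 184 (0b10111000)
```

Each layer is exactly n muxes, so the 8-bit case uses 3 x 8 = 24 2:1 muxes before the zero-fill simplification.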

This is the way to go for cell-based.  I'll let you FPGA types
decide if you like it.

Gil


rickman wrote:

> I am still not following where you are finding some optimization.
>
> If you have an 8 bit barrel shifter, you have 1 bit which needs to
> accept all 8 inputs, 1 which needs 7 inputs... and the last is either
> the 1 input or a zero. For that first output which depends on all 8
> inputs, you need an 8 input mux. You can implement this in a single
> Virtex CLB using the F6 mux. If you do it in 2 input muxes (4 input
> LUTs) you will need 7 muxes which use nearly 2 CLBs.
>
> Of course you can save a little on each bit as you work toward the other
> end which only uses one input. But with a larger word width, I would
> expect that the mux 4 and mux 8 would save you a great deal.
>
> Are you saying that you share muxes between bits? If so, I would like to
> see how to do this.
>
> "Gilbert H. Herbeck" wrote:
> >
> > rickman wrote:
> >
> > > I can't say that I know what you mean when you say "an optimal merged
> > > tree." Certainly you would want to connect the muxes in a tree, but what
> > > further optimizations can you do unless you can restrict the range of
> > > the select inputs?
> > >
> > > I would take exception to the statement "There's no savings using 4
> > > input muxes in this architecture". To implement a 4 input mux using 2
> > > input muxes in 4 input LUTs requires 3 LUTs. By using the F5 mux you can
> > > save one LUT. Similarly (after looking at the data sheet) I see that
> > > there is a F6 mux which will allow you to implement an 8 input mux using
> > > just 4 LUTs vs. 7 LUTs by your method.
> > >
> > > Hmmmm... why do I feel like I am missing something important?
> >
> > Rickman, you and Aki just haven't come across a barrel shifter before.
> > It's not that complicated, but if you don't know what it is you can make
> > some bad assumptions.  I would make Ray's statement much more
> > strongly.  Not only do 4-1 muxes not save you anything, they *COST*
> > you.
> >
> > Take an 8 bit shifter as an example.  The proper implementation
> > is 3 banks of 2-1 muxes.  That makes 3 2-1 muxes per bit (log2(8)=3).
> > If you choose to use 4-1 muxes, the best you can do is 1 4-1 mux +
> > 1 2-1 mux per bit.
> >
> > Compare a 4-1 mux -vs- 2 2-1 muxes for your technology.
> > 8-1 muxes make things even worse.  And as the shifter gets
> > larger, the % penalty gets even worse.
> >
> > Lastly, the stages of the shifter can be ordered to exploit the
> > delay skew in the shift amount.  In this case the original poster
> > said that the shift signal was coming from an adder.  Therefore
> > use the shift bits from LSB to MSB and the adder delay will be
> > mostly in parallel with the shifter itself.
> >
> > Gil

Article: 24105
Subject: Re: Retiming for Virtex with FC2
From: Arrigo Benedetti <arrigo@vision.caltech.edu>
Date: 26 Jul 2000 17:30:25 -0700

Steven Derrien <sderrien@irisa.fr> writes:

> Hello,
> 
> I've been playing around with the retiming capabilities provided with
> the latest FC2 version. However, I'm a little disappointed by the
> quality of results I get from FC2. Even when I provide several
> registers to allow a deep pipeline of a combinatorial function (up to 10
> stages for a 28-logic-level combinatorial function), I cannot get less than
> 20 ns for the critical path.
> 
> Does anyone here know where this limitation could come from?
> 
> Thanks,
> Steven

After many trials I came to the conclusion that the retiming feature in
FC2 is simply broken. Actually, I'm switching to Synplicity: even though it
does not have this feature, its timing reports are very useful for
finding the paths that need optimization. Manual retiming is not very
difficult if you follow the basic rule: every internal input feeding the
same block or operator must be the same number of registers away from the
entity's inputs (taken ahead of any retiming registers).
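The register-balance rule can be checked mechanically. The Python sketch below treats a design as an acyclic graph whose edges carry register counts. It walks the graph in topological order and reports any block whose inputs arrive with different register counts from the entity inputs. The graph representation and the function name are hypothetical; they are not part of FC2 or Synplicity. Feedback loops are not handled.

```python
from collections import defaultdict, deque


def register_latencies(edges, primary_inputs):
    """Check the register-balance rule on an acyclic netlist graph (hypothetical helper).

    edges: iterable of (src, dst, regs), where regs is the number of registers
    on that connection. Returns {node: registers from the entity inputs}.
    Raises ValueError if two inputs of the same block arrive with different counts.
    """
    incoming = defaultdict(list)
    fanout = defaultdict(list)
    pending = defaultdict(int)  # driver edges not yet processed, per node
    nodes = set(primary_inputs)
    for src, dst, regs in edges:
        incoming[dst].append((src, regs))
        fanout[src].append(dst)
        pending[dst] += 1
        nodes.update((src, dst))

    latency = {}
    ready = deque(node for node in nodes if pending[node] == 0)
    while ready:
        node = ready.popleft()
        arrivals = {latency[src] + regs for src, regs in incoming[node]}
        if len(arrivals) > 1:
            raise ValueError(f"unbalanced inputs at {node!r}: {sorted(arrivals)}")
        latency[node] = arrivals.pop() if arrivals else 0  # entity inputs start at 0
        for dst in fanout[node]:
            pending[dst] -= 1
            if pending[dst] == 0:
                ready.append(dst)
    return latency


if __name__ == "__main__":
    # Hypothetical example: x and y are multiplied, then the product is added to z.
    balanced = [("x", "mul", 1), ("y", "mul", 1), ("mul", "add", 1), ("z", "add", 2)]
    print(register_latencies(balanced, ["x", "y", "z"])["add"])  # 2
```

In this example, z needs two registers to line up with the product, which has already passed through one register before the multiplier and one after it.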

good luck!

-Arrigo
--
Dr. Arrigo Benedetti                e-mail: arrigo@vision.caltech.edu
Caltech, MS 136-93	  			phone: (626) 395-3695
Pasadena, CA 91125	  			fax:   (626) 795-8649

Article: 24106
Subject: Re: Xilinx Core Generators.
From: "K.Orthner" <nospam@ihatespam.com>
Date: Thu, 27 Jul 2000 09:53:18 +0900

Hi again, Felix.

> my Perl script creates two entities:
> * one with the name you specified in CoreGenerator
> * one with an appended "ii" for internal purposes

I think that I've pretty much figured out the Perl script now.
The script is completely "point-and-click" in that it sets up everything as
it should. I caused myself problems by thinking I had to twiddle with
things.

> The other thing CoreGenerator creates is an EDIF netlist which should
> be included in your *synthesis* and especially your *implementation*. I
> feel that it is not a good idea to use the EDIF netlists for functional
> simulation.

Which leads me gracefully to my next problem.  Now that I have things
simulating smoothly, how do I include the Coregen-generated EDIF netlists in
my design?  Do I really want to include them in the synthesis step?  And if
so, how do I do it?

From what the CoreGen documentation says, I don't really have to do much
other than instantiate the coregen blocks in my design, and make sure all of
the outputs from CoreGen are in the same directory as my design files.
Which works well enough, except that I'm getting warnings (during the
translate phase) that the design is not linked.  Again, according to the
documentation, I should get a warning that it's not expanded, as opposed to
not linked.  (Is this an indication of a problem, or is the documentation
wrong?)

Thanks again for the help!

------------
Kent Orthner

Article: 24107
Subject: Re: Xilinx "MUX_OP not inferred" error.
From: "K.Orthner" <nospam@ihatespam.com>
Date: Thu, 27 Jul 2000 09:58:21 +0900

Rickman wrote:
> I expect that this is not the problem. If it is working correctly for
> one state, then it is likely working correctly for all the states.

Then the conclusion that I'm drawing from this is that the original error
message isn't really an indication of a problem, but maybe it's implying
that I could put my code together better?

> > Renaming my state variables would make my EDIF file easier to read, I'm
> > sure!
>
> I checked my code and I did explicit state encoding. Here is a code
> fragment...

I'll try that and see if it helps.

> In the other two compilers I had tried, I used the "one hot" encoding
> attribute, but this did not work well with Express. They may have an app
> note on this at the Xilinx web site.

I know that it will work with Express if you don't define it as an enum in
the first place, but as std_logic.  But then maybe I will end up with the
problem where every state is decoded to determine every other state.

Andy wrote:
> >I've gone through the EDIF file a little bit (Anybody have a tool to make
> >reading EDIF files easier?),
>
> Don't read the EDIF files.  Use the schematic viewer that's included with
> FPGA Express.  You'll have to run the tool standalone (i.e., not from the
> Xilinx Project Manager).

<laughing> I stumbled across that late last night, shortly before I decided
to give up for the evening.

Thanks!

-Kent

------------
Kent Orthner
korthner at hotmail dot com

Article: 24108
Subject: Re: Pad trireg in XLA FPGA
From: rickman <spamgoeshere4@yahoo.com>
Date: Wed, 26 Jul 2000 21:05:52 -0400

Andy Peters wrote:
> 
> rickman wrote in message <397F6458.8D010C4D@yahoo.com>...
> >I remember a thread either here or in comp.lang.vhdl that was started by
> >someone who was asking about open source tools. Instead of addressing
> >his questions, everyone tried to tell him why open source tools were
> >impossible or the wrong idea or that he had no reason to want such a
> >thing. I think this is one of many reasons that open source tools would
> >be a *great* idea. This is a bug that would be fixed if the tools were
> >open source.
> 
> Oh, I don't think this is a bug.  A bug is something that doesn't work as
> expected.  This is a case of something existing in the hardware technology
> that the software doesn't take advantage of.  I wonder how many people even
> realized that this register existed?

In this case the bug is in the attitude of the company that they don't
need to provide support to their customers for features like this. The
register was added by the chip designers because they knew it could make
a *big* difference in being able to meet timing external to the chip.
But the software people decided it was not important enough to add to
the software... until the next rev of the chip came out, Virtex. 

> Another detail, of course, is that the synthesis vendors need to be able to
> support this tri-state register feature.  I mean, FPGA Express is so stupid
> that it doesn't realize that the IOB tristate enables have a mux in front of
> 'em to select the polarity!  Yup -- if you write code for an active-high
> output enable, the goddamn tool inverts the OE in a CLB, and that inverted
> signal is what drives all of your output enables.

I don't understand why the synthesis vendors need to deal with it. This
should be a map, place and route issue which is done in the Xilinx
tools. The synthesis tools only need to generate the FF and the MPR
tools can put it where it will work best. 

> Hey, Synopsys, how 'bout this: instead of working on stuff like "incremental
> synthesis," which I couldn't really care less about, howsabout doing a
> better job of understanding the chips' architectures, and thus taking
> advantage of the neat-o features that Xilinx thoughtfully put in there for
> us?
> 
> Oh, I get it, it's the Microsoft plan.  Consider: Xilinx includes FPGA
> Express with the tools, and Synopsys (possibly correctly) assumes that the
> average designer creating average designs (i.e., those that don't "push the
> envelope," as Ray would say) won't really need the extra strength that
> Synplicity has, or more likely, the designer won't be able to justify (to
> the Boss) the cost of "another tool that does what something you already
> have does."

Yet another reason for open source tools. 

> I mean, would you actually go out and PURCHASE a copy of MS Office, if it
> wasn't pre-loaded onto your computer?

Somebody must be buying it. Office does not come free!

I think the real reason that FPGA tools are not open source in a widely
useful form is that FPGA designers are not software designers.
Carpenters make tools for their work out of wood. Machinists make their
own tools out of metal. FPGA designers make their tools out of...
FPGAs???

-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24109
Subject: Re: Variable shifting
From: rickman <spamgoeshere4@yahoo.com>
Date: Wed, 26 Jul 2000 21:28:32 -0400

I don't see how this is a complete description. There are three levels
of muxes. But you need one mux at the last stage, two muxes at the next
stage, and four muxes at the first stage. That is seven muxes for each
bit as I said. 

So unless you can share muxes between bits, I think this is correct. 

I think there can be sharing of muxes between bits, but it would make
for some convoluted routing. For an N bit barrel shifter you would need
N-1 muxes at the first level producing N-1 outputs, plus one bit which
is fed straight through. Then second level would be the same, but with
the muxes connected differently to accomplish the 0 or 2 shifting. This
will repeat until you have reached the last level. You will need one AND
gate to zero the MSB when you shift by anything other than 0 bits. 

This gives you ceiling(log2(N)) * (N-1) muxes or LUTs; divide by four to
find the number of CLBs. Using 8 input muxes requires ceiling(log8(N)) *
(some messy stuff), which is order N^2. For comparison, let's do some
examples. The 2-1 mux is the "optimal merged tree" that Ray was talking
about (I assume). The 8-1 mux is straight. This is done in CLB counts. 

N   2-1 mux   8-1 mux
4      2        2 (actually 4-1 muxes)
8      6        8
16    15       48
24    30?      84
32    39      144

It looks like there is a big difference as the shifter gets larger. I
guess this is what you were describing and I did not understand. 
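The 2-1 mux column above follows from the formula ceiling(log2(N)) * (N-1) LUTs, divided by four LUTs per CLB and rounded up. Below is a small Python check of that arithmetic. It assumes one 4-input LUT per 2:1 mux and four LUTs per Virtex CLB, as stated in the text. For N = 24 the formula gives 29 rather than 30; 30 is the entry marked with a question mark.

```python
def merged_tree_luts(n: int) -> int:
    """LUTs for an n-bit merged-tree shifter: ceil(log2 n) layers of n-1 2:1 muxes."""
    layers = (n - 1).bit_length()  # ceil(log2(n)) for n >= 2
    return layers * (n - 1)


def merged_tree_clbs(n: int, luts_per_clb: int = 4) -> int:
    """Round the LUT count up to whole CLBs (4 LUTs per CLB assumed)."""
    return -(-merged_tree_luts(n) // luts_per_clb)  # ceiling division


if __name__ == "__main__":
    for n in (4, 8, 16, 24, 32):
        print(n, merged_tree_luts(n), merged_tree_clbs(n))
```

This prints 6, 21, 60, 115 and 155 LUTs, which round to 2, 6, 15, 29 and 39 CLBs.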


"Gilbert H. Herbeck" wrote:
> 
> I don't know a CLB from a BLC, ... So I will let you decide if this
> architecture makes sense for your technology (or FPGAs in general).
> But I will try to explain it again.
> 
> For this 8-bit case, there are 3 stages of 2:1 mux.
> The first selects a shift of 0 or 1 of the input.
> The second selects a shift of 0 or 2 of the first stage output.
> The third selects a shift of 0 or 4 of the second stage output.
> (And so on for larger than 8-bit).
> As you have already mentioned, there are 0's coming in
> from one side and these muxes will be optimized further.
> 
> This is the way to go for cell-based.  I'll let you FPGA types
> decide if you like it.
> 
> Gil

-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.



Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24110
Subject: Re: Variable shifting
From: Ray Andraka <ray@andraka.com>
Date: Thu, 27 Jul 2000 03:35:10 GMT

Nope, it doesn't help you with the Xilinx architecture (which has the extra mux)
because you need 4 outputs for the 4 inputs.  It turns out it takes the same area as
two layers of 2-input muxes, and while the CLB delay is a tad less, the routing
delays and wider fan-out make up for the small difference.  In Virtex it is about
break even, in 4K the 2 layers of 2 inputs is slightly faster, and considerably
faster if you use the pipeline registers.  In Altera, you actually pay both an area
and speed penalty by using 4 input muxes in the layers.  Again this is due to the
fact that you need 4 outputs for 4 inputs.  4 input muxes get implemented in two
levels of logic in Altera, and you wind up duplicating the first layer 2 input
muxes of the 4 input mux as a result.

Aki M Suihkonen wrote:

> In article <397F20FE.8C6325D1@andraka.com>,
> Ray Andraka  <ray@andraka.com> wrote:
> >There's a better way, rickman!
> >
> >Do the barrel shift as an optimal merged tree.  You wind up with log2(N)
> >layers of 2:1 muxes, and if you arrange them right it is not too congestive for
> >the routing, even in 4KE parts and even for 32-bit barrel shifts.  Nice thing too
> >is you can put pipeline registers between layers if you want to trade latency
> >for data rate.  There's no savings using 4 input muxes in this architecture,
> >BTW.
>
> except maybe higher throughput with smaller or equal area.
> log4(N) layers of 4:1 muxes, each implemented by
> 2 cascaded logic elements in Altera's FLEX series, perhaps?
> --
> Problems      1) do NOT write a virus or a worm program
> "A.K.Dewdney, The New Turing Omnibus; Chapter 60: Computer viruses"

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com  or http://www.fpga-guru.com

Article: 24111
Subject: Re: Variable shifting
From: Ray Andraka <ray@andraka.com>
Date: Thu, 27 Jul 2000 03:42:08 GMT

rickman wrote:

> I can't say that I know what you mean when you say "an optimal merged
> tree." Certainly you would want to connect the muxes in a tree, but what
> further optimizations can you do unless you can restrict the range of
> the select inputs?

A merged tree eliminates the duplicate terms at each level.  As a result, each level
is N 2-input muxes.  Think of it this way: you do the aggregate shift as a series
of incremental shifts by powers of 2.  If you use 4-input muxes, then you need N
4-input muxes for each layer.  That is logically equivalent to 3*N 2-input muxes.  The
2 layers of a 2x2 merged tree use only 2*N 2-input muxes.  From that you can see that both
use the same area in the Xilinx architecture (where a 4-input mux fits in 2 LUTs), while
in the case of Altera the 4-input version actually uses 1-1/2 times the area!
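The area argument above can be tallied with a small Python sketch. The per-mux costs are those implied in the discussion, and they are assumptions of this model. A 2:1 mux costs one LUT or logic element. A 4:1 mux costs two LUTs in Xilinx (using the F5 mux) and three logic elements in Altera. The sketch counts N muxes per layer and ignores edge savings from zero fill. It is only exact for widths that are powers of four, because otherwise the last radix-4 layer is overcounted.

```python
def layers_needed(n: int, radix: int) -> int:
    """Layers of radix:1 muxes needed to cover shifts 0..n-1."""
    layers, span = 0, 1
    while span < n:
        span *= radix
        layers += 1
    return layers


def shifter_area(n: int, radix: int, cost_per_mux: int) -> int:
    """Merged-tree shifter area: N muxes per layer, each costing cost_per_mux."""
    return layers_needed(n, radix) * n * cost_per_mux


# Assumed costs per mux, following the description in this thread
COST = {
    ("xilinx", 2): 1,  # 2:1 mux in one 4-input LUT
    ("xilinx", 4): 2,  # 4:1 mux in two LUTs plus the F5 mux
    ("altera", 2): 1,  # 2:1 mux in one logic element
    ("altera", 4): 3,  # 4:1 mux built as three 2:1 muxes
}

if __name__ == "__main__":
    n = 16
    for family in ("xilinx", "altera"):
        r2 = shifter_area(n, 2, COST[(family, 2)])
        r4 = shifter_area(n, 4, COST[(family, 4)])
        print(family, r2, r4, r4 / r2)  # xilinx 64 64 1.0 / altera 64 96 1.5
```

This reproduces the two claims: break-even area in Xilinx and a 1.5x penalty in Altera.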

>
>
> I would take exception to the statement "There's no savings using 4
> input muxes in this architecture". To implement a 4 input mux using 2
> input muxes in 4 input LUTs requires 3 LUTs. By using the F5 mux you can
> save one LUT. Similarly (after looking at the data sheet) I see that
> there is a F6 mux which will allow you to implement an 8 input mux using
> just 4 LUTs vs. 7 LUTs by your method.
>
> Hmmmm... why do I feel like I am missing something important?

Probably because you are!  Normally, yes, a 4-input mux is more compact in Xilinx.
The point in this case is that for N inputs you need N outputs, and by merging the first
layer you eliminate half of the logic in that first layer (by sharing the outputs).

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com  or http://www.fpga-guru.com

Article: 24112
Subject: Re: Pad trireg in XLA FPGA
From: Ray Andraka <ray@andraka.com>
Date: Thu, 27 Jul 2000 03:56:07 GMT

The trireg was added in the latest 4K architectures, but the software people didn't want to
support it.  Virtex has them, but uses a different part of the software for the
place and route that allows their use.  I think part of the reason they won't fix
it in 4K is that they consider 4K to be an obsolete architecture.  I was gently
reminded of that last week by some folks at Xilinx.


--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com  or http://www.fpga-guru.com

Article: 24113
Subject: Re: Variable shifting
From: rickman <spamgoeshere4@yahoo.com>
Date: Wed, 26 Jul 2000 23:57:55 -0400

Ray Andraka wrote:
> 
> rickman wrote:
> 
> > I can't say that I know what you mean when you say "an optimal merged
> > tree." Certainly you would want to connect the muxes in a tree, but what
> > further optimizations can you do unless you can restrict the range of
> > the select inputs?
> 
> A merged tree eliminates the duplicate terms at each level.  As a result, each level
> is N 2-input muxes.  Think of it this way:  you do the aggregate shift as a series
> of incremental shifts by powers of 2.  If you use 4-input muxes, then you need N 4-input
> muxes for each layer.  That is logically equivalent to 3*N 2-input muxes.  The
> 2 layers of a 2x2 merged tree use only 2*N 2-input muxes.  From that you can see that both
> use the same area in the Xilinx architecture, or, in the case of Altera, the 4-input version
> actually uses 1-1/2 times the area!

I see how the sharing of muxes between bits minimizes the number of LUTs
used, but I don't agree that the 4-input mux is equivalent to three 2-input
muxes. By using the F5 mux in the Virtex and Spartan II chips, you
get a 4-input mux from two LUTs, which costs the same as two 2-input
muxes, but with less delay.


> > I would take exception to the statement "There's no savings using 4
> > input muxes in this architecture". To implement a 4 input mux using 2
> > input muxes in 4 input LUTs requires 3 LUTs. By using the F5 mux you can
> > save one LUT. Similarly (after looking at the data sheet) I see that
> > there is a F6 mux which will allow you to implement an 8 input mux using
> > just 4 LUTs vs. 7 LUTs by your method.
> >
> > Hmmmm... why do I feel like I am missing something important?
> 
> Probably because you are!  Normally, yes, a 4-input mux is more compact in Xilinx.
> The point in this case is that for N inputs you need N outputs, and by merging the first
> layer you eliminate half of the logic in that first layer (by sharing the outputs).

I still maintain that even with sharing of muxes you get a better design
with the 4- and 8-input muxes. I am also not convinced that routing
congestion isn't a problem. At the last level of a large barrel shifter
you have to route N/2 signals from mux i to mux i + N/2. This puts N/2
signals through a single spot. If you are doing a circular shift, this
becomes N signals!
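The routing concern can be put into numbers with an equally rough model. The sketch below assumes the bits sit in a single row in index order and counts how many shifted-input wires of one shifter stage cross the cut in the middle of that row. `wires_across_cut` is a hypothetical helper, and real FPGA routing is two-dimensional, so this only illustrates the N/2 versus N argument.

```python
def wires_across_cut(n, amount, cut, circular=False):
    """Count one stage's shifted-input wires that cross the boundary between
    bit positions cut-1 and cut, with bits placed in a row by index.

    Output bit i takes its shifted input from bit i - amount (logical shift,
    no wire when i < amount) or from (i - amount) mod n (circular shift).
    The pass-through input of each 2:1 mux stays local and never crosses.
    """
    count = 0
    for i in range(n):
        if circular:
            src = (i - amount) % n
        elif i >= amount:
            src = i - amount
        else:
            continue                      # zero fill: no wire needed
        if min(src, i) < cut <= max(src, i):
            count += 1
    return count


if __name__ == "__main__":
    n = 16
    for amount in (1, 2, 4, 8):           # one entry per shifter stage
        print(amount,
              wires_across_cut(n, amount, n // 2),
              wires_across_cut(n, amount, n // 2, circular=True))
```

Under these assumptions the last stage (amount = N/2) puts N/2 wires through the midpoint for a logical shift and N wires for a circular shift, matching the figures above.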


-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.



Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24114
Subject: Which one is good coding style?
From: "MK Yap" <mkyap@REMOVE.ieee.org>
Date: Thu, 27 Jul 2000 12:21:41 +0800

Hi all,

I know both produce the same behaviour, but which is the good and preferred
coding style? Which one is more power efficient?
ByteReq is generated by another process and its logic is not straightforward.
That results in the preset value in ByteCntr being driven by complicated logic
in Example 2 (not preferable, right??)

Any comments??  Thank you very much.


Regards
MKYap

#Example 1
 delaycntr: PROCESS(nReset,Clk)
 BEGIN
  IF nReset='0'  THEN
   DelayCntr<=0;
  ELSIF Clk'event AND Clk='0' THEN
   IF ByteReq='0' THEN
        DelayCntr<=0;
   ELSE
        DelayCntr <= DelayCntr + 1;
   END IF;
  END IF;
 END PROCESS delaycntr;

#Example 2
 delaycntr: PROCESS(nReset,ByteReq, Clk)
 BEGIN
  IF nReset='0' OR ByteReq='0' THEN
       DelayCntr<=0;
  ELSIF Clk'event AND Clk='0' THEN
        DelayCntr <= DelayCntr + 1;
  END IF;
 END PROCESS delaycntr;

Article: 24115
Subject: Re: Which one is good coding style?
From: rickman <spamgoeshere4@yahoo.com>
Date: Thu, 27 Jul 2000 00:24:32 -0400

Actually, they will not produce the same behaviour. In Example 1 ByteReq
is a synchronous reset. In Example 2 ByteReq is an asynchronous reset;
very different.

This will also be implemented very differently depending on the FPGA
family (or ASIC) that you choose. In a Xilinx FPGA the second example
will cause an AND gate to be inserted in line with the original reset
signal, nReset. This will preclude it being a part of the GSR net. 
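The behavioural difference can be seen in a zero-delay Python model of the two processes. This is only a sketch: it assumes nReset stays high, treats each `"clk"` event as a falling edge of Clk, uses hypothetical event names, and ignores the synthesis issues (GSR, the extra AND gate) entirely.

```python
def run(events, style):
    """Replay events against a model of DelayCntr and return its value
    after each event.

    style="sync":  Example 1, ByteReq='0' clears the counter only at a clock edge.
    style="async": Example 2, ByteReq='0' clears the counter immediately
                   and holds it clear while ByteReq stays low.
    Events: "clk" (falling edge), "req0" (ByteReq goes low), "req1" (goes high).
    """
    cntr, byte_req, trace = 0, 1, []
    for ev in events:
        if ev == "clk":
            if byte_req == 0:
                cntr = 0                  # both styles are clear at this edge
            else:
                cntr += 1
        elif ev == "req0":
            byte_req = 0
            if style == "async":
                cntr = 0                  # asynchronous clear, no clock needed
        elif ev == "req1":
            byte_req = 1
        trace.append(cntr)
    return trace


if __name__ == "__main__":
    # A short low pulse (e.g. a glitch) on ByteReq between two clock edges.
    glitch = ["clk", "clk", "req0", "req1", "clk"]
    print("sync ", run(glitch, "sync"))   # the pulse is never sampled
    print("async", run(glitch, "async"))  # the pulse clears the counter
```

A pulse on ByteReq that falls between clock edges is invisible to Example 1 but clears the counter in Example 2, which is one reason an asynchronous reset driven by complicated logic draws tool warnings.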


-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24116
Subject: Viewlogic Licencing
From: rickman <spamgoeshere4@yahoo.com>
Date: Thu, 27 Jul 2000 00:34:24 -0400

I remember from years ago that Viewlogic had a licensing "quirk" (I used
much stronger language at the time). They had, and still seem to have, two
types of licenses. You can get a target-specific license, which only
lets the tools work with the libraries for a specific chip vendor's
devices. Or, if you pay a much higher price, you can get a full "board"
package that will work with any library, including board design libs.

The problem was that if you paid the big bucks for the board package, you
could not share any files with a customer who was using a vendor-specific
version. This was not limited to files that were done for other
vendors' chips; it applied even to the libraries that their license was
authorized for.

I investigated extensively and understand just how the licensing works.
I even found a way around the problem by cutting and pasting one line
from the schematic files. But this was a real pain and had to be done
each and every time the file was saved.

The question is, has Viewlogic found a way to deal with this problem? I
am thinking about buying a full license for board-level design. But
there is not much point if I can't share schematics with customers who
only have the chip-level package.


-- 

Rick Collins

rick.collins@XYarius.com

Ignore the reply address. To email me use the above address with the XY
removed.



Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 24117
Subject: Crossbar Switch.
From: Richard Meester <rme@quest-innovations.com>
Date: Thu, 27 Jul 2000 08:40:10 +0200

Hello all,

Does anyone have information on a crossbar switch (and how to build one)? I know
the basics of the crossbar, but am interested in a more detailed
description of the technical background. Maybe a sample VHDL model or
schematic?

Regards Richard Meester

--
Quest Innovations
tel: +31 (0) 227 604046
http://www.quest-innovations.com

Article: 24118
Subject: End of my rope.
From: "K.Orthner" <nospam@ihatespam.com>
Date: Thu, 27 Jul 2000 15:45:38 +0900

As the subject says, I feel like I'm at the end of my rope here.  (I get
the feeling I'm mixing metaphors, but I haven't actually had an English
conversation for so long, I forget.)

This is (was?) related to the Coregen thread, except that now I have
completely removed everything Coregen-related.

I have a simple problem:

My code simulates fine; I get exactly the answers that I expect.  Then I put
it through the Xilinx mapper, and it "optimizes" out some signals that are
rather important.  It then falls over, saying that there are components whose
input signals have been optimized away.

If I select the "Don't trim logic" option it seems to work fine, but I
haven't tested it on a real board yet.  (It's PnR'ing as I type.)

Anyone have a suggestion for how to track this problem down?

I've been tracing nets with the EDIF file/FPGA Express Schematic viewer, and
as far as I can tell, the output of the synthesizer is just fine.



------------
Kent Orthner

Article: 24119
Subject: Re: Xilinx Core Generators.
From: felix_bertram@my-deja.com
Date: Thu, 27 Jul 2000 06:59:50 GMT

Hi Kent,

> Which leads me gracefully to my next problem.  Now that I have things
> simulating smoothly, how do I include the Coregen-generated EDIF netlists in
> my design?  Do I really want to include them in the synthesis step?  And if
> so, how do I do it?

There is no need to include the netlists in synthesis. If you don't,
the synthesizer will treat them as black boxes (and give you a warning
about that). This is probably the best thing it can do with
technology-dependent CoreGen modules.

>
> From what the CoreGen documentation says, I don't really have to do much
> other than instantiate the coregen blocks in my design, and make sure all of
> the outputs from CoreGen are in the same directory as my design files.

Yes, just make sure the following files are in the same directory:
* Your netlist file from the synthesis (edf)
* Your netlist constraints file from synthesis (ncf)
* Your netlist from coregen (edn)
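Before running ngdbuild it can help to confirm that these files really are side by side. The following is a hypothetical Python helper, not part of the Xilinx or Aldec tools; it assumes the synthesis output is named after the top level (as with XC2S15.edf / XC2S15.ncf in Felix's log) and that you pass in the names of the CoreGen cores you instantiated.

```python
from pathlib import Path


def missing_netlists(folder, toplevel, cores):
    """Return the required files that are not present in `folder`.

    Expects <toplevel>.edf and <toplevel>.ncf from synthesis, plus
    <core>.edn for every CoreGen core instantiated in the design.
    """
    required = [f"{toplevel}.edf", f"{toplevel}.ncf"]
    required += [f"{core}.edn" for core in cores]
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # Example names taken from the log; the folder is assumed to be the cwd.
    print(missing_netlists(".", "XC2S15", ["dpram1024x8"]))
```

An empty list means all three kinds of file are in place.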

> Which works well enough, except that I'm getting warning (during the
> translate phase) that the design is not linked.  Again, according to the
> documentation, I should get a warning that it's not expanded, as opposed to
> not linked.  (Is this an indication of a problem, or is the documentation
> wrong?)

This is output from my build step with Coregen module dpram1024x8 and
toplevel XC2S15. No errors, no warnings...

~~~~~~~~~~ snip ~~~~~~~~~~
ngdbuild:  version C.22
Copyright (c) 1995-1999 Xilinx, Inc.  All rights reserved.

Command Line: ngdbuild -p 2s50-tq144-6 -nt timestamp c:\MyDocs\Felix\Xilinx\EZaudio\xflow/XC2S15.edf XC2S15.ngd

Launcher: Executing edif2ngd "XC2S15.edf" "XC2S15.ngo"
edif2ngd:  version C.22
Copyright (c) 1995-1999 Xilinx, Inc.  All rights reserved.
Reading NCF file "XC2S15.ncf"...
Writing the design to "XC2S15.ngo"...
Reading NGO file "c:/MyDocs/Felix/Xilinx/EZaudio/xflow/XC2S15.ngo" ...
Reading component libraries for design expansion...
Launcher: Executing edif2ngd -noa "dpram1024x8.edn" "dpram1024x8.ngo"
edif2ngd:  version C.22
Copyright (c) 1995-1999 Xilinx, Inc.  All rights reserved.
Writing the design to "dpram1024x8.ngo"...
Loading design module "c:\MyDocs\Felix\Xilinx\EZaudio\xflow\dpram1024x8.ngo"...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Writing NGD file "XC2S15.ngd" ...
Writing NGDBUILD log file "XC2S15.bld"...
~~~~~~~~~~ snip ~~~~~~~~~~

Below you'll find another Perl script for Xilinx implementation.

Have fun,
kind regards

Felix Bertram

~~~~~~~~~~ snip ~~~~~~~~~~
#-------------------------------------------------------------------------------
# Name:        XFLOW.pl
# Description: Perl script to run the Xilinx Flow Engine from Active-HDL
# History:     Felix Bertram, 00jul27, created.
# Notes:       Add to AHDL project and execute
#              Implements the current top-level entity
#              Assumes synthesis output (edf, ncf) with same name as toplevel
#              Flow options may be specified in fpga.flw
#-------------------------------------------------------------------------------

use File::Copy;
use Aldec::FrameConnector;
use Aldec::ID;

#-------------------------------------------------------------------------------
$IDesignInfo=      ConnectPlugIn(CLSID_ProjectPlugIn, IID_IDesignInfo);
$IDesignBrowser=   ConnectPlugIn(CLSID_ProjectPlugIn, IID_IDesignBrowser);
$IDesignFiles=     ConnectPlugIn(CLSID_ProjectPlugIn, IID_IDesignFiles);
$IDesignStructure= ConnectPlugIn(CLSID_ProjectPlugIn, IID_IDesignStructure);

#-------------------------------------------------------------------------------
#-- copy netlists & constraints
$SynthOutFolder=  $IDesignInfo->GetDesignFolder()."/synth/fexp";
$SynthOutFolder=~ s/\\/\//g;

$LogiCoreFolder=  $IDesignInfo->GetDesignSrcFolder()."/LogiCORE";
$LogiCoreFolder=~ s/\\/\//g;

$XflowFolder=     $IDesignInfo->GetDesignFolder()."/xflow";
$XflowFolder=~    s/\\/\//g;

mkdir($XflowFolder, 0744);

@Dirs= ($SynthOutFolder, $LogiCoreFolder);
foreach $Dir (@Dirs)
{	chdir $Dir;

	opendir(NETLIST, ".");
	while($netlist= readdir(NETLIST))
	{	if ($netlist=~ m/.*\.edf$/i             # netlist, FPGA-Express
		||  $netlist=~ m/.*\.edn$/i             # netlist, CoreGen
		||  $netlist=~ m/.*\.ncf$/i)            # constraints, FPGA-Express
		{	copy($netlist, $XflowFolder."/".$netlist);
		}
	}
	closedir(NETLIST);
}
chdir $XflowFolder;

##------------------------------------------------------------------------------
#-- find toplevel, delete files, call flow engine
print "*** implement netlist ***\n";
$OptFile=  "balanced.opt";
$toplevel= $IDesignStructure->SimPathToHierPath("/");
$toplevel=~ m/(\|)(.*)(\|)/;
$toplevel=  $2;
$EdfFile= "";

opendir(DIR, ".");
while($xflowfile= readdir(DIR))
{	if ($xflowfile=~ m/^$toplevel\.edf/
	||  $xflowfile=~ m/^$toplevel\.xnf/
	)
	{	$EdfFile= $xflowfile;
	}

	if ($xflowfile=~ m/xflow\.log$/i            # Implementation Log File
	||  $xflowfile=~ m/netlist\.lst$/i          # edif netlist summary
	||  $xflowfile=~ m/xflow\.scr$/i            # script to run the flow
	||  $xflowfile=~ m/$toplevel\.xpi$/i        # par status
	#-- Translation (ngdbuild, edif2ngd)
	||  $xflowfile=~ m/$toplevel.*\.ngo$/i
	||  $xflowfile=~ m/$toplevel.*\.ngd$/i
	||  $xflowfile=~ m/$toplevel.*\.bld$/i      # Translation Report
	#-- Mapping (map)
	||  $xflowfile=~ m/$toplevel.*_map\.ncd$/i
	||  $xflowfile=~ m/$toplevel.*_map\.ngm$/i
	||  $xflowfile=~ m/$toplevel\.pcf$/i
	||  $xflowfile=~ m/$toplevel.*_map\.mrp$/i  # Map Report
	#-- Place & Route (par)
	||  $xflowfile=~ m/$toplevel\.ncd$/i
	||  $xflowfile=~ m/$toplevel\.par$/i        # Place & Route Report
	||  $xflowfile=~ m/$toplevel\.pad$/i        # Pad Report
	#-- Post Layout Timing (trce)
	||  $xflowfile=~ m/$toplevel\.twr$/i        # Post Layout Timing Report
	#-- Timing Annotation (ngdanno)
	||  $xflowfile=~ m/$toplevel\.alf$/i        # (report)
	||  $xflowfile=~ m/$toplevel\.nga$/i
	||  $xflowfile=~ m/$toplevel\.dly$/i        # Asynchronous Delay Report
	#-- ngd2vhdl
	||  $xflowfile=~ m/time_sim\.vhd$/i
	||  $xflowfile=~ m/time_sim\.sdf$/i
	#-- Bitstream Generator
	||  $xflowfile=~ m/$toplevel\.drc$/i        # (report)
	||  $xflowfile=~ m/$toplevel\.bit$/i
	||  $xflowfile=~ m/$toplevel\.ll$/i
	||  $xflowfile=~ m/$toplevel\.msk$/i
	||  $xflowfile=~ m/$toplevel\.bgn$/i        # Bitgen Report
	)
	{	unlink($xflowfile);
	}
}
closedir(DIR);


$cmd= "xflow -implement $OptFile -config bitgen.opt -tsim generic_vhdl.opt $EdfFile";
system($cmd); #print $cmd;

#-------------------------------------------------------------------------------
#-- add files to project
$IDesignFiles->AddProjectFolder("xflow");

opendir(SIM, ".");
while($flowfile= readdir(SIM))
{	$flowfile= $XflowFolder."/".$flowfile;
	if ($flowfile=~ m/time_sim\.vhd$/i          # simulation model
	||  $flowfile=~ m/time_sim\.sdf$/i          # SDF
	)
	{	$IDesignFiles->AddFile($flowfile, "", "xflow", 0);
	}

	if ($flowfile=~ m/xflow\.log$/i             # Implementation Log File
	||  $flowfile=~ m/$toplevel\.bld$/i         # Translation Report
	||  $flowfile=~ m/$toplevel.*\.mrp$/i       # Map Report
	||  $flowfile=~ m/$toplevel\.par$/i         # Place & Route Report
	||  $flowfile=~ m/$toplevel\.pad$/i         # Pad Report
	||  $flowfile=~ m/$toplevel\.dly$/i         # Asynchronous Delay Report
	||  $flowfile=~ m/$toplevel\.twr$/i         # Post Layout Timing Report
	||  $flowfile=~ m/$toplevel\.bgn$/i         # Bitgen Report
	||  $flowfile=~ m/fpga\.flw$/i              # Flow control file
	)
	{	$IDesignFiles->AddFile($flowfile, "Text File", "xflow", 0);
	}
}
closedir(SIM);


print "*** done ***\n";
#-------------------------------------------------------------------------------
# end of file




Article: 24120
Subject: Re: Which one is good coding style?
From: "MK Yap" <mkyap@REMOVE.ieee.org>
Date: Thu, 27 Jul 2000 16:05:12 +0800

Hi,

Ya, sorry... but in my application both will work just fine...
I'm asking this because Altera Maxplus2 will complain (warning) that the
preset is being driven by complicated logic, so I'm wondering which one is
better practice, be it in FPGA, CPLD or ASIC.

Any comments? Which one is prefered, if both works for my application.


Thanks.

Regards
MKYap



Article: 24121
Subject: Re: Which one is good coding style?
From: Renaud Pacalet <pacalet@enst.fr>
Date: Thu, 27 Jul 2000 10:06:25 +0200

MK Yap a écrit :
> 
> Hi,
> 
> Ya, sorry... but in my application both will work just fine...
> I'm asking this because Altera Maxplus2 will complain (warning) that the
> preset is being driven by complicated logic, so I'm wondering which one is
> better practice, be it in FPGA, CPLD or ASIC.
> 

Well, in ASIC design, where an asynchronous reset costs area, I use
synchronous reset only.

Regards.
-- 
Renaud Pacalet, ENST / COMELEC, 46 rue Barrault 75634 Paris Cedex 13
Tel. : 01 45 81 78 08 | Fax : 01 45 80 40 36 | Mel : pacalet@enst.fr

Article: 24122
Subject: Re: Which one is good coding style?
From: "MK Yap" <mkyap@REMOVE.ieee.org>
Date: Thu, 27 Jul 2000 16:31:35 +0800

Hi!

Hmmm... but why is async reset more expensive in ASIC? I couldn't see the
connection....
But in terms of max system speed, isn't the sync design slower, and doesn't it
take more space??

I'm doing prototyping on FPGA and will finally convert to ASIC for mass production.
So gaining an understanding of the difference would help me choose the coding
style.

Thank you very much!! Learnt something today :-)

Regards
MKYap



Article: 24123
Subject: Re: Which one is good coding style?
From: Renaud Pacalet <pacalet@enst.fr>
Date: Thu, 27 Jul 2000 10:47:04 +0200

MK Yap a écrit :
> 
> Hi!
> 
> Hmmm... but why is async reset more expensive in ASIC? I couldn't see the
> connection....
> But in terms of max system speed, isn't the sync design slower, and doesn't it
> take more space??
> 

Because in standard cell libraries you usually find DFFs with and
without asynchronous reset, and DFFs without asynchronous reset are
smaller. So if your application needs a synchronous reset anyway,
you can use it to initialize your memory elements; you don't need an
asynchronous reset at all and can save silicon by using simple DFFs.

Sure, if you need a reset but don't care whether it is synchronous
or not, then you should prefer an asynchronous reset. It will save
silicon area compared with a simple DFF plus an AND gate. It will
save ns too.

Note: Most of the time you don't need a reset at all, because you can
      initialize your memory elements by running a few init
      cycles. This is obviously not true of state machines and
      some other control parts, but it is usually true of datapath parts.
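The point about init cycles can be illustrated with a shift-register datapath. In this hypothetical Python model, flip-flops power up holding an unknown value (represented as `None`); after as many clock cycles as there are stages, every stage holds a defined value even though no reset was ever applied. Real power-up behaviour depends on the technology, so this only shows the principle.

```python
def clock_pipeline(stages, new_input):
    """Advance a shift-register pipeline by one clock edge.
    stages[0] is the first register; returns the new register contents."""
    return [new_input] + stages[:-1]


def cycles_until_defined(depth, inputs):
    """Return how many clock edges it takes before no stage holds an
    unknown (None) value, starting from an un-reset pipeline."""
    stages = [None] * depth
    for cycle, value in enumerate(inputs, start=1):
        stages = clock_pipeline(stages, value)
        if None not in stages:
            return cycle
    return None   # not enough input cycles supplied


if __name__ == "__main__":
    print(cycles_until_defined(4, [0] * 10))
```

A state machine is different: its next state depends on its current, unknown state, so no number of clock cycles is guaranteed to flush the unknown out, which is the exception noted above.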

> I'm doing prototyping on FPGA and will finally convert to ASIC for mass production.
> So gaining an understanding of the difference would help me choose the coding
> style.
> 
> Thank you very much!! Learnt something today :-)
> 

You're welcome. I learnt something today too: nobody seems able to
explain to me why left and right are inverted in a mirror but not top
and bottom ;-)

> Regards

Regards too.
-- 
Renaud Pacalet, ENST / COMELEC, 46 rue Barrault 75634 Paris Cedex 13
Tel. : 01 45 81 78 08 | Fax : 01 45 80 40 36 | Mel : pacalet@enst.fr

Article: 24124
Subject: Re: Which one is good coding style?
From: "K.Orthner" <nospam@ihatespam.com>
Date: Thu, 27 Jul 2000 18:20:54 +0900


The sync may be better in most FPGAs as well.

If the nRESET signal in your code is an asynchronous reset that you use for your
entire design, then you can use the "Global Reset" signal that connects
directly to the reset inputs of your FFs.

If you use ByteReq as an asynchronous reset (Example 2), then you need extra logic
to combine it with the global reset into a local reset for that one FF.  And
depending on the tools (I think!), that may prevent you from using the Global
Reset at all.  (Someone please verify this, I'm not 100% sure.)

Of course, for Virtex designs, it's suggested that you not use the Global
Reset anyway, so maybe it's a moot point.

> You're welcome. I learnt something today too: nobody seems able to
> explain to me why left and right are inverted in a mirror but not top
> and bottom ;-)

Your eyes are on the left and right sides of your face.  If they were,
say, at the top and bottom of your face, then a mirror would invert top &
bottom!

(Aren't I full of answers today!)

-kent
------------
Kent Orthner
