Messages from 159975

Article: 159975
Subject: Re: RISC-V Support in FPGA
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Wed, 3 May 2017 13:39:38 -0700 (PDT)

> (continuing a bit OT...)
>
> Kevin,
>
> That's unfortunate.  We've been very successful with writing parameterizable code - even
> before SystemVerilog. Heck even before Verilog-2001.  Things like N-Tap FIRs,
> Two-D FIRs.  FFTs, Video Blenders, etc...  All with configurable settings -
> bit widths, rounding/truncation options/etc..  I think in a previous job I had a
> parametizable Galois Field Multiplier too.
>
> I'm not sure what trouble you had with the tools.  It takes a bit more up front work,
> but pays off quite a bit in the end.  We really had no choice, given the number of
> FPGAs we do, along with how many engineers support them.  Lot's of shared code
> was the only way to go.
>
> If you've got something you like, then I suggest keeping it.  But for others,
> I think writing parameterizable HDL isn't too much trouble - and is made
> even easier with SystemVerilog.  And higher level too.
>
> Regards,
>
> Mark

I've just been burned too many times.  I know better now.  The last time
I made the mistake I was just making a simple PN generator (LFSR).  The
only complication was that it was highly parallel--I think I had to
generate maybe 512 bits per cycle, so it ends up being a big matrix
multiplication over GF(2).  First I made the high-level version where
you could set parameters for the width and taps and so on.  It took
forever for Vivado to crank on it.  This is just a few lines of code,
mind you, and is just a bunch of XORs.  Then I had Matlab generate an
include file with the matrix packed into a long parameter which
essentially sets up XOR taps.  That was, I think, ~20x faster, which
translated into hours of synthesis time saved.  The synthesized circuit
was also better for various reasons.  This is just one example.  I also
still have to instantiate primitives frequently for various reasons.
The level of abstraction doesn't seem like it's changed much in 15 years
if you really need performance.  This doesn't really have anything to do
with the SystemVerilog constructs.  I'm just talking about high-level
code in general.  If I were allowed, I would still use modports,
structs, enums, etc.
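
A minimal sketch of the offline step described above, written in Python rather than Matlab. It is not the poster's script: the function names, the Fibonacci-style register layout, the example taps, and the packing order of the hex literal are all assumptions made for illustration. The idea is that every output bit of a parallel LFSR is a fixed XOR of current state bits, so the whole thing can be precomputed as a GF(2) matrix and handed to the synthesis tool as a constant.

```python
def parallel_lfsr_matrices(width, taps, bits_per_cycle):
    """Unroll a bit-serial LFSR symbolically over GF(2).

    Returns (out_rows, next_rows).  Each row is an int bitmask over the
    current state: bit j set means state bit j feeds that XOR.
    """
    # state[i] holds the mask of original state bits whose XOR equals bit i.
    state = [1 << i for i in range(width)]
    out_rows = []
    for _ in range(bits_per_cycle):
        out_rows.append(state[width - 1])  # output is taken from the top bit
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]    # shift up, feedback enters bit 0
    return out_rows, state


def step_serial(state_bits, taps):
    """Reference bit-serial model: returns (output bit, new state list)."""
    out = state_bits[-1]
    feedback = 0
    for t in taps:
        feedback ^= state_bits[t]
    return out, [feedback] + state_bits[:-1]


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def pack_rows(rows, width):
    """Pack rows into one long hex literal, row 0 in the least significant bits.

    The packing order is an assumption; the HDL that unpacks it must agree.
    """
    value = 0
    for i, row in enumerate(rows):
        value |= row << (i * width)
    total_bits = len(rows) * width
    digits = (total_bits + 3) // 4
    return f"{total_bits}'h{value:0{digits}x}"
```

With the matrix packed into a parameter like this, the HDL side only has to AND each row with the state and XOR-reduce the result; no polynomial arithmetic is left for the tool to elaborate.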

Article: 159976
Subject: Re: RISC-V Support in FPGA
From: gtwrek@sonic.net (Mark Curry)
Date: Wed, 3 May 2017 22:11:10 -0000 (UTC)

In article <a66c4c17-6f43-4aec-9dd5-c06badf5b11f@googlegroups.com>,
Kevin Neilson  <kevin.neilson@xilinx.com> wrote:
>> (continuing a bit OT...)
>> 
>> Kevin,
>> 
>> That's unfortunate.  We've been very successful with writing parameterizable code - even 
>> before SystemVerilog. Heck even before Verilog-2001.  Things like N-Tap FIRs, 
>> Two-D FIRs.  FFTs, Video Blenders, etc...  All with configurable settings - 
>> bit widths, rounding/truncation options/etc..  I think in a previous job I had a 
>> parametizable Galois Field Multiplier too.
>> 
>> I'm not sure what trouble you had with the tools.  It takes a bit more up front work,
>> but pays off quite a bit in the end.  We really had no choice, given the number of 
>> FPGAs we do, along with how many engineers support them.  Lot's of shared code
>> was the only way to go. 
>> 
>> If you've got something you like, then I suggest keeping it.  But for others,
>> I think writing parameterizable HDL isn't too much trouble - and is made
>> even easier with SystemVerilog.  And higher level too.
>> 
>> Regards,
>> 
>> Mark
>
>I've just been burned too many times.  I know better now.  The last time I made the mistake I was just making a simple PN generator (LFSR).  The
>only complication was that it was highly parallel--I think I had to generate maybe 512 bits per cycle, so it ends up being a big matrix
>multiplication over GF(2).  First I made the high-level version where you could set a parameters for the width and taps and so on.  It took forever
>for Vivado to crank on it.  This is just a few lines of code, mind you, and is just a bunch of XORs.  Then I had Matlab generate an include file
>with the matrix packed into a long parameter which essentially sets up XOR taps.  That was, I think, ~20x faster, which translated into hours of
>synthesis time.  The synthesized circuit was also better for various reasons.  This is just one example.  I also still have to instantiate
>primitives frequently for various reasons.  The level of abstraction doesn't seem like it's changed much in 15 years if you really need performance.
>This doesn't really have anything to do with the SystemVerilog constructs.  I'm just talking about high-level code in general.  If I were allowed, I
>would still use modports, structs, enums, etc.

Ah, we did find something similar in Vivado.  For us it was a large parallel
CRC - which is pretty much functionally identical to your LFSR (big XOR trees).

We had code that calculated, basically, a shift table for computing the CRC of a long word.
The RTL code worked fine for ISE.  But when we hit Vivado, it'd pause 10 minutes or so
over each instance (we had lots), which significantly hit our build times.

So, I changed this code to "almost" self-modifying code.  The code would by default
calculate the shift matrix using our "normal" RTL, which looked something like:
      assign H_n_o = h_pow_n( H_zero, NUM_ZEROS_MINUS_ONE );
where H_zero was a "matrix" of constants, and NUM_ZEROS_MINUS_ONE a static
parameter.  The end result is a matrix of constants as well, but "dynamically"
calculated. (Here "dynamically" means once at elaboration time, since all inputs
to the function were static).

Then we just added code to dump each unknown table entry sort-of like:
  if( ( POLY_WIDTH == 8 ) && ( NUM_ZEROS_MINUS_ONE == 7 ) && ( POLYNOMIAL == 'h2f ) )
    assign H_n_o = 'hd4eaf52e175ffba9;
  ...
  else // no table entry - use default RTL calc
    assign H_n_o = h_pow_n( H_zero, NUM_ZEROS_MINUS_ONE );

We "closed" the loop by hand.  If the "table" entry didn't exist, the tool would use the
RTL definition, and spit out the pre-calculated entry.  All done in
Verilog.  We'd insert that new table entry into our source code by hand, and continue - the next
time, the build would be quicker.

This *workaround* was a bit of a kludge, but was the rare (really, the only) exception for us
in our parameterized code.  Normally the tools just handled things fine.
And again to be clear the only thing we were working around was long synthesis times.  
The quality of results was fine in either case.
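
A rough Python model of the table-with-fallback idea, for readers who want to experiment outside the tools.  It is only a sketch: crc_step_matrix, mat_pow, shift_matrix and TABLE are hypothetical names, the bit ordering is an assumption, and it is not expected to reproduce the 'hd4eaf52e175ffba9 constant shown above, which depends on conventions the post doesn't spell out.  The polynomial is given without its top term (0x2f for width 8).

```python
def crc_step_matrix(width, poly):
    """GF(2) matrix for shifting one zero bit into a CRC register.

    Row i is a bitmask over the old state giving new state bit i:
    new[i] = old[i-1] ^ (poly bit i & old[width-1]).
    """
    rows = []
    for i in range(width):
        row = (1 << (i - 1)) if i > 0 else 0
        if (poly >> i) & 1:
            row |= 1 << (width - 1)
        rows.append(row)
    return rows


def mat_mul(a, b):
    """Product of two GF(2) matrices stored as lists of row bitmasks."""
    out = []
    for row in a:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= b[j]
            row >>= 1
            j += 1
        out.append(acc)
    return out


def mat_pow(m, n):
    """m raised to the n-th power by repeated squaring."""
    result = [1 << i for i in range(len(m))]  # identity matrix
    while n:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


def matvec(rows, s):
    """Apply a matrix to a state held as an int."""
    return sum((bin(row & s).count("1") & 1) << i for i, row in enumerate(rows))


def crc_shift_zero(s, width, poly):
    """Reference model: shift a single zero bit through the register."""
    msb = (s >> (width - 1)) & 1
    s = (s << 1) & ((1 << width) - 1)
    return s ^ (poly if msb else 0)


TABLE = {}  # precomputed entries, keyed by (width, poly, n)


def shift_matrix(width, poly, n):
    """Use the table entry if present; otherwise compute it and report it."""
    key = (width, poly, n)
    if key not in TABLE:
        TABLE[key] = mat_pow(crc_step_matrix(width, poly), n)
        print("new table entry", key, [hex(r) for r in TABLE[key]])
    return TABLE[key]
```

The first call for a given parameter set pays for the matrix power and prints the entry; pasting that entry into TABLE plays the role of the hand-inserted table line.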

Maybe with the code you were creating the pendulum swings the other way,
and seeing things like this was more the norm than the exception.

Interesting topic; I'm glad to hear of your (and others') experiences.

Regards,

Mark

Article: 159977
Subject: Re: RISC-V Support in FPGA
From: Allan Herriman <allanherriman@hotmail.com>
Date: 04 May 2017 10:36:05 GMT

On Wed, 03 May 2017 13:39:38 -0700, Kevin Neilson wrote:

>> (continuing a bit OT...)
>> 
>> Kevin,
>> 
>> That's unfortunate.  We've been very successful with writing
>> parameterizable code - even before SystemVerilog. Heck even before
>> Verilog-2001.  Things like N-Tap FIRs,
>> Two-D FIRs.  FFTs, Video Blenders, etc...  All with configurable
>> settings -
>> bit widths, rounding/truncation options/etc..  I think in a previous
>> job I had a parametizable Galois Field Multiplier too.
>> 
>> I'm not sure what trouble you had with the tools.  It takes a bit more
>> up front work, but pays off quite a bit in the end.  We really had no
>> choice, given the number of FPGAs we do, along with how many engineers
>> support them.  Lot's of shared code was the only way to go.
>> 
>> If you've got something you like, then I suggest keeping it.  But for
>> others,
>> I think writing parameterizable HDL isn't too much trouble - and is
>> made even easier with SystemVerilog.  And higher level too.
>> 
>> Regards,
>> 
>> Mark
> 
> I've just been burned too many times.  I know better now.  The last time
> I made the mistake I was just making a simple PN generator (LFSR).  The
> only complication was that it was highly parallel--I think I had to
> generate maybe 512 bits per cycle, so it ends up being a big matrix
> multiplication over GF(2).  First I made the high-level version where
> you could set a parameters for the width and taps and so on.  It took
> forever for Vivado to crank on it.  This is just a few lines of code,
> mind you, and is just a bunch of XORs.  Then I had Matlab generate an
> include file with the matrix packed into a long parameter which
> essentially sets up XOR taps.  That was, I think, ~20x faster, which
> translated into hours of synthesis time.  The synthesized circuit was
> also better for various reasons.  This is just one example.  I also
> still have to instantiate primitives frequently for various reasons. 
> The level of abstraction doesn't seem like it's changed much in 15 years
> if you really need performance.  This doesn't really have anything to do
> with the SystemVerilog constructs.  I'm just talking about high-level
> code in general.  If I were allowed, I would still use modports,
> structs, enums, etc.

I use Vivado to do GF multiplications that wide using purely behavioural 
VHDL.  BTW, a straightforward behavioural implementation will *not* give 
good results with a wide bus.
I believe the problem is that most tools (in particular Vivado) do a poor 
job of synthesising xor trees with a massive fanin (e.g. >> 100 bits).  
The optimisers have a poor complexity (I guess at least O(N^2), but it 
might be exponential) wrt the size of the function.

You can use all sorts of mathematical tricks to make it work without 
needing to go "low level".
For example, to deal with large fanin, partition your 512 bit input into 
N slices of 512/N bits each.  Use N multipliers, one for each slice, put 
a keep (or equivalent) attribute on the outputs, then xor the outputs 
together.  This gives the same result, uses about the same number of LUTs, 
but gives the optimiser in the tool a chance to do a good job.
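
The equivalence behind the slicing trick can be checked with a small Python model (a sketch only, not the VHDL in question; the function names and the row-bitmask matrix layout are assumptions).  Because multiplication by a GF(2) matrix is linear, splitting the input into N slices, multiplying each slice by its own block of columns, and XORing the partial results gives the same answer as the one wide product.

```python
import random


def matvec(rows, x):
    """y bit i is the XOR of the input bits selected by rows[i]."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def sliced_matvec(rows, x, in_width, num_slices):
    """Same product, computed as num_slices narrower products XORed together.

    Assumes in_width is a multiple of num_slices.  In hardware each partial
    result would be the signal carrying the keep (or equivalent) attribute.
    """
    slice_width = in_width // num_slices
    mask = (1 << slice_width) - 1
    y = 0
    for k in range(num_slices):
        shift = k * slice_width
        sub_rows = [(row >> shift) & mask for row in rows]  # column block k
        sub_x = (x >> shift) & mask                         # input slice k
        y ^= matvec(sub_rows, sub_x)
    return y


# Example: a random 64 x 512 matrix over GF(2), split into 8 slices of 64 bits.
rng = random.Random(0)
example_rows = [rng.getrandbits(512) for _ in range(64)]
example_x = rng.getrandbits(512)
example_ok = matvec(example_rows, example_x) == sliced_matvec(
    example_rows, example_x, 512, 8
)
```

The model says nothing about synthesis run time or LUT count; it only confirms that the partitioned form computes the same function.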

I use the same GF multiplier code in ISE and Quartus, too (but not on 
buses that wide).

The entire flow is in VHDL and works in any LRM-compliant tool.  It's 
parameterised, too, so I don't need to rewrite for a different bus width.

I've been using similar approaches in VHDL since the turn of the century 
and have never been burned.

YMMV.

Regards,
Allan

Article: 159978
Subject: creating a seed on a FPGA.
From: kristoff <kristoff@skypro.be>
Date: Thu, 4 May 2017 12:44:51 +0200

Hi,

I am aware that the best way to create a seed (for random numbers) is 
external hardware, but does anybody know any cheap-and-easy tricks to 
generate a random-ish number on an FPGA.


Kristoff

Article: 159979
Subject: Re: creating a seed on a FPGA.
From: colin <colin_toogood@yahoo.com>
Date: Thu, 4 May 2017 07:03:03 -0700 (PDT)

On Thursday, May 4, 2017 at 11:44:54 AM UTC+1, kristoff wrote:
> Hi,
> 
> I am aware that the best way to create a seed (for random numbers) is 
> external hardware, but does anybody know any cheap-and-easy tricks to 
> generate a random-ish number on an FPGA.
> 
> 
> Kristoff

Read the LSB of the voltage monitoring in xilinx sysmon to get each bit of your number.

Colin

Article: 159980
Subject: Re: creating a seed on a FPGA.
From: Tim Wescott <tim@seemywebsite.really>
Date: Thu, 04 May 2017 09:58:52 -0500

On Thu, 04 May 2017 07:03:03 -0700, colin wrote:

> On Thursday, May 4, 2017 at 11:44:54 AM UTC+1, kristoff wrote:
>> Hi,
>> 
>> I am aware that the best way to create a seed (for random numbers) is
>> external hardware, but does anybody know any cheap-and-easy tricks to
>> generate a random-ish number on an FPGA.
>> 
>> 
>> Kristoff
> 
> Read the LSB of the voltage monitoring in xilinx sysmon to get each bit
> of your number.
> 
> Colin

Or for sufficiently random request timing, keep a clock running and use 
the clock value for the seed.

Or combine the two.
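
A small Python model of combining the two suggestions, purely as a sketch: the function names are made up, the sample values are invented, and nothing here measures how random the result actually is.  One seed bit is taken from the LSB of each monitor reading, and the assembled value is XORed with a free-running counter captured at request time.

```python
def seed_from_lsbs(samples, nbits):
    """Build an nbits-wide value from the LSB of each of the first nbits samples."""
    if len(samples) < nbits:
        raise ValueError("need at least one sample per seed bit")
    seed = 0
    for sample in samples[:nbits]:
        seed = (seed << 1) | (sample & 1)
    return seed


def combine_with_counter(seed, counter, nbits):
    """XOR the LSB-derived seed with a counter snapshot, truncated to nbits."""
    return (seed ^ counter) & ((1 << nbits) - 1)


# Invented readings for illustration only.
readings = [0x1A3, 0x1A2, 0x1A5, 0x1A4]
example_seed = combine_with_counter(seed_from_lsbs(readings, 4), 0x37, 4)
```

XORing two sources never makes the result more predictable than the less predictable of the two, provided they are independent, which is the usual reason for combining them.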

I seem to remember doing a web search on this a while back -- there are a 
lot of papers, of varying degrees of technical soundness.

-- 
www.wescottdesign.com

Article: 159981
Subject: Re: creating a seed on a FPGA.
From: gtwrek@sonic.net (Mark Curry)
Date: Thu, 4 May 2017 15:33:00 -0000 (UTC)

In article <u_idnX0HweEx3JbEnZ2dnUU7-IWdnZ2d@giganews.com>,
Tim Wescott  <tim@seemywebsite.really> wrote:
>On Thu, 04 May 2017 07:03:03 -0700, colin wrote:
>
>> On Thursday, May 4, 2017 at 11:44:54 AM UTC+1, kristoff wrote:
>>> Hi,
>>> 
>>> I am aware that the best way to create a seed (for random numbers) is
>>> external hardware, but does anybody know any cheap-and-easy tricks to
>>> generate a random-ish number on an FPGA.
>>> 
>>> 
>>> Kristoff
>> 
>> Read the LSB of the voltage monitoring in xilinx sysmon to get each bit
>> of your number.
>> 
>> Colin
>
>Or for sufficiently random request timing, keep a clock running and use 
>the clock value for the seed.
>
>Or combine the two.
>
>I seem to remember doing a web search on this a while back -- there are a 
>lot of papers, of varying degrees of technical soundness.

Xilinx has many app notes on this - it's a common request.
Here's one I've read in the past, but there are others:

http://forums.xilinx.com/xlnx/attachments/xlnx/EDK/27322/1/HighSpeedTrueRandomNumberGeneratorsinXilinxFPGAs.pdf

Regards,

Mark

Article: 159982
Subject: Re: creating a seed on a FPGA.
From: rickman <gnuarm@gmail.com>
Date: Thu, 4 May 2017 12:04:31 -0400

On 5/4/2017 11:33 AM, Mark Curry wrote:
> In article <u_idnX0HweEx3JbEnZ2dnUU7-IWdnZ2d@giganews.com>,
> Tim Wescott  <tim@seemywebsite.really> wrote:
>> On Thu, 04 May 2017 07:03:03 -0700, colin wrote:
>>
>>> On Thursday, May 4, 2017 at 11:44:54 AM UTC+1, kristoff wrote:
>>>> Hi,
>>>>
>>>> I am aware that the best way to create a seed (for random numbers) is
>>>> external hardware, but does anybody know any cheap-and-easy tricks to
>>>> generate a random-ish number on an FPGA.
>>>>
>>>>
>>>> Kristoff
>>>
>>> Read the LSB of the voltage monitoring in xilinx sysmon to get each bit
>>> of your number.
>>>
>>> Colin
>>
>> Or for sufficiently random request timing, keep a clock running and use
>> the clock value for the seed.
>>
>> Or combine the two.
>>
>> I seem to remember doing a web search on this a while back -- there are a
>> lot of papers, of varying degrees of technical soundness.
>
> Xilinx has many app notes on this - it's a common requests.
> Here's one I've read in the past, but there's others:
>
> http://forums.xilinx.com/xlnx/attachments/xlnx/EDK/27322/1/HighSpeedTrueRandomNumberGeneratorsinXilinxFPGAs.pdf

I thought this issue had been solved?

https://xkcd.com/221/

-- 

Rick C

Article: 159983
Subject: Re: RISC-V Support in FPGA
From: kristoff <kristoff@skypro.be>
Date: Thu, 4 May 2017 18:12:36 +0200

Hi all,

As a follow-up in the RISC-V thread.

On 02-05-17 18:11, kristoff wrote:
> Or, you can "mix-match" licenses. Sifive (the company that sells the
> E310 CPU and hifive devboards) are an interesting example of this.
> They open-sourced the RTL design but keep the knowledge of actually
> implementing a risc-v core as optimised as possible for themselfs, as a
> service to sell.

This was on eenews Europe today:
http://www.eenewseurope.com/news/sifive-launches-commercial-risc-v-processor-cores

As a small follow-up question:
Does anybody have any idea how to get the hifive boards in Europe?

For the last thing I ordered in the US (a pandaboard), I had to pay VAT 
(ok, that's normal), but also a handling fee for the shipping company 
and the customs service to get the thing shipped in.
In the end, these additional costs were more than the VAT itself.

Cheerio! Kr. Bonne.

Article: 159984
Subject: Re: creating a seed on a FPGA.
From: Rob Gaddi <rgaddi@highlandtechnology.invalid>
Date: Thu, 4 May 2017 10:06:31 -0700

On 05/04/2017 07:03 AM, colin wrote:
> On Thursday, May 4, 2017 at 11:44:54 AM UTC+1, kristoff wrote:
>> Hi,
>>
>> I am aware that the best way to create a seed (for random numbers) is
>> external hardware, but does anybody know any cheap-and-easy tricks to
>> generate a random-ish number on an FPGA.
>>
>>
>> Kristoff
>
> Read the LSB of the voltage monitoring in xilinx sysmon to get each bit of your number.
>
> Colin
>

I was going to suggest asynchronous ring oscillator, but yours is 
downright elegant.

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.

Article: 159985
Subject: Re: creating a seed on a FPGA.
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Thu, 4 May 2017 10:13:16 -0700 (PDT)


> http://forums.xilinx.com/xlnx/attachments/xlnx/EDK/27322/1/HighSpeedTrueRandomNumberGeneratorsinXilinxFPGAs.pdf
>
I built a very wide version of this several months ago to do testing in
the lab but it hasn't been tested yet.  I'll have to report back after
my coworkers try it out.  You do have to instantiate LUT primitives to
get it to synthesize.  I don't know why the app note didn't address this
nor supply any example HDL.

Article: 159986
Subject: Re: RISC-V Support in FPGA
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Thu, 4 May 2017 10:46:12 -0700 (PDT)

> We had code that calculated, basically a shift table to calculate the CRC of a long word.
> The RTL code worked fine for ISE.  But when we hit Vivado, it'd pause 10 minutes or so
> over each instance (we had lots) which significantly hit our build times.
>
> So, I changed this code to "almost" self-modifying code.  The code would by default
> calculate the shift matrix using our "normal" RTL, which looked something like:
>       assign H_n_o = h_pow_n( H_zero, NUM_ZEROS_MINUS_ONE );
> where H_zero was an "matrix" of constants, and NUM_ZEROS_MINUS_ONE a static
> parameter.  The end result is a matrix of constants as well, but "dynamically"
> calculated. (Here "dynamically" means once at elaboration time, since all inputs
> to the function were static).
>
> Then we just added code to dump each unknown table entry sort-of like:
>   if( ( POLY_WIDTH == 8 ) && ( NUM_ZEROS_MINUS_ONE == 7 ) && ( POLYNOMIAL == 'h2f ) )
>     assign H_n_o = 'hd4eaf52e175ffba9;
>   ...
>   else // no table entry - use default RTL calc
>     assign H_n_o = h_pow_n( H_zero, NUM_ZEROS_MINUS_ONE );
>
> We "closed" the loop by hand.  If the "table" entry didn't exist, the tool would use the
> RTL definition, and spit out the pre-calculated entry.  All done in
> verilog.   We insert that new table entry into our source code by hand, and continue - next
> time the build would be quicker.
>
> This *workaround* was a bit kludge, but was the rare (only really) exception for us
> in our parameterized code.  Normally the tools just handled things fine.
> And again to be clear the only thing we were working around was long synthesis times.
> The quality of results was fine in either case.
>
> Maybe the code you were creating the pendulum swings the other way
> and it was more the norm, rather than the exception to see things like this.
>
> Interesting topic, I'm glad to hear of your (and others) experiences.
>
> Regards,
>
> Mark

I looked up my notes for the LFSR I was referring to and one instance of
the more-abstract version took 16 min to synthesize and the
less-abstract version took less than a minute.  (And we needed many
instances.)  When I try to do something at a higher level it ends up
like your experience:  I have to do a lot of experiments to see what
works and then tweak things endlessly.  It eats up a lot of time.

Article: 159987
Subject: Re: RISC-V Support in FPGA
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Thu, 4 May 2017 10:56:56 -0700 (PDT)

> I use Vivado to do GF multiplications that wide using purely behavioural
> VHDL.  BTW, A straightforward behavioural implementation will *not* give
> good results with a wide bus.
> I believe the problem is that most tools (in particular Vivado) do a poor
> job of synthesising xor trees with a massive fanin (e.g. >> 100 bits).
> The optimisers have a poor complexity (I guess at least O(N^2), but it
> might be exponential) wrt the size of the function.
>
> You can use all sorts of mathematical tricks to make it work without need
> to go "low level".
> For example, to deal with large fanin, partition your 512 bit input into
> N slices of 512/N bits each.  Use N multipliers, one for each slice, put
> a keep (or equivalent) attribute on the outputs, then xor the outputs
> together.  This gives the same result, uses about the same number of LUTs,
> but gives the optimiser in the tool a chance to do a good job.
>
>
> I use the same GF multiplier code in ISE and Quartus, too (but not on
> buses that wide).
>
> The entire flow is in VHDL and works in any LRM-compliant tool.  It's
> parameterised, too, so I don't need to rewrite for a different bus width.
>
>
> I've been using similar approaches in VHDL since the turn of the century
> and have never been burned.
>
> YMMV.
>
> Regards,
> Allan

I used to do big GF matrix multiplications in which you could set
parameters for the field size and field generator poly, etc.  Vivado
just gets bogged down.  Now I just expand that into a GF(2) matrix in
Matlab and dump it to a parameter and all Vivado has to know how to do
is XOR.

I also have problems with the wide XORs.  Multiplication by a big GF(2)
matrix means a wide XOR for each column.  Vivado tries to share LUTs
with common subexpressions across the columns.  Too much sharing.  That
sounds like a good thing, but it's not smart enough to know how much
it's impacting timing.  You save LUTs, but you end up with a routing
mess and too many levels of logic and you don't come close to meeting
timing at all.  So then I have to make a generate loop and put
subsections of the matrix in separate modules and use directives to
prevent optimizing across boundaries.  (KEEPs don't work.)  It's all a
pain.  But then I end up with something a little bigger but which meets
timing.
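
For anyone unfamiliar with the expansion step, here is a Python sketch of it (the original used Matlab; these function names and the example field GF(2^8) with generator polynomial 0x11d are assumptions chosen for illustration).  Multiplying by a constant c in GF(2^m) is linear over GF(2), so it can be written as an m x m bit matrix whose j-th column is c times the j-th basis element.

```python
def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly includes the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:        # degree reached m: reduce by the generator polynomial
            a ^= poly
    return result


def const_mul_matrix(c, m, poly):
    """Rows of the m x m GF(2) matrix M such that M*x equals c*x in GF(2^m).

    Row i is a bitmask over the input bits that are XORed into output bit i.
    """
    cols = [gf_mul(c, 1 << j, m, poly) for j in range(m)]  # images of basis elements
    rows = []
    for i in range(m):
        row = 0
        for j, col in enumerate(cols):
            row |= ((col >> i) & 1) << j
        rows.append(row)
    return rows


def matvec(rows, x):
    """Apply the bit matrix: each output bit is a plain XOR of input bits."""
    return sum((bin(row & x).count("1") & 1) << i for i, row in enumerate(rows))
```

A full matrix of GF(2^m) constants expands the same way, block by block, into one large GF(2) matrix, at which point the field size and generator polynomial no longer appear in the HDL at all.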

I really wish there were a way to use the carry chains for wide XORs.

Article: 159988
Subject: Re: RISC-V Support in FPGA
From: David Brown <david.brown@hesbynett.no>
Date: Thu, 4 May 2017 21:55:17 +0200

On 04/05/17 18:12, kristoff wrote:
> Hi all,
>
>
> As a follow-up in the RISC-V thread.
>
>
> On 02-05-17 18:11, kristoff wrote:
>> Or, you can "mix-match" licenses. Sifive (the company that sells the
>> E310 CPU and hifive devboards) are an interesting example of this.
>> They open-sourced the RTL design but keep the knowledge of actually
>> implementing a risc-v core as optimised as possible for themselfs, as a
>> service to sell.
>
> This was on eenews Europe today:
> http://www.eenewseurope.com/news/sifive-launches-commercial-risc-v-processor-cores
>
>
>
>
>
> As a small follow-up question:
> Does anybody have any idea how to get the hifive boards in Europe?
>

I got one from the crowdsupply site.  I haven't got round to trying it 
yet :-(

> For the last thing I ordered in the US (a pandaboard), I had to pay VAT
> (ok, that's normal), but also a handling-fee for the shipping-company
> and the customs-service to get the thing shipped in.
> In the end, these additional costs where more then the VAT itself.
>
>
>
> Cheerio! Kr. Bonne.

Article: 159989
Subject: Re: RISC-V Support in FPGA
From: Allan Herriman <allanherriman@hotmail.com>
Date: 05 May 2017 11:08:04 GMT

On Thu, 04 May 2017 10:56:56 -0700, Kevin Neilson wrote:

>> I use Vivado to do GF multiplications that wide using purely
>> behavioural VHDL.  BTW, A straightforward behavioural implementation
>> will *not* give good results with a wide bus.
>> I believe the problem is that most tools (in particular Vivado) do a
>> poor job of synthesising xor trees with a massive fanin (e.g. >> 100
>> bits). The optimisers have a poor complexity (I guess at least O(N^2),
>> but it might be exponential) wrt the size of the function.
>> 
>> You can use all sorts of mathematical tricks to make it work without
>> need to go "low level".
>> For example, to deal with large fanin, partition your 512 bit input
>> into N slices of 512/N bits each.  Use N multipliers, one for each
>> slice, put a keep (or equivalent) attribute on the outputs, then xor
>> the outputs together.  This gives the same result, uses about the same
>> number of LUTs,
>> but gives the optimiser in the tool a chance to do a good job.
>> 
>> 
>> I use the same GF multiplier code in ISE and Quartus, too (but not on
>> buses that wide).
>> 
>> The entire flow is in VHDL and works in any LRM-compliant tool.  It's
>> parameterised, too, so I don't need to rewrite for a different bus
>> width.
>> 
>> 
>> I've been using similar approaches in VHDL since the turn of the
>> century and have never been burned.
>> 
>> YMMV.
>> 
>> Regards,
>> Allan
> 
> I used to do big GF matrix multiplications in which you could set
> parameters for the field size and field generator poly, etc.  Vivado
> just gets bogged down.  Now I just expand that into a GF(2) matrix in
> Matlab and dump it to a parameter and all Vivado has to know how to do
> is XOR.
> 
> I also have problems with the wide XORs.  Multiplication by a big GF(2)
> matrix means a wide XOR for each column.  Vivado tries to share LUTs
> with common subexpressions across the columns.  Too much sharing.  That
> sounds like a good thing, but it's not smart enough to know how much
> it's impacting timing.  You save LUTs, but you end up with a routing
> mess and too many levels of logic and you don't come close to meeting
> timing at all.  So then I have to make a generate loop and put
> subsections of the matrix in separate modules and use directives to
> prevent optimizing across boundaries.  (KEEPs don't work.)  It's all a
> pain.  But then I end up with something a little bigger but which meets
> timing.


I thought about my historical code some more, and I realised that I did 
have some examples of behavioural GF multipliers that didn't work as well 
as the same function expressed as a bunch of wide xors.

The particular example I'm thinking of had a 128 in, 128 xor tree that 
really shouldn't be any harder to synth than a CRC.  It's a linear 
mapping stage in an SP block cipher (like AES, but not AES (which has a 
relatively weak mixing function)).

Vivado gave (IIRC) 11 or 12 levels of logic rather than the expected 3 
levels of logic.  Hmmm.  The revised source code (expressed as a bunch of 
xors) produced 4 levels of logic, and routed to speed.

BTW, I used my VHDL testbench for the original function to write out the 
VHDL for the xor tree.
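
The same write-out step can be sketched in Python for illustration (this is not the VHDL testbench mentioned above; the function name, signal names and the row-bitmask matrix layout are assumptions).  Given the GF(2) matrix of the function, it prints one flat xor assignment per output bit.

```python
def emit_xor_assignments(rows, in_name="x", out_name="y"):
    """Return one VHDL-style assignment per output bit.

    rows[i] is a bitmask over the input bits XORed into output bit i.
    """
    lines = []
    for i, row in enumerate(rows):
        terms = [f"{in_name}({j})" for j in range(row.bit_length()) if (row >> j) & 1]
        rhs = " xor ".join(terms) if terms else "'0'"
        lines.append(f"{out_name}({i}) <= {rhs};")
    return lines


# Tiny 3 x 3 example matrix.
for line in emit_xor_assignments([0b101, 0b000, 0b010]):
    print(line)
```

For the 128-bit case each line simply becomes longer; the generated text carries no hint of the original algebra, which is the point.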

 
> I really wish there were a way to use the carry chains for wide XORs.

I think that carry chains (and similar structures) became less important 
for wide functions once six input LUTs became commonplace.

The Xilinx DSP48E2 has a wide xor mode that I think can give a 96 input 
xor in a single DSP48E2 slice.  I've never tried it.

Regards,
Allan

Article: 159990
Subject: When I'm Wrong I'd Like to Know
From: rickman <gnuarm@gmail.com>
Date: Fri, 5 May 2017 09:55:09 -0400

But each topic I'm wrong about should be addressed in that newsgroup. 
If I were wrong about something related to FPGAs (just an imaginary 
example, of course) I wouldn't want it discussed in alt.religion.emacs. 
Likewise I don't wish for things not related to FPGAs to be discussed 
here.  Every discussion in its place!

Is there any chance the person this post is directed to will actually 
read and pay attention to it?

-- 

Rick C

Article: 159991
Subject: Re: Lattice ECP5 succesor ( with DDR4 phy) ?
From: rickman <gnuarm@gmail.com>
Date: Fri, 5 May 2017 21:37:44 -0400

On 5/3/2017 3:04 PM, Brane2 wrote:
> AFAIK ECP5 is good for interfacing with DDR3, but not DDR4.
>
> Is there a plan to introduce new members with DDR4 or perhaps new family with such interface ?

I expect Lattice has many plans.  Best to ask them, no?  Do you have an 
email for your Lattice sales person or FAE?

-- 

Rick C

Article: 159992
Subject: Re: Lattice ECP5 succesor ( with DDR4 phy) ?
From: Brane2 <brankob@avtomatika.com>
Date: Fri, 5 May 2017 23:05:20 -0700 (PDT)

Dne sobota, 06. maj 2017 03.37.48 UTC+2 je oseba rickman napisala:
> On 5/3/2017 3:04 PM, Brane2 wrote:
> > AFAIK ECP5 is good for interfacing with DDR3, but not DDR4.
> >
> > Is there a plan to introduce new members with DDR4 or perhaps new family with such interface ?
> 
> I expect Lattice has many plans.  Best to ask them, no?  Do you have an 
> email for your Lattice sales person or FAE?
> 
> -- 
> 
> Rick C

I did. They haven't even responded.

Article: 159993
Subject: Re: Lattice ECP5 succesor ( with DDR4 phy) ?
From: rickman <gnuarm@gmail.com>
Date: Sat, 6 May 2017 02:32:05 -0400

On 5/6/2017 2:05 AM, Brane2 wrote:
> Dne sobota, 06. maj 2017 03.37.48 UTC+2 je oseba rickman napisala:
>> On 5/3/2017 3:04 PM, Brane2 wrote:
>>> AFAIK ECP5 is good for interfacing with DDR3, but not DDR4.
>>>
>>> Is there a plan to introduce new members with DDR4 or perhaps new family with such interface ?
>>
>> I expect Lattice has many plans.  Best to ask them, no?  Do you have an
>> email for your Lattice sales person or FAE?
>>
>> --
>>
>> Rick C
>
> I did. They haven't even responded.

Are you a customer that shows up on their radar?

-- 

Rick C

Article: 159994
Subject: Re: Lattice ECP5 succesor ( with DDR4 phy) ?
From: Brane2 <brankob@avtomatika.com>
Date: Fri, 5 May 2017 23:36:30 -0700 (PDT)

Dne sobota, 06. maj 2017 08.32.09 UTC+2 je oseba rickman napisala:

> Are you a customer that shows up on their radar?

No.

Article: 159995
Subject: Lattice iCE40 UltraLite DIPSY - what happened?
From: rickman <gnuarm@gmail.com>
Date: Sat, 6 May 2017 13:27:16 -0400

I was digging around for info on the iCE40 UL and found info on the 
DIPSY from 2015 when it was breaking news.  Not sure how I missed it, 
but this is a very small unit with a very tiny FPGA (likely the smallest 
FPGA package ever - 2 mm^2) and an LDO for the core power and of course 
some connectors.

I found a github page with various design details and an Indiegogo page. 
  There I found a video from Lattice that says this unit along with a 
programming board (over 100 times larger) is sold for $5.  It looks like 
they received 140% of their goal.  What I don't find is any way to buy 
this.  The video says to go to the lattice web site for more info, but I 
don't find even a mention there.

Maybe that's why I missed this unit.  It may have sold a few copies to 
the original contributors and then been dropped.

Anyone know anything about this?  The guy behind it is Antti Lukats who 
has been seen here from time to time.

-- 

Rick C

Article: 159996
Subject: Re: RISC-V Support in FPGA
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Sat, 6 May 2017 10:55:49 -0700 (PDT)

> The particular example I'm thinking of had a 128 in, 128 xor tree that
> really shouldn't be any harder to synth than a CRC.  It's a linear
> mapping stage in an SP block cipher (like AES, but not AES (which has a
> relatively weak mixing function)).
>
> Vivado gave (IIRC) 11 or 12 levels of logic rather than the expected 3
> levels of logic.  Hmmm.  The revised source code (expressed as a bunch of
> xors) produced 4 levels of logic, and routed to speed.
>
Same here.  I have constant multiplier matrices and each has a column
weight of about 160 so I end up with a 160-input XOR for each column.
Ideally that would be log6(160)=2.8 levels.  First I have to use very
low-level code and even then Vivado shares subexpressions too much and I
end up with 6 levels unless I isolate column groups in different modules.
If I isolate each column in its own module I can get the 3 levels.
Isolating column groups also means they are placed as a group which
reduces wirelengths.
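
The level counts quoted above can be checked with a few lines of Python. This is only a sketch, assuming every level is built from 6-input LUTs and ignoring routing, carry chains and any subexpression sharing the tool may do; the function name xor_tree_levels is made up for illustration.

```python
import math


def xor_tree_levels(n_inputs, lut_inputs=6):
    """Minimum number of LUT levels needed to XOR n_inputs signals.

    Assumes each LUT reduces up to lut_inputs signals to one (an assumption
    for illustration; real mapping depends on the synthesis tool).
    """
    levels = 0
    signals = n_inputs
    while signals > 1:
        # Each level groups the remaining signals into LUTs (ceiling division).
        signals = -(-signals // lut_inputs)
        levels += 1
    return levels


# Ideal (fractional) depth versus the achievable integer depth.
print(round(math.log(160, 6), 1))   # fractional depth for a 160-input XOR
print(xor_tree_levels(160))         # integer levels for a 160-input XOR
print(xor_tree_levels(128))         # integer levels for a 128-input XOR
```

Under these assumptions both the 160-input and the 128-input XOR need 3 levels, which is the target the posts above compare the Vivado results against.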

> The Xilinx DSP48E2 has a wide xor mode that I think can give a 96 input
> xor in a single DSP48E2 slice.  I've never tried it.

Yeah, I looked into this at one point but decided against it for a few
reasons.  I thought a nice feature would be to be able to turn off the
carries in the DSP48 and then you could use them for GF multipliers.  I
have used DSP48s as GF(2) accumulators and I've used them as transposers
to extract column data from rows stored in RAMs.
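
To illustrate why turning off the carries would matter: a multiply over GF(2) is an ordinary shift-and-add multiply in which the adds become XORs, so no carry ever propagates between bit positions. The Python sketch below shows only that carry-less (polynomial) product; it performs no reduction by a field polynomial and is not a model of how a DSP48 works. The name clmul is chosen here for illustration.

```python
def clmul(a, b):
    """Carry-less multiply of two non-negative integers.

    Each integer is treated as a polynomial over GF(2); partial products
    are combined with XOR instead of addition, so there are no carries.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift   # XOR in the shifted partial product
        b >>= 1
        shift += 1
    return result


# With carries, 3 * 3 is 9; without carries the middle terms cancel.
print(clmul(0b11, 0b11))   # 5, i.e. (x + 1)^2 = x^2 + 1 over GF(2)
```

A full GF(2^m) multiplier would follow this product with a reduction modulo the field polynomial, which is again just a fixed set of XORs.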

Article: 159997
Subject: Re: Lattice iCE40 UltraLite DIPSY - what happened?
From: "Michael Kellett" <nospam@invalid.com>
Date: Sun, 7 May 2017 13:19:41 +0100
Links: << >> << T >> << A >>

rickman:
> I was digging around for info on the iCE40 UL and found info on the
> DIPSY from 2015 when it was breaking news.  Not sure how I missed it,
> but this is a very small unit with a very tiny FPGA (likely the smallest
> FPGA package ever - 2 mm^2) and an LDO for the core power and of course
> some connectors.
>
> I found a github page with various design details and an Indiegogo page.
> There I found a video from Lattice that says this unit along with a
> programming board (over 100 times larger) is sold for $5.  It looks like
> they received 140% of their goal.  What I don't find is any way to buy
> this.  The video says to go to the lattice web site for more info, but I
> don't find even a mention there.
>
> Maybe that's why I missed this unit.  It may have sold a few copies to
> the original contributors and then been dropped.
>
> Anyone know anything about this?  The guy behind it is Antti Lukats who
> has been seen here from time to time.
>
> --
>
> Rick C

Antti still lives and breathes and hangs out a little on Farnell's
Element14 site.

I think he is currently working for Trenz Electronics who  are at
www.trenz-electronic.de

MK

Article: 159998
Subject: Re: Lattice iCE40 UltraLite DIPSY - what happened?
From: rickman <gnuarm@gmail.com>
Date: Sun, 7 May 2017 19:18:36 -0400
Links: << >> << T >> << A >>

On 5/7/2017 8:19 AM, Michael Kellett wrote:
> rickman:
>> I was digging around for info on the iCE40 UL and found info on the
>> DIPSY from 2015 when it was breaking news.  Not sure how I missed it,
>> but this is a very small unit with a very tiny FPGA (likely the smallest
>> FPGA package ever - 2 mm^2) and an LDO for the core power and of course
>> some connectors.
>>
>> I found a github page with various design details and an Indiegogo page.
>> There I found a video from Lattice that says this unit along with a
>> programming board (over 100 times larger) is sold for $5.  It looks like
>> they received 140% of their goal.  What I don't find is any way to buy
>> this.  The video says to go to the lattice web site for more info, but I
>> don't find even a mention there.
>>
>> Maybe that's why I missed this unit.  It may have sold a few copies to
>> the original contributors and then been dropped.
>>
>> Anyone know anything about this?  The guy behind it is Antti Lukats who
>> has been seen here from time to time.
>>
>> --
>>
>> Rick C
>
> Antti still lives and breathes and hangs out a little on Farnell's
> Element14 site.
>
> I think he is currently working for Trenz Electronics who  are at
> www.trenz-electronic.de

Yes, I've seen signs of Antti in various places, but he never seems to 
be terribly thorough about documenting his work.  I just looked at the 
Gerbers for this design and there is a list of the files in a file with 
an EXTREP suffix as if that would be remotely obvious...  Then the drill 
file isn't listed.  After opening nearly all the possible files it 
turned out to be the .txt file.  Go figure.

Anyway, now that I can see the Gerbers, the PCB design rules are 2/2 mil 
trace/space (0.05 mm) and 6 mil drills (0.15 mm).  I've never made a 
board like this before.  It will require professional assembly as well I 
expect.
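
For reference, the metric figures above come from the usual conversion of 1 mil = 0.001 inch = 0.0254 mm. A minimal Python sketch (the helper name mil_to_mm is hypothetical):

```python
MM_PER_MIL = 0.0254  # 1 mil = 0.001 inch, 1 inch = 25.4 mm


def mil_to_mm(mil):
    """Convert a dimension in mils (thousandths of an inch) to millimetres."""
    return mil * MM_PER_MIL


# Design rules mentioned above: 2/2 mil trace/space and 6 mil drills.
for label, mil in (("trace/space", 2), ("drill", 6)):
    print(f"{label}: {mil} mil = {mil_to_mm(mil):.2f} mm")
```

This gives 0.05 mm for the 2 mil trace/space and 0.15 mm for the 6 mil drills, to two decimal places.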

Anyone hand built a prototype with 0.35 mm center uBGAs before?

While the Indiegogo page has a video from Lattice Semi saying you can 
buy assembled boards from them for $5, I see no trace.  It appears the 
$5 price gets you a PCB directly from Antti according to a home page 
http://dipsy.cool

Hard to tell really.  Lots of hype and little real info.  I guess it's 
all there if you can find it and figure it out.  I just don't know where 
I can get PCBs with 2/2 design rules made.

-- 

Rick C

Article: 159999
Subject: Re: Lattice iCE40 UltraLite DIPSY - what happened?
From: rickman <gnuarm@gmail.com>
Date: Sun, 7 May 2017 23:32:55 -0400
Links: << >> << T >> << A >>

On 5/7/2017 7:18 PM, rickman wrote:
> On 5/7/2017 8:19 AM, Michael Kellett wrote:
>> rickman:
>>> I was digging around for info on the iCE40 UL and found info on the
>>> DIPSY from 2015 when it was breaking news.  Not sure how I missed it,
>>> but this is a very small unit with a very tiny FPGA (likely the smallest
>>> FPGA package ever - 2 mm^2) and an LDO for the core power and of course
>>> some connectors.
>>>
>>> I found a github page with various design details and an Indiegogo page.
>>> There I found a video from Lattice that says this unit along with a
>>> programming board (over 100 times larger) is sold for $5.  It looks like
>>> they received 140% of their goal.  What I don't find is any way to buy
>>> this.  The video says to go to the lattice web site for more info, but I
>>> don't find even a mention there.
>>>
>>> Maybe that's why I missed this unit.  It may have sold a few copies to
>>> the original contributors and then been dropped.
>>>
>>> Anyone know anything about this?  The guy behind it is Antti Lukats who
>>> has been seen here from time to time.
>>>
>>> --
>>>
>>> Rick C
>>
>> Antti still lives and breathes and hangs out a little on Farnell's
>> Element14 site.
>>
>> I think he is currently working for Trenz Electronics who  are at
>> www.trenz-electronic.de
>
> Yes, I've seen signs of Antti in various places, but he never seems to
> be terribly thorough about documenting his work.  I just looked at the
> Gerbers for this design and there is a list of the files in a file with
> an EXTREP suffix as if that would be remotely obvious...  Then the drill
> file isn't listed.  After opening nearly all the possible files it
> turned out to be the .txt file.  Go figure.
>
> Anyway, now that I can see the Gerbers, the PCB design rules are 2/2 mil
> trace/space (0.05 mm) and 6 mil drills (0.15 mm).  I've never made a
> board like this before.  It will require professional assembly as well I
> expect.
>
> Anyone hand built a prototype with 0.35 mm center uBGAs before?
>
> While the Indiegogo page has a video from Lattice Semi saying you can
> buy assembled boards from them for $5, I see no trace.  It appears the
> $5 price gets you a PCB directly from Antti according to a home page
> http://dipsy.cool
>
> Hard to tell really.  Lots of hype and little real info.  I guess it's
> all there if you can find it and figure it out.  I just don't know where
> I can get PCBs with 2/2 design rules made.

I just reread my post.  I think my comments about Antti's Gerber files
sound worse than I intended.  I greatly appreciate the projects
Antti does.  This is not the first project Antti has made available.  He 
tends toward very minimalistic projects which are very inexpensive. 
Sometimes that is exactly the right way to go.

-- 

Rick C
