
rickman wrote:
> On 10/1/2015 3:15 PM, ste3191 wrote:
>>> On 9/30/2015 8:13 AM, ste3191 wrote:
>>>> Hi, I have a serious problem with the architecture of a correlator
>>>> for a planar antenna array (16 x 16).
>>>> Theoretically I can't implement the normal expression sum(X*X^H)
>>>> because I would obtain a covariance matrix of 256 x 256. I could
>>>> instead implement the spatial smoothing technique, which averages
>>>> over overlapped subarrays, with the advantage of a smaller
>>>> covariance matrix. That works, but it is a slow technique!! I need
>>>> an efficient and fast method to compute the covariance matrix on an
>>>> FPGA, with as few multipliers as possible. In fact, for a 16 x 16
>>>> covariance matrix I need about 6000 multipliers! So I have looked
>>>> at correlators based on hard-limiting (sign + XOR + counter) at
>>>> this link:
>>>>
>>>> http://handle.dtic.mil/100.2/ADA337434
>>>>
>>>> But I don't know if this technique is right; in Simulink the
>>>> results are very different from those of a normal correlator.
>>>> Can someone help me?
>>>
>>> Even though your solution will be implemented in an FPGA, I'm not
>>> sure the FPGA group is the best place to ask this question since it
>>> is about the algorithm more than the FPGA implementation. I am
>>> cross posting to the DSP group to see if anyone there has
>>> experience with it.
>>>
>>> That said, you don't say what your data rate and processing rates
>>> are. How often do you need to run this calculation? If it is slow
>>> enough you can use the same multipliers for many computations to
>>> produce one result. Or will this be run on every data sample at a
>>> high rate?
>>>
>>> --
>>> Rick
>>
>> Yes, the sampling rate is higher than 80 MSPS and I can't share
>> resources. I posted it on the DSP forum but nobody has answered yet.
>
> Yes, I saw that. Looks like you beat me to it. lol
>
> I don't know where else to seek advice. Maybe talk to the FPGA
> vendors? I know they have various expertise in applications. Is this
> something you will end up building? If so, and it uses a lot of
> resources, you should be able to get some application support.
>
> You know, 80 MHz is not so fast for multiplies or adds. The
> multiplier block in most newer FPGAs will run at 100's of MHz. So you
> certainly should be able to multiplex the multiplier unit by 4x or
> more. But that really doesn't solve your problem if you want to do it
> on a single chip. I haven't looked at the high end, but I'm pretty
> sure they don't put 1500 multipliers on a chip. But it may put you in
> the ballpark where you can do this with a small handful of large
> FPGAs. Very pricey though.

Actually you can get up to 1,920 DSP slices on a Kintex-7 and
considerably more on the Virtex-7 and Virtex UltraScale devices;
however, a "multiplier" may eat more than one DSP slice depending on
the number of bits you want. On the other hand, they are supposed to
run at 500 MHz in these parts.

--
Gabor
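The hard-limiting correlator ste3191 asks about replaces every multiplier with a sign comparison: agreement of sign bits is an XNOR, and a counter accumulates agreements. A Python sketch of the idea, and of why its raw output differs from a normal correlator until the arcsine-law correction is applied for Gaussian inputs (an illustrative model, not the exact method from the linked report):

```python
import math
import random

def full_corr(x, y):
    # Conventional correlator: multiply-accumulate over N samples.
    return sum(a * b for a, b in zip(x, y)) / len(x)

def hard_limited_corr(x, y):
    # Hard-limiting correlator: keep only the sign bit of each sample.
    # sign(a) == sign(b) is an XNOR of sign bits, so the hardware needs
    # only a comparator and a counter -- no multipliers.
    agree = sum((a >= 0) == (b >= 0) for a, b in zip(x, y))
    p = agree / len(x)
    # For Gaussian inputs the arcsine law, p = 1/2 + asin(rho)/pi, maps
    # the sign-agreement rate back to the true correlation coefficient.
    # Without this correction the result differs from the full
    # correlator, which may explain the Simulink discrepancy.
    return math.sin(math.pi * (p - 0.5))

random.seed(0)
n = 100_000
rho = 0.6  # true correlation between the two synthetic channels
x, y = [], []
for _ in range(n):
    u = random.gauss(0, 1)
    v = random.gauss(0, 1)
    x.append(u)
    y.append(rho * u + math.sqrt(1 - rho * rho) * v)

print(full_corr(x, y))          # close to 0.6
print(hard_limited_corr(x, y))  # also close to 0.6 after correction
```

The one-bit estimator is noisier than the full correlator for the same number of samples, so a longer integration is needed for the same accuracy.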

On 10/1/2015 4:14 PM, GaborSzakacs wrote:
> Actually you can get up to 1,920 DSP slices on a Kintex-7 and
> considerably more on the Virtex-7 and Virtex UltraScale devices;
> however, a "multiplier" may eat more than one DSP slice depending on
> the number of bits you want. On the other hand, they are supposed to
> run at 500 MHz in these parts.

Are those the $1000 chips? I worked for a test equipment company once
and they used a $1500 chip in a product that sold for over $100k. They
initially only used about 20% of the part so they could add more stuff
as upgrades. Lots of margin in a $100k product, just like there's lots
of margin in a $1500 chip.

--
Rick

rickman wrote:
> On 10/1/2015 4:14 PM, GaborSzakacs wrote:
>> Actually you can get up to 1,920 DSP slices on a Kintex-7 and
>> considerably more on the Virtex-7 and Virtex UltraScale devices ...
>
> Are those the $1000 chips? I worked for a test equipment company once
> and they used a $1500 chip in a product that sold for over $100k.
> They initially only used about 20% of the part so they could add more
> stuff as upgrades. Lots of margin in a $100k product, just like
> there's lots of margin in a $1500 chip.

The list price for the XC7K410T, which has 1,540 DSP slices, starts at
about $1,300. A DSP slice includes a 25 x 18 bit signed multiplier.
The list price (you can see it at Digikey) for the largest Kintex-7 is
around $3,000. Virtex-7 is more expensive. I'm not suggesting this as
a solution unless there's no other way, including using several
devices, which often saves money over using the largest available
ones. On the other hand, you suggested that you can't get 1,500
multipliers in an FPGA, and I was just pointing out that in fact you
can get that many and even more if you have the money to pay for it.
If you can figure out how to partition the design into say 3 or 4
pieces, you can use an XC7K160T with 600 DSP units starting at about
$210 each. This seems to be the sweet spot (for now) in price per DSP
in that series. An Artix XC7A200T is in the same price range with a
bit more logic and 740 DSP slices, but the fabric is a bit slower in
that series.

My guess is that Altera has a range of parts with similar multiplier
counts, since they generally compete head to head with Xilinx and at
this point the Xilinx 7-series is old news.

--
Gabor

GaborSzakacs wrote:
> The list price for the XC7K410T, which has 1,540 DSP slices, starts
> at about $1,300. A DSP slice includes a 25 x 18 bit signed
> multiplier. ...
> If you can figure out how to partition the design into say 3 or 4
> pieces, you can use an XC7K160T with 600 DSP units starting at about
> $210 each.

One model of Virtex-7 has more than 3,000 multipliers at 600 MHz, but
the problem isn't the price; it is finding a way to compute or
estimate the large matrix efficiently.

---------------------------------------
Posted through http://www.FPGARelated.com

On 10/2/2015 9:20 AM, GaborSzakacs wrote:
> The list price for the XC7K410T, which has 1,540 DSP slices, starts
> at about $1,300. A DSP slice includes a 25 x 18 bit signed
> multiplier. ...
> My guess is that Altera has a range of parts with similar multiplier
> counts, since they generally compete head to head with Xilinx and at
> this point the Xilinx 7-series is old news.

Yes, thank you for bringing me up to date. I tend to work at the lower
end where you are happy if the parts *have* multipliers. lol

--
Rick

I guess I stopped looking at the Microsemi products some time back.
The SoC devices put out by Actel were OK, but the price was up there
even for the smallest one, around $50. I was looking on Digikey and it
seems their prices have come down, and the new SmartFusion2 devices
are even lower. The cheapest part is $16 in quantity at Digikey. These
SoCs don't have any analog, unless you consider the crystal clock to
be analog, lol. Still, they have all the digital stuff you might want
and are a lot cheaper than the Xilinx and Altera SoC lines.

BTW, I found that Mouser doesn't seem to carry Xilinx anymore. When I
search on Xilinx on the Mouser site they bring up the Altera page,
lol.

--
Rick

I did a couple of DDR controllers and used Verilog models from Micron,
and they worked really well. I think I made one change to the source.
The model was really slow, so I put in a Modelsim directive to
allocate the main array as a sparse matrix, so it would only allocate
RAM (on the simulating computer) as it was accessed. The models caught
all sorts of obscure errors, like not waiting 3.5 cycles to access a
row in the same bank of a row that was accessed within the last
fortnight or whatever.
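The sparse-allocation trick is easy to illustrate outside of Verilog: instead of reserving the whole multi-gigabyte DDR address space up front, storage is created lazily for only the addresses the testbench actually touches. A minimal Python sketch of the same idea (the class and names here are hypothetical, not Micron's model):

```python
# Sketch of the sparse-memory idea used in big memory simulation
# models: storage is allocated on demand, per written address.
class SparseMemory:
    def __init__(self, fill=0):
        self.fill = fill   # value returned for never-written addresses
        self.store = {}    # address -> word, populated lazily

    def write(self, addr, data):
        self.store[addr] = data

    def read(self, addr):
        # Only written locations consume host RAM; everything else
        # falls back to the fill value.
        return self.store.get(addr, self.fill)

mem = SparseMemory()
mem.write(0x3FFF_FFF8, 0xDEAD_BEEF)  # touches one entry, not 1 GiB
print(hex(mem.read(0x3FFF_FFF8)))    # 0xdeadbeef
print(len(mem.store))                # 1
```

A hash from address to data gives the same behavior as a fully allocated array for any access pattern, at the cost of a lookup per access, which is why the full models run so slowly without the sparse directive.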

You might be referring to the technique of expressing a number in CSD (Canonical Signed Digits), which reduces the number of nonzero bits in a number.
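In CSD each digit is -1, 0, or +1, and the canonical form never has two adjacent nonzero digits, which minimizes the number of partial products a constant multiplier needs. A short Python sketch of the standard conversion (a hypothetical helper, shown only for illustration):

```python
def to_csd(n):
    # Convert a nonnegative integer to CSD digits, LSB first.
    # digits[i] is in {-1, 0, +1} and n == sum(digits[i] * 2**i).
    digits = []
    while n != 0:
        if n & 1:
            # Pick +1 or -1 so the remaining value becomes divisible
            # by 4, which guarantees the next digit is zero.
            d = 2 - (n & 3)   # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

print(bin(23))     # 0b10111 -- four nonzero bits
print(to_csd(23))  # [-1, 0, 0, -1, 0, 1], i.e. 32 - 8 - 1, three nonzero digits
```

In hardware, a constant multiply by 23 then costs two subtractions of shifted copies instead of three additions, and the saving grows for constants with long runs of ones.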

I just built an optimized Galois Field vector multiplier, which
multiplies a vector by a scalar. As an experiment, I split it into two
parts, one that was common to all elements of the vector, and then the
parts that were unique to each element of the vector. I had assumed
this is something the synthesizer would do anyway, but I was surprised
to find that writing it the way I did cut down the number of LUTs by a
big margin.
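The factoring Kevin describes can be sketched in Python over GF(2^8). The field polynomial 0x11B (the AES polynomial) is an assumed choice for illustration, not necessarily the one he used: the partial products s*x^i depend only on the scalar and are computed once, while each vector element contributes only an XOR selection driven by its bits.

```python
# GF(2^8) multiply with reduction polynomial x^8+x^4+x^3+x+1 (0x11B).
def gf_mul(a, b, poly=0x11B):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly      # reduce back into 8 bits
        b >>= 1
    return r

def gf_vector_scale(vec, s, poly=0x11B):
    # Common part: s, s*x, s*x^2, ... depend only on the scalar, so in
    # hardware this logic is built once and shared by every element.
    partial = []
    t = s
    for _ in range(8):
        partial.append(t)
        t = (t << 1) ^ (poly if t & 0x80 else 0)   # t *= x, reduced
    # Unique part: each element just XORs together the shared partial
    # products selected by its own bits.
    out = []
    for v in vec:
        acc = 0
        for i in range(8):
            if (v >> i) & 1:
                acc ^= partial[i]
        out.append(acc)
    return out

print(hex(gf_mul(0x57, 0x83)))                  # 0xc1
print(hex(gf_vector_scale([0x57], 0x83)[0]))    # 0xc1, same result
```

Written this way, the shared partial products are explicit in the source, which is plausibly why the synthesizer found the sharing that it missed in the monolithic description.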

I've had to work at the low end, where the part is always full and I
have to fake multiplication with lookup tables. Now I'm at the other
end, where the volumes are low and the customer doesn't care about
FPGA price, so the parts are huge. They must cost a fortune. I still
waste a lot of time on PAR issues, but it's wonderful having more
gates, DSPs, and block RAMs than I could ever need.

On 10/3/2015 3:37 PM, Kevin Neilson wrote:
> I just built an optimized Galois Field vector multiplier, which
> multiplies a vector by a scalar. As an experiment, I split it into
> two parts, one that was common to all elements of the vector, and
> then the parts that were unique to each element of the vector. I had
> assumed this is something the synthesizer would do anyway, but I was
> surprised to find that writing it the way I did cut down the number
> of LUTs by a big margin.

Why is it using LUTs instead of multipliers? Are these numbers too
small and too many to use the multipliers efficiently?

--
Rick

On 10/3/2015 3:42 PM, Kevin Neilson wrote:
> I've had to work at the low end, where the part is always full and I
> have to fake multiplication with lookup tables. Now I'm at the other
> end, where the volumes are low and the customer doesn't care about
> FPGA price, so the parts are huge. They must cost a fortune. I still
> waste a lot of time on PAR issues, but it's wonderful having more
> gates, DSPs, and block RAMs than I could ever need.

Personally I enjoy the challenge of fitting tight designs. To me,
trying to get a part to meet timing is not as much fun as getting a
design to fit the device. I find timing analysis to be very tedious,
as you get literally hundreds of failed-path reports from what are
basically the same endpoints, just many variations. This makes it hard
to see the next longest path that is also failing. Reminds me of
debugging a program one mistake at a time in the old days, when my
first pass would have many bugs... and in the new days too sometimes.
lol

Fitting can have very interesting tradeoffs. Often they are
algorithmic and require learning new ways of calculating results. I
find that very interesting.

--
Rick

I agree, the SmartFusion2 devices are actually very competitive and
they have some strengths that are absent from their competition. They
are flash-based, which has some distinct benefits:

* No external configuration device is required.
* The device is instant-on, i.e. you do not have this long dead
  configuration period.
* The flash gate architecture provides inherent single-event-upset
  (SEU) immunity. For long-term reliability, SRAM-based FPGAs are
  vulnerable to error events when struck by cosmic rays. This is a
  strength, especially in aeronautics/space.
* When all the clocks are stopped, the flash architecture consumes
  very little standby current. In general, these devices are super low
  power.

For an SoC, the SmartFusion2 security features are really superior.
Secure key storage with active mesh protection and tamper detection,
embedded AES256/SHA256, the physically uncloneable function (PUF) and
the true random number generator of the S devices are ideal for secure
machine-to-machine communication and for protecting IP.

Also, they have certified anti-key-hacking mechanisms like
differential power analysis (DPA) resistance. Read up on how FPGA keys
can be compromised with DPA:
http://www.microsemi.com/document-portal/doc_download/131563-protecting-fpgas-from-power-analysis

In terms of development, Avnet recently started selling a super
low-cost development board, the SmartFusion2 KickStart kit, that has a
10k gate SoC with a 166 MHz Cortex-M3, the M2S010S. It is only $59.95
and is a little USB-powered module in the Arduino form factor with
some sensors and PMODs for expansion. There is a Bluetooth LE module
on the board and some Android and Windows demos. Their reference
design examples make it pretty quick to get up and running.
http://www.em.avnet.com/en-us/design/drc/Pages/Microsemi-SmartFusion2-KickStart-Development-Kit.aspx

SR

On 10/3/2015 9:39 PM, zoomboom718@gmail.com wrote:
> I agree, the SmartFusion2 devices are actually very competitive and
> they have some strengths that are absent from their competition.
> They are flash-based, which has some distinct benefits:
> * No external configuration device is required.
> * The device is instant-on, i.e. you do not have this long dead
>   configuration period.
> * The flash gate architecture provides inherent single-event-upset
>   (SEU) immunity. For long-term reliability, SRAM-based FPGAs are
>   vulnerable to error events when struck by cosmic rays. This is a
>   strength, especially in aeronautics/space.

Not trying to be dense, as I have not checked the data sheet on this,
but isn't the FPGA fabric SRAM-based and loaded (albeit more quickly
than a serial config) from the internal flash? I'm more familiar with
the Lattice flash parts, and that's what they do. Instead of large
fractions of a second, the config time is a couple of ms. The SRAM
allows you to change the config from JTAG without flashing the part.
Don't the Microsemi parts do that too?

I know Actel (now Microsemi) is *very* familiar with the aerospace
market. I expect that is a large part of their sales.

> * When all the clocks are stopped, the flash architecture consumes
>   very little standby current. In general, these devices are super
>   low power.

I *did* glance at the data sheet about this. They are better than
parts from the big two, but Lattice has parts that are much better
than the numbers I saw. Still, this is an SoC and Lattice isn't there
yet.

> For an SoC, the SmartFusion2 security features are really superior.
> Secure key storage with active mesh protection and tamper detection,
> embedded AES256/SHA256, the physically uncloneable function (PUF)
> and the true random number generator of the S devices are ideal for
> secure machine-to-machine communication and for protecting IP.
> Also, they have certified anti-key-hacking mechanisms like
> differential power analysis (DPA) resistance. Read up on how FPGA
> keys can be compromised with DPA:
> http://www.microsemi.com/document-portal/doc_download/131563-protecting-fpgas-from-power-analysis

I won't say I understand it, but I have seen some things about this.
But "true random number generator"??? My understanding is this is
virtually impossible. I haven't read about this. Is it based on noise
from a diode or something? I recall a researcher trying that, and it
was good, but he could never find the source of a long-term DC bias.

> In terms of development, Avnet recently started selling a super
> low-cost development board, the SmartFusion2 KickStart kit, that has
> a 10k gate SoC with a 166 MHz Cortex-M3, the M2S010S. It is only
> $59.95 and is a little USB-powered module in the Arduino form factor
> with some sensors and PMODs for expansion. Their reference design
> examples make it pretty quick to get up and running.
> http://www.em.avnet.com/en-us/design/drc/Pages/Microsemi-SmartFusion2-KickStart-Development-Kit.aspx

I saw that and thought, DARN IT! Often the manufacturer produces
rather expensive eval boards (which Microsemi did in this case), but
Avnet spun a low-cost one. I was hoping to find a market for a new
product. But the low-cost board is lacking a lot of I/O features like
Ethernet. Maybe there is some potential for an add-on board to bring
it up to eval-board functionality. They have a development board that
looks like it has every bell and whistle in the book! I don't think I
*want* to duplicate that.

I wish I had an app for this device. I also wish it were a bit cheaper
still. BTW, they have training on the KickStart kit, including a kit
for $100. I'm not sure why they are asking for the $40 above the cost
of the kit. That isn't even paying the trainer to show up!

I'm having a little trouble finding much info on board routing for the
VF256 package. It seems to not be included in most of their info. I
guess it would be the same as the VF400 package? Seems every chip
maker has a different name for the same package.

--
Rick

On Sat, 03 Oct 2015 15:51:34 -0400, rickman wrote:
> On 10/3/2015 3:37 PM, Kevin Neilson wrote:
>> I just built an optimized Galois Field vector multiplier, which
>> multiplies a vector by a scalar. ...
>
> Why is it using LUTs instead of multipliers? Are these numbers too
> small and too many to use the multipliers efficiently?

Kevin is talking about Galois Field multipliers. The integer
multiplier blocks are useless for that.

I occasionally implement cryptographic primitives in FPGAs. These
often use a combination of linear mixing functions and (other stuff
which provides nonlinearity, which I don't need to talk about here).
The linear mixing functions often contain GF multiplications
(sometimes by constants).

It's possible to express the linear mixing functions in HDL
behaviourally (e.g. including things that are recognisably GF
multipliers), and it's also possible to express them as a sea of XOR
gates. My experience using the latest tools from Xilinx is that for a
particular 128-bit mixing function I was getting three times as many
levels of logic from the behavioural source as I was from the
sea-of-XOR-gates source, even though both described the same function.

BTW, one runs into similar problems when calculating CRCs of wide
buses. The CRCs also simplify to XOR trees, but in this case we are
calculating the remainder after a division, rather than a
multiplication. (And yes, the integer DSP blocks are useless for this
too.)

Regards,
Allan
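Allan's point about why the DSP blocks don't help can be seen in a few lines: a GF(2) (carry-less) multiply accumulates its partial products with XOR instead of addition, so there is no carry chain for an integer multiplier to provide; in hardware it is just an AND/XOR network. An illustrative Python sketch (not from any poster's code):

```python
def clmul(a, b):
    # Carry-less (polynomial-over-GF(2)) multiply: partial products
    # are combined with XOR, so bits never carry into the next column.
    r = 0
    i = 0
    while b >> i:
        if (b >> i) & 1:
            r ^= a << i
        i += 1
    return r

# Integer and carry-less multiplies diverge as soon as partial
# products overlap:
print(3 * 3)        # 9
print(clmul(3, 3))  # 5  (0b11 * 0b11 = 0b101 with XOR accumulation)
```

A CRC is the remainder of exactly this kind of polynomial arithmetic, which is why wide-bus CRCs also reduce to XOR trees that the integer DSP blocks cannot absorb.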

On 10/4/2015 2:01 AM, Allan Herriman wrote: > On Sat, 03 Oct 2015 15:51:34 -0400, rickman wrote: > >> On 10/3/2015 3:37 PM, Kevin Neilson wrote: >>> I just built an optimized Galois Field vector multiplier, which >>> multiplies a vector by a scalar. As an experiment, I split it into two >>> parts, one that was common to all elements of the vector, and then the >>> parts that were unique to each element of the vector. I had assumed >>> this is something the synthesizer would do anyway, but I was surprised >>> to find that writing it the way I did cut down the number of LUTs by a >>> big margin. >> >> Why is it using LUTs instead of multipliers? Are these numbers too >> small and too many to use the multipliers efficiently? > > > Kevin is talking about Galois Field multipliers. The integer multiplier > blocks are useless for that. We are talking modulo 2 multiplies at every bit, otherwise known as AND gates with no carry? I'm a bit fuzzy on this. Now I'm confused by Kevin's description. If the vector is multiplied by a scalar, what parts are common and what parts are unique? What parts of this are fixed vs. variable? The only parts a tool can optimize are the fixed operands. Or I am totally missing the concept. > I occasionally implement cryptographic primitives in FPGAs. These often > use a combination of linear mixing functions and (other stuff which > provides nonlinearity, which I don't need to talk about here). The > linear mixing functions often contain GF multiplications (sometimes by > constants). > > It's possible to express the linear mixing functions in HDL behaviourally > (e.g. including things that are recognisably GF multipliers), and it's > also possible to express them as a sea of XOR gates. 
> My experience using the latest tools from Xilinx is that for a particular
> 128 bit mixing function I was getting three times as many levels of logic
> from the behavioural source as I was from the sea of XOR gates source,
> even though both described the same function.

I don't know enough of how the tools work to say what is going on. I just know that when I dig into the output of the tools for poorly synthesized code, I find the problems are that my code doesn't specify the simple structure I had imagined it did. So I fix my code. :) Mostly this has to do with trying to use the carry out of the top of adders. Sometimes I get two adders.

> BTW, one runs into similar problems when calculating CRCs of wide buses.
> The CRCs also simplify to XOR trees, but in this case we are calculating
> the remainder after a division, rather than a multiplication. (And yes,
> the integer DSP blocks are useless for this too.)

Yep, you don't want a carry, so don't even think about using adders or multipliers. They are not adders and multipliers in every type of algebra.

-- Rick
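Allan's observation that wide-bus CRCs collapse to XOR trees can be sketched in Python. The polynomial here (x^4 + x + 1, a CRC-4) and the zero initial value are illustrative assumptions, not any particular standard: because a CRC with init 0 and no final XOR is linear over GF(2), feeding unit vectors through a bit-serial model recovers the per-input-bit XOR masks of the parallel (single-cycle) form directly.

```python
def crc4(bits):
    """Bit-serial CRC-4 (poly x^4 + x + 1), init 0, no final XOR."""
    reg = 0
    for bit in bits:
        fb = ((reg >> 3) & 1) ^ bit          # feedback = register MSB XOR input bit
        reg = ((reg << 1) & 0xF) ^ (0b0011 if fb else 0)
    return reg

def crc4_xor_tree(width):
    """Derive the wide-bus form: one 4-bit XOR mask per input bit position."""
    masks = []
    for i in range(width):
        unit = [1 if j == i else 0 for j in range(width)]
        masks.append(crc4(unit))             # contribution of input bit i alone
    return masks

def crc4_parallel(bits, masks):
    """Single-cycle CRC of a whole word: XOR together the masks of the 1 bits.
    This is exactly the XOR tree a synthesizer should build in LUTs."""
    reg = 0
    for bit, m in zip(bits, masks):
        if bit:
            reg ^= m
    return reg
```

For an 8-bit bus, `crc4_parallel(word, crc4_xor_tree(8))` matches `crc4(word)` for every input, which is the sense in which the division "simplifies to XOR trees": each remainder bit is a fixed parity of input bits.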

On Sunday, October 4, 2015 at 4:23:38 AM UTC+2, rickman wrote:
> Not trying to be retarded, as I have not checked the data sheet on this,
> but isn't the FPGA fabric SRAM based and loaded (albeit more quickly
> than a serial config) from the internal Flash?

The functional configuration of each cell is controlled by distributed flash. Otherwise they would not be able to achieve their SEE immunity for configuration, and would have trouble meeting their boot-up times (aka instant-on).

regards, Thomas

On 10/5/2015 7:06 AM, Thomas Stanka wrote:
> On Sunday, October 4, 2015 at 4:23:38 AM UTC+2, rickman wrote:
>> Not trying to be retarded, as I have not checked the data sheet on
>> this, but isn't the FPGA fabric SRAM based and loaded (albeit more
>> quickly than a serial config) from the internal Flash?
>
> The functional configuration of each cell is controlled by
> distributed flash. Otherwise they would not be able to achieve their SEE
> immunity for configuration, and would have trouble meeting their
> boot-up times (aka instant-on).

I guess I am just hardwired to think of the config memory as RAM.

But I don't see how they make the rest of the chip immune to SEE. The logic units have a FF which must be immune, which I don't see described. I also don't see mention of the fabric memory being SEE immune. I guess they bury that in some radiation related document somewhere. I do see where they refer to certain parts of the chip as only "SEU Resistant".

-- Rick

> I won't say I understand it, but I have seen some things about this. But
> "true random number generator"??? My understanding is this is virtually
> impossible. I haven't read about this. Is it based on noise from a
> diode or something? I recall a researcher trying that and it was good,
> but he could never find the source of a long term DC bias.

I don't know how these guys do it, but you can make a decent true random number generator with ring oscillators. I read one paper that described using several of these along with non-linear feedback shift registers to get a good random number.
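Whatever the raw source (ring oscillator jitter, diode noise), the long-term DC bias rickman mentions is a common problem, and a classic post-processing fix is the von Neumann extractor: it trades throughput for the removal of any fixed bias. A minimal sketch in Python, purely illustrative of the technique (this is not how Microsemi's generator works):

```python
def von_neumann_extract(raw_bits):
    """Debias a stream of independent but possibly biased bits.

    Non-overlapping pairs map: (0,1) -> 0, (1,0) -> 1, (0,0)/(1,1) -> discarded.
    If bits are independent with P(1) = p, both kept outcomes occur with
    probability p*(1-p), so the output is unbiased regardless of p.
    """
    out = []
    for a, b in zip(raw_bits[0::2], raw_bits[1::2]):
        if a != b:
            out.append(a)
    return out
```

The catch is that it assumes independent samples; correlated ring-oscillator output still needs further conditioning (the NLFSR whitening in the paper mentioned above serves a similar purpose).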

Not only can I not use the DSP blocks, but the carry chains don't work either. The only way to do a big XOR is with LUTs, and at my clock speeds, I can only do about 3 LUT levels. At least there are plenty of BRAMs in Virtex parts now, so it's easy to do logs, inverses, and exponentiation.

> We are talking modulo 2 multiplies at every bit, otherwise known as AND
> gates with no carry? I'm a bit fuzzy on this.
>
> Now I'm confused by Kevin's description. If the vector is multiplied by
> a scalar, what parts are common and what parts are unique? What parts
> of this are fixed vs. variable? The only parts a tool can optimize are
> the fixed operands. Or I am totally missing the concept.

Say you're multiplying a by a vector [b c d]. Let's say we're using the field GF(8) so a is 3 bits. Now a can be thought of as ( a0*alpha^0 + a1*alpha^1 + a2*alpha^2 ), where a0 is bit 0 of a, and alpha is the primitive element of the field. Then a*b or a*c or a*d is just a sum of some combination of those 3 values in the parentheses, depending upon the locations of the 1s in b, c, or d. So you can premultiply the three values in the parentheses (the common part) and then take sums of subsets of those three (the individual parts). It's all a bunch of XORs at the end. This is just a complicated way of saying that by writing the HDL at a more explicit level, the synthesizer is better able to find common factors and use a lot fewer gates.
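Kevin's decomposition can be sketched in Python. The primitive polynomial (x^3 + x + 1) is an assumption, since the thread doesn't name one; the point is that the "common part" (a, a*alpha, a*alpha^2) is computed once, and each vector element only selects a subset of those terms to XOR:

```python
def xtime(v):
    """Multiply v by alpha in GF(8), primitive polynomial x^3 + x + 1."""
    v <<= 1
    if v & 0b1000:        # reduce when the x^3 term appears
        v ^= 0b1011
    return v & 0b111

def gf8_mul(a, b):
    """Reference bit-serial GF(8) multiply, for checking the shared form."""
    p = 0
    for i in range(3):
        if (b >> i) & 1:
            p ^= a        # a currently holds a*alpha^i
        a = xtime(a)
    return p

def gf8_scale_vector(a, vec):
    """Multiply every element of vec by the scalar a, sharing the common part."""
    t = [a, xtime(a), xtime(xtime(a))]   # common part: a*alpha^0..a*alpha^2
    out = []
    for b in vec:
        p = 0
        for i in range(3):
            if (b >> i) & 1:             # individual part: subset XOR of t
                p ^= t[i]
        out.append(p)
    return out
```

In hardware terms, `t` is the logic shared by all vector elements, and each element costs only a few extra XORs; writing the HDL this way hands the synthesizer the factoring instead of hoping it rediscovers it.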

About SEU, I think the way this works is that yes, as with any SRAM device, the data or settings can be disrupted by an SEU and there is nothing one can do about that. The device could possibly recover from that though. But if the configuration, i.e. the logic and connection grid, gets disrupted, the chances of irreversible damage are much greater and less likely to be temporary. The flash-based gate structure protects the logic configuration against that.

The SF2 random number generator is clever and seeded by, amongst other sources, RAM power-up conditions, shown to be "random enough". How random is random enough for cryptography? This is a good article with some good comments in it:

http://www.eetimes.com/author.asp?section_id=36&doc_id=1326572

So "SP800-90 cryptographic-grade Non-Deterministic Random Bit Generator" might be more correct than "true RNG". So it follows recommendations in a NIST special publication on seeding, checking and maintaining random bits. You would probably have to invent something new to improve on that.

In terms of Ethernet on the KickStart eval/development board, one can add an Arduino shield for that. There is Bluetooth on-board but not Ethernet.

SR

rickman wrote:
> BTW, I found that Mouser doesn't seem to carry Xilinx anymore. When I
> search on Xilinx on the Mouser site they bring up the Altera page, lol.

Xilinx has only been carried through Digi-Key and Avnet for the last few years.

Jon

On 10/5/2015 1:29 PM, Kevin Neilson wrote:
>> We are talking modulo 2 multiplies at every bit, otherwise known as
>> AND gates with no carry? I'm a bit fuzzy on this.
>>
>> Now I'm confused by Kevin's description. If the vector is
>> multiplied by a scalar, what parts are common and what parts are
>> unique? What parts of this are fixed vs. variable? The only parts
>> a tool can optimize are the fixed operands. Or I am totally
>> missing the concept.
>
> Say you're multiplying a by a vector [b c d]. Let's say we're using
> the field GF(8) so a is 3 bits. Now a can be thought of as (
> a0*alpha^0 + a1*alpha^1 + a2*alpha^2 ), where a0 is bit 0 of a, and
> alpha is the primitive element of the field. Then a*b or a*c or a*d
> is just a sum of some combination of those 3 values in the
> parentheses, depending upon the locations of the 1s in b, c, or d.
> So you can premultiply the three values in the parentheses (the
> common part) and then take sums of subsets of those three (the
> individual parts). It's all a bunch of XORs at the end. This is
> just a complicated way of saying that by writing the HDL at a more
> explicit level, the synthesizer is better able to find common factors
> and use a lot fewer gates.

Ok, I'm not at all familiar with GFs. I see now a bit of what you are saying. But to be honest, I don't know that the tools would have any trouble with the example you give. The tools are pretty durn good at optimizing... *but*... there are two things to optimize for, size and performance. They are sometimes mutually exclusive, sometimes not.

If you ask the tool to give you the optimum size, I don't think you will do better if you code it differently, while describing *exactly* the same behavior. If you ask the tool to optimize for speed, the tool will feel free to duplicate logic if it allows higher performance, for example, by combining terms in different ways. Or less logic may require a longer chain of LUTs which will be slower.
LUT sizes in FPGAs don't always match the logical breakdown so that speed or size can vary a lot depending on the partitioning. -- Rick
