
On 4/2/2015 6:52 PM, already5chosen@yahoo.com wrote: > On Friday, April 3, 2015 at 1:03:23 AM UTC+3, rickman wrote: >> On 4/2/2015 5:30 PM, already5chosen@yahoo.com wrote: >>> On Thursday, April 2, 2015 at 2:28:33 AM UTC+3, Rob Gaddi wrote: >>>> On Wed, 01 Apr 2015 19:10:55 -0400, rickman wrote: >>>> >>>>> On 4/1/2015 1:27 PM, John Speth wrote: >>>> >>>>>> I've used both example products with great success. As you said, it's >>>>>> real convenient to roll your own peripherals with impunity. It saved >>>>>> me hours of coding effort when you can smartly implement the peripheral >>>>>> of your dreams with a little HW design. >>>>> >>>>> The part that gets me about the newer versions of this theme is that >>>>> they are large, pricey FPGAs and incorporate fairly high end CPUs which >>>>> are typically programmed under Linux... a very far cry from the >>>>> efficient solution I would like to see. There are few engineers who can >>>>> even design the entire system on that chip spanning logic design and >>>>> system programming. >>>> >>>> Agreed. We're looking hard at both Zynq and the Cyclone V SOC, both of >>>> which have big monster Cortex A9s meant to run Linux with a mess of DRAM >>>> and etc. Which, I mean we can make work. But if I could get a 10-20KLUT >>>> FPGA with a dual or quad Cortex M4 instead? Nice and light with every >>>> intention of running bare metal with 10-20K of code? I'd take it in a >>>> heartbeat. >>>> >>>> -- >>>> Rob Gaddi, Highland Technology -- www.highlandtechnology.com >>>> Email address domain is currently out of order. See above to fix. >>> >>> Cortex-M is most useful when you have a good chunk of flash on the same die. Which, unfortunately, would be incompatible with the silicon tech used for modern FPGAs. >> >> That is a bit of nonsense unless you consider Lattice and MicroSemi to >> not be using "modern" FPGA processes. They include Flash in their >> devices for the configuration memory.
>> > > Well, you are right, I am not familiar with Lattice and MicroSemi. From the little I know about them, their FPGAs are modern in the sense that they are new products and, maybe, modern in specific system-level features, but when it comes to size and performance of the fabric, including a characteristic as important to some of us as dynamic power per watt (static power is probably OK), they are at least 5 years behind X&A, but probably more than 5. Sometimes I get really tired of Thunderbird. When I reply to a post the quoted text is just as likely as not to extend beyond the margin and off the screen... a bloody nuisance I tell you. Anyway, I think you are not familiar at all with the Lattice products. They have lines of FPGAs that are RAM based like the X and A parts and are likely a generation behind in many terms... but you shouldn't focus on the things that are intangible to you. Do you really care what geometry a part is made in? No, you care whether your design will fit, if it will run fast enough and how much the part costs. I think unless you need the largest parts in the X or A line the L parts will do the job competitively. I would also mention that you can thank L for the availability of SERDES in lower cost FPGAs. Lattice was the first to offer that and X and A only followed begrudgingly I think. The other parts, like the XOx and XPx lines, can't be compared to anything X or A makes (unless they've come out with something in the last 6 months) because X and A have shied away from the Flash based market. They are adequately fast and have brought the price down to a point where they are competitive against MCUs in some apps. So saying L is 5 years behind is probably not accurate and not very useful. >>> DACs and SAR ADCs are also a problem. Delta-sigma ADCs probably less so, but I am not an expert. Anyway, for the apps that I care about, SAR is more useful than delta-sigma. >> >> Once again you should tell that to MicroSemi...
They make a mixed signal >> FPGA with CPU, analog and FPGA on one die. I don't use it because of >> the price, a bit higher than I like to see. >> >> >>> Due to all these factors, a small embedded solution based on a Cortex-M integrated into an FPGA is likely to end up more complex, using more chips, and more expensive than a solution based on a Cortex-M (or even Cortex-R) MCU + FPGA. >> >> I think you are saying that an FPGA with internal MCU is not as useful >> as separate FPGA and MCU because the MCU will have lots of other stuff >> integrated that would be additional chips with the integrated approach. > > Yes, NOR flash, ADCs, DACs. > >> Clearly it doesn't have to be that way since at least one company >> makes such parts. >> >> >>> The ugly part about the MCU + FPGA solution is that, unlike chips from the past, small modern Cortex-M MCUs rarely have a good bus to talk to an FPGA (good=simple, not too slow and not too many pins). But then again, those old 32-bit MCUs that had buses that I liked were in the $25+ price range. For a fair comparison I probably have to look at old 8-bitters that I never even tried to connect to an FPGA. >> >> That brings us back to the real differences between the MCU world and >> the typical FPGA world. MCUs are intended for apps where speed is >> limited by the software. FPGAs are intended for apps where speed is >> potentially much faster with the limitation potentially in the I/O. So >> a typical high end FPGA will have lots of I/O and some very fast I/O. >> > > That's about right, except that I am not talking about high-end FPGAs, but about the modern "low-cost" lines of A&X. So, fast I/O is optional and very fast I/O is rarely even an option (fast=1-3.125 Gbit/s, very fast= >3.125). > But for the MCU<->FPGA interface I will be mostly satisfied with a much more moderate speed. Say, something logically similar to the venerable LPC bus, but without the 24-bit address space limit (28 bits is probably acceptable) and with an RGMII physical layer. "High" speed is relative.
Integrated MCUs would have direct bus-mapped access to FPGA connections, which clearly would run at full speed depending on your FPGA design. Multi-die solutions would need to use bit-banged I/O from the MCU or some peripheral like SPI or Ethernet. As you say, there ain't no buses on many MCUs anymore. >> But such an integrated MCU/FPGA device would not be intended for high >> end apps with Mbps I/O. The FPGA would be adding special functionality >> that perhaps can't be done in the MCU alone. I had a design that >> had exactly this sort of need and ended up having to use an FPGA >> with an attached CODEC since there were no MCUs which could implement >> one interface. The FPGA was a bit jammed up in terms of capacity (only >> 3 kLUT). A small soft core could do most of the work and potentially >> free up some space. Had a combined chip been available it would have >> been a breeze to implement the one interface in hardware (or maybe two) >> and the rest of the design in software. >> >> >>> Back to another reason why I think that a hard ARM Cortex-M4 core in an [Altera or Xilinx] FPGA does not look like a very good proposition: >>> The added value of an M3/M4 core alone, without flash and mixed-signal peripherals, is not that big. After all, the Nios2-f core (only the core, without debug support and the avalon-mm infrastructure around it) occupies only ~25% of the smallest Cyclone4 device or ~7% of the smallest Cyclone5-E and achieves comparable performance. As far as I am concerned, the main advantage of Cortex-M is code density - significantly more code can fit on-chip. But even that is less important if we are talking about the Cyclone5 generation, because there the smallest member has 140 KB of embedded memory (not counting MLABs), which is often enough. >> >> Yep, the low end MCU on an FPGA without any of the peripherals would not >> be a lot more interesting than a soft core. > > Just a nitpick - by definition there is no such thing as "MCU without any of the peripherals".
Let's call them "MCU-style hard cores" or just "ARM Cortex-M4" because this particular core looks like the most logical (or least illogical) candidate. Not sure what that means, but whatever. Potatoes, Patahtoes. >> So when will they be doing >> a better job of the Vulcan mind meld and getting more analog on the FPGA >> die? It's not like there is anything so special about FPGA logic that >> can't be done in analog compatible processes. Maybe you lose some >> density or performance, but that isn't what we are after. At least *I* >> am looking for a system on chip which includes some FPGA fabric. Don't >> think of it as an FPGA with an MCU on chip. > > Yes, could be nice. But to be really useful the FPGA part should not be too small. I wouldn't bother for 1K 4-input LUTs. 5K looks like a reasonable minimum, at least for gray-haired devs like you and me. Younger guys are spoiled, they'd want more than that. My current product is shipping in a 3 kLUT device. I could have shoved a lot more functionality in if I had used a soft core (of my own design, the canned ones are too large). >> Think of it as an MCU with >> FPGA fabric on chip just like the other umpty-nine peripherals they >> already have along with.... gasp!... 5 volt tolerance. lol >> > > Is 5-V tolerance really that useful [in new designs] without the ability to actually drive 5V outputs? I suppose even you don't expect the latter in 2015 :-) There are any number of MCUs that still have 5 volt I/Os. A Cypress line that I was looking at can run with Vcc of 1.8-5 V. Clearly there is a need for such parts or they wouldn't keep designing them. The FPGA vendors ignore this segment because they have never wanted to go down the low price, high volume road in earnest. They would love to get some automotive products designed in and 5 volt I/Os are popular there I believe. I remember when the Xilinx folks were saying the next generation after Spartan 3 would not support 3.3 volt I/Os!
But then they also told me that if I connected the FPGA to the load with a 1 inch trace I could blow up the Spartan I/Os if I didn't simulate it. Really? They tend to see the world through FPGA glasses as if they drove the market rather than the market driving their designs. -- Rick

On 27/03/2015 19:50, 1@FPGARelated wrote: > http://www.wsj.com/articles/intel-in-talks-to-buy-altera-1427485172 > --------------------------------------- > Posted through http://www.FPGARelated.com > For those that haven't seen it: http://www.deepchip.com/items/0548-02.html good article, Hans www.ht-lab.com

On Saturday, April 4, 2015 at 10:34:43 AM UTC+3, HT-Lab wrote: > On 27/03/2015 19:50, 1@FPGARelated wrote: > > http://www.wsj.com/articles/intel-in-talks-to-buy-altera-1427485172 > > --------------------------------------- > > Posted through http://www.FPGARelated.com > > > For those that haven't seen it: > > http://www.deepchip.com/items/0548-02.html > > good article, > > Hans > www.ht-lab.com I am even more pessimistic than John Cooley. He lists "FPGA users" twice, both on the negative list (Instability in the overall FPGA market (like when two biggest players are in chaos) means R&D on the advanced FPGAs is cut; and the prices for current FPGAs go up. (It's economics.)) and on the positive list ("No more Xilinx-Altera duopoly in FPGA's!"). I personally don't see in which aspects a lame duopoly can be better for me, an FPGA user, than a functional duopoly. Yes, potentially some parts could become slightly cheaper, but that's nothing relative to the impact of instability on the development process. Besides, it seems Cooley underestimates the ability of Intel to destroy good, solid companies that they are acquiring.

Hi, Does each core of an 8-core Intel processor have an independent x87 floating-point unit? Here are some excerpts from Intel's latest datasheet, Intel(R) Core(tm) i7 Processor Family for LGA2011-v3 Socket Datasheet - Volume 1 of 2:
Processor Feature Details
*Up to 8 execution cores
*Each core supports two threads (Intel(R) Hyper-Threading Technology)
*32 KB instruction and 32 KB data first-level cache (L1) for each core
*256 KB shared instruction/data mid-level (L2) cache for each core
*Up to 15 MB last level cache (LLC): up to 2.5 MB per core instruction/data last level cache (LLC), shared among all cores.
5.2 X87 FPU INSTRUCTIONS
The x87 FPU instructions are executed by the processor's x87 FPU. These instructions operate on floating-point, integer, and binary-coded decimal (BCD) operands. For more detail on x87 FPU instructions, see Chapter 8, "Programming with the x87 FPU." These instructions are divided into the following subgroups: data transfer, load constants, and FPU control instructions.
From the above text I have a feeling that all 8 execution cores share the same x87 FPU. Am I right or not? Is there anyone who has real experience with the x87 FPU? Thank you. Weng

Hello! I've been using VHDL for about 2 years or even more, but just today I wondered how this works. The summation function from any package, std_arith for example, operates on two arguments, but this one returns std_logic_vector as a result. So I have no idea how it works when you use something like this:
signal result : std_logic_vector (15 downto 0);
signal arg_1: std_logic_vector (15 downto 0);
signal arg_2: std_logic_vector (15 downto 0);
signal arg_3: std_logic_vector (15 downto 0);
result <= signed(arg_1) + signed(arg_2) + signed(arg_3);
Each summation returns std_logic_vector, but then one more summation is needed, and that is undefined for std_logic_vector and signed. It would be great if someone could clarify. Thanks. P.S. Sorry for my English.

Adding two signed will produce a signed, not std_logic_vector. Kevin Jennings

Hello, I am trying to connect my IP to the MicroBlaze using the AXI streaming protocol. I have connected my IP to the MicroBlaze using the AXI streaming link M0_AXIS in XPS. But it seems that the MicroBlaze does not accept any data, i.e., s_axis_tready never goes high. Below is the code for the MicroBlaze:
    while (1) {
        print("waiting for a packet...\n");
        getfslx(temp, 0, FSL_NONBLOCKING);
        temp2 = temp;
        xil_printf("%x\n", temp);   /* print() takes a string, not an integer */
        print("getfsl passed\n");
        putfslx(temp2, 0, FSL_NONBLOCKING);
        print("putfsl passed\n");
        xil_printf("%x\n", temp2);
    }

So I just had a thought. Most synthesis tools (in VHDL, and I assume in Verilog) will allow you to use the division operator to perform truncating division by a constant in synthesizable code, so long as that constant is a power of 2. That seems like a reasonable restriction; that you can only divide when it's just a right shift, right up until you think a bit longer. Because I do synthesizable division by a constant all the time, actually, as multiplication by the reciprocal. So I wind up writing things like y := x * (2**17 / 3) / 2**17. It obscures the logic a bit, but works. But I was thinking, and not only does it obscure the logic, but it forces assumptions into my code about what the underlying multiplier block looks like. Why 2**17? Because I'm assuming an 18-bit signed multiplier, because that's what happens to be on some architecture (Altera Cyclone4 if I remember right). It seems trivial for the synthesizer to do that transformation, division by a constant => multiplication by the reciprocal, in a way that is optimized for the underlying hardware. Any non-braindamaged C compiler will do it without being asked. And maybe the synth tools do, it's just been forever since I've actually checked. Has anyone looked at this in a while? Are any of the synth tools smart enough to handle this on their own these days? -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.
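Rob's expression can be modelled with plain integers to see exactly where it diverges from true division. This sketch is not from the thread: it assumes nonnegative x and mirrors VHDL's truncating integer "/", so 2**17 / 3 evaluates to 43690. It compares that truncated reciprocal, and the same constant rounded up to 43691, against floor division.

```python
SHIFT = 17   # Rob's 2**17 scaling, sized for an 18-bit signed multiplier
K = 3        # constant divisor

def recip_divide(x, recip, shift=SHIFT):
    """Model y := x * recip / 2**shift for nonnegative x (a plain right shift)."""
    return (x * recip) >> shift

truncated = (2**SHIFT) // K        # what 2**17 / 3 evaluates to: 43690
rounded_up = -(-(2**SHIFT) // K)   # ceil(2**17 / 3): 43691

def first_mismatch(recip, limit):
    """Smallest x in [0, limit) where the reciprocal multiply disagrees
    with floor division, or None if there is none."""
    for x in range(limit):
        if recip_divide(x, recip) != x // K:
            return x
    return None

print(truncated, first_mismatch(truncated, 2**18))    # 43690 3
print(rounded_up, first_mismatch(rounded_up, 2**18))  # 43691 131072
```

With the truncated constant the result is already wrong at x = 3, because 3 * 43690 = 131070 lands just below 2**17. The rounded-up constant matches x // 3 for every x below 2**17 and first fails at x = 131072, so it is exact for 17-bit unsigned numerators.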

On Thursday, 9 April 2015 18:58:41 UTC+2, Rob Gaddi wrote: > That seems like a reasonable restriction; that you can only divide when > it's just a right shift, right up until you think a bit longer. Because > I do synthesizable division by a constant all the time, actually, as > multiplication by the reciprocal. So I wind up writing things like > > y := x * (2**17 / 3) / 2**17. There are a few nontrivial problems when using division in digital logic that also apply to operations with a constant. Multiplying by the reciprocal is valid for real numbers (in the mathematical sense), but not for fixed point (unsigned with a shifted decimal point). If a number is 2^n, its reciprocal is well defined in fixed point. For 3 you will find no exact reciprocal in fixed-point notation, regardless of the number of digits you spend after the point. If you use floating point your chances of a correct result are far better, but you are not safe. The term "(a-b)*c" is not always equivalent to "a*c - b*c" even in floating-point arithmetic, when a and b are significantly different in size. best regards Thomas
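Thomas's last point can be checked with IEEE 754 double precision, which is what a Python float is. The operand values below are an arbitrary illustration, chosen so that a and b differ by sixteen orders of magnitude:

```python
a, b, c = 1e16, 1.0, 3.0

lhs = (a - b) * c         # a - b rounds back to 1e16 before the multiply
rhs = a * c - b * c       # 3e16 - 3.0 rounds to a neighbouring double
exact = (10**16 - 1) * 3  # exact integer result, for comparison

print(lhs, rhs, exact)    # 3e+16 2.9999999999999996e+16 29999999999999997
print(lhs == rhs)         # False
```

Neither ordering gives the exact answer, and the two orderings do not even agree with each other.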

If you multiply by a fraction, e.g. y = x*0.333..., then you are just leading the compiler to avoid division. I don't use fractions, but newer libraries support them. Kaz

On Thu, 09 Apr 2015 16:57:43 +0000, Rob Gaddi wrote: > So I just had a thought. Most synthesis tools (in VHDL, and I assume in > Verilog) will allow you to use the division operator to perform > truncating division by a constant in synthesizable code, so long as that > constant is a power of 2. > > That seems like a reasonable restriction; that you can only divide when > it's just a right shift, right up until you think a bit longer. Because > I do synthesizable division by a constant all the time, actually, as > multiplication by the reciprocal. So I wind up writing things like > > y := x * (2**17 / 3) / 2**17. > > It obscures the logic a bit, but works. But I was thinking, and not > only does it obscure the logic, but it forces assumptions into my code > about what the underlying multiplier block looks like. Why 2**17? > Because I'm assuming a 18 bit signed multiplier, because that's what > happens to be on some architecture (Altera Cyclone4 if I remember > right). > > It seems trivial for the synthesizer to do that transformation, division > by a constant => multiplication by the reciprocal, in a way that is > optimized for the underlying hardware. Any non-braindamaged C compiler > will do it without being asked. And maybe the synth tools do, it's just > been forever since I've actually checked. > > Has anyone looked at this in a while? Are any of the synth tools smart > enough to handle this on their own these days? Hi Rob, I'm still not sure I'd trust a synthesiser to handle that sort of thing portably. I don't think I've ever actually used a multiplier or divider in a synthesisable design. There always seems to be some way to avoid them, even for DSP, usually by employing smarter design at the system level and careful selection of scaling factors, filter coefficients, etc. (I don't do the sort of filtering that needs extremely precise coefficients. YMMV.) 
When trying to avoid the use of the hard multipliers, I would consider employing tricks like Booth recoding: http://en.wikipedia.org/wiki/Booth%27s_multiplication_algorithm#How_it_works which can sometimes help with a fixed multiplicand that has a long string of 1s. I would also look for repeating patterns in a fixed multiplicand. Repeating patterns often arise when taking the reciprocal of a constant, e.g. 1/5 = (binary) 0.00110011001100110011001100 etc. This is equal to 11 * 10001 * 100000001 * ... (shifted right) and a small number of adders can produce this result to any desired precision. Both techniques ought to be amenable to automation in a synthesiser. But the synth tool I use doesn't even support VHDL 2008 yet (thanks Xilinx!) so I won't hold my breath waiting for comprehensive tool support for multiplication other than the basic/obvious use of the built-in hard blocks. Regards, Allan
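Allan's repeating-pattern trick for 1/5 can be sketched in a few lines of Python. This is an illustration only, assuming unsigned inputs and a 32-bit fraction: 0x33333333 is the pattern 0.0011 0011 ... truncated to 32 bits, and it factors as 0b11 * 0b10001 * 0x101 * 0x10001, so four shift-and-add stages replace a general multiplier. Because the truncated pattern sits just below 1/5, exact multiples of 5 come out one low. The extra "+ x" in `exact_div5`, which is an addition of mine and not part of Allan's post, raises the effective constant to 0x33333334 and makes the result exact for x below 2**30.

```python
def times_0x33333333(x):
    """Multiply x by 0x33333333 using only shifts and four additions."""
    y = (x << 1) + x    # x * 0b11
    y = y + (y << 4)    # * 0b10001  -> x * 0x33
    y = y + (y << 8)    # * 0x101    -> x * 0x3333
    y = y + (y << 16)   # * 0x10001  -> x * 0x33333333
    return y

def approx_div5(x):
    """Truncated pattern: floor(x * 0x33333333 / 2**32), can be one low."""
    return times_0x33333333(x) >> 32

def exact_div5(x):
    """One more adder: floor(x * 0x33333334 / 2**32), exact for 0 <= x < 2**30."""
    return (times_0x33333333(x) + x) >> 32

print(approx_div5(5), exact_div5(5))   # 0 1
```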

I am getting an unexpected deassertion of the SOF (AXI_OP_TVALID) signal, as shown in the figure (find the fig in the attachments). I have taken the example design as a reference. In the design, I fixed the frame size (X"07"). But in Rx, SOF is deasserted near EOF (as in the fig). I want to receive every data beat of the complete frame. In the waveform, SOF means VALID and EOF means TLAST. I am trying to get the complete data from SOF to EOF, as on the TX side. Can anybody suggest a solution for the problem? Details: Aurora IP 8B10B, v8.3, Xilinx ISE 14.7. For the image: http://forums.xilinx.com/t5/New-Users-Forum/Aurora-8b10b-problem-with-M-AXI-RX-TVALID-SOF/td-p/588858 Regards, Raju

On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote: > Multiplicate with the reziprocal value is a valid function for real > number (mathematic term), but not with fixed point (unsigned with > shifted decimal). If a number is 2^^n, its reciprog is well defined for > fix point. For 3 you will find no exact reciproc with fixpoint notation, > regardless of the number of digits you spend after fixpoint. > Yes and no. I believe one can prove (for values of one excluding me) that for a bounded integer numerator, you can always define a reciprocal multiply that will give the exact same result as "floor division" for all numerators in those bounds. Those differences you're talking about due to integers being "unreal" numbers are all pushed down into the remainder. And the same should therefore hold for fixed_point, since you're always looking for the quotient to be in some finite format past which you don't care about the errors. I did a quick Python script just to test with integers. It tests the dumb way, through complete exhaustion of the input set, but my arbitrary poking about sure implies that a) you can always find an answer and b) that answer will require no more than 1 bit more than the numerator.
#!/usr/bin/env python3
"""
Test the theory that, given a bounded numerator, there is a reciprocal
multiply that will always give the same result as floor division.
"""

import numpy as np

nbits = 22
divide_by = 65537

# Proof through exhaustion, create all possible numerators.
# int64 explicitly: products reach ~2**46, which overflows a 32-bit default int.
numerators = np.arange(2**nbits, dtype=np.int64)
quotients = numerators // divide_by

class FoundAnswer(Exception):
    pass

try:
    expected_dbits = nbits + divide_by.bit_length()
    for dbits in range(expected_dbits - 1, expected_dbits + 2):
        basic_recip = (2**dbits) // divide_by
        for recip in range(basic_recip, basic_recip + 2):
            approx_quotients = (numerators * recip) >> dbits
            if np.all(approx_quotients == quotients):
                print('For all {nbits} bit numerators N//{div} == N*{recip} >> {dbits}'.format(
                    nbits=nbits, div=divide_by, recip=recip, dbits=dbits))
                print('{recip} requires {b} bits.'.format(recip=recip, b=recip.bit_length()))
                raise FoundAnswer()
except FoundAnswer:
    pass

-- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.

Hi I just wanted to know if people use systemc in FPGA flow. systemc can be used for cycle accurate simulation, where it can replace RTL. In this mode test-benches will usually take advantage of c++ and SCV (for writing constraints). For big designs where RTL completion takes a lot of time systemc can be used for LT or AT simulations ( Loosely Timed, Approximately Timed TLM). Pini

On Sat, 11 Apr 2015 01:16:37 -0500, pini_kr wrote: > Hi > > I just wanted to know if people use systemc in FPGA flow. systemc can be > used for cycle accurate simulation, where it can replace RTL. In this > mode test-benches will usually take advantage of c++ and SCV (for > writing constraints). > For big designs where RTL completion takes a lot of time systemc can be > used for LT or AT simulations ( Loosely Timed, Approximately Timed TLM). Are you asking a question or pushing an advertisement? I doubt there are many on the group who don't know what SystemC is. Proper capitalization may help -- "systemc" looks like a misspelling of "systemic". SystemC looks like -- well, SystemC. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com

On 12/04/15 07:33, Tim Wescott wrote: > On Sat, 11 Apr 2015 01:16:37 -0500, pini_kr wrote: > >> Hi >> >> I just wanted to know if people use systemc in FPGA flow. systemc can be >> used for cycle accurate simulation, where it can replace RTL. In this >> mode test-benches will usually take advantage of c++ and SCV (for >> writing constraints). >> For big designs where RTL completion takes a lot of time systemc can be >> used for LT or AT simulations ( Loosely Timed, Approximately Timed TLM). > > Are you asking a question or pushing an advertisement? I doubt there are > many on the group who don't know what SystemC is. > > Proper capitalization may help -- "systemc" looks like a misspelling of > "systemic". SystemC looks like -- well, SystemC. > The OP might be interested in this: http://www.testandverification.com/conferences/verification-futures/2015-europe/speaker-andy-lunness-bluwireless-technology/ It's a verification paper, but the design flow was also in SystemC, regards Alan -- Alan Fitch

On Saturday, April 11, 2015 at 2:16:41 PM UTC+8, pini_kr wrote: > Hi > > I just wanted to know if people use systemc in FPGA flow. systemc can be > used for cycle accurate simulation, where it can replace RTL. In this > mode test-benches will usually take advantage of c++ and SCV (for > writing constraints). > For big designs where RTL completion takes a lot of time systemc can be > used for LT or AT simulations ( Loosely Timed, Approximately Timed > TLM). > > Pini > --------------------------------------- > Posted through http://www.FPGARelated.com I am wondering what SystemC has got to do with the FPGA design flow. If you are asking about support for the synthesizable subset of SystemC, Xilinx and Altera do not support it in their FPGA flows (Quartus, ISE and Vivado). But it is supported in Vivado HLS. If you use HLS for FPGA designs, then yes, you can use SystemC directly. Of course, even outside of HLS, one can use SystemC... it depends on how many different models of our design we want to make.

Rob Gaddi <rgaddi@technologyhighland.invalid> wrote: > On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote: >> Multiplicate with the reziprocal value is a valid function for real >> number (mathematic term), but not with fixed point (unsigned with >> shifted decimal). (snip) > Yes and no. I believe one can prove (for values of one excluding me) > that for a bounded integer numerator, you can always define a reciprocal > multiply that will give the exact same result as "floor division" for all > numerators in those bounds. Those differences you're talking about due > to integers being "unreal" numbers are all pushed down into the > remainder. And the same should therefore hold for fixed_point, since > you're always looking for the quotient to be in some finite format past > which you don't care about the errors. With the multiply instruction on many processors, that generates a double length signed product, I believe that for many constants, maybe half of them, there is a multiplier that will generate the appropriate truncated quotient in the high half of the product. But in the case the OP asked, it isn't so obvious that it should do that. Another choice would be a primitive that would generate the appropriate multiplier. Often when you want a divider, you want it pipelined, which is unlikely to be synthesized. -- glen
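Glen's point about the high half of a double-length product can be sketched with the constant that compilers commonly emit for unsigned 32-bit division by 10. The sketch is unsigned for simplicity (glen mentions a signed product) and models the hardware with Python integers: multiply by 0xCCCCCCCD, keep the high 32-bit word, then shift right by 3 more bits.

```python
M = 0xCCCCCCCD      # ceil(2**35 / 10), fits in 32 bits
MASK32 = 0xFFFFFFFF

def udiv10(x):
    """Unsigned 32-bit x // 10 using the high half of a 32x32 -> 64-bit product."""
    if not 0 <= x <= MASK32:
        raise ValueError("x must be an unsigned 32-bit value")
    product = x * M              # at most 64 bits wide
    high_word = product >> 32    # the word a double-length multiply leaves in its high register
    return high_word >> 3        # finish the 2**35 scaling

print(udiv10(99), udiv10(100), udiv10(MASK32))   # 9 10 429496729
```

Because 10 * 0xCCCCCCCD exceeds 2**35 by only 2, the rounding error stays below 1/10 for every 32-bit input, so the quotient is exact across the whole range.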

On Thu, 16 Apr 2015 23:01:16 +0000, glen herrmannsfeldt wrote: > Rob Gaddi <rgaddi@technologyhighland.invalid> wrote: >> On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote: > >>> Multiplicate with the reziprocal value is a valid function for real >>> number (mathematic term), but not with fixed point (unsigned with >>> shifted decimal). > > (snip) >> Yes and no. I believe one can prove (for values of one excluding me) >> that for a bounded integer numerator, you can always define a >> reciprocal multiply that will give the exact same result as "floor >> division" for all numerators in those bounds. Those differences you're >> talking about due to integers being "unreal" numbers are all pushed >> down into the remainder. And the same should therefore hold for >> fixed_point, since you're always looking for the quotient to be in some >> finite format past which you don't care about the errors. > > With the multiply instruction on many processors, that generates a > double length signed product, I believe that for many constants, > maybe half of them, there is a multiplier that will generate the > appropriate truncated quotient in the high half of the product. > Right, and I've never seen a multiplier block in an FPGA that doesn't do the same. For a while they were 18x18=>36, then I started hitting 18*25=>43, but regardless, the hard multiplier blocks always generate P'length = A'length + B'length. > But in the case the OP asked, it isn't so obvious that it should do > that. > > Another choice would be a primitive that would generate the appropriate > multiplier. Ugh, but then you'd have to instantiate it as a separate block and wire it in. That's even uglier than having to put the bit-shifting logic in manually. > > Often when you want a divider, you want it pipelined, which is unlikely > to be synthesized. > Not really. Often when I want a divider I have to pipeline it because the division algorithm is inherently serial. 
But I don't _want_ it to be pipelined, that's just the only choice I've got for a true X/Y divide. But for X/K with constant K, it can (in every case I've seen) be implemented with a multiplier block, or simply by wire if K is a power of 2. That gets us done in a single cycle. Sure that multiplier block may blow up into horrible cross-multiplies spanning multiple blocks if I ask for a stupidly large K. But the same can be said for X*K, and the tool lets me request that just fine, and if I ask for a stupid K I get a mess of logic that either a) only meets timing with a slow clock or b) requires me to make a few stages of register pushback available. -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.
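Rob's "in every case I've seen" can be supported by a closed-form choice of multiplier. This sketch is not from the thread. It uses the round-up reciprocal from the compiler literature on division by invariant integers (Granlund and Montgomery) and assumes unsigned N-bit numerators. With l = ceil(log2 K) and shift = N + l, the multiplier m = ceil(2**shift / K) reproduces x // K for every 0 <= x < 2**N. The rounding error in m adds less than 2**-l <= 1/K to x * m / 2**shift, which is never enough to carry past the next integer. The helper names are illustrative.

```python
def round_up_reciprocal(k, nbits):
    """Return (m, shift) with (x * m) >> shift == x // k for all 0 <= x < 2**nbits,
    using shift = nbits + ceil(log2 k) and m = ceil(2**shift / k)."""
    if k < 1:
        raise ValueError("divisor must be a positive integer")
    shift = nbits + (k - 1).bit_length()   # (k-1).bit_length() == ceil(log2(k))
    m = -(-(1 << shift) // k)              # ceiling division
    return m, shift

def check(k, nbits):
    """Exhaustively compare the reciprocal multiply against floor division."""
    m, shift = round_up_reciprocal(k, nbits)
    return all(((x * m) >> shift) == x // k for x in range(1 << nbits))

m, shift = round_up_reciprocal(3, 16)
print(m, shift, m.bit_length())   # 87382 18 17
```

For K = 3 and a 16-bit numerator this gives a 17-bit multiplier, one bit more than the numerator. In general m never exceeds 2**(N+1).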

On 4/16/2015 7:23 PM, Rob Gaddi wrote:
> On Thu, 16 Apr 2015 23:01:16 +0000, glen herrmannsfeldt wrote:
>
>> Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:
>>> On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote:
>>
>>>> Multiplicate with the reziprocal value is a valid function for real
>>>> number (mathematic term), but not with fixed point (unsigned with
>>>> shifted decimal).
>>
>> (snip)
>>> Yes and no. I believe one can prove (for values of one excluding me)
>>> that for a bounded integer numerator, you can always define a
>>> reciprocal multiply that will give the exact same result as "floor
>>> division" for all numerators in those bounds. Those differences you're
>>> talking about due to integers being "unreal" numbers are all pushed
>>> down into the remainder. And the same should therefore hold for
>>> fixed_point, since you're always looking for the quotient to be in some
>>> finite format past which you don't care about the errors.
>>
>> With the multiply instruction on many processors, that generates a
>> double length signed product, I believe that for many constants,
>> maybe half of them, there is a multiplier that will generate the
>> appropriate truncated quotient in the high half of the product.
>>
>
> Right, and I've never seen a multiplier block in an FPGA that doesn't do
> the same. For a while they were 18x18=>36, then I started hitting
> 18*25=>43, but regardless, the hard multiplier blocks always generate
> P'length = A'length + B'length.
>
>> But in the case the OP asked, it isn't so obvious that it should do
>> that.
>>
>> Another choice would be a primitive that would generate the appropriate
>> multiplier.
>
> Ugh, but then you'd have to instantiate it as a separate block and wire
> it in. That's even uglier than having to put the bit-shifting logic in
> manually.
>
>>
>> Often when you want a divider, you want it pipelined, which is unlikely
>> to be synthesized.
>>
>
> Not really. Often when I want a divider I have to pipeline it because
> the division algorithm is inherently serial. But I don't _want_ it to be
> pipelined, that's just the only choice I've got for a true X/Y divide.

Maybe I am missing something, but it doesn't need to be pipelined. Just take out the registers and it's no longer pipelined.

> But for X/K with constant K, it can (in every case I've seen) be
> implemented with a multiplier block, or simply by wire if K is a power of
> 2. That gets us done in a single cycle.
>
> Sure that multiplier block may blow up into horrible cross-multiplies
> spanning multiple blocks if I ask for a stupidly large K. But the same
> can be said for X*K, and the tool lets me request that just fine, and if
> I ask for a stupid K I get a mess of logic that either a) only meets
> timing with a slow clock or b) requires me to make a few stages of
> register pushback available.

I don't think a large K will create any problems that require more multiplier blocks. The resolution required only depends on... the resolution required. If you are working with truncated integers, it doesn't matter if the divisor is large. That just means you get smaller results, not more math to do. Or are you thinking you can save logic by using small values of K? If using a block multiplier, you can't use less than one... or can you? Hmm...

-- 
Rick
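Not from the thread: a minimal sketch of why a general X/Y divider is "inherently serial", using textbook restoring (shift-and-subtract) division in Python. Each loop iteration stands for one compare/subtract stage, and stage i needs the partial remainder produced by stage i-1. Registering between stages gives a pipeline; leaving the registers out, as suggested above, gives a single combinational chain of n stages whose delay grows with the operand width. The function name and the width parameter `n` are illustrative assumptions.

```python
def restoring_divide(x: int, y: int, n: int):
    """Unsigned n-bit restoring division; returns (quotient, remainder).

    Each loop iteration models one hardware stage: shift in the next
    numerator bit, try to subtract the divisor, keep the result only if it
    did not go negative. Stage i depends on the remainder from stage i - 1.
    """
    if y == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if not 0 <= x < 2**n:
        raise ValueError("x must fit in n unsigned bits")
    q, r = 0, 0
    for i in range(n - 1, -1, -1):      # MSB first: n dependent stages
        r = (r << 1) | ((x >> i) & 1)   # bring down the next numerator bit
        if r >= y:                      # trial subtraction would not go negative
            r -= y
            q |= 1 << i                 # this quotient bit is 1
    return q, r


if __name__ == "__main__":
    print(restoring_divide(200, 7, 8))  # (28, 4)
```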

On Thu, 16 Apr 2015 20:00:45 -0400, rickman wrote:
> On 4/16/2015 7:23 PM, Rob Gaddi wrote:
>
>> But for X/K with constant K, it can (in every case I've seen) be
>> implemented with a multiplier block, or simply by wire if K is a power
>> of 2. That gets us done in a single cycle.
>>
>> Sure that multiplier block may blow up into horrible cross-multiplies
>> spanning multiple blocks if I ask for a stupidly large K. But the same
>> can be said for X*K, and the tool lets me request that just fine, and
>> if I ask for a stupid K I get a mess of logic that either a) only meets
>> timing with a slow clock or b) requires me to make a few stages of
>> register pushback available.
>
> I don't think a large K will create any problems that require more
> multiplier blocks. The resolution required only depends on... the
> resolution required. If you are working with truncated integers it
> doesn't matter if the divisor is large. That just means you get smaller
> results, not more math to do. Or are you thinking you can save logic by
> using small values of K which can reduce the logic required? If using a
> block multiplier you can't use less than one... or can you? Hmm...

There are probably some trivial cases where the divide by N reduces to a multiply by something silly like 3 followed by a bit-shift, which might get implemented in fabric, but I'd tend to assume that any time I did a divide, like any time I did a multiply, I'm likely to commit a multiplier to it. If I get lucky, all the better.

If it takes a lot of bits to accurately represent K, it takes a lot of bits to accurately represent 1/K, subject to many of the same caveats about factoring out powers of 2. Likewise, if I tell the tools that I want to use a 32-bit numerator, that'll take cross-multiplies too. But all that already gets handled correctly in the multiplication case.

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
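Not from the thread: Rob's earlier claim that a bounded numerator always admits an exact reciprocal multiply can be made concrete with one known construction (Granlund and Montgomery, "Division by Invariant Integers using Multiplication", 1994), here only for unsigned values. For n-bit x and divisor d, take l = ceil(log2 d), s = n + l and m = ceil(2^s/d); then (x*m) >> s equals x // d for every x < 2^n, and m fits in at most n + 1 bits regardless of how large d is. The function names are hypothetical and the 8-bit exhaustive check is only a small-scale sanity test.

```python
def magic_unsigned(d: int, n: int):
    """Return (m, s) such that x // d == (x * m) >> s for every 0 <= x < 2**n.

    Construction: l = ceil(log2(d)), s = n + l, m = ceil(2**s / d).
    The overshoot m*d - 2**s is less than 2**l, so the extra term
    x*(m*d - 2**s) / (d * 2**s) stays below 1/d and can never carry the
    result past the next integer.
    """
    if not 1 <= d < 2**n:
        raise ValueError("divisor must be in 1 .. 2**n - 1")
    l = (d - 1).bit_length()   # ceil(log2(d)); 0 when d == 1
    s = n + l
    m = -(-(1 << s) // d)      # integer ceiling of 2**s / d, no floats
    return m, s


def check_magic(d: int, n: int) -> bool:
    """Exhaustively compare the multiply-shift result with exact floor division."""
    m, s = magic_unsigned(d, n)
    return all((x * m) >> s == x // d for x in range(2**n))


if __name__ == "__main__":
    n = 8
    assert all(check_magic(d, n) for d in range(1, 2**n))
    m, s = magic_unsigned(10, 32)
    print(m, s, m.bit_length())  # 6871947674 36 33
```

This recipe always works but is not always the narrowest: for 32-bit x/10 it yields a 33-bit m, while 0xCCCCCCCD (32 bits) with a shift of 35 is also exact for that divisor.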

Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:
(snip, I wrote)
>> With the multiply instruction on many processors, that generates a
>> double length signed product, I believe that for many constants,
>> maybe half of them, there is a multiplier that will generate the
>> appropriate truncated quotient in the high half of the product.

> Right, and I've never seen a multiplier block in an FPGA that
> doesn't do the same. For a while they were 18x18=>36, then
> I started hitting 18*25=>43, but regardless, the hard
> multiplier blocks always generate
> P'length = A'length + B'length.

>> But in the case the OP asked, it isn't so obvious that it should do
>> that.

>> Another choice would be a primitive that would generate the appropriate
>> multiplier.

> Ugh, but then you'd have to instantiate it as a separate block and wire
> it in. That's even uglier than having to put the bit-shifting logic in
> manually.

Yes, it is ugh, but you know that you are asking for one.

>> Often when you want a divider, you want it pipelined,
>> which is unlikely to be synthesized.

> Not really. Often when I want a divider I have to pipeline it because
> the division algorithm is inherently serial. But I don't _want_ it to be
> pipelined, that's just the only choice I've got for a true X/Y divide.

Well, I usually go the FPGA route when I want something done fast, which means pipelined. Maybe not everyone does that.

> But for X/K with constant K, it can (in every case I've seen) be
> implemented with a multiplier block, or simply by wire if K is
> a power of 2. That gets us done in a single cycle.

> Sure that multiplier block may blow up into horrible cross-multiplies
> spanning multiple blocks if I ask for a stupidly large K. But the same
> can be said for X*K, and the tool lets me request that just fine, and if
> I ask for a stupid K I get a mess of logic that either a) only meets
> timing with a slow clock or b) requires me to make a few stages of
> register pushback available.

For software, you usually have (N)*(N)=(2N) and (2N)/(N)=(N), counting bit widths.

In the FPGA case, even though the hardware is 18 bits, you can choose any number of bits for your actual values.

I am pretty sure that if you have one more bit in the constant than you need in the quotient, it is enough. I am not sure that it always is when you have the same number of bits.

That is, I am not sure that you can generate a 32-bit signed quotient as the high half of a 32-bit multiply for all possible 32-bit signed divisors.

-- glen
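Not from the thread: glen's hunch can be checked by brute force at a small width. The sketch below is unsigned only (glen asked about signed quotients) and uses 8-bit numerators so the exhaustive search runs quickly. For each divisor it finds the narrowest constant m for which some shift s makes (x*m) >> s equal x // d for every 8-bit x. For d = 7 no 8-bit constant works and a 9-bit one is needed, while no divisor needs more than 9 bits, which is consistent with "one more bit is enough" in the unsigned case. The function name is hypothetical.

```python
def min_multiplier_bits(d: int, n: int) -> int:
    """Bit width of the narrowest constant m such that, for some shift s,
    (x * m) >> s == x // d for every unsigned n-bit x (1 <= d < 2**n)."""
    for s in range(2 * n + 1):
        # For a fixed s, m = ceil(2**s / d) is the only candidate worth testing:
        # any smaller m gives 0 for x == d, any larger m only overshoots more.
        m = -(-(1 << s) // d)
        if all((x * m) >> s == x // d for x in range(2**n)):
            # m never shrinks as s grows, so the first shift that works
            # also gives the narrowest m.
            return m.bit_length()
    raise AssertionError("unreachable for 1 <= d < 2**n")


if __name__ == "__main__":
    n = 8
    widths = {d: min_multiplier_bits(d, n) for d in range(1, 2**n)}
    print(max(widths.values()))                     # 9
    print(widths[7])                                # 9
    print([d for d, w in widths.items() if w > n])  # divisors that need n + 1 bits
```

Powers of two come out as a 1-bit constant (m = 1), which is the "simply by wire" case.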

Hi Rob and Glen,

have you used kdiv, my constant division routine generator? It produces low-level ("assembly") and C implementations for constant division.

http://github.com/nkkav/kdiv

Many people are happy with it; it is based on Warren's Hacker's Delight.

Best regards,
Nikolaos Kavvadias
http://www.nkavvadias.com

> (snip)

Hello!

I have several years of experience in programming, and I'd like to move on to FPGAs for more fun.

As I have a limited budget for playing with electronics, I'd like to choose the most versatile board for the best price, with decent support from the manufacturer. I'm a student, so I guess academic prices apply to me.

I tried to do my own research on Google. What I wanted to have on my board was:
- VGA/HDMI port
- SD card slot
- some memory
- PS/2 keyboard
- USB and Ethernet, although I have almost no idea how these two work

I found these boards:

- Basys™2 - Xilinx Spartan-3E, 8-bit VGA, PS/2 - $69
  http://www.digilentinc.com/Products/Detail.cfm?Nav...
- Basys™3 - Xilinx Artix-7, 12-bit VGA, USB host for keyboard/mouse, flash - $79
  http://www.digilentinc.com/Products/Detail.cfm?Nav...
- miniSpartan6+ - Spartan 6 LX 9, HDMI, serial flash, microSD - $75
  http://www.scarabhardware.com/product/minisp6/
- ZYBO Zynq™-7000 - Xilinx Z-7010, Cortex-A9, flash, memory, SD, USB, gigabit Ethernet, HDMI, 16-bit VGA - $125
  http://www.digilentinc.com/Products/Detail.cfm?Nav...
- Altera DE0 Board - Altera Cyclone III 3C16, 4-bit VGA, SD, serial port, PS/2, flash - $81
  http://www.terasic.com.tw/cgi-bin/page/archive.pl?...
- Altera DE0-CV Board - Altera Cyclone V 5CEBA4F23C7N, 4-bit VGA, microSD, PS/2 - $99
- Altera DE1 Board - Altera Cyclone II 2C20, 4-bit R-2R per channel VGA, PS/2, SD, flash - $127

Here's where I can't decide. Again, cost is important for me, but I also know that Digilent and Terasic are Some Names.

What would you choose? Do you have any recommendations of your own? Please help, I'm honestly an absolute nooob here.

---------------------------------------
Posted through http://www.FPGARelated.com

On Monday, April 20, 2015 at 7:01:43 AM UTC-5, FrewCen wrote:
> (snip)

I'd stick with the newer FPGAs, to make your learning as relevant as possible.
Another approach is to pay for a seminar where you get to keep the FPGA board: Cyclone V SoC or SmartFusion2, for $99 each.
