When tasked with the implementation of a rather complex function, e.g. a polynomial of higher order, the resource utilization quickly shoots through the roof if implemented straight forward (also called the naïve implementation).
To avoid this it is often easier, simpler and faster to use a lookup table (LUT) solution.

Instead of doing a lot of calculations and mathematics, the results for a given function argument is just read from a read-only memory (ROM) which contains precalculated results.
Oftentimes a LUT/ROM based implementation can be used in place of a “proper” implementation during early prototyping. In a later stage of the project the LUT/ROM can be replaced with an optimized implementation.

Arbitrary function realized using a LUT/ROM

Most people will encounter a ROM based lookup table solution when dealing with sine and cosine functions. The technique is then often called direct digital synthesis (DDS), because a waveform is generated by digital logic instead of analog circuitry, as it was done in ancient times.
Since this is a more or less common task I wrote an Octave script that takes a function as input and generates a memory initialization file for a LUT/ROM solution. The parameter range of interest can be specified and the width and depth of the LUT/ROM can be defined.
Periodic functions will likely result in problems if not handled carefully. E.g. a lot of people will specify the parameter range for a sine LUT/ROM go from 0 to 2*Pi. However, since the boundaries of the interval are always part of the LUT/ROM the start/end value of the period will appear twice, once at the highest LUT/ROM address and once at the lowest LUT/ROM address (because sin(0) = sin(2*Pi) = 0). This gotcha does not hold for non-periodic functions. It can also easily be fixed.

The quality of the result this technique yields depends on both the LUT/ROM’s memory depth and the LUT/ROM’s word width. The former defines the number of available sampling points and thus the quantization of the function parameter(s). The latter defines the quantization of the function result, which means how close a single LUT/ROM data value is to the exact result of the function. Both kinds of quantization contribute to the error of the LUT/ROM implementation, i.e. the deviation of the LUT/ROMs result from the precise function result value.

The total size of the LUT/ROM is memory depth * word width bits.
Another performance characteristic is the maximum operating frequency under which an implementation can run. On FPGAs the maximum operating frequency will depend on the number of BRAMs required to realize the LUT ROM. If only one BRAM is used there is not need for additional routing or coupling glue logic and thus the operating frequency will be maximized.
Using more than one BRAM requires “chaining”/”coupling” of BRAMs and thus will reduce the maximum operating frequency due to additional routing and/or logic delays.

LUT/ROM composed of multiple BRAMs