Search This Blog

Monday, 3 November 2014

Hashing

A hash is a numeric value generated by applying a math-ematical formula to the numeric values of the characters in a string of text (see characters and strings). The for-mula is chosen so that the values it produces are always the same length (regardless of the length of the original text) and are very likely to be unique. (Two different strings should not produce the same hash value. Such an event is called a collision.)


Applications

The two major application areas for hashing are informa-tion retrieval and cryptographic certification. In databases, an index table can be built that contains the hash values for the key fields and the corresponding record number for each field, with the entries in hash value order. To search the database, an input key is hashed and the value is compared with the index table (which can be done using a very fast binary search). If the hash value is found, the corresponding record number is used to look up the record. This tends to be much faster than searching an index file directly.

Alternatively, a “coarser” but faster hashing function can be used that will give the same hash value to small groups (called bins) of similar records. In this case the hash from the search key is matched to a bin and then the records within the bin are searched for an exact match.

In cryptography an encrypted message can be hashed, producing a unique fixed-length value. (The fixed length prevents attackers from using mathematical relationships that might be discoverable from the field lengths.) The hashed message can then be encrypted again to create an electronic signature (see certificate, digital). For long messages this is more efficient than having to apply the sig-nature function to each block of the encrypted message, yet the unique relationship between the original message and the hash maintains a high degree of security. Finally, hashing can be used for error detection. If a message and its hash are sent together, the recipient can hash the received text. If the hash value generated matches the one received, it is highly likely the message was received intact (see also error correction).

of cells to be “owned” by that variable. (While some languages such as Pascal and C use explicit memory allocation or deallocation functions, other languages such as LISP use a separate runtime module that is not the responsibility of the programmer.) Deallocation  (the  freeing  up  of  memory  no  longer needed by a variable so it can be used elsewhere) is more complicated. In many languages several different pointers
can be used to refer to the same memory location. It is therefore necessary not only to disconnect a given pointer from the cell, but to track the total number of pointers connected to the cell so that the cell itself is deallocated only when the last pointer to it has been disconnected. One way to accomplish this is by setting up an internal variable called a reference counter and incrementing or decrementing it as pointers are connected or disconnected. The disadvantages of this approach include the memory overhead needed to store the counters and the execution overhead of having to continually check and update the counters.
An alternative approach is garbage collection. Here the runtime system simply connects or disconnects pointers as required by the program’s declarations, without making an attempt to reclaim the disconnected (“dead”) cells. If and when the supply of free cells is exhausted, the runtime system takes over and begins a three-stage process. First, it provisionally sets the status indicator bit for each cell to show that it is “garbage.” Each pointer in the program is then traced (that is, its links are followed) into the heap, and if a valid cell is found that cell’s indicator is reset to “not garbage.” Finally, the garbage cells that remain are linked back to the pool of free cells available for future allocation. The chief drawback of garbage collection is that the more cells actually being used by the program, the longer the garbage-collecting process will take (since all of these cells have to be traced and verified). Yet it is precisely when most cells are in use that garbage collection is most likely to be required. The need for garbage collection has diminished in many
programming environments because modern computers not only have large amounts of memory, most operating systems also implement virtual memory, which allows a disk or other storage device to be treated as an extension of main memory. Note: the term heap is also used to describe a particular type of binary tree. (See tree.)




No comments:

Post a Comment