Barcode 101: Guide to Barcode Symbologies
A barcode symbol is a machine-readable image that conveys data. Barcodes can be divided into three general types: linear, stacked linear, and two-dimensional (or 2D).
Linear Barcodes
The UPC-A (also referred to simply as the UPC) is the standard retail “price code” barcode in the United States.
UPC-A is strictly numeric; the bars can only represent the digits from 0 to 9. A UPC-A barcode contains 12 digits, along with a quiet (blank) zone on either side, and start, middle, and stop symbols.
The middle symbol separates the left side and the right side, which are coded differently. When a digit is used on the left side, the bars are black and the spaces are white, and when it is used on the right side, the colors are reversed. The logic behind doing this is a little complicated, and involves a mathematical property called “parity,” but the effect is to reverse black and white, and to allow the scanner to tell whether it’s reading the code from left to right or from right to left.
The actual system of numbering depends on the type of product and the purpose of the barcode; the first digit of the barcode indicates the numbering system. The 10 digits that follow contain information about the product, and in all of the applications described below, the digit on the far right (not included in the application description) is a checksum, which can be used to test the accuracy of the scanner reading.
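The checksum mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the standard UPC-A check-digit rule (odd positions weighted 3, even positions weighted 1); the function names are made up for this example and are not part of any library.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Return the check digit for the first 11 digits of a UPC-A number."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(ch) for ch in first_eleven]
    # Counting positions from the left starting at 1, the odd positions
    # (list indexes 0, 2, 4, ...) are weighted 3 and the even positions 1.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    # The check digit brings the weighted total up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Check a full 12-digit UPC-A number against its final check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])
```

For example, the 11 digits 03600029145 give a check digit of 2, so 036000291452 passes the test, while changing the final digit makes it fail.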
Below is a list of common UPC-A applications:
- Typical retail products: Indicated by a first digit of 0, 1, 6, 7, or 8. The five remaining digits on the left side of the barcode identify the manufacturer. The five digits on the right are the product code (determined by the manufacturer).
- Pharmaceuticals: Indicated by a first digit of 3, for drugs and some other pharmaceutical products. The 10 digits that follow form the NDC (National Drug Code) number, which identifies the manufacturer (or distributor or packager), the product (along with information such as dose, strength, and formula), and the size and other characteristics of the package.
- Weight-based pricing: When products such as produce and meat are packaged and sold by weight, the UPC label begins with the number system code 2. The five digits that follow it on the left identify the item, and the ones that follow are for the product weight or the price.
- Coupons: The numbering system digits 5 and 9 are for coupons. As with retail products, the five left-hand digits are the manufacturer identification. The five digits on the right tell what products the coupon applies to and the discount.
- Reserved: Number system 4 is reserved for use by individual retailers or wholesalers, and is often used for store coupons, customer loyalty cards, and similar items.
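The list above amounts to a lookup on the first digit. The sketch below splits a 12-digit UPC-A number into the fields described in this section. The function name and dictionary keys are hypothetical, and the assignment of digit 3 to pharmaceuticals is the conventional one, included here as an assumption.

```python
# Number system digit -> application, following the list above.
NUMBER_SYSTEMS = {
    "0": "retail product",
    "1": "retail product",
    "2": "weight-based pricing",
    "3": "pharmaceutical (NDC)",  # assumption: conventional assignment
    "4": "reserved for retailer use",
    "5": "coupon",
    "6": "retail product",
    "7": "retail product",
    "8": "retail product",
    "9": "coupon",
}


def split_upc_a(code: str) -> dict:
    """Split a 12-digit UPC-A number into its fields (no checksum test)."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 12 digits")
    return {
        "number_system": code[0],
        "application": NUMBER_SYSTEMS[code[0]],
        "left": code[1:6],    # manufacturer or item code, by application
        "right": code[6:11],  # product code, weight/price, or coupon data
        "check_digit": code[11],
    }
```

How the two five-digit fields are interpreted depends on the application, so a real system would branch on the `application` value before reading them.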
The UPC-E barcode can be used when the available space is too small for a UPC-A barcode. It contains the same information as a UPC-A label, but it uses some tricks to reduce the number of digits to six.
The UPC-E code’s most basic trick is to remove trailing zeroes in the manufacturer’s code, and leading zeroes in the product code. The details of the technique are complicated, and it doesn’t work for everything, but it does cover all codes with a total of five leading/trailing zeroes, as well as a significant number of codes with four zeroes.
UPC-E uses a much more complex trick to compress the checksum and the number system code. A side-effect of this technique is that the only numbering system codes allowed are 0 and 1.
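The zero-suppression trick is easier to follow in reverse, by expanding six UPC-E digits back into a full UPC-A number. The sketch below follows the commonly published expansion rules, in which the last of the six digits selects where the zeroes are re-inserted. The function names are illustrative, and the number system digit is passed in separately because, in a real symbol, it is carried by the bar patterns rather than by the six digits.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Standard UPC-A check digit: odd positions x3, even positions x1."""
    digits = [int(ch) for ch in first_eleven]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10


def expand_upc_e(six: str, number_system: str = "0") -> str:
    """Expand six UPC-E digits into the equivalent 12-digit UPC-A number."""
    if len(six) != 6 or not (six.isascii() and six.isdigit()):
        raise ValueError("expected exactly 6 digits")
    if number_system not in ("0", "1"):
        raise ValueError("UPC-E only allows number systems 0 and 1")
    last = six[5]
    if last in "012":
        # Manufacturer code ends in 000, 100, or 200; 3-digit product code.
        manufacturer = six[0:2] + last + "00"
        product = "00" + six[2:5]
    elif last == "3":
        # Manufacturer code ends in 00; 2-digit product code.
        manufacturer = six[0:3] + "00"
        product = "000" + six[3:5]
    elif last == "4":
        # Manufacturer code ends in 0; 1-digit product code.
        manufacturer = six[0:4] + "0"
        product = "0000" + six[4]
    else:
        # No zero in the manufacturer code; product code is 5 through 9.
        manufacturer = six[0:5]
        product = "0000" + last
    body = number_system + manufacturer + product
    return body + str(upc_a_check_digit(body))
```

With these rules, the six digits 425261 in number system 0 expand to 042100005264.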
The EAN-13 code is basically an international version of UPC-A. EAN-13 adds a 13th digit on the far left side of the UPC-A code (so that it becomes the first digit). The EAN-13 standard includes UPC-A barcodes; adding a leading 0 to a UPC-A code turns it into the equivalent EAN-13 code.
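The relationship between the two formats can be illustrated with a short sketch. It assumes the usual GS1 check-digit rule (weights 3, 1, 3, ... counted from the rightmost data digit); the function names are hypothetical. Because the leading 0 contributes nothing to the weighted sum, the check digit is unchanged by the conversion.

```python
def gs1_check_digit(body: str) -> int:
    """Check digit for a GS1 number, given every digit except the last."""
    total = 0
    for i, ch in enumerate(reversed(body)):
        # The digit next to the check digit is weighted 3, the next 1, etc.
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Test a 13-digit EAN-13 number against its check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return gs1_check_digit(code[:12]) == int(code[12])


def upc_a_to_ean13(upc_a: str) -> str:
    """Turn a 12-digit UPC-A number into the equivalent EAN-13 number."""
    if len(upc_a) != 12 or not (upc_a.isascii() and upc_a.isdigit()):
        raise ValueError("expected exactly 12 digits")
    return "0" + upc_a
```

For instance, the UPC-A number 036000291452 becomes 0036000291452, which still passes the EAN-13 check.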
The main differences between EAN-13 and UPC-A (besides the extra leading digit) are that with EAN-13, the manufacturer and product codes can vary in length, and that the first three digits make up the GS1 prefix, or “country code.”
The GS1 prefix is issued by GS1, the international barcode standards organization. It may identify the national GS1 member organization or a special use. The member organizations issue the manufacturer’s codes, and the manufacturers set their own product codes. The complete EAN-13 barcode number, consisting of the GS1 prefix, the manufacturer’s code, the product code, and the checksum digit, is also known as the GTIN, or Global Trade Item Number. Besides the national GS1 prefixes, typically used for standard retail items, there are prefixes for specialized purposes, such as coupons, refunds, serial publications (magazines and newspapers), books (ISBN), and sheet music (ISMN).
In the United States, price code scanners and point-of-sale/inventory systems are typically capable of reading both UPC-A and EAN-13 barcodes.
EAN-8 is a GS1 barcode for use on small items when a full EAN-13 barcode label would be too large to fit. It consists of eight digits, four on the left side and four on the right. It uses the same kind of encoding as UPC-A and EAN-13, with the last digit being used as a checksum.
An EAN-8 barcode can be used either with GTIN-8 or RCN-8 product identification numbers.
GTIN-8 is like a shortened version of the EAN-13 code, but without information about the product’s origin. In order to use a GTIN-8 number, a manufacturer must request it from the national member organization. An EAN-8 barcode that encodes a GTIN-8 identification number is valid for global use, like an EAN-13 barcode.
RCN-8 numbers, on the other hand, are for use only on house-brand or store-label products, and can be used only within the business that issues them. If one is scanned by another retailer, it will give an incorrect reading.
The UPC and EAN “price code” barcodes described above encode only numbers, but Code 128 is a linear barcode that encodes both letters of the alphabet and numbers, making it useful for a variety of purposes beyond basic pricing and inventory.
Code 128 encodes the 128-character ASCII set, which includes all of the alphabetic, numeric, punctuation, and arithmetic characters found on an English-language computer keyboard, plus several non-visible control characters.
In order to include all of the ASCII characters, Code 128 uses three different character sets:
- The Code A set consists of capital letters, numbers, punctuation, and nonprinting control characters
- Code B is similar to Code A, but replaces most of the control characters with the full set of lower-case letters, plus some added punctuation
- Code C does not include any letters or punctuation; it is made up of number pairs, from 00 to 99. This saves space when encoding numbers.
A single Code 128 barcode can include characters from all three character sets, switching between them repeatedly.
The basic Code 128 barcode format consists of a start code (which sets the initial character set to A, B, or C), the code data, a checksum digit, and a stop code, which marks the end of the barcode. As with other linear barcodes, there are blank quiet zones on either side.
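As an illustration of the checksum step, here is a minimal sketch for data encoded entirely in the Code B set. It assumes the commonly documented rule: each character has a value (its ASCII code minus 32 in Code B), the start code for Code B has the value 104, and the check value is the start value plus the position-weighted sum of the character values, taken modulo 103. The function names are made up for this example.

```python
START_A = 103  # value of the Code A start symbol
START_B = 104  # value of the Code B start symbol


def code128b_values(text: str) -> list:
    """Map printable ASCII text to Code B symbol values (0 to 95)."""
    values = []
    for ch in text:
        code = ord(ch)
        if not 32 <= code <= 127:
            raise ValueError(f"{ch!r} is not in the Code B set")
        values.append(code - 32)
    return values


def code128_checksum(start_value: int, values: list) -> int:
    """Start value plus each symbol value times its 1-based position, mod 103."""
    total = start_value
    for position, value in enumerate(values, start=1):
        total += position * value
    return total % 103
```

The result is a symbol value between 0 and 102 rather than a decimal digit, so it is printed as one more bar pattern just before the stop code.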
GS1-128 (also known as UCC-128 and EAN-128) is an international standard for using Code 128 in supply-chain barcode labels. GS1-128 consists of the basic Code 128 format with an Application Identifier added to the code data.
Application identifiers are 2 to 4 characters in length, and identify the type of data that will follow — typically, standard supply-chain applications, such as serial number, number of containers, lot number, weight, volume, etc., including tracing and transaction information. Each identifier sets the length and format of the data that follows it.
Because most application code data is fixed-length, it is possible to include several codes in one GS1-128 barcode, simply by adding new Application Identifiers and code data.
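Chaining fixed-length fields can be sketched as a simple parser. The table below is a small, illustrative subset of two-digit Application Identifiers with the data lengths they are generally documented to have; the function name is hypothetical, and variable-length identifiers (which need a separator character) are deliberately left out.

```python
# A few fixed-length Application Identifiers and their data lengths.
# Illustrative subset only; the full GS1 list is much longer.
FIXED_LENGTH_AIS = {
    "00": 18,  # serial shipping container code
    "01": 14,  # GTIN, padded to 14 digits
    "11": 6,   # production date, YYMMDD
    "15": 6,   # best-before date, YYMMDD
    "17": 6,   # expiration date, YYMMDD
}


def parse_fixed_ais(data: str) -> list:
    """Split concatenated GS1-128 data into (identifier, value) pairs."""
    fields = []
    pos = 0
    while pos < len(data):
        ai = data[pos:pos + 2]
        if ai not in FIXED_LENGTH_AIS:
            raise ValueError(f"unsupported Application Identifier {ai!r}")
        length = FIXED_LENGTH_AIS[ai]
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError(f"data for identifier {ai} is too short")
        fields.append((ai, value))
        # Because the length is fixed, the next identifier starts right here.
        pos += 2 + length
    return fields
```

Given the string 010950110153000317140704, the parser returns a GTIN field followed by an expiration date field, with no separator needed between them.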
The Code 39 symbology is also alphanumeric and variable-length. It was developed in 1974, and is still in relatively wide use; most barcode readers can read Code 39. In Code 39, each character is made up of five bars and four spaces, with three of those bars/spaces being wide, and the others narrow. As a result, all characters have the same width, and a Code 39 barcode generally takes up more space than the equivalent Code 128 barcode.
The basic Code 39 system is made up of 43 characters, including capital letters, numbers, and some special/punctuation characters. Depending on the application and the system, it may be possible to use all 128 ASCII characters.
A Code 39 barcode consists of a start character, the coded data, and a stop character. The start and stop characters are identical, and are generally represented by the * asterisk symbol. There is no checksum character, but some error-checking capabilities are built into the coding system.
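The framing described above is simple enough to sketch directly. The example below checks that the data uses only the 43 basic characters and then wraps it in the asterisk start/stop characters. The function name is hypothetical, and the sketch stops at the character level; it does not produce the bar patterns themselves.

```python
# The 43 basic Code 39 characters: digits, capital letters, and seven
# special characters (including the space).
CODE39_CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def code39_frame(data: str) -> str:
    """Validate the data and add the * start and stop characters."""
    bad = sorted(set(data) - set(CODE39_CHARSET))
    if bad:
        raise ValueError(f"characters not in basic Code 39: {bad}")
    return "*" + data + "*"
```

This is also why the asterisk cannot appear in the data itself: a scanner would take it for the end of the barcode.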
Code 39 is used for many of the same types of applications as Code 128, and official Code 39 standards (including an ANSI standard) exist. It is not, however, included in the GS1 system.
Interleaved 2 of 5
Interleaved 2 of 5 (or ITF) is a variable-length numbers-only linear barcode. It encodes digits in pairs, with the first digit in each pair represented by bars, and the second digit represented by spaces, so that they are interleaved. Two of the five bars or spaces representing each digit are wide, and the others are narrow.
Interleaved 2 of 5 is included in the GS1 system as the ITF-14 standard, which has a set length of 14 digits.
An ITF barcode consists of a start code (two narrow bar/narrow space pairs), the encoded data, a checksum digit (required for ITF-14, optional elsewhere), and a stop code (wide bar, narrow space, narrow bar), with quiet zones on either side.
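The interleaving can be sketched as follows. The digit patterns are the usual 2-of-5 table (n for narrow, w for wide), included here as an assumption since they are not listed above. The output notation is invented for this example: upper-case letters stand for bars and lower-case letters for spaces.

```python
# Narrow/wide pattern for each digit; exactly two of the five are wide.
ITF_PATTERNS = {
    "0": "nnwwn", "1": "wnnnw", "2": "nwnnw", "3": "wwnnn", "4": "nnwnw",
    "5": "wnwnn", "6": "nwwnn", "7": "nnnww", "8": "wnnwn", "9": "nwnwn",
}
START = "NnNn"  # two narrow bar / narrow space pairs
STOP = "WnN"    # wide bar, narrow space, narrow bar


def itf_elements(digits: str) -> str:
    """Return the bar/space sequence for an even-length digit string."""
    if not digits or len(digits) % 2 != 0:
        raise ValueError("ITF needs an even, non-zero number of digits")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("ITF encodes digits only")
    out = [START]
    for i in range(0, len(digits), 2):
        bars = ITF_PATTERNS[digits[i]]        # first digit of the pair
        spaces = ITF_PATTERNS[digits[i + 1]]  # second digit of the pair
        for bar, space in zip(bars, spaces):
            out.append(bar.upper())  # bar, narrow or wide
            out.append(space)        # the space that follows it
    out.append(STOP)
    return "".join(out)
```

Encoding the pair 12 this way gives the ten elements WnNwNnNnWw between the start and stop codes: the bars spell out the pattern for 1 and the spaces spell out the pattern for 2.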
Patterns identical to the start and stop code can occur within the coded data, which can result in a bad reading if the scanner does not read the code all the way across. To prevent this, the ITF-14 standard requires a heavy black border called the bearer bar.
ITF barcodes are typically used in wholesale and shipping for box or carton lots of a product. A specialized version of the ITF barcode is also used on 135 film canisters.
Codabar was originally developed by Pitney Bowes in 1972. It is a variable-length barcode that uses a small set of bars to encode the digits 0 through 9, and in some applications, a few symbols such as the dollar and plus signs. It also includes four start/stop symbols (generally represented by A, B, C, and D). A Codabar code consists of a start symbol, the coded data, and a stop symbol. It is self-checking, although some applications do specify a check digit.
Codabar has traditionally been used by libraries, by blood banks, and for airbills by some companies such as Federal Express, and is still in use for some of those applications.
Pharmacode is designed for packaging control and security in the pharmaceutical industry.
A Pharmacode barcode consists of two widths of bars only, with a length of up to 16 bars. The data is a single integer (in the range 3 to 131070) encoded as a binary number. Pharmacode barcodes can use multiple colors as an added check for packaging accuracy.
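The binary encoding used by Pharmacode differs slightly from ordinary base 2. In the commonly described scheme, assumed here, a narrow bar in position p (counted from the right, starting at 0) is worth 2^p and a wide bar is worth twice that, so there is no "zero" bar. The sketch below uses N and W for narrow and wide bars; the function names are illustrative.

```python
def pharmacode_bars(value: int) -> str:
    """Encode an integer as a left-to-right string of N and W bars."""
    if not 3 <= value <= 131070:
        raise ValueError("Pharmacode values run from 3 to 131070")
    bars = []
    n = value
    while n > 0:
        if n % 2 == 0:
            bars.append("W")  # wide bar contributes 2 at this position
            n = (n - 2) // 2
        else:
            bars.append("N")  # narrow bar contributes 1 at this position
            n = (n - 1) // 2
    # Bars were produced right to left, so reverse them for reading order.
    return "".join(reversed(bars))


def pharmacode_value(bars: str) -> int:
    """Decode a string of N and W bars back into its integer."""
    total = 0
    for position, bar in enumerate(reversed(bars)):
        total += (2 if bar == "W" else 1) * 2 ** position
    return total
```

Under this scheme the smallest value, 3, is two narrow bars, and the largest, 131070, comes out as 16 wide bars.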
Individual pharmaceutical companies generate their own Pharmacode barcodes. They are used on the production line, where they are automatically scanned on inserts and other items being placed in the package, in order to detect mismatches.
Databar is a GS1 family of barcode standards generally intended for reduced space applications. They encode GTIN-12 (UPC-A) and GTIN-13 (EAN-13) data in a 14-digit format (with added leading zeroes). Linear barcodes in the Databar family include the Omnidirectional and Expanded codes, which can be scanned omnidirectionally, and the Truncated and Limited codes, which are designed to be read by handheld scanners only.
Omnidirectional and Expanded Databar codes are used in point-of-sale applications, like UPC-A and EAN-13. Expanded codes can include additional information such as weight and expiration date, designated using Application Identifiers in the manner of GS1-128 barcodes.
Truncated and Limited Databar barcodes are generally used in the health care industry for small-item identification.
Postnet is the barcode system which has been in use by the United States Postal Service for routing mail; it is being phased out in favor of the Intelligent Mail System, described below. Postnet codes use variable-height bars to represent digits.
A Postnet barcode typically consists of the ZIP, ZIP+4, and delivery point codes, with each digit represented by five bars, two of which are full height, and the rest half-height.
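The five-bar digit patterns can be sketched as a lookup table, with 1 for a full-height bar and 0 for a half-height bar. The table, the full-height frame bar at each end, and the trailing check digit (which makes the digit sum a multiple of 10) follow the usual published description of Postnet and are assumptions as far as this guide is concerned. The function names are hypothetical.

```python
# 1 = full-height bar, 0 = half-height bar; two full bars per digit.
POSTNET_PATTERNS = {
    "0": "11000", "1": "00011", "2": "00101", "3": "00110", "4": "01001",
    "5": "01010", "6": "01100", "7": "10001", "8": "10010", "9": "10100",
}


def postnet_check_digit(digits: str) -> int:
    """Digit that brings the sum of all digits up to a multiple of 10."""
    return (10 - sum(int(ch) for ch in digits) % 10) % 10


def postnet_bars(digits: str) -> str:
    """Encode a 5-, 9-, or 11-digit routing number as a string of 1s and 0s."""
    if len(digits) not in (5, 9, 11) or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ZIP, ZIP+4, or ZIP+4 plus delivery point")
    payload = digits + str(postnet_check_digit(digits))
    # A full-height frame bar opens and closes the code.
    return "1" + "".join(POSTNET_PATTERNS[ch] for ch in payload) + "1"
```

A five-digit ZIP code therefore produces 32 bars: two frame bars plus five bars for each of the six digits, counting the check digit.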
Postal (Intelligent Mail Barcode)
The Intelligent Mail Barcode system is replacing the Postnet system for routing mail by the USPS. It is a 65-bar variable-height code with four types of bar.
An IM barcode consists of the following components:
- A barcode identifier containing presort information.
- A service type identifier indicating the mail class and any requested services.
- The mailer ID, representing the organization sending the item.
- The sequence number, identifying the individual mail piece.
- The delivery point ZIP code, containing the information in a Postnet barcode, described above.
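The components listed above can be pictured as fields in a string of digits. The sketch below splits such a string using field widths taken from the commonly published layout, all of which are assumptions here: a 2-digit barcode identifier, a 3-digit service type, a mailer ID of 6 digits (or 9 if it begins with 9), a sequence number filling out the rest of 20 digits, and a ZIP field of 0, 5, 9, or 11 digits. It does not attempt the conversion of these digits into the 65 bars, which is a much more involved process. The class and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class IntelligentMailFields:
    barcode_id: str
    service_type_id: str
    mailer_id: str
    sequence_number: str
    routing_zip: str


def split_imb_digits(tracking: str, routing: str = "") -> IntelligentMailFields:
    """Split a 20-digit tracking string and optional ZIP into named fields."""
    if len(tracking) != 20 or not (tracking.isascii() and tracking.isdigit()):
        raise ValueError("expected a 20-digit tracking string")
    if len(routing) not in (0, 5, 9, 11):
        raise ValueError("ZIP field must be 0, 5, 9, or 11 digits")
    if routing and not (routing.isascii() and routing.isdigit()):
        raise ValueError("ZIP field must contain digits only")
    # Assumption: mailer IDs starting with 9 are nine digits long,
    # all others six, with the sequence number taking the remainder.
    mailer_len = 9 if tracking[5] == "9" else 6
    return IntelligentMailFields(
        barcode_id=tracking[0:2],
        service_type_id=tracking[2:5],
        mailer_id=tracking[5:5 + mailer_len],
        sequence_number=tracking[5 + mailer_len:],
        routing_zip=routing,
    )
```

Keeping the fields as strings rather than integers preserves leading zeroes, which are significant in every one of them.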
Stacked barcodes are linear barcodes which are divided into segments and placed one above the other.
The stacked versions of the GS1 Databar codes use the same basic encoding as the linear Databar codes, described above, and are used in similar applications. They are particularly useful for limited-space items with labels which have very narrow linear dimensions.
The GS1 Expanded Stacked Databar can stack a series of barcodes containing product data in addition to the basic point-of-sale EAN-13 price code.
PDF417 is a variable-height, variable-width stacked barcode made up of rows of short bars and spaces. It can have as few as 3 rows, or as many as 90. All rows must contain the same number of data codewords, but that number can vary from 1 to 30.
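These size limits can be turned into a small layout check. The row and column ranges come from the description above; the overall cap of 928 codewords per symbol is an additional limit from the PDF417 specification as generally documented, and is an assumption here. The function names are illustrative.

```python
import math

MIN_ROWS, MAX_ROWS = 3, 90
MIN_COLS, MAX_COLS = 1, 30
MAX_CODEWORDS = 928  # assumed overall cap on rows x columns


def is_valid_pdf417_layout(rows: int, cols: int) -> bool:
    """True if a rows x cols grid of codewords is an allowed symbol shape."""
    return (
        MIN_ROWS <= rows <= MAX_ROWS
        and MIN_COLS <= cols <= MAX_COLS
        and rows * cols <= MAX_CODEWORDS
    )


def rows_needed(codewords: int, cols: int) -> int:
    """Smallest allowed row count that holds the codewords in cols columns."""
    if codewords < 1:
        raise ValueError("need at least one codeword")
    rows = max(MIN_ROWS, math.ceil(codewords / cols))
    if not is_valid_pdf417_layout(rows, cols):
        raise ValueError("no valid layout for this column count")
    return rows
```

Note that the maximum row and column counts cannot be used together: a 90-row symbol has room for at most 10 columns under the 928-codeword cap.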
The actual method of encoding is based on a complex system which uses 929 codewords, 900 of them representing data in different formats and the rest reserved for mode switches and other control functions. This allows PDF417 to encode text, digital data (in bytes), and large numbers within the same barcode.
Each row of a PDF417 barcode consists of a start pattern, the left-hand codeword (identifying the row, among other things), the data codewords, the right-hand codeword, and the stop pattern. Unlike most 2D barcodes, PDF417 can be read with a laser scanner.
PDF417 barcodes can be linked so that large amounts of data can be scanned in sequence. This effectively removes the limit in the amount of data that can be encoded, making the PDF417 format competitive with true 2-D barcodes for representing large amounts of data.
The PDF417 is in use as a high-density barcode format in a number of applications, including:
- Driver’s licenses and ID cards which meet the Department of Homeland Security RealID standard.
- Airline boarding passes.
- Printed United States postage.
MicroPDF417 is a limited subset of PDF417 designed for situations where a full PDF417 code would be too large. It places limits on the dimensions of the bars, and on the amount and format of data that can be encoded (up to 200 characters of upper-case text, 150 binary bytes, or 366 numeric digits). It also places some restrictions on error-correction codewords.
MicroPDF417 is used in GS1 Databar Composite Codes, where it is combined with a linear barcode.
Unlike stacked barcodes, true 2D matrix codes represent data in a two-dimensional array, like squares on a chessboard. This allows them to pack a large amount of data into a compact space, and to represent a much larger character set. These codes must be read with an imaging scanner, rather than a laser scanner.
DataMatrix barcodes are square or rectangular arrays of black and white squares, or cells. Each cell is a bit, representing a one or a zero, and depending on the type of encoding, a DataMatrix barcode may be able to represent as many as 2,335 alphanumeric characters.
A DataMatrix code has two different types of border; on one set of adjacent sides, the border is solid, and on the other two sides, it alternates black and white cells, which gives it the appearance of having only the two solid borders. The solid, or finder, borders allow the scanner to orient the code’s image, while the alternating-cell, or timer, borders allow it to count the rows and columns.
DataMatrix codes can be extremely small, and they can be read at low contrast. This allows them to be printed or even laser-etched on small items. They can also be scaled up to a very large size for use on items such as heavy machinery, buildings, or railroad cars.
The actual encoding system is complex, and includes redundant storage of data, so if part of a DataMatrix code is lost or damaged, it may still be possible to read all of the data. DataMatrix can encode numbers and alphanumeric ASCII characters using several encoding and compression systems.
DataMatrix codes are used for labeling small components in the electronics industry, either with printed labels or direct marking; they are also used in the food industry for quality control.
Most smartphones can read DataMatrix codes, allowing them to be used for marketing, advertising, and other applications where smartphone access is desirable.
The QR (or Quick Response) Code format was originally designed for use in the Japanese automobile industry to keep track of parts and of cars on the assembly line. Because of its versatility, it has become widely used in a variety of industrial and consumer-oriented applications.
A QR code resembles a DataMatrix code; it is square (surrounded by a blank quiet zone), and consists of square black and white cells. But instead of borders, it uses a set of large position and alignment squares (and a smaller set of timing marks) set into the body of the code.
QR code can encode four different types of data: numbers, alphanumeric characters, binary/bytes, and Japanese kana/kanji. Alphanumeric encoding is limited to numerals, upper-case letters, and some punctuation, but binary/byte encoding includes the ISO 8859-1 Latin-1 character set, which fully or partially covers the Western European languages. Kana/kanji encoding uses the JIS X 0208 character set. QR code can encode website URLs, allowing mobile phone users to link directly to a website by scanning its encoded URL.
The size and density of a QR code can vary, depending on the amount of data to be stored. The maximum storage capacity is approximately 7,000 numeric characters, 4,200 alphanumeric characters, 2,900 binary characters, or 1,800 kana/kanji characters. A QR code can be broken up into several smaller codes, allowing them to fit into an area in which a larger code would not fit.
QR codes have seen a rapid increase in the number and range of applications for which they are used in recent years, in part because they can be easily read by smartphones, tablets, and other mobile devices. Current QR code applications include:
- Marketing, advertising, sales, and other consumer-oriented applications.
- Both official and consumer identification by businesses and some government agencies. They are also used to encode electronic business cards and other forms of personal identification.
- Tickets, visas, and other documents required for travel identification.
- Financial transactions and banking information, particularly in connection with mobile banking.
- In industry for tracking parts and assembly-line items, and for storing both temporary and long-term information about products, tools, and processes.
- On maps and other documents to link the user to supplemental information, such as a navigational database.
The versatility, capacity, and accessibility of the QR code allow it to be used in a variety of unusual ways. QR codes have been included on artworks, stamps, money, tombstones, statues, museum exhibits, hiking trails, comic book covers, and greeting cards: just about anywhere that they can fit and serve some kind of function.
The patents for the QR code are held by Denso Wave (a subsidiary of Denso, which is in turn part of the Toyota Group), which has chosen not to exercise its patent rights, and allows use of the codes without any license requirements.
In addition to QR code reading apps, free software and web-based services for generating QR codes are readily available.
The Aztec 2D barcode resembles the DataMatrix and QR codes. It consists of a square of black and white cells (or pixels) with a locating symbol made of concentric squares directly in the center. The central area (around the square bull’s-eye) contains information about the symbol’s size, along with other encoding data. This means that it does not require a blank quiet zone or a boundary. The code also contains an internal reference grid of alternating black/white pixels at every 16th row and column.
The data is arranged in a spiral from the center out; each layer of the spiral is made up of two rings of pixels, adding four pixels to the total width. The central bull’s-eye square plus the layer of encoding and size data together form the core, which may be compact (11 X 11) or full (15 X 15). An Aztec symbol with a compact core can have as many as 4 layers. A symbol with a full core can have 32 layers, and can encode over 3,800 digits, 3,000 characters of text, or 1,900 bytes of binary data. Text can be encoded as ASCII and Latin-1; the mode of encoding can be changed at multiple points within the data.
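The layer arithmetic can be sketched as a size calculation. For compact symbols it is simply the 11-pixel core plus four pixels per layer. For full symbols, the sketch also adds the extra rows and columns taken up by the reference grid as the symbol grows; that adjustment is an assumption, chosen to match the commonly quoted size range of 19 X 19 to 151 X 151 for full symbols. The function name is hypothetical.

```python
def aztec_symbol_size(layers: int, compact: bool) -> int:
    """Width (and height) in pixels of an Aztec symbol with this many layers."""
    max_layers = 4 if compact else 32
    if not 1 <= layers <= max_layers:
        raise ValueError(f"layers must be between 1 and {max_layers}")
    if compact:
        # 11 x 11 core, and each layer adds two pixels on every side.
        return 11 + 4 * layers
    # 15 x 15 core plus layers, before counting extra reference grid lines.
    size = 15 + 4 * layers
    half = (size - 1) // 2  # pixels on one side of the center line
    # Assumption: one extra grid line per side for every 15 data pixels
    # beyond the first, which puts a line at every 16th row and column.
    extra_lines_per_side = (half - 1) // 15
    return size + 2 * extra_lines_per_side
```

Compact symbols therefore run from 15 X 15 to 27 X 27 pixels, while under the stated assumption a full symbol with all 32 layers measures 151 pixels on a side.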
The Aztec code system is public domain, and applications are available for generating codes and reading them on mobile devices. Given the similarity in design, readability, and capacity, Aztec codes could be used in many of the applications for which QR codes are becoming popular, although in practice, their use is more limited.
Aztec codes are, however, fairly common in the transportation industry. They are used on airline electronic boarding passes, and for online and mobile railway tickets in many parts of Europe.
In addition, they are used in the billing systems of several Canadian corporations, and the Polish government uses them in its car registration system.
MaxiCode is a 2D matrix barcode which looks a bit like the Aztec code, only with a round bull’s-eye center instead of a square one. A closer look shows another difference — instead of square pixels, the data is encoded in hexagonal dots which are arranged in a hexagonal pattern.
MaxiCode was designed for a specialized function — routing and tracking United Parcel Service packages — and that continues to be its main use.
Unlike the other 2D matrix codes described here, MaxiCode symbols have a fixed size (approximately 1 inch square) and a fixed amount of data that may be encoded (approximately 93 characters, depending on the data mode). As many as 8 MaxiCode symbols may be linked, or chained together.
There are five data modes in current use (as well as two obsolete modes):
- Modes 2 and 3 encode Structured Carrier Messages, which contain package shipping and routing information. Mode 2 is for domestic U.S. packages, and Mode 3 is for international shipping. These are the two modes most commonly used by UPS.
- Modes 4 and 5 are for unformatted text. Mode 4 uses standard error correction, and Mode 5 uses enhanced error correction.
- Mode 6 is used by manufacturers to program barcode scanners.
All of these modes may include a secondary message, which for UPS shipping usually contains more detailed shipping and tracking information. In Modes 4, 5, and 6, the secondary message is effectively merged with the primary message.
MaxiCode uses five code sets; a single message may switch between them repeatedly. The five code sets together include the standard ASCII character set plus most Latin-1 characters.