DATA ENCODING BASED ON TRICOLOR MATRIX BARCODES

encode data but to compress it. Due to data compression, more information can be represented while keeping the same dimensions of the graphically coded symbol. Depending on the parameters, especially the digital capacity of the barcode symbol, compression can reach 20-25%.


Introduction
Matrix, or 2D, barcodes have been used effectively for decades to make data input into a computer system faster and more secure (both error-free and protected from third parties), but the popularity of 2D barcodes in general, and QR codes in particular, has increased significantly since the smartphone era began. Multiple use cases have appeared in different fields of human activity, such as ID verification, shipping and receiving, mobile ticketing, and document tracking [1]. These are only a few examples; new use cases keep appearing and, with them, new problems. One subject of particular interest is encoding more information in a barcode. This can easily be achieved by extending the area of the graphical barcode symbol; however, that may be an inappropriate solution for use cases where the size of the symbol is critical (e.g., barcodes on microcircuits).
Barcoding methods and barcodes themselves are studied by many researchers. In [2], the authors propose a new approach to color barcode decoding that does not require a reference color palette. According to their algorithm, groups of color bars are decoded at once, exploiting the fact that joint color changes can be represented by a low-dimensional space.
The authors of [3] present an approach for localization and segmentation of a 2D color barcode when it is read using computer vision techniques. They develop a progressive strategy to achieve high accuracy in diverse scenarios and computational efficiency.
In [4], a visible light communication (VLC) system for off-the-shelf smartphones, called COBRA, is presented. The system encodes information into specially designed 2D color barcodes. The authors developed a new barcode for COBRA that is optimized for streaming between the small screens and low-speed cameras of smartphones.
The authors of the patent [5] propose storing information decoded from a barcode as character-based data in an auxiliary field, e.g., a comment field, of an image file.
In [6], the authors propose High Capacity Colored QR codes (HCC2D) as an alternative to standard QR codes. The main idea developed in the paper is to create a new 2D code that increases the space available for data while preserving similar robustness and error correction and without losing compatibility with the original QR standard. The authors compare their approach to Microsoft's High Capacity Color Barcode (HCCB) described in [7]. The paper shows that the HCC2D approach leads to a larger data density than QR at the price of a small computational overhead. Although the data density is slightly lower than in HCCB, HCC2D does not suffer from problems in the detection and alignment of the 2D code.
The author of [8] proposes a method of generating and decoding a two-dimensional color barcode that includes a black-and-white configuration block, which encodes configuration information about the barcode, and a plurality of color data blocks, which encode data.
A method of high capacity color barcodes generation is proposed in [9]. This method operates by embedding independent data in two different printer colorant channels via halftone-dot orientation modulation.
In [10], the authors present a system and method for encoding and decoding data in a color barcode pattern using dot orientation and color separability. The authors assert that the method is robust against interseparation misregistration with a small symbol error rate.
The authors of [11] present a prototype for generating and reading the HCC2D code format on both PC (Linux and Windows platforms) and mobile phones (Android platform). The experimental results considering different operating scenarios and data densities in comparison with 2-dimensional barcodes are provided.
Thus, although there are many barcoding solutions, the problem of improving data representation in barcode form is still relevant and, among other issues, requires new approaches to data compression.

Problem Statement
The subject of this study is the barcoding of data with compression.
A possible way to increase the amount of encoded information while preserving the area of the graphical symbol is to add more colors to the barcode, making it multicolor instead of black-and-white.
However, using an RGB palette raises several problems. First, it perceptibly increases the overall computational complexity, as each of the RGB matrices has to be processed. Second, multicolor barcodes can be used effectively mainly in digital form. Modern printers are highly advanced, yet color rendering can still be a problem: particular colors can be distorted when printed, with the result that a barcode scanner processes the encoded data incorrectly.
A grayscale tricolor 2D barcode, or Black-Gray-White barcode (BGW barcode), solves the problem of representing multicolor barcodes in printed form. Because it uses grayscale, a barcode consisting of black, white and gray cells can be easily printed in grayscale printing mode. In this case, the barcode scanner analyzes the intensity of gray, which helps to minimize the possibility of a scanning error.
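The intensity-based reading described above can be illustrated with a minimal sketch. The function name `classify_cell` and the threshold values 85 and 170 are assumptions made for illustration only; a real scanner would calibrate its thresholds to the printer and the lighting conditions. The digit assignment (white 0, gray 1, black 2) follows the convention used later in this paper.

```python
def classify_cell(intensity: int, low: int = 85, high: int = 170) -> int:
    """Map an 8-bit gray intensity (0 = black, 255 = white) to a ternary digit.

    The thresholds `low` and `high` are hypothetical; they split the
    intensity range into three roughly equal bands.
    """
    if not 0 <= intensity <= 255:
        raise ValueError("intensity must be in the range 0..255")
    if intensity >= high:
        return 0  # white cell
    if intensity >= low:
        return 1  # gray cell
    return 2      # black cell


# Slightly distorted print intensities still fall into the expected bands.
scanned = [250, 131, 12, 240]
digits = [classify_cell(v) for v in scanned]
print(digits)
```

With only three widely separated intensity bands, a moderate printing distortion leaves the decoded digit unchanged, which is the motivation for preferring grayscale over a full color palette.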
In this paper, we propose a data barcoding method, based on the grayscale tricolor 2D barcode, that makes it possible to encode more information while preserving the area of the graphical symbol.

Data BGW Barcoding Method
Let us define some fundamental notions. A 2D barcode symbol is a set of barcode patterns that are densely arranged on a carrier in the form of a matrix. A barcode pattern is a graphical representation of an s-digit ternary sequence of symbols.
A barcode pattern consists of s elements which are, physically, matrix cells. Each cell can be painted in one of 3 colors; let us label them white, gray and black. Consequently, the maximum capacity of a 2D barcode symbol (i.e., the number of all possible barcode patterns) is $V_{\max} = 3^s$ barcode patterns. Let us consider 2 types of barcode patterns: informational barcode patterns and auxiliary barcode patterns. Informational patterns are used to represent the incoming alphanumeric sequence of symbols. Auxiliary patterns are used to switch between modes and to indicate START and STOP symbols, scanner commands, etc.
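The symbology $\Omega$ of a barcode is the set of all possible barcode patterns at a fixed s. Let $P_\Omega = 3^s$ denote its cardinality, so that $P_\Omega = P_\Omega^{inf} + P_\Omega^{aux}$, where $P_\Omega^{inf}$ and $P_\Omega^{aux}$ are the numbers of informational and auxiliary barcode patterns, respectively.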

A. Theoretical Background
Let the entering alphanumeric sequence be

$$t_1 t_2 \ldots t_N, \qquad (1)$$

where $t_i \in \mathrm{ASCII}$ is an element of the entering sequence. Let us also represent the ASCII set as

$$\mathrm{ASCII} = L \cup D \cup C,$$

where L is the set of letters, D is the set of digits, and C is the set of special symbols. The sequence divides into adjacent subsequences, each consisting of elements that belong to a single ASCII subset. Thus, the entering sequence takes the form

$$w_1 w_2 \ldots w_k,$$

where $w_j$ is a subsequence of the entering sequence that contains elements of only one of the ASCII subsets. The subsequences $w_1, w_2, \ldots, w_k$ can appear in the entering sequence in any order.
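The division of the entering sequence into subsequences can be sketched as follows. This is only an illustration: the helper names `subset_of` and `split_into_subsequences` are hypothetical, and the sets L, D and C are approximated here by Latin letters, decimal digits and all remaining characters.

```python
from itertools import groupby
import string


def subset_of(ch: str) -> str:
    """Return the label of the ASCII subset (L, D or C) a character belongs to."""
    if ch in string.ascii_letters:
        return "L"
    if ch in string.digits:
        return "D"
    return "C"


def split_into_subsequences(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters from the same subset."""
    return [(label, "".join(run)) for label, run in groupby(text, key=subset_of)]


parts = split_into_subsequences("AB12, CD")
print(parts)
```

Each tuple holds the subset label and one subsequence; consecutive tuples always carry different labels, since adjacent characters of the same subset are merged into one run.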
Let us consider a mathematical model of transforming the entering sequence into a compressed sequence.
Each subsequence $w_j$ is considered as an n-digit vector in a number system with base $P_A$, where $P_A$ is the cardinality of the alphabet A to which its elements belong. After compression, this subsequence is transformed into an m-digit vector in a number system with base $P_\Omega^{inf}$, so that the compressed sequence (2) is obtained. The process of compressing a sequence of n adjacent symbols that belong to the alphabet A with cardinality $P_A$ reduces to transforming these symbols into m barcode patterns, which are symbols of the alphabet $\Omega^{inf}$ of informational barcode patterns with cardinality $P_\Omega^{inf}$:

$$n \to m, \qquad (3)$$

where n is the number of symbols from A and m is the number of symbols from $\Omega^{inf}$.
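Thus, the transformation (3) is a conversion of a number from one number system to another. It can be represented as follows:

$$\sum_{i=1}^{n} \alpha_i P_A^{\,n-i} = \sum_{i=1}^{m} \beta_i \left(P_\Omega^{inf}\right)^{m-i},$$

where $\alpha_i$ are the codes of the symbols from A and $\beta_i$ are the codes of the symbols from $\Omega^{inf}$, the alphabet of informational barcode patterns.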
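The conversion between number systems can be sketched in a few lines. The function names are hypothetical, and the digit order (most significant digit first) is an assumption, since the paper only states that a number is converted from one base to another. The values 29 and 70 are the bases used in the practical example of this paper.

```python
def digits_to_int(digits: list[int], base: int) -> int:
    """Interpret a list of digits (most significant first) in the given base."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is out of range for base {base}")
        value = value * base + d
    return value


def int_to_digits(value: int, base: int, length: int) -> list[int]:
    """Write value as exactly `length` digits in the given base."""
    digits = []
    for _ in range(length):
        value, remainder = divmod(value, base)
        digits.append(remainder)
    if value:
        raise ValueError("value does not fit into the requested number of digits")
    return digits[::-1]


def convert(digits: list[int], base_from: int, base_to: int, m: int) -> list[int]:
    """Convert an n-digit number in base_from into an m-digit number in base_to."""
    return int_to_digits(digits_to_int(digits, base_from), base_to, m)


# Five symbol codes in base 29 become four pattern codes in base 70.
compressed = convert([1, 2, 3, 4, 5], 29, 70, 4)
print(compressed)  # [2, 14, 58, 43]
```

The conversion is lossless: applying `convert` in the opposite direction with the original length restores the five input codes.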
In this case, the entering sequence (1) is compressed into the resulting sequence (2). Hence, the aim is to find, for a fixed cardinality $P_\Omega^{inf}$ of the informational alphabet, a cardinality $P_A$ that guarantees maximal compression when a sequence of symbols is transformed into a 2D barcode symbol.
To assess the degree of compression, let us calculate the ratio of the entering sequence length to the compressed sequence length. Hereinafter this ratio is termed the compression coefficient.
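As an illustration, the compression coefficient of a single block can be estimated as n / m, where m is the smallest number of barcode patterns able to hold n symbols. This per-block view is an assumption; the paper defines the coefficient for whole sequences, which also contain auxiliary patterns. The function names are hypothetical, and the numeric values (alphabet cardinalities 29 and 12, 70 informational patterns) are taken from the practical example of this paper.

```python
def min_patterns(p_a: int, n: int, p_inf: int) -> int:
    """Smallest m such that p_inf**m >= p_a**n, i.e. m patterns can hold n symbols."""
    needed = p_a ** n
    m, capacity = 0, 1
    while capacity < needed:
        capacity *= p_inf
        m += 1
    return m


def compression_coefficient(p_a: int, n: int, p_inf: int) -> float:
    """Ratio of the block length n to the number of patterns that encode it."""
    return n / min_patterns(p_a, n, p_inf)


print(min_patterns(29, 5, 70), compression_coefficient(29, 5, 70))
print(min_patterns(12, 5, 70), compression_coefficient(12, 5, 70))
```

The smaller the alphabet, the fewer patterns a block of the same length needs, which is why splitting the data into several small alphabets can pay off despite the cost of switch marks.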

B. BGW 2D Barcode Construction
As stated at the beginning of the section, a barcode pattern is a graphical representation of an s-digit ternary sequence of symbols, and the maximum capacity of a 2D barcode symbol is the number of all possible barcode patterns, $V_{\max} = 3^s$. Table 1 shows the correspondence between the value of the parameter s and the maximal capacity of a 2D barcode symbol.
It is inadvisable to consider barcode patterns with $s < 4$, as such 2D barcode symbols have too small a capacity to be applied to real-life problems.
Thus, $s = \overline{4, 8}$ will be of practical value in further research. Fig. 1 shows possible ways to represent a barcode pattern graphically depending on the value of s.
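The relation between s and the maximal capacity can be checked directly. The sketch below computes $3^s$ and, as a cross-check, enumerates every ternary pattern of length s; the function names are hypothetical.

```python
from itertools import product


def max_capacity(s: int) -> int:
    """Number of distinct barcode patterns made of s three-color cells."""
    return 3 ** s


def all_patterns(s: int) -> list[tuple[int, ...]]:
    """Enumerate every s-digit ternary sequence (0 = white, 1 = gray, 2 = black)."""
    return list(product(range(3), repeat=s))


for s in (4, 8):
    print(s, max_capacity(s), len(all_patterns(s)))
```

For s = 4 this gives 81 patterns and for s = 8 it gives 6561 patterns.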
The procedure of transforming an entering alphanumeric sequence into a 2D barcode symbol consists of the following stages:
I. Transformation of symbols from the entering sequence into a sequence of codes of the corresponding symbols. At this stage, each symbol of the alphanumeric sequence is replaced by a code equal to the order number of this symbol in the alphabet. As a result, we obtain a sequence of order numbers (Fig. 2).
II. Transformation of the number obtained at the first stage from the number system with base $P_A$ to the number system with base $P_\Omega^{inf}$ (the number of informational barcode patterns). Applying the transformation (6), we obtain a number in the base-$P_\Omega^{inf}$ system (Fig. 3) at the second stage.
III. Transformation of the number in the base-$P_\Omega^{inf}$ system into a ternary number. At the third stage, we transform each digit of the base-$P_\Omega^{inf}$ number into the ternary system and obtain the ternary number (Fig. 4).
IV. Representation of the ternary number as a barcode pattern. At the fourth stage, each ternary digit is represented by a cell of the corresponding color: white for digit 0, gray for digit 1, and black for digit 2. At the end of this process, we obtain a barcode symbol representing the entering data in a graphically coded form (Fig. 5).
Thus, the initial alphanumeric data is transformed into a barcode through a sequence of consecutive steps.
Multiple barcode patterns obtained from the initial alphanumeric sequences of characters form a 2D barcode symbol which can afterwards be placed on a physical object.
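The stages listed above can be sketched end to end for a single block of symbols. Several details here are assumptions made for illustration: the 29-symbol alphabet (26 Latin letters plus space, comma and period) is hypothetical, digits are taken most significant first, and informational patterns are assumed to occupy the pattern numbers 0 to $P_\Omega^{inf} - 1$. The parameter values (70 informational patterns, m = 4, s = 4) follow the practical example of this paper.

```python
import string

# Hypothetical 29-symbol alphabet: 26 Latin letters plus space, comma, period.
ALPHABET = string.ascii_uppercase + " ,."
COLORS = ("white", "gray", "black")  # colors of ternary digits 0, 1, 2


def encode_block(block, alphabet=ALPHABET, p_inf=70, m=4, s=4):
    """Encode one block of symbols as m barcode patterns of s colored cells."""
    base = len(alphabet)
    # Stage I: replace each symbol by its order number in the alphabet.
    codes = [alphabet.index(ch) for ch in block]
    # Stage II: convert the number from base len(alphabet) to base p_inf.
    value = 0
    for code in codes:
        value = value * base + code
    patterns = []
    for _ in range(m):
        value, remainder = divmod(value, p_inf)
        patterns.append(remainder)
    if value:
        raise ValueError("block does not fit into m barcode patterns")
    patterns.reverse()
    # Stage III: write each pattern number as s ternary digits.
    ternary = []
    for pattern in patterns:
        digits = []
        for _ in range(s):
            pattern, remainder = divmod(pattern, 3)
            digits.append(remainder)
        ternary.append(digits[::-1])
    # Stage IV: paint each ternary digit with its color.
    return [[COLORS[d] for d in digits] for digits in ternary]


cells = encode_block("HELLO")
for row in cells:
    print(row)
```

Five letters are encoded as four barcode patterns of four cells each; decoding performs the same conversions in reverse order.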

C. Practical Implementation
Let us consider an example of the proposed method implementation.
Many public and private offices keep information about their customers, or patients in the case of medical institutions. Generally, this information contains such personal data as full name, date of birth, passport number, and residential address. To prevent third parties from accessing these data, they can be transformed into a 2D barcode symbol by following the steps stated in the previous subsection.
As this is quite a small amount of information, there is no need to use large s values, which are more appropriate for encoding, for instance, a medical history or any other body of data. Hence, let $s = 4$ in this example. Then $P_\Omega^{inf} = 70$, since $P_\Omega = 3^4 = 81$ and the number of auxiliary barcode patterns $P_\Omega^{aux}$ has been chosen to be equal to 11 in this case. As a pre-step for barcoding, we need to determine the alphabets that will be used when encoding the entering sequence of symbols. To do this, the inequality system (7) must be solved for integer values only. When $s = 4$, we obtain a set of solutions. The most useful integer solutions are presented in Table 2.
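The search for suitable alphabet cardinalities can be sketched as follows. The inequality system (7) is not reproduced in this section, so the sketch assumes that it reduces to the condition for lossless conversion, $P_A^n \le (P_\Omega^{inf})^m$ with $n > m$; the function name is hypothetical.

```python
def max_alphabet_cardinality(n: int, m: int, p_inf: int) -> int:
    """Largest integer P_A such that P_A**n <= p_inf**m."""
    limit = p_inf ** m
    p_a = 1
    while (p_a + 1) ** n <= limit:
        p_a += 1
    return p_a


s, p_aux = 4, 11
p_inf = 3 ** s - p_aux  # 81 patterns in total, 11 of them auxiliary

# Largest alphabets for which n symbols fit into m < n barcode patterns.
for n in range(2, 7):
    for m in range(1, n):
        print(f"{n} -> {m}: P_A <= {max_alphabet_cardinality(n, m, p_inf)}")
```

Under this assumption, the transformation 5 → 4 admits alphabets of up to 29 symbols and the transformation 5 → 3 admits alphabets of up to 12 symbols, which agrees with the bases 29 and 12 used later in this example.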
As we are going to encode Latin symbols, digits and some punctuation symbols (including the space), the least possible cardinalities of these alphabets are 26, 10 and 4, respectively. However, if we analyze the possible subsequences in these particular textual data, it is evident that punctuation symbols can be included in the Latin and digit alphabets to avoid the need to switch between alphabets because of a single non-letter symbol. Hence, we need to look for alphabets whose cardinality covers those punctuation symbols. In general, these alphabets are defined for each specific area of application (e.g., healthcare, public service, banking). In areas where large amounts of information must be stored and, consequently, encoded, there could be a set of at least 3 alphabets, namely Latin letters, digits and punctuation symbols, including special characters. In some use cases, both standard and extended ASCII can also be added.
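In the final analysis, we choose two alphabets to encode the data: alphabet L (Latin letters and punctuation symbols) with cardinality 29 and alphabet D (digits and punctuation symbols) with cardinality 12.
Once the pre-step is done, we follow the 4 stages of transforming data into a 2D barcode symbol that were stated in the previous subsection.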
At the first stage, the entering sequence is split into a set of subsequences of 5 symbols, as the transformations 5 → 4 and 5 → 3 are used. During this division, as soon as we reach the first digit, which does not belong to alphabet L, we insert a switch mark $S_D$ and switch to alphabet D. Analogously, as soon as we reach a symbol that does not belong to alphabet D, we insert a switch mark $S_L$ and switch back to alphabet L.
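The splitting rule with switch marks can be sketched as follows. The contents of the two alphabets are hypothetical (only their cardinalities, 29 and 12, come from this example), as are the function name, the choice of L as the starting alphabet, and the decision to leave short blocks unpadded.

```python
import string

# Hypothetical alphabets with the cardinalities used in this example.
ALPHABETS = {
    "L": string.ascii_uppercase + " ,.",  # 29 symbols
    "D": string.digits + " .",            # 12 symbols
}


def split_with_switch_marks(text: str, block: int = 5) -> list[tuple[str, str]]:
    """Split text into blocks of up to `block` symbols, inserting switch marks."""
    result = []
    mode, current = "L", ""  # assumption: encoding starts in alphabet L
    for ch in text:
        if ch not in ALPHABETS[mode]:
            other = "D" if mode == "L" else "L"
            if ch not in ALPHABETS[other]:
                raise ValueError(f"symbol {ch!r} is in neither alphabet")
            if current:  # close the unfinished block of the old alphabet
                result.append((mode, current))
                current = ""
            mode = other
            result.append(("mark", "S_" + mode))
        current += ch
        if len(current) == block:
            result.append((mode, current))
            current = ""
    if current:
        result.append((mode, current))
    return result


blocks = split_with_switch_marks("JOHN DOE 1985")
print(blocks)
```

Because the space belongs to both hypothetical alphabets, it never triggers a switch on its own; the alphabet changes only when a symbol outside the current alphabet appears.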
Each symbol in the set is replaced with the corresponding code, which is the order number of this symbol in the alphabet.
At the second stage, each subsequence is transformed from the number system with base 29 (for alphabet L) or base 12 (for alphabet D) to the number system with base 70.
Finally, the sequence obtained above is represented in the form of a 7 × 10 matrix, where white represents 0, gray represents 1 and black represents 2 (Fig. 6). Thus, the entering alphanumeric sequence containing private data has been transformed into a 2D barcode symbol. Its degree of compression is equal to 17%. Taking into account the relatively small amount of initial data, we consider the obtained compression satisfactory.
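The final representation step can be sketched by arranging a stream of ternary digits into a matrix and writing it out as a plain-text PGM (P2) grayscale image. The function names, the gray level 128 for the gray cell, the padding of an incomplete last row with white cells, and the sample digit stream are all assumptions made for illustration; they do not reproduce the data of Fig. 6.

```python
GRAY_LEVELS = {0: 255, 1: 128, 2: 0}  # white, gray, black as 8-bit intensities


def to_matrix(digits: list[int], columns: int) -> list[list[int]]:
    """Arrange ternary digits row by row; pad the last row with 0 (white)."""
    padded = digits + [0] * (-len(digits) % columns)
    return [padded[i:i + columns] for i in range(0, len(padded), columns)]


def to_pgm(matrix: list[list[int]]) -> str:
    """Render the matrix as a plain-text PGM (P2) image, one pixel per cell."""
    height, width = len(matrix), len(matrix[0])
    rows = [" ".join(str(GRAY_LEVELS[d]) for d in row) for row in matrix]
    return f"P2\n{width} {height}\n255\n" + "\n".join(rows) + "\n"


# Hypothetical digit stream of 70 cells arranged as 7 rows of 10 cells.
sample = [i % 3 for i in range(70)]
matrix = to_matrix(sample, columns=10)
print(to_pgm(matrix))
```

The resulting text can be saved to a `.pgm` file and opened in an image viewer; for printing, each cell would normally be scaled up to a block of several pixels.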

A. Enhanced Information Input
Initially, barcodes were used for quick and error-free input of information into a computer system. Today, barcodes in general, and QR codes specifically, are widely used in advertising as an alternative to a URL, making access to a website faster and more accurate, especially when the web address is long and complex.
Quick and error-free data input is particularly vital in automated fields of activity, such as manufacturing. Even though automation has embraced most industrial processes, people still take an active part in controlling them by entering specific commands into a computer system. The BGW barcode holds larger amounts of information in a graphical symbol, so that more computer instructions can be represented by a single symbol.
Similarly to the data protection approach, sets of commands for industrial equipment are presented in the form of BGW barcodes. The printed barcodes form a so-called command sheet (Table 3). Such a command sheet might contain independent commands, command sets or executable code, depending on the field of application. For instance, the command sheet approach can be used for installing drivers and software or for customizing equipment for specific purposes. A command sheet can be placed on the equipment, so that technical personnel can easily read it with a scanner and enter the commands into a computer system instead of using a keyboard and control keys.
The proposed information technology for enhanced information input in the industrial area is presented in Fig. 7.
Such an approach ensures quick and error-free input of complex data or sets of instructions, preventing incorrect functioning of the technical system.

B. Textual Data Protection
Nowadays, paperless information technologies are widely used. In many use cases, information exists only in electronic form. Nevertheless, printed textual documents are still in active use, and it is very likely that they will keep their role for a long time.
Information presented as a printed textual document is frequently confidential. Names, residential addresses, birthdates, marital status: all these data should be kept secure. However, protecting a printed document from unauthorized access is much harder than ensuring information security in a computer system, where a document security management system can be used and both cryptography and steganography can be applied [12,13].
Usually, the protection of printed documents is ensured by organizational procedures that protect information from unauthorized persons by limiting their physical presence in offices, as well as by labeling documents with a 'for official use only' mark. However, this approach cannot guarantee data protection.
The proposed approach to textual data protection is based on the fact that the human brain is able neither to recognize nor to memorize data presented as a barcode. It is supposed that a textual document protected according to this approach can be visualized and perceived by a user only on a PC screen. When the document is printed, it is presented as a 2D barcode. Thus, the information technology we propose consists of the following (Fig. 8).
A computer system used for preparing textual documentation includes a PC, a scanner, and a printer, along with special software.
A user creates a document on the PC; the document can contain text in a certain language, e.g., English. The document can be saved to local data storage and opened at any time on this PC. When the document is opened, it is visualized in the usual way. However, when the user prints the document, it is first converted into a 2D barcode, and only after this transformation can it be printed on paper. The printing of non-barcoded data is forbidden by the special software pre-installed on the PC.
To see the printed textual document in a readable form, the user must scan the document. As a result of scanning, the text is displayed in its original form on the PC screen.

Conclusions
The proposed BGW barcoding method enables compact representation of textual data. One of the benefits of the proposed method lies in the possibility not only to encode data but also to compress it. Due to data compression, we can encode more information while keeping the same dimensions of the graphically coded symbol.
In the example demonstrated above, the degree of compression comes to 17%. Depending on the parameters, especially the digital capacity of the barcode symbol, compression can be increased up to 20-25%.
The proposed method has a wide range of applications. In particular, it can be used for enhanced information input. Another promising application area is textual data protection. In this regard, we propose an information technology that makes it possible to prevent unauthorized access to printed documents of a confidential nature. The data barcoding method proposed in this paper protects private data by transforming it from its initial textual representation into the graphical form of a 2D barcode symbol.
The BGW barcoding method proposed in this paper has potential for further research. As any alphabet can be defined for encoding an alphanumeric sequence of symbols, data in any language could be protected by transforming it into a 2D barcode symbol. Thus, software developed on the basis of the data barcoding method can be easily extended to any language, including non-Latin-based ones, such as Korean, Japanese, Chinese, and Georgian.