Data and Its Types - Structured, Unstructured, Semi-structured

Data and Information

A computer is primarily a machine for processing data. A computer system treats everything as data, be it instructions, pictures, songs, videos or documents. In its raw form, data consists of unorganised facts that are processed to obtain meaningful information.

Understanding the concept of data and its different types is therefore crucial to understanding how a computer works as a whole. People sometimes use the terms data, information and knowledge interchangeably, but this is incorrect: they do not mean the same thing.

A computer system has many input devices, which provide it with raw data in the form of facts, concepts, instructions, etc. Internally, everything is stored in binary form (0s and 1s), but externally, data can be input to a computer as text made up of the English letters A–Z and a–z, the numerals 0–9, and special symbols such as @ and #. Data can also be input in other languages, or it can be read from files.
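To make the idea of binary storage concrete, the following Python sketch prints the bit pattern stored for a few typed characters. It assumes the text is encoded as UTF-8, which gives the same codes as ASCII for English letters, digits and common symbols; other encodings would produce different patterns. The function name `to_binary` is hypothetical.

```python
# Sketch: the 8-bit pattern behind each character of a short text input.
# Assumption: UTF-8 encoding (identical to ASCII for A-Z, a-z, 0-9, @, #).

def to_binary(text: str) -> list[str]:
    """Return the 8-bit binary form of each byte of text encoded as UTF-8."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]

if __name__ == "__main__":
    sample = "A9@"
    # Pairing characters with bytes is valid here only because each
    # ASCII character occupies exactly one byte in UTF-8.
    for ch, bits in zip(sample, to_binary(sample)):
        print(ch, bits)
```

Running it prints `A 01000001`, `9 00111001` and `@ 01000000`: the letter, the digit and the symbol are all just different patterns of 0s and 1s inside the machine.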

Input data may come from different sources, so it may be in different formats. For example, an image is a collection of pixels, each described by its red, green and blue (RGB) values; a video is made up of frames; and a fee receipt is made up of numeric and non-numeric characters. Broadly, data is classified into three types.
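As a rough illustration of how the same program may receive data in very different shapes, the sketch below holds a tiny image, a video and a fee receipt in ordinary Python values. All values are made up, and the names `Pixel` and `pixel_count` are hypothetical; real image and video files use formats with headers and compression rather than plain nested lists.

```python
# Sketch: three differently shaped inputs held in memory (hypothetical toy values).
Pixel = tuple[int, int, int]  # (red, green, blue), each in the range 0-255

# A 2x2 image: a grid of RGB pixels.
image: list[list[Pixel]] = [
    [(255, 0, 0), (0, 255, 0)],      # red, green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]

# A video: a sequence of frames, where each frame is itself an image.
video: list[list[list[Pixel]]] = [image, image, image]

# A fee receipt: a mix of numeric and non-numeric characters.
receipt = "Receipt No. 1045, Fee paid: Rs 2500"

def pixel_count(img: list[list[Pixel]]) -> int:
    """Count the pixels in an image by summing the length of each row."""
    return sum(len(row) for row in img)

if __name__ == "__main__":
    print("pixels per frame:", pixel_count(image))
    print("frames in video:", len(video))
    print("receipt text:", receipt)
```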

(A) Structured Data 

Data that follows a strict record structure and is easy to comprehend is called structured data. Because it has a pre-specified tabular format, such data may be stored in a data file for future access.

Table 1.3 shows structured data related to monthly attendance of students maintained by the school.

Table 1.3: Structured data (monthly attendance of students)

Such data is organised in rows and columns and is easy to understand. Structured data can be sorted in ascending or descending order; in this example, the attendance data is sorted in ascending order on the column ‘month’. Other examples of structured data include sales transactions, online railway ticket bookings and ATM transactions.
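Because every record shares the same columns, a program can sort structured data simply by naming a column. The sketch below assumes Table 1.3 has the columns Name, Class, Month and Attendance (the same fields used later for semi-structured data), stores the month as a number so that ascending order is also calendar order, and uses made-up values; `sort_by` is a hypothetical helper.

```python
from operator import itemgetter

# Assumed column layout; every record follows it in the same order.
COLUMNS = ("Name", "Class", "Month", "Attendance")

# Hypothetical attendance records (Month stored as 1-12).
records = [
    ("Ravi", "XI", 3, 22),
    ("Ravi", "XI", 1, 20),
    ("Ravi", "XI", 2, 18),
]

def sort_by(rows, column, descending=False):
    """Sort rows on a named column; this works because all rows share one layout."""
    index = COLUMNS.index(column)
    return sorted(rows, key=itemgetter(index), reverse=descending)

if __name__ == "__main__":
    print(" | ".join(COLUMNS))
    for row in sort_by(records, "Month"):
        print(" | ".join(str(value) for value in row))
```

If months were stored as names such as "January", an alphabetical sort would not give calendar order, which is why the sketch keeps them numeric.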

(B) Unstructured Data 

Data that is not organised in a pre-defined record format is called unstructured data. Examples include audio and video files, graphics, text documents, social media posts and satellite images.

Figure 1.10 shows a report card with monthly attendance record details sent to parents. 

Such data is unstructured because it consists of textual content as well as graphics, neither of which follows a specific format.

Figure 1.10: Unstructured data
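The difficulty with unstructured data shows up as soon as a program tries to use it. In the hypothetical sketch below, a report-card remark contains attendance figures, but there is no field named Attendance to look up, so the program can only search the text for numbers and still cannot tell which one is which. The remark and the helper `find_numbers` are illustrative assumptions.

```python
import re

# Hypothetical report-card remark: free text with no columns or tags.
remark = "Dear parent, Ravi attended 20 of 22 working days in January and did well."

def find_numbers(text: str) -> list[int]:
    """Return every run of digits in the text, in order of appearance."""
    return [int(match) for match in re.findall(r"\d+", text)]

if __name__ == "__main__":
    numbers = find_numbers(remark)
    # Both values are found, but nothing in the text marks which is the attendance.
    print(numbers)
```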

(C) Semi-structured Data 

Data that has no well-defined structure but uses internal tags or markings to separate data elements is called semi-structured data. Examples include email documents, HTML pages and comma-separated values (CSV) files.
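In a CSV file, the markings are the commas that separate one data element from the next, usually together with a first line that names the fields. Python's standard `csv` module can read such text into dictionaries keyed by that first line. The file contents below are made up, and `read_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical CSV contents: the first line names the fields, commas separate values.
CSV_TEXT = "Name,Month,Attendance\nRavi,January,20\nMeena,January,\n"

def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dictionary per record, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))

if __name__ == "__main__":
    for row in read_rows(CSV_TEXT):
        # An empty string means no value was given between the commas.
        print(row["Name"], row["Month"], row["Attendance"] or "not recorded")
```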

Figure 1.11 shows an example of semi-structured data containing students' month-wise attendance details.

In this example, the attendance records do not follow a fixed format. Instead, each data value is preceded by a tag (Name, Month, Class, Attendance) that tells the program how to interpret that value during processing.

Figure 1.11: Semi-structured data
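A program can process tagged records like those in Figure 1.11 by using each tag as a key rather than relying on a fixed position. The sketch below assumes a simple `Tag: value` layout with fields separated by semicolons; the actual layout in Figure 1.11 may differ, the values are made up, and `parse_record` is a hypothetical helper.

```python
# Hypothetical semi-structured records: each value is preceded by its tag,
# but records need not list tags in the same order or include all of them.
RAW = [
    "Name: Ravi; Month: January; Class: XI; Attendance: 20",
    "Month: February; Name: Ravi; Attendance: 18; Class: XI",
    "Name: Meena; Class: XI; Month: January",  # no Attendance tag
]

def parse_record(line: str) -> dict[str, str]:
    """Split a record into tag/value pairs, using each tag as the dictionary key."""
    record = {}
    for part in line.split(";"):
        tag, _, value = part.partition(":")
        record[tag.strip()] = value.strip()
    return record

if __name__ == "__main__":
    for line in RAW:
        rec = parse_record(line)
        # Look values up by tag, so field order in the record does not matter.
        print(rec.get("Name"), rec.get("Month"), rec.get("Attendance", "not recorded"))
```

Because values are found by tag, the second record parses correctly even though its fields appear in a different order, and the third simply lacks an Attendance entry.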