Semi-Structured Data
Semi-structured data lies between structured and unstructured data. Data stored in a traditional database system or an Excel sheet is structured data: it is organized in columns and rows. Unstructured data is any data or piece of information that does not fit that row-and-column model and so cannot readily be stored in a database/RDBMS. Emails, Facebook comments, and newspaper articles are examples of unstructured data.
Semi-structured data does not follow a strict data model, yet it is neither raw data nor typed data as in a traditional database system. To represent information as semi-structured data, a certain format has to be followed. We can use JSON (JavaScript Object Notation) or XML, both to represent the data and to transport it over the wire. A format-specific parser is required at the data consumer's end to retrieve the desired data from JSON or XML.
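To make the role of the parser concrete, here is a minimal sketch in Python using only the standard library. The record, its field names, and the XML element names are illustrative assumptions, not a fixed schema; the point is only that each format needs its own parser before a field can be read.

```python
import json
import xml.etree.ElementTree as ET

# The same illustrative record written in two semi-structured formats.
json_text = '{"CompanyName": "ABCD", "Location": "Bangalore"}'
xml_text = (
    "<Company>"
    "<CompanyName>ABCD</CompanyName>"
    "<Location>Bangalore</Location>"
    "</Company>"
)

# A JSON parser turns the text into a dictionary keyed by field name.
record = json.loads(json_text)
json_location = record["Location"]

# An XML parser turns the text into a tree of elements.
root = ET.fromstring(xml_text)
xml_location = root.findtext("Location")

print(json_location, xml_location)
```

Both lookups return the same value, but the consumer has to know which format it received in order to pick the right parser.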
JSON is lightweight and efficient compared to XML and is easily human-readable, but it does not map directly onto the fixed rows and columns of a traditional database system, so it cannot be stored or queried there as-is. NoSQL databases such as HBase, MongoDB, and Cassandra, as well as the Hadoop Distributed File System (HDFS), can be leveraged to store, query, and analyze it. In a typical client-server web application, the JSON format is widely used for bi-directional data interchange.
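The bi-directional interchange can be sketched as follows. This is a simplified illustration, not a real web application: the network hop is replaced by a direct function call, and the handler name and the response fields ("status", "echo") are hypothetical.

```python
import json


def handle_request(request_bytes: bytes) -> bytes:
    """Hypothetical server handler: decode a JSON request, return a JSON reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    response = {"status": "ok", "echo": request["CompanyName"]}
    return json.dumps(response).encode("utf-8")


# Client side: serialize a dictionary to JSON bytes before sending.
request_bytes = json.dumps({"CompanyName": "ABCD"}).encode("utf-8")

# The server is simulated by a direct call instead of a network round trip.
response_bytes = handle_request(request_bytes)

# Client side again: parse the JSON reply back into a dictionary.
response = json.loads(response_bytes.decode("utf-8"))
print(response)
```

In a real deployment the bytes would travel over HTTP, but the serialize and parse steps on each side stay the same.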
Here is a sample of unstructured data: "The two companies named ABCD and EFGH are located in Bangalore and Chennai respectively. ABCD is a pharmaceutical company and has 150 employees. It is a supplier of medical drugs and is associated with HDFC Bank for all business transactions. Company EFGH manufactures PVC pipes, has 300 employees, and does its financial transactions with State Bank Of India." The above information can be transformed into semi-structured data using the JSON format. It can also be persisted in a NoSQL database and transmitted over the wire as a REST service request/response.
[
  {
    "CompanyName": "ABCD",
    "Description": "pharmaceutical company",
    "Type": "Medical drugs supplier",
    "EmployNo": "150",
    "BusinessBank": "HDFC Bank",
    "Location": "Bangalore"
  },
  {
    "CompanyName": "EFGH",
    "Description": "Manufacturing company",
    "Type": "PVC Pipes",
    "EmployNo": "300",
    "BusinessBank": "State Bank Of India",
    "Location": "Chennai"
  }
]
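Once the information is in JSON form, a consumer can query it programmatically, which is not practical with the original free-text paragraph. The sketch below uses a trimmed copy of the records (only three of the fields) and a hypothetical helper, companies_with_at_least. Note that the employee count is stored as a string in the sample, so it has to be converted before a numeric comparison.

```python
import json

# A trimmed copy of the sample records, keeping three of the fields.
companies_json = """
[
  {"CompanyName": "ABCD", "EmployNo": "150", "Location": "Bangalore"},
  {"CompanyName": "EFGH", "EmployNo": "300", "Location": "Chennai"}
]
"""

companies = json.loads(companies_json)


def companies_with_at_least(records, minimum):
    """Return the names of companies whose employee count is >= minimum."""
    # EmployNo is a string in the sample, so convert it before comparing.
    return [r["CompanyName"] for r in records if int(r["EmployNo"]) >= minimum]


large = companies_with_at_least(companies, 200)
locations = {r["CompanyName"]: r["Location"] for r in companies}

print(large)
print(locations)
```

Storing the count as a JSON number (150 rather than "150") would remove the need for the conversion; the string form is kept here only to match the sample.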
The Facebook Graph API provides semi-structured data in JSON format when we query a specific node using the GET method of its REST service.
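As a rough sketch of what such a query looks like, the code below only builds the GET URL for a node and parses a response body; it does not contact Facebook. The token is a placeholder, the helper build_node_url is hypothetical, and the sample response body is made up for illustration. A real call would need a valid access token and would typically target a specific API version.

```python
import json
from urllib.parse import urlencode

GRAPH_BASE_URL = "https://graph.facebook.com"


def build_node_url(node_id, fields, access_token):
    """Hypothetical helper: build the GET URL for one Graph API node."""
    query = urlencode({"fields": ",".join(fields), "access_token": access_token})
    return f"{GRAPH_BASE_URL}/{node_id}?{query}"


# "me" refers to the user who owns the token; the token here is a placeholder.
url = build_node_url("me", ["id", "name"], "PLACEHOLDER_TOKEN")
print(url)

# Made-up response body, standing in for what the service would return.
sample_body = '{"id": "1234567890", "name": "Example User"}'
node = json.loads(sample_body)
print(node["name"])
```

The response is ordinary JSON, so the same parser used for any other semi-structured payload applies here.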
Written by
Gautam Goswami
He can be reached at [email protected] for real-time POC development and hands-on technical training, as well as for help with designing and developing any Hadoop/Big Data related task. Gautam is an advisor and an educator. Before that, he worked as a Sr. Technical Architect in different technologies and business domains across numerous countries.
He is passionate about sharing knowledge through blogs and training workshops on various Big Data technologies and related systems.