Data Science

Data Integration

What is Data Integration?

Data integration is the mechanism for accessing and delivering data across an organisation. This flow of information must be consistent so that every member of the enterprise can access, use, and understand the data.

Implementing data integration brings several valuable benefits:

  • Improvement in the quality of data
  • Efficiency boost in pattern and insight analysis
  • Better data management
  • Improvement in data utilisation and integrity

Extract, Transform, and Load (ETL)

As the name suggests, an ETL tool follows a three-phase process: it extracts data from various sources, transforms that data into the required format, and finally loads the transformed data into a central repository (such as a data warehouse or data lake).
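
The three phases can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the CSV text, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the central repository.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files, APIs, or databases.
RAW_CSV = """customer,amount
 alice ,10.50
BOB,3
alice,4.5
"""


def extract(text):
    """Extract: read the raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise customer names and convert amounts to numbers."""
    return [(row["customer"].strip().lower(), float(row["amount"])) for row in rows]


def load(records, conn):
    """Load: store the already-transformed rows in the central repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse (assumption).
warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)

etl_totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(etl_totals)
```

Note that the repository only ever receives cleaned rows: the transformation happens in the pipeline, before the load.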

Extract, Load, and Transform (ELT)

Like ETL, an ELT process extracts data from different sources. It differs in that it loads the data into the central repository first and only then transforms it into the desired output. The ELT approach uses the target system’s computing power and scalability to transform the data.
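
The change in ordering can be sketched as follows. The raw rows, the `raw_sales` staging table, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the target system. The point to notice is that the raw data is loaded untouched and the clean-up is expressed as SQL that the target engine runs.

```python
import sqlite3

# Hypothetical raw rows, exactly as they arrive from the sources.
RAW_ROWS = [(" alice ", "10.50"), ("BOB", "3"), ("alice", "4.5")]


def run_elt(conn, raw_rows):
    # Load: copy the raw data, unchanged, into a staging table.
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

    # Transform: the target system's own SQL engine does the work.
    conn.execute(
        """
        CREATE TABLE sales AS
        SELECT LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM raw_sales
        """
    )
    conn.commit()


# An in-memory SQLite database stands in for the target system (assumption).
elt_warehouse = sqlite3.connect(":memory:")
run_elt(elt_warehouse, RAW_ROWS)

elt_totals = elt_warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(elt_totals)
```

Because the staging table keeps the original values, the transformation can be rewritten and re-run later without extracting the data again.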

Data Virtualisation

Data virtualisation creates a virtual representation (view) of the data, which can then be presented through dashboards or visualisation tools. Information can be retrieved and manipulated without moving the data between platforms, which in turn reduces data errors.
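
A SQL view is perhaps the simplest way to see the idea on a small scale. In the sketch below, the `orders` table and the `customer_totals` view are hypothetical, and a single SQLite database is used for brevity. Real data virtualisation platforms typically sit on top of many systems, but the principle is the same: the view stores only a query definition, never a copy of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 10.5), (2, "bob", 3.0)],
)

# The view holds a query definition only; no rows are copied anywhere.
conn.execute(
    """
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    """
)

QUERY = "SELECT customer, total FROM customer_totals ORDER BY customer"
before = conn.execute(QUERY).fetchall()

# A change in the underlying table is visible through the view immediately,
# because the view is evaluated against the source data at query time.
conn.execute("INSERT INTO orders VALUES (3, 'alice', 4.5)")
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```

A dashboard that reads from `customer_totals` always sees current figures, and there is no second copy of the data that could drift out of date.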

Data Federation

Data federation is a form of data virtualisation (the two terms are often used interchangeably) that creates a virtual database. Through data federation, multiple databases operate as one, and a single query can draw data from multiple sources.
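
As a small-scale sketch, SQLite's `ATTACH DATABASE` statement lets one connection query several separate databases as if they were one. The `crm` and `billing` databases and their tables are hypothetical. Enterprise federation tools work across different database products and networks, so this only illustrates the principle.

```python
import sqlite3

# One connection acts as the single entry point (the "virtual database").
hub = sqlite3.connect(":memory:")

# Each ATTACH creates a separate, independent in-memory database.
hub.execute("ATTACH DATABASE ':memory:' AS crm")
hub.execute("ATTACH DATABASE ':memory:' AS billing")

hub.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
hub.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
hub.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
hub.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 10.5), (1, 4.5), (2, 3.0)],
)

# A single query joins tables that live in two different databases.
federated = hub.execute(
    """
    SELECT c.name, SUM(i.amount)
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(federated)
```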

Middleware Integration

Middleware integration simplifies the connection between applications and their components. By handling communication between applications, middleware makes application development, deployment, and operation more effective.
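
One common kind of middleware is a message broker. The sketch below is a toy, in-process version written purely for illustration: the `MessageBroker` class, the topic name, and the two "applications" (plain Python lists acting as inboxes) are all hypothetical. Real middleware adds networking, persistence, and delivery guarantees.

```python
from collections import defaultdict


class MessageBroker:
    """A toy publish/subscribe broker that sits between applications."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = MessageBroker()

# Two hypothetical applications; each inbox is a simple list.
billing_inbox = []
shipping_inbox = []
broker.subscribe("order.created", billing_inbox.append)
broker.subscribe("order.created", shipping_inbox.append)

# The ordering application publishes once and knows nothing about the receivers.
delivered = broker.publish("order.created", {"order_id": 1, "customer": "alice"})
print(delivered, billing_inbox, shipping_inbox)
```

The publisher and the subscribers depend only on the broker and the topic name, so a new application can be connected without changing the existing ones.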

Data Replication

As the name suggests, data replication copies data from one source to another rather than moving it between systems. In certain scenarios, replication improves the availability and accessibility of data, as well as its consistency.
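
A snapshot-style replication can be sketched with the `backup()` method of Python's built-in `sqlite3` connections, which copies one database into another. The `primary` and `replica` databases and the `sales` table are hypothetical. Production systems usually replicate continuously rather than taking one-off snapshots.

```python
import sqlite3

# The primary (source) database with some hypothetical data.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
primary.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("alice", 10.5), ("bob", 3.0)]
)
primary.commit()

# Replicate: copy the whole primary database into the replica.
replica = sqlite3.connect(":memory:")
primary.backup(replica)

replica_rows = replica.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()

# The copy is independent: a later write to the primary does not reach the
# replica until replication runs again.
primary.execute("INSERT INTO sales VALUES ('carol', 7.0)")
primary.commit()
primary_count = primary.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
replica_count = replica.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(replica_rows, primary_count, replica_count)
```

The replica can serve reads even if the primary is unavailable, which is where the availability benefit comes from. The row counts differ at the end of the sketch, which shows why consistency depends on how often replication runs.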

