Data duplication happens all the time. It is almost inevitable when millions of records are gathered at very short intervals. A data warehouse is essentially a database, and when it loads millions of records from many other sources, unintentional duplicate records can hardly be avoided. In the data warehousing community, finding duplicate records within large databases has long been a persistent problem and has become an area of active research, with many studies addressing the duplicate contamination of data. Several approaches have been implemented to counter it. One approach is to hand-code rules that filter the data so that duplicates are caught. Other approaches apply recent machine learning techniques or more advanced business intelligence applications. The accuracy of these methods varies, and for very large data collections some of them may be too complex and too expensive to deploy at full scale.
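As a rough illustration of the hand-coded-rules approach, here is a minimal Python sketch. The field names (`name`, `email`, `city`), the cleaning rule, and the 0.9 name-similarity threshold are hypothetical choices made for the example, not a standard method; the standard library's `difflib.SequenceMatcher` supplies the fuzzy string comparison.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Hypothetical cleaning rule: lowercase every value and collapse whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}


def is_duplicate(a, b, threshold=0.9):
    """Hypothetical matching rule: same email, or very similar name in the same city."""
    a, b = normalize(a), normalize(b)
    if a["email"] == b["email"]:
        return True
    name_similarity = SequenceMatcher(None, a["name"], b["name"]).ratio()
    return name_similarity >= threshold and a["city"] == b["city"]


def find_duplicates(records, threshold=0.9):
    """Compare every pair of records and return the index pairs judged duplicates."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_duplicate(records[i], records[j], threshold):
                pairs.append((i, j))
    return pairs


# Hypothetical sample data with two disguised duplicates.
records = [
    {"name": "Jane Smith", "email": "jane@example.com", "city": "Boston"},
    {"name": "jane  smith", "email": "J.Smith@example.com", "city": "boston"},
    {"name": "John Doe", "email": "jdoe@example.com", "city": "Denver"},
    {"name": "Jon Doe", "email": "JDOE@example.com ", "city": "Austin"},
]
print(find_duplicates(records))  # [(0, 1), (2, 3)]
```

The pairwise comparison grows quadratically with the number of records, which is one reason such methods become expensive on millions of rows; large-scale systems typically compare only records that share a cheap "blocking" key (for example, the same postal code).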

Wiki User

14y ago

Disadvantages of system data duplication?

System data duplication, such as denormalization, causes excess use of storage for redundant copies, extra processing time because every copy must be updated, and possible inconsistency when duplicated data is changed in one place but not another.
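The inconsistency problem can be reproduced in a few lines with Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical; the point is that the customer's address is copied onto every row, so an update that misses some rows leaves conflicting values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalized table: the customer's address is copied onto every order row.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", "12 Oak St"), (2, "Acme", "12 Oak St"), (3, "Acme", "12 Oak St")],
)

# The customer moves, but the update only touches one of the copies.
conn.execute("UPDATE orders SET address = '99 Elm Ave' WHERE order_id = 1")

# The same customer now has two conflicting addresses.
addresses = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT address FROM orders WHERE customer = 'Acme'")
)
print(addresses)  # ['12 Oak St', '99 Elm Ave']
```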


Occurs when the same data are stored in many places?

Data duplication occurs when the same data is stored in multiple locations or systems. This can lead to inconsistencies, errors, and challenges in maintaining data integrity. Employing data normalization techniques and centralized storage systems can help reduce data duplication.


What is the main purpose of relating data between tables in a database?

The main purpose of relating data between tables in a database is to establish connections between different pieces of information, allowing for efficient querying and retrieval of data. This relationship helps to avoid data duplication and ensures data integrity by enforcing constraints and maintaining consistency across the database.
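For contrast, here is a sketch of the related-tables design using Python's `sqlite3` module: the address is stored once in a hypothetical `customers` table, and a hypothetical `orders` table refers to it through a foreign key. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
# Hypothetical schema: each customer row holds the address exactly once.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id))""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme', '12 Oak St')")
conn.executemany("INSERT INTO orders VALUES (?, 1)", [(1,), (2,), (3,)])

# One update is seen by every related order through the join.
conn.execute("UPDATE customers SET address = '99 Elm Ave' WHERE customer_id = 1")
rows = conn.execute("""SELECT o.order_id, c.address FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id ORDER BY o.order_id""").fetchall()
print(rows)  # [(1, '99 Elm Ave'), (2, '99 Elm Ave'), (3, '99 Elm Ave')]

# The constraint rejects an order that points at a customer who does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (4, 42)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```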


What is meant by Data Consolidation?

Data consolidation simply means collecting and integrating data from two or more different sources into a single, consolidated data source. This reduces inefficiencies such as data duplication and makes data easier to present, without the overhead of multiple data resources or the cost of managing separate data sources.
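A minimal consolidation sketch in plain Python, assuming each source is a list of dictionaries that share an `email` field usable as a matching key. The source names and the merge policy (the first non-empty value wins and later sources only fill gaps) are illustrative assumptions; real consolidation tools let you choose your own survivorship rules.

```python
def consolidate(*sources, key="email"):
    """Merge records from several sources into one list, one entry per normalized key.

    Assumed policy: the first non-empty value for a field wins;
    later sources only fill in fields that are missing or empty.
    """
    merged = {}
    for source in sources:
        for record in source:
            match_key = record[key].strip().lower()
            target = merged.setdefault(match_key, {})
            for field, value in record.items():
                if field not in target or target[field] in (None, ""):
                    target[field] = value
    return list(merged.values())


# Hypothetical sources: a CRM export and a billing system with overlapping customers.
crm = [{"email": "ann@example.com", "name": "Ann Lee", "phone": ""}]
billing = [
    {"email": "ANN@example.com", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob Ray"},
]
result = consolidate(crm, billing)
print(result)
```

Ann's two records collapse into one entry that carries the name from the CRM and the phone number from billing, while Bob, who appears only in billing, is added as a new entry.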


Advantages and disadvantages of relational data model?

Advantages of the relational data model include data integrity through normalization, the flexibility to query data using SQL, and relationships between entities that are easy to understand. Disadvantages can include performance issues with complex queries, the potential for data duplication across tables, and difficulty scaling to very large datasets.