Syncsort, a global provider of data liberation, integrity and integration solutions for analytics, announced new capabilities in its mainframe data access and integration solution. The new capabilities populate Hadoop data lakes with changes made to mainframe data.
The new DMX Change Data Capture (DMX CDC) functionality delivers real-time data replication, enabling organisations to keep Hadoop continually in sync with changes made on the mainframe and to ensure the most current information is available in the Hadoop data lake for analytics.
“Many organisations are using our industry-leading big data integration solution, DMX-h, to quickly and efficiently populate their data lakes with enterprise-wide data, including complex data from the mainframe for a variety of use cases, such as Hadoop as a Service, Data as a Service, data archive in the Cloud, fraud detection, anti-money laundering and Customer 360,” said Tendü Yoğurtçu, chief technology officer, Syncsort.
“After populating the data lake, it is very important to keep that data fresh to power real-time analytics and accurate decisions based on up-to-date information. Our new CDC offering provides our customers with an easy-to-use, highly efficient solution for ensuring the data lake is refreshed in real-time with the incremental updates, while meeting SLAs and conserving network resources.”
Syncsort is already running in many large production deployments that access data from traditional enterprise systems and integrate it with Hadoop, including DataFunnel, its single-click solution for populating the data lake with data from thousands of tables and automatically creating the corresponding metadata in Hadoop.
The new CDC capabilities extend this with real-time data replication, reducing the network load and providing up-to-the-minute mainframe data for analytics by:
- Speeding and simplifying the process of synchronising mainframe data with Hadoop, on-premises or in the cloud
- Saving time and resources using the DMX-h GUI and dynamic optimisations, which eliminate the need for coding and tuning
- Eliminating impact on mainframe database performance by avoiding database triggers
- Affording reliable data transfer – even during loss of mainframe-to-Hadoop connections or Hadoop cluster failures – picking up where the transfer stopped without restarting the entire process
- Saving money with virtually no use of chargeable mainframe CPU resources
- Supporting IBM DB2 for z/OS and IBM z/OS VSAM files, with more sources to come
- Enabling queries on the most current data via rapid updates of Hive table data and statistics
- Handling all enterprise Hive file formats including text, ORC, Avro and Parquet
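To make the capabilities above concrete, the sketch below illustrates the general change-data-capture pattern the article describes – applying an ordered stream of inserts, updates and deletes to a target, with a checkpoint so an interrupted transfer resumes where it stopped rather than restarting. This is not DMX CDC's actual API (the product is GUI-driven and its internals are not public); every name here is an illustrative assumption, with a plain dictionary standing in for the Hadoop/Hive target.

```python
# Conceptual CDC apply loop with checkpointed resume.
# NOTE: illustrative only -- not DMX CDC's API. All names are assumptions;
# a dict stands in for the Hive/Hadoop target, and the checkpoint would be
# stored durably in a real pipeline.

def apply_changes(target, changes, checkpoint):
    """Apply ordered change records to `target`, skipping anything at or
    before the saved checkpoint so a restart resumes mid-stream."""
    last_seq = checkpoint.get("last_seq", 0)
    for change in changes:
        if change["seq"] <= last_seq:
            continue  # already applied before the interruption
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]      # upsert the changed row
        elif op == "delete":
            target.pop(key, None)            # remove the deleted row
        checkpoint["last_seq"] = change["seq"]
    return target

# Simulate a connection drop after the second change, then resume:
changes = [
    {"seq": 1, "op": "insert", "key": 101, "row": {"acct": 101, "bal": 50}},
    {"seq": 2, "op": "update", "key": 101, "row": {"acct": 101, "bal": 75}},
    {"seq": 3, "op": "delete", "key": 101},
]
target, ckpt = {}, {}
apply_changes(target, changes[:2], ckpt)  # transfer interrupted here
apply_changes(target, changes, ckpt)      # resumes at seq 3, no full reload
```

Because only the incremental changes travel and replay starts from the checkpoint, the full data set is never re-transferred, which is what keeps network load low and avoids restarting the entire process after a failure.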
Combined, the time savings and low resource utilisation of the new CDC facilities also help organisations meet short SLAs with virtually no impact on MIPS costs.