Cloudera Professional Services
Cloudera on Kubernetes
Apache Spark
Apache Flink
Apache Kyuubi
Trino
Apache Kafka
Cloudera Open Data Lakehouse powered by Apache Iceberg
Telecommunications
Japan
LY Corporation: Transforming data management with Cloudera
LY Corporation is a leading Japanese player in the digital services industry. It offers a wide range of web and mobile services to over 320 million customers across Asia. Known for its innovative approach and commitment to excellence, LY Corporation has quickly built a reputation for delivering unmatched convenience to its users.
To modernize its data management practices, LY Corporation embarked on a journey with Cloudera to leverage cutting-edge technologies and enhance performance and scalability.
Overcoming data bottlenecks and enhancing compliance
LY Corporation’s data platform is built on Cloudera and utilizes multiple Hadoop Distributed File System (HDFS) clusters. It acts as a centralized data lakehouse for data engineering and machine learning projects, significantly boosting core business revenues and decision-making processes.
With over 100,000 tables and datasets and a total disk capacity exceeding 1.1 exabytes across all clusters, the platform runs a vast number of data loading, processing, and management tasks simultaneously. These tasks are crucial for business intelligence, classification, and personalized customer recommendations.
Despite its capabilities, LY Corporation’s platform faced several challenges. The existing system struggled with bottlenecks due to the large number of table partitions, impacting scalability and performance. Ensuring data integrity and pipeline availability under concurrent data access and modification was essential. Additionally, LY Corporation needed to comply with complex privacy policies and strict data protection regulations.
Empowering teams with independent data management
To tackle these challenges, LY Corporation partnered with Cloudera to modernize its existing data platform. The first step was to enhance its data ingestion process. By adopting Apache Iceberg as part of the data platform, LY Corporation can now update data every five minutes, a significant improvement over the previous method. To address the small-file problem associated with this new format, the company developed a background service that optimizes tables without disrupting users.
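The case study does not describe how the background optimization service works internally. As a rough illustration of the general idea behind small-file compaction, the sketch below groups small data files into batches that could each be rewritten as one larger file. The `DataFile` type, the `plan_compaction` function, and the 128 MiB target are all assumptions made for this example, not details of LY Corporation's service. In practice, Iceberg also provides its own maintenance routines, such as the `rewrite_data_files` procedure for Spark.

```python
from dataclasses import dataclass
from typing import List

# Assumed target size for a compacted file (illustrative, not from the source).
TARGET_BYTES = 128 * 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical description of one data file in a table."""
    path: str
    size_bytes: int


def plan_compaction(files: List[DataFile],
                    target_bytes: int = TARGET_BYTES) -> List[List[DataFile]]:
    """Group small files into batches whose combined size fits the target.

    Uses first-fit decreasing bin packing: files are sorted largest first
    and each one is placed in the first batch that still has room.
    """
    # Files already at or above the target size are left alone.
    small = sorted(
        (f for f in files if f.size_bytes < target_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    batches: List[List[DataFile]] = []
    totals: List[int] = []
    for f in small:
        for i, total in enumerate(totals):
            if total + f.size_bytes <= target_bytes:
                batches[i].append(f)
                totals[i] += f.size_bytes
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([f])
            totals.append(f.size_bytes)
    # Rewriting a single file on its own gains nothing, so drop those batches.
    return [b for b in batches if len(b) > 1]
```

A real service would also have to commit each rewrite atomically so that readers and writers are not disrupted, which is the property the table format's snapshot model is meant to provide.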
The Iceberg format was applied to over 8,000 tables, primarily those tracking user behavior and system events. This change simplified the management of large amounts of data while ensuring its integrity.
Additionally, LY Corporation moved to deploying data-related systems on Kubernetes, including several Cloudera components. These systems also integrate Spark SQL, Flink, and Trino, enabling more efficient use of resources, improved scalability, and better performance for data processing tasks.
To support this migration, LY Corporation worked with Cloudera Professional Services to implement Apache Kyuubi, a service designed to streamline data queries. Cloudera provided expert guidance on seamlessly integrating Kyuubi with their platform, enhancing support for existing systems, and ensuring a smooth and efficient transition to the upgraded infrastructure.
Finally, LY Corporation focused on optimizing data management. They introduced a system that allowed for quicker updates and better handling of small files. The system enabled different teams to manage their data independently without interrupting ongoing analytics and machine learning tasks, improving efficiency and empowering teams to take ownership of their data.
Boosting productivity and performance with efficient data processing tools
The adoption of these new technologies and processes has significantly enhanced the performance and scalability of LY Corporation’s data platform. Data can now be delivered 10 to 12 times faster without increasing pressure on the central data management system.
The new data format's properties enabled data management tasks to be distributed among data owners and product teams without disrupting 24/7 production analytics and machine learning (ML) pipelines. The migration to more efficient data processing tools brought general performance improvements and new features, boosting the productivity of data scientists and ML engineers.
The increased self-service data management capabilities enabled product teams to respond more quickly to new regulatory requirements, policy updates, and right-to-be-forgotten (RTBF) requests for inactive or unregistered users.
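As a hedged illustration of what a self-service RTBF request could look like on an Iceberg table, the sketch below builds a row-level `DELETE FROM` statement of the kind Iceberg supports through Spark SQL. The function name, table name, and column name are hypothetical, and the case study does not say how LY Corporation's teams actually issue these deletions.

```python
from typing import Iterable


def build_rtbf_delete(table: str, id_column: str,
                      user_ids: Iterable[int]) -> str:
    """Build a row-level DELETE statement for a set of user IDs.

    Identifiers are validated and IDs are coerced to integers so that
    nothing user-supplied is interpolated into the SQL unchecked.
    """
    # Deduplicate and sort so the generated statement is deterministic.
    ids = sorted({int(u) for u in user_ids})
    if not ids:
        raise ValueError("no user ids supplied")
    # Accept only dotted names made of plain identifiers, e.g. "db.table".
    for name in (table, id_column):
        if not all(part.isidentifier() for part in name.split(".")):
            raise ValueError(f"unsafe identifier: {name!r}")
    id_list = ", ".join(str(i) for i in ids)
    return f"DELETE FROM {table} WHERE {id_column} IN ({id_list})"


# Hypothetical table and column names, for illustration only.
statement = build_rtbf_delete("lake.events.clicks", "user_id", [42, 7, 42])
print(statement)
# DELETE FROM lake.events.clicks WHERE user_id IN (7, 42)
```

Because the table format handles such deletes as atomic commits, a product team could run a statement like this against its own tables without pausing the pipelines that read them.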
Looking ahead, LY Corporation plans to extend the new data format to all major datasets and to run significant data processing jobs on Kubernetes for better integration with in-house data tools and GPU support. The company is also considering using advanced features of the new data format to further enhance performance and scalability.