Big Data - Hadoop

Data volume, velocity, variety, veracity, and value are the measures by which Big Data deployments are judged. Hadoop, as the dominant Big Data application, helps organizations improve their business, launch new business, and advance research. A Hadoop cluster's storage capacity, ingest rate, and analytics speed are what today's data scientists work to maximize. Legacy Ethernet networks can no longer deliver the performance that Hadoop clusters require. Multi-socket, multicore servers have outgrown the capacity of legacy Gigabit Ethernet (and often 10GbE), even with multiple aggregated links.

Selecting a network and its required capabilities is one of the major challenges in building a Hadoop cluster, because workloads vary significantly. The goal is to buy enough network capacity that all nodes in the cluster can communicate with each other at optimal rates. Mellanox's end-to-end Hadoop networking solutions deliver the performance needed to eliminate current and future bottlenecks. Whether Ethernet or InfiniBand, Mellanox switches, cables, and adapter cards provide enough bandwidth to sustain the throughput of today's advanced disk controllers. Their low-latency features drive NoSQL database performance to new highs. Sub-microsecond latency reduces HBase query response time and makes the response-time distribution more predictable.
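As a rough illustration of the sizing question above, the following Python sketch compares the aggregate disk throughput of one node with the capacity of its network link. The function name and all figures (12 disks per node, 150 MB/s per disk) are assumptions chosen for the example, not measurements or vendor specifications, and the calculation ignores protocol overhead.

```python
def link_utilization(disks_per_node, mb_per_s_per_disk, link_gbps):
    """Fraction of a link's capacity needed to carry a node's full disk throughput.

    A value above 1.0 means the link, not the disks, is the bottleneck.
    Uses decimal units (1 Gb = 1000 Mb) and ignores protocol overhead.
    """
    disk_gbps = disks_per_node * mb_per_s_per_disk * 8 / 1000  # MB/s -> Gb/s
    return disk_gbps / link_gbps


if __name__ == "__main__":
    # Hypothetical node: 12 disks at 150 MB/s each (assumed values).
    for link in (1, 10, 40):
        ratio = link_utilization(12, 150, link)
        verdict = "network-bound" if ratio > 1.0 else "disk-bound"
        print(f"{link:>3} Gb/s link: {ratio:.2f}x of capacity ({verdict})")
```

Under these assumed numbers the node's disks can stream about 14.4 Gb/s, which exceeds a single 1 Gb/s or 10 Gb/s link but fits within a 40 Gb/s link.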

New interactive frameworks built on top of Apache Hadoop provide near-real-time data analysis. To handle large amounts of data, these frameworks require low-latency, high-throughput connectivity. Linear scalability is a requirement for keeping up with today's business growth, because data volume grows exponentially while sales and revenue grow only linearly. With advances in microprocessor technology, multi-core processors and servers create demand for higher-throughput networks to keep the CPUs fed. Mellanox InfiniBand and Ethernet technologies deliver the highest available bandwidth, and each has RDMA capabilities to offload data movement.