IBM and Flash memory
Posted on Jul 25, 2011 09:53AM
With an eye toward helping tomorrow's data-deluged organizations, IBM researchers have created a super-fast storage system capable of scanning 10 billion files in 43 minutes.
This system handily bested their previous system, demonstrated at Supercomputing 2007, which scanned 1 billion files in three hours.
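To put the two results on the same footing, the quoted figures can be converted to files per second. This is a back-of-the-envelope sketch using only the numbers in the article; the variable names are my own.

```python
# Scan-rate arithmetic using only the figures quoted in the article.
NEW_FILES = 10_000_000_000   # 10 billion files (new demonstration)
NEW_SECONDS = 43 * 60        # 43 minutes
OLD_FILES = 1_000_000_000    # 1 billion files (Supercomputing 2007 demo)
OLD_SECONDS = 3 * 60 * 60    # three hours

new_rate = NEW_FILES / NEW_SECONDS   # files scanned per second, new system
old_rate = OLD_FILES / OLD_SECONDS   # files scanned per second, 2007 system
speedup = new_rate / old_rate        # ratio of the two scan rates

print(f"New system:  {new_rate:,.0f} files/s")
print(f"2007 system: {old_rate:,.0f} files/s")
print(f"Scan-rate speedup: {speedup:.1f}x")
```

That works out to roughly 3.9 million files per second versus about 93,000, so the scan rate improved by a factor of about 42, not just the 10x growth in file count.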
Key to the increased performance was the use of speedy flash memory to store the metadata that the storage system uses to locate requested information. Traditionally, metadata repositories reside on disk, access to which slows operations.
"If we have that data on very fast storage, then we can do those operations much more quickly," said Bruce Hillsberg, director of storage systems at IBM Research Almaden, where the cluster was built. "Being able to use solid-state storage for metadata operations really allows us to do some of these management tasks more quickly than we could ever do if it was all on disk."
IBM foresees that its customers will be grappling with a lot more information in the years to come.
"As customers have to store and process large amounts of data for large periods of time, they will need efficient ways of managing that data," Hillsberg said.
For the new demonstration, IBM built a cluster of 10 eight-core servers equipped with a total of 6.8 terabytes of solid-state memory. IBM used four 3205 solid-state Storage Systems from Violin Memory. The resulting system was able to read files at a rate of almost 5 GB/s (gigabytes per second).
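A few derived figures give a feel for the scale of the cluster. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), treats "almost 5 GB/s" as an upper bound of exactly 5 GB/s, and divides the totals evenly, none of which the article states.

```python
# Rough per-file and per-core figures derived from the quoted totals.
SERVERS = 10
CORES_PER_SERVER = 8
FLASH_BYTES = 6.8e12         # 6.8 TB, assuming decimal terabytes
FILES = 10_000_000_000       # 10 billion files scanned
READ_BYTES_PER_SEC = 5e9     # "almost 5 GB/s", taken as an upper bound

total_cores = SERVERS * CORES_PER_SERVER            # cores in the cluster
flash_per_file = FLASH_BYTES / FILES                # flash capacity per file
read_per_core = READ_BYTES_PER_SEC / total_cores    # read rate per core

print(f"Total cores: {total_cores}")
print(f"Flash capacity per file: {flash_per_file:.0f} bytes")
print(f"Read rate per core: {read_per_core / 1e6:.1f} MB/s")
```

Under those assumptions the cluster has 80 cores and at most about 680 bytes of flash capacity per file. That is a ceiling on the space available per file, not a measurement of how much metadata each file actually used.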
The system used a tuned version of IBM's General Parallel File System (GPFS), version 3.4. Originally developed for high-performance computing systems, GPFS is becoming increasingly relevant for other data-heavy enterprise workloads as well, Hillsberg said. GPFS allows all the processor cores to read from and write to disks in parallel, which can significantly improve storage system responsiveness.
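The general idea of spreading a file scan across many workers can be illustrated on an ordinary file system. The sketch below is not GPFS and does not use any GPFS interface; it is a plain Python illustration, with hypothetical function names (`count_files`, `parallel_count`), of partitioning a directory tree and walking the pieces concurrently.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_files(root):
    """Count regular files under root using an iterative directory walk."""
    total = 0
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)   # descend into it later
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


def parallel_count(root, workers=8):
    """Split the scan by top-level subdirectory and count in parallel."""
    subdirs = []
    top_level_files = 0
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                top_level_files += 1
    # Each worker walks one subtree; the partial counts are then summed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return top_level_files + sum(pool.map(count_files, subdirs))


if __name__ == "__main__":
    print(parallel_count("."))
```

On a single local disk the workers mostly contend for the same device, so the gain is limited. The approach pays off when the metadata sits on storage that can serve many requests at once, which is the role flash plays in the system described here.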
Today's file systems are not well-suited for managing data across multiple storage systems as a single namespace, Hillsberg explained. The 2007 demonstration showed how a parallel file system such as GPFS could be used as the basis for highly scalable storage systems. The new work demonstrates how such a system could be improved even more with the addition of solid-state disks.
The researchers posted a white paper
The idea of building flash-memory-assisted servers "is not that far out. The technology already exists,"