How to Build a Powerful Data-Crunching Computer, Part 1: Choosing the Parts

Most of my research hinges on the analysis of ‘big data’. I have terabytes of DNA and RNA sequence data that I am analyzing, and the analysis of that much data requires a powerful computer with a whole lot of RAM. Many people use supercomputers, which are like many powerful computers hooked up together, but there is often a long queue to use the computing resources, meaning that you may have to wait a while for everyone else to finish up using the supercomputer before your data can be run. Supercomputers are also a bit more complicated to use because you have to submit a job and you can’t just run your programs like you would on your own computer.

Since so much of my research requires these sophisticated computers, I’ve decided to build myself a computer! I thought it might be useful to some other scientists if I share my experiences choosing computer parts and then putting them together to make a sophisticated data-crunching machine!

The first step to building a computer is of course choosing the parts.

1. RAM

The first thing to decide is how much random-access memory (RAM) you want. RAM is what programs use to store data not saved in files, such as temporary information. In DNA analysis, how much RAM you have is very important because you often need to compare many sequences to find matches, and all of the potential matches are held in RAM. The most RAM you can get before having to upgrade to a server instead of a desktop is 64 GB (which is a lot). For my lab computer I went with 64 GB. For my own personal computer I chose 32 GB but left room to upgrade (i.e., I chose 4 x 8 GB modules, leaving an additional 4 slots available if I want them).
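
To make the slot arithmetic concrete, here is a minimal Python sketch. The function name ram_plan and its parameters are made up for illustration, and it assumes any modules added later are the same size as the ones already installed.

```python
def ram_plan(modules_installed, module_size_gb, total_slots):
    """Summarize installed RAM and upgrade headroom for identical modules."""
    if modules_installed > total_slots:
        raise ValueError("more modules than slots on the motherboard")
    installed_gb = modules_installed * module_size_gb
    free_slots = total_slots - modules_installed
    # Assumption: future modules match the size of the installed ones.
    max_gb = total_slots * module_size_gb
    return {"installed_gb": installed_gb, "free_slots": free_slots, "max_gb": max_gb}


# The personal build described above: 4 x 8 GB in a board with 8 slots.
plan = ram_plan(modules_installed=4, module_size_gb=8, total_slots=8)
print(plan)  # {'installed_gb': 32, 'free_slots': 4, 'max_gb': 64}
```

Running the same function on a different module size or slot count shows quickly whether a cheaper starting configuration still leaves a path to the capacity you eventually want.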

Once you’ve decided how much RAM you want, you need to decide what type of RAM. Technology is constantly changing, and there is a ‘new’ type of RAM available: DDR4 (as opposed to DDR3). DDR4 is faster but more expensive, so there’s a tradeoff there. I chose to go with DDR4, but DDR3 should work just as well.

Choosing a brand can be confusing, because there are so many companies out there! I went with G.SKILL Ripjaws series RAM because it came well-recommended from more experienced computer builders.

2. CPU

The central processing unit (CPU) is the next component to choose. This is where most of your computer’s power comes from. The type and amount of memory you chose will determine what type of CPU you can get. Intel is generally the CPU brand of choice, and at the moment their cutting-edge CPU model is the Intel i7. To match my desired 64 GB of RAM, I needed an LGA2011-v3 socket type, which narrowed my search down to three options. Because I’m price-sensitive, I chose the Intel i7 5820k.

3. Motherboard

Once the CPU is picked out, you can choose a motherboard. I did most of my searching for parts on newegg.com, because you can narrow your search results based on the things you want. For instance, since I chose an Intel CPU, I searched “intel motherboard”, clicked the box for “LGA2011-v3”, and chose “64GB” under “Maximum Memory Supported”. Then I looked through the other specifications I might be interested in (number of ports, expansion slots, etc.) and again asked other people which brands are good. I wound up going with the Asus X99 series of motherboards.

Using newegg.com to help narrow down options

One thing to note is that the average review score for almost any motherboard will look middle-of-the-road. This is because there will always be some cases where the hardware is defective, regardless of the company, and those reviews will drag the average rating down.

Once you’ve chosen your RAM, CPU, and motherboard, choosing the rest of your components doesn’t need to happen in any particular order. You just want to make sure everything is compatible. I recommend using pcpartpicker.com to check compatibility of the components you choose.
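
The basic checks behind "make sure everything is compatible" can be sketched in a few lines of Python. This is only an illustration of the reasoning used above (socket, RAM type, slot count, maximum memory), not how pcpartpicker.com works. The class and function names are hypothetical, and the 8-slot figure is inferred from the 4 used plus 4 free slots mentioned in the RAM section.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    socket: str


@dataclass
class Motherboard:
    name: str
    socket: str
    ram_type: str
    ram_slots: int
    max_ram_gb: int


@dataclass
class RamKit:
    ram_type: str
    modules: int
    module_size_gb: int


def compatibility_problems(cpu, board, ram):
    """Return a list of human-readable problems; an empty list means no conflict found."""
    problems = []
    if cpu.socket != board.socket:
        problems.append(f"CPU socket {cpu.socket} does not match board socket {board.socket}")
    if ram.ram_type != board.ram_type:
        problems.append(f"RAM type {ram.ram_type} does not match board RAM type {board.ram_type}")
    if ram.modules > board.ram_slots:
        problems.append(f"{ram.modules} modules will not fit in {board.ram_slots} slots")
    total_gb = ram.modules * ram.module_size_gb
    if total_gb > board.max_ram_gb:
        problems.append(f"{total_gb} GB exceeds the board maximum of {board.max_ram_gb} GB")
    return problems


# Values taken from the parts discussed in this post.
cpu = Cpu("Intel i7 5820k", "LGA2011-v3")
board = Motherboard("Asus X99 series board", "LGA2011-v3", "DDR4", ram_slots=8, max_ram_gb=64)
ram = RamKit("DDR4", modules=4, module_size_gb=8)

print(compatibility_problems(cpu, board, ram))                   # no conflicts expected
print(compatibility_problems(cpu, board, RamKit("DDR3", 4, 8)))  # flags the RAM type
```

A real compatibility check covers much more (case clearance, power connectors, cooler height), so treat this as a way to think about the constraints rather than a replacement for a dedicated tool.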

4. Storage Size and Type

The first major decision you need to make is whether you want to run your operating system off of a solid-state drive (SSD). Solid-state drives don’t have moving pieces like traditional spinning hard drives, but they are quite pricey in comparison. The main benefit is that they allow you to start up your operating system and programs much more quickly and are less prone to mechanical failures. I chose to go without one for my personal computer build, although I’ve got one in my lab computer and love the speed.

Then you need to decide how many other hard drives you want and how much storage space you need. I chose two 4 TB hard drives because my data takes up a lot of storage space, but this comes down to how much you need to store and how much you want to spend.

5. Power supply

A necessary component of any computer is the power supply unit (PSU). The EVGA brand was recommended to me, and I chose to go with an 850W PSU because I might need quite a bit of power to run all of my analyses.
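
One way to sanity-check a PSU choice is to add up the expected power draw of each component and compare it with the PSU rating. The Python sketch below shows that arithmetic. The function name psu_budget is hypothetical, and the wattages in example_draws are placeholders rather than measurements of the parts in this build, so substitute the figures from your own components' spec sheets.

```python
def psu_budget(psu_watts, draws_watts):
    """Compare the summed component draw against the PSU rating."""
    total = sum(draws_watts.values())
    return {
        "total_draw_w": total,
        "headroom_w": psu_watts - total,
        "load_fraction": total / psu_watts,
    }


# Placeholder wattages for illustration only.
example_draws = {
    "cpu": 140,
    "graphics_card": 150,
    "hard_drives": 20,
    "motherboard_ram_fans": 50,
}

budget = psu_budget(850, example_draws)
print(f"Total draw: {budget['total_draw_w']} W, headroom: {budget['headroom_w']} W")
print(f"Load at full draw: {budget['load_fraction']:.0%} of the PSU rating")
```

Leaving generous headroom, as the 850W choice does with these placeholder numbers, also leaves room for adding drives or a second graphics card later.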

6. Graphics card

A graphics card is necessary to run the computer’s display. To choose a graphics card, look at your motherboard’s specifications to see what type(s) of PCI Express slots it has, and how many, to narrow down your search results. For data analysis you don’t need a particularly fancy graphics card with a bunch of overclocking ability like you would if you were building a gaming computer. The EVGA brand was recommended to me, so I chose based on compatibility, brand, and price.

7. CPU Cooler

This is an important choice. Since I will be working my CPU pretty hard, I chose to go with a dual-fan CPU cooler, the Cooler Master Hyper D92.

8. Case

The case is important. You want one big enough to fit all of your components and have a bit of breathing room, so I recommend a mid-tower design. I also chose one that came with several built-in fans and a couple of USB ports on the front. These cases can get pretty fancy, so as long as it fits your motherboard and has enough space to keep your computer cool, you can probably choose based on extra features and coolness factor (how many LED-lighted fans do you want?).

9. Wireless adapter

Some motherboards come with on-board WiFi, but if yours doesn’t, you may want to buy a WiFi adapter.

10. CD/DVD Drive

This is a purely optional component. Having an optical drive will help you install the drivers that come with all of your components, but it is not necessary.

11. Accessories: Keyboard, mouse, monitor

I’m pretty happy with a standard wired mouse and keyboard, so I went for the cheap Microsoft set. But I did spring for a nice 27″ monitor. These things, and others (speakers, etc.), are completely up to your personal preferences.

Important Tip: Shop around!! 

There are many retailers selling computer parts, and some have different promotions going on at different times. newegg.com is usually pretty low-priced, but I recommend also checking TigerDirect.com, Best Buy, ncixus.com, and Amazon. I bought almost all of my components from Newegg, because it was the cheapest, but I did get a couple of items from other vendors.

If you’re having trouble deciding on components, read reviews, watch YouTube videos, and talk to experienced computer-builders. It can be overwhelming, but I learned a lot about how computers work just by shopping around for the various components.

For how to put the pieces together, check out Part 2: Putting the Pieces Together!

12 thoughts on “How to Build a Powerful Data-Crunching Computer, Part 1: Choosing the Parts”

  1. Hi Sarah, I am just starting to build a home-based data crunching computer myself, I am a little surprised that you went for the “I” processor. Considering you have gone down the SSD and DDR4 route, (which I am as well), and that it is being built for data crunching, I think it would have made more sense to use a current model E5 2600 series v3 Intel Xeon CPU. These can be used in a tower set up (not just big servers). One very important difference between Xeon and “I” processors is Xeon allows double point precision calculation, the “I” range only allows single point. (depends a lot on the work you’re doing, but a lot of scientific data crunching now is double point due to increased accuracy). The X99 MoBo can take a single Xeon or “I” CPU, or for not much more $, you can have a MoBo that can run 2 x Xeon CPUs in a normal Full Size Tower case. Essentially you could start with a single Xeon CPU and later add a 2nd CPU, (full size towers also often allow placement of 2 power supply units inside). Xeon CPUs are specifically designed to take 24/7 workloads, peak outputs may be down from an I5 or I7, but are built for the long haul. The “I” range of CPUs are not designed for 24/7 data crunching/analysis. Used for that kind of continuous work, I think you may find the lifespan of the I7 less than projected. “Before anyone jumps down my throat” I believe the I7 5820K to be an excellent, high quality CPU, just not designed to crunch data 24/7. Same with the Asus X99 MoBo. In fact I came very close to going for the X99. However for the reasons just stated and that it can utilise only 1 CPU, it got crossed off my list. I started researching my build in October last year and now have most of the components either bought or budgeted.
My build includes: Intel S2600CWTS MoBo, 2 x Intel E5 2620 v3 CPUs, 2 x 1200W 80 Plus Platinum PSUs, 4 x 250GB Samsung Evo SSDs set up in RAID 5, Lacie 2Big 12TB External HD (this can be daisy chained up to 6 times using thunderbolt 2 cables), external Blu Ray DVD Writer/player, 4 x Intel Phi 31S1P co-processors. Due to the Phi co-processors being passive cooled (that is what the “P” stands for in the model) i.e. no inbuilt fans on them, I am using as many external components as possible to reduce air flow restrictions and heat build up.

    • Thank you for your input! The type of analyses I’m doing don’t need the double point precision or multiple CPUs, but certainly the way I built my computer is just one of many possibilities! I’m glad you found something that works for you.

      • Sarah, I like your approach. I’m going to assemble a similar PC for scientific calculations, such as moderate CFD problems and equation solving. I’ll be looking for something similar to what you spec’d.
        Would you post an approximate price for your whole bundle of pieces?

        Tom Kosvic

      • The machine I built cost about $2000, but obviously depending on which components you choose the price will change. I also don’t know how prices may have changed in the past year.

  2. Graphics cards nowadays play a big role in computation and I must respectfully point out that you’re either completely unaware of it, or you’re downplaying the importance without saying that you’re doing so. If one is going to put a ‘how-to guide’ on the Internet where large sums of money are involved, one should have more experience and knowledge–otherwise you’re potentially doing a great disservice to other people.

    Depending on what you want to do, you might want to have more than one graphics card. This will affect what choice you make in selecting the motherboard.

    https://www.quora.com/Is-GPU-computing-suitable-for-big-data-analytics

    “A graphics card is necessary to run the computer’s display.”
    Only true if the motherboard doesn’t have a basic one integrated in it.

    More humility, please.

    • I apologize if I have misled anyone. These blog posts are merely reporting the journey that I’ve taken to analyze large genomics datasets, and all of the programs I’m aware of to do current genomics analysis do not take advantage of GPU computing. So perhaps I downplayed the importance of GPUs because the importance wasn’t central to my decision making.

      I hope that everyone who chooses to invest large amounts of money in building their own computer does more research beyond reading my posts!

  3. Thanks for this post! I’ve never built my own computer before and most advice online seems to concern gaming rigs, so this was very helpful.

  4. Sarah, I like your approach to component selection. I am building a similar machine that will be used for some moderate CFD work and equation solving.
    Would you post an approximate total price for your component assemblage?
    Thanks,

    Tom Kosvic

  5. Choosing a CPU will be the first step for most people. In this respect, I completely agree with Paul Goode’s suggestion of using Intel Xeon processors in a desktop setup. These processors don’t include integrated graphics (which is an advantage), because as Student has pointed out, any serious graphics work will usually require dedicated GPUs. Be sure to select a CPU that has hyper-threading support to allow for highly threaded applications.

    I would also recommend using error-correcting code (ECC) memory to handle soft/hard errors. This is particularly important when data integrity is critical.

  6. Nice article, I think you should have gone with ECC memory, as normal RAM can have errors every once in a while (which is fine for the everyday user), and this could potentially lead to some false results if this thing is crunching away at a lot of numbers.
