Sunday, December 12, 2010

Excel Hacks: Calculating GC%

As a quick hack, you can calculate the GC% of a DNA sequence in Excel without breaking out the macros. Here’s how you do it assuming your sequence is in Excel cell A1.

= (1 - LEN(SUBSTITUTE(SUBSTITUTE(A1,"G",""),"C",""))/LEN(A1)) * 100

Although the formula is a bit daunting, the principle is straightforward. First, use the SUBSTITUTE command to get a version of your sequence with all G’s and C’s removed. Determine the length of this new sequence using the LEN command and compare this to the length of the original sequence. This gives you the fraction of sequence that is NOT a G or a C. To get the GC%, subtract this fraction from 1 and multiply by 100% to go from a fraction to a percent.

This will probably satisfy most folks, but it’s not perfect. For an accurate calculation, we need to take into account ambiguous IUPAC base codes. For example, a "N" should be treated as 0.5 of a G-C base. A "D" should count as 0.33. Although it’s possible to extend the hack above to handle these cases, it might be a good time to break out Perl or Python.

Sunday, December 5, 2010

Excel Hacks: Plate Well to Number Conversions

Here’s a simple but useful transformation for going from a well “address” on a plate (e.g. “B5”) to a numbered location. For a 96 well plate, the numbered location will be a number from 1 to 96. Typically, this numbering goes first from left to right across the 12 columns and then top to bottom down the 8 rows. So, A1 would map to 1, B1 would map to well 13 and H12 would map to 96.

Let's assume the plate address you want to convert is in the Excel cell A1. The conversion is done in Excel as shown below:

= (CODE(LEFT(A1,1))-65)*12 + RIGHT(A1,LEN(A1)-1)

The formula assumes the well format is one character denoting the row followed by one or more integers which denote the column. The left half of the formula before the '+' converts the character into an integer using the CODE function. Since "A" has a code of 65, we subtract this off before multiplying by the row length. The portion of the formula after the '+' simply extracts the integer portion of the well address and adds it to get the well number. The formula works with padded (A01)and unpadded (A1) representations.

To go back from a numbered address to the letter-number format, use the following:

= CHAR(INT((A1-1)/12)+65) & (MOD(A1-1, 12)+1)

The portion of the formula left of the '&' handles the row calculation. The CHAR function performs the opposite duty of CODE, taking a number and converting it back to a character. The portion to the right of the '&' calculates the integer (column) portion of the well address.

It's quite easy to do 384-well or other formats. Just replace the red 12 with the correct number of columns. In the case of a 384-well format, you'd replace it with 24.

It is less common to see plate wells converted into numbers from top to bottom and then left to right, although it does happen. In this case, B1 maps to well 2 and H2 to 16 for a 96 well plate.

The conversion from plate well to number in this case is:

= (CODE(LEFT(A1,1))-65) + 1 + (RIGHT(A1,LEN(A1)-1)-1)*8

And to go back:

= CHAR(MOD((A1-1),8)+65) & (INT((A1-1)/8)+1)

Note that instead of using the number of columns (12), we're now using the number of rows (8). It's left as an exercise to the student to tweak the formula for 384-well formatted plates ;)