Saturday, January 16, 2010

What Darwin Never Knew: DNA

I just finished watching one of the best PBS NOVA episodes, "What Darwin Never Knew". It is exceptionally good. I was finally able to understand a little about DNA, the building block of all living creatures. The episode explains how evolution happens, why there are so many species, and why even animals of the same species can look so different. It also covers how complex DNA is and what a gene is. It did not answer all my questions, but it got pretty close. Here is one part I found on YouTube, but please watch all the parts. It is very good.

So now I know that DNA is a very long sequence that contains the genetic instructions to synthesize the various parts of a living creature. In simple layman's terms, there are three kinds of genes identified so far:
  1. Genes that are actually used to create stuff. These are the sequences used to synthesize proteins and other things - the building blocks.
  2. Genes that simply act as switches, which can be on or off. When a switch is on, the "stuff"-creating genes are used to create the stuff; when the switch is off, the "stuff" is not created. The show gives the example of fruit flies: why do some of them have black spots on their wings while others don't? They all have the same gene that makes the spots, but in some of them the switch is turned off.
  3. Genes that boss the switch genes. These are the ones that turn the switch genes on and off at various times. An example is the same bird species with the same beak-making genes and the same switch genes, but different boss genes. So the switch genes are turned on at different times and the beaks of the birds come out different (short and thick vs. long and thin).
There is much more interesting stuff in there. However, after seeing the show I still have a few questions. So messenger RNA is created depending on whether the switch is on or off. But how does that happen? Does the enzyme responsible for creating mRNA look at the switch? How does it "look"? Do the enzymes have sites that only lock on to the switch genes that are on? And does something then happen to make the enzyme use the genes to synthesize the mRNA or whatever? This would mean that the switch genes have to be located close to the stuff-making genes that they control, right?

Another question is how the boss genes turn the switches on and off. What is the difference between an "on" gene and an "off" gene? Probably the sequence. Then does it mean that the boss genes somehow change the switch genes? I thought a gene was an immutable entity. How can it change? Also, how do the boss genes instruct the switch genes to change? Do they attract an enzyme or something to change the switch genes?

All these questions make me think I should probably study genetics or bio-engineering or something.

Friday, January 15, 2010

Fun with grep and sed

At work we have several Java files whose javadocs contain links that are not hyperlinked with <a> tags. So I wanted to convert the links to hyperlinks. We wanted to convert only the links in text that starts with "Automates " followed by one or more links that end with a number. Example: "Automates http://something/12345 and http://something/67890 but not http://something/54321". I wanted to do the conversion with a single bash command line (trying to avoid writing a bash script). While tackling the problem I learnt a few things that I want to share and record here, so I can look back at them in future when I forget.

To start with, I needed to find all the files containing "Automates http://". I just wanted the names of the files containing that string, so grep comes to the rescue, with the -l switch to list just the filenames instead of all the matching lines.

grep -R -l "Automates http://" *

Then it is time to replace the links with <a href=link>link</a>, but only on the lines containing "Automates http://". For this I want the filename and line number of every matching line. Getting both is easy with grep: use the -H switch for the filename and the -n switch for the line number. Here is an example

grep -R -H -n "Automates http://" *

The output of the above command looks something like this

/home/chandanp/temp/  /** Automates http://something/353571 */
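To try both switches safely, here is a small sketch that builds a scratch directory first. The file src/Foo.java and its contents are made up for illustration; only the grep switches come from the commands above.

```shell
# Build a throwaway directory with one hypothetical Java file to search.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src"
printf '%s\n' \
  'class Foo {' \
  '  /** Automates http://something/353571 */' \
  '  void bar() {}' \
  '}' > "$demo_dir/src/Foo.java"

# -l prints only the names of the files that contain a match.
files_only=$(cd "$demo_dir" && grep -R -l "Automates http://" *)
echo "$files_only"

# -H prefixes each match with the filename, -n with the line number,
# giving filename:line_number:matched_line.
with_lines=$(cd "$demo_dir" && grep -R -H -n "Automates http://" *)
echo "$with_lines"
```

The second command prints one colon-separated record per matching line, which is the shape the rest of the pipeline relies on.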

To replace the link with a hyperlink we can use sed. All we need is the filename, the line number and the string to replace, and then we use sed like so

sed -i '936s|\(http.*[0-9]\)|<a href="\1">\1</a>|g'

Where 936 is the line number I want to change and the final argument is the filename I want to edit. The -i option edits the file in place. The more complicated part is the regex matching. Basically, anything that matches the regex inside \( and \) is stored in a buffer. The buffer number is the position of that \( \) pair in the pattern. So in the example above, the first buffer is the string that matches \(http.*[0-9]\), which is basically any link that ends with a number. To recall the buffer we use \1, which means: use the value that matched the first parenthesis pair. So in this sed command the replacement string will be <a href="link">link</a>, where link is the string that matches \(http.*[0-9]\). Here is an example of the change

/** Automates <a href="http://something/353571">http://something/353571</a> */
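To see the line address and the \1 buffer in action without touching a real file, here is a sketch that feeds three made-up lines to sed on stdin. The URLs and the address 2 are invented for the demo; the substitution itself is the one used above.

```shell
# Three hypothetical javadoc lines; only line 2 should be rewritten.
input='/** Automates http://something/111 */
/** Automates http://something/222 */
/** Automates http://something/333 */'

# The leading 2 restricts the s command to the second input line.
# \( \) captures the link and \1 recalls it twice in the replacement.
only_second=$(printf '%s\n' "$input" \
  | sed '2s|\(http.*[0-9]\)|<a href="\1">\1</a>|')

echo "$only_second"
```

Lines 1 and 3 pass through unchanged, which is why the line number has to be known before sed is called.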

Notice another thing about the way I used sed's replace command above. I used s|match|replace| instead of the usual s/match/replace/. What many people don't know is that one can use almost any character after s instead of the usual /. So you could even do s#match#replace# if you want. I used the pipe symbol.
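A quick sketch of why the delimiter choice matters here: the replacement contains </a>, so with / as the delimiter that slash has to be escaped. The sample line is made up; all three commands are meant to produce the same result.

```shell
line='  /** Automates http://something/353571 */'

# With / as the delimiter, the slash in </a> must be written as \/.
with_slash=$(printf '%s\n' "$line" \
  | sed 's/\(http.*[0-9]\)/<a href="\1">\1<\/a>/')

# With | or # as the delimiter, the slash needs no escaping.
with_pipe=$(printf '%s\n' "$line" \
  | sed 's|\(http.*[0-9]\)|<a href="\1">\1</a>|')
with_hash=$(printf '%s\n' "$line" \
  | sed 's#\(http.*[0-9]\)#<a href="\1">\1</a>#')

echo "$with_pipe"
```

Whichever character is chosen must not appear unescaped in the pattern or the replacement, which is why | and # suit text full of slashes.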

Now that we can replace each individual line of each file, we somehow have to combine the previous grep output with this sed command. That was tricky. First we need to break up the output of the grep command into individual filenames and line numbers, and then give those to sed. Well, xargs, cut and sed to the rescue. We use the fact that the filename and line number are delimited by : and play some tricks

grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/"

Basically, it takes the output of the first grep, which prints the names of the files containing "Automates http://", and pipes it to xargs, which hands each filename to another grep that prints filename:line_number:matched_string. That is piped to cut, which prints the first 2 tokens delimited by :. We need the cut because the matched string also contains : and we don't want sed to use that part in the matching. Then we pipe the output of cut to another sed to print the filename and line number. Here is the output after the various pipes

$ grep -R -l "Automates http://" *

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {}
  /** Automates http://something/353571 */
  /** Automates http://something/336439 and http://something/336438 */

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d:

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/"
filename is and line number is 73
filename is and line number is 936
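The cut and sed stages can also be tried on a single record. This sketch uses a made-up grep-style line (src/Foo.java is a hypothetical filename) to show what each stage keeps.

```shell
# One hypothetical record in grep -H -n format: filename:line_number:text.
sample='src/Foo.java:936:  /** Automates http://something/353571 */'

# -d: splits on colons and -f-2 keeps fields 1 through 2,
# dropping the matched text (including the colon inside http://).
file_and_line=$(printf '%s\n' "$sample" | cut -f-2 -d:)
echo "$file_and_line"

# Two capture groups pull the filename and the line number apart.
described=$(printf '%s\n' "$file_and_line" \
  | sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/")
echo "$described"
```

Note that this relies on the filename itself containing no colon; a path with a : in it would shift the fields.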

The final piece of the puzzle is to turn the output of the last sed into a command and then run it. So instead of an output like filename is and line number is 73, we just need sed -i '73s|\(http.*[0-9]\)|<a href="\1">\1</a>|g' for each line. Here is the command to do just that (very complicated, with lots of backslashes and quotes, but I did not know any better :).

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/sed -i \\\'\2s|\"\\\(\"http.*[0-9]\"\\\)\"|\<a href=\"\\\1\"\>\"\\\1\"\<\/a\>|\\\' \1/"
sed -i \'73s|"\("http.*[0-9]"\)"|<a href="\1">"\1"</a>|\'
sed -i \'936s|"\("http.*[0-9]"\)"|<a href="\1">"\1"</a>|\'

Then we need to execute the generated commands using bash, like so

grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/sed -i \\\'\2s|\"\\\(\"http.*[0-9]\"\\\)\"|\<a href=\"\\\1\"\>\"\\\1\"\<\/a\>|\\\' \1/" | xargs -I{} bash -v -c "{}"
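A shorter route may be possible, because sed accepts a regex as an address and not only a line number, so it can pick the matching lines itself. The following is a sketch rather than what was actually run: it assumes GNU sed (for -i with no backup suffix), and the files in the scratch directory are made up.

```shell
# Scratch directory with two hypothetical files.
work_dir=$(mktemp -d)
printf '%s\n' \
  '/** See http://something/54321 */' \
  '/** Automates http://something/353571 */' > "$work_dir/Foo.java"
printf '%s\n' 'class Bar {}' > "$work_dir/Bar.java"

# grep -l selects the files; the \%regex% address selects the lines,
# so no line numbers need to be generated and passed around.
grep -R -l "Automates http://" "$work_dir" \
  | xargs sed -i '\%Automates http://%s|\(http.*[0-9]\)|<a href="\1">\1</a>|'

cat "$work_dir/Foo.java"
```

The \%...% form is just a regex address with % as its delimiter, chosen so the slashes in http:// need no escaping. The link on the "See" line is left alone because that line does not match the address.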

Ah, finally. But there is one problem. When there are multiple links on the same line, the greedy .* matches everything from the first http to the last digit on the line, so sed wraps all of the links in a single tag and creates weird output like this:

Automates <a href=http://something/336439 and http://something/336438>http://something/336439 and http://something/336438</a>

I still don't have a good solution for that. Since I had just a few of these lines, I fixed them quickly using tkdiff. But does anyone know how to solve it?
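One possible answer, sketched under the assumption that the links never contain spaces: replace .* with [^ ]* so a match cannot run past the end of one link, and keep the g flag so every link on the line is handled. The sample line is the two-link case from above; note that this converts every link on the line.

```shell
line='/** Automates http://something/336439 and http://something/336438 */'

# [^ ]* stops at the first space, so each link is matched on its own;
# the trailing g repeats the substitution for every link on the line.
fixed=$(printf '%s\n' "$line" \
  | sed 's|\(http[^ ]*[0-9]\)|<a href="\1">\1</a>|g')

echo "$fixed"
```

Each link still has to end in a digit to match, as in the original pattern.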

Tuesday, January 12, 2010

Attesting General Power of Attorney in SF

Recently I had to go through the motions of getting a General Power of Attorney (GPA) document attested in San Francisco. I am Indian by birth, and my parents were trying to buy a house for me back in India. Since I did not want to travel to India, they needed a GPA so that they could act on my behalf and sign all the documents required to buy the house. The problem, however, was that they needed it urgently, because the seller lives in the UK and wanted to get everything done quickly so he could go back.

My parents sent me a GPA document that they obtained from a lawyer. This document gives my parents the power to buy the property named in it on my behalf. The lawyer said that I would have to get the document attested at an Indian Consulate in the USA. The closest one for me is in SF, and I can drive there in about an hour from where I live. So I thought it would take about a day to get everything done.

I looked up the procedure to attest a GPA on their website. I found the page I was looking for, and the process seemed simple enough. These are the things I needed:

  • Complete Miscellaneous Services Application
  • Attach all previous passports in original
  • Photo copies of first 5 and last 2 pages of current passport
  • Current passport
  • Photo copy of a valid visa
  • Proof of status: original green card/visa/EAD. For me it is a valid visa
  • Proof of residence: Drivers License/Electricity/Water/Telephone bills/lease agreement. For me it is driver's license.
  • Photo copies of proof of status and proof of residence above
  • Original apostilled Power of Attorney to be attested
  • Photo copy of apostilled Power of Attorney document
  • Paste a photo in Miscellaneous Services Application
  • Paste a photo on Power of Attorney document
  • Sign Miscellaneous document
  • Pay $20 for fees using money order / cashier's check or debit card ($3 extra). Debit card for me

I got everything right except for apostilling the document. When I read the list I did not know what apostilling was or how it was supposed to be done, so I just skipped that step. I got myself photographed, pasted one photo on the GPA and another on the application, and drove for more than an hour in traffic to the Indian consulate in SF, only to find that I really did need to get the GPA apostilled. After I had stood in line for an hour, the lady at the counter was kind enough to walk me through the whole process. She told me that first I should get the GPA notarized by a notary public, signing the document in front of the notary. Then I should send the GPA along with a self-addressed envelope to the Secretary of State, California, to get the document apostilled. Once they sent the document back to me, I would have to bring the documents to the Indian consulate for attestation.

If I sent the documents by mail to the Secretary of State in Sacramento, it would be too late for me to send them on to India. So I called the Secretary of State, CA to ask how long it would take if I walked in. By the way, all the details, including the contact phone number for the notary public section, are on the Secretary of State's website. The lady on the phone said that by mail the turnaround time is 1 day, exclusive of the mail service time, and that if I walked in it would be over in about 15 minutes. They just require the notarized document.

Great. That is easy. But there was just one thing: I wanted to finish all this in one day, so I had some planning to do. It takes 2 hours to drive to Sacramento from where I live, and another 2 hours from Sacramento to SF, where I would get the final attestation. The Secretary of State opens at 8 am, while the consulate closes at 12 pm.

The next day I started at 6am and drove to Sacramento. It was exactly 8am by the time I reached the Secretary of State. Finding the notary public office was easy, and I got the document apostilled in about 15 minutes. Then I started off on the 2 hour drive to SF. I got the apostilled document photocopied at an Office Max in SF. It was 10:30am before I stepped into the consulate. At this point it was just a matter of time. I was in line for an hour and then everything flew by quickly. The lady at the counter took the documents and the payment and asked me to return at 4pm to pick up the documents. I drove to work at 12pm and returned to SF at 4pm. Everything was ready for me. I took the attested documents, drove back to work and then went home late at night.

Two days and 450 miles later, I have everything done! That one day was tiring, with 350 miles logged. At the end of the day, the whole process is not that complicated. I was lucky to be close to both Sacramento and SF. For others who are not so lucky it will take more time, so please make sure you have enough time.