Understanding Computer Basics, Programming Languages, and Data Structures

perl scripting n.w

Explore the fundamentals of computer basics, different programming languages, types of data structures, and various basic types in programming. Delve into pointers and complex types like sets, arrays, and hashes.

  • Computer Basics
  • Programming Languages
  • Data Structures
  • Basic Types
  • Pointers


Presentation Transcript


  1. PERL SCRIPTING

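  2. COMPUTER BASICS: CPU, RAM, hard drive. The CPU can only use data in its registers directly; data in RAM must first be loaded into the registers, and data on the hard drive must first be loaded into RAM.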

  3. COMPUTER LANGUAGES
     • Machine language: binary code executed directly by the CPU. Usually specific to a CPU model. Fast.
     • Assembly language: maps binary instructions to short mnemonics (often three letters, e.g. MOV, ADD). Platform-dependent. Fast.
     • High-level language: human-like syntax, often CPU-independent. Compiled into machine code before use. Fast. E.g. C, C++, Fortran, Pascal, Basic.
     • Scripting language: usually not compiled into a binary ahead of time; interpreted and executed on request. Slow. E.g. Perl, PHP, Python, JavaScript, Bash script, Ruby.
     • Byte-code language: source code is converted into platform-independent intermediate code (byte code), which can be compiled quickly on the target machine. E.g. Java, Microsoft .NET. Intermediate speed.

  4. TWO ELEMENTS OF A PROGRAM: data structures and algorithms. Different data structures come with corresponding, well-optimized algorithms for processing and extracting information (a core topic of computer science). For example: inserting (algorithm) a node (data structure) into a linked list (data structure).
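A minimal Perl sketch of the slide's example. Perl has no built-in linked list, so this assumes each node is a hash reference with hypothetical value and next fields; the helper names fnInsertAfter and fnListToArray are illustrative, not from the slides.

```perl
use strict;
use warnings;

# Each node is a hash reference: { value => ..., next => next node or undef }.
# Build the list 1 -> 2 -> 4.
my $nodeHead = { value => 1, next => undef };
$nodeHead->{next} = { value => 2, next => undef };
$nodeHead->{next}{next} = { value => 4, next => undef };

# Algorithm: insert a new node right after an existing one.
# Only two links change, so the cost does not depend on the list's length.
sub fnInsertAfter {
    my ($nodePrev, $value) = @_;
    my $nodeNew = { value => $value, next => $nodePrev->{next} };
    $nodePrev->{next} = $nodeNew;
    return $nodeNew;
}

# Walk the list from the given node and collect the values in order.
sub fnListToArray {
    my ($nodeStart) = @_;
    my @arrValues;
    for (my $node = $nodeStart; defined $node; $node = $node->{next}) {
        push @arrValues, $node->{value};
    }
    return @arrValues;
}

fnInsertAfter($nodeHead->{next}, 3);    # insert 3 after the node holding 2
print join(' -> ', fnListToArray($nodeHead)), "\n";    # prints 1 -> 2 -> 3 -> 4
```

The same insertion in an array would have to shift every later element, which is why the choice of data structure decides which algorithm is cheap.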

  5. BASIC TYPES Bit: 1 bit has 2 states, 1 or 0. 1 byte = 8 bits, i.e. the maximum value of 1 byte is (binary) 11111111 = 255. Characters in the ASCII encoding can be stored in 1 byte; in C, the byte-sized data type is in fact written as char. The byte is the smallest addressable unit of storage. A Boolean (true/false) theoretically needs only 1 bit, but in practice it usually occupies 1 byte. How many Boolean states can you store using 1 byte?
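To answer the question: one byte has 8 bits, so it can hold 8 independent Boolean flags (2^8 = 256 combinations). A small Perl sketch using the built-in vec function to pack flags into a single byte; the variable names are illustrative.

```perl
use strict;
use warnings;

# vec($string, $offset, 1) reads or writes a single bit of a string.
# Within a byte, offset 0 is the lowest-order bit (value 1), offset 7 the highest (value 128).
my $sFlags = "\0";          # one byte, all 8 flags false

vec($sFlags, 0, 1) = 1;     # set flag 0
vec($sFlags, 3, 1) = 1;     # set flag 3
vec($sFlags, 7, 1) = 1;     # set flag 7

for my $nBit (0 .. 7) {
    print "flag $nBit: ", vec($sFlags, $nBit, 1), "\n";
}
print "bytes used: ", length($sFlags), "\n";             # prints bytes used: 1
print "byte value: ", ord($sFlags), "\n";                # 1 + 8 + 128 = 137
print "max value of one byte: ", 0b11111111, "\n";       # prints 255
```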

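  6. BASIC TYPES Integer: 32 bits; signed range -2^31 to 2^31 - 1, unsigned range 0 to 2^32 - 1. Long integer: typically 64 bits. Float: 32 bits, with a 24-bit significand (23 bits stored plus an implicit leading bit), 8 bits for the exponent and 1 sign bit. Floating-point numbers can lose precision; try this in Perl: print 0.6/0.2 - 3; (it prints a tiny non-zero number instead of 0). If the exact answer is known to be an integer, one fix is to round first: sub round { my ($n) = @_; return int($n + ($n < 0 ? -0.5 : 0.5)); } print round(0.6/0.2) - 3;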
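A hedged Perl sketch of the precision problem and two common workarounds: rounding (appropriate only when the exact answer is an integer) and comparing with a tolerance. The tolerance 1e-9 is an arbitrary choice for illustration. The last lines print the integer ranges above; note that Perl's own numbers are normally native integers or doubles, not the 32-bit types described on the slide.

```perl
use strict;
use warnings;

# 0.6 and 0.2 have no exact binary representation, so the quotient is
# slightly below 3 and the subtraction leaves a tiny non-zero residue.
my $fDiff = 0.6 / 0.2 - 3;
print "raw: $fDiff\n";

# Round half away from zero (also safe when $n is 0).
sub round {
    my ($n) = @_;
    return int($n + ($n < 0 ? -0.5 : 0.5));
}
print "rounded: ", round(0.6 / 0.2) - 3, "\n";    # prints rounded: 0

# When the exact result need not be an integer, compare with a tolerance.
my $fEpsilon = 1e-9;    # arbitrary tolerance for this example
print "equal within tolerance\n" if abs(0.6 / 0.2 - 3) < $fEpsilon;

# Integer ranges from the slide (%.0f avoids overflow on 32-bit perls).
printf "signed 32-bit:   %.0f to %.0f\n", -2**31, 2**31 - 1;
printf "unsigned 32-bit: 0 to %.0f\n", 2**32 - 1;
```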

  7. POINTERS / REFERENCES Pointers (called references in Perl and some other languages) are essentially an integer. This integer stores a memory address, and that address refers to another variable. http://perldoc.perl.org/perlref.html
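In Perl, a reference is taken with a backslash and followed by dereferencing. A short sketch; the variable names and data are illustrative, and the printed address will differ from run to run.

```perl
use strict;
use warnings;

my $count   = 42;
my @loci    = ('chr1:100', 'chr2:200');
my %geneLen = (geneA => 1200, geneB => 850);

# A backslash takes a reference to (the address of) an existing variable.
my $refCount = \$count;
my $refLoci  = \@loci;
my $refLen   = \%geneLen;

# Dereferencing reaches the original data, so changes affect the original variable.
$$refCount += 1;                 # $count is now 43
push @{$refLoci}, 'chr3:300';    # @loci now has 3 elements
print "geneA length: $refLen->{geneA}\n";    # arrow syntax reads one element

# ref() reports what kind of data a reference points to.
print join(' ', ref($refCount), ref($refLoci), ref($refLen)), "\n";    # prints SCALAR ARRAY HASH

# Printed as a string, a reference shows its type and the address it holds.
print "$refLoci\n";    # something like ARRAY(0x...); the address differs between runs
```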

  8. COMPLEX TYPES Set: unordered, unique values. Array (vector): an ordered collection of values of the same basic type. Indexes start from 0 in most languages, so the last index = length - 1. Hash: key => value pairs; each key must be unique. An array can be thought of as a special hash whose keys are ordered, consecutive integers. String: in C, a string is simply an array of characters (terminated by a null character). In many other languages, strings are treated as a basic type. Most algorithms for arrays also work for strings.
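A sketch of how these types map onto Perl. Perl has no built-in set type, so this assumes hash keys are used to emulate one, and split // is used to treat a string as an array of characters. The sample data is made up.

```perl
use strict;
use warnings;

# Set: hash keys are unique and unordered, so they can stand in for a set.
my %setBases;
$setBases{$_} = 1 for qw(A T C G A T);    # duplicates collapse into one key
print "set: ", join(',', sort keys %setBases), "\n";    # prints set: A,C,G,T

# Array: ordered values, indexed from 0; the last index is length - 1.
my @arrCodons = ('ATG', 'GAA', 'TAA');
print "length: ", scalar(@arrCodons), ", last index: $#arrCodons\n";    # 3 and 2

# Hash: unique keys mapped to values.
my %hashCodonToAA = (ATG => 'M', GAA => 'E', TAA => '*');
print "GAA -> $hashCodonToAA{GAA}\n";    # prints GAA -> E

# String: a basic type in Perl, but it can be split into an array of characters.
my @arrChars = split //, 'ATGGAA';
print "third base: $arrChars[2]\n";                         # prints G
print "same via substr: ", substr('ATGGAA', 2, 1), "\n";    # prints G
```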

  9. COMPLEX TYPES Classes: object-oriented programming. A class packages related data of different data types, together with the algorithms associated with them, into a neat black box for you to use.
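A minimal Perl class, sketched around a hypothetical Gene class: a package whose constructor blesses a hash reference, so the data (name, sequence) and the methods that work on it travel together. The class and method names are illustrative only.

```perl
use strict;
use warnings;

package Gene;

# Constructor: store the data in a hash reference and bless it into the class.
sub new {
    my ($sClass, %args) = @_;
    my $self = { name => $args{name}, seq => uc $args{seq} };
    return bless $self, $sClass;
}

# Accessor: callers read data through methods instead of poking into the hash.
sub fnGetName {
    my ($self) = @_;
    return $self->{name};
}

# Method: fraction of G and C bases in the sequence.
sub fnGetGCContent {
    my ($self) = @_;
    my $nLen = length $self->{seq};
    return 0 unless $nLen;
    my $nGC = ($self->{seq} =~ tr/GC//);    # tr with an empty replacement only counts
    return $nGC / $nLen;
}

package main;

my $objGene = Gene->new(name => 'demo', seq => 'atgcgc');
printf "%s GC content: %.2f\n", $objGene->fnGetName(), $objGene->fnGetGCContent();    # prints demo GC content: 0.67
```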

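  10. PERL Perl lumps all basic types together as scalars, marked with $; the Perl interpreter decides from context what a scalar looks like (number or string). Convenient, but sometimes problematic, especially when you parse a user-provided data file. Arrays are defined with @, and a reference to an array is a $ scalar. Hashes are defined with %, and a reference to a hash is also a $ scalar. Perl also has built-in regular expressions and file handles. Always put use strict; at the top of your scripts. Perl has an ugly grammar. Perl has many shortcuts, such as $_: DO NOT USE THEM!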
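A short Perl sketch of the points above. The sample data is hypothetical, and reading from an in-memory string (open on a scalar reference) stands in for a user-provided data file. looks_like_number from the core module Scalar::Util is one way to check a field before treating it as a number.

```perl
use strict;      # require declared variables; catches many typos
use warnings;
use Scalar::Util qw(looks_like_number);

# Scalars ($): Perl decides from context whether a value is a number or a string.
my $nLen   = '42';
my $nSum   = $nLen + 1;       # numeric context: 43
my $sLabel = $nLen . 'bp';    # string context: 42bp
print "$nSum $sLabel\n";      # prints 43 42bp

# Arrays (@) and hashes (%); references to them are scalars ($).
my @arrLoci = (10, 20, 30);
my $refLoci = \@arrLoci;
my %hashLen = (geneA => 1200);
my $refLen  = \%hashLen;
print "$refLoci->[1] $refLen->{geneA}\n";    # prints 20 1200

# A file handle plus a regular expression (split /\t/), with named variables instead of $_.
my $sData = "geneA\t1200\ngeneB\tn/a\n";
open(my $hIn, '<', \$sData) or die "Cannot open input: $!";
while (my $sRow = <$hIn>) {
    chomp $sRow;
    my ($sGene, $sField) = split /\t/, $sRow;
    if (looks_like_number($sField)) {
        print "$sGene: $sField bp\n";
    } else {
        warn "$sGene: length '$sField' is not a number, skipped\n";
    }
}
close($hIn);
```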

  11. FLOW CONTROL for, foreach, while, unless, until, if/elsif/else. http://perldoc.perl.org/perlsyn.html#Compound-Statements
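Each construct in one small Perl sketch; the list of lengths is made up.

```perl
use strict;
use warnings;

my @arrLens = (120, 0, 45, 300);

# foreach with if / elsif / else
foreach my $nLen (@arrLens) {
    if    ($nLen > 100) { print "$nLen: long\n"; }
    elsif ($nLen > 0)   { print "$nLen: short\n"; }
    else                { print "$nLen: empty\n"; }
}

# C-style for loop with an index; unless means "if not"
for (my $i = 0; $i < @arrLens; $i++) {
    print "index $i is non-empty\n" unless $arrLens[$i] == 0;
}

# while: repeat as long as the condition is true
my $nCount = 0;
while ($nCount < 3) { $nCount++; }

# until: repeat until the condition becomes true
my $nDown = 3;
until ($nDown == 0) { $nDown--; }

print "count=$nCount down=$nDown\n";    # prints count=3 down=0
```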

  12. FUNCTIONS (SUBROUTINES) Traditionally, subroutines do not accept parameters, so "function" is the better term, but because Perl is ugly it still uses sub:

     sub functionname {
         my ($param1, $param2) = @_;    # get the parameters
         my $result;                    # compute the result here
         return $result;
     }

   Call: functionname($param1, $param2); I prefix all private functions with fn, but you don't need to do that. However, capitalize the first letter of each word! Use Verb + Noun phrases as function names, e.g. fnGetFileName(), fnDownloadPicture().
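A sketch following the slide's conventions (Verb + Noun name, fn prefix, parameters copied out of @_). fnGetFileName here is a hypothetical helper that returns the last path component, optionally without its extension.

```perl
use strict;
use warnings;

# Verb + Noun name with the optional "fn" prefix; parameters arrive in @_.
sub fnGetFileName {
    my ($sPath, $bKeepExt) = @_;               # get the parameters
    my ($sName) = $sPath =~ m{([^/]+)$};       # text after the last slash
    return '' unless defined $sName;           # e.g. a path ending in '/'
    $sName =~ s/\.[^.]+$// unless $bKeepExt;   # drop the extension
    return $sName;
}

# Call: fnGetFileName($param1, $param2);
my $sFull = fnGetFileName('/data/run1/reads.fastq', 1);
my $sBase = fnGetFileName('/data/run1/reads.fastq', 0);
print "$sFull\n";    # prints reads.fastq
print "$sBase\n";    # prints reads
```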

  13. HOW TO NAME VARIABLES Variable names should reflect their basic types. Give descriptive names, with each word capitalized. I use C-style prefixes on them:
     • bool: prefix b, e.g. $bGenomeLoaded
     • integer: prefix n, e.g. $nLen
     • float: prefix n/f, e.g. $fAlleleFreq
     • string: prefix s, e.g. $sInFile
     • file handler: prefix h, e.g. $hInFile
     • array: prefix arr, e.g. $arrLoci
     • hash: prefix arr, e.g. $arrGeneID
     • constant: ALLCAPS, e.g. MAX_LINE
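A sketch applying the naming convention. The file name loci.txt and the sample data are hypothetical; an in-memory string stands in for the file so the snippet runs on its own. The hash uses the arr prefix exactly as the list gives it, and arrays and hashes are held as references in $ scalars, which is why their names start with $.

```perl
use strict;
use warnings;
use constant MAX_LINE => 1000;       # constant: ALLCAPS

my $bGenomeLoaded = 0;               # bool: b
my $nLen          = 0;               # integer: n
my $fAlleleFreq   = 0.25;            # float: f
my $sInFile       = 'loci.txt';      # string: s (hypothetical file name)
my $arrLoci       = [];              # array, held as a reference: arr
my $arrGeneID     = {};              # hash, held as a reference: arr (as in the list)

# File handle: h. An in-memory string stands in for the contents of $sInFile.
my $sData = "geneA\tchr1:100\ngeneB\tchr2:200\n";
open(my $hInFile, '<', \$sData) or die "Cannot open $sInFile: $!";
while (my $sLine = <$hInFile>) {
    last if @{$arrLoci} >= MAX_LINE;    # read at most MAX_LINE lines
    chomp $sLine;
    my ($sGene, $sLocus) = split /\t/, $sLine;
    push @{$arrLoci}, $sLocus;
    $arrGeneID->{$sGene} = $sLocus;
}
close($hInFile);

$nLen = scalar @{$arrLoci};
$bGenomeLoaded = 1 if $nLen > 0;
print "loaded=$bGenomeLoaded loci=$nLen freq=$fAlleleFreq\n";    # prints loaded=1 loci=2 freq=0.25
```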

  14.
     1. Start with the DNA sequence ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT, report its length, check whether its length is divisible by 3, and check whether it is a valid DNA sequence. If a check fails, do not continue.
     2. Translate it into a peptide sequence using the universal codon table.
     3. Display it on screen with the DNA on the first line and, on the second line, the translated amino acids, each aligned with the middle letter of its codon.
     4. This DNA sequence goes through generation after generation of replication.
     5. At each replication, it has a user-specified probability (0-1) of a single-nucleotide mutation. This mutation probability is specified on the command line.
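One possible Perl sketch of steps 1-3, including the case-insensitivity asked for later in step 9. It builds the standard codon table from a 64-letter amino acid string in T, C, A, G order instead of typing out 64 entries; the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base changes slowest); '*' marks a stop codon.
my @arrBases    = qw(T C A G);
my $sAminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %hashCodonTable;
my $nIdx = 0;
for my $b1 (@arrBases) {
    for my $b2 (@arrBases) {
        for my $b3 (@arrBases) {
            $hashCodonTable{"$b1$b2$b3"} = substr($sAminoAcids, $nIdx++, 1);
        }
    }
}

# Step 1: report the length, then stop unless it is a multiple of 3 and valid DNA.
sub fnCheckDNA {
    my ($sDNA) = @_;
    my $nLen = length $sDNA;
    print "Length: $nLen\n";
    die "Length $nLen is not a multiple of 3\n" if $nLen % 3;
    die "Not a valid DNA sequence\n" unless $sDNA =~ /^[ACGT]+$/i;
    return 1;
}

# Step 2: translate codon by codon (upper-casing first handles lowercase input).
sub fnTranslate {
    my ($sDNA) = @_;
    my $sUpper    = uc $sDNA;
    my @arrCodons = $sUpper =~ /(...)/g;
    return join '', map { $hashCodonTable{$_} } @arrCodons;
}

# Step 3: DNA on line 1; each amino acid sits under the middle base of its codon.
sub fnFormatAlignment {
    my ($sDNA, $sProtein) = @_;
    my $sAALine = join '', map { " $_ " } split //, $sProtein;
    return "$sDNA\n$sAALine\n";
}

my $sDNA = 'ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT';
fnCheckDNA($sDNA);
my $sProtein = fnTranslate($sDNA);
print fnFormatAlignment($sDNA, $sProtein);
```

For the given sequence this reports a length of 60 and the peptide MEMERPLQMMPDCFRHIEMS, with no stop codon.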

  15.
     6. If a mutation happens, 1 random letter in the DNA is changed to A, T, C or G with equal probability. It's okay if the letter "changes" to the same letter.
     7. At each generation, display the generation number and the DNA and protein sequences as described in step 3.
     8. At each generation, check whether a stop codon has occurred. If so, the protein has lost its function: stop the evolution and output the generation at which the stop codon occurred.
     9. The program should be able to handle DNA sequences in upper- or lowercase letters.
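A sketch of the mutation loop (steps 4-8), assuming the probability is the first command-line argument, e.g. perl mutation.pl 0.1, and defaulting to 0.1 when none is given (the default is this sketch's choice, not the exercise's). A probability of 0 is rejected because the loop would never end. To keep it short, only the DNA line of step 7 is printed, and the stop-codon check scans in-frame codons for TAA, TAG and TGA.

```perl
use strict;
use warnings;

# Step 6: with probability $fMutProb, replace one random position with
# A, T, C or G (equal odds; the base may "change" to itself).
sub fnMutate {
    my ($sDNA, $fMutProb) = @_;
    return $sDNA unless rand() < $fMutProb;
    my @arrBases = qw(A T C G);
    my $nPos = int(rand(length $sDNA));
    substr($sDNA, $nPos, 1) = $arrBases[int(rand(4))];
    return $sDNA;
}

# Step 8: does any in-frame codon read TAA, TAG or TGA? (Case-insensitive, step 9.)
sub fnHasStopCodon {
    my ($sDNA) = @_;
    my $sUpper = uc $sDNA;
    for my $sCodon ($sUpper =~ /(...)/g) {
        return 1 if $sCodon eq 'TAA' || $sCodon eq 'TAG' || $sCodon eq 'TGA';
    }
    return 0;
}

# Step 5: mutation probability from the command line (0.1 default assumed here).
my $fMutProb = @ARGV ? $ARGV[0] : 0.1;
die "Mutation probability must be in (0, 1]\n"
    unless $fMutProb > 0 && $fMutProb <= 1;

# Steps 4, 7 and 8: replicate generation after generation until a stop codon appears.
my $sDNA = 'ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT';
my $nGeneration = 0;
until (fnHasStopCodon($sDNA)) {
    $nGeneration++;
    $sDNA = fnMutate($sDNA, $fMutProb);
    print "Generation $nGeneration: $sDNA\n";
}
print "Stop codon first occurred at generation $nGeneration\n";
```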

  16. Create a shell script called getdistr.sh that:
     1. Runs the simulation mutation.pl 1000 times for each mutation probability: 0.01, 0.1 and 0.5.
     2. Collects all DNA and protein sequence outputs into dist_$mutationprob.log.
     3. Collects the generation at which a stop codon first occurs into dist_$mutationprob.txt.
     4. Uses R to plot dist_0.01.txt, dist_0.1.txt and dist_0.5.txt on a histogram, with a different color for each parameter. The x axis should be log10(Generation).
