
Stata Data Analysis Techniques and Commands
Learn how to use Stata for data analysis: log files, do-files, if/in qualifiers, color-coded variables, creating new variables, recoding, labels, missing values, and descriptive statistics. Explore handy tips and tricks for efficient data manipulation and interpretation.
Quick review
Current directory:
    . pwd
    D:\Stata12_WinX86_x64\Stata12_WinX86_x64
Changing the working directory
If the path contains spaces, enclose it in double quotes:
    . cd "E:\TAI LIEU\Stata"
    E:\TAI LIEU\Stata
If the path has no spaces, the quotes are not needed:
    . cd E:\
    E:\

log file
Save all results shown on screen:
    log using filename [, append replace [text|smcl]]
Save only the commands typed:
    cmdlog using filename [, append replace]
Example:
    . cd "E:\TAI LIEU\Stata\Buoi 3"
    E:\TAI LIEU\Stata\Buoi 3
    . log using test.doc, text
Suspend (resume) a log or cmdlog file temporarily: log off (log on), cmdlog off (cmdlog on)
Close a log or cmdlog file: log close, cmdlog close

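The logging commands above can be sketched as a short do-file. This is only an illustration: the file name demo_log.log is a hypothetical choice, and the auto dataset that ships with Stata stands in for real data.

```stata
* Minimal logging sketch; demo_log.log is a hypothetical file name.
capture log close                       // close any log left open by an earlier run
log using demo_log.log, text replace    // plain-text log, overwritten on each run
sysuse auto, clear                      // example dataset shipped with Stata
summarize price
log off                                 // suspend logging temporarily
describe                                // this output is not written to the log
log on                                  // resume logging
log close
```

Starting with capture log close is a common habit so that a do-file can be rerun even if a previous run stopped with the log still open.
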
do file
Original data file: keep it zipped or set it to read-only.
Do-files should keep the data-cleaning part separate from the analysis part.
Example: ADB-cleandata.do; ADB-analysis.do

do file
Each do-file has its own log file; give both the same name.
Or use one shared log file:
    . clear
    . log using ADB.doc, text
    . do ADB-clean
    . do ADB-merge
    . do ADB-regressions
    . log close
Creating a do-file:
    Copy the commands into the Do-file Editor
    Use cmdlog

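If/in
The if qualifier goes at the end of the command, before the comma that starts the options.
The in qualifier goes at the end of the command, before the comma that starts the options.
    in 4/5: observations 4 through 5
    in f/4: the first 4 observations
    in -4/l: the last 4 observations
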
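A short sketch of where if and in sit in a command, using the auto dataset shipped with Stata as an assumed example (the variables make, price, and foreign belong to that dataset, not to the slides).

```stata
sysuse auto, clear
* if comes after the variable list and before the comma that starts the options
summarize price if foreign == 1, detail
list make price in 4/5          // observations 4 through 5
list make price in f/4          // the first 4 observations
list make price in -4/l         // the last 4 observations
* if and in can be combined in one command
count if price > 5000 in 1/20
```
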
Stata color-coded variables
    var1: numeric variable with value labels (blue)
    var2: string variable (red)
    var3: numeric variable (black)
    var4: string variable (red)

Creating new variables
    . gen newvar = expression

Creating new variables
    . gen newvar = (var1==1 & var2==1)
    . gen newvar = (var1==1 & var2<26)

Creating new variables
    . tab var1, gen(var2)

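The three patterns above, sketched on the auto dataset (an assumed example; the variable names cheap_foreign, good_repair, and the rep_ stub are hypothetical). A logical expression evaluates to 1 or 0, and because missing values compare as larger than any number, the second line guards against them explicitly.

```stata
sysuse auto, clear
* A logical expression is 1 when true and 0 when false
generate byte cheap_foreign = (foreign == 1 & price < 6000)
* Without the if qualifier, missing rep78 would be coded 1 because missing > 4
generate byte good_repair = (rep78 >= 4) if !missing(rep78)
* One indicator variable per category of rep78: rep_1, rep_2, ..., rep_5
tabulate rep78, generate(rep_)
```
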
recode
    . recode rep77 rep78 (1 2 = 1 "Below average")   ///
                         (3 = 2 Average)             ///
                         (4 5 = 3 "Above average")   ///
                         (nonmissing = 9 "No")       ///
                         (missing = 99 "Missing"),   ///
                         pre(new) label(newrep)
Alternative catch-all rule for every value not covered by an earlier rule: (else = 9 "No")
label(newrep): the new value label.
pre(new): creates the new variables newrep77 and newrep78; equivalent to gen(newrep77 newrep78).

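A runnable variant of the recode above, as a sketch. It assumes the auto dataset shipped with Stata, which contains rep78 but not rep77, so only one variable is recoded here.

```stata
sysuse auto, clear
recode rep78 (1 2 = 1 "Below average")   ///
             (3   = 2 "Average")         ///
             (4 5 = 3 "Above average")   ///
             (missing = 99 "Missing"),   ///
             prefix(new) label(newrep)
* The original rep78 is untouched; the recoded copy is newrep78
tabulate newrep78
```

Using prefix() or generate() rather than recoding in place keeps the original variable available for checking the result.
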
replace
    . replace oldvar = exp [if] [in]

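A small sketch of replace with and without an if qualifier, again assuming the auto dataset; price_k and origin are hypothetical variable names.

```stata
sysuse auto, clear
generate price_k = price / 1000
* replace overwrites an existing variable
replace price_k = round(price_k, 0.1)
* with if, only the matching observations are changed
generate str10 origin = "Domestic"
replace origin = "Foreign" if foreign == 1
tabulate origin
```
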
Dataset labels and notes
    label data "text"
    notes: text
    notes varname: text

Renaming and labeling variables
Renaming variables
    rename old new [, options]
    rename (old1 old2 ...) (new1 new2 ...) [, options]
    rename old1 old2 ..., {upper|lower|proper} [options]
options: renumber, renumber(#), addnumber, addnumber(#)

Renaming variables
Starting from the variables jana1 jana2 jana3 (the * in the new name stands for the text matched by * in the old name):
    rename jan* *1     result: a11 a21 a31
    rename jan* *      result: a1 a2 a3
    rename * *jan      result: jana1jan jana2jan jana3jan

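The wildcard behavior can be checked on a toy dataset. This is a sketch only: the three variables are created just for the demonstration, and the jan_ prefix is a hypothetical choice.

```stata
clear
set obs 3
generate jana1 = 1
generate jana2 = 2
generate jana3 = 3
rename jan* *            // drop the prefix jan: a1 a2 a3
rename a* jan_a*         // add a new prefix: jan_a1 jan_a2 jan_a3
rename jan_*, upper      // uppercase the names: JAN_A1 JAN_A2 JAN_A3
describe, simple
```
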
Variable labels
    label variable varname "text"

Value labels
Step 1: Define labels for the values (here Agree, Disagree, Don't know)
    label define label1 1 "Dong y"        ///
                        2 "Khong dong y"  ///
                        3 "Khong biet"
Step 2: Attach the value label to variables
    label values var1 label1
    label values var2 label1
or
    label values var1 var2 label1
Step 3: Change the value label (add "Refused to answer", then change it to "No answer")
    label define label1 4 "Tu choi tra loi", add
    label define label1 4 "Khong tra loi", modify

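The same three steps, sketched on rep78 from the auto dataset. The label name replbl and the label texts are hypothetical; they are not part of the dataset.

```stata
sysuse auto, clear
* Step 1: define the value label
label define replbl 1 "Poor" 2 "Fair" 3 "Average" 4 "Good"
* Step 2: attach it to a variable
label values rep78 replbl
* Step 3: add a label for a new value, then change an existing one
label define replbl 5 "Excellent", add
label define replbl 3 "Typical", modify
label list replbl
tabulate rep78
```
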
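System variables
    _N: total number of observations
    _n: sequence number of the current observation
Numbering observations within groups (the data must already be sorted by major; otherwise use bysort major:):
    by major: gen idmajor = _n
Creating lagged and forward (lead) variables:
    gen lag1_year = year[_n-1]
    gen for1_year = year[_n+1]
    gen lag2_year = year[_n-2]
    gen for2_year = year[_n+2]
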
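A sketch of _n, _N, and subscripting within groups, assuming the auto dataset with foreign as the grouping variable (all new variable names are hypothetical). Under by, _n and _N are counted within each group, so the lag of the first observation and the lead of the last observation in each group are missing.

```stata
sysuse auto, clear
* bysort sorts by foreign, then by price within foreign, before running the command
bysort foreign (price): generate rank_in_group = _n    // 1, 2, ... within each group
bysort foreign: generate group_size = _N               // observations per group
* previous and next price within each group
bysort foreign (price): generate lag1_price = price[_n-1]
bysort foreign (price): generate for1_price = price[_n+1]
list foreign price lag1_price for1_price in 1/5
```
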
Missing values
Stata has 27 possible missing values. The default missing value is a period (.).
The other 26 correspond to the 26 letters of the alphabet, each preceded by a period (.a, .b, .c, ...).
Missing values in Stata are treated as larger than any number.

Missing values
Example: using the auto dataset, we want summary statistics (summarize) for price, restricted by rep78.
    sum price if rep78>3                  (result: table 1)
    sum price if rep78>3 & rep78 <.       (result: table 2)
Table 1:
    Variable    Obs    Mean       Std. Dev.    Min     Max
    price       34     6073       2315.435     3748    12990
Table 2:
    Variable    Obs    Mean       Std. Dev.    Min     Max
    price       29     6011.38    2055.312     3748    11995
Equivalent ways to exclude missing values:
    sum price if rep78>3 & !missing(rep78)
    sum price if rep78>3 & missing(rep78)==0

Missing values
Converting missing values to numbers and back:
    . mvencode varlist [if] [in], mv(#|mvc=# [\ mvc=#...] [\ else=#])
    . mvdecode varlist [if] [in], mv(numlist | numlist=mvc [\ numlist=mvc...])
Sort order: valid numbers < . < .a < ... < .z

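A sketch of the missing-value pitfall and of mvencode/mvdecode on the auto dataset. The code -99 is a hypothetical placeholder value chosen because it does not occur in rep78.

```stata
sysuse auto, clear
count if rep78 > 3                      // also counts the cars with missing rep78
count if rep78 > 3 & !missing(rep78)    // counts only rep78 equal to 4 or 5
* Turn the system missing value into a number, then into an extended missing value
mvencode rep78, mv(-99)                 // . becomes -99
mvdecode rep78, mv(-99 = .a)            // -99 becomes .a
count if rep78 == .a
```
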
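Type conversion
Functions int(), float(), string(): used when we want to convert a value from one type to another.
How do they differ from the conversion commands destring and tostring? The functions are used inside expressions, while destring and tostring are commands that convert whole variables between string and numeric.
Example: display int(3.45) gives 3.
Variable ranges: var1-var5 (var1 var2 var3 var4 var5).
You can also use the wildcard characters ? and * in place of characters; for example, var* matches all variables whose names begin with var.
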
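The difference between the conversion functions and the destring/tostring commands, sketched on a toy dataset. All variable names here are hypothetical.

```stata
clear
set obs 3
generate x = 3.45 * _n
* Functions work inside expressions
generate int x_int = int(x)             // truncates toward zero
generate str8 x_str = string(x_int)     // numeric value to string
* Commands convert whole variables
destring x_str, generate(x_num)         // string variable to numeric
tostring x_int, generate(x_txt)         // numeric variable to string
describe
```
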
Display formats: format
Syntax: format varlist %fmt
Where %fmt is:
    %w.df: w is the width of the number, d is the number of digits after the decimal point
           example: 1.5235 displayed with %8.2f is 1.52
    %w.0g: w is the width of the number
Default formats by storage type:
    byte      %8.0g
    int       %8.0g
    long      %12.0g
    float     %9.0g
    double    %10.0g
    str#      %#s

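A brief sketch of format on the auto dataset; ratio is a hypothetical variable. Note that format changes only how values are displayed, not the stored values.

```stata
sysuse auto, clear
generate ratio = price / weight
format ratio %8.2f          // fixed format with 2 decimals
format price %9.0gc         // general format with thousands separators
list make price ratio in 1/3
display %8.2f 1.5235        // displays 1.52
```
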
Descriptive statistics
Commands: summarize, tabulate, tabstat, tab1, tab2.
Descriptive statistics for continuous variables.
Frequency tables and two-way cross-tabulations.
Exporting data

Frequencies
    tabulate varname [if] [in] [, options]
options:
    missing: treat missing values like any other value
    nofreq: do not display frequencies
    nolabel: do not display value labels
    sort: sort the table in descending order of frequency

Frequencies and descriptive statistics
    table rowvar [colvar [supercolvar]] [if] [in] [, options]
options:
    contents(freq mean sd min max median)
    format (see help table)

Frequencies and crosstabulations
    tab var1 var2, sum(var3)

Crosstabs
    tabulate varname1 varname2 [if] [in] [, options]
    tab2 varlist [if] [in] [weight] [, options]
Options: col row cell nofreq missing nolabel

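The one-way and two-way tabulate options above, sketched on the auto dataset (an assumed example dataset, not one used in the slides).

```stata
sysuse auto, clear
* One-way table: include missing values, most frequent category first
tabulate rep78, missing sort
* Two-way table: column percentages only
tabulate rep78 foreign, column nofreq
* Summary statistics of price within each category of foreign
tabulate foreign, summarize(price)
```
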
Descriptive statistics
    tabstat varlist [if] [in] [, options]
Options:
    by(varname)
    stat(mean min max median sd)
    col(var): variables in the columns (default)
    col(stat): statistics in the columns
    nototal
    missing

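A sketch of the tabstat options on the auto dataset, with the option names spelled out in full; the choice of price, mpg, foreign, and rep78 is only for illustration.

```stata
sysuse auto, clear
* Several statistics, one block per group plus a total
tabstat price mpg, statistics(mean sd min max median) by(foreign)
* Statistics in the columns instead of the rows
tabstat price mpg, statistics(mean sd) columns(statistics)
* Keep a group for missing rep78 and drop the overall total
tabstat price, statistics(n mean) by(rep78) missing nototal
```
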
Three-way crosstabs
    bysort var3: tab var1 var2, col row

Three-way crosstabs
    bysort var3: tab var1 var2, sum(var4)

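A three-way table is a two-way table repeated for each level of a third variable. A sketch on the auto dataset, where heavy is a hypothetical indicator created only for this example:

```stata
sysuse auto, clear
generate byte heavy = weight > 3000 if !missing(weight)
* One two-way table of rep78 by heavy for each value of foreign
bysort foreign: tabulate rep78 heavy, row
* Mean price in each cell, again separately for each value of foreign
bysort foreign: tabulate rep78 heavy, summarize(price) means
```
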
Collapse
    collapse (stat1) var1 (stat2) var2 (stat3) newvar1=var1 newvar2=var2, by(varlist)

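A sketch of collapse on the auto dataset; the new variable names mean_price, med_mpg, and n are hypothetical. Because collapse replaces the data in memory with the group-level summary, it is often wrapped in preserve and restore when the original data are still needed.

```stata
sysuse auto, clear
* One row per value of foreign, each statistic stored under a new name
collapse (mean) mean_price=price (median) med_mpg=mpg (count) n=price, by(foreign)
list
```
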
Quick review
Command
    summarize y1 y2 y3
    summarize y1 y2 y3, detail
    summarize y1 if x1 > 3 & !missing(x2)
    tabstat y1, stats(mean sd n)
    tabstat y1, stats(min p50 max) by(x1)
    tabulate x1
    tabulate x1, sort miss
    tab1 x1 x2 x3 x4
    tabulate x1 x2

Quick review
Command
    tabulate x1 x2, column
    tabulate x1 x2, missing row all
    tab2 x1 x2 x3 x4
    tabulate x1, sum(y)
    tabulate x1 x2, sum(y) means
    by x3, sort: tabulate x1 x2
    table y x2 x3, by(x4 x5) contents(freq)
    table x1 x2, contents(mean y1 median y2)