About Stata
The Stata user interface
How to communicate with Stata
Working process for data analyses

Get an overview of the whole dataset
Explore a categorical variable
Explore two categorical variables simultaneously ("cross tabulating two variables")
Explore a continuous variable
Explore a continuous variable under a subgroup of a categorical variable
About do file

Data management

Rename a variable name
Operators in Stata
Missing values in Stata
Generate a new variable
Categorize a continuous variable as a categorical variable
Recode a categorical variable
Drop variables
Drop observations
Variable label
Value label for a categorical variable

Get help from Stata
Tips and tricks
A list of Stata commands

Preface

This document is intended for users who have little experience with Stata. I assume, however, that you have already downloaded and installed Stata on your computer. The general recommendation is to read the document and try the examples in order, starting from the beginning.

There are certainly mistakes left in this document. Be careful, and if you find a mistake, drop me a message at feedBack@medical-statistics.dk.

Before you start, please make sure you have a big cup of coffee and perhaps some good music. If so, we are ready to start our journey.

Introduction

About Stata

What is Stata?

``Stata is a complete, integrated software package that provides all of your data science needs—data manipulation, visualization, statistics, and reproducible reporting.''

Why should we learn Stata?

Why should we learn Stata on top of the many things we have already learned, such as Excel?

In my own experience, there is a big difference between Stata and Excel. It is perfectly fine to use Excel to, for example, make tables and some nice figures. However, Stata can do many complicated things that Excel either cannot do or cannot do as easily, especially when it comes to statistical analyses. You will probably experience the same along the way.

The Stata user interface

Figure 1: Stata user interface for Windows (The following page can be downloaded)

Figure 2: Stata user interface for Mac (The following page can be downloaded)

The Stata interface starts with:

Menu
Toolbar

Stata consists of 5 windows:

Results window
Command window
History window
Variables window
Properties window

Do the following four things:

Find where the five windows are.
Move your cursor to each button of the menu and click on it, and a drop-down menu will appear. Skim through the drop-down menu. Pay particular attention to two buttons: File and Help.
Hold your cursor over each button of the toolbar for a moment, moving from left to right, and a description of that button will appear. Read the descriptions.
In the toolbar, find the Do-file Editor.

How to communicate with Stata?

Clicking on menu
Typing commands in Command window
Typing commands in do-file editor

When you type commands in the Command window, you type one command and execute it, type another command and execute it, and continue in this way until the end.
Alternatively, you may leave a note for Stata, much as you would for a secretary. The note can be very short or very long, but the main point is that Stata does exactly what you have written. In Stata this note is called a do-file: a collection of commands that Stata executes for you.

Furthermore, the commands in a do-file can be saved and re-used later, so the do-file also serves as a record of everything you have done. Commands executed via the Command window, however, disappear as soon as you shut down Stata.
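As a minimal sketch of what such a note might look like, the lines below form a tiny do-file. The three-row dataset is a hypothetical example typed in with input, not the course data; lines beginning with * are comments that Stata ignores.

```stata
* A tiny do-file: Stata runs the commands from top to bottom
clear

* Type in a small hypothetical dataset (3 observations, 3 variables)
input id smoking birthweight
1 0 3500
2 1 3100
3 0 3650
end

* Look at the structure of the data, then summarize one variable
describe
summarize birthweight
```

Saving these lines in a file with the extension .do lets you re-run exactly the same steps later.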

Working process for data analyses

Figure 3: The data working process

The data working process can be simplified into four steps:

Open the dataset
Investigate the dataset
Data management or data manipulation
Data analyses

Based on the above working process, this document is organized into the following topics:

Opening a dataset
Investigating the contents of the dataset
Data management
Graphics
Data analysis

However, we will reserve graphics and data analysis for another part, to avoid putting too much burden on you.

Before you continue, please make sure you fill up your coffee. If so, we are ready to move forward.

Open dataset

Stata classifies any dataset into one of two categories:

Internal dataset: a dataset created by Stata, with the extension .dta
External dataset: a dataset not created by Stata, for example an Excel spreadsheet, with an extension such as .xls or .xlsx

Download the dataset for practice: it is very important to know where the dataset is located.

Open an internal dataset

Start Stata
In Stata, click on the folder symbol at the far left of the toolbar
Find the dataset (fake_birthcohort.dta)
Click on Open
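The same steps can also be done with a command instead of the mouse. The sketch below is only an illustration: it first builds and saves a small hypothetical file (demo_cohort.dta, a name made up for this example) in Stata's current working directory so that the lines run anywhere, and then opens it with use.

```stata
* Build and save a hypothetical practice file so the sketch runs anywhere
clear
input id smoking birthweight
1 0 3500
2 1 3100
3 0 3650
end
save "demo_cohort.dta", replace

* Open the saved Stata dataset; the clear option discards any data in memory
use "demo_cohort.dta", clear

* For the course data the equivalent line would be (assuming the file is in
* Stata's current working directory):
* use "fake_birthcohort.dta", clear
```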

Import an external dataset

Start Stata
In Stata, click on File at the far left of the menu
Find and click on Import
Click on Excel spreadsheet, and a dialog box will pop up
Inside the dialog box, click on Browse and find the dataset (fake_bcohortExcel.xls)
Select "Import first row as variable names"
Click on OK
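The menu steps above correspond to the import excel command. The sketch below assumes nothing about the course file: it first writes a small hypothetical spreadsheet (demo_cohort.xls, a made-up name) to the current working directory and then imports it, with the firstrow option playing the role of "Import first row as variable names".

```stata
* Create a small hypothetical spreadsheet so the sketch is self-contained
clear
input var1 var2
1 0
2 1
3 0
end
export excel using "demo_cohort.xls", firstrow(variables) replace

* Import the spreadsheet; firstrow treats the first row as variable names
import excel using "demo_cohort.xls", firstrow clear
describe
```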

Explore the dataset

Whenever you open a dataset, it is very important to explore the data thoroughly before doing any analysis.

Open the internal dataset

At this point, I assume that you are able to open the internal dataset (fake_birthcohort.dta). Please open the data now (hint: see "Open an internal dataset" above).

Get an overview of the whole dataset

Describe the whole dataset

Stata command: describe
Example

In the Command Window type: describe
Press Enter
Figure 4: Stata output

Interpretation

describe produces a summary of the dataset, including

Number of observations: obs=29
Number of variables: vars=7
Variable names, such as id, smoking, etc.
Variable labels, which provide further information on the variables. For example, smoking means "maternal smoking during pregnancy"

Command syntax: describe [varlist]
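The optional [varlist] in the syntax means that describe can be limited to selected variables. A minimal sketch on a hypothetical three-row dataset (not the course data):

```stata
* Hypothetical mini-dataset for illustration only
clear
input id smoking birthweight
1 0 3500
2 1 3100
3 0 3650
end
label variable smoking "Maternal smoking during pregnancy"

* Describe every variable in the dataset
describe

* Describe only the variables listed after the command
describe smoking birthweight

* Show only the header (number of observations and variables)
describe, short
```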

Browse the whole dataset

Stata command: browse
Example

Move and hold your cursor from left to right over each button of the toolbar for a moment, until you find the button Data Editor (Browse).
Click on Data Editor (Browse)
Figure 5: Stata output

Interpretation

browse displays the data in an Excel-spreadsheet-style view.

The main area on the left shows the whole dataset
The first horizontal line shows the names of all variables. In the top right, you may select/filter the variables by clicking on the small box in front of each variable name
Each column holds the values of a single variable.
Each row holds the values of all variables for a single observation.

Explore a categorical variable

Stata command: tab1
Example:
1. in the Command Window type: tab1 smoking
2. Press Enter
3. Figure 6: Stata output
4. Interpretation
Command syntax: tab1 [varlist]

Explore two categorical variables simultaneously ("cross tabulating two variables")

Stata command: tab2
Example:
1. in the Command Window type: tab2 smoking coffee
2. Press Enter
3. Figure 7: Stata output
4. Interpretation
Command syntax: tab2 varname1 varname2
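Both kinds of table can also be requested with tabulate, which accepts options after a comma. The sketch below uses a hypothetical five-row dataset with one missing coffee value; the missing option shows missing values as their own category and row adds row percentages.

```stata
* Hypothetical mini-dataset; "." marks a missing value
clear
input smoking coffee
0 1
1 0
0 .
1 1
0 0
end

* One-way table that also lists the observation with a missing value
tabulate coffee, missing

* Two-way table with row percentages (missing values are left out by default)
tabulate smoking coffee, row
```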

Explore a continuous variable

Stata command: summarize
Example:
1. in the Command Window type: summarize weight height
2. Press Enter
3. Figure 8: Stata output
4. Interpretation
Command syntax: summarize [varlist]
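When the mean and standard deviation are not enough, the detail option of summarize adds percentiles (including the median), skewness, and kurtosis. A small sketch on hypothetical values:

```stata
* Hypothetical mini-dataset; one height is missing
clear
input weight height
62 168
70 172
55 160
80 .
90 165
end

* Default output: Obs, Mean, Std. dev., Min, Max
summarize weight height

* Extended output with percentiles for a single variable
summarize weight, detail
```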

Explore a continuous variable under a subgroup of a categorical variable

Stata command: summarize, if
Example:
1. in the Command Window type: summarize birthweight if smoking==0
2. Make sure that you have typed a double == instead of a single =
3. Press Enter
4. in the Command Window type: summarize birthweight if smoking==1
5. Press Enter
6. Figure 9: Stata output
7. Interpretation
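Instead of typing one summarize line per subgroup, the by prefix can repeat a command for every value of a categorical variable. The sketch below uses hypothetical data; bysort sorts by smoking first and then runs summarize within each group.

```stata
* Hypothetical mini-dataset for illustration only
clear
input smoking birthweight
0 3500
1 3100
0 3650
1 2900
0 3800
end

* One summary per subgroup, typed separately
summarize birthweight if smoking==0
summarize birthweight if smoking==1

* The same two summaries in a single line
bysort smoking: summarize birthweight
```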

About do file

Do-file Editor

Open a new Do-file Editor: In the toolbar, click on Do-file Editor
Explore the Do-file Editor: click on menus and go through toolbar
Find two buttons: Save and Execute (do) (hint: see the following figures)

Figure 10: Do-file Editor for PC

Figure 11: Do-file Editor for Mac

Explore the dataset via Do-file

Download the Stata Do-file:
In the Do-file Editor, click on the first button to Open the Do-file (stataIntro.do)
The text in green inside the Do-file is notes/explanations.
The text in blue inside the Do-file is Stata commands.
Highlight the first Stata command in blue, click on Execute (do), and observe what appears in the Results window
Repeat the above step for each of the remaining commands

Data management

Import the external data

At this moment, I assume that you are able to import an external dataset. Furthermore, I assume that you are able to explore the dataset.

Please import the data: fake_bcohortExcel.xls (hint: see the above "import an external dataset").
Explore the dataset
The dataset logbook

Current names	Desired names	Value labels	Variable labels
var1	id		ID number for each child in the cohort
var2	smoking	0: no, 1: yes	Maternal smoking status during pregnancy
var3	coffee	0: no, 1: yes	Maternal coffee drinking during pregnancy
var4	weight		Maternal weight (kg) at the beginning of the pregnancy
var5	height		Maternal height (cm) at the beginning of the pregnancy
var6	gender	0: boy, 1: girl	Gender for the child
var7	birthweight		Birthweight for the child (grams)

Data management via Do-file

Download the Do-file:
Open a new Do-file
Open the downloaded Do-file:

Rename a variable name

Background

It is common to rename variables to make them more readable and understandable. For example, once var7 has been renamed birthweight, one can immediately understand what the variable means.

Stata command: rename
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

describe
rename var1 id
rename var2 smoking
rename var3 coffee
rename var4 weight
rename var5 height
rename var6 gender
rename var7 birthweight
describe

Execute the command lines: use the cursor to highlight the above command lines and click on the Do button (Execute (do); for Windows see Figure 10 above and for Mac see Figure 11 above)
Compare the output from the command describe in the beginning and at the end

Operators in Stata

Arithmetic	Logical	Relational
+ addition	& and	> greater than
- subtraction	| or	< less than
* multiplication	! not	>= greater than or equal
/ division	~ not	<= less than or equal
^ power		== equal
- negation		!= not equal
= assignment		~= not equal
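A quick way to get a feel for these operators is the display command, which evaluates an expression and prints the result. In Stata a true comparison evaluates to 1 and a false one to 0. The numbers below are arbitrary illustrations.

```stata
* Arithmetic: multiplication is done before addition
display 2 + 3*4
* prints 14

display 2^3
* prints 8

* Relational: true is shown as 1, false as 0
display 5 >= 3
* prints 1

* Logical: "and" needs both parts to be true, "or" needs at least one
display (5 > 3) & (2 > 4)
* prints 0

display (5 > 3) | (2 > 4)
* prints 1

* "not" turns true into false
display !(5 == 5)
* prints 0
```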

Missing values in Stata

Background:
It is common for a dataset to contain missing values. Stata represents missing values in several ways; for now, we focus on the most common one. For a numerical variable, a period "." means that the value is missing. It is important to handle missing values whenever cleaning or analysing data.
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

summarize

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
Go through the numbers under "Obs" and think about why there are 27 for coffee and height but 29 for every other variable.

browse

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
Go through the two variables (coffee and height) and identify where the missing values occur.

tab coffee
count if coffee > 0

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
Why is the frequency 12 after "tab coffee" but 14 after "count if coffee>0"? (Hint: in Stata, the numeric missing value "." is treated as larger than any number, so observations with a missing coffee value also satisfy coffee>0.)
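The effect described in the hint can be reproduced on a small scale. The sketch below uses a hypothetical five-row variable with two missing values; missing() is a built-in Stata function that is true when its argument is missing.

```stata
* Hypothetical mini-dataset: two of the five values are missing
clear
input coffee
1
0
.
1
.
end

* Missing values count as "greater than 0", so this gives 4
count if coffee > 0

* Excluding missing values explicitly gives the intended answer, 2
count if coffee > 0 & !missing(coffee)

* Number of missing values, 2
count if missing(coffee)
```

Adding a condition such as !missing(coffee), or equivalently coffee != ., is a common safeguard whenever a condition uses > or >=.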

Generate a new variable

Background:
We often generate a new variable based on existing variables; for example, we generate a variable BMI (body mass index) from height and weight.
Stata command: generate
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

generate bmi=weight/(height/100)^2
summarize bmi

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button

Categorize a continuous variable as a categorical variable

Background:
We often categorize a continuous variable as a categorical variable; for example, we categorize BMI according to the WHO standard categories.
Stata command: generate, replace, if
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

generate bmi_g3=.
replace bmi_g3=1 if bmi<18.5
replace bmi_g3=2 if bmi>=18.5 & bmi<25
replace bmi_g3=3 if bmi>=25.0 & bmi!=.
tab1 bmi_g3

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
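After categorizing, it is worth checking that every group contains the intended range and that missing BMI values were not placed in the highest group (which is what the condition bmi!=. protects against). The sketch below repeats the same steps on a hypothetical five-row variable and checks the result with tabstat.

```stata
* Hypothetical BMI values, including one missing value
clear
input bmi
17.0
22.5
25.0
31.2
.
end

generate bmi_g3=.
replace bmi_g3=1 if bmi<18.5
replace bmi_g3=2 if bmi>=18.5 & bmi<25
replace bmi_g3=3 if bmi>=25.0 & bmi!=.

* Smallest and largest BMI within each category
tabstat bmi, by(bmi_g3) statistics(n min max)

* Should be 0: no observation with missing BMI has been given a category
count if missing(bmi) & !missing(bmi_g3)
```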

Recode a categorical variable

Background:
Sometimes we would like to regroup a categorical variable into different categories.
Stata command: generate, replace, if
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

generate bmi_g2=.
replace bmi_g2=0 if bmi_g3==1 | bmi_g3==2
replace bmi_g2=1 if bmi_g3==3
tab2 bmi_g2 bmi_g3

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
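Stata also has a dedicated recode command that can do the same regrouping in one line. This is an alternative to the generate/replace approach above, shown here on a hypothetical five-row variable; the generate() option stores the result in a new variable and leaves the original unchanged.

```stata
* Hypothetical three-level variable with one missing value
clear
input bmi_g3
1
2
3
3
.
end

* Values 1 and 2 become 0, value 3 becomes 1; missing values stay missing
recode bmi_g3 (1 2 = 0) (3 = 1), generate(bmi_g2)

* Cross-tabulate old against new to verify the recoding
tabulate bmi_g2 bmi_g3, missing
```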

Drop variables

Background:
Sometimes a dataset has many variables that are not needed. In this case, it is recommended to drop them; otherwise they can be distracting.
Stata command: drop
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

drop weight height

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
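When most variables are to be removed, it can be shorter to list the ones to keep. The sketch below, on a hypothetical two-row dataset, shows drop followed by its mirror image keep.

```stata
* Hypothetical mini-dataset for illustration only
clear
input id weight height birthweight
1 62 168 3500
2 70 172 3100
end

* Remove the listed variables; id and birthweight remain
drop weight height
describe

* keep does the opposite: only the listed variables remain
keep id
describe
```

Dropped variables cannot be recovered from memory, so the original data file is needed to get them back.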

Drop observations

Background:
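Sometimes we need to drop/remove observations because of errors, outliers, missing values, etc. Note that a variable used in the if condition must still exist in the dataset; if you dropped weight in the previous example, re-import the data before running these lines.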
Stata command: drop and if
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

drop if weight==.
drop if coffee==.

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
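Several variables can be checked for missing values at once with the missing() function, which is true if any of its arguments is missing. A minimal sketch on a hypothetical four-row dataset:

```stata
* Hypothetical mini-dataset: observation 2 lacks coffee, observation 3 lacks height
clear
input id coffee height
1 1 168
2 . 172
3 0 .
4 1 160
end

* Number of observations before dropping: 4
count

* Drop every observation with a missing value in coffee or height
drop if missing(coffee, height)

* Number of observations afterwards: 2
count
```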

Variable label

Background:
Stata command: label variable
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

describe
label variable id "ID number for each child in the cohort"
label variable smoking "Maternal smoking status during pregnancy"
label variable coffee "Maternal coffee drinking during pregnancy"
label variable weight "Maternal weight (kilogram) at the beginning of the pregnancy"
label variable height "Maternal height (centimeter) at the beginning of the pregnancy"
label variable gender "Gender for the child"
label variable birthweight "Birthweight for the child"
describe

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
Compare the output from the command describe in the beginning and at the end

Value label for a categorical variable

Background:
When we have categorical variables, it is difficult to remember what the values mean. For example, given a categorical variable gender coded as 0 and 1, it is difficult to tell whether 0 or 1 stands for boy/male. Therefore, it is crucial to label the values of categorical variables.
Stata command: label define and label values
Example: try out the example in the downloaded Do-file or copy, paste, and run the following command lines into a new Do-file Editor

tab1 smoking coffee gender

Step 1: define a label name together with the text explaining the meaning of each value

lab define labForSmoking 1 "Yes" 0 "No"
lab define labForCoffee 1 "Yes" 0 "No"
lab define labForGender 0 "boy" 1 "girl"

Step 2: connect each existing variable name to its label name

lab value smoking labForSmoking
lab value coffee labForCoffee
lab value gender labForGender

tab1 smoking coffee gender

Execute the command lines: use the computer cursor to highlight the above command lines and click on the Do button
Compare the output from the command tab1 in the beginning and at the end
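Value labels only change how the values are displayed; the underlying numbers stay the same. The sketch below, on a hypothetical three-row variable, shows how to inspect a label with label list and how to see the raw numbers again with the nolabel option.

```stata
* Hypothetical mini-dataset for illustration only
clear
input gender
0
1
1
end

label define labForGender 0 "boy" 1 "girl"
label values gender labForGender

* Show which text belongs to which number
label list labForGender

* Table with the label text, then the same table with the raw numbers
tabulate gender
tabulate gender, nolabel

* Conditions still use the numbers, not the label text
count if gender==1
```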

Get help from Stata

Review Window (renamed as "History" in Stata 16)

Google

At the end of the Stata menu, click on Help. Skim through the drop-down menu
To get help for a particular Stata command, type in the Command Window: help commandname
Click on Stata Youtube Channel

Tips and tricks

Review Window (renamed as "History" in Stata 16)

Copy a past command from the Review-Window (History) to Command-Window: single-click on the past command in the Review-Window
Re-run a past command in the Review-Window: double-click on the past command in the Review-Window
Copy past commands to Do-file editor: Select the commands in the Review-Window and right-click, and further click on "Send selected to Do-File Editor"

Variable Window

Copy a single variable from the Variable-Window to Command-Window: double-click on the variable in the Variable-Window
Copy several variables from the Variable-Window to Command-Window: select the variables in the Variable-Window and right-click on the variables, and further click on "Send varlist to Command Window"

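A list of Stata commands

The description of the command	Name of command	Example
Describe the whole dataset	describe	describe
Browse the whole dataset	browse	browse
Explore a categorical variable	tab1	tab1 smoking
Explore two categorical variables simultaneously	tab2	tab2 smoking coffee
Explore a continuous variable	summarize	summarize weight height
Explore a continuous variable under a subgroup	summarize, if	summarize birthweight if smoking==0
Rename a variable	rename	rename var1 id
Generate a new variable	generate	generate bmi=weight/(height/100)^2
Change the values of a variable	replace, if	replace bmi_g3=1 if bmi<18.5
Label a variable	label variable	label variable gender "Gender for the child"
Label the values of a categorical variable	lab define, lab value	lab define labForGender 0 "boy" 1 "girl"
Drop variables	drop	drop weight height
Drop observations	drop, if	drop if coffee==.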

Medical statistics and Data Science: Statistics

How to get started with Stata
