* Authors' note: The data is created from a substantial number of different registers with different kinds of demographic
* information on the Swedish population. It is available on a secure server at Statistics Sweden without access to the
* internet, accessed through a remote access environment. This data is stored on an SQL server from which
* we retrieve information, often from multiple registers, and save in intermediary Stata files used to
* build the single data-set used for our analyses. Some variable names are in Swedish as this is the
* source language of our data. All our analyses are done in Stata 16.0. "lopnr" is the random
* number series, originally derived from the Swedish unique personal identification numbers, used to
* uniquely identify individuals in our sample across different registers.

* The present file prepares the data for survival analysis in Stata (using the st command family)
* Specifically it defines the start and end time for each individual in the data and whether
* each observation ends with a death or right censoring

capture log close _all
clear

* This file does the stsetting of the data

* Define some variables
local today=subinstr("`c(current_date)'"," ","",.)
local time=subinstr("`c(current_time)'",":","",.)

* Stata version compatibility
version 16.0

* Log file number
local vnum="1_07"

* Syntax name
local syntax "Stset1"

* Project name
local project "C19a"

* Open log file
log using "$logfilepath/StataLog_AW_`project'_`syntax'_`vnum'_`today'_`time'", replace text name(`syntax')
local logname=r(name)
timer clear 1

* Change Log ********************************
* 1.04: Changed entry age to 20 (first adult age at death)
* 1.05: Added descriptives
* 1.06: Changed entry date to March 12 to not have biased estimates for all other causes of death
* 1.07: Changed entry age to the first age of death in data

use "$datafilepath/AW_`project'_Merge02"
* use "$datafilepath/AW_`project'_Merge02_Sample"

local lfy=2020
local lly=2020
local lar=20

* Summarize variables in the data
sum lopnr kon birthcountry tpop_oldnew lan2019 civil2019 dispink2018 sun2018 ddate ddatecod bdate covid

* Codebook relevant variables
codebook lopnr kon birthcountry tpop_oldnew lan2019 civil2019 dispink2018 sun2018 ddate ddatecod bdate covid

* Clean data to reduce size
* Drop those who have missing info on birth date
drop if bdate==.

* Drop those older than 105 years at the beginning of study
drop if bdate<=mdy(1,1,1915)

* Drop those below age lar at May 7, 2020
drop if bdate>=mdy(5,7,2020-`lar')

* Drop those who are not in the population at the end of 2019 (RTB)
local r=`lfy'-1
drop if civil`r'=="" & lan`r'==.

* Prepare for stsetting
* Generate event indicator
* Death data comes from different sources (COVID vs non-COVID)
gen dead=.
replace dead=0 if ddatecod==. & ddate==.
replace dead=1 if ddate!=. & ddate<=mdy(5,7,`lly')
replace dead=1 if ddatecod!=. & ddatecod<=mdy(5,7,`lly')

* Generate helping variables
* Date at entry age (lar)
gen datelar=mdy(month(bdate), day(bdate), year(bdate) + `lar')

* Date of study entry (March 12 of first year lfy)
gen datelfy=mdy(3,12,`lfy')

* Generate end date for each observation
gen enddto=.
replace enddto=mdy(5,7,`lly') if dead==0 // Last observation date
replace enddto=ddate if dead==1 & ddate!=. // Death date
replace enddto=ddatecod if dead==1 & ddatecod!=. // Death date covid
replace enddto=. if enddtodatelfy // Entry date is 21st birthday

* Variable formatting
format entrydate %td
format enddto %td
format datelar %td
format datelfy %td

* Replace both entry and exit with missing if one of them is
replace entrydate=. if enddto==.
replace enddto=. if entrydate==.

* Then those who are missing can be removed from the data
drop if enddto==.

* All people whose observation ends before the entry date can be dropped from the data
drop if enddto