Simulate genetic data, including genotypes, phenotype status and liabilities, for individuals.
sim_no_family(n, m, q, hsq, k, path)
Argument | Description |
---|---|
n | number of genotypes (individuals). |
m | number of SNPs per genotype. |
q | number of causal SNPs, i.e. SNPs that affect the chances of having the phenotype. |
hsq | squared heritability parameter. |
k | prevalence of phenotype. |
path | directory where the files will be stored. If nothing is specified, |
Does not return any value, but writes the following five files to the directory given by the path parameter in the function call:
Three text files:
beta.txt - a file of m rows with one column. The i'th row is the true effect of the i'th SNP.
MAFs.txt - a file of m rows with one column. The i'th row is the true minor allele frequency (MAF) of the i'th SNP.
phenotypes.txt - a file of n rows. The file contains the phenotype status and liability of each individual.
Two PLINK files:
genotypes.map - a file created such that PLINK will work with the genotype data.
genotypes.ped - the simulated genotypes in a PLINK-readable format.
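As a rough illustration of how the two per-SNP files line up, the sketch below reads them back and lists the causal SNPs. It assumes that the files have no header line and that non-causal SNPs have an effect of exactly zero; neither is confirmed here. The directory and the four-row files are stand-ins written by the example itself so that it runs without the simulation.

```r
# Stand-in for the directory passed as `path`; the two files are written
# here only so that the example is self-contained.
out_dir <- tempfile("sim_")
dir.create(out_dir)
writeLines(c("0", "0.31", "0", "-0.12"), file.path(out_dir, "beta.txt"))
writeLines(c("0.05", "0.23", "0.41", "0.10"), file.path(out_dir, "MAFs.txt"))

# Assumption: no header line, one numeric value per row.
beta <- scan(file.path(out_dir, "beta.txt"), quiet = TRUE)
mafs <- scan(file.path(out_dir, "MAFs.txt"), quiet = TRUE)

# Row i of both files refers to the i'th SNP, so they can be combined directly.
snps <- data.frame(snp = seq_along(beta), beta = beta, maf = mafs)

# Assumption: non-causal SNPs have a true effect of exactly zero.
causal <- snps[snps$beta != 0, ]
causal
```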
As this function does not include family history, its resulting data cannot be used by assign_ltfh_phenotype() or assign_GWAX_phenotype().
For the methodology behind the simulation, see vignette("liability-distribution").
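The vignette is the authoritative description. For orientation only, here is a minimal base-R sketch of a standard liability threshold model using the same parameter names. The uniform MAF range, the standardisation of genotypes, and the normal effect sizes with variance hsq / q are assumptions of this sketch and may differ from what sim_no_family() actually does.

```r
set.seed(1)

n   <- 1000   # individuals
m   <- 200    # SNPs
q   <- 20     # causal SNPs
hsq <- 0.5    # heritability parameter
k   <- 0.05   # prevalence

# Assumed: MAFs are uniform; genotypes are binomial allele counts (0, 1, 2).
mafs <- runif(m, min = 0.01, max = 0.49)
geno <- matrix(rbinom(n * m, size = 2, prob = rep(mafs, each = n)),
               nrow = n, ncol = m)

# Standardise each SNP column using its expected mean and standard deviation.
geno_std <- sweep(geno, 2, 2 * mafs, "-")
geno_std <- sweep(geno_std, 2, sqrt(2 * mafs * (1 - mafs)), "/")

# Assumed: effects are zero except for q randomly chosen causal SNPs.
beta <- numeric(m)
causal <- sample(m, q)
beta[causal] <- rnorm(q, mean = 0, sd = sqrt(hsq / q))

# Liability = genetic component + environmental noise with variance 1 - hsq.
liability <- as.vector(geno_std %*% beta) +
  rnorm(n, mean = 0, sd = sqrt(1 - hsq))

# An individual is a case when the liability exceeds the threshold that
# leaves a fraction k of a standard normal distribution above it.
threshold <- qnorm(1 - k)
status <- as.integer(liability > threshold)

table(status)
```

With this construction the liability has variance close to 1, so the share of cases should land near k, though it varies from run to run.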
sim_no_family() uses parallel computation to reduce the running time. Because at least one CPU core is left unused, the user should be able to do other work while the simulation is running.
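The pattern of leaving one core free can be sketched with the parallel package that ships with R. This is not the function's actual implementation: simulate_block() is a hypothetical helper, and the split into four blocks of 25 individuals is chosen only for illustration.

```r
library(parallel)

# Hypothetical helper: simulate genotypes (0, 1, 2) for one block of individuals.
simulate_block <- function(block_size, mafs) {
  matrix(rbinom(block_size * length(mafs), size = 2,
                prob = rep(mafs, each = block_size)),
         nrow = block_size)
}

mafs <- c(0.1, 0.2, 0.3, 0.4)
block_sizes <- c(25, 25, 25, 25)   # 100 individuals split into four blocks

# Leave one core free, never use more workers than blocks,
# and fall back to a single worker if detection fails.
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else
  max(1L, min(n_cores - 1L, length(block_sizes)))

cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, iseed = 42)   # independent random streams per worker
blocks <- parLapply(cl, block_sizes, simulate_block, mafs = mafs)
stopCluster(cl)

# Stack the blocks into one n x m genotype matrix.
geno <- do.call(rbind, blocks)
dim(geno)
```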
Simulating large datasets takes time and generates large files. For details on time complexity and required disk space, see vignette("sim-benchmarks").
The largest file generated is genotypes.ped. See convert_geno_file() to convert it to another file format, thereby reducing its size significantly.
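To give a feel for the size difference, the back-of-the-envelope calculation below compares a PLINK .ped file with PLINK's binary .bed format. Whether convert_geno_file() targets .bed is an assumption here, as is the single-character allele coding in genotypes.ped; the values of n and m are arbitrary, and the benchmarks vignette remains the reference for measured sizes.

```r
n <- 100000   # individuals (illustrative value)
m <- 100000   # SNPs (illustrative value)

# .ped: two single-character alleles per SNP, each followed by a delimiter,
# so roughly 4 bytes per genotype (the six leading ID columns are ignored).
ped_bytes <- 4 * n * m

# .bed: 2 bits per genotype, each SNP padded to whole bytes, plus 3 header bytes.
bed_bytes <- 3 + m * ceiling(n / 4)

c(ped_GB = ped_bytes / 1e9,
  bed_GB = bed_bytes / 1e9,
  ratio  = ped_bytes / bed_bytes)
```

Under these assumptions the text file needs about 40 GB against roughly 2.5 GB for the binary file, a factor of about 16.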