
Hi SNPedia team, are there standards or editing policies on reporting risks associated with SNPs? It appears a lot of journal articles use odds ratios (OR). However, I noticed a lot of SNPedia entries use "increased risk", which implies relative risk (RR). Is RR the preferred value to use?

Thanks! Darknatsu (talk)

Practically speaking, we wind up using odds ratios because that's what is most commonly published, and the OR and RR are about the same for the vast majority of conditions in SNPedia (since most are rare). However, because ORs overstate the relative risk for common diseases, we tend to use terms like "increased risk" or even "slightly increased risk" when no RR is available to use or cite, and to avoid unnecessary worry (or confusion). Feel free to also check out the way we've defined OR and RR in our Glossary. Greg (talk) 05:49, 25 May 2017 (UTC)
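
To illustrate why the two measures agree for rare conditions but drift apart for common ones, here is a minimal sketch using the standard conversion RR = OR / (1 - p0 + p0 * OR), where p0 is the baseline risk among people without the risk genotype. The function name, the OR of 2.0 and the baseline risks are illustrative assumptions, not values taken from SNPedia.

```python
def odds_ratio_to_relative_risk(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an odds ratio to a relative risk.

    baseline_risk (p0) is the risk in the unexposed/reference group, 0 <= p0 < 1.
    """
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must be in [0, 1)")
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


# Hypothetical example: the same OR at increasingly common baseline risks.
for p0 in (0.001, 0.01, 0.10, 0.30):
    rr = odds_ratio_to_relative_risk(2.0, p0)
    print(f"baseline risk {p0:.3f}: OR 2.00 -> RR {rr:.2f}")
```

The RR stays close to the OR while the baseline risk is small and shrinks toward 1 as the condition becomes more common, which is the reason for the cautious wording described above.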

While all of SNPedia welcomes your contributions, you should feel extra welcome to leave comments, ideas, and questions here.

mtDNA haplogroup and Y-chromosome haplogroup


After I ran my 23andMe data through Promethease, I did not get any confirmation of the haplogroups. Did I miss anything?



SNPedia does not have any way of automatically classifying every haplogroup. It does, however, have some coverage of Haplogroup_I_(Y-DNA), Haplogroup_R_(Y-DNA) and Haplogroup_H_(mtDNA), since those are areas that early users have had an interest in.
The 23andMe Haplogroup Tree Mutation Mapper gives most of the information needed to build a genoset for each haplogroup (orientation information is still needed, though). However, covering the entire tree and dealing with periodic changes in the tree is so much work that so far no one has done it comprehensively here in SNPedia ... anyone want to volunteer?
You should also be sure that you've looked through your Promethease report's 'Topics->Haplogroups' section.

Online SNP communities?

Hi there, I am a UK-based female social scientist really interested in the online interaction around open source genomics, and whether new cybergenomics-related communities are being catalysed. I'd be really interested to know whether people (for example, the 55 people who have made their genomes public here) are using this site or similar sites to discuss the implications of their genomes, particular SNPs, or the results of others. I can see that in various places there's chat as people reflect individually on their SNP results, discuss different results, and use different bits of software; that people discuss what they think this 'means' for them in terms of their health, and at this point often enrol other sorts of talk, e.g. what they feel (their views and values), using analogies, making jokes, talking about the importance of open source as a "public good", etc. There's a bit of talk on this site and on others. I'm wondering if Facebook pages relating to specific SNPs are being set up, for example, and how and why people combine bio, technical, and more social information to construct knowledge claims. This is really fascinating! Apologies if a bit waffly. I'm wondering how this will all develop as a social cyber/bio hybrid phenomenon. Alex

See the response at Talk:Genomes 17:51, 8 December 2010 (UTC)

Promethease for exome data?

Hi everyone

I am new to this, but I am researching the possibilities that follow from the use of exome/genome sequencing at hospitals and in research. Does anyone know whether open source software exists, or is under development (or is likely to be developed), that could be used to analyse such data?

Kind regards

Morten Andreasen Denmark

Promethease is already used to process full genomes, such as those from Complete Genomics and the Illumina Personal Genotyping Service. File formats which do not contain rs#s are not supported. Promethease does not work with variations that lack an rs#, and many exome variations still do not have one.

ICD-10 References

It would be great if the ICD-10 reference for a condition could be used instead of free-form. Links could also be made to http://apps.who.int/classifications/icd10/browse/2010/en for details.

Unfortunately, this would require interpretation, since very few publications reporting associations between SNPs and a given medical condition also state specifically which condition (in either ICD-9 or ICD-10 form) is being studied. Furthermore, while ICD-10 is the standard for billing purposes, having up to 68,000 codes referring to conditions would likely be far too complicated for the average SNPedia user, even if the authors of each publication managed to say which subset of the 68,000 codes was associated with certain SNPs.
We are not against the idea in principle, but in practice, we don't (yet) see how to sensibly use the ICD system given its lack of use in the scientific literature, let alone by the lay public.

I did not realize when I wrote the post below about the TREML2 SNP (rs3747742) reducing Alzheimer's disease risk that this SNP was one of the suggestive SNPs found in the International Genomics of Alzheimer's Project (IGAP). The IGAP reported 11 new loci and identified 13 suggestive loci.

Below are two articles that review the genetics of Alzheimer's disease risk. The first table lists in part a, the 10 SNPs established to influence AD risk (previous to IGAP) and in part b, the 11 SNPs established by the IGAP. The second table also considers the results from the IGAP but includes 9 suggestive SNPs that were not included in the previous table. rs3747742 from the TREML2 gene is included among these suggestive SNPs.

There are some omissions and additions in these tables. In the first table in part a {previously established AD SNPs}, CD33 (SNP rs3865444) was omitted because it did not attain statistical significance in the IGAP, though it is generally thought to be an AD SNP. The recently discovered SNP (rs75932628) from gene TREM2 was added to part a of the table.

The only SNPs from table 1 in part b {SNPs established in IGAP} not included in table 2 are rs9271192 from the gene HLA-DRB5/HLA-DRB1, due to technical issues, and rs10838725 from the gene CELF1, due to quality control. The second table lists 9 of the 13 suggestive SNPs from the IGAP. Presumably, the other 4 suggestive SNPs are the ones rejected in the second study: rs72807343 (SQSTM1), rs2337406 (IGH@), chr17:61,538,148 (ACE), and rs10751667 (AP2A2).

For some unclear reason, rs8093731 from the gene DSG2 is claimed to have reached statistical significance in stage 1 of the IGAP in the second article referenced, though it is not noted in part b of table 1 above or on alzforum's discussion of the 11 new genes found in IGAP.

Two of the suggestive IGAP loci noted in the second table have since been validated: the TREML2 SNP (rs3747742) from the original post, and TRIP4 (rs74615166) in the second article referenced in this post.

This is a good, current overview of the genetics of AD.

Perhaps the existing Alzheimer page on SNPedia can remain, though a new one reflecting the current understanding of the genetics of AD as noted in this post could be added. It could be another "skin".

The table below is from: Medway, C. and Morgan, K. (2014), Review: The genetics of Alzheimer's disease; putting flesh on the bones. Neuropathology and Applied Neurobiology, 40: 97–105. doi: 10.1111/nan.12101

Table 1. Population attributable fraction (PAF) calculations for alleles associated with Alzheimer's disease

The top table (a) documents established alleles whereas the bottom table (b) is for the newly identified genes. For each gene the documented SNP is the one achieving the greatest association in the IGAP publication [1]. The exception is TREM2 which is the first rare variant to be identified from next-generation sequencing efforts (SNP details taken from Guerreiro et al. [2]), and APOE where the odds ratio (OR) is calculated from in-house data sets (C. Medway and K. Morgan, unpublished). Combined PAF calculated according to equation reported by Naj et al. [3].

(a)
SNP         Gene     MAF    Location       Major/minor alleles  OR                PAF
rs4147929   ABCA7    0.16   Intronic       G/A                  1.15 (1.11–1.19)  2.8
rs6733839   BIN1     0.37   Intergenic     C/T                  1.22 (1.18–1.25)  8.1
rs10948363  CD2AP    0.26   Intronic       A/G                  1.10 (1.07–1.13)  2.3
rs9331896   CLU      0.40   Intronic       T/C                  0.86 (0.84–0.89)  5.3
rs6656401   CR1      0.19   Intronic       G/A                  1.18 (1.14–1.22)  3.7
rs11771145  EPHA1    0.35   Intergenic     G/A                  0.90 (0.88–0.93)  3.1
rs983392    MS4A6A   0.41   Intergenic     A/G                  0.90 (0.87–0.92)  4.2
rs10792832  PICALM   0.37   Intergenic     G/A                  0.87 (0.85–0.89)  5.3
rs429358    APOE4    0.12   Nonsynonymous  T/C                  4.89 (4.45–5.39)  27.3
rs75932628  TREM2    0.002  Nonsynonymous  C/T                  4.5 (1.7–11.9)    0.8

(rs3865444 CD33 0.24 G/T 0.89 (0.86–0.92); source: alzgene.org)

(b)
SNP         Gene     MAF    Location       Major/minor alleles  OR                PAF
rs7274581   CASS4    0.08   Intronic       T/C                  0.88 (0.84–0.92)  1.1
rs10838725  CELF1    0.31   Intronic       T/C                  1.08 (1.05–1.11)  2.4
rs17125944  FERMT2   0.08   Intronic       T/C                  1.14 (1.09–1.19)  1.5
rs9271192   HLA      0.28   Intergenic     A/C                  1.11 (1.08–1.15)  3.2
rs35349669  INPP5D   0.46   Intronic       C/T                  1.08 (1.05–1.11)  4.6
rs190982    MEF2C    0.39   Intergenic     A/G                  0.93 (0.90–0.95)  2.7
rs2718058   NME8     0.37   Intergenic     A/G                  0.93 (0.90–0.95)  2.9
rs28834970  PTK2B    0.36   Intronic       T/C                  1.10 (1.08–1.13)  3.1
rs10498633  SLC24A4  0.21   Intronic       G/T                  0.91 (0.88–0.94)  1.5
rs11218343  SORL1    0.04   Intronic       T/C                  0.77 (0.72–0.82)  1.1
rs1476679   ZCWPW1   0.29   Intronic       T/C                  0.91 (0.89–0.94)  3.2

Total PAF: 61.8
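
As a quick sanity check on the total, here is a minimal sketch that combines the per-SNP PAF values from parts (a) and (b) as 1 minus the product of (1 - PAF) over all SNPs. That this is the equation from Naj et al. is my assumption (the table only cites it), but it lands very close to the reported total of 61.8, whereas simply adding the PAFs would give about 90.

```python
from math import prod

# PAF values in percent, copied from Table 1 parts (a) and (b) above.
paf_a = [2.8, 8.1, 2.3, 5.3, 3.7, 3.1, 4.2, 5.3, 27.3, 0.8]
paf_b = [1.1, 2.4, 1.5, 3.2, 4.6, 2.7, 2.9, 3.1, 1.5, 1.1, 3.2]


def combined_paf(paf_percent):
    """Combine per-locus PAFs (percent), assuming 1 - prod(1 - PAF_i)."""
    return 100.0 * (1.0 - prod(1.0 - p / 100.0 for p in paf_percent))


total = combined_paf(paf_a + paf_b)
print(f"combined PAF: {total:.1f}%")
print(f"naive sum:    {sum(paf_a + paf_b):.1f}%")
```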

The table below is from: Follow-up of loci from the International Genomics of Alzheimer's Disease Project identifies TRIP4 as a novel susceptibility gene. Transl Psychiatry. Feb 2014; 4(2): e358. Published online Feb 4, 2014. doi: 10.1038/tp.2014.2

Table 1. Results for the 19 IGAP SNPs in the Fundació ACE data set

SNP         Chr.  Base pair  Maj/min allele  Locus     IGAP status  OR (IGAP)         MAF (IGAP)  OR (F.ACE)            MAF (F.ACE)  P-value (F.ACE)  OR (com)  P-value (com)  Het*
rs8093731   18    29088958   C/T             DSG2      NL-ST 1      0.73 (0.62–0.86)  0.017       0.728 (0.486–1.090)   0.011        0.1217           0.7292    3.02 × 10−5    0.0006
rs28834970  8     27195121   T/C             PTK2B     NL-ST 1      1.10 (1.08–1.13)  0.366       0.975 (0.893–1.065)   0.372        0.571            1.0936    2.39 × 10−12   0.026
rs11218343  11    121435587  T/C             SORL1     NL-ST 1      0.77 (0.72–0.82)  0.039       0.864 (0.678–1.099)   0.035        0.233            0.7757    6.91 × 10−15   0.6301
rs10498633  14    92926952   G/T             SLC24A4   NL-ST 1      0.91 (0.88–0.94)  0.217       0.922 (0.827–1.028)   0.191        0.1418           0.9107    1.99 × 10−9    0.6781
rs7274581   20    55018260   T/C             CASS4     NL-ST 2      0.88 (0.84–0.92)  0.083       1.017 (0.882–1.173)   0.098        0.8153           0.8888    1.75 × 10−7    0.1372
rs35349669  2     234068476  C/T             INPP5D    NL-ST 2      1.08 (1.05–1.11)  0.488       1.104 (1.014–1.203)   0.439        0.02314          1.0807    2.59 × 10−9    0.5807
rs2718058   7     37841534   A/G             NME8      NL-ST 2      0.93 (0.90–0.95)  0.373       1.081 (0.992–1.178)   0.418        0.1201           0.9368    2.41 × 10−7    0.0044
rs190982    5     88223420   A/G             MEF2C     NL-ST 2      0.93 (0.90–0.95)  0.408       0.885 (0.811–0.966)   0.388        0.006285         0.9232    1.18 × 10−9    0.5718
rs17125944  14    53400629   T/C             FERMT2    NL-ST 2      1.14 (1.09–1.19)  0.092       1.238 (1.036–1.478)   0.060        0.01851          1.1470    6.71 × 10−10   0.5585
rs1476679   7     100004446  T/C             ZCWPW1    NL-ST 2      0.91 (0.89–0.94)  0.287       0.846 (0.769–0.932)   0.271        0.000655         0.9147    5.04 × 10−12   0.1727
rs9381040   6     41154650   C/T             TREML2    SUG          0.93 (0.91–0.96)  0.297       0.991 (0.901–1.089)   0.277        0.8446           0.9365    1.30 × 10−6    0.2321
rs8035452   15    51040798   T/C             SPPL2A    SUG          0.93 (0.91–0.96)  0.339       1.102 (1.009–1.204)   0.362        0.03098          0.9455    1.99 × 10−5    0.001
rs7920721   10    11720308   A/G             ECHDC3    SUG          1.07 (1.04–1.10)  0.387       1.049 (0.962–1.145)   0.395        0.2778           1.0696    1.68 × 10−7    0.8768
rs7818382   8     96054000   C/T             NDUFAF6   SUG          1.07 (1.04–1.10)  0.469       1.003 (0.921–1.093)   0.455        0.9405           1.0657    2.48 × 10−7    0.3428
rs74615166  15    64725490   T/C             TRIP4     SUG          1.29 (1.17–1.42)  0.02        1.519 (1.148–2.012)   0.023        0.003265         1.3102    9.74 × 10−9    0.1357
rs7295246   12    43967677   T/G             ADAMST20  SUG          1.07 (1.04–1.10)  0.406       1.044 (0.958–1.139)   0.399        0.3253           1.0693    2.23 × 10−7    0.764
rs7225151   17    5137047    G/A             SCIMP     SUG          1.10 (1.06–1.15)  0.121       0.952 (0.839–1.081)   0.129        0.4475           1.0898    3.06 × 10−6    0.0751
rs6678275   1     193625233  G/C             None      SUG          1.09 (1.05–1.13)  0.169       0.948 (0.849–1.059)   0.180        0.3419           1.0775    4.21 × 10−6    0.0444
rs6448799   4     11630049   C/T             HS3ST1    SUG          1.08 (1.05–1.11)  0.300       0.994 (0.905–1.091)   0.293        0.9006           1.0729    2.70 × 10−7    0.247

Four SNPs (rs72807343 (SQSTM1); rs9271192 (HLA-DRB5/HLA-DRB1); rs2337406 (IGH@); and chr17:61,538,148 (ACE)) were rejected during this phase due to technical problems.

SNPs rs10751667 (AP2A2) and rs10838725 (CELF1) failed quality control.

Abbreviations: Chr, chromosome; F.ACE, Fundació ACE data set; IGAP, International Genomics of Alzheimer's Disease Project; MAF, minor allele frequency; Maj/min allele, major and minor allele; NL-ST 1, new locus in stage 1 of IGAP study; NL-ST 2, new locus in stage 2 of IGAP study; OR, odds ratio; SNP, single nucleotide polymorphism; SUG, suggestive locus. Het*: P-value of the Breslow-Day test.
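
For anyone reading the F.ACE column, one rough way to see which SNPs replicated in that data set is to check whether the confidence interval of the OR excludes 1. The sketch below does this for three rows copied from the table; the function name is my own, and I am assuming the intervals are 95% CIs (the table does not say so explicitly).

```python
def classify_or(ci_low: float, ci_high: float) -> str:
    """Classify an odds ratio by whether its confidence interval excludes 1."""
    if ci_low > 1.0:
        return "increased risk"
    if ci_high < 1.0:
        return "reduced risk"
    return "not significant (CI includes 1)"


# (SNP, locus, CI low, CI high) for OR (F.ACE), copied from the table above.
face_rows = [
    ("rs74615166", "TRIP4", 1.148, 2.012),
    ("rs9381040", "TREML2", 0.901, 1.089),
    ("rs1476679", "ZCWPW1", 0.769, 0.932),
]

for snp, locus, low, high in face_rows:
    print(f"{snp} ({locus}): {classify_or(low, high)}")
```

This agrees with the P-value (F.ACE) column for these rows: TRIP4 and ZCWPW1 are nominally significant in the Fundació ACE data, while rs9381040 (TREML2) is not.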

TREML2 rs3747742 minor allele (G) reduces Alzheimer disease risk

Recently TREM2 variants (especially R47H) were found to substantially increase Alzheimer disease risk. A genetic variant (minor allele (G) of rs3747742) in TREML2 has now been shown to decrease the risk of Alzheimer's disease.

B.A. Benitez et al. (2014). Missense variant in TREML2 protects against Alzheimer's disease. Neurobiology of Aging 35: 1510.e19–1510.e26.

"support the role of the TREML2 coding missense variant p.S144G (rs3747742) as a potential driver of the meta-analysis AD-associated genome-wide association studies signal. Additionally, we demonstrate that the protective role of TREML2 in AD is independent of the role of TREM2 gene as a risk factor for AD"

"...at least 2 genes in this gene cluster influence risk for AD: TREM2-p.R47H is associated with increased risk for AD(OR=1.91, CI=1.85-1.97) and TREML2-p.S144G is associated with reduced risk for AD (OR=0.91; CI=0.86-0.97)."

rs3747742: the minor allele (G) reduces risk for AD. AG + GG versus AA: OR=0.93, CI=0.89-0.96, p=8.66 × 10−5. {It is not completely clear whether the comparison is AG + GG versus AA or AG versus AA.}

"rs9381040 (p= 4.11 x 10-4, beta=0.02) and rs3747742 (p=1.4 x 10-4, beta=0.02) both exhibit a strong association with CSF ptau levels."

Seems like a fine addition. Would you like to add the pages/genotypes/info/etc, or are you suggesting we do it? (which we can; once added we'll eventually delete this part of the Talk section)

Well, I am having some trouble figuring out how to wiki. I suppose this would be a good time to read the manual. I have no idea how my comments ended up here and not on the main talk page.

As I noted in my reply to my initial post, I would like to create a new SNPedia page that gives a current view on Alzheimer genetics. I am worried, though, that such an effort might simply be erased through editing. I think it would make sense to have multiple Alzheimer pages with different perspectives. The perspective I want to add is the current (as of April 2014) understanding of Alzheimer genetics that has been validated by the International Genomics of Alzheimer's Project (IGAP) and other research.

The current Alzheimer page seems too speculative. Some people might like a speculative take on AD genetics, yet those who want a more scientifically conservative view could go to the new SNPedia page I want to create. Should I just go ahead?

https://en.wikipedia.org/wiki/Wikipedia:Be_bold --- cariaso 02:52, 2 April 2014 (UTC)

p.s. It would be very sensible if PubMed automated SNP annotation. Great efforts are made in scientific publishing to standardize the organization of journal articles so that readers can readily extract relevant information. WHY hasn't a similar effort been made in formatting GWAS information, so that annotation (as done by SNPedia) could be automated? Perhaps it will take a petition, a march, or a crowdfunding campaign for sanity to prevail.

Only government could spend billions of dollars on genetic research without providing their customers (i.e. taxpayers) with the results in a readily usable form. This same line of argument was successfully used to open access to government funded scientific journal articles. It would be interesting to see how long it took to achieve compliance, if funding were dependent upon such compliance.

The model to follow would be that of sequence submission to the INSDC. The key is that journals would enforce submission (i.e. annotation) requirements prior to allowing publication, most likely connected to a legal framework mandating researchers to deposit data generated by publicly funded research into public repositories able to generate unique and sufficiently stable accession identifiers.

p.p.s. It would be great if Promethease included an imputation service. (It is not easy for novice users to work through Java command line functions!)

Genotype imputation has definitely improved since ~2013 thanks to the 1000 Genomes Project data, but even with accurate ethnic typing (which on its own could be problematic) and limiting it to a small number of qualified yet commonly used microarray platforms, it couldn't routinely achieve better than 85 to 95% accuracy, and sometimes much less. To put it another way, the imputed genotype would be wrong 5 to 15% of the time at best. While good enough for GWAS, this is quite problematic for individuals and their health care providers as they study their personal Promethease reports. Frankly, we suspect that exome and whole genome sequencing will be cheap enough to be put into widespread use before too long.
Nonetheless, in the meantime, we are prepared to create genosets for the most important SNPs requiring imputation for common, high-capacity microarray platforms. Any SNPedian wishing to suggest such a SNP is encouraged to email the suggestion to us at info@snpedia.com; if we can impute it at 98% or better accuracy, we will.
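
To put the accuracy figures in perspective, here is a rough sketch of how per-genotype error adds up across a report. It assumes each imputed genotype is wrong independently with the same probability, which is a simplification, and the count of 20 imputed SNPs is a made-up example rather than anything Promethease reports.

```python
def expected_wrong(n_imputed: int, accuracy: float) -> float:
    """Expected number of wrong genotypes among n_imputed imputed SNPs."""
    return n_imputed * (1.0 - accuracy)


def prob_at_least_one_wrong(n_imputed: int, accuracy: float) -> float:
    """Chance that at least one imputed genotype is wrong, assuming independence."""
    return 1.0 - accuracy ** n_imputed


# Hypothetical report containing 20 imputed SNPs, at three accuracy levels.
for accuracy in (0.85, 0.95, 0.98):
    print(
        f"accuracy {accuracy:.0%}: "
        f"expected wrong = {expected_wrong(20, accuracy):.1f}, "
        f"P(at least one wrong) = {prob_at_least_one_wrong(20, accuracy):.2f}"
    )
```

Even at the high end of the range quoted above, a report with a modest number of imputed SNPs is more likely than not to contain at least one wrong genotype under these assumptions.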

   Would it be possible to impute all possible SNPs and then select those SNPs that achieve some accuracy threshold (e.g. 95 or 98%)? Often when investigating genetic studies, one finds that the reported SNP is not available on one's gene chip. There are a large number of SNPs that could be imputed in this way with current gene chips. Including such imputed genotypes in a Promethease report would be enormously helpful.
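
The selection step suggested here could be as simple as the sketch below. The SNP identifiers and accuracy estimates are placeholders I made up for illustration; real per-SNP accuracy estimates would have to come from whatever imputation tool is used.

```python
from typing import Dict, List


def select_confident(estimates: Dict[str, float], threshold: float) -> List[str]:
    """Return the SNP ids whose estimated imputation accuracy meets the threshold."""
    return sorted(snp for snp, accuracy in estimates.items() if accuracy >= threshold)


# Hypothetical per-SNP accuracy estimates (placeholder ids, invented values).
estimated_accuracy = {
    "rs_example_1": 0.992,
    "rs_example_2": 0.961,
    "rs_example_3": 0.874,
}

print(select_confident(estimated_accuracy, 0.98))
print(select_confident(estimated_accuracy, 0.95))
```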