Adaptive Long-Read Sequencing Reveals GGC Repeat Expansion in ZFHX3 Associated with Spinocerebellar Ataxia Type 4. Dr. Chen is a neurologist and research fellow at UCL, and Professor Houlden is a professor of neurology in the Department of Neuromuscular Disease at University College London.
Thank you both for your [00:01:00] willingness to participate and welcome to the MDS Podcast.
[00:01:03] Dr. Zhongbo Chen: Thank you for inviting us.
[00:01:04] Prof. Henry Houlden: Thank you.
[00:01:05] Dr. Sarah Camargos: Thank you for your time. I would like to go back to the description of SCA4 in 1994 and the Utah family. How did you come across this family and why do you think it took so long to describe the gene?
[00:01:20] Prof. Henry Houlden: So thank you. That's a very important opening question, Sarah. So this was all before Zhongbo and I had even qualified from medical school, but Louis J. Ptácek, who's an eminent neurologist in Utah, was practicing clinically in neurogenetics, and he saw this huge Utah ataxia family. And he followed them for many years and he's a very good neurologist, very smart.
He took their blood, DNA. In those days, he did linkage analysis, something that you won't remember, Sarah, because it's before your days. But [00:02:00] older folk like me know about linkage, and he did linkage, and he linked the gene, but they couldn't find the gene. So they linked it to a relatively small region, a couple of megabases or so, but couldn't localize it. They sequenced all of the genes in that region, but due to the technology at that time, they didn't pick up the repeat expansion.
So everything was sequenced and they thought, well, what do we do here? And then what happened is Louis moved to San Francisco and his research then looked more at channels, headache, sleep disorders. And so this family was not lost, just forgotten, until technology really caught up, and that's where we came in. Louis has been a friend for a number of years and we chatted, and with Zhongbo's help as the engine behind the study we started to work on the family.
[00:02:58] Dr. Sarah Camargos: Wow.
[00:02:59] Prof. Henry Houlden: And that [00:03:00] was about a couple of years ago.
[00:03:01] Dr. Sarah Camargos: And the Iowa family.
[00:03:04] Prof. Henry Houlden: Yeah. So this is from a nice lady called Kathy Matthews, who's a very good neurologist there, and this family had also got Swedish origin. And so we postulated, and we haven't proven it, but we think basically they're part of the same family. We think that if you see a Swedish ataxia patient anywhere in the world, you should think of SCA4.
I think they're all related maybe to a Swedish sailor somewhere back in time.
[00:03:33] Dr. Sarah Camargos: Really nice. And could you tell us about SCA4 phenotype? Is it homogenous? Is SCA4 distinguishable from the other SCAs?
[00:03:46] Dr. Zhongbo Chen: Thank you. That's really a good question. So in the two families that we described, there's obviously a huge pedigree, and the phenotype is homogeneous within that particular pedigree. So there's an invariable presence of large fiber [00:04:00] neuropathy at the initial presentation, and at least the ankle jerks, if not more of the reflexes, are absent, with some individuals being completely areflexic at presentation.
Some asymptomatically and some with symptoms of peripheral neuropathy. And the ataxia is both a gait ataxia and a limb ataxia. And in the 13 individuals who've had neurophysiology, the sensory nerve action potentials were absent, especially the sural nerve action potential.
So I think in our two families that we describe it is a very homogeneous presentation with a large fiber neuropathy and ataxia.
[00:04:37] Dr. Sarah Camargos: And there are some descriptions about involvement of autonomic dysfunction. Is it a consequence of peripheral neuropathy? Did your patients have their autonomic dysfunction?
[00:04:49] Dr. Zhongbo Chen: So within our two pedigrees, we actually went back to look at the clinical notes, especially of the individuals whom we tested using long read sequencing. And there was a [00:05:00] clear documentation of lack of autonomic neuropathy and autonomic symptoms in our families, but I understand there are other Swedish families published in the American Journal of Human Genetics paper which do have autonomic symptoms. However, they are different pedigrees. So it'd be interesting to fully review that and I guess to look at that in relation to the actual repeat length of the expansion in those individuals.
[00:05:22] Prof. Henry Houlden: It slightly reminded me, just to chip in, of MSA. You've got ataxia, you've got autonomic dysfunction, and MSA doesn't usually run in families. I mean, I do an MSA clinic, and I don't see it in families, but I thought if I saw one of those patients in isolation, MSA would be in my differential.
And it's interesting how we think they're all from the same Swedish founder, but there is a little bit of variability depending on the repeat length.
[00:05:51] Dr. Sarah Camargos: Okay.
[00:05:52] Prof. Henry Houlden: Which is interesting.
[00:05:54] Dr. Sarah Camargos: So there is definitely anticipation as well. In these families.
[00:05:59] Dr. Zhongbo Chen: Well, [00:06:00] within our pedigrees, we did look at someone with a very early age of onset in their 20s and someone with an older age of onset. And unfortunately, we didn't see any significant difference in the repeat length. However, in the original pedigree description from Flanigan there was a question of anticipation, because some of the patients who'd actually come to clinic were deemed asymptomatic but were found to have neuropathy and some ataxia symptoms at presentation that they hadn't reported.
So there might be a presentation bias. However, among those individuals we tested, we didn't find a clear increase in the repeat length between generations. But also coming back to your initial question about whether the phenotype differs from other SCAs: there is a really nice paper which we cite, and it's quite old, but it looked very closely at the eye movements and other clinical features of SCA4 patients compared to the other spinocerebellar ataxia syndromes.
[00:06:55] Dr. Sarah Camargos: Yes, it's really nice. And if I come across [00:07:00] a Swedish patient with neuropathy and ataxia, I will definitely think of SCA4. How was your approach to find the gene?
[00:07:12] Dr. Zhongbo Chen: So I think this approach really typifies how we miss the diagnosis in some of these neurogenetic disorders, and it shows both the advances in technology and the challenges that we face in diagnosing them. Obviously this has been a diagnostic conundrum for about 25 years. That might be because, firstly, this repeat is quite rare, and secondly, because this repeat is very GC rich.
So we attempted PCR retrospectively, after we found the repeat expansion, and despite multiple attempts with different methods, we were not able to replicate this on PCR. And lastly, I guess it just shows an exciting era of repeat expansion ataxias.
And through this, [00:08:00] our approach was basically that, because the cause had evaded diagnosis for such a number of years, it must be a repeat expansion disorder within this region. Furthermore, within the 16q22 region itself there are two other repeat expansion ataxias, one being SCA31 and the other being the new THAP11-associated ataxia described in two Chinese families.
So this inspired us more to really look for a repeat expansion within this region. And I think previous studies within the German family by Hellenbroich et al. had actually looked for CAG and non-CAG repeats within the region, including a CAG repeat in ZFHX3, but did not look at this GC-rich repeat.
So we came across an approach called adaptive sampling using long read sequencing. We thought this would work nicely with this pedigree in terms of trying to diagnose the patients, given the linkage was so strong within this region. And just to go through this for those not familiar with long read sequencing.
So in conventional next generation [00:09:00] sequencing, DNA is fragmented into quite short fragments. And so we miss structural variation, which is defined as variants more than 50 base pairs in length. Whereas with long read sequencing, we're able to sequence 10 kilobases to 100 kilobases and detect such variants with more accuracy, and we're able to detect some repeat expansions as well.
And so we used the Oxford Nanopore Technologies platform, and this is where tiny nanopores are fixed on a membrane. The double-stranded DNA is unzipped and passes through a tiny nanopore. And because an electric current is passed across the membrane in which the nanopores are embedded, every time a nucleotide, be it C, T, A, or G, passes through the nanopore, it disrupts the electric flow in a different way.
And in this way, we're able to record the DNA sequence live and in real time with some accuracy. And because long read sequencing is quite expensive, usually for whole genome sequencing, one flow cell will only cater for one individual [00:10:00] sample. We wanted to make this as efficient as possible, and we used an approach called adaptive sampling.
And adaptive sampling is a computationally driven way in which we can isolate the region of interest. And so what we do is we feed a BED file into the sequencing, and the BED file of interest here is the 16q22 region, and we sequence upstream and downstream of this with a 20 megabase region of interest.
And when that region of interest is detected the pore allows the DNA to flow through and carry on the sequencing specifically of that target region to enrich for that target region. However, when that particular region of interest is not detected, the electric current reverses across the membrane.
It ejects the DNA, and the sequencing starts again. And this carries on iteratively to increase the coverage over that particular region of interest, in this case the 16q22 region. And so when we got this region, we applied a structural variant analysis pipeline using Sniffles2. [00:11:00] And this gave us structural variants which were different from the reference genome.
And because we had samples from individuals affected by disease and from unrelated spouses not affected by disease, we simply applied one algorithm which looked at all the repeat expansions or structural variants that were similar between those segregating individuals. And so in this way, it's a variant-agnostic approach to detecting structural variation and detecting repeat expansions.
And through this, we were able to quite quickly, very fortunately, identify the GGC repeat expansion within ZFHX3 within that region.
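
Illustrative sketch (not from the study): the segregation step described above, keeping structural variants shared by all affected individuals and absent from unaffected ones, could look roughly like the Python below. The `Variant` record, the `segregating` function, the 100 bp position tolerance, and all sample names and coordinates are assumptions made up for illustration; the actual pipeline and its parameters are not described in the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified structural variant call."""
    chrom: str
    pos: int
    svtype: str  # e.g. "INS" or "DEL"
    svlen: int


def matches(a, b, pos_tol=100):
    """Treat two calls as the same event if type and chromosome agree
    and the positions lie within pos_tol base pairs (assumed tolerance)."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.pos - b.pos) <= pos_tol
    )


def segregating(affected, unaffected, pos_tol=100):
    """Return calls from the first affected sample that are present in
    every other affected sample and absent from every unaffected sample.

    affected / unaffected: dict mapping sample name -> list of Variant.
    """
    samples = list(affected.values())
    if not samples:
        return []
    first, rest = samples[0], samples[1:]
    kept = []
    for v in first:
        in_all_affected = all(
            any(matches(v, w, pos_tol) for w in calls) for calls in rest
        )
        in_any_unaffected = any(
            matches(v, w, pos_tol)
            for calls in unaffected.values()
            for w in calls
        )
        if in_all_affected and not in_any_unaffected:
            kept.append(v)
    return kept


# Toy data with made-up coordinates, for illustration only.
affected = {
    "A1": [Variant("chr16", 1000, "INS", 150), Variant("chr16", 5000, "DEL", 80)],
    "A2": [Variant("chr16", 1010, "INS", 160)],
}
unaffected = {"U1": [Variant("chr16", 5005, "DEL", 80)]}

candidates = segregating(affected, unaffected)
```

In this toy example only the insertion near position 1000 survives, because the deletion is missing from the second affected sample and is also seen in the unaffected one.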
[00:11:34] Dr. Sarah Camargos: Wow, this is amazing. If I understood correctly, your approach was based on the region that was sequenced, and they couldn't find anything different in this region. So you thought it was something structural and then you did this adaptive long-read sequencing.
You [00:12:00] sequence only this part of interest of the locus.
You didn't spend so much money and you did a very good approach. Wow. And this is new, right? Your approach is pretty new, I think.
[00:12:15] Dr. Zhongbo Chen: I think previously there have been studies where people have used adaptive sampling in known repeat expansions to save target enrichment and probe pull down of the repeat when you're doing sequencing. So for example, they'd sequence a Huntington's disease repeat using adaptive sampling.
However, I think, to my knowledge, this is quite a novel approach in identifying repeat expansions that are novel. And we hope it can be applied to other families with linkage in which there is a possible repeat expansion. So it's quite a powerful tool and something that's cost effective as well, given that we're not doing whole genome sequencing and then running ExpansionHunter, and we can quite accurately identify the length of repeats, the number of interruptions, and the actual repeat sequence itself using this approach.
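
Illustrative sketch (not from the study): once a long read spans the whole repeat tract, counting repeat units and interruptions is conceptually simple. The Python below assumes the tract has already been extracted and starts in frame with the motif; the function name `summarize_tract` and the example sequence are hypothetical and do not represent real patient data.

```python
def summarize_tract(tract, motif="GGC"):
    """Split an in-frame repeat tract into motif-sized units and report
    how many are pure motif copies and which ones are interruptions.

    Trailing bases that do not fill a whole unit are ignored.
    """
    k = len(motif)
    usable = len(tract) - len(tract) % k
    units = [tract[i:i + k] for i in range(0, usable, k)]
    # Record each interruption as (0-based unit index, unit sequence).
    interruptions = [(i, u) for i, u in enumerate(units) if u != motif]
    return {
        "total_units": len(units),
        "pure_units": len(units) - len(interruptions),
        "interruptions": interruptions,
    }


# Made-up tract: five GGC units with one interrupting unit in the middle.
summary = summarize_tract("GGCGGCGGCAGCGGCGGC")
```

Real reads carry sequencing errors, so a practical tool would align or error-correct reads before counting; this sketch skips that step.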
[00:12:58] Prof. Henry Houlden: And also, I [00:13:00] mean, the problem is, as well, if you want to screen a family for SCA4, we're going to have to use long reads, and many people won't be able to do that.
[00:13:09] Dr. Sarah Camargos: Yeah.
[00:13:10] Prof. Henry Houlden: So you won't be able to go to your local lab and say, can you do simple PCR or exome or even short read genome sequencing? There's not going to be many people that can do that for you.
Of course, we would love to offer this and help people out where we can.
[00:13:29] Dr. Sarah Camargos: Wow. It's really, really interesting. And this methodology, so it's expensive, but less expensive when you compare it to whole genome sequencing. Right? But you have to have a very good quality of DNA, right? Tell us, what were your challenges in these families?
[00:13:53] Prof. Henry Houlden: And also we'll tell you about the repeated shipping and the repeated culturing. I mean, it was a real [00:14:00] challenge. And in the end, Louis just sent us all of his DNA samples in a big package. And we grew up the lymphoblasts and eventually we got high quality DNA. What you need is long-stranded DNA.
So non column based, high quality DNA, either a new blood sample or a tissue culture from a fibroblast or a lymphoblast line. So unfortunately, several of the samples failed and when the DNA is very poor, it tends to block up the sequencing pores. And so the girls were saying, oh, we're having to come in and unblock these pores because your DNA is poor.
[00:14:42] Dr. Sarah Camargos: Okay. It's fragmented. Right?
[00:14:45] Dr. Zhongbo Chen: Yeah. You have to have quite consistent fragment lengths, but I think it shouldn't be a problem for freshly extracted DNA or for blood that we have now. I think it's because these samples are pretty old and they've been extracted using quite old [00:15:00] conventional methods, including chloroform and other methods.
So there may have been some contamination as well, which could have blocked the pores.
[00:15:06] Prof. Henry Houlden: So we love fresh blood, fresh or frozen blood.
[00:15:09] Dr. Sarah Camargos: And cultures.
[00:15:10] Prof. Henry Houlden: It's perfect. It's just having a nice system to extract it.
[00:15:14] Dr. Sarah Camargos: Right.
[00:15:14] Dr. Zhongbo Chen: Yeah. I guess it also shows how precious samples are in studying rare disease genetics and how we need to value them. Using an approach like this, you go straight for the targeted region rather than sequencing the whole genome and potentially wasting DNA that's quite precious.
[00:15:33] Dr. Sarah Camargos: Definitely. And do you both believe that this resultant polyglycine protein is a gain of function protein?
[00:15:42] Dr. Zhongbo Chen: So, it would be nice to think so, and ride on the wave of polyglycine disorders that we're seeing now. So, again, this is the first GGC repeat that's been characterized within an exon, whereas we're even seeing, for example, for the NOTCH2NLC GGC repeat [00:16:00] expansion, which is in a 5 prime UTR associated with neuronal intranuclear inclusion disease in East Asian populations.
Even that GGC repeat can be associated with an upstream open reading frame and cause translation of a polyglycine protein. So it will be great if this actually is one of the first polyglycine disorders, and the mechanism is through toxic gain of function. I think the other mechanism we need to think about, because it is a GC-rich region, is whether there is increased methylation and whether this actually leads to a loss of function.
Unfortunately, because of the challenges that we face with the DNA and the sample availability, and the lack of other tissues, we weren't able to explore this further in our study, but it'd be a fascinating subject, including maybe skin biopsies to see whether there are inclusions which are characteristic of possible polyglycine deposition.
[00:16:49] Dr. Sarah Camargos: Of course. And how do you interpret the interruptions in the tandem repeat of the non-expanded allele?
[00:16:58] Dr. Zhongbo Chen: I think this is really [00:17:00] fascinating, because we looked at the tens of thousands of individuals in our samples who've had whole genome sequencing and ExpansionHunter, which is a bioinformatic tool to estimate the repeat sizes from short read whole genome sequencing data.
The majority of patients have a repeat size of 21 on the non-expanded allele, and what this shows is that possibly the repeat itself is very stable in the non-expanded allele, in that most of the population has a repeat size of 21, whatever the ethnicity. And we looked at the different genotypically predicted ethnicity sizes within our cohort.
And so I think the presence of interruptions may have a role in stabilizing the repeats to prevent either meiotic instability or somatic instability. And again, further studies in this would be really, really fascinating.
[00:17:48] Dr. Sarah Camargos: And all the interruptions are consistent with serine. All of them.
[00:17:54] Dr. Zhongbo Chen: Yeah, we did not look at this. So because we had the unaffected spouse, the long-read sequencing data [00:18:00] showed the presence of interruptions rather than something that was actually conformational.
[00:18:06] Dr. Sarah Camargos: Okay. It's really nice. So somehow if you lose the interruption, you are prone to expansion, maybe.
[00:18:15] Dr. Zhongbo Chen: Maybe, and I think that would be something that we should study further. For example, we know that in SCA2, CAA interruptions in the CAG repeat cause a different phenotype. So it would be interesting to characterize this further. But I think it's difficult in this case, given the rarity of the clinical syndrome.
[00:18:33] Prof. Henry Houlden: It's an interesting mechanism. I mean, it makes me think of SCA8 also, which is another one where you get interruptions and non-interruptions. And that's a way of determining whether the repeat is pathogenic or not, which again plays towards needing long-read sequencing to actually sequence this tract.
[00:18:53] Dr. Zhongbo Chen: Absolutely. Yeah.
[00:18:54] Dr. Sarah Camargos: It's fascinating. Tell us about your insights into the short tandem [00:19:00] repeats along all the chromosomes and along 16q.
[00:19:04] Dr. Zhongbo Chen: Yeah. So I guess this is a question that we're really interested in. As to why so many ataxias are associated with repeat expansion disorders and why so many of the repeat expansion disorders are associated with neurological disease. And I think we don't know yet. However in a study we published last year in Brain, we used something called Functional Genomic Annotation, which is a simple way of annotating and providing biological meaning to the genome.
And we found that genes in ataxia were associated with a high short tandem repeat density along their length. And using this method, we were able to predict the FGF14 repeat expansion, because FGF14 is one of those genes that causes SCA27 through point mutations, but a repeat expansion also causes SCA27B.
So we know that FGF14, for example, has a really high density of short tandem repeats. So using a similar [00:20:00] approach, we also looked at the 16q22 region using functional genomic annotation, taking a map of all short tandem repeats, of which there are 1.7 million naturally occurring within the genome, and they're highly polymorphic.
We see that 16q22 region has actually the highest density of short tandem repeats, which are naturally occurring compared to all other chromosomal regions when taking into account the length of that region. So that in itself is really interesting because it just leads us to have the hypothesis that regions which have a high proportion, high density in naturally occurring STRs have more propensity for the repeat expansion.
So it also leads us to, for example, in undiagnosed cohorts, look towards those regions with high short tandem repeat density as a way of prioritizing potential pathogenic repeats as well in variants of unknown significance.
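
Illustrative sketch (not from the study): the length-normalized comparison described above amounts to dividing the number of short tandem repeats in a region by the region's length and ranking regions by that density. The Python below uses hypothetical function names and made-up counts and lengths; it is not the annotation method used in the published work.

```python
def str_density_per_mb(str_count, region_length_bp):
    """Short tandem repeats per megabase for one region."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return str_count / (region_length_bp / 1_000_000)


def rank_regions(regions):
    """Rank regions from highest to lowest STR density.

    regions: dict mapping region name -> (str_count, region_length_bp).
    Returns a list of (name, density_per_mb) tuples.
    """
    densities = [
        (name, str_density_per_mb(count, length))
        for name, (count, length) in regions.items()
    ]
    return sorted(densities, key=lambda item: item[1], reverse=True)


# Made-up counts and lengths, for illustration only.
toy_regions = {
    "regionA": (500, 10_000_000),  # 50 STRs per Mb
    "regionB": (300, 3_000_000),   # 100 STRs per Mb
}
ranking = rank_regions(toy_regions)
```

Normalizing by length matters here because a larger region would otherwise rank higher simply by containing more sequence.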
[00:20:53] Dr. Sarah Camargos: Wow. It's fascinating. I'm impressed. And I think everybody's impressed too. [00:21:00] Maybe we're going to talk together again for the next papers you're going to publish at MDG, right?
[00:21:07] Prof. Henry Houlden: We'll be very pleased to, and we're happy to work with anybody anywhere in the world to try and solve some of these interesting families and use Zhongbo's talent to sequence them.
[00:21:19] Dr. Sarah Camargos: Oh yeah, of course.
Thank you both for the time you have spent with us, and congratulations on all your work in the ataxia field.
[00:21:29] Dr. Zhongbo Chen: Thank you very much.
[00:21:29] Prof. Henry Houlden: Thank you, Sarah.
[00:21:30] Dr. Zhongbo Chen: Thank you for the opportunity.