Insights from Genome Informatics 2018

Two members of the Fios Team attended the Genome Informatics 2018 conference in Cambridge on 17th-20th September. It was a packed four days with speakers covering topics from data curation to transcriptomics.

Data Curation, Integration, and Visualization

Sarah Teichmann kicked off the sessions with a great overview of ongoing research in her group. She started with work to define the spatial arrangement of cell types in human tissues: a human cell atlas, or “Google Maps” for the human body, based on scRNAseq. This builds on both the exponential growth in single-cell data and the various methods now available to obtain spatial profile data from tissue samples. Sarah also discussed ongoing work on the maternal-fetal interface and “CellPhoneDB”, a method to investigate communication (ligand and receptor) networks from scRNAseq data.

Giorgio Gonella followed, discussing the GFA format for storing sequence graphs for genome assemblies and GfaViz, a tool to visualise them. Luke Zappia described an approach to visualise clusters in scRNAseq data by building trees from clusterings generated with different values of k. Laura Huerta, of EMBL-EBI, presented the “Expression Atlas”: curated single-cell expression data currently covering 43 single-cell experiments and 53,000 cells from multiple species.

Casey Greene continued the afternoon by discussing methods to integrate multiple separate expression data sets using PLIER (Pathway-Level Information ExtractoR) and MultiPLIER. Sergei Yakneen then reviewed “Butler”, a computational framework for orchestrating large-scale genomic workflows in the cloud. It is abstracted across multiple cloud platforms and built on SaltStack, Terraform, Airflow and a variety of other tools.

Wellcome Genome Campus Conference Centre

Personal and Medical Genomics

Katie Pollard presented the second keynote lecture. Her lab uses comparative genomics and biophysics methods to understand the influence of chromatin structure at the cellular and organism level. Among other results, she showed that variants disrupting the boundaries of topologically associating domains (TADs), which mark the ends of regions of the genome capable of interacting, are strongly selected against in primates and healthy people, but are more abundant in patients with autism, developmental delays, and cancer, suggesting important roles in human disease. She also discussed how sequence motifs fail to explain key aspects of protein-DNA binding. Instead of relying on standard motifs, her group looked for patterns in the physical shape of DNA to define structural features.

Katie finished her talk with a discussion of the pitfalls of machine learning when analysing genomic data and emphasised the importance of correct balancing of training/test sets and performing all aspects of model fitting within cross-validation. She also emphasised that most machine learning algorithms leverage the assumption of independent and identically distributed (IID) variables, which genomics data rarely conform to.
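To make the cross-validation point concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the function names, the single-feature threshold classifier and the simulated noise data are all assumptions chosen for illustration. It compares selecting the "best" feature on the full data set before cross-validation (leaky) with selecting it inside each training fold.

```python
import random
from statistics import mean


def k_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def best_feature(X, y, rows):
    """Pick the feature whose class means differ most, using only `rows`."""
    def gap(j):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        return abs(mean(pos) - mean(neg))
    return max(range(len(X[0])), key=gap)


def fit_threshold(X, y, rows, j):
    """One-feature classifier: threshold at the midpoint of the class means."""
    pos = mean(X[i][j] for i in rows if y[i] == 1)
    neg = mean(X[i][j] for i in rows if y[i] == 0)
    return (pos + neg) / 2, pos >= neg


def predict(x, j, threshold, pos_is_high):
    """Return 1 if x falls on the positive side of the threshold, else 0."""
    return int((x[j] > threshold) == pos_is_high)


def cv_accuracy(X, y, k=5, select_inside=True, seed=0):
    """Cross-validated accuracy with feature selection inside or outside folds."""
    rng = random.Random(seed)
    folds = k_folds(len(X), k, rng)
    all_rows = list(range(len(X)))
    # Leaky choice: the feature is selected using every sample,
    # including those later used as held-out test data.
    leaky_j = best_feature(X, y, all_rows)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Correct choice: the feature is selected from training rows only.
        j = best_feature(X, y, train) if select_inside else leaky_j
        thr, high = fit_threshold(X, y, train, j)
        correct += sum(predict(X[i], j, thr, high) == y[i] for i in test)
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical data: 40 samples, 500 pure-noise features, balanced labels.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(40)]
    y = [i % 2 for i in range(40)]
    print("selection outside CV:", cv_accuracy(X, y, select_inside=False))
    print("selection inside CV: ", cv_accuracy(X, y, select_inside=True))
```

Because the simulated features carry no signal, an honest estimate should sit near chance. Selecting the feature outside the folds tends to report a more flattering accuracy, which is the kind of pitfall the talk warned about.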

Following on, Sri Kosuri discussed the impact of rare genetic variation on pre-mRNA splicing. Sri’s lab have developed an assay, ‘Multiplexed Functional Assay of Splicing using Sort-seq (MFASS)’, and used it to analyse variants in the Exome Aggregation Consortium (ExAC). Dennis Wang, a lecturer at the University of Sheffield, talked about an approach to identify non-driver somatic alterations that could serve as prognostic markers for non-small cell lung cancer (NSCLC) patients. In the discovery stage, they used NSCLC xenograft models along with penalised regression models to identify genes for which a high burden of somatic mutations is associated with longer disease-free survival. In the validation stage, the selected features were tested for their prognostic value in independent patient datasets.

Patrick Brennan’s talk gave a comprehensive overview of the work done by the Institute for Genomic Medicine at Nationwide Children’s Hospital. They analysed patient data with a view to identifying druggable targets, using comprehensive genomic profiling of tumour samples with WES, WGS and RNAseq. So far, 60 patients have had their genomes profiled and, in many cases, the team has been able to find a diagnostic or prognostic variant. Their work is a great example of interaction between a hospital and a genomic research group.

Transcriptomics and Epigenetics

Rafael Irizarry talked about understanding variability in data from high-throughput technologies. He emphasised the importance of exploratory data analysis (and not just running data through a standard workflow). He discussed several examples, in increasing complexity, of data producing misleading results that more careful exploratory data analysis would have caught.

Rafael also discussed the problem of controlling for the proportion of zero counts in scRNAseq data. He finished with an analysis of whether DNA methylation changes have a causal effect on expression differences, using data from a study in which methylation levels were experimentally manipulated. Part of this analysis involved defining differentially methylated regions (DMRs) and testing their significance by permuting samples.
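The idea of testing significance by permuting samples can be sketched as follows. This is a simplified illustration, not the method from the talk: it assumes a single candidate region already summarised as one mean methylation value per sample, and the function names and example values are hypothetical. Real DMR pipelines typically also redefine the regions in each permutation.

```python
import random
from statistics import mean


def region_stat(values, labels):
    """Absolute difference in mean methylation between the two groups."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    return abs(mean(a) - mean(b))


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling group labels across samples."""
    rng = random.Random(seed)
    observed = region_stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sample and group
        if region_stat(values, shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate from ever being exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-sample mean methylation for one region.
    treated = [0.81, 0.78, 0.85, 0.80, 0.79, 0.83]
    control = [0.42, 0.47, 0.40, 0.45, 0.44, 0.41]
    values = treated + control
    labels = [1] * len(treated) + [0] * len(control)
    print("permutation p-value:", permutation_p_value(values, labels))
```

Shuffling the labels builds a null distribution directly from the data, so no distributional assumption is needed for the test statistic. The resolution of the p-value is limited by the number of permutations and by the number of distinct labelings the sample size allows.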

Barbara Engelhardt’s talk covered a generative model that she has developed for single-cell RNAseq, an alternative to other methods used to reduce and cluster scRNAseq data, including PCA (principal component analysis), t-SNE (t-distributed stochastic neighbour embedding) and ZIFA (zero-inflated factor analysis). The method is a “Student’s t Gaussian process latent variable model” (tGPLVM). Interestingly, Barbara noted that t-SNE effectively falls apart when considering more than ~3 dimensions (in terms of being able to accurately cluster cells), whereas tGPLVM performs better.

Overall, it was an excellent conference with a strong emphasis on single-cell omics. Single-cell methods may well become one of the dominant technologies used to generate omics data in the future.

Posters from Genome Informatics 2018 can be found here.

Wooden screw outside the Wellcome Genome Campus Conference Centre
