
How Big Data Are Unlocking the Mysteries of Autism

Better genetic insights can help support people across the spectrum


Artist's visualization of genomic data.

When I started my pediatric genetic practice over 20 years ago, I was frustrated by constantly having to tell families and patients that I couldn’t answer many of their questions about autism and what the future held for them. What were the causes of their child’s particular behavioral and medical challenges? Would their child talk? Have seizures? What I did know was that research was the key to unlocking the mysteries of a remarkably heterogeneous disorder that affects more than five million Americans and has no FDA-approved treatments. Now, thanks in large part to the impact of genetic research, those answers are starting to come into focus.

Five years ago we launched SPARK (Simons Foundation Powering Autism Research for Knowledge) to harness the power of big data by engaging hundreds of thousands of individuals with autism and their family members to participate in research. The more people who participate, the deeper and richer these data sets become, catalyzing research that is expanding our knowledge of both biology and behavior to develop more precise approaches to medical and behavioral issues.

SPARK is the world’s largest autism research study to date, with over 250,000 participants, more than 100,000 of whom have provided DNA samples through the simple act of spitting in a tube. We have generated genomic data that have been de-identified and made available to qualified researchers. SPARK has itself analyzed 19,000 genes to find possible connections to autism; worked with 31 of the nation’s leading medical schools and autism research centers; and helped thousands of participating families enroll in nearly 100 additional autism research studies.




Genetic research has taught us that what we commonly call autism is actually a spectrum of hundreds of conditions that vary widely among adults and children. Across this spectrum, individuals share core symptoms: challenges with social interaction, and restricted interests and/or repetitive behaviors.

We now know that genes play a central role in the causes of these “autisms,” which are the result of genetic changes in combination with other causes, including prenatal factors. To date, research employing data science and machine learning has identified approximately 150 genes related to autism, and it suggests there may be 500 or more. Finding additional genes and commonalities among individuals who share similar genetic differences is crucial to advancing autism research and developing improved supports and treatments. Essentially, we will take a page from the playbook that oncologists use to treat certain types of cancer based upon their genetic signatures and apply targeted therapeutic strategies to help people with autism.

But to get answers faster and be certain of these results, SPARK and our research partners need a huge sample size: “bigger data.” To ensure an accurate inventory of all the major genetic contributors, and to learn if and how different genetic variants contribute to autistic behaviors, we need not only the largest but also the most diverse group of participants.

The genetic, medical and behavioral data SPARK collects from people with autism and their families are rich in detail and can be leveraged by many different investigators. Access to rich data sets draws talented scientists to the field of autism science to develop new methods of finding patterns in the data, better predicting associated behavioral and medical issues, and, perhaps, identifying more effective supports and treatments.

Genetic research is already providing answers and insights about prognosis. For example, one SPARK family’s genetic result is strongly associated with a lack of spoken language but an ability to understand language. Armed with this information, the medical team provided the child with an assistive communication device, which reduced the tantrums that arose from the child’s frustration at being unable to express himself. In another case, an adult who was diagnosed at age 11 with a form of autism that used to be known as Asperger’s syndrome recently learned that the cause of her autism is KMT2C-related syndrome, a rare genetic disorder caused by changes in the gene KMT2C.

Some genetic syndromes associated with autism also confer cancer risks, so receiving these results is particularly important. We have returned genetic results to families with mutations in PTEN, which is associated with a higher risk of breast, thyroid, kidney and uterine cancer. A genetic diagnosis means that they can now be screened earlier and more frequently for specific cancers.

In other cases, SPARK has identified genetic causes of autism that can be treated. Through whole exome sequencing, SPARK identified a case of phenylketonuria (PKU) that was missed during newborn screening. This inherited disorder causes a buildup of the amino acid phenylalanine in the blood, which can cause behavior and movement problems, seizures and developmental disabilities. With this knowledge, the family started their child on treatment with a specialized diet low in phenylalanine.

Today, thanks to a growing community of families affected by autism who, literally, give a part of themselves to help understand the vast complexities of autism, I can tell about 10 percent of parents what genetic change caused their child’s autism.

We know that big data, with each participant contributing the unique profile of a person affected by autism, will lead to many of the answers we seek. Better genetic insights, gleaned through complex analysis of rich data, will help provide the means to support children and adults across the spectrum through early intervention, assistive communication, tailored education and, someday, genetically based treatments. We strive to enable every person with autism to be the best possible version of themselves.

This is an opinion and analysis article.

Wendy Chung, M.D., Ph.D., is principal investigator for SPARK (Simons Foundation Powering Autism Research for Knowledge); the Kennedy Family Professor of Pediatrics and Medicine at Columbia University Vagelos College of Physicians and Surgeons; and a clinical and molecular geneticist and physician at New York-Presbyterian/Columbia University Irving Medical Center. In 2020, she was elected to membership in the National Academy of Medicine.

This article was originally published with the title “How Big Data Are Unlocking the Mysteries of Autism” in SA Mind Vol. 32 No. 4, p. 28
doi:10.1038/scientificamericanmind0821-28