“Some men see things as they are and say, ‘Why?’ I dream of things that never were and say, ‘Why not?’” George Bernard Shaw via Robert Kennedy.
Earlier this month I wrote about the difficulty of thinking “exponentially” as we try to understand the potential impact of technology on human learning (you can read this article in the Academic Technology Newsletter, and if you missed it you may want to sign up for the Newsletter here). One of the points I made in my newsletter article was that we will see dramatic increases in the power of computers and networks once we reach the “knee” in the technology growth chart pictured on the left. At that point, our capability to accomplish our goals with the technology is limited more by our imagination and our ability to conceive of what might be done than by the technology itself. In this piece, I try to dream a little about what one part of the university might be like if these advances really do come to pass: the doctoral dissertation in the social sciences (specifically education).
While most of the media attention has focused on Massive Open Online Courses (MOOCs) and their impact on teaching, it seems to me that our current approach to research is likely to change even more dramatically, particularly the doctoral dissertation in education. The dissertation in the social sciences is a relatively predictable document that has been shaped by the same expectations over the last 50 years or so. The chapters cover the same topics and follow the same logic, even though we no longer hand-calculate our statistical tests or tediously calculate the amount of space needed to accommodate our footnotes at the bottom of a typewriter page. But the dissertation still takes a year to produce, and the first couple of drafts are painful for the writer and for the advisor.
The Computer-Enhanced Dissertation: Using Watson and Quill
How would things be different if new artificial intelligence tools were to allow us to actually write the dissertation rather than just format it? Two existing technologies show us what this might be like: the hypothesis-generating capability of IBM’s Watson and the ability of Narrative Science’s Quill to generate textual reports that are largely indistinguishable from those written by humans.
Watson is the IBM supercomputer that proved computers were capable of beating the very best human players in Jeopardy. That was in 2011, and since then the technologies that powered Watson have improved by a factor of three and the machine itself now fits in a suitcase rather than taking up the space of a master bedroom. Watson carries out many of the same tasks that a doctoral candidate does: ingesting huge amounts of data from various scholarly sources, recognizing patterns, and generating hypotheses.
Quill, on the other hand, is an artificial intelligence platform that produces writing. Narrative Science developed it to analyze data and produce business reports, articles, summaries, graphics, and other products that can be more easily read and interpreted by human beings. The software is designed to help “bridge the gap between numbers and knowledge,” incorporating a natural language engine that adjusts text, graphics, structures, and layouts while adapting to the style and expectations of the writer and potential audiences. Again, these are the same tasks we expect from a doc student.
Let’s look more closely at how these two clusters of technology might change the nature of research if they are widely available and 30 to 50 times more powerful than they are today.
Lessons from the Health Care Industry
Beating Ken Jennings in Jeopardy was a publicity stunt, but IBM has been working since then to get useful work out of its investment. Perhaps the most telling example of how these tools might play out in educational research is a pilot organized by IBM and the Sloan Kettering Cancer Center to adapt a specialized version of Watson to help oncologists make better predictions of thoracic cancers. The following demo shows how the interaction between the computer and the doctors might work once the pilot is complete.
The demo shows how an oncologist might access patient records and then use Watson to provide some suggestions (hypotheses) for what the particular kind of cancer might be. In it, Watson searches 450,000 journal articles, 4500 textbooks, and 65,000 clinical trials, then produces a range of hypotheses about what kinds of cancers the patient’s current medical history suggests. Doctors needing additional information can just click on a button in the iPad app and all of the research that supports the particular diagnosis will be displayed for them. In the limited trials today, Watson-enhanced diagnosis is proving to be about 90% accurate, compared to the 50% accuracy rate achieved by oncologists on their own.
Lessons from Computer-Based Journalism
Narrative Science grew from a Northwestern University computer science project that generated stories about Little League baseball games for local newspapers. Increased processing power and advances in artificial intelligence have allowed the company to develop new ways to automate increasingly complex report-writing processes.
Narrative Science quickly moved beyond Little League to other sports reporting, including customized reports for fantasy football fanatics. It expanded into the financial market, generating company reports for Forbes magazine before developing software capable of generating millions of individual stories from petabytes of data for ancestry.com. Most recently, the CIA has entered the market by contracting with Narrative Science to write analyst reports.
All of these narrative forms are relatively predictable, and readers are looking for a specific combination of context and numerical data. Can you imagine a more predictable format than the traditional dissertation in education? The five-chapter structure includes chapters for literature review and methodology, along with a strong bibliography and a few appendices. It would seem only a matter of time before agencies like the Department of Education are exploiting these types of capabilities.
So how would those technologies transfer into educational research?
The One-Minute Dissertation
The one-minute dissertation would largely be a first draft in which an education-specific version of Watson would help researchers refine and answer specific research questions. Like the Sloan Kettering Cancer diagnosis machine, this Watson version would have to be trained. (The Sloan Kettering and Wellpoint computers have been trained with about 15,000 hours of input by nurses, doctors, and specialists in medical information. Given the small army of graduate students and faculty studying educational problems, even a small amount of funding from the likes of the Bill & Melinda Gates Foundation, some corporate gifts from IBM or Pearson, and/or a National Science Foundation grant or two could easily fund the same level of training for a project like this one.)
So, let’s imagine that both IBM Watson and Narrative Science are still around and the technology has continued to advance. Let’s also imagine that for the last four years a consortium led by Stanford and Penn State has been feeding every piece of educational research available into a specially designed version of Watson. But this is a version of Watson which is now 30 times more powerful than it was in 2011 when it defeated Ken Jennings on Jeopardy and it is available to any researcher at any university for a nominal charge.
During that time, Narrative Science has also ingested every research report, book, and dissertation in education over the last 50 years and has analyzed all that text with tools developed by artificial intelligence researchers, digital humanists, and journalists of every persuasion. Its analytics have identified the characteristics of successful writing, and the company has developed algorithms that can reproduce the qualities of the most successful pieces of writing quickly and easily. Its platform is also available to researchers at any university for a nominal fee and is about 30 times more powerful than the current system that is writing intelligence reports for the CIA.
In that new world, one of our graduate students has finished all the required coursework for her doctorate and is ready to start on her dissertation. She opens up an app on her tablet and sees the question, “What would you like to research today?”
She types in the first draft of her research question: “What is the impact of parental involvement on student achievement?”
The computer thinks about it and accesses the following resources: 450,000 research studies, 4500 textbooks, and data sets incorporating over 200 TB of student achievement data and then returns the following:
- Student achievement is increased about 3% when parents are highly involved with their students’ work.
- Parent involvement increases student achievement in grades 1-8 and has no impact beyond grade 8.
- Parent involvement is positively related to family-reported income across all demographic groups.
Much like the oncologist in the demo video, our potential researcher can touch buttons displayed on the iPad to dig deeper into the research and to determine what studies Watson accessed to generate the hypotheses. Another button poses additional questions that could assist the program in refining the hypotheses. As the researcher interacted with the program, answering questions and focusing on additional research studies, the hypothesis would be more clearly defined. When all the fine-tuning was done, the machine would ask, “Are you ready to begin writing your proposal?”
At that point, Watson would turn the research question over to Quill. The AI platform would analyze the current working hypothesis, comparing it to the entire corpus of previous work in the area. Then the program would propose several levels of study that would build on existing research and might be a significant contribution to the field. As with the hypothesis-generating program, the researcher would continue to refine the kinds of output that would be required for this particular dissertation. When the researcher had completed her interaction with the program, the final button would appear: “Ready to write your proposal?”
A minute later the first draft proposal would be done. The literature review would be composed based on thousands of sources, and a methodology would be recommended from the best practices defined by hundreds of professors and scholars all over the world. The paper would have a clear and coherent structure based on the best exemplars of this type of research. The spelling, grammar, and mechanics would be impeccable and the APA formatting would be perfect. An additional menu would allow the researcher to make contact with other students and faculty doing research in a similar area. Another button would connect with a database of grants that might be available to support this kind of research.
Is a One-Minute Dissertation Realistic?
Does all that sound like fantasy? Certainly, but so did most of the features of the online card catalog when I was working on my dissertation in the Syracuse University Library 25 years ago. Back then the futurists in the Library School talked about the days when we’d be able to access the collections of major libraries all over the world from our offices at home. They claimed we’d be able to read full-text reports of most any work, and with a click of our mouse we could see all of the articles that cited that work. They predicted that I’d have an electronic version of my bibliography cards that I could share with colleagues all over the world and that would allow me to format and reformat bibliographic materials in a matter of a few seconds.
Did I believe them? Not really. I was still carrying a sock full of coins to feed the photocopier (my full-text reader) and piles of 3 by 5 cards to keep track of my bibliography entries. My “word processor” was on the Cornell mainframe, and I was part of the first generation of doc students not to have to take my punch cards to Machinery Hall to run my stats. The simple research tools I take for granted these days seemed pretty far out then. Maybe this scenario is closer than we think.