Raj Reddy’s Vision of Digitization of all Human Knowledge Attains Milestone

30 November 2007 -- Back in 1959 as a graduate student at Stanford University in California, Raj Reddy specialized in the burgeoning field of computer science. “Even then I knew that the computer would have a profound impact on society,” he says, “but I never thought computers would be as cheap and widespread as they are.”
Reddy, who is now a professor of computer science and an expert in robotics at Carnegie Mellon University, is working to exploit the ubiquity of computers and technology to improve the lives of ordinary people. “When I came back to being a professor [after serving as dean], I could have done research and published papers for the sake of publishing papers,” he says. “Instead, I wanted to do something that would have an impact on people.”

So he started thinking big, really big—specifically, a million books big.

With the arrival of the internet, Reddy saw that any information that is not online and accessible to search engines may become unavailable or unusable. Moreover, in a thousand years, only a few of the paper documents available today will have survived the ravages of deterioration, loss, and outright destruction. Hence, Reddy believed there was an urgent need to preserve all knowledge and heritage in digital form.

So his idea was to store in digital form all the knowledge ever produced by the human race and to make this content available free of charge, to be browsed and searched by anyone, anywhere, at any time.

To begin with, the mission was to digitize one million books – the Million Book Project (MBP). Today, thanks to the initiative taken by Professor Reddy, more than 1.5 million books are available online.

Much of the work has been carried out by workers at scanning centers in India and China, helped by $3.5 million in seed funding from the National Science Foundation and in-kind contributions from computer hardware and software makers. These funds were primarily used to purchase scanning equipment and for developing the scanning, digitization and cataloguing methods necessary for creating a large digital library.

The United States, China, and India each have contributed $10 million in cash and contributions to the project, undertaken with partners at China's Zhejiang University, India's Indian Institute of Science, and Egypt's Library at Alexandria.

The majority of the works scanned so far -- 970,000 -- are in Chinese. English is a distant second at 360,000 works. Other languages in the project include Sanskrit, Arabic, and several Indian languages.

For the first time since the project was initiated in 2002, all of the books, which range from rare Sanskrit treatises to Mark Twain’s “A Connecticut Yankee in King Arthur’s Court,” are available through a single Web portal of the Universal Library, said Gloriana St. Clair, Carnegie Mellon’s dean of libraries.

“Anyone who can get on the Internet now has access to a collection of books the size of a large university library,” said Reddy, professor of computer science and robotics at Carnegie Mellon. “This project brings us closer to the ideal of the Universal Library: making all published works available to anyone, anytime, in any language. The economic barriers to the distribution of knowledge are falling,” said Reddy, who has spearheaded the MBP since its inception.

Though Google, Microsoft and the Internet Archive all have launched major book digitization projects, the MBP represents the world’s largest, university-based digital library of freely accessible books. At least half of its books are out of copyright, or were digitized with the permission of the copyright holders, so the complete texts are or eventually will be available free.

The collection includes a large number of rare and orphan books. More than 20 languages are represented among the 1.5 million books, a little more than 1 percent of all of the world’s books.

Many of the books, particularly those in Chinese and English, have been fully digitized: their text has been converted by optical character recognition methods into computer-readable text. That allows these books to be searched and, eventually, reformatted for access by PDAs and other devices.

The vast majority of the scanning, digitization and cataloguing has been performed at centers in China and India, where more than 1.1 million and 360,000 books have been scanned, respectively. Now, about 7,000 books are scanned daily by more than 1,000 workers worldwide.

“We greatly value the participation of Bibliotheca Alexandrina,” said Michael Shamos, a Carnegie Mellon computer science professor and copyright lawyer. “Scholars everywhere regret the destruction of the Alexandria Library at various points in history, and we’re willing to go to great lengths to see that no such destruction is ever possible in the future. Once books are on the Internet, they become immortal.”

Protecting and preserving texts is a major goal, said Pan Yunhe, the leader of the MBP in China. “Paper gets old and brittle, so books soon become so delicate that no one can read them without damaging them,” said Pan, the former president of Zhejiang University who is now vice president of the Chinese Academy of Engineering. “Artwork fades. But once we have digitized texts and illustrations, we can keep them in circulation indefinitely. And by storing them at multiple sites, we can minimize the risk that they be destroyed, as occurred in Alexandria.”

“This collection of books in multiple languages opens up unparalleled opportunities to bring Indian cultural material to everyone, and offers a huge range of possibilities in natural language research,” said N. Balakrishnan, associate director of the Indian Institute of Science in Bangalore, one of the partners in the project.

“Digital libraries constitute an essential part of the future of the developing world,” said Ismail Serageldin, director of Bibliotheca Alexandrina. “This requires that we approach conditions governing copyright, digital archiving and scientific databases with a view to creating two-tier systems of access to information that would allow access to such data from developing countries for a nominal fee or for free.”

Though the long-term goal of the Universal Digital Library is to make books, artwork and other published works available online for free, about half of the current collection remains under copyright. Until the permission of the copyright holders can be documented, or copyright laws are amended, only 10 percent or less of those books can be accessed at no cost.

The project has surpassed one million books, but the participants are looking to expand to all countries and eventually every language. At the Third Annual International Conference on Universal Digital Library, held at Carnegie Mellon Nov. 2-4, 2007, the partners in the MBP agreed to continue scanning, to enlist more centers for the scanning of rare and unique materials, and to work on governmental solutions to the problem of books which are out of print but still in copyright.

Raj Reddy’s MBP is up and running, bringing the project one step closer to its goal of a universal library of online books where the world’s greatest literary gems and artwork are preserved digitally for posterity. Besides the goal of a Universal Digital Library Project (UDL), Reddy envisioned a future where almost all information will be readable by humans as well as machines.

As Prof. Reddy explains: “For the first time in history, technology seems to favor the possibility of digital preservation of all the significant literary, artistic, and scientific works of mankind, as well as the potential of free access to them from every corner of the world. A Universal Digital Library (UDL) has the potential of improving the global society in ways beyond measurement. The Internet can house a Universal Library that is freely accessible to everyone. This would revolutionize education for all our future generations. There were about 10 million unique book and document editions before the year 1900, and about 100 million since the beginning of recorded history. An average-sized book is around 250 pages and would require about 50 MB of disk storage if the book were stored as compressed images. Thus, all the books and documents ever produced by the human race would require 5 petabytes of storage. Even if we multiply this by a factor of 200 for all other forms of knowledge, such as music, images, audio and video, the total of that information could be stored in a zettabyte server. With the storage capacity of digital disks increasing by a factor of 1,000 in ten years, it looks technically feasible and financially affordable to articulate the vision to store on the computer all forms of knowledge ever produced by the human race. With new digital technology, though, this task is within the reach of a single concerted effort for the public good, and this effort can be distributed to libraries, museums, and other groups in every country. This formed the motivation for the grand vision of the Universal Digital Library Project. Mission: A million books on the web by 2008.”
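
The storage estimate in Reddy's statement can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the figures from the quote (100 million editions, 50 MB per book, a factor of 200 for other media) and assumes decimal units, with 1 MB taken as 10^6 bytes. The function and constant names are made up for this example.

```python
# Back-of-the-envelope check of the storage estimate quoted above.
# Assumption: decimal (SI) units, i.e. 1 MB = 10**6 bytes.
MB = 10**6
PB = 10**15
EB = 10**18
ZB = 10**21

BOOKS_EVER = 100_000_000   # "about 100 million" editions, from the quote
BYTES_PER_BOOK = 50 * MB   # "about 50 MB" per 250-page book as compressed images
OTHER_MEDIA_FACTOR = 200   # multiplier for music, images, audio and video


def books_bytes() -> int:
    """Storage needed for every book and document edition."""
    return BOOKS_EVER * BYTES_PER_BOOK


def all_knowledge_bytes() -> int:
    """Storage needed once all other media are included."""
    return books_bytes() * OTHER_MEDIA_FACTOR


if __name__ == "__main__":
    print(f"Books only: {books_bytes() / PB:g} PB")
    print(f"All media:  {all_knowledge_bytes() / EB:g} EB")
    print(f"Fits in one zettabyte: {all_knowledge_bytes() < ZB}")
```

Under these assumptions the books alone come to 5 petabytes, matching the quote, and the total with all other media is about one exabyte, which is a small fraction of a zettabyte.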

An interesting offshoot of Reddy’s efforts to make knowledge available free of charge to everyone is the opening up of research opportunities in language technologies – particularly for Indian languages – in order to make sure the language in which that knowledge exists does not become a barrier to information access. Language technologies research in Indian languages has so far been impaired by the lack of resources pertaining to Indian languages in the form of text or speech. Compared to what is available in English and other European languages, or in Chinese, the resources available for Indian languages have been very limited. This situation is being changed by the Digital Library of India (DLI) initiative, which is the Indian part of the UDL and MBP.

DLI today has already scanned over 360,000 books composed of approximately 150 million pages in the Indian and English languages. Digital representation and storage mechanisms have been developed for Indian languages, and a large number of applications are being built to store, process, retrieve and present the Indian language content. The Digital Library of India fosters a large number of research activities pertaining to language technologies for Indian languages, and acts as a testbed for developments made in areas such as text summarization, information retrieval, machine translation and transliteration, optical character recognition, handwriting recognition, and natural language parsing and morphological analyses.

Dr. Raj Reddy is the Mozah Bint Nasser University Professor of Computer Science and Robotics in the School of Computer Science at Carnegie Mellon University. He began his academic career as an Assistant Professor at Stanford in 1966. He has been a member of the Carnegie Mellon faculty since 1969. He served as the founding Director of the Robotics Institute from 1979 to 1991 and as the Dean of the School of Computer Science from 1991 to 1999.

Dr. Reddy's research interests include the study of human-computer interaction and artificial intelligence. His current research interests include the Million Book Digital Library Project; a Multifunction Information Appliance that can be used by the uneducated; the Fiber To The Village Project; Mobile Autonomous Robots; and Learning by Doing.

He is a member of the National Academy of Engineering and the American Academy of Arts and Sciences. He was president of the American Association for Artificial Intelligence from 1987 to 1989. Dr. Reddy was awarded the Legion of Honor by President Mitterrand of France in 1984. He was awarded the ACM Turing Award in 1994, the Okawa Prize in 2004, the Honda Prize in 2005, and the Vannevar Bush Award in 2006. He served as co-chair of the President's Information Technology Advisory Committee (PITAC) from 1999 to 2001 under Presidents Clinton and Bush.
