Realizing the Promise of Big Data

Big data research is transforming how companies and organizations operate. It not only measures what is happening but can also help us make predictions, ranging from what customers will buy to how citizens will vote and who will develop heart disease.


As the need for new datasets and the skills to analyze them grows, the Center for Data Science and Analytics at NYU Shanghai aims both to be one of China’s leading data science education and research facilities and to open up Chinese datasets to the world.


“The Center supports faculty research and a new Data Science major for undergraduate students. It also recently launched a new database accessible to researchers on a global scale,” said Keith Ross, NYU Shanghai Dean of Engineering and Computer Science and co-director of the Center.


A New Major


“With a growing demand for new knowledge, there is a huge need for data scientists. As part of this innovative university, the Data Science major takes the initiative to further promote interdisciplinary research and teaching,” said Yuxin Chen, NYU Shanghai’s Dean of Business and the Center’s other co-director.


The new Data Science major and minor draw on existing courses in computer science, mathematics, statistics and economics and will give students a solid foundation in computer programming, statistics and data mining.


“It differs from studying pure numbers, typical of statistics in computer science. We handle large amounts of unstructured data including images, video and text, and use algorithms developed by computer scientists and AI, separate from traditional statistics methods,” said Chen.


Taking what they’ve learned in courses like Introduction to Data Structures, Multivariable Calculus, Information Visualization and Databases, students are able to combine tools and solve contemporary problems in any discipline, including social science, physical science and engineering.


“One practical application for data science is its wide usage in developing marketing strategy for businesses. Big companies like Tencent, Alibaba and Baidu have been investing heavily in this field. You can think about how data science is in play as you scroll through different types of ads on social media platforms,” said Chen.


From Business Analytics to Computer Science, the study of data science has already picked up considerable momentum at NYU Shanghai. Students who choose a Data Science major will have flexibility to choose an area of concentration, like Finance. Students have already been using data science in their published research to investigate topics like online privacy and anonymity of app users.

China Datasets Archive


This February, the Center launched an online data portal that makes Chinese datasets from its own collections, as well as a catalog of external data resources, available to the public. The free and open platform was the result of a six-month collaboration among Lin Hong, Administrator for the Center for Data Science and Analytics; Yun Dai, the Library’s Technology Assessment Fellow; and Jun Zheng, Manager of Digital Communications.


“A lot of people want to get their hands on data, whether to further investigative research or to try out different algorithms for coming up with new insights for data,” said Ross. “There are a number of existing sites like this in the US, but here we’re focusing purely on Chinese data and directing users to the richest resources.”


The China Datasets Archive offers a variety of searchable categories: Biosciences, Business and Finance, Education, Geosciences, History, Linguistics, Political Sciences, Public Health and Psychology, Social Media and Socio-economic Development. Users can find everything from data on ten million Twitter blogs spanning 2006-2009 to the China Health and Nutrition Survey, with a brief description of each dataset and its source. Each entry also indicates whether the resource is free to use as is or whether access is granted through registration or application.


“Faculty members from every discipline here, including Economics, Global China Studies and Politics, are interested in doing research about China, and we hope our site can help serve their research needs,” said Ross.


Also in the works is a database of software tools to analyze Chinese textual data. Existing software tools can do everything from automatically pulling out all names in a text to analyzing the sentiment of a film review, but most of them accommodate only Western languages.


“With the right tools, someone can analyze a question like ‘Does The New York Times write more about women or men?’ In our case, we might have all articles that appear in a Chinese newspaper as a dataset,” explained Ross.
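
As a rough illustration of the kind of question Ross describes, the minimal Python sketch below counts gendered words across a handful of article texts. The word lists, the function name count_gendered_terms and the two sample sentences are hypothetical and are not part of the Center's tools. It also assumes English text; Chinese is written without spaces between words, so a real analysis of a Chinese newspaper would first need a word segmentation step.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; a real study would use a vetted lexicon.
FEMALE_TERMS = {"she", "her", "hers", "woman", "women"}
MALE_TERMS = {"he", "him", "his", "man", "men"}


def count_gendered_terms(articles):
    """Count female and male terms across a list of article texts."""
    counts = Counter()
    for text in articles:
        # Lowercase the text and split it into simple alphabetic tokens.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in FEMALE_TERMS:
                counts["female"] += 1
            elif token in MALE_TERMS:
                counts["male"] += 1
    return counts


# Made-up sample sentences standing in for a newspaper dataset.
sample_articles = [
    "She said the women on her team led the project.",
    "He thanked his colleagues, and the men applauded.",
]

result = count_gendered_terms(sample_articles)
print("female terms:", result["female"])
print("male terms:", result["male"])
```

Simple word counts like these are only a starting point; pulling out names or scoring sentiment, as mentioned above, would call for more specialized tools.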



Click here to explore the China Datasets Archive 1.0

Discover recent publications for the Center for Data Science and Analytics here.


To find out more about artificial intelligence, join us for Predictive Learning and the Future of AI, a talk on March 24 by Yann LeCun, Director of AI Research at Facebook and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University.