Matt Sheehan
Debates over Chinese and American artificial intelligence (AI) capabilities have been long on bombast and short on data.
That’s why at MacroPolo we have created an original dataset based on published papers at what many experts deem the top annual AI conference, NeurIPS 2018, bringing more data to bear on assessing the quantity and quality of AI research talent in China and the United States.
Research talent is often overlooked but is in fact a core building block of any AI ecosystem (see our ChinAI project). Given that leading AI research is relatively open source, talent is one of the most directly quantifiable of those building blocks. Insights gleaned from the data on published research can better inform a well-grounded and data-driven public debate around the state and flow of global AI talent.
The charts below are a first look at the raw data, followed by key takeaways from the data.
The above charts break down NeurIPS 2018 papers to yield insight along several dimensions: quality of research (based on whether the work falls in the elite or upper tier of research); where top talent comes from (based on the authors’ country of origin); where the talent received training (based on the country where those authors attended graduate school); and where they study or work today (based on the authors’ current affiliation). [A separate section on methodology and qualifiers is included at the bottom.]
Takeaways:
1. Chinese-born researchers conduct a relatively small portion of the most elite AI research (~9%), but a substantial portion (~25%) of upper-tier AI research.
Based on my colleague Joy Dantong Ma’s recent data analysis of authors of elite papers selected for oral presentations at NeurIPS 2018, 10 of those 113 authors were Chinese-born.
Notably, Ma found that all ten Chinese-born authors of elite papers are currently affiliated with US institutions (universities or corporations) or are about to join them. These findings echo Jeffrey Ding’s earlier analysis of 2017 NeurIPS oral presentations, which found 14% of those authors were of Chinese origin, but just 1% currently work at Chinese institutions.
Conducting the same country-of-origin analysis for upper-tier (but not elite) publications in 2018, we found that of the 3,824 authors, approximately one-quarter (955) were Chinese-born. This finding suggests that while Chinese-born researchers have not quite scaled the peak of the AI research pyramid, they make up a sizeable chunk of upper-tier AI research.
2. A majority of Chinese-born researchers conducting upper-tier AI research do so at US institutions.
Among the upper-tier Chinese researchers, most (59%) are currently affiliated with US institutions, 33% with Chinese institutions, and around 9% with institutions in other countries, such as Canada, Singapore, and Japan.
This implies that while US institutions are still the favored destination for most Chinese-born upper-tier AI researchers, Chinese institutions are home to a much larger proportion of these researchers than of elite researchers.
3. The majority of Chinese-born researchers conducting upper-tier research attended graduate school in the United States, and the majority of them work in the United States after graduation (see endnote 5).
Nearly 60% of Chinese upper-tier researchers attended graduate school in the United States, with 35% attending graduate school in China and 7% in other countries (Australia and UK).
Of those Chinese authors who completed graduate studies in the United States, a large majority (78%) are currently at US institutions, with just 21% at Chinese institutions.
Conclusion
These trends—particularly for where Chinese-born researchers go to school and work—are also heavily influenced by policy changes and the overall climate between the Chinese and American technology ecosystems.
The decade-long rise of China’s technology sector has already substantially shifted the calculus for many Chinese-born technologists in Silicon Valley, pulling many of them to return to found startups or work at China’s tech giants. Recent US restrictions on graduate student visas, sometimes mistaken prosecutions of Chinese-born scientists in the United States, and political rhetoric claiming that all Chinese students are spies, are just now beginning to affect the flow and retention of Chinese AI research talent. In that light, data on where Chinese-born and US-educated researchers go on to work may represent a lagging indicator, one that could shift substantially in the coming years.
Whether those impacts are positive (protecting America’s relative advantage in elite research) or negative (diminishing America’s unique ability to attract and retain talent) remains an open question. It is one that we will explore through new datasets in future installments of this ongoing series.
Notes and Methodology
NeurIPS is one of the most important AI conferences, particularly for deep learning, currently the hottest subfield. But it remains just one conference, and is necessarily an incomplete gauge of AI talent in the respective countries. Additional measures, such as citation counts, other conferences, and machine learning contests, are needed to give a more comprehensive picture of AI talent. We intend to add alternative measures in future analysis.
The data for the elite (top 1%) talent are based on examining the entire population of 113 authors of the 2018 NeurIPS oral presentations. The data for the upper-tier (top 20%) talent are estimates extrapolated from a random sample of 69 of the 1,087 authors with Chinese surnames (margin of error of +/- 7.8% at a 95% confidence level). We then researched each author in this sample to find their country of origin, location of graduate school, and current work affiliation.
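As a quick check on the headline shares, the arithmetic behind the figures reported above can be sketched as follows. The counts come from the text; the final ratio (Chinese-born estimate divided by Chinese-surname authors) is our own inference about how the extrapolation fits together and is not a figure stated in the analysis.

```python
# Counts as reported in the text.
ELITE_AUTHORS = 113             # authors of NeurIPS 2018 oral presentations
ELITE_CHINESE_BORN = 10
UPPER_TIER_AUTHORS = 3824       # authors of upper-tier papers
UPPER_TIER_CHINESE_BORN = 955   # estimate extrapolated from the sample
CHINESE_SURNAME_AUTHORS = 1087  # population the sample of 69 was drawn from


def share(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


elite_share = share(ELITE_CHINESE_BORN, ELITE_AUTHORS)
upper_share = share(UPPER_TIER_CHINESE_BORN, UPPER_TIER_AUTHORS)

# Inference only: the share of Chinese-surname authors that would need to be
# Chinese-born for the extrapolated count of 955 to hold.
implied_surname_share = share(UPPER_TIER_CHINESE_BORN, CHINESE_SURNAME_AUTHORS)

print(f"Elite share: {elite_share:.1f}%")
print(f"Upper-tier share: {upper_share:.1f}%")
print(f"Implied share of Chinese-surname authors: {implied_surname_share:.1f}%")
```

The first two values line up with the roughly 9% and 25% cited in the takeaways.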
To assign a country of origin to each author, we used the location of their undergraduate institution as a first proxy. For authors whose high school education was available, we used the location of their high school to assign a country of origin. These proxies are imperfect: Chinese-born researchers who did their undergraduate studies in the United States are counted as US-born if no information about their high school location is available. This may slightly skew the percentages towards a lower proportion of Chinese authors. That skew may be partially offset by the fact that certain authors with Chinese surnames, and who work for Chinese institutions, were excluded due to the lack of information on their undergraduate education.
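The proxy rule described above can be sketched roughly as follows. This is an illustration only: the `AuthorRecord` type and its field names are hypothetical, and the actual coding was done by hand through research on each author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthorRecord:
    """Hypothetical record of what could be found about one author."""
    name: str
    high_school_country: Optional[str] = None
    undergrad_country: Optional[str] = None


def country_of_origin(author: AuthorRecord) -> Optional[str]:
    """Apply the proxy rule: high school location if known, else undergraduate.

    Returns None when neither is known, which corresponds to the authors
    who were excluded for lack of information.
    """
    if author.high_school_country:
        return author.high_school_country
    return author.undergrad_country


# A Chinese-born researcher with a US undergraduate degree and no known
# high school is counted as US-born, which is the skew noted above.
example = AuthorRecord(name="Author A", undergrad_country="United States")
print(country_of_origin(example))
```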
In assigning national affiliations for transnational institutions, we used the location of the headquarters of the company or university. For example, a Chinese-born researcher working for Microsoft Research Asia in Beijing would be counted as affiliated with an American institution, because Microsoft’s headquarters is in the United States. Institutions headquartered in Hong Kong were counted as Chinese.
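A minimal sketch of the headquarters rule, assuming a hand-built lookup table. The table, the function name, and the "Example University of Hong Kong" entry are hypothetical placeholders; only the Microsoft Research Asia case and the Hong Kong rule come from the text.

```python
# Hypothetical lookup from institution to the location of its headquarters.
HEADQUARTERS_LOCATION = {
    "Microsoft Research Asia": "United States",    # example given in the text
    "Example University of Hong Kong": "Hong Kong",  # placeholder entry
}


def affiliation_country(institution: str) -> str:
    """Return the national affiliation used for an institution.

    The headquarters location decides the country, and Hong Kong
    headquarters are counted as Chinese.
    """
    location = HEADQUARTERS_LOCATION[institution]
    return "China" if location == "Hong Kong" else location


print(affiliation_country("Microsoft Research Asia"))
print(affiliation_country("Example University of Hong Kong"))
```

Under this rule, where a researcher physically works does not matter, so the share of researchers located in China may be higher than the share affiliated with Chinese institutions.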
The estimate of Chinese researchers who went to graduate school in the United States and remained with US institutions is based on a substantially smaller subsample of authors. Authors in this subsample matched the following characteristics: Chinese-born, attended graduate school in the United States, and currently employed somewhere different from their graduate school. Of the 14 authors in the sample who fit these characteristics, 11 currently work for US institutions, while three now work for Chinese institutions. Inferences based on this subsample are thus made with less confidence (margin of error of +/- 22% at a 95% confidence level).
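To show where a margin of this size can come from, here is a sketch using the standard normal-approximation (Wald) formula for a sample proportion. That this is the formula behind the reported figure is an assumption on our part; the function name and the z-value of 1.96 for a 95% confidence level are our choices.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% confidence level


def wald_margin(successes: int, n: int, z: float = Z_95) -> float:
    """Margin of error for a sample proportion (normal approximation)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


# 11 of the 14 authors in the subsample work for US institutions.
proportion = 11 / 14
margin = wald_margin(11, 14)
print(f"{proportion:.0%} +/- {margin:.1%}")
```

With 11 of 14, this gives a margin of about 21.5 percentage points, in line with the roughly 22% reported. With a sample this small the normal approximation is crude, and an alternative such as the Wilson interval would give somewhat different bounds.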