Office Location

440 Huntington Avenue
208 West Village H
Boston, MA 02115


Xiaofeng Yang is a PhD student in the Database Systems and Data Mining programs at Northeastern University’s College of Computer and Information Science, advised by Professor Mirek Riedewald. She is currently working on graph sampling on social networks and its application to distributed database systems. In 2013, Xiaofeng received her bachelor’s degrees in mathematics and engineering from Tsinghua University in China.


  • BEng, Tsinghua University – China
  • BS in Mathematics, Tsinghua University – China

About Me

  • Hometown: Jilin, China
  • Field of Study: Database Systems and Data Mining
  • PhD Advisor: Mirek Riedewald

What are the specifics of your graduate education (thus far)?

I am planning to develop and apply algorithms that improve the performance of graph databases for large-scale social networks. I want to integrate information from the network, such as the graph structure (how users are connected) and the preferences (attributes, tags) that users specify, to provide faster and more precise answers to data mining questions.

What are your research interests?

I came into the PhD program with an interest in some specific data mining problems, such as how users behave on social networks, and did a project looking for organized propaganda on the Twitter-like platform Sina Weibo. During this project, I came across questions such as “if the data is stored in a different way, can the pipeline be significantly more efficient?” and “which computations can be parallelized?”, which shifted my interests toward the design of database systems.

What’s one problem you’d like to solve with your research/work?

I am interested in answering questions about how to improve the performance of data processing systems, and in proving underlying bounds on how good such systems can eventually become.

What aspect of what you do is most interesting?

I find it fascinating that I can actually prove, based on simple assumptions, that the social network services in use today are far from optimal in terms of processing speed, given a reasonable tolerance for error.

What are your research or career goals, going forward?

I want to look further into practical ways of improving this performance. For the specific problem of approximate graph queries, the result could be a faster distributed graph database that can actually be implemented and applied to real social network services. This may involve a better graph sparsification algorithm and a more suitable metric for trading off performance against error.