George Mason Software Engineering Group

Seminar Schedule

The Fall 2023 Software Engineering seminar will take place every Thursday at 1:00 PM US Eastern Time in ENGR 4201.

Upcoming Seminars
Nov. 2: Post PhD: Industry vs. Academia (Michele Tufano and Kevin Moran)

Biography: Michele Tufano is a Senior Research Scientist in Visual Studio at Microsoft. He obtained his PhD from the Department of Computer Science at the College of William & Mary in 2019. Michele’s research interests include Deep Learning applied to Software Engineering tasks, Software Evolution and Maintenance, Mining Software Repositories, and Software Testing. His current work at Microsoft focuses on training Transformer models to automate a variety of developer tasks. Michele has published in top-tier conferences such as ASE, ESEC/FSE, and ICSE, and he has been the recipient of several awards, including an ACM SIGSOFT Distinguished Paper Award and a Best ERA Paper Award. More information is available on his web page: https://tufanomichele.com

Biography: Kevin Moran is an Assistant Professor of Computer Science and a member of the Cybersecurity & Privacy (CyberSP) Cluster at UCF, where he directs the SAGE research group. He graduated with a B.A. in Physics from the College of the Holy Cross in 2013 and earned an M.S. and a Ph.D. from William & Mary in 2015 and 2018, respectively. His main research interest is facilitating the processes of software engineering, security, and maintenance by building developer tools enhanced by machine learning. He has published over 30 papers at various software engineering and security conferences, and his research has been recognized with ACM SIGSOFT Distinguished Paper Awards at ESEC/FSE 2019 and ICSE 2020 and a Best Paper Award at CODASPY’19. He was also recently recognized with the 2023 MOBILESoft Rising Star Award. More information is available at http://www.kpmoran.com


Past Seminars
Oct. 26: Subjectivity in Unsupervised Machine Learning Model Selection (Wanyi Chen)

Abstract:

Model selection is a necessary step in unsupervised machine learning. Despite numerous criteria and metrics, model selection remains subjective. The impact of modelers’ preferences on model selection outcomes is largely unexplored. Through an experiment with 33 people in an unsupervised model selection task, this talk will highlight just how subjective model selection remains, with varying opinions on the importance of different criteria and metrics, differing views on how parsimonious a model should be, and how the size of a data set should influence model selection. The results underscore the importance of developing a more standardized way to document subjective choices made in model selection processes.
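
To make the subjectivity concrete, here is a minimal sketch of our own (not material from the talk): two widely used clustering criteria can rank candidate models differently, leaving the final choice of k to the modeler’s judgment.

    # Illustration only (not from the talk): two common model-selection
    # criteria can rank candidate cluster counts differently.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import davies_bouldin_score, silhouette_score

    X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.5, random_state=0)

    for k in range(2, 7):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        sil = silhouette_score(X, labels)      # higher is better
        dbi = davies_bouldin_score(X, labels)  # lower is better
        print(f"k={k}: silhouette={sil:.3f}, davies_bouldin={dbi:.3f}")
    # When the two rankings disagree, which k is "correct" is a judgment call.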

Bio:

Wanyi Chen is a PhD student in computer science at Duke University. Prior to joining Duke, she earned her bachelor’s degree at the University of North Carolina at Chapel Hill and was a software engineer at Audible.

Oct. 19: OOPSLA’24 Writer’s Workshop

For our SWE seminar this week at 1:00 PM on October 19 (in ENGR 4201), we will have a writer’s workshop for OOPSLA 2024 submissions.

Please send us a link to your draft by Thursday, October 19 at 11:00 AM ET if you plan to submit to OOPSLA 2024 and would like feedback on your submission. If you plan to submit and the final draft is not ready, you are welcome to submit only the title and abstract for some feedback on those two items.

Please note that the deadline for OOPSLA submission is October 20 AOE.

What is a writer’s workshop: In case you have never attended a “writer’s workshop” before, the authors of submissions prepare a draft of their submission, and all attendees read some sections of the paper (e.g., the abstract and the introduction) so that we can give comments to help the authors improve their paper. The authors may not speak while their paper is being discussed; they may only take notes. These workshops are beneficial both to the submission authors and to attendees who are not submitting. E.g., attendees can learn what other people in the department are working on, how reviewers review papers, what authors should or should not do in their writing, etc.

Oct. 12: My bad: automatically fixing last-mile errors in low-code languages (José Cambronero)

Abstract:

Low-code languages allow users with limited programming experience to carry out computations in popular environments like spreadsheets (e.g., Microsoft Excel) and visual app development platforms (e.g., Microsoft PowerApps). Unfortunately, such users often make small mistakes in their programs, such as simple syntax and type errors. Limited rectification assistance in these environments, paired with inexperienced authors, can lead to substantial user struggles. Our group has developed multiple systems to automatically fix such mistakes. In this talk, I will present three such systems, ranging from purely neural to systems that blend neural and symbolic reasoning, and discuss their tradeoffs. First, I will discuss how we can approach this problem in a similar fashion to parser error recovery. Next, I will show how modern LLMs and in-context learning can deliver on this task across multiple languages. Finally, I will present work that focuses on building an Excel-specific formula language model that can be used for repair.
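
To give a flavor of the “last-mile” setting, here is a toy sketch of our own (not one of the speaker’s systems, which are neural or neuro-symbolic): even a purely symbolic pass, in the spirit of parser error recovery, can repair one common class of low-code mistakes, namely unbalanced parentheses in a spreadsheet formula.

    # Toy sketch (our own, not the speaker's systems): a purely symbolic
    # "last-mile" fix for unbalanced parentheses in a spreadsheet formula.
    def fix_unbalanced_parens(formula: str) -> str:
        """Drop stray closing parens and close any parens left open."""
        depth = 0
        repaired = []
        for ch in formula:
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:   # stray closing paren: drop it
                    continue
                depth -= 1
            repaired.append(ch)
        return "".join(repaired) + ")" * depth  # close anything left open

    print(fix_unbalanced_parens("=SUM(A1:A10"))               # =SUM(A1:A10)
    print(fix_unbalanced_parens("=IF(A1>0, SUM(B1:B2), 0))")) # drops extra )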

Bio:

José Cambronero is a senior researcher in the blended engineering/research PROSE team at Microsoft, where he focuses on developing program synthesis systems that help developers and end-users. Recently, most of his work has focused on exploring applications of program repair in low-code languages like the Excel formula language. He obtained his PhD from MIT in 2021, working with Martin Rinard. In a prior life, José worked on mortgage and housing research at a large US bank. He is originally from Costa Rica, has bounced around the east coast, and has recently moved to the DC area (so he is looking forward to meeting others in the PL/SE+ML community in the area!).

Oct. 5: Senior SWE PhD Students Panel

In this week’s SWE seminar, we are offering something different: we will host a panel of senior SWE PhD students. During this panel discussion, senior PhD students will discuss a range of topics related to the typical milestones that new students encounter during their doctoral program. These topics include managing coursework and research, the publication process in Software Engineering, preparing for milestone exams (e.g., comprehensive exams), finding collaborators, and more. Following the panel discussion, we will have a Q&A session to address any concerns or questions newer students may have about the PhD program.

Sept. 28: REVIS: An Error Visualization Tool for Rust (Ruochen Wang)

Abstract:

Rust is a programming language that uses a concept of ownership to guarantee memory safety without the use of a garbage collector. However, some error messages related to ownership can be difficult to understand and fix, particularly those that depend on value lifetimes. To help developers fix lifetime-related errors, we developed REVIS, a VSCode extension that visualizes lifetime-related Rust compiler errors. We describe the design and implementation of the extension, along with a preliminary evaluation of its efficacy for student learners of Rust. Although the number of participants was too low to draw firm conclusions about REVIS’s efficacy, we gathered data regarding the prevalence of, and time required to fix, the compiler errors that the participants encountered.

Bio:

Ruochen Wang is a first-year PhD student in the Department of Computer Science at George Mason University. He is part of the Developer Experience Design Lab and is currently supervised by Prof. Thomas LaToza. He received his MS degree in Computer Science from the University of California, San Diego, where he worked with Prof. Michael Coblenz on the REVIS project.

Sept. 21: Writer’s Workshop for FSE 2024

Please send us a link to your draft by Thursday, September 21 at 10:00 AM ET if you plan to submit to FSE 2024 and would like feedback on your submission. If you plan to submit and the final draft is not ready, we encourage you to submit only the title and/or the draft abstract so that we can give at least some feedback on those items.

Please note that the deadline for FSE abstract submission is September 21. 

What is a writer’s workshop: In case you have never attended a “writer’s workshop” before, the authors of submissions prepare a draft of their submission, and all attendees read some sections of the paper (e.g., the abstract and the introduction) so that we can give comments to help the authors improve their paper. The authors may not speak while their paper is being discussed; they may only take notes. These workshops are beneficial both to the submission authors and to attendees who are not submitting. E.g., attendees can learn what other people in the department are working on, how reviewers review papers, what authors should or should not do in their writing, etc.

Sept. 14: 25 million dollar mistakes in research: A review of research critiques (MIT ChatGPT, Harvard Psychology, & BugSwarm) (Suzzana Rafi, Bala Naren Chanumolu)

Abstract:

In this talk, we will discuss three incidents related to academic research where authors had to either clarify the importance and contributions of their work or, in some cases, retract their published work. Specifically, we will discuss a critical review of BugSwarm, a Continuous Integration (CI) harvesting toolkit designed to automatically generate a large-scale dataset, as well as a response article to the critical review by some of the authors of the BugSwarm paper. We will also discuss the MIT-ChatGPT incident, where one research group claimed that ChatGPT could achieve a perfect solve rate in MIT courses, yet some students found fundamental mistakes in the research methodology of this work. Lastly, we will discuss an incident where a Harvard Business School professor was placed on administrative leave after allegations of data manipulation in several of her co-authored papers surfaced. Understanding these incidents well can help us realize what not to do in research, learn from the general mistakes seen in the presented work, learn how to review others’ work critically, and learn what can be done to avoid these kinds of scenarios.

Bio:

Suzzana Rafi is a 3rd-year PhD student in the Department of CS at GMU. She has been working with Dr. Wing Lam for 1.5 years. Her research interests include Software Testing, specifically flaky tests, and the use and understanding of LLMs in the context of Software Engineering.

Bala Naren Chanumolu is a 2nd-year master’s student in the Department of CS at GMU. He has been working with Dr. Wing Lam for the past nine months. He likes to play FPS games a lot, mostly Valorant. His current area of research lies in Software Testing, specifically in improving Continuous Integration systems. Previously, he worked for Temenos, Inc. as a software development engineer for two and a half years.

Sept. 7: Optimizing Continuous Development By Detecting and Preventing Unnecessary Content Generation (Talank Baral)

Abstract:

Continuous development (CD) helps developers quickly release and update their software. To enact CD, developers customize their CD builds to perform several tasks, including compiling, testing, static analysis checks, etc. However, as developers add more tasks to their builds, the builds take longer to run, therefore slowing down the entire CD process. Furthermore, developers may unknowingly include tasks in their builds whose results are not used (e.g., generating coverage files that are never read or uploaded anywhere), therefore wasting build runtime on unnecessary tasks. We propose OptCD, a technique to dynamically detect unnecessary work within CD builds. Our intuition is that unnecessary work can be identified by the generation of files that are not used by any other task within the build. OptCD runs alongside a CD build, tracking the files generated during the build and which files are read/written. Files that are written to but never read from constitute unnecessary content of a build. Based on the names of the unnecessary files, OptCD then maps the files to the specific build tasks responsible for generating or writing to those files. Finally, OptCD leverages ChatGPT to suggest changes to the build configuration that disable generating these unnecessary files. Our evaluation of OptCD on 22 open-source projects finds that 95.6% of projects generate at least one unused directory, a directory whose contents are all unnecessarily generated. OptCD identifies the correct task that generates 92.0% of the unused directories. Further, OptCD can produce a patch for the CD configuration file to prevent generating 72.0% of the unused directories. Using the patches, we reduce the runtime by 7.0% on average for the projects we studied. We submitted 26 pull requests for the unused directories that we could disable. Developers have accepted 12 of them and rejected five; nine are still pending.
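
The core intuition lends itself to a small sketch. Under an assumed, simplified trace format of (operation, path) file events (the real OptCD instruments the CD build itself), files that a build writes but never reads back can be flagged as candidates for unnecessary work:

    # Sketch of the core intuition under an assumed trace format; the real
    # OptCD instruments the CD build and maps files back to build tasks.
    # Hypothetical (operation, path) file events observed during a build.
    trace = [
        ("write", "target/classes/App.class"),
        ("read",  "target/classes/App.class"),       # consumed by tests
        ("write", "target/site/jacoco/index.html"),  # coverage report...
        ("write", "target/site/jacoco/jacoco.xml"),  # ...never read back
    ]

    reads = {path for op, path in trace if op == "read"}
    unused = [path for op, path in trace if op == "write" and path not in reads]

    print("Candidate unnecessary outputs:", unused)
    # Next steps (as in the talk): map these paths to the tasks that
    # produced them and suggest a build-configuration change.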

Bio:

Talank Baral is a first-year Ph.D. student in the Department of Computer Science at George Mason University. He is part of the SEAR Lab (https://twitter.com/sear_lab) and is currently supervised by Dr. Wing Lam. His current area of research lies in Software Testing, specifically in improving Continuous Integration systems. Previously, he worked for JankariTech as a software developer for two and a half years.

Aug. 31: Coordinator assignments and Lightning talks

For next week’s Software Engineering seminar (August 31st), we will assign SE social activity coordinators and SE seminar coordinators. We plan to have roughly one student from each group be a part of the social activity coordinators. Note that if you do not show up next week, you may not get to be a coordinator, or other attendees may nominate you to become one.

We also ask ALL students to give a 2-min lightning talk (plus 3 min for questions) on one topic that they (1) are working on or (2) have recently worked on.

See the following for what you are expected to present. (Please duplicate the slide and fill it in BEFORE the seminar next week.)

https://docs.google.com/presentation/d/1ihhG_QWCJguUoHUSEnImzYCNOJ7WpkWeP2eSHto9u2A

These lightning talks are meant to be short, lightweight, and provide an opportunity for SE faculty and students to learn about you and what you are working on. These talks should also help you practice how to quickly and effectively share your work with others.

Note the following.

  • Please do NOT share these slides with anyone as some attendees may present work that is in progress
  • Be succinct and discuss only what matters in your presentation. E.g., you need not go into too much detail on your approach, but you should provide an idea of what makes it novel and how it differs from existing work
  • If you are working on a problem with another SE seminar attendee, try to pick different problems to present so that everyone gets to present a unique problem. If it is not possible to do so, you may present the same problem together in one slide

Aug. 24: Toward Efficient, Robust, and Interpretable Machine Learning Systems (Simin Chen, University of Texas at Dallas)

Abstract: Machine learning systems, encompassing domains like autonomous driving, smart cities, and finance, play an increasingly pivotal role in the contemporary landscape. With the widespread adoption of ML systems, ensuring their reliability raises a series of technical challenges across diverse dimensions (e.g., efficient inference in mobile systems, cybersecurity and privacy, and human-centered computing), calling for cross-disciplinary solutions.

In this talk, I will introduce two research works aimed at building more interpretable, efficient, and robust machine learning systems. First, I will direct our focus toward the system development phase, where I will introduce a model explanation framework whose key advantage is interpreting the decision-making of an ML model into human-understandable rules; the interpreted rules facilitate the analysis and debugging of ML models. Moving forward, I will transition to the system deployment phase, where I will introduce a general framework that enables a seamless co-design of machine learning systems by jointly optimizing model-level and runtime-level components. Finally, I will conclude the talk by discussing my forthcoming research endeavors, aimed at advancing interpretability, efficiency, and robustness through the co-design of ML model, infrastructure, and runtime components.
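
As a rough illustration of rule-style explanation (our own sketch, not the speaker’s framework): one classic recipe is to fit a small decision-tree surrogate on a black-box model’s own predictions and print the resulting rules.

    # Illustrative sketch (not the speaker's framework): extract readable
    # rules by fitting a shallow decision-tree surrogate on a black-box
    # model's own predictions.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    black_box = RandomForestClassifier(random_state=0).fit(data.data, data.target)

    # The surrogate mimics the black box; its printed rules explain it.
    surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
    surrogate.fit(data.data, black_box.predict(data.data))
    print(export_text(surrogate, feature_names=data.feature_names))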

Bio: Simin Chen is a fifth-year Ph.D. candidate at the University of Texas at Dallas. His research interests lie at the intersection of Security, Software Engineering, and Machine Learning, with a dedicated emphasis on enhancing the full-stack machine learning system through the coordination of the application (ESEC/FSE 2020, ASE 2022, CVPR 2022, CVPR 2023, ACL 2023), infrastructure (ISSTA 2023, IJCAI 2022), and runtime (ESEC/FSE 2022) layers.

More information can be found on his webpage: https://chensimin.site/