Bruno Barreto Portfolio

Accident Severity Predictor

The goal of this project was to build a model that can predict the severity of a flight accident from a formal report detailing the events that led up to it. The model would then be used to determine the risk factors that lead to lethal accidents, so that future regulations and aviation guidelines could be directed towards addressing them.

To accomplish this, a dataset of 30,000 U.S. flight accident reports and associated metadata was assembled from the National Transportation Safety Board. This data was cleaned of missing or incomplete entries and underwent natural language processing to extract key feature information and remove self-referential terms that would otherwise distort the model's results.
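The term-removal step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the term list and the `strip_terms` helper are hypothetical, chosen only to show how words that directly state the outcome might be dropped before training.

```python
import re

# Hypothetical list of self-referential terms; the project's real list is not given.
LEAKY_TERMS = ["fatal", "fatally", "fatality", "fatalities", "killed", "uninjured"]


def strip_terms(text, terms=LEAKY_TERMS):
    """Lowercase the text and drop whole-word matches of the given terms."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    cleaned = pattern.sub(" ", text.lower())
    # Collapse the whitespace left behind by removed words.
    return " ".join(cleaned.split())
```

Matching on word boundaries keeps unrelated words intact while removing the terms that would let a model read the severity label straight from the report text.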

The highest-performing predictive model had its feature weights analysed to determine the terms that best predicted accident severity. From this, we were able to determine that the improper installation and maintenance of key components such as the airframe or engine transmission were disproportionately important causes of high-severity flight accidents.
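As a rough sketch of what a feature-weight analysis like this can look like, the snippet below ranks terms by their learned weight. The `top_terms` helper and the example weights are assumptions made up for illustration; they are not values or code from the project.

```python
def top_terms(weights, k=3):
    """Return up to k terms with the largest positive weights, highest first."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [term for term, weight in ranked[:k] if weight > 0]


# Made-up weights for illustration only; not results from the project.
example_weights = {
    "installation": 1.8,
    "maintenance": 1.2,
    "taxi": -0.9,
    "weather": 0.1,
}
```

In a linear model, a larger positive weight means the term pushes the prediction more strongly towards the high-severity class, which is why sorting by weight gives a first look at candidate risk factors.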

BTS Aviation Delay Model

GitHub Repo

The goal of this project was to work with the Bureau of Transportation Statistics to design a model of commercial flight delays in the U.S. that could be used to identify the most important data points for delay prediction and direct the Bureau's future data collection efforts towards the highest-priority information.

Data on 6,000,000 commercial flights collected by the BTS across 2022 was obtained and used to train multiple predictive classifier models. All data was exhaustively cleaned of missing or irrelevant information. Final model performance highlighted the uncomfortable reality that the BTS's flight delay dataset has no meaningful connection with flight delays and cannot be used to predict them. A cursory review of the scientific literature was coupled with model results to advise the BTS on what data should be added to improve the predictive power of its information.

Subreddit Community NLP Classifier

GitHub Repo

The goal for this project was to design a model using natural language processing techniques that could determine whether a given post originated from the r/playstation or r/xbox subreddits and identify the differences in discussion topics that make these two communities distinct from one another.

Using the Pushshift API, 10,000 subreddit posts from both r/playstation and r/xbox were collected. These posts were then cleaned of missing or self-referential data that could allow the model to circumvent its task and subsequently vectorized for model training.
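The vectorization step can be illustrated with a simple bag-of-words count. The write-up does not say which vectorizer was used, so this is only a sketch under that assumption; `build_vocabulary` and `vectorize` are hypothetical helper names.

```python
from collections import Counter


def build_vocabulary(posts):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for post in posts for tok in post.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(post, vocabulary):
    """Return a token-count vector for one post; unknown tokens are ignored."""
    counts = Counter(post.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector
```

Each post becomes a fixed-length list of counts, one column per vocabulary term, which is the form a classifier such as logistic regression expects as input.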

A logistic regression model with 85% accuracy was chosen as the highest-performing model and analyzed to determine the topics within each community. As the modest model accuracy suggests, the two communities are very similar overall and differ primarily in the platform-specific features they prioritize discussing. The r/playstation subreddit focuses on PlayStation-exclusive titles and network features, while r/xbox discusses Xbox-exclusive titles and Game Pass offers.

Attention-Based Movie Review Classifier

GitHub Repo

The goal of this project was to design a concise attention-based natural language model that could predict the rating given by movie reviews taken from IMDb using only the text of the movie review.

Eclipse Frequency Analysis

GitHub Repo

The goal of this project was to determine if there was any relationship between the location of a country on Earth and the frequency of eclipses that occur there. In addition, this project also aimed to determine if any patterns existed in the incidence of different types of eclipses within a given region.

Wikipedia User Analysis by Platform

GitHub Repo

The goal of this project was to determine if there are any quantifiable differences in pageview trends for Wikipedia users depending on which platform (desktop or mobile) they use to view the site. In particular, this project aimed to determine if the peak or average pageview counts significantly differed between platforms.
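A minimal sketch of the peak and average comparison might look like the following. The function names and input shape (one list of pageview counts per platform) are assumptions for illustration, and the snippet computes descriptive ratios only; it does not perform any significance test.

```python
from statistics import mean


def summarize(pageviews):
    """Return the peak and mean of a series of pageview counts."""
    return {"peak": max(pageviews), "mean": mean(pageviews)}


def compare_platforms(desktop, mobile):
    """Return desktop-to-mobile ratios of peak and mean pageviews."""
    d, m = summarize(desktop), summarize(mobile)
    return {
        "peak_ratio": d["peak"] / m["peak"],
        "mean_ratio": d["mean"] / m["mean"],
    }
```

A ratio near 1 suggests the two platforms behave similarly on that measure, while a ratio far from 1 flags a difference worth testing formally.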

Skills

Education

University of Washington

General Assembly

University of Washington

Interests