Wednesday, January 27, 2021

How I Used Data Science to Buy a New Car

A Real World Test of My Used Car Price Predictor

Hello everyone, long time no see! A few months ago, you may have come across my blog where I discussed using Linear Regression to build a model to predict the price of a used car.

Having no prior experience with car purchasing or car ownership, I could only test my model against available web listings, and occasionally help my brother price out his 2015 Honda Fit (accurate to $200). I had run a few light tests on the model to determine its accuracy, and left happy that I had created a useful tool for anyone doing some light shopping. Since then, hundreds of people have been using it and informed me of some of its shortfalls. For example, SUVs and Pickup Trucks don’t work so well with the model. Similarly, high end luxury cars like Porsches and Maseratis don’t work so well either, mostly due to small sample size, or because Maseratis suck and their resale value goes into the toilet after the first 6 months (there are a number of reasons for this, it’s not entirely the car’s fault, mostly the owner).

Unfortunately, in July, while I was driving my brother’s car home from a Korean Fried Chicken run, I was involved in an accident. You can view the video here. I’ll spare you the long story, a lot of back and forth with insurance, but the damage was sufficient for Geico to determine the car was a total loss. Not because the car was not drive-able, but because the cost to repair the damage would exceed 75% of the car’s worth. Here was the first real world test for my model. Geico had valued my brother’s car at $11,352.00, and my model had predicted a price of $11,878.00! Barring the actual loss of the car, I was thrilled that my model provided some usefulness to my family!
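For anyone who wants the arithmetic behind that comparison, here is a minimal sketch. The dollar figures and the 75% threshold come straight from the story above; the function names are my own for illustration, and I'm assuming "the car's worth" means Geico's valuation, which may not be exactly how the insurer computes it.

```python
# Figures quoted in the post.
GEICO_VALUATION = 11_352.00
MODEL_PREDICTION = 11_878.00
TOTAL_LOSS_THRESHOLD = 0.75  # repair cost as a fraction of the car's worth


def prediction_error(predicted, actual):
    """Return (error in dollars, error as a fraction of the actual value)."""
    diff = predicted - actual
    return diff, diff / actual


def is_total_loss(repair_cost, car_value, threshold=TOTAL_LOSS_THRESHOLD):
    """True when the repair cost exceeds the threshold share of the car's value.

    Assumption: the car's value is the insurer's valuation.
    """
    return repair_cost > threshold * car_value


diff, pct = prediction_error(MODEL_PREDICTION, GEICO_VALUATION)
print(f"Model was ${diff:,.2f} ({pct:.1%}) above Geico's valuation")
print(f"Repairs above ${TOTAL_LOSS_THRESHOLD * GEICO_VALUATION:,.2f} would total the car")
```

In other words, the model came in $526 high, or a little under 5% off the insurer's number.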

Looking For A Replacement

Now, as a long time resident of NYC, I can easily imagine what many of you are thinking. Having a car is simultaneously the most liberatingly awesome and the most crazy annoying thing to do in New York City. Yes, you can go on REAL grocery runs, or do a road trip out to Jersey or the Hamptons whenever you want. On the other hand, you’re constantly dodging pot holes, constantly dodging parking tickets, constantly dodging INSANE NYC drivers. You can’t actually do anything IN the city because you can never find a parking spot, and garage rates are so damn high. Plus, we have the SUBWAY!

All those are givens, but in the 5 years my brother owned that car, I really came to enjoy having it. In my mind, it was worth the frustrations, the changing the tires once a year because you can’t dodge EVERY pot hole, and of course the one you hit is the one that kills the tire. Ever since I moved to Long Island City in 2014, I had largely spent most of my time in the outer boroughs. Plus my in-laws lived in Flushing, so having a car meant more flexibility for visiting them and helping out with random chores. Prior to the car, visiting them meant 45 minutes on the 7 train, then another 35 minutes or so on the bus to get to their neck of the woods. Driving, it was a straight 20 minute shot on the LIE. Without the car, visits were carefully scheduled, and it had to be for a special family event only. It just wasn’t worth it otherwise. WITH the car, we could stop by at any moment, pick up kimbap, drop off groceries, and help grandma find the perfect yarn for that scarf she was knitting us for Christmas. So when it came time to deal with the demise of my brother’s car, we naturally fell into that psychological trap, where we didn’t want our life to change.

After much back and forth and debate, my brother and I decided to do an upgrade rather than moving straight into the next Honda Fit. There were a number of reasons for this, but safety was kind of in the top 10. I got through the accident just fine, and the Honda Fit was perfectly suited for NYC driving (damn thing could park anywhere and slide past double- and triple-parked cars), but we couldn’t help but feel a little bit fragile in our tiny car compared to some of the SUVs and larger trucks on the road. We had been nearly run off the road numerous times in the past because larger cars simply didn’t notice us. Based on that and other requirements, coupled with a few test drives, we ended up settling on a Subaru Outback.

Using the Model to Shop

Once we selected the car we wanted, the next step in the process would be familiar to anyone looking to purchase a big ticket item: The Research.

Perusing the web, we narrowed down our search for our specific requirements. Had to have some sort of blind spot detection system, which required getting a car with Subaru’s Eyesight package. Couldn’t be more than 40k miles, and of course, based on our budget, 2017 and older was likely our target age.

Now, a couple things you have to know about my model. The model was built off of a wide range of used car listings, some hosted by authorized dealerships, others by plain used car lots. My partner and I noticed there weren’t many correlations or significant coefficients when it came to trims and partnerships in our model runs. In addition, the gamut of trims ran so wide that it would be hard to properly categorize them for our model. For example, a car may feature a heated seat upgrade, but more luxurious trims of the same car may have heated seats as standard. Some car brands just do heated seats as standard on all their cars. As a result, we left trims out of our model, and adopted a “used car lot” pricing model, where only the basics mattered.

This resulted in some difficulty later when I was using the model to price out our Outbacks. See, unlike used car lots, dealerships know the exact price point of their trims and upgrades, and build that into their pricing accordingly, even for used cars.

For example, take a look at the below listings. One is from an authorized Subaru Dealer, on the left, and the other is a Used Auto “Mall”:

Left: Dealership, Right: Used Car Dealer

These aren’t exactly apples to apples, as the dealership is listing a Premium with an upgrade, while the used car lot is listing a Limited which has those upgrades as part of its basic package. What’s important to note, though, is that the Subaru dealership actually makes note of the price for the upgrade, which they will use against you when negotiating pricing. The used dealer simply lists out the car features. I saw this difference pretty consistently in our research.

Now, for pricing out ordinary base trim Outbacks, my model did a pretty good job:

The problem is when we started moving up in trims. Obviously, if I’m building a model to ignore trims, and put them up against a car dealership that cares very much about trims and upgrades, I’m going to start running into problems. The solution? Start collecting data and track for changes!!

Crossed out entries are for teaser cars or dealerships we deemed untrustworthy

As we can see, Subaru dealers really place a premium on their trims. For the standard 2.5 in Norwalk on Line 5, they’re actually priced pretty accurately based on my model. But when we move further into the middle of the pack, Limited trims tend to orbit $3.8k above what my model predicts.
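The tracking itself boils down to comparing each listed price against the model's trim-blind prediction and averaging the gap per trim. Here is a rough sketch of that idea. The rows below are made-up placeholders, NOT the entries from my tracking sheet, chosen only so the Limited gap lands near the figure mentioned above; the function name is likewise just for illustration.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL listings: (trim, listed price, model's trim-blind prediction).
# Placeholder numbers for illustration only, not real dealer data.
listings = [
    ("Base", 21_000, 20_900),
    ("Premium", 23_500, 21_500),
    ("Limited", 25_800, 22_000),
    ("Limited", 25_400, 21_600),
]


def trim_premiums(rows):
    """Average (listed - predicted) per trim: the markup the model cannot see."""
    gaps = defaultdict(list)
    for trim, listed, predicted in rows:
        gaps[trim].append(listed - predicted)
    return {trim: mean(values) for trim, values in gaps.items()}


for trim, premium in trim_premiums(listings).items():
    print(f"{trim}: ${premium:,.0f} over the model's baseline")
```

Once you have a per-trim gap like this, you can add it back onto the model's prediction to get a fairer number to negotiate against at a dealership.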

The next step for us was to do the dance, and find the right combination of mileage, extras, distance from NYC, and finally cash on hand to select the car we were going to be least ripped off on. My brother and I gravitated towards the Limited trims because those came with Subaru’s Eyesight package standard, whereas with the Premium, we had to do further investigation to determine if it was available. In two cars, the Koeppel and the Vestal Premiums, Eyesight was available, but Koeppel refused to do a Certified Pre-Owned check on theirs, and Vestal was a 4 hour drive, too far for us to justify a trip for a test drive.

Leaving off the Premiums left us with World Subaru and the Norwalk Outback as the cheaper ones on the list. The World Subaru one had 2 accidents and 2 previous owners on its Carfax, despite only having 13k miles on it. Its low price and low mileage made it SEEM like a good deal, but obviously that history played a huge factor in its pricing, something I may consider including in future builds of the model.

That left us with the Norwalk Outback. At 40k miles, it was really a bit outside our acceptable range for mileage, but conversations with other Subaru owners we knew convinced us that 40k was an acceptable mileage. On top of that, after a lot of yelling and walking out the door and coming back in, they brought the price down another $500, making this one the clear value buy.


Buying a car, used or new, is an exhausting process. In all, it took us about 3 weeks to track down, research, test drive, and negotiate all these cars. Throughout the entire saga, we constantly worried we were getting ripped off, that there were hidden issues we didn’t know about in the car (we still worry about that), and questioned if we were making the right decision.

Having this model helped relieve some of that pressure. Knowing that even if we were going to get ripped off, everyone else was going to rip us off the same way helped somewhat. More importantly, having that baseline to do a proper comparison meant we could more easily eliminate potential lemons.

All in all, while I absolutely abhorred the entire rigamarole involved in buying this new car, I’m happy with my model. It helped us narrow down our options so that we could focus our research on cars that actually mattered to us, and I invite you to try it out yourself in buying your next used car!
