Analysis of the College Scorecard Data

In September of 2015, President Obama announced the release of data on all universities in the United States. The data includes "how much each school’s graduates earn, how much debt they graduate with, and what percentage of a school’s students can pay back their loans – which will help all of us see which schools do the best job of preparing America for success."

So let's jump into the data and see what it tells us about how much each school’s graduates earn, how much debt they graduate with, and what percentage of a school’s students can pay back their loans. But first, to get some perspective on the issue of college debt, I want to address something we've all heard a lot about: the rising cost of college tuition. There are many articles about this subject in publications such as the New York Times (ref 1, 2), The Washington Post (ref), Time Magazine (ref), and others.

In figure 1 I plot the average faculty salary vs the in-state tuition and fees at nearly every university in the United States. I've color-coded the universities by sector in order to make better sense of the data. An interactive version of this plot can be found on the app page here.

Fig. 1 Dynamic plot of faculty salary vs in-state tuition and fees from 2001 to 2014. The dot colors are blue: public 2 year; green: public 4 year; orange: non-profit 4 year; red: non-profit 2 year; purple: for-profit 4 year; and brown: for-profit 2 year colleges. Tuition and fees at the most expensive schools were about $28k/year in 2001, compared with around $50k/year in 2014. This is an increase in sticker price of roughly 80%, compared with cumulative inflation of 39% over the same period.

The increase in tuition and faculty salaries is striking. Take, for example, Columbia University in New York. In 2001, tuition and fees were $27k/year. In 2014 that number grew to $51k/year, an 89% increase. Over the same period, prices rose by just 39% due to inflation.
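The percentages above are simple ratios of the two sticker prices. As a minimal sketch, the arithmetic below uses the rounded figures quoted in the text and takes the 39% inflation figure as given rather than recomputing it from CPI data. The "real" increase (growth after removing inflation) is an extra step added here for comparison, not a number from the original analysis.

```python
def percent_increase(old: float, new: float) -> float:
    """Percent change from old to new."""
    return (new / old - 1.0) * 100.0


def real_increase(nominal_pct: float, inflation_pct: float) -> float:
    """Growth left over after dividing out the growth in the overall price level."""
    return ((1 + nominal_pct / 100) / (1 + inflation_pct / 100) - 1) * 100


INFLATION_2001_2014 = 39.0  # cumulative inflation quoted in the text, in percent

most_expensive = percent_increase(28_000, 50_000)  # top sticker price, 2001 vs 2014
columbia = percent_increase(27_000, 51_000)        # Columbia University, 2001 vs 2014

print(f"Most expensive schools: {most_expensive:.0f}% nominal, "
      f"{real_increase(most_expensive, INFLATION_2001_2014):.0f}% real")
print(f"Columbia: {columbia:.0f}% nominal, "
      f"{real_increase(columbia, INFLATION_2001_2014):.0f}% real")
```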

Something else that catches the eye in figure 1 is how linear the non-profit 4 year college (orange) data are. If we fit a line to the data, shown in figure 2, we find a slope of 1.62: on average, each additional dollar of tuition is associated with $1.62 of additional faculty salary. This raises the question of how many students there are per faculty member on average; in other words, when tuition goes up, what fraction of the additional revenue goes to faculty salaries? Although the College Scorecard data does not include the number of faculty per university (surprisingly, it isn't one of the 1745 columns!), this information can be found in the Delta Cost Project database. The data reveal that in 2012 there were on average 16.22 students per full-time faculty member at non-profit 4 year colleges. A one-dollar tuition increase therefore brings in about $16.22 per faculty member, of which about $1.62 goes to faculty salary; that is, roughly 1/10 of the additional revenue appears to go towards paying the faculty more. This is a bit misleading because many of the top schools have lower student-faculty ratios (Harvard's, for example, is 7:1), but I'll save a more in-depth analysis for another post.
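The 1/10 estimate combines the fitted slope with the student-faculty ratio. Here is a minimal sketch of that arithmetic. The second call applies the same sector-average slope to a 7:1 ratio purely to show how the share depends on the ratio; the true slope for low-ratio schools may well differ.

```python
def fraction_to_faculty(slope: float, students_per_faculty: float) -> float:
    """Share of an extra tuition dollar (per student) that shows up as faculty salary.

    A $1 increase per student raises revenue per faculty member by
    students_per_faculty dollars; the fitted slope says average salary rises by
    `slope` dollars, so the share is slope / students_per_faculty.
    """
    return slope / students_per_faculty


print(f"{fraction_to_faculty(1.62, 16.22):.3f}")  # sector average   -> 0.100
print(f"{fraction_to_faculty(1.62, 7.0):.3f}")    # 7:1 ratio (illustrative) -> 0.231
```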

Fig. 2 Faculty salary vs. tuition and fees for 4-year non-profit universities. A line of best fit yields the equation: (faculty salary) = 1.62*(tuition and fees) + $21,723.
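The post does not say which routine produced the line of best fit. The sketch below assumes ordinary least squares, which `numpy.polyfit` with `deg=1` computes. Because the real data is not reproduced here, it generates hypothetical points from the reported equation plus noise and checks that the fit recovers roughly the same slope and intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: tuition values and salaries generated from the reported
# line (slope 1.62, intercept $21,723) plus noise. Replace with the real columns.
tuition = rng.uniform(5_000, 50_000, size=500)
salary = 1.62 * tuition + 21_723 + rng.normal(0, 3_000, size=tuition.size)

# A degree-1 polyfit is an ordinary least-squares line; coefficients come back
# highest power first, i.e. (slope, intercept)
slope, intercept = np.polyfit(tuition, salary, deg=1)
print(f"(faculty salary) = {slope:.2f}*(tuition and fees) + ${intercept:,.0f}")
```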

Now that we have some perspective on tuition, let's return to how much each school's graduates earn and how much debt they graduate with. The College Scorecard data contains a column for "median earnings of students working and not enrolled 10 years after entry," and one for "the median original amount of the loan principal upon entering repayment." For brevity I'll refer to these as "earnings" and "debt", respectively. Note that there are many variations of these columns in the dataset if you're interested in exploring further.
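A minimal sketch of pulling these two columns out with pandas. The column codes `INSTNM`, `MD_EARN_WNE_P10`, and `DEBT_MDN` are assumptions here and should be checked against the official Scorecard data dictionary. The toy rows are hypothetical. The main point is that suppressed entries such as "PrivacySuppressed" are stored as text, so they need to be coerced to missing values before any numeric work.

```python
import pandas as pd

# Assumed College Scorecard column names; verify against the official data dictionary.
COLUMNS = {
    "INSTNM": "name",
    "MD_EARN_WNE_P10": "earnings",  # median earnings, working and not enrolled, 10 years after entry
    "DEBT_MDN": "debt",             # median original loan principal upon entering repayment
}


def load_earnings_debt(raw: pd.DataFrame) -> pd.DataFrame:
    """Select and rename the earnings/debt columns; non-numeric markers become NaN."""
    df = raw[list(COLUMNS)].rename(columns=COLUMNS)
    for col in ("earnings", "debt"):
        # "PrivacySuppressed" (or any other non-numeric entry) is coerced to NaN
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# Hypothetical toy rows standing in for one yearly Scorecard file read with pd.read_csv
raw = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "MD_EARN_WNE_P10": ["45200", "PrivacySuppressed", "38100"],
    "DEBT_MDN": ["15000", "9500", "PrivacySuppressed"],
})
df = load_earnings_debt(raw)
print(df)
```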

In figure 3 I plot earnings vs debt for all universities in the United States. This plot appears much more scattered, with less clumping by sector compared to the faculty salary vs tuition plot above. To interact with the data, please visit the interactive page here, and select the appropriate columns.

So which universities produce the highest-earning students? I assumed that Ivy League schools would pepper the top of this plot, but I was wrong. In fact, the top of the list is dominated by medical and health-science colleges. At the very top is Louisiana State University Health Sciences Center-Shreveport, although this is slightly misleading because it has very few undergraduates (50 in 2012 and currently 35; ref). The top-earning universities are listed in table 1.

The next question is what the opposite table looks like: which universities are at the bottom of the list in terms of student earnings? Inspecting the bottom-earning schools, I found that several schools in the bottom 30 listed "PrivacySuppressed", which means that the sample size was so small that a person might be able to identify the individuals whose salary data was used. I have excluded these universities from table 2.

Fig. 3 Earnings vs. debt at all U.S. universities for which there is data in the year 2012.
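Tables 1 and 2 amount to a sort on earnings, with privacy-suppressed rows dropped before ranking. Here is a minimal sketch with pandas. The column names `name`, `earnings`, `undergrads`, and `earnings_n` are hypothetical labels for the table columns, and the toy data is made up. School C would otherwise land in the bottom two, but is excluded because its sample size is suppressed.

```python
import pandas as pd

NUMERIC = ["earnings", "undergrads", "earnings_n"]  # hypothetical column names


def rank_by_earnings(df: pd.DataFrame, n: int = 30):
    """Return (top n, bottom n) schools by median earnings, skipping suppressed rows."""
    clean = df[["name", *NUMERIC]].copy()
    for col in NUMERIC:
        # "PrivacySuppressed" cannot be parsed as a number, so it becomes NaN
        clean[col] = pd.to_numeric(clean[col], errors="coerce")
    clean = clean.dropna()  # drop any school with a suppressed (or missing) value
    ranked = clean.sort_values("earnings", ascending=False)
    top = ranked.head(n).reset_index(drop=True)
    bottom = ranked.tail(n).iloc[::-1].reset_index(drop=True)  # lowest earner first
    return top, bottom


schools = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "earnings": ["50000", "12100", "16000", "30000"],
    "undergrads": ["1000", "139", "20", "500"],
    "earnings_n": ["400", "144", "PrivacySuppressed", "300"],
})
top, bottom = rank_by_earnings(schools, n=2)
print(top)
print(bottom)
```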


Table 1: Top 30 universities by student earnings
Rank | Institution name | Median earnings ($) | Undergraduates at institution | Earnings sample size
1 | Louisiana State University Health Sciences Center | 186,500 | 50 | 70
2 | SUNY Downstate Medical Center | 128,000 | 335 | 211
3 | Albany College of Pharmacy and Health Sciences | 118,800 | 1,069 | 618
4 | MCPHS University | 113,400 | 3,587 | 1,768
5 | Samuel Merritt University | 108,000 | 520 | 681
6 | University of Medicine and Dentistry of New Jersey | 107,100 | 974 | 1,346
7 | University of Texas Southwestern Medical Center | 106,900 | 33 | 36
8 | University of the Sciences | 95,800 | 1,782 | 968
9 | Harvard University | 95,500 | 7,207 | 873
10 | Montefiore School of Nursing | 89,500 | 129 | 97
11 | Massachusetts Institute of Technology | 89,200 | 4,477 | 770
12 | Los Angeles County College of Nursing and Allied Health | 87,200 | 208 | 117
13 | Babson College | 86,700 | 2,015 | 480
14 | Thomas Jefferson University | 86,300 | 744 | 909
15 | Cochran School of Nursing | 86,000 | 93 | 139
16 | Stanford University | 86,000 | 6,999 | 823
17 | Upstate Medical University | 85,900 | 295 | 210
18 | Helene Fuld College of Nursing | 84,200 | 354 | 442
19 | Georgetown University | 84,000 | 7,200 | 1,870
20 | Stevens Institute of Technology | 83,700 | 2,542 | 991
21 | United States Merchant Marine Academy | 82,000 | 987 | 116
22 | University of Maryland Baltimore | 80,700 | 722 | 617
23 | Worcester Polytechnic Institute | 80,300 | 3,841 | 1,343
24 | University of Pennsylvania | 79,700 | 10,679 | 2,570
25 | Rensselaer Polytechnic Institute | 79,600 | 5,300 | 2,058
26 | The California Maritime Academy | 79,400 | 971 | 373
27 | DigiPen Institute of Technology | 79,400 | 963 | 430
28 | Medical University of South Carolina | 79,400 | 204 | 747
29 | Rose-Hulman Institute of Technology | 79,200 | 2,097 | 690
30 | Maine Maritime Academy | 78,800 | 968 | 430

Table 2: Bottom 30 universities by student earnings
Rank | Institution name | Median earnings ($) | Undergraduates at institution | Earnings sample size
1 | Clinton College | 12,100 | 139 | 144
2 | Gallipolis Career College | 14,500 | 136 | 156
3 | United Tribes Technical College | 14,800 | 505 | 227
4 | Mountain State College | 14,900 | 153 | 216
5 | Lincoln College of Technology-Franklin LCT | 15,200 | 76 | 3,083
6 | Lincoln College of Technology-Vine Street | 15,200 | 84 | 3,083
7 | Huntington Junior College | 16,300 | 782 | 1,031
8 | Long Island Business Institute | 16,500 | 491 | 236
9 | West Virginia Business College-Wheeling | 16,700 | 93 | 137
10 | National University College-Ponce | 16,800 | 985 | 3,749
11 | National University College-Arecibo | 16,800 | 1,656 | 3,749
12 | National University College-Rio Grande | 16,800 | 1,613 | 3,749
13 | National University College-Caguas | 16,800 | 492 | 3,749
14 | National University College-Bayamon | 16,800 | 3,637 | 3,749
15 | Caribbean University-Ponce | 17,000 | 1,222 | 1,232
16 | Caribbean University-Carolina | 17,000 | 779 | 1,232
17 | Caribbean University-Vega Baja | 17,000 | 530 | 1,232
18 | Caribbean University-Bayamon | 17,000 | 1,597 | 1,232
19 | Michigan Jewish Institute | 17,800 | 1,381 | 16
20 | Delta School of Business and Technology | 17,800 | 235 | 358
21 | Pontifical Catholic University of Puerto Rico-... | 17,900 | 605 | 3,610
22 | Pontifical Catholic University of Puerto Rico-... | 17,900 | 6,073 | 3,610
23 | West Virginia Junior College-Bridgeport | 17,900 | 229 | 749
24 | EDP University of Puerto Rico Inc-San Sebastian | 17,900 | 1,143 | 589
25 | Pontifical Catholic University of Puerto Rico-... | 17,900 | 1,410 | 3,610
26 | EDP Univeristy of Puerto Rico Inc-San Juan | 17,900 | 1,135 | 589
27 | West Virginia Junior College-Charleston | 17,900 | 206 | 749
28 | Huertas College | 17,900 | 1,487 | 77
29 | Centro de Estudios Multidisciplinarios-Humacao | 18,000 | 785 | 249
30 | Centro de Estudios Multidisciplinarios-San Juan | 18,000 | 1,243 | 249


The data show that Clinton College, a historically black college in South Carolina, is at the very bottom of the list in terms of student earnings, followed by Gallipolis Career College.

Schools in Puerto Rico make up more than half of the list, and a tribal college and a few "business" schools also appear. But something else caught my eye while exploring the interactive version of the graph. In figure 4 I have filtered for universities whose names contain art, design, music, or conservatory. Most of these universities fall below $50k in earnings; in fact, the average of their median earnings is $36,203.

Fig. 4 Earnings vs. debt for universities whose names contain art, design, music, or conservatory.
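A keyword filter like the one behind figure 4 can be sketched with a regular expression in pandas. One design choice to be aware of: a plain substring search for "art" also matches names like "Hartford" or "Martin". The sketch below therefore matches whole words only. That is an assumption made here, and it may not match the filter used for the figure, so it will not necessarily reproduce the $36,203 average. The toy data is hypothetical.

```python
import pandas as pd

# Whole-word, case-insensitive match, so "Hartford" or "Martin" do not count as "art".
# (?:...) is a non-capturing group, which keeps pandas from warning about match groups.
ARTS_REGEX = r"\b(?:art|arts|design|music|conservatory)\b"


def arts_schools(df: pd.DataFrame) -> pd.DataFrame:
    """Rows whose institution name contains one of the arts keywords."""
    mask = df["name"].str.contains(ARTS_REGEX, case=False, regex=True, na=False)
    return df[mask]


# Hypothetical toy data; the real analysis would use the Scorecard name and earnings columns
schools = pd.DataFrame({
    "name": ["Art Institute of X", "Hartford College", "Conservatory of Music Y",
             "School of Design Z", "State University"],
    "earnings": [30000, 50000, 34000, 38000, 45000],
})
arts = arts_schools(schools)
print(arts["name"].tolist())
print(arts["earnings"].mean())
```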

Next, I used scikit-learn to predict which colleges are most likely to fail. My approach is to use data on universities that have closed to find the other universities that are most like them, and thus most likely to fail. I trained a random forest model, tuned with a grid search, which yields about 96% overall accuracy.
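A minimal sketch of this kind of pipeline with scikit-learn's `RandomForestClassifier` and `GridSearchCV`. The features, the labels (1 = closed), and the parameter grid are all hypothetical stand-ins, since the post does not list them. Synthetic data from `make_classification` replaces the Scorecard columns. Closed schools are assumed to be a small minority, and in that setting a model that predicts "open" for everyone already scores high accuracy. For that reason the sketch also reports `balanced_accuracy_score` next to plain accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in data: each row is a school, each column a numeric feature
# (tuition, default rate, ...), and y == 1 marks a school that has closed.
# weights=[0.9, 0.1] makes closed schools the (assumed) minority class.
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Small illustrative grid; not the grid used for the post
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)  # candidates are scored by accuracy, the classifier default

y_pred = search.predict(X_test)
print("best params:", search.best_params_)
print(f"test accuracy:          {search.score(X_test, y_test):.3f}")
print(f"test balanced accuracy: {balanced_accuracy_score(y_test, y_pred):.3f}")

# Rank schools that are still open by the predicted probability of the "closed" class
open_schools = X_test[y_test == 0]
prob_closed = search.predict_proba(open_schools)[:, 1]
riskiest = prob_closed.argsort()[::-1][:5]
print("highest predicted closure probabilities:", prob_closed[riskiest].round(3))
```

The last step mirrors the "Probability closed" column in the table below: the predicted probability of the closed class, sorted from highest to lowest.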

Table 3: Universities most likely to fail
Rank | Institution name | Heightened cash monitoring (1 = yes) | Predicted probability of closure
1 | Southern California University SOMA | 1 | 0.806655
2 | Institute of Clinical Acupuncture & Oriental Med | 0 | 0.514355
3 | American National University-Lexington | 1 | 0.429721
4 | Ultrasound Medical Institute | 1 | 0.426612
5 | Monroe College | 0 | 0.421557
6 | Globe University-Wausau | 0 | 0.322627
7 | Instituto Tecnologico de Puerto Rico-Recinto d... | 0 | 0.322036
8 | Ecclesia College | 1 | 0.303765
9 | Globe University-Madison East | 0 | 0.301356
10 | Globe University-Sioux Falls | 0 | 0.296969
11 | Brown Mackie College-Miami | 0 | 0.289619
12 | Stevens-Henager College | 0 | 0.287017
13 | Oxford Graduate School | 0 | 0.270720
14 | Fortis College-Centerville | 0 | 0.269408
15 | IGlobal University | 0 | 0.265926
16 | Instituto Tecnologico de Puerto Rico-Recinto d... | 0 | 0.256295
17 | Herzing University-Atlanta | 0 | 0.253669
18 | Technical Career Institutes | 0 | 0.250084
19 | American InterContinental University-Atlanta | 0 | 0.244845
20 | Los Angeles Film School | 0 | 0.241422
21 | DeVry University-Oklahoma | 0 | 0.239483
22 | University of Phoenix-Connecticut | 0 | 0.235022
23 | DeVry University-Pennsylvania | 0 | 0.232032
24 | Expression College for Digital Arts | 0 | 0.228275
25 | The Art Institute of New York City | 0 | 0.222006
26 | Sanford-Brown College-Chicago | 0 | 0.219967
27 | Institute of Production and Recording | 0 | 0.218016
28 | The Art Institute of Atlanta | 0 | 0.212843
29 | Rabbinical College of Ohr Shimon Yisroel | 0 | 0.212571
30 | Strayer University-Mississippi | 0 | 0.211138

To gauge which variables matter most, I retrained the model using one variable at a time. For example, I retrained the model using just tuition and looked at the average score, then did the same with faculty salary, and so on. The most important variables are as follows:

  1. Heightened cash monitoring
  2. Tuition
  3. Total cost
  4. Three-year student default rate
  5. Percent Pell grant recipients
  6. Institution educational expenditures per student
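The one-variable-at-a-time procedure described above can be sketched with `cross_val_score`. The feature names below are hypothetical labels attached to synthetic data, so the resulting ranking has nothing to do with the list above. It only shows the mechanics. As an alternative that keeps all variables in one model, scikit-learn also provides `permutation_importance` in `sklearn.inspection`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical feature names attached to synthetic data; with real data these would be
# the Scorecard columns behind the list above.
feature_names = ["heightened_cash_monitoring", "tuition", "total_cost",
                 "default_rate_3yr", "pct_pell", "edu_expenditure_per_student"]
X, y = make_classification(n_samples=600, n_features=len(feature_names),
                           n_informative=3, weights=[0.9, 0.1], random_state=0)


def single_feature_scores(X, y, names, cv=5):
    """Mean cross-validated accuracy of a random forest trained on each feature alone,
    sorted from best to worst."""
    scores = {}
    for j, name in enumerate(names):
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        # X[:, [j]] keeps a 2-D array containing only column j
        scores[name] = cross_val_score(model, X[:, [j]], y, cv=cv).mean()
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


for name, score in single_feature_scores(X, y, feature_names).items():
    print(f"{name:30s} {score:.3f}")
```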

The last question I would like to investigate is what fraction of students at each university can pay back their loans, but I'll have to save it for another post. This could indicate which universities are taking advantage of students who can't afford the loans they take out.

In conclusion, these data show that tuition is rising far faster than inflation. Furthermore, they show that the median earnings of students who attend undergraduate medical and nursing schools are among the highest, some even higher than those of Ivy League graduates. And lastly, they show that going to art or music school probably isn't the best decision if you're looking to maximize your future earning potential. The choice is yours!