Understanding Laptop Prices

Executive Summary

Using the 2008 laptop sales in the City of London, the goal is to describe some of the characteristics of laptop prices. For example, how the customer postal code and amount of RAM lead to increased or decreased laptop prices.

The Data

The data is comprised of 16 columns and approximately 297,000 rows of laptop sales in London, England for all of 2008. Below is the first three rows of the dataset, broken into three tables so all 16 columns can fit on the page.

The first five columns include the date and time of the laptop purchase, laptop configuration, which is a code that corresponds to a combination of screen size, battery life, RAM and some of the other varaibles. The customer post code, store post code and retail price are also included.

Table 1: First 5 Columns
Date Configuration Customer Postcode Store Postcode Retail Price
1/1/2008 0:01 163 EC4V 5BH SE1 2BN 455
1/1/2008 0:02 320 SW4 0JL SW12 9HD 545
1/1/2008 0:04 23 EC3V 1LR E2 0RY 515

The next 5 columsn include some of the physical characteristics of laptops, including screen size, battery life, RAM and processor speed as well as if they have integrated wireless capabilities.

Table 2: Next 5 Columns
Screen Size (Inches) Battery Life (Hours) RAM (GB) Processor Speeds (GHz) Integrated Wireless?
15 5 1 2 Yes
15 6 1 2 No
15 4 1 2 Yes

The final 6 columns include the hard drive size, if the laptop comes with any applications pre-installed and the geo codes for both the customer and store.

Table 3: Last 6 Columns
HD Size (GB) Bundled Applications? customer X customer Y store X store Y
80 Yes 532041 180995 534057 179682
300 No 529240 175537 528739 173080
300 Yes 533095 181047 535652 182961

Data Exploration

Next up, I will explore the data in more detail. From the tables above, there are a few issues that immediately jump out.

One is that the date column is a character column, not an actual date/time column. We will want to convert that into a date/time for further analysis.

Two, there appear to be some redundant columns. For example, Configuration is just a numeric key of the over variables. This column can be removed safely.

The post code columns for both store and customer also have geo codes associated with them. I’m not interested exactly where people live or where stores reside so I’m going to remove those columns. Also, in regards to the post codes, I only want the general small section that people and stores reside in. Each small section of London is allocated a 1-3 letter prefix that corresponds to its compass location and then a following number and 2 letters to distinguish it from adjoining streets within that area. So, I will need to strip out extra information in the post codes.

A final thought about the data, I may also want to see if the dates or times of day of sales are relevant. To do this, I will break the features of the date into more columns, including month, day of the week, hours, minutes, etc.

Numeric Variables

Below are some summary statistics for the numerical variables.

Table 4: Summary Statistics for Numeric Variables
variable count_of_rows mean median variance std min max
am.pm 297572 1.50 2 0.25 0.50 1.0 2.0
battery_life_hours 297572 4.97 5 0.66 0.81 4.0 6.0
day 297572 15.88 16 79.43 8.91 1.0 31.0
hd_size_gb 297572 132.02 80 9558.21 97.77 40.0 300.0
hour 297572 11.50 12 48.00 6.93 0.0 23.0
minute 297572 29.53 30 300.18 17.33 0.0 59.0
month 297572 8.34 9 8.12 2.85 1.0 12.0
processor_speeds_g_hz 297572 1.93 2 0.13 0.36 1.5 2.4
quarter 297572 3.06 3 0.86 0.93 1.0 4.0
ram_gb 297572 2.10 2 1.31 1.15 1.0 4.0
retail_price 297572 508.13 500 10943.63 104.61 168.0 890.0
screen_size_inches 297572 15.81 15 0.97 0.98 15.0 17.0
wday 297572 3.99 4 4.04 2.01 1.0 7.0
week 297572 34.63 36 156.83 12.52 1.0 53.0
year 297572 2008.00 2008 0.00 0.00 2008.0 2008.0

Some interesting observations from the above statistics:

  1. There are no missing values in the numeric variables (seen by having 297,572 rows which is the same as the entire dataset)

  2. Mean of the am.pm is exactly between 1 (am purchase) and 2 (pm purchase) saying that an equal number of laptop purchases take place between mornings and evenings. It would also be interesting to see the distribution of sales by hour.

  3. Battery life has a mean of ~5 hours, with a max of only 6 hours.

  4. There isn’t much difference in the processor speeds of the laptops

  5. Pretty wide difference in retail price: £168 - £890

Visualizing the data makes it easier to see patterns. Below, all of the numeric variables in the dataset are plotted against the laptop prices. Note, due to data size, only 10% of the points are plotted.

Some interesting patterns:

  1. Seems to be only 3 different battery life: 4,5,and 6 hours

  2. Most hard drive sizes are below 150 gb, with some in the 300gb range

  3. Laptop sales are lower earlier and the year and pick up around week 20 (month 6: June)

  4. Only 2 laptop screen sizes: 15 and 17 inches.

  5. Day of the week for sales does not appear to matter.

Categorical Variables

The categorial variables also need to be explored. Below is a couple of basic metrics for the categorial variables.

Table 5: Summary Statistics for Numeric Variables
variable count_of_rows num_distinct
bundled_applications 297572 2
customer_postcode 297572 114
integrated_wireless 297572 2
store_postcode 297572 15

For both bundled_applications and integrated_wireless there are only two variables while the cusdtomer_postcode has many postcodes and stores only have 15. So, I we know there are 15 different stores in the dataset. There are also no missing values in these variables.

Laptop sales are almost split exactly down the middle between having bundled applications and not.

As with the bundled applications, integrated wireless is only in about 50% of the laptop sales (remember, this is 2008 data!).

Correlation

Finally, it is time to see which variables are highly correlated with each other.

Linear Regression

Finally, it is time to use linear regression to describe the effect of the variables on laptop price.

You can find the entire linear model in the appendix.

All of the data was used for the model since the goal is not to predict future sales, but describe past laptop prices and account for as much of the variance in laptop prices as possible.

If you look at the appendix, a decent amount of variables are not significant (have no stars on the far right column), meaning we could use a simpler model to explain the variance in laptop prices. At a 5% significance level, it means that you have a 5% risk of concluding the variable (i.e. hard drive size) has no relationship with laptop price.

After accounting for a signficance level of 5%, 55 variables are removed from the model.

Reviewing the first few rows of the model tells an interesting story. For a customer in postcode CR7, all other things being equal, will pay about £2.39 more for a laptop than other post codes while E1 will pay £10 less! Armed with this information, you could target market postcodes with different types of deals on laptops. For example, in E1, you could target lower end laptops.

I can also figure out which borough E1 belongs to and plot it (object of another post). Using data from www.milesfaster.co.uk, I scraped the post codes and borough information. E1 belongs Whitechapel, Stepney, Mile End.

Combining the borough with demographic data from the London Datastore can provide even more powerful insights. For example, in 2008, E1 had an average weekly salary for full-time workers of £450.2 which is lower than the average of £581.5 for the entire City of London.

Meanwhile, a 1 gb increase in RAM size will lead to a £57 price increase for the laptop.

Appendix

Entire Linear Model

## 
## Call:
## lm(formula = retail_price ~ ., data = laptop_sales_tidy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -236.066  -21.017    6.614   30.399  113.742 
## 
## Coefficients: (1 not defined because of singularities)
##                                  Estimate        Std. Error t value
## (Intercept)             -114307.384285684    3452.939629856 -33.104
## purchase_date                 0.000094714       0.000002886  32.820
## customer_postcodeCR0          0.653161381       1.595411426   0.409
## customer_postcodeCR7          2.393742037       1.826828077   1.310
## customer_postcodeE1         -10.236456764       1.565947688  -6.537
## customer_postcodeE10         -9.342318139       2.942421962  -3.175
## customer_postcodeE11        -10.009032446       2.113307172  -4.736
## customer_postcodeE12         -8.306950645       2.078809823  -3.996
## customer_postcodeE13         -6.216695230       2.063372197  -3.013
## customer_postcodeE14         -7.162327310       1.635515219  -4.379
## customer_postcodeE15         -5.575233421       1.967345236  -2.834
## customer_postcodeE16        -10.984911944       2.810537778  -3.908
## customer_postcodeE17        -11.108470331       2.327832027  -4.772
## customer_postcodeE2          -8.905799003       1.668186388  -5.339
## customer_postcodeE3          -7.629412953       1.855538853  -4.112
## customer_postcodeE4          -3.935224826       2.898146801  -1.358
## customer_postcodeE6          -7.957763166       1.821955326  -4.368
## customer_postcodeE7          -8.657173890       2.085043422  -4.152
## customer_postcodeE8          -8.026919384       1.856707018  -4.323
## customer_postcodeE9         -10.734213077       2.889323338  -3.715
## customer_postcodeEC1        -12.759316773       1.561471826  -8.171
## customer_postcodeEC2        -14.580322379       1.622167841  -8.988
## customer_postcodeEC3        -14.593079251       1.623570755  -8.988
## customer_postcodeEC4         -5.772365878       1.583604058  -3.645
## customer_postcodeHA5         -1.917868371       2.911825673  -0.659
## customer_postcodeIG11       -10.240250313       2.919385238  -3.508
## customer_postcodeIG2        -10.717300702       2.965762529  -3.614
## customer_postcodeKT1          1.692682392       1.851849810   0.914
## customer_postcodeKT2          2.628802124       1.843412305   1.426
## customer_postcodeKT4          3.564075325       2.883196541   1.236
## customer_postcodeKT6          2.953342350       2.358901761   1.252
## customer_postcodeKT7          2.824709941       2.355479082   1.199
## customer_postcodeN1         -10.799233299       1.590078019  -6.792
## customer_postcodeN10         -7.937386267       2.861136915  -2.774
## customer_postcodeN13         -8.678203992       2.838048022  -3.058
## customer_postcodeN16         -7.586061940       2.261425013  -3.355
## customer_postcodeN17         -7.213667743       2.724167456  -2.648
## customer_postcodeN19         -7.999436891       2.367108532  -3.379
## customer_postcodeN2          -4.182278837       2.147197113  -1.948
## customer_postcodeN20         -5.877790200       2.930628673  -2.006
## customer_postcodeN22         -5.574250392       2.360505574  -2.361
## customer_postcodeN3          -2.826141904       2.962168383  -0.954
## customer_postcodeN4          -7.058295210       2.262611666  -3.120
## customer_postcodeN5          -8.434074641       1.946853095  -4.332
## customer_postcodeN6          -2.587187839       3.004993399  -0.861
## customer_postcodeN7         -11.848030581       1.716824461  -6.901
## customer_postcodeNW1         -4.760866046       1.614299946  -2.949
## customer_postcodeNW10        -0.022332322       2.979392941  -0.007
## customer_postcodeNW3         -3.996232400       1.828848606  -2.185
## customer_postcodeNW5         -8.641286191       1.963524842  -4.401
## customer_postcodeNW6         -2.666023525       1.971175625  -1.353
## customer_postcodeNW8        -11.070708540       2.976118336  -3.720
## customer_postcodeRM10       -10.220352949       2.949620295  -3.465
## customer_postcodeSE1         -5.449135177       1.567810291  -3.476
## customer_postcodeSE10        -8.913675308       1.845572970  -4.830
## customer_postcodeSE11        -2.546496471       1.792032980  -1.421
## customer_postcodeSE13        -8.332219258       2.286014206  -3.645
## customer_postcodeSE14        -8.070409884       1.834583701  -4.399
## customer_postcodeSE15       -12.921189825       2.090394355  -6.181
## customer_postcodeSE16        -8.377238110       1.780350740  -4.705
## customer_postcodeSE17        -2.673058013       1.960150667  -1.364
## customer_postcodeSE19        -1.616042457       1.766629301  -0.915
## customer_postcodeSE20        -1.362254221       2.045686120  -0.666
## customer_postcodeSE21        -0.820241242       2.913917522  -0.281
## customer_postcodeSE22        -3.695561194       1.905831705  -1.939
## customer_postcodeSE23        -1.249303325       1.810258313  -0.690
## customer_postcodeSE24         3.237916815       1.920911756   1.686
## customer_postcodeSE25        -1.844354409       1.608538729  -1.147
## customer_postcodeSE26        -1.355862297       1.722539161  -0.787
## customer_postcodeSE27         4.238718648       2.242240092   1.890
## customer_postcodeSE3         -7.113646755       2.040815348  -3.486
## customer_postcodeSE4         -8.480663737       2.825133263  -3.002
## customer_postcodeSE5         -7.651826738       2.079773531  -3.679
## customer_postcodeSE6         -6.221497551       2.883114470  -2.158
## customer_postcodeSE7         -7.312373708       2.919105571  -2.505
## customer_postcodeSE8         -9.059321438       1.923199595  -4.711
## customer_postcodeSW1          0.161368996       1.580066353   0.102
## customer_postcodeSW10         0.925863904       2.852216146   0.325
## customer_postcodeSW11        10.512017952       1.818845823   5.779
## customer_postcodeSW12         9.575288281       2.054026941   4.662
## customer_postcodeSW13         0.139282407       2.313421880   0.060
## customer_postcodeSW14         1.626309228       2.941857093   0.553
## customer_postcodeSW15        -1.541046376       2.912142918  -0.529
## customer_postcodeSW16         1.002812131       1.910280962   0.525
## customer_postcodeSW17         2.791591699       2.798198713   0.998
## customer_postcodeSW18         9.044360455       2.238238747   4.041
## customer_postcodeSW19         2.700080798       1.797574720   1.502
## customer_postcodeSW2          8.162087697       2.270292151   3.595
## customer_postcodeSW3          0.499108488       1.950111233   0.256
## customer_postcodeSW4          7.530082095       1.683348884   4.473
## customer_postcodeSW5         10.674145501       2.274609796   4.693
## customer_postcodeSW6          7.236641985       2.281281034   3.172
## customer_postcodeSW7          2.650395664       2.085381685   1.271
## customer_postcodeSW8          0.980382198       1.735114880   0.565
## customer_postcodeSW9          2.332303615       2.304509717   1.012
## customer_postcodeTW1         -0.608833897       2.379020651  -0.256
## customer_postcodeTW10         2.986071659       1.935751220   1.543
## customer_postcodeTW7          0.710874887       2.828429815   0.251
## customer_postcodeTW8          0.191297066       3.051335857   0.063
## customer_postcodeTW9          2.925401419       1.982374243   1.476
## customer_postcodeW1          -0.776207057       1.546811083  -0.502
## customer_postcodeW10         -0.601944526       2.929523429  -0.205
## customer_postcodeW11          2.523763820       1.945509054   1.297
## customer_postcodeW12          0.413531045       2.098230022   0.197
## customer_postcodeW14          8.528889653       2.842768104   3.000
## customer_postcodeW2           1.336848619       1.721732928   0.776
## customer_postcodeW3           2.060777224       1.959604902   1.052
## customer_postcodeW4           2.322126547       1.980921353   1.172
## customer_postcodeW6           1.440440287       1.899840729   0.758
## customer_postcodeW7          -0.093812700       2.390115110  -0.039
## customer_postcodeW8          -0.946428475       2.319313552  -0.408
## customer_postcodeW9         -12.552672926       2.933941420  -4.278
## customer_postcodeWC1         -4.200132349       1.586900162  -2.647
## customer_postcodeWC2         -1.237348131       1.565318983  -0.790
## customer_postcodeWD7         -6.053515505       2.929804276  -2.066
## store_postcodeE2             62.688511590       0.823049015  76.166
## store_postcodeE7              8.884542333       1.226220363   7.245
## store_postcodeKT2            50.789480442       1.154384666  43.997
## store_postcodeN17            61.360994597       1.123124897  54.634
## store_postcodeN3              5.390095914       1.338978964   4.026
## store_postcodeNW5            57.904851803       0.837713208  69.123
## store_postcodeS1             42.367083416       3.848933764  11.007
## store_postcodeSE1            60.159395983       0.804434204  74.785
## store_postcodeSE8            59.323917920       0.780396468  76.018
## store_postcodeSW1            27.216986358       0.792463877  34.345
## store_postcodeSW12           50.984289921       0.717706624  71.038
## store_postcodeSW18           49.733077113       0.804627940  61.809
## store_postcodeW10            53.415825929       0.898440261  59.454
## store_postcodeW4             -2.587948895       1.072437416  -2.413
## screen_size_inches           57.265004769       0.090051725 635.912
## battery_life_hours           47.710596109       0.105830830 450.819
## ram_gb                       57.216536916       0.075775178 755.083
## processor_speeds_g_hz        68.109656821       0.233949882 291.129
## integrated_wirelessYes       18.921113434       0.169188356 111.835
## hd_size_gb                    0.406273406       0.000875954 463.807
## bundled_applicationsYes      48.010264485       0.170108242 282.234
## year                                   NA                NA      NA
## quarter                      54.746166232       0.328711146 166.548
## month                      -267.660899519       7.523931839 -35.575
## day                          -7.993812607       0.245199469 -32.601
## hour                         -0.352550258       0.026628049 -13.240
## minute                       -0.008193244       0.004881277  -1.679
## am.pm                         0.018031195       0.339249327   0.053
## wday                          0.112227161       0.043392830   2.586
## week                         -1.192638205       0.305121510  -3.909
##                                     Pr(>|t|)    
## (Intercept)             < 0.0000000000000002 ***
## purchase_date           < 0.0000000000000002 ***
## customer_postcodeCR0                0.682246    
## customer_postcodeCR7                0.190086    
## customer_postcodeE1     0.000000000062909225 ***
## customer_postcodeE10                0.001498 ** 
## customer_postcodeE11    0.000002178743087891 ***
## customer_postcodeE12    0.000064434500925781 ***
## customer_postcodeE13                0.002588 ** 
## customer_postcodeE14    0.000011913189965406 ***
## customer_postcodeE15                0.004599 ** 
## customer_postcodeE16    0.000092902742180550 ***
## customer_postcodeE17    0.000001824742901921 ***
## customer_postcodeE2     0.000000093732793992 ***
## customer_postcodeE3     0.000039287244463509 ***
## customer_postcodeE4                 0.174515    
## customer_postcodeE6     0.000012560339857785 ***
## customer_postcodeE7     0.000032962751444813 ***
## customer_postcodeE8     0.000015383315184929 ***
## customer_postcodeE9                 0.000203 ***
## customer_postcodeEC1    0.000000000000000306 ***
## customer_postcodeEC2    < 0.0000000000000002 ***
## customer_postcodeEC3    < 0.0000000000000002 ***
## customer_postcodeEC4                0.000267 ***
## customer_postcodeHA5                0.510122    
## customer_postcodeIG11               0.000452 ***
## customer_postcodeIG2                0.000302 ***
## customer_postcodeKT1                0.360692    
## customer_postcodeKT2                0.153854    
## customer_postcodeKT4                0.216402    
## customer_postcodeKT6                0.210571    
## customer_postcodeKT7                0.230448    
## customer_postcodeN1     0.000000000011108486 ***
## customer_postcodeN10                0.005534 ** 
## customer_postcodeN13                0.002230 ** 
## customer_postcodeN16                0.000795 ***
## customer_postcodeN17                0.008097 ** 
## customer_postcodeN19                0.000727 ***
## customer_postcodeN2                 0.051442 .  
## customer_postcodeN20                0.044895 *  
## customer_postcodeN22                0.018204 *  
## customer_postcodeN3                 0.340045    
## customer_postcodeN4                 0.001812 ** 
## customer_postcodeN5     0.000014770572576426 ***
## customer_postcodeN6                 0.389259    
## customer_postcodeN7     0.000000000005169759 ***
## customer_postcodeNW1                0.003186 ** 
## customer_postcodeNW10               0.994019    
## customer_postcodeNW3                0.028882 *  
## customer_postcodeNW5    0.000010783958462815 ***
## customer_postcodeNW6                0.176215    
## customer_postcodeNW8                0.000199 ***
## customer_postcodeRM10               0.000530 ***
## customer_postcodeSE1                0.000510 ***
## customer_postcodeSE10   0.000001367684852479 ***
## customer_postcodeSE11               0.155315    
## customer_postcodeSE13               0.000268 ***
## customer_postcodeSE14   0.000010876953703863 ***
## customer_postcodeSE15   0.000000000636937176 ***
## customer_postcodeSE16   0.000002535047732810 ***
## customer_postcodeSE17               0.172663    
## customer_postcodeSE19               0.360318    
## customer_postcodeSE20               0.505466    
## customer_postcodeSE21               0.778334    
## customer_postcodeSE22               0.052493 .  
## customer_postcodeSE23               0.490117    
## customer_postcodeSE24               0.091871 .  
## customer_postcodeSE25               0.251547    
## customer_postcodeSE26               0.431206    
## customer_postcodeSE27               0.058706 .  
## customer_postcodeSE3                0.000491 ***
## customer_postcodeSE4                0.002684 ** 
## customer_postcodeSE5                0.000234 ***
## customer_postcodeSE6                0.030936 *  
## customer_postcodeSE7                0.012246 *  
## customer_postcodeSE8    0.000002471699023579 ***
## customer_postcodeSW1                0.918655    
## customer_postcodeSW10               0.745475    
## customer_postcodeSW11   0.000000007500135039 ***
## customer_postcodeSW12   0.000003137274777136 ***
## customer_postcodeSW13               0.951991    
## customer_postcodeSW14               0.580389    
## customer_postcodeSW15               0.596681    
## customer_postcodeSW16               0.599615    
## customer_postcodeSW17               0.318455    
## customer_postcodeSW18   0.000053274525998420 ***
## customer_postcodeSW19               0.133080    
## customer_postcodeSW2                0.000324 ***
## customer_postcodeSW3                0.797999    
## customer_postcodeSW4    0.000007706053251386 ***
## customer_postcodeSW5    0.000002696994605618 ***
## customer_postcodeSW6                0.001513 ** 
## customer_postcodeSW7                0.203751    
## customer_postcodeSW8                0.572058    
## customer_postcodeSW9                0.311510    
## customer_postcodeTW1                0.798014    
## customer_postcodeTW10               0.122931    
## customer_postcodeTW7                0.801558    
## customer_postcodeTW8                0.950011    
## customer_postcodeTW9                0.140024    
## customer_postcodeW1                 0.615801    
## customer_postcodeW10                0.837201    
## customer_postcodeW11                0.194555    
## customer_postcodeW12                0.843761    
## customer_postcodeW14                0.002698 ** 
## customer_postcodeW2                 0.437481    
## customer_postcodeW3                 0.292971    
## customer_postcodeW4                 0.241099    
## customer_postcodeW6                 0.448338    
## customer_postcodeW7                 0.968691    
## customer_postcodeW8                 0.683227    
## customer_postcodeW9     0.000018827481752372 ***
## customer_postcodeWC1                0.008127 ** 
## customer_postcodeWC2                0.429250    
## customer_postcodeWD7                0.038812 *  
## store_postcodeE2        < 0.0000000000000002 ***
## store_postcodeE7        0.000000000000432031 ***
## store_postcodeKT2       < 0.0000000000000002 ***
## store_postcodeN17       < 0.0000000000000002 ***
## store_postcodeN3        0.000056862609626565 ***
## store_postcodeNW5       < 0.0000000000000002 ***
## store_postcodeS1        < 0.0000000000000002 ***
## store_postcodeSE1       < 0.0000000000000002 ***
## store_postcodeSE8       < 0.0000000000000002 ***
## store_postcodeSW1       < 0.0000000000000002 ***
## store_postcodeSW12      < 0.0000000000000002 ***
## store_postcodeSW18      < 0.0000000000000002 ***
## store_postcodeW10       < 0.0000000000000002 ***
## store_postcodeW4                    0.015816 *  
## screen_size_inches      < 0.0000000000000002 ***
## battery_life_hours      < 0.0000000000000002 ***
## ram_gb                  < 0.0000000000000002 ***
## processor_speeds_g_hz   < 0.0000000000000002 ***
## integrated_wirelessYes  < 0.0000000000000002 ***
## hd_size_gb              < 0.0000000000000002 ***
## bundled_applicationsYes < 0.0000000000000002 ***
## year                                      NA    
## quarter                 < 0.0000000000000002 ***
## month                   < 0.0000000000000002 ***
## day                     < 0.0000000000000002 ***
## hour                    < 0.0000000000000002 ***
## minute                              0.093250 .  
## am.pm                               0.957612    
## wday                                0.009702 ** 
## week                    0.000092803350079581 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45.03 on 283785 degrees of freedom
##   (13643 observations deleted due to missingness)
## Multiple R-squared:  0.8148, Adjusted R-squared:  0.8147 
## F-statistic:  8732 on 143 and 283785 DF,  p-value: < 0.00000000000000022