Chapter 4 Missing values

4.1 Missing values by column

##        Easy Apply       Competitors           Revenue 
##             14746             11177              4965 
##           Founded          Industry            Sector 
##              4004              2174              2171 
##            Rating              Size Type of ownership 
##              1515              1204              1031 
##      Headquarters     Low_salary(k)    High_salary(k) 
##               931                25                25 
##     Avg_salary(k)   Name of Company         Job Title 
##                25                15                13 
##   Job Description          Location         file_Type 
##                13                13                13 
##          job_type             State 
##                13                 0

The table above shows the number of missing values in each column. Most of the values under “Easy Apply” and “Competitors” are missing, so we should avoid investigating the relation between salary and these two features. “Sector”, “Industry”, and “Size” (number of employees) have far fewer missing values than “Easy Apply”, “Competitors”, “Revenue”, or “Founded”, so we want to investigate the relations between these features and salary level further.
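A per-column count like this can be produced in base R. The sketch below is only an illustration: the data frame `jobs` and its values are hypothetical stand-ins for the job-listing data, and it assumes missing entries are already coded as `NA`.

```r
# Toy data frame standing in for the job-listing data (hypothetical values).
jobs <- data.frame(
  Rating      = c(3.5, NA, 4.1, NA),
  Competitors = c(NA, NA, NA, "Acme"),
  State       = c("NY", "CA", "TX", "WA"),
  stringsAsFactors = FALSE
)

# Count NA entries in each column, then sort from most to least missing.
missing_by_column <- sort(colSums(is.na(jobs)), decreasing = TRUE)
print(missing_by_column)
```

Sorting in decreasing order puts the columns with the most missing values first, which matches the layout of the table above.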

4.2 Display first ten companies

The following graph shows the pattern of missing values. Blue represents data that are missing, while gray represents data that are present. The graph shows that “Revenue”, “Competitors”, and “Founded” have the most missing values in the dataset. We also find no apparent correlation between the missingness of different features, that is, a missing value in one feature does not lead to missing values in another. Therefore, we can investigate each feature without interference from missing data in other features.
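One way to check for dependence between missing features numerically, rather than by eye, is to correlate the missingness indicators of the columns. This is a minimal base-R sketch on hypothetical data (the `jobs` data frame and its values are made up for illustration). It is not necessarily the method used to draw the graph.

```r
# Toy data frame (hypothetical values); every column has some missing
# and some observed entries so the correlations are defined.
jobs <- data.frame(
  Revenue     = c(NA, 10, NA, 25, 40, NA),
  Competitors = c(NA, NA, "Acme", NA, "Initech", "Globex"),
  Founded     = c(1999, NA, 2005, 2010, NA, 1987)
)

# 1 where a value is missing, 0 where it is present.
missing_indicator <- is.na(jobs) * 1

# Pairwise correlation between the missingness indicators of the columns.
missing_cor <- cor(missing_indicator)
print(round(missing_cor, 2))
```

Off-diagonal values near 0 suggest that two features go missing independently, while values near 1 suggest that they tend to be missing in the same rows.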

4.3 Use the mi library to draw a heatmap and check missing values

This heatmap shows the missing data for each feature. As the map shows, “Competitors”, “Revenue”, and “Founded” are missing in large numbers, so we avoid analyzing these features.

## NOTE: The following pairs of variables appear to have the same missingness pattern.
##  Please verify whether they are in fact logically distinct variables.
##       [,1]              [,2]             
##  [1,] "Job Title"       "Job Description"
##  [2,] "Job Title"       "Location"       
##  [3,] "Job Title"       "file_Type"      
##  [4,] "Job Title"       "job_type"       
##  [5,] "Job Description" "Location"       
##  [6,] "Job Description" "file_Type"      
##  [7,] "Job Description" "job_type"       
##  [8,] "Location"        "file_Type"      
##  [9,] "Location"        "job_type"       
## [10,] "file_Type"       "job_type"       
## [11,] "Low_salary(k)"   "High_salary(k)" 
## [12,] "Low_salary(k)"   "Avg_salary(k)"  
## [13,] "High_salary(k)"  "Avg_salary(k)"
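The note above flags pairs of columns that are missing in exactly the same rows. The same check can be sketched in base R as follows. The data frame is hypothetical, and this is an illustration of the idea rather than the mi package's own implementation.

```r
# Toy data frame (hypothetical values): the two salary columns are
# missing in the same rows, Rating is not.
jobs <- data.frame(
  Low_salary  = c(50, NA, 70, NA),
  High_salary = c(80, NA, 95, NA),
  Rating      = c(NA, 3.9, 4.2, NA)
)

missing_indicator <- is.na(jobs)

# All unordered pairs of column names, one pair per row.
pairs <- t(combn(colnames(jobs), 2))

# Keep the pairs whose missingness indicators agree in every row.
same_pattern <- apply(pairs, 1, function(p) {
  all(missing_indicator[, p[1]] == missing_indicator[, p[2]])
})
identical_pairs <- pairs[same_pattern, , drop = FALSE]
print(identical_pairs)
```

In the real data, pairs such as the three salary columns are expected to match because they are derived from the same source field.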

The table below displays the 63 missing patterns.

##  [1] "nothing"                                                                                                                                                                                                                 
##  [2] "Competitors"                                                                                                                                                                                                             
##  [3] "Revenue"                                                                                                                                                                                                                 
##  [4] "Founded"                                                                                                                                                                                                                 
##  [5] "Revenue, Competitors"                                                                                                                                                                                                    
##  [6] "Founded, Competitors"                                                                                                                                                                                                    
##  [7] "Size, Competitors"                                                                                                                                                                                                       
##  [8] "Industry, Sector"                                                                                                                                                                                                        
##  [9] "Founded, Revenue"                                                                                                                                                                                                        
## [10] "Type.of.ownership, Competitors"                                                                                                                                                                                          
## [11] "Rating, Competitors"                                                                                                                                                                                                     
## [12] "Size, Revenue"                                                                                                                                                                                                           
## [13] "Founded, Revenue, Competitors"                                                                                                                                                                                           
## [14] "Industry, Sector, Competitors"                                                                                                                                                                                           
## [15] "Size, Founded, Competitors"                                                                                                                                                                                              
## [16] "Rating, Founded, Competitors"                                                                                                                                                                                            
## [17] "Rating, Revenue, Competitors"                                                                                                                                                                                            
## [18] "Size, Revenue, Competitors"                                                                                                                                                                                              
## [19] "Type.of.ownership, Revenue, Competitors"                                                                                                                                                                                 
## [20] "Founded, Type.of.ownership, Competitors"                                                                                                                                                                                 
## [21] "Founded, Industry, Sector, Competitors"                                                                                                                                                                                  
## [22] "Size, Founded, Revenue, Competitors"                                                                                                                                                                                     
## [23] "Rating, Industry, Sector, Competitors"                                                                                                                                                                                   
## [24] "Rating, Founded, Revenue, Competitors"                                                                                                                                                                                   
## [25] "Type.of.ownership, Industry, Sector, Competitors"                                                                                                                                                                        
## [26] "Industry, Sector, Revenue, Competitors"                                                                                                                                                                                  
## [27] "Headquarters, Founded, Revenue, Competitors"                                                                                                                                                                             
## [28] "Founded, Type.of.ownership, Revenue, Competitors"                                                                                                                                                                        
## [29] "Competitors, Low_salary.k., High_salary.k., Avg_salary.k."                                                                                                                                                               
## [30] "Rating, Headquarters, Revenue, Competitors"                                                                                                                                                                              
## [31] "Rating, Type.of.ownership, Revenue, Competitors"                                                                                                                                                                         
## [32] "Type.of.ownership, Industry, Revenue, Competitors"                                                                                                                                                                       
## [33] "Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                                         
## [34] "Rating, Founded, Industry, Sector, Competitors"                                                                                                                                                                          
## [35] "Rating, Founded, Type.of.ownership, Revenue, Competitors"                                                                                                                                                                
## [36] "Rating, Size, Founded, Revenue, Competitors"                                                                                                                                                                             
## [37] "Rating, Industry, Sector, Revenue, Competitors"                                                                                                                                                                          
## [38] "Founded, Type.of.ownership, Industry, Sector, Competitors"                                                                                                                                                               
## [39] "Size, Founded, Type.of.ownership, Revenue, Competitors"                                                                                                                                                                  
## [40] "Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                                               
## [41] "Founded, Competitors, Low_salary.k., High_salary.k., Avg_salary.k."                                                                                                                                                      
## [42] "Revenue, Competitors, Low_salary.k., High_salary.k., Avg_salary.k."                                                                                                                                                      
## [43] "Size, Competitors, Low_salary.k., High_salary.k., Avg_salary.k."                                                                                                                                                         
## [44] "Headquarters, Size, Founded, Revenue, Competitors"                                                                                                                                                                       
## [45] "Rating, Size, Type.of.ownership, Revenue, Competitors"                                                                                                                                                                   
## [46] "Rating, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                                 
## [47] "Size, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                                   
## [48] "Rating, Founded, Type.of.ownership, Industry, Sector, Competitors"                                                                                                                                                       
## [49] "Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                                      
## [50] "Headquarters, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                           
## [51] "Rating, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                                       
## [52] "Rating, Size, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                           
## [53] "Rating, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                              
## [54] "Headquarters, Size, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                     
## [55] "Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                                
## [56] "Rating, Headquarters, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                                   
## [57] "Rating, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                        
## [58] "Rating, Headquarters, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                                
## [59] "Rating, Headquarters, Size, Founded, Industry, Sector, Revenue, Competitors"                                                                                                                                             
## [60] "Rating, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                                          
## [61] "Rating, Name.of.Company, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors"                                                                                                         
## [62] "Rating, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors, Low_salary.k., High_salary.k., Avg_salary.k."                                                                            
## [63] "Job.Title, Job.Description, Rating, Name.of.Company, Location, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors, file_Type, job_type, Low_salary.k., High_salary.k., Avg_salary.k."
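Pattern labels of this kind, and the number of rows that fall under each, can be reproduced in base R by labeling every row with the names of the columns it is missing. The sketch below uses a hypothetical data frame and assumes the label "nothing" for complete rows, as in the list above.

```r
# Toy data frame (hypothetical values).
jobs <- data.frame(
  Revenue     = c(NA, 10, NA, 25, 30),
  Competitors = c(NA, NA, NA, "Acme", "Initech"),
  Founded     = c(1999, 2001, NA, 2010, 2015)
)

missing_indicator <- is.na(jobs)

# Label each row by the columns it is missing; "nothing" marks complete rows.
pattern <- apply(missing_indicator, 1, function(row_missing) {
  if (any(row_missing)) {
    paste(names(row_missing)[row_missing], collapse = ", ")
  } else {
    "nothing"
  }
})

# Count how many rows share each pattern.
pattern_counts <- table(pattern)
print(pattern_counts)
```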

The table below displays each missing pattern and how many times it appears.

##                                                                                                                                                                                                                  nothing 
##                                                                                                                                                                                                                     3496 
##                                                                                                                                                                                                              Competitors 
##                                                                                                                                                                                                                     5160 
##                                                                                                                                                                                                                  Revenue 
##                                                                                                                                                                                                                      541 
##                                                                                                                                                                                                                  Founded 
##                                                                                                                                                                                                                       49 
##                                                                                                                                                                                                     Revenue, Competitors 
##                                                                                                                                                                                                                     1849 
##                                                                                                                                                                                                     Founded, Competitors 
##                                                                                                                                                                                                                     1194 
##                                                                                                                                                                                                        Size, Competitors 
##                                                                                                                                                                                                                        8 
##                                                                                                                                                                                                         Industry, Sector 
##                                                                                                                                                                                                                       68 
##                                                                                                                                                                                                         Founded, Revenue 
##                                                                                                                                                                                                                       10 
##                                                                                                                                                                                           Type.of.ownership, Competitors 
##                                                                                                                                                                                                                       22 
##                                                                                                                                                                                                      Rating, Competitors 
##                                                                                                                                                                                                                       28 
##                                                                                                                                                                                                            Size, Revenue 
##                                                                                                                                                                                                                        1 
##                                                                                                                                                                                            Founded, Revenue, Competitors 
##                                                                                                                                                                                                                      550 
##                                                                                                                                                                                            Industry, Sector, Competitors 
##                                                                                                                                                                                                                       31 
##                                                                                                                                                                                               Size, Founded, Competitors 
##                                                                                                                                                                                                                       22 
##                                                                                                                                                                                             Rating, Founded, Competitors 
##                                                                                                                                                                                                                       32 
##                                                                                                                                                                                             Rating, Revenue, Competitors 
##                                                                                                                                                                                                                       55 
##                                                                                                                                                                                               Size, Revenue, Competitors 
##                                                                                                                                                                                                                       13 
##                                                                                                                                                                                  Type.of.ownership, Revenue, Competitors 
##                                                                                                                                                                                                                        7 
##                                                                                                                                                                                  Founded, Type.of.ownership, Competitors 
##                                                                                                                                                                                                                        4 
##                                                                                                                                                                                   Founded, Industry, Sector, Competitors 
##                                                                                                                                                                                                                      200 
##                                                                                                                                                                                      Size, Founded, Revenue, Competitors 
##                                                                                                                                                                                                                       14 
##                                                                                                                                                                                    Rating, Industry, Sector, Competitors 
##                                                                                                                                                                                                                       10 
##                                                                                                                                                                                    Rating, Founded, Revenue, Competitors 
##                                                                                                                                                                                                                       68 
##                                                                                                                                                                         Type.of.ownership, Industry, Sector, Competitors 
##                                                                                                                                                                                                                        8 
##                                                                                                                                                                                   Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                       17 
##                                                                                                                                                                              Headquarters, Founded, Revenue, Competitors 
##                                                                                                                                                                                                                       12 
##                                                                                                                                                                         Founded, Type.of.ownership, Revenue, Competitors 
##                                                                                                                                                                                                                        5 
##                                                                                                                                                                Competitors, Low_salary.k., High_salary.k., Avg_salary.k. 
##                                                                                                                                                                                                                        5 
##                                                                                                                                                                               Rating, Headquarters, Revenue, Competitors 
##                                                                                                                                                                                                                        1 
##                                                                                                                                                                          Rating, Type.of.ownership, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                                        Type.of.ownership, Industry, Revenue, Competitors 
##                                                                                                                                                                                                                        3 
##                                                                                                                                                                          Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                      384 
##                                                                                                                                                                           Rating, Founded, Industry, Sector, Competitors 
##                                                                                                                                                                                                                       23 
##                                                                                                                                                                 Rating, Founded, Type.of.ownership, Revenue, Competitors 
##                                                                                                                                                                                                                        3 
##                                                                                                                                                                              Rating, Size, Founded, Revenue, Competitors 
##                                                                                                                                                                                                                        6 
##                                                                                                                                                                           Rating, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                                Founded, Type.of.ownership, Industry, Sector, Competitors 
##                                                                                                                                                                                                                       11 
##                                                                                                                                                                   Size, Founded, Type.of.ownership, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                                Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        5 
##                                                                                                                                                       Founded, Competitors, Low_salary.k., High_salary.k., Avg_salary.k. 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                       Revenue, Competitors, Low_salary.k., High_salary.k., Avg_salary.k. 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                          Size, Competitors, Low_salary.k., High_salary.k., Avg_salary.k. 
##                                                                                                                                                                                                                        1 
##                                                                                                                                                                        Headquarters, Size, Founded, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                                    Rating, Size, Type.of.ownership, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                                                                                                                  Rating, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                      217 
##                                                                                                                                                                    Size, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                       72 
##                                                                                                                                                        Rating, Founded, Type.of.ownership, Industry, Sector, Competitors 
##                                                                                                                                                                                                                        3 
##                                                                                                                                                       Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                       23 
##                                                                                                                                                            Headquarters, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        5 
##                                                                                                                                                        Rating, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        1 
##                                                                                                                                                            Rating, Size, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                      138 
##                                                                                                                                               Rating, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                       25 
##                                                                                                                                                      Headquarters, Size, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                       17 
##                                                                                                                                                 Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                       12 
##                                                                                                                                                    Rating, Headquarters, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        4 
##                                                                                                                                         Rating, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        5 
##                                                                                                                                 Rating, Headquarters, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        1 
##                                                                                                                                              Rating, Headquarters, Size, Founded, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                                                                           Rating, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                      870 
##                                                                                                          Rating, Name.of.Company, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors 
##                                                                                                                                                                                                                        2 
##                                                                             Rating, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors, Low_salary.k., High_salary.k., Avg_salary.k. 
##                                                                                                                                                                                                                        2 
## Job.Title, Job.Description, Rating, Name.of.Company, Location, Headquarters, Size, Founded, Type.of.ownership, Industry, Sector, Revenue, Competitors, file_Type, job_type, Low_salary.k., High_salary.k., Avg_salary.k. 
##                                                                                                                                                                                                                       13
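A table of this kind, where each entry is a comma-separated list of missing columns followed by the number of rows sharing that pattern, can be approximated in base R. The following is a minimal sketch, not necessarily the code used to produce the output above. The data frame `jobs` and its values are toy assumptions; only the column names are taken from the dataset.

```r
# Toy data frame standing in for the job-postings data (hypothetical values).
jobs <- data.frame(
  Rating      = c(3.5, NA, 4.1, 3.0, 3.9),
  Founded     = c(1999, NA, NA, NA, 2010),
  Revenue     = c("a", NA, NA, NA, "b"),
  Competitors = c(NA, NA, NA, NA, "x"),
  stringsAsFactors = FALSE
)

# For each row, paste together the names of the columns that are missing.
# Rows with no missing values are labelled "(none)".
missing_pattern <- apply(is.na(jobs), 1, function(row) {
  if (!any(row)) "(none)" else paste(names(jobs)[row], collapse = ", ")
})

# Count how often each pattern occurs, most frequent first.
pattern_counts <- sort(table(missing_pattern), decreasing = TRUE)
print(pattern_counts)
```

Each distinct combination of missing columns becomes one entry, so the number of entries grows quickly when many columns have gaps, as the long output above shows.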

4.4 Missing pattern plot

4.4.1 display with percentage

4.4.2 display with count number

The most frequent missing pattern is the one in which ‘Easy.apply’ and ‘Competitors’ are both missing while all other variables are present. ‘Easy.apply’ and ‘Competitors’ are also missing more often than any other variable. By contrast, Company.name, index, Job.Description, Job.Title, Location, Salary.Estimate, and X are always present, regardless of which other values are missing.

The features Industry and Sector are almost always missing together, which matches their nearly identical missing counts in Section 4.1 (2174 and 2171).

The feature Founded appears to be missing independently of the others: whether it is present does not affect whether other features are missing.
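Claims about features being missing together, or independently, can be checked by cross-tabulating their missingness indicators. The sketch below uses base R on toy data. The values are made up, and `co_missing()` is a hypothetical helper introduced only for illustration; it is not part of any package.

```r
# Toy data (hypothetical values); only the column names come from the text.
jobs <- data.frame(
  Industry = c("IT", NA, NA, "Finance", NA, "Retail"),
  Sector   = c("Tech", NA, NA, "Finance", "Tech", "Retail"),
  Founded  = c(1999, 2001, NA, NA, 2010, 1985),
  stringsAsFactors = FALSE
)

# Cross-tabulate the missingness indicators of two columns.
# Fixing the factor levels keeps the table 2 x 2 even if a cell is empty.
co_missing <- function(df, a, b) {
  tab <- table(
    factor(is.na(df[[a]]), levels = c(FALSE, TRUE)),
    factor(is.na(df[[b]]), levels = c(FALSE, TRUE))
  )
  names(dimnames(tab)) <- paste0(c(a, b), "_missing")
  tab
}

industry_sector <- co_missing(jobs, "Industry", "Sector")
founded_sector  <- co_missing(jobs, "Founded", "Sector")
print(industry_sector)
print(founded_sector)
```

If two features are missing together, most of the missing rows fall in the TRUE/TRUE cell; if they are unrelated, the missing rows are spread across the cells roughly in proportion to the marginal counts.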

Complete cases account for fewer than 100 rows, even though the dataset contains more than 2000 rows.
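The number of complete cases can be counted directly with base R's `complete.cases()`. This is a minimal sketch on toy data; the data frame `jobs` and its values are assumptions made for illustration.

```r
# Toy data frame (hypothetical values) with one fully observed row.
jobs <- data.frame(
  Rating      = c(3.5, NA, 4.1, 3.0, 3.9),
  Founded     = c(1999, NA, NA, NA, 2010),
  Competitors = c(NA, NA, NA, NA, "x"),
  stringsAsFactors = FALSE
)

# complete.cases() returns TRUE for rows with no missing values.
n_complete     <- sum(complete.cases(jobs))
share_complete <- n_complete / nrow(jobs)

# Per-column missing counts, for comparison with the table in Section 4.1.
missing_by_column <- colSums(is.na(jobs))

print(n_complete)
print(share_complete)
print(missing_by_column)
```

When the share of complete cases is this small, dropping every incomplete row would discard most of the data, so it is usually better to drop the sparsest columns first.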