How To Filter Multiple Names In R
Filtering Data with dplyr
Filtering data is one of the most basic operations when you work with data. You want to remove a part of the data that is invalid or that you're simply not interested in. Or, you want to zero in on a particular part of the data you want to know more about. Of course, dplyr has the 'filter()' function to do such filtering, but there is even more. With dplyr you can do the kind of filtering that could be hard to perform or complicated to construct with tools like SQL and traditional BI tools, in a simple and more intuitive way.
Let's begin with some simple ones. Again, I'll use the same flight data I imported in the previous post.
Select columns
First, let's select the columns that are interesting for now. If you want to know more about how to select columns, please check the post I wrote earlier.
library(dplyr)
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME)
Filter with a value
Let's say you want to see only the flights of United Airlines (UA). You can run something like the code below.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(CARRIER == "UA")
If you want to use the 'equal' operator you need to put two '=' (equal signs) together like above. If you run the above you'll see only the rows whose CARRIER is UA.
And now, let's find the flights that are of United Airlines (UA) and left San Francisco airport (SFO). You can use the '&' operator as AND and the '|' operator as OR to connect multiple filter conditions. This time we'll use '&'.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(CARRIER == "UA" & ORIGIN == "SFO")
Or, you might want to see only the flights that left San Francisco airport (SFO) but are not of United Airlines (UA). You can use the '!=' operator as 'not equal'.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(CARRIER != "UA" & ORIGIN == "SFO")
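To try these operators without the flight data, here is a minimal sketch on a small made-up data frame. The name 'toy_flights' and its rows are assumptions for illustration; only the column names CARRIER and ORIGIN mirror the real data.

```r
library(dplyr)

# Hypothetical stand-in for the flight data (made-up rows)
toy_flights <- data.frame(
  CARRIER = c("UA", "UA", "AA", "DL", "AA"),
  ORIGIN  = c("SFO", "LAX", "SFO", "SFO", "JFK"),
  stringsAsFactors = FALSE
)

# AND: United flights that left SFO
ua_and_sfo <- toy_flights %>% filter(CARRIER == "UA" & ORIGIN == "SFO")

# OR: flights that are United or left SFO (or both)
ua_or_sfo <- toy_flights %>% filter(CARRIER == "UA" | ORIGIN == "SFO")

# NOT EQUAL: non-United flights that left SFO
not_ua_sfo <- toy_flights %>% filter(CARRIER != "UA" & ORIGIN == "SFO")

# Separating conditions with commas behaves like joining them with &
ua_and_sfo_commas <- toy_flights %>% filter(CARRIER == "UA", ORIGIN == "SFO")
```

Passing several conditions separated by commas is often easier to read than one long '&' expression, but you still need '|' whenever you want OR.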
Filtering with multiple values
What if you want to see only the data for the flights that are of either United Airlines (UA) or American Airlines (AA)? You can use '%in%' for this, just like the IN operator in SQL.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(CARRIER %in% c("UA", "AA"))
We can't really tell if it's working or not by looking at the first 10 rows. Let's run the count() function to summarize this quickly.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(CARRIER %in% c("UA", "AA")) %>%
count(CARRIER)
We can see just AA and UA as we expected. And yes, I know, this 'count()' function is amazing. It literally does what you would intuitively imagine. It returns the number of rows for each specified group, in this case CARRIER. We could have done this by using the 'group_by()' and 'summarize()' functions, but for something this simple the 'count()' function alone does the job quickly.
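The two shortcuts used so far can be written out the long way to see what they stand for. This is a rough sketch on a made-up data frame ('toy_flights' and its rows are assumptions, not the real flight data).

```r
library(dplyr)

# Hypothetical stand-in for the flight data (made-up rows)
toy_flights <- data.frame(
  CARRIER = c("UA", "AA", "DL", "UA", "AA", "UA"),
  stringsAsFactors = FALSE
)

# %in% keeps the same rows here as == comparisons joined with |
with_in <- toy_flights %>% filter(CARRIER %in% c("UA", "AA"))
with_or <- toy_flights %>% filter(CARRIER == "UA" | CARRIER == "AA")

# count() gives the same numbers as group_by() followed by summarize(n = n())
counted <- with_in %>% count(CARRIER)
grouped <- with_in %>% group_by(CARRIER) %>% summarize(n = n())
```

With only two values the '|' version is still readable, but '%in%' scales much better once the list of names grows.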
Reverse the condition logic
What if you want to see the flights that are neither United Airlines (UA) nor American Airlines (AA) this time? It's actually very simple with R and dplyr. Here's a magic one-character operator you can use with any condition to reverse the result. It's '!' (exclamation mark). And it goes like this.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(!CARRIER %in% c("UA", "AA")) %>%
count(CARRIER)
Notice that there is an exclamation mark at the beginning of the condition inside the filter() function. This is a very handy operator that basically flips the result of the condition that comes after it. This is why the result above includes neither 'UA' nor 'AA'. It might look a bit weird until you get used to it, especially if you're coming from outside the R world, but you are going to see this a lot and will appreciate its power and convenience.
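If the bare '!' in front of the column name looks ambiguous, you can wrap the condition in parentheses. R evaluates '%in%' before '!', so both spellings mean the same thing. Here is a small sketch on made-up data ('toy_flights' is a hypothetical stand-in for the flight data).

```r
library(dplyr)

# Hypothetical stand-in for the flight data (made-up rows)
toy_flights <- data.frame(
  CARRIER = c("UA", "AA", "DL", "WN", "UA"),
  stringsAsFactors = FALSE
)

# %in% binds tighter than !, so these two filters keep the same rows
without_parens <- toy_flights %>% filter(!CARRIER %in% c("UA", "AA"))
with_parens    <- toy_flights %>% filter(!(CARRIER %in% c("UA", "AA")))

# Only the carriers outside the list remain
remaining <- with_parens %>% count(CARRIER)
```

The parenthesized form is purely a readability choice; pick whichever your team finds clearer.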
Filtering out NA values
Now, let's go back to the original data again.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME)
When you look closer you'll notice that there are some NA values in the ARR_DELAY column. You can get rid of them easily with the 'is.na()' function, which returns TRUE if the value is NA and FALSE otherwise.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(is.na(ARR_DELAY))
Oops, it looks like all the values in ARR_DELAY are now NA, which is the opposite of what I hoped. Well, as you saw already, we can try the '!' (exclamation mark) operator again like below.
flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(!is.na(ARR_DELAY))
This is how you can work with NA values when filtering the data.
These are the basics of how 'filter' works with dplyr. But this is just the beginning. You can do a lot more by combining it with aggregate, window, string/text, and date functions, which I'm going to cover in the next post. Stay tuned!
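One more thing worth knowing about NA: a comparison against NA yields NA rather than TRUE or FALSE, and filter() drops those rows as well. The sketch below shows this on a made-up data frame ('toy_flights' and its values are assumptions for illustration).

```r
library(dplyr)

# Hypothetical stand-in for the flight data (made-up rows)
toy_flights <- data.frame(
  CARRIER   = c("UA", "AA", "DL", "UA"),
  ARR_DELAY = c(12, NA, -5, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where ARR_DELAY is present
has_delay <- toy_flights %>% filter(!is.na(ARR_DELAY))

# NA > 0 is NA, and filter() drops rows whose condition is NA,
# so only the row with a positive, non-missing delay survives
late <- toy_flights %>% filter(ARR_DELAY > 0)

# Count the missing values per carrier before deciding to drop them
missing_by_carrier <- toy_flights %>%
  filter(is.na(ARR_DELAY)) %>%
  count(CARRIER)
```

Checking how many NA values each group has before removing them is a cheap way to spot whether the missing data is concentrated in one carrier.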
Source: https://blog.exploratory.io/filter-data-with-dplyr-76cf5f1a258e
Posted by: stewartasher1959.blogspot.com