Some tools for computing stats from a GTFS feed, assuming the feed is valid.
All time estimates below were produced on a 2013 MacBook Pro with a 2.8 GHz Intel Core i7 processor and 16 GB of RAM running OS X 10.9.2.
Bases: builtins.object
A class to gather all the GTFS files for a feed and store them in memory as Pandas data frames. Make sure you have enough memory! The stop times object can be big.
Into the given directory, dump to separate CSV files the outputs of
where each time series is resampled to the given frequency. Also write a README.txt file that contains a few notes on units, and include some useful charts.
If no dates are given, then use self.get_first_week()[:5].
Return a chronologically ordered list of dates (datetime.date objects) for which this feed is valid.
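A minimal sketch of building such a date list, assuming the validity period is given as a pair of GTFS-style YYYYMMDD strings. The function name valid_dates and its inputs are hypothetical and are not part of this class.

```python
from datetime import date, datetime, timedelta
from typing import List


def valid_dates(start: str, end: str) -> List[date]:
    """Return every date from start to end inclusive, in chronological order.

    start and end are YYYYMMDD strings (hypothetical inputs for illustration).
    An end date before the start date yields an empty list.
    """
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    num_days = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(num_days)]


dates = valid_dates("20140101", "20140105")
```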
Return a list of dates (datetime.date objects) of the first Monday–Sunday week for which this feed is valid. In the unlikely event that this feed does not cover a full Monday–Sunday week, return whatever initial segment of the week it does cover.
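The week selection described above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: first_week is a hypothetical standalone function that takes a chronologically ordered list of dates, and it returns an empty list when the feed contains no Monday, a case the description does not cover.

```python
from datetime import date, timedelta


def first_week(dates):
    """Return the first Monday-Sunday week found in the ordered list dates.

    If the list ends before Sunday, return the part of the week it covers.
    """
    for i, day in enumerate(dates):
        if day.weekday() == 0:  # Monday
            # Keep at most seven entries that fall within this week
            return [d for d in dates[i:i + 7] if (d - day).days < 7]
    return []  # Assumption: no Monday in the feed means nothing to return


# A hypothetical feed valid from Wednesday 2014-01-01 for 17 days
feed = [date(2014, 1, 1) + timedelta(days=i) for i in range(17)]
week = first_week(feed)
weekdays = week[:5]  # Monday to Friday, as in get_first_week()[:5]

# A feed that stops mid-week yields only the initial segment
partial = first_week(feed[:8])
```

Slicing the result with [:5] gives the Monday to Friday dates that serve as the default when no dates are passed.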
Return a dictionary with structure shape_id -> Shapely linestring of shape in UTM coordinates. If self.shapes is None, then return None.
Take trips_stats, which is the output of self.get_trips_stats(), and use it to calculate stats for all the routes active on at least one day of the given dates (list of datetime.date objects).
Return a Pandas data frame with the following columns
If split_directions == False, then remove the direction_id column and compute each route’s stats using its trips running in both directions. Note that this will give bidirectional headway stats, which most folks don’t find useful.
NOTES:
Takes about 0.2 minutes on the Portland feed given the first five weekdays of the feed.
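A small worked example may help show why bidirectional headways are rarely useful. It assumes headway means the gap between consecutive departures, which may differ from how this class measures it, and the departure times are made up.

```python
from statistics import mean


def headways(departures):
    """Return the gaps, in minutes, between consecutive departure times."""
    ordered = sorted(departures)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical departures for one route, in minutes after midnight
direction_0 = [480, 500, 520, 540]  # every 20 minutes from 08:00
direction_1 = [485, 505, 525, 545]  # every 20 minutes from 08:05

split_mean = mean(headways(direction_0))
merged_gaps = headways(direction_0 + direction_1)
merged_mean = mean(merged_gaps)
```

Each direction runs every 20 minutes, but pooling both directions produces alternating 5 and 15 minute gaps, so the merged mean understates the wait a rider travelling in one direction would experience.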
Given trips_stats, which is the output of self.get_trips_stats(), return a time series version of the following route stats for the given dates:
The time series is a Pandas data frame with a period index for a 24-hour period sampled at the given frequency. The maximum allowable frequency is 1 minute. If multiple dates are given, a generic placeholder date of 2001-01-01 is used as the date for the period index. Otherwise, the given date is used.
The columns of the data frame are hierarchical (multi-index) with
If split_directions == False, then don’t include the bottom level.
NOTES:
To remove the placeholder date (2001-01-01) and seconds from the time series f, do f.index = [t.time().strftime('%H:%M') for t in f.index.to_datetime()].
Takes about 0.6 minutes on the Portland feed given the first five weekdays of the feed.
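The shape of such a time series can be sketched with the standard library alone. This is not the Pandas-based implementation: daily_bins and count_starts are hypothetical helpers, "trip starts per bin" is a stand-in stat chosen for illustration, the frequency is assumed to divide 24 hours evenly, and times past midnight are assumed to wrap around.

```python
from collections import Counter
from datetime import datetime, timedelta

PLACEHOLDER = datetime(2001, 1, 1)  # generic date used when pooling several dates
MINUTES_PER_DAY = 24 * 60


def daily_bins(freq_minutes):
    """Return the start time of each bin covering one 24-hour day."""
    step = timedelta(minutes=freq_minutes)
    return [PLACEHOLDER + i * step for i in range(MINUTES_PER_DAY // freq_minutes)]


def count_starts(start_minutes, freq_minutes):
    """Count trip starts per bin; inputs are minutes after midnight."""
    # Assumption: times at or past 24:00 wrap to the start of the day
    counts = Counter((m % MINUTES_PER_DAY) // freq_minutes for m in start_minutes)
    return [counts[i] for i in range(MINUTES_PER_DAY // freq_minutes)]


bins = daily_bins(60)
# Drop the placeholder date and the seconds, keeping only HH:MM labels
labels = [t.strftime("%H:%M") for t in bins]
# Hypothetical trip start times: 08:05, 08:20, 09:05, 17:00
series = count_starts([485, 500, 545, 1020], 60)
```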
If this feed has station data, that is, ‘location_type’ and ‘parent_station’ columns in self.stops, then compute the same stats that self.get_stops_stats() does, but for stations. Otherwise, return None.
NOTES:
Takes about 0.2 minutes on the Portland feed given the first five weekdays of the feed.
Return a Pandas data frame with the columns
stop has stop times on this date (1) or not (0) ... - dates[-1]: ditto
If dates is None, then return None.
If this feed has station data, that is, ‘location_type’ and ‘parent_station’ columns in self.stops, then return a Pandas data frame that has the same columns as self.stops but only includes stops with parent stations, that is, stops with location type 0 or blank and nonblank parent station. Otherwise, return None.
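The filter described above can be sketched on plain dictionaries instead of a data frame. The function stops_in_stations and the sample rows are hypothetical. Treating an empty stops list as having no station data is an assumption of this sketch.

```python
def stops_in_stations(stops):
    """Return the rows of stops that belong to a parent station.

    stops is a list of dicts keyed by column name. Return None when the
    location_type or parent_station column is missing.
    """
    required = {"location_type", "parent_station"}
    # Assumption: with no rows we cannot see the columns, so return None
    if not stops or not required <= set(stops[0]):
        return None
    return [
        row for row in stops
        if str(row["location_type"]).strip() in ("", "0")
        and str(row["parent_station"]).strip() != ""
    ]


# Hypothetical rows: two platforms of station A, the station itself,
# and a standalone stop B
stops = [
    {"stop_id": "A1", "location_type": "0", "parent_station": "A"},
    {"stop_id": "A2", "location_type": "", "parent_station": "A"},
    {"stop_id": "A", "location_type": "1", "parent_station": ""},
    {"stop_id": "B", "location_type": "0", "parent_station": ""},
]
result = stops_in_stations(stops)
```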
Return a Pandas data frame with the following columns:
If split_directions == False, then compute each stop’s stats using vehicles visiting it from both directions.
NOTES:
Takes about 0.73 minutes on the Portland feed given the first five weekdays of the feed.
Return a time series version of the following stops stats for the given dates:
The time series is a Pandas data frame with a period index for a 24-hour period sampled at the given frequency. The maximum allowable frequency is 1 minute. If multiple dates are given, a generic placeholder date of 2001-01-01 is used as the date for the period index. Otherwise, the given date is used.
The columns of the data frame are hierarchical (multi-index) with
If split_directions == False, then don’t include the bottom level.
NOTES:
Return a Pandas data frame with the columns
trip is active (1) on the given date or inactive (0) ... - dates[-1]: ditto
If dates is None, then return None.
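A rough sketch of such an activity table, using nested dictionaries rather than a data frame. The function activity_table, the trip IDs, and the is_active predicate are all hypothetical stand-ins for illustration.

```python
from datetime import date


def activity_table(trip_ids, dates, is_active):
    """Map each trip ID to {date: 1 if active, 0 if inactive}.

    is_active is any callable taking (trip_id, date) and returning a bool.
    Return None if dates is None.
    """
    if dates is None:
        return None
    return {
        trip: {day: int(is_active(trip, day)) for day in dates}
        for trip in trip_ids
    }


def is_active(trip, day):
    """Hypothetical rule: 'wk' runs Monday to Friday, 'sat' on Saturdays."""
    if trip == "wk":
        return day.weekday() < 5
    return day.weekday() == 5


friday, saturday = date(2014, 1, 10), date(2014, 1, 11)
table = activity_table(["wk", "sat"], [friday, saturday], is_active)
```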
Return a Pandas data frame with the following columns:
NOTES:
Takes about 1 minute on the Portland feed.
Return a dictionary with structure stop_id -> stop location as a UTM coordinate pair.
If the given trip (trip ID) is active on the given date (date object), then return True. Otherwise, return False. To avoid error checking in the interest of speed, assume trip is a valid trip ID in the feed and date is a valid date object.
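For context, a trip's activity in GTFS follows from its service: calendar.txt gives a weekly pattern and a date range, and calendar_dates.txt adds (exception_type 1) or removes (exception_type 2) service on specific dates. The sketch below applies those rules to already-parsed values. The function is_service_active and its input layout are hypothetical, and the class may implement this check differently.

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def is_service_active(service, exceptions, day):
    """Return True if the service runs on day.

    service holds one calendar.txt row with dates parsed to date objects
    and weekday flags parsed to ints. exceptions maps a date to its
    calendar_dates.txt exception_type (1 = added, 2 = removed).
    """
    exception = exceptions.get(day)
    if exception == 1:
        return True
    if exception == 2:
        return False
    in_range = service["start_date"] <= day <= service["end_date"]
    return in_range and service[WEEKDAYS[day.weekday()]] == 1


# Hypothetical weekday service for January 2014
service = {
    "monday": 1, "tuesday": 1, "wednesday": 1, "thursday": 1, "friday": 1,
    "saturday": 0, "sunday": 0,
    "start_date": date(2014, 1, 1), "end_date": date(2014, 1, 31),
}
# Service removed on Monday 2014-01-20 and added on Saturday 2014-01-25
exceptions = {date(2014, 1, 20): 2, date(2014, 1, 25): 1}

regular_monday = is_service_active(service, exceptions, date(2014, 1, 6))
removed_monday = is_service_active(service, exceptions, date(2014, 1, 20))
```

A trip would then be active on a date exactly when the service named by its service_id is active on that date.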