partial updates to filter_csv
zstumgoren committed Mar 8, 2024
1 parent 4677a8c commit 2839436
Showing 2 changed files with 94 additions and 196 deletions.
229 changes: 57 additions & 172 deletions completed/filter_csv_notebook_complete.ipynb
@@ -11,25 +11,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We're going to use built-in Python modules - programs really - to download a CSV file from the Internet and save it locally.\n",
"So we know how to [download a CSV from the Internet and read the data](read_csv_notebook_complete.ipynb) using the `csv` library.\n",
"\n",
"CSV stands for comma-separated values. It's a common file format that resembles a spreadsheet or database table in a text file.\n",
"Now we're ready to do something a bit more useful using these libraries, combined with the basics we learned on Day 1.\n",
"\n",
"So first, let's import two built-in Python modules: urllib and csv. \n",
"\n",
"* ```urllib``` is a module that allows Python to make HTTP requests to URLs on the web to fetch HTML. It contains a submodule called request, and inside it we want a specific function called urlretrieve.\n",
"\n",
"* ```csv``` is a module that helps Python work with tabular data extracted from spreadsheets and databases"
"Once again, we'll start by importing the library code for downloading a file and processing a CSV:"
]
},
{
"cell_type": "code",
"execution_count": 87,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"import csv"
"import csv\n",
"from urllib.request import urlretrieve"
]
},
{
@@ -41,7 +37,7 @@
},
{
"cell_type": "code",
"execution_count": 88,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -52,35 +48,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we need a URL to a CSV file out on the Internet.\n",
"\n",
"For this project we're going to download a CSV file that the [FDIC](https://www.fdic.gov/bank/individual/failed/banklist.html) compiles of all the banks that have failed since October 1, 2000.\n",
"\n",
"The file we want is at https://s3.amazonaws.com/datanicar/banklist.csv.\n",
"Now we can download the file using `urlretrieve`. Don't forget that it takes two arguments:\n",
"\n",
"If the internet is uncooperative, we can also use the local version of the file in the ```project1/data/``` directory, and structure our code a little differently.\n",
"\n",
"To do this, we use a function within the ```urllib``` module to download the file and save it to our project folder. It's called ```urlretrieve```, and for our purposes, think of it as a way to download a file from the Internet.\n",
"\n",
"`urlretrieve` takes two arguments to download a file. First we specify our target URL, and then we give it a name for the file we want to create."
"- the URL\n",
"- the name we'll give the file when we save it locally"
]
},
{
"cell_type": "code",
"execution_count": 89,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('banklist.csv', <http.client.HTTPMessage at 0x110c740f0>)"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"urlretrieve(\"https://s3.amazonaws.com/datanicar/banklist.csv\", downloaded_file)"
]
@@ -89,170 +67,57 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output shows we successfully downloaded the file and saved it\n",
"The output shows we successfully downloaded the file and saved it.\n",
"\n",
"For this exercise, we'll:\n",
"\n",
"- read the data\n",
"- filter it for banks in California\n",
"- save the California banks to a new CSV file\n",
"\n",
"Let's open a new file so we can filter just the data we want. We add the newline parameter when we open the file to write so it doesn't add [additional, blank rows on Windows machines](https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row/3348664)."
"> NOTE: Below, we use the newline argument when we open the file to write so it doesn't add [additional, blank rows on Windows machines](https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row/3348664)."
]
},
{
"cell_type": "code",
"execution_count": 90,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"filtered_file = open('california_banks.csv', 'w', newline='')"
"ca_banks = open('california_banks.csv', 'w', newline='')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use the writer method to write data to a file by passing in the open file object as the first argument and the delimiter as the second.\n",
"We'll use the `csv` library's [writer](https://docs.python.org/3/library/csv.html#csv.writer) functionality to...well...write data to a file by passing in the open file object as the first argument and the delimiter as the second.\n",
"\n",
"Then we will go ahead and use Python's csv reader to open the file and see what is inside.\n",
"Then we'll use `reader` to read the downloaded file and see what is inside.\n",
"\n",
"We specify the name of the file we just created, and we add a setting so we can open and read almost any CSV file."
]
},
{
"cell_type": "code",
"execution_count": 91,
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Bank Name', 'City', 'ST', 'CERT', 'Acquiring Institution', 'Closing Date', 'Updated Date']\n",
"['Frontier Bank, FSB D/B/A El Paseo Bank', 'Palm Desert', 'CA', '34738', 'Bank of Southern California, N.A.', '7-Nov-14', '10-Nov-16']\n",
"<class 'list'>\n",
"7\n",
"['Palm Desert National Bank', 'Palm Desert', 'CA', '23632', 'Pacific Premier Bank', '27-Apr-12', '7-Dec-15']\n",
"<class 'list'>\n",
"7\n",
"['Citizens Bank of Northern California', 'Nevada City', 'CA', '33983', 'Tri Counties Bank', '23-Sep-11', '7-Jan-18']\n",
"<class 'list'>\n",
"7\n",
"['San Luis Trust Bank, FSB', 'San Luis Obispo', 'CA', '34783', 'First California Bank', '18-Feb-11', '12-Sep-16']\n",
"<class 'list'>\n",
"7\n",
"['Charter Oak Bank', 'Napa', 'CA', '57855', 'Bank of Marin', '18-Feb-11', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Canyon National Bank', 'Palm Springs', 'CA', '34692', 'Pacific Premier Bank', '11-Feb-11', '19-Aug-14']\n",
"<class 'list'>\n",
"7\n",
"['First Vietnamese American Bank', 'Westminster', 'CA', '57885', 'Grandpoint Bank', '5-Nov-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Western Commercial Bank', 'Woodland Hills', 'CA', '58087', 'First California Bank', '5-Nov-10', '12-Sep-16']\n",
"<class 'list'>\n",
"7\n",
"['Sonoma Valley Bank', 'Sonoma', 'CA', '27259', 'Westamerica Bank', '20-Aug-10', '8-Aug-18']\n",
"<class 'list'>\n",
"7\n",
"['Los Padres Bank', 'Solvang', 'CA', '32165', 'Pacific Western Bank', '20-Aug-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Butte Community Bank', 'Chico', 'CA', '33219', 'Rabobank, N.A.', '20-Aug-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Pacific State Bank', 'Stockton', 'CA', '27090', 'Rabobank, N.A.', '20-Aug-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Granite Community Bank, NA', 'Granite Bay', 'CA', '57315', 'Tri Counties Bank', '28-May-10', '7-Sep-17']\n",
"<class 'list'>\n",
"7\n",
"['1st Pacific Bank of California', 'San Diego', 'CA', '35517', 'City National Bank', '7-May-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Tamalpais Bank', 'San Rafael', 'CA', '33493', 'Union Bank, N.A.', '16-Apr-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Innovative Bank', 'Oakland', 'CA', '23876', 'Center Bank', '16-Apr-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', 'OneWest Bank, FSB', '19-Feb-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['First Regional Bank', 'Los Angeles', 'CA', '23011', 'First-Citizens Bank & Trust Company', '29-Jan-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['First Federal Bank of California, F.S.B.', 'Santa Monica', 'CA', '28536', 'OneWest Bank, FSB', '18-Dec-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Imperial Capital Bank', 'La Jolla', 'CA', '26348', 'City National Bank', '18-Dec-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Pacific Coast National Bank', 'San Clemente', 'CA', '57914', 'Sunwest Bank', '13-Nov-09', '10-Apr-17']\n",
"<class 'list'>\n",
"7\n",
"['United Commercial Bank', 'San Francisco', 'CA', '32469', 'East West Bank', '6-Nov-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Pacific National Bank', 'San Francisco', 'CA', '30006', 'U.S. Bank N.A.', '30-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['California National Bank', 'Los Angeles', 'CA', '34659', 'U.S. Bank N.A.', '30-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['San Diego National Bank', 'San Diego', 'CA', '23594', 'U.S. Bank N.A.', '30-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['San Joaquin Bank', 'Bakersfield', 'CA', '23266', 'Citizens Business Bank', '16-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Affinity Bank', 'Ventura', 'CA', '27197', 'Pacific Western Bank', '28-Aug-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Temecula Valley Bank', 'Temecula', 'CA', '34341', 'First-Citizens Bank & Trust Company', '17-Jul-09', '20-Oct-16']\n",
"<class 'list'>\n",
"7\n",
"['Vineyard Bank', 'Rancho Cucamonga', 'CA', '23556', 'California Bank & Trust', '17-Jul-09', '14-Sep-18']\n",
"<class 'list'>\n",
"7\n",
"['Mirae Bank', 'Los Angeles', 'CA', '57332', 'Wilshire State Bank', '26-Jun-09', '21-Feb-18']\n",
"<class 'list'>\n",
"7\n",
"['MetroPacific Bank', 'Irvine', 'CA', '57893', 'Sunwest Bank', '26-Jun-09', '5-Feb-15']\n",
"<class 'list'>\n",
"7\n",
"['First Bank of Beverly Hills', 'Calabasas', 'CA', '32069', 'No Acquirer', '24-Apr-09', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['County Bank', 'Merced', 'CA', '22574', 'Westamerica Bank', '6-Feb-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Alliance Bank', 'Culver City', 'CA', '23124', 'California Bank & Trust', '6-Feb-09', '8-Aug-18']\n",
"<class 'list'>\n",
"7\n",
"['1st Centennial Bank', 'Redlands', 'CA', '33025', 'First California Bank', '23-Jan-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['PFF Bank & Trust', 'Pomona', 'CA', '28344', 'U.S. Bank, N.A.', '21-Nov-08', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Downey Savings & Loan', 'Newport Beach', 'CA', '30968', 'U.S. Bank, N.A.', '21-Nov-08', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Security Pacific Bank', 'Los Angeles', 'CA', '23595', 'Pacific Western Bank', '7-Nov-08', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['First Heritage Bank, NA', 'Newport Beach', 'CA', '57961', 'Mutual of Omaha Bank', '25-Jul-08', '12-Sep-16']\n",
"<class 'list'>\n",
"7\n",
"['IndyMac Bank', 'Pasadena', 'CA', '29730', 'OneWest Bank, FSB', '11-Jul-08', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Southern Pacific Bank', 'Torrance', 'CA', '27094', 'Beal Bank', '7-Feb-03', '20-Oct-08']\n",
"<class 'list'>\n",
"7\n"
"ename": "NameError",
"evalue": "name 'filtered_file' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[2], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# create our output\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m output \u001b[38;5;241m=\u001b[39m csv\u001b[38;5;241m.\u001b[39mwriter(\u001b[43mfiltered_file\u001b[49m, delimiter\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m,\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# open our downloaded file\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mopen\u001b[39m(downloaded_file, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mr\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;28;01mas\u001b[39;00m file:\n\u001b[1;32m 6\u001b[0m \n\u001b[1;32m 7\u001b[0m \u001b[38;5;66;03m# use python's csv reader to access the contents\u001b[39;00m\n\u001b[1;32m 8\u001b[0m \u001b[38;5;66;03m# and create an object that represents the data\u001b[39;00m\n",
"\u001b[0;31mNameError\u001b[0m: name 'filtered_file' is not defined"
]
}
],
"source": [
"# create our output\n",
"output = csv.writer(filtered_file, delimiter=',')\n",
"output = csv.writer(ca_banks, delimiter=',')\n",
"\n",
"# open our downloaded file\n",
"with open(downloaded_file, 'r') as file:\n",
@@ -293,12 +158,32 @@
"# close the output file\n",
"ca_banks.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bonus exercise\n",
"\n",
"Ok, try modifying the code above to count the California banks. You can do this by using some of the Python tricks we've already covered:\n",
"\n",
"- Create a `ca_banks_count` variable *before* the loop and set its value to `0`\n",
"- Use the `+=` operator to increment the count every time we encounter a California bank.\n",
"- Add a `print` statement at the end of the cell to print `ca_banks_count`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -312,9 +197,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}
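
A minimal sketch of the exercise this notebook describes: read the data, keep the California rows, save them to a new CSV file, and count them as the bonus exercise suggests. The loop body is collapsed in the diff, so this is not the notebook's actual code. The function name `filter_rows_by_state`, the sample rows, and the temporary file names are assumptions made for illustration; the `ST` column name is taken from the header row in the cell output. It runs without a network connection by writing a small stand-in file first.

```python
import csv
import os
import tempfile


def filter_rows_by_state(source_path, target_path, state="CA"):
    """Copy the header plus every row whose ST column equals `state`.

    Returns the number of matching rows (the bonus-exercise count).
    """
    match_count = 0  # set before the loop, as the bonus exercise suggests
    with open(source_path, "r", newline="") as source:
        # newline='' on the output file avoids blank rows on Windows
        with open(target_path, "w", newline="") as target:
            reader = csv.reader(source)
            writer = csv.writer(target, delimiter=",")

            header = next(reader)
            writer.writerow(header)
            state_index = header.index("ST")

            for row in reader:
                if row[state_index] == state:
                    writer.writerow(row)
                    match_count += 1
    return match_count


# Stand-in data; "Example Bank" is a made-up row, not from the FDIC file.
sample_rows = [
    ["Bank Name", "City", "ST", "CERT"],
    ["Charter Oak Bank", "Napa", "CA", "57855"],
    ["Example Bank", "Springfield", "IL", "00000"],
    ["Sonoma Valley Bank", "Sonoma", "CA", "27259"],
]

with tempfile.TemporaryDirectory() as workdir:
    downloaded_file = os.path.join(workdir, "banklist.csv")
    california_file = os.path.join(workdir, "california_banks.csv")

    # Write the stand-in file that plays the role of the download.
    with open(downloaded_file, "w", newline="") as handle:
        csv.writer(handle).writerows(sample_rows)

    ca_banks_count = filter_rows_by_state(downloaded_file, california_file)

    # Read the filtered file back to confirm what was written.
    with open(california_file, "r", newline="") as handle:
        filtered_rows = list(csv.reader(handle))

print(ca_banks_count)
```

Opening both files in `with` blocks closes them automatically, which sidesteps the kind of `NameError` shown in the cell output when a file variable is renamed but a later `close()` call is not.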