Collecting, Analyzing, and Visualizing Data with Python - Part I

The Art of Analyzing Big Data - The Data Scientist’s Toolbox - Lecture 2

By Dr. Michael Fire


1. Collecting Data from Websites

Let's write code that extracts the post titles from Guido van Rossum's blog, Neopythonic.
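
Before working with the live page, here is a minimal offline sketch of the extraction step. It assumes that each post title sits inside an `<h3>` element whose class list contains `post-title`, as in the Blogger markup shown in the output of the first cell. The names `PostTitleParser`, `extract_titles`, and `sample` are illustrative and do not come from the lecture, and the sample HTML is a hand-written fragment, not the real page.

```python
from html.parser import HTMLParser


class PostTitleParser(HTMLParser):
    """Collect the text found inside <h3 class="post-title ..."> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []          # finished titles, in document order
        self._in_title = False    # True while inside a post-title <h3>
        self._chunks = []         # text fragments of the current title

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "post-title" in classes:
            self._in_title = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self.titles.append("".join(self._chunks).strip())
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)


def extract_titles(html):
    """Return the post titles found in an HTML string (assumed Blogger layout)."""
    parser = PostTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hand-written fragment that mimics the assumed markup (hypothetical URLs).
sample = """
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/a.html'>Why operators are useful</a>
</h3>
<h3>Why not X|Y|Z?</h3>
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/b.html'>Union syntax</a>
</h3>
"""
print(extract_titles(sample))
```

The plain `<h3>` in the sample is skipped on purpose: the blog also uses `<h3>` for subheadings inside posts, so filtering on the class is what separates post titles from section headings. The same function could be called on the downloaded string `s` once it is available.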

In [1]:
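import requests

# Front page of Guido van Rossum's blog
u = "http://neopythonic.blogspot.com/"
# Download the page and decode the raw response bytes as UTF-8
s = requests.get(u).content.decode('utf-8')
s  # display the raw HTML string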
Out[1]:
'<!DOCTYPE html>\n<html dir=\'ltr\' xmlns=\'http://www.w3.org/1999/xhtml\' xmlns:b=\'http://www.google.com/2005/gml/b\' xmlns:data=\'http://www.google.com/2005/gml/data\' xmlns:expr=\'http://www.google.com/2005/gml/expr\'>\n<head>\n<link href=\'https://www.blogger.com/static/v1/widgets/2549344219-widget_css_bundle.css\' rel=\'stylesheet\' type=\'text/css\'/>\n<link href=\'http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css\' rel=\'stylesheet\' type=\'text/css\'/>\n<script src=\'http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js\' type=\'text/javascript\'></script>\n<meta content=\'text/html; charset=UTF-8\' http-equiv=\'Content-Type\'/>\n<meta content=\'blogger\' name=\'generator\'/>\n<link href=\'http://neopythonic.blogspot.com/favicon.ico\' rel=\'icon\' type=\'image/x-icon\'/>\n<link href=\'http://neopythonic.blogspot.com/\' rel=\'canonical\'/>\n<link rel="alternate" type="application/atom+xml" title="Neopythonic - Atom" href="http://neopythonic.blogspot.com/feeds/posts/default" />\n<link rel="alternate" type="application/rss+xml" title="Neopythonic - RSS" href="http://neopythonic.blogspot.com/feeds/posts/default?alt=rss" />\n<link rel="service.post" type="application/atom+xml" title="Neopythonic - Atom" href="https://www.blogger.com/feeds/4195135246107166251/posts/default" />\n<link rel="me" href="https://www.blogger.com/profile/12821714508588242516" />\n<!--Can\'t find substitution for tag [blog.ieCssRetrofitLinks]-->\n<meta content=\'http://neopythonic.blogspot.com/\' property=\'og:url\'/>\n<meta content=\'Neopythonic\' property=\'og:title\'/>\n<meta content=\'Ramblings through technology, politics, culture and philosophy by the creator of the Python programming language.\' property=\'og:description\'/>\n<!--[if IE]> <script> (function() { var html5 = ("abbr,article,aside,audio,canvas,datalist,details," + "figure,footer,header,hgroup,mark,menu,meter,nav,output," + "progress,section,time,video").split(\',\'); for (var i = 0; 
i < html5.length; i++) { document.createElement(html5[i]); } try { document.execCommand(\'BackgroundImageCache\', false, true); } catch(e) {} })(); </script> <![endif]-->\n<title>Neopythonic</title>\n<style id=\'page-skin-1\' type=\'text/css\'><!--\n/*\n-----------------------------------------------\nBlogger Template Style\nName:     Dots\nDate:     24 Feb 2004\nUpdated by: Blogger Team\n----------------------------------------------- */\n\nbody {\nmargin: 0px 0px 0px 0px;\nbackground:#fff url("https://resources.blogblog.com/blogblog/data/dots/bg_dots.gif");\nbackground-position: 50% 31px;\ntext-align:center;\nfont:x-small Verdana, Arial, Sans-serif;\ncolor:#333333;\nfont-size/* */:/**/small;\nfont-size: /**/small;\n}\n/* Page Structure\n----------------------------------------------- */\n#outer-wrapper {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_3dots.gif") no-repeat 250px 50px;\nwidth:700px;\nmargin:0 auto;\ntext-align:left;\nfont:normal normal 100% Verdana,Arial,Sans-Serif;\n}\n#header-wrapper {\ndisplay: none;\n}\n#main-wrapper {\nwidth:450px;\nfloat:right;\npadding:100px 0 20px;\nfont-size:85%;\nword-wrap: break-word; /* fix for long text breaking sidebar float in IE */\noverflow: hidden;     /* fix for long non-text content breaking IE sidebar float */\n}\n#main {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_dots2.gif") -100px -100px;\npadding:20px 10px 15px;\n}\n#sidebar-wrapper {\nwidth:200px;\nfloat:left;\nfont-size:85%;\npadding-bottom:20px;\nword-wrap: break-word; /* fix for long text breaking sidebar float in IE */\noverflow: hidden;     /* fix for long non-text content breaking IE sidebar float */\n}\n#sidebar {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_dots2.gif") 150px -50px;\npadding:40px 10px 15px;\nwidth:200px;\nwidth/* */:/**/180px;\nwidth: /**/180px;\n}\n/* Title & Description\n----------------------------------------------- */\n.Header h1 {\nmargin:0 0 
.5em;\nline-height: 1.4em;\nfont: normal normal 250% Georgia,Serif;\ncolor: #335533;\n}\n.Header h1 a {\ncolor:#335533;\ntext-decoration:none;\n}\n.Header .description {\nmargin:0 0 1.75em;\ncolor: #999966;\nfont: normal normal 100% Verdana, Arial, Sans-Serif;\n}\n/* Links\n----------------------------------------------- */\na:link {\ncolor:#448888;\n}\na:visited {\ncolor:#888855;\n}\na:hover {\ncolor:#888855;\n}\na img {\nborder-width:0;\n}\n/* Posts\n----------------------------------------------- */\nh2.date-header {\nmargin:0 0 .75em;\npadding-bottom:.35em;\nborder-bottom:1px dotted #99bb99;\ntext-transform:uppercase;\nletter-spacing:.3em;\ncolor: #666633;\nfont: normal normal 95% Georgia, Serif;\n}\n.post {\nmargin:0 0 2.5em;\n}\n.post h3 {\nmargin:.25em 0;\nline-height: 1.4em;\nfont: normal normal 100% Georgia,Serif;\nfont-size: 130%;\nfont-weight: bold;\ncolor:#999966;\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_post_title_left.gif") no-repeat left .25em;\npadding:0 0 1px 45px;\n}\n.post h3 a {\ntext-decoration:none;\ncolor: #999966;\n}\n.post h3 a:hover {\ncolor: #333333;\n}\n.post-body {\nmargin:0 0 .75em;\nline-height:1.6em;\n}\n.post-body blockquote {\nline-height:1.3em;\n}\n.post-footer {\nmargin:0;\n}\n.uncustomized-post-template .post-footer {\ntext-align: right;\n}\n.uncustomized-post-template .post-author,\n.uncustomized-post-template .post-timestamp {\ndisplay: block;\nfloat: left;\nmargin-right: 4px;\ntext-align: left;\n}\n.post-author, .post-timestamp {\ncolor:#999966;\n}\na.comment-link {\n/* IE5.0/Win doesn\'t apply padding to inline elements,\nso we hide these two declarations from it */\nbackground/* */:/**/url("https://resources.blogblog.com/blogblog/data/dots/icon_comment_left.gif") no-repeat left .25em;\npadding-left:15px;\n}\nhtml>body a.comment-link {\n/* Respecified, for IE5/Mac\'s benefit */\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/icon_comment_left.gif") no-repeat left 
.25em;\npadding-left:15px;\n}\n.post img, table.tr-caption-container {\nmargin:0 0 5px 0;\npadding:4px;\nborder:1px solid #99bb99;\n}\n.tr-caption-container img {\nborder: none;\nmargin: 0;\npadding: 0;\n}\n.feed-links {\nclear: both;\nline-height: 2.5em;\n}\n#blog-pager-newer-link {\nfloat: left;\n}\n#blog-pager-older-link {\nfloat: right;\n}\n#blog-pager {\ntext-align: center;\n}\n/* Comments\n----------------------------------------------- */\n#comments {\nmargin:0;\n}\n#comments h4 {\nmargin:0 0 10px;\nborder-top:1px dotted #99bb99;\npadding-top:.5em;\nline-height: 1.4em;\nfont: bold 110% Georgia,Serif;\ncolor:#333;\n}\n#comments-block {\nline-height:1.6em;\n}\n.comment-author {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/icon_comment_left.gif") no-repeat 2px .35em;\nmargin:.5em 0 0;\npadding-top: 0;\npadding-bottom:0;\npadding-left:20px;\npadding-right:20px;\nfont-weight:bold;\n}\n.comment-body {\nmargin:0;\npadding-top: 0;\npadding-bottom:0;\npadding-left:20px;\npadding-right:20px;\n}\n.comment-body p {\nmargin:0 0 .5em;\n}\n.comment-footer {\nmargin:0 0 .5em;\npadding:0 0 .75em 20px;\npadding-top: 0;\npadding-bottom:.75em;\npadding-left:20px;\npadding-right:0;\ncolor:#996;\n}\n.comment-footer a:link {\ncolor:#996;\n}\n.deleted-comment {\nfont-style:italic;\ncolor:gray;\n}\n/* More Sidebar Content\n----------------------------------------------- */\n.sidebar h2 {\nmargin:2em 0 .75em;\npadding-bottom:.35em;\nborder-bottom:1px dotted #99bb99;\nline-height: 1.4em;\nfont-size: 95%;\nfont: normal normal 100% Georgia,Serif;\ntext-transform:uppercase;\nletter-spacing:.3em;\ncolor:#666633;\n}\n.sidebar p {\nmargin:0 0 .75em;\nline-height:1.6em;\n}\n.sidebar ul {\nlist-style:none;\nmargin:.5em 0;\npadding:0 0px;\n}\n.sidebar .widget {\nmargin: .5em 0 1em;\npadding: 0 0px;\nline-height: 1.5em;\n}\n.main .widget {\npadding-bottom: 1em;\n}\n.sidebar ul li {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bullet.gif") no-repeat 
3px .45em;\nmargin:0;\npadding-top: 0;\npadding-bottom:5px;\npadding-left:15px;\npadding-right:0;\n}\n.sidebar p {\nmargin:0 0 .6em;\n}\n/* Profile\n----------------------------------------------- */\n.profile-datablock {\nmargin: 0 0 1em;\n}\n.profile-img {\nfloat: left;\nmargin-top: 0;\nmargin-bottom:5px;\nmargin-left:0;\nmargin-right:8px;\nborder: 4px solid #cc9;\n}\n.profile-data {\nmargin: 0;\nline-height: 1.5em;\n}\n.profile-textblock {\nclear: left;\nmargin-left: 0;\n}\n/* Footer\n----------------------------------------------- */\n#footer {\nclear:both;\npadding:15px 0 0;\n}\n#footer p {\nmargin:0;\n}\n/* Page structure tweaks for layout editor wireframe */\nbody#layout #sidebar, body#layout #main,\nbody#layout #main-wrapper,\nbody#layout #outer-wrapper,\nbody#layout #sidebar-wrapper {\npadding: 0;\n}\nbody#layout #sidebar, body#layout #sidebar-wrapper {\npadding: 0;\nwidth: 240px;\n}\n\n--></style>\n<link href=\'https://www.blogger.com/dyn-css/authorization.css?targetBlogID=4195135246107166251&amp;zx=3404a461-9e33-4ad1-8b2b-080af63b1f5a\' media=\'none\' onload=\'if(media!=&#39;all&#39;)media=&#39;all&#39;\' rel=\'stylesheet\'/><noscript><link href=\'https://www.blogger.com/dyn-css/authorization.css?targetBlogID=4195135246107166251&amp;zx=3404a461-9e33-4ad1-8b2b-080af63b1f5a\' rel=\'stylesheet\'/></noscript>\n\n</head>\n<body onload=\'prettyPrint()\'>\n<div class=\'navbar section\' id=\'navbar\'><div class=\'widget Navbar\' data-version=\'1\' id=\'Navbar1\'><script type="text/javascript">\n    function setAttributeOnload(object, attribute, val) {\n      if(window.addEventListener) {\n        window.addEventListener(\'load\',\n          function(){ object[attribute] = val; }, false);\n      } else {\n        window.attachEvent(\'onload\', function(){ object[attribute] = val; });\n      }\n    }\n  </script>\n<div id="navbar-iframe-container"></div>\n<script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>\n<script 
type="text/javascript">\n      gapi.load("gapi.iframes:gapi.iframes.style.bubble", function() {\n        if (gapi.iframes && gapi.iframes.getContext) {\n          gapi.iframes.getContext().openChild({\n              url: \'https://www.blogger.com/navbar.g?targetBlogID\\x3d4195135246107166251\\x26blogName\\x3dNeopythonic\\x26publishMode\\x3dPUBLISH_MODE_BLOGSPOT\\x26navbarType\\x3dBLUE\\x26layoutType\\x3dLAYOUTS\\x26searchRoot\\x3dhttps://neopythonic.blogspot.com/search\\x26blogLocale\\x3den\\x26v\\x3d2\\x26homepageUrl\\x3dhttp://neopythonic.blogspot.com/\\x26vt\\x3d5369531368964104765\',\n              where: document.getElementById("navbar-iframe-container"),\n              id: "navbar-iframe"\n          });\n        }\n      });\n    </script><script type="text/javascript">\n(function() {\nvar script = document.createElement(\'script\');\nscript.type = \'text/javascript\';\nscript.src = \'//pagead2.googlesyndication.com/pagead/js/google_top_exp.js\';\nvar head = document.getElementsByTagName(\'head\')[0];\nif (head) {\nhead.appendChild(script);\n}})();\n</script>\n</div></div>\n<div id=\'outer-wrapper\'><div id=\'wrap2\'>\n<!-- skip links for text browsers -->\n<span id=\'skiplinks\' style=\'display:none;\'>\n<a href=\'#main\'>skip to main </a> |\n      <a href=\'#sidebar\'>skip to sidebar</a>\n</span>\n<div id=\'content-wrapper\'>\n<div id=\'crosscol-wrapper\' style=\'text-align:center\'>\n<div class=\'crosscol no-items section\' id=\'crosscol\'></div>\n</div>\n<div id=\'main-wrapper\'>\n<div class=\'main section\' id=\'main\'><div class=\'widget Blog\' data-version=\'1\' id=\'Blog1\'>\n<div class=\'blog-posts hfeed\'>\n\n          <div class="date-outer">\n        \n<h2 class=\'date-header\'><span>Friday, March 15, 2019</span></h2>\n\n          <div class="date-posts">\n        \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta 
content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'775339472173253922\' itemprop=\'postId\'/>\n<a name=\'775339472173253922\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\'>Why operators are useful</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-775339472173253922\' itemprop=\'description articleBody\'>\nThis is something I posted on python-ideas, but I think it\'s interesting to a wider audience.<br />\n<br />\nThere\'s been a lot of discussion recently about an operator to merge two dicts.<br />\n<br />\nIt prompted me to think about the reason (some) people like operators, and a discussion I had with my mentor Lambert Meertens over 30 years ago came to mind.<br />\n<br />\nFor mathematicians, operators are essential to how they think. Take a simple operation like adding two numbers, and try exploring some of its behavior.<br />\n<br />\n&nbsp;&nbsp;&nbsp; add(x, y) == add(y, x)&nbsp;&nbsp;&nbsp; (1)<br />\n<br />\nEquation (1) expresses the law that addition is commutative. 
It\'s usually written using an operator, which makes it more concise:<br />\n<br />\n&nbsp;&nbsp;&nbsp; x + y == y + x&nbsp;&nbsp;&nbsp; (1a)<br />\n<br />\nThat feels like a minor gain.<br />\n<br />\nNow consider the associative law:<br />\n<br />\n&nbsp;&nbsp;&nbsp; add(x, add(y, z)) == add(add(x, y), z)&nbsp;&nbsp;&nbsp; (2)<br />\n<br />\nEquation (2) can be rewritten using operators:<br />\n<br />\n&nbsp;&nbsp;&nbsp; x + (y + z) == (x + y) + z&nbsp;&nbsp;&nbsp; (2a)<br />\n<br />\nThis is much less confusing than (2), and leads to the observation that the parentheses are redundant, so now we can write<br />\n<br />\n&nbsp;&nbsp;&nbsp; x + y + z&nbsp;&nbsp;&nbsp; (3)<br />\n<br />\nwithout ambiguity (it doesn\'t matter whether the + operator binds tighter to the left or to the right).<br />\n<br />\nMany other laws are also written more easily using operators.&nbsp; Here\'s one more example, about the identity element of addition:<br />\n<br />\n&nbsp;&nbsp;&nbsp; add(x, 0) == add(0, x) == x&nbsp;&nbsp;&nbsp; (4)<br />\n<br />\ncompare to<br />\n<br />\n&nbsp;&nbsp;&nbsp; x + 0 == 0 + x == x&nbsp;&nbsp;&nbsp; (4a)<br />\n<br />\nThe general idea here is that once you\'ve learned this simple notation, equations written using them are easier to *manipulate* than equations written using functional notation -- it is as if our brains grasp the operators using different brain machinery, and this is more efficient.<br />\n<br />\nI think that the fact that formulas written using operators are more easily processed *visually* has something to do with it: they engage the brain\'s visual processing machinery, which operates largely subconsciously, and tells the conscious part what it sees (e.g. "chair" rather than "pieces of wood joined together"). 
The functional notation must take a different path through our brain, which is less subconscious (it\'s related to reading and understanding what you read, which is learned/trained at a much later age than visual processing).<br />\n<br />\nThe power of visual processing really becomes apparent when you combine multiple operators. For example, consider the distributive law:<br />\n<br />\n&nbsp;&nbsp;&nbsp; mul(n, add(x, y)) == add(mul(n, x), mul(n, y))&nbsp; (5)<br />\n<br />\nThat was painful to write, and I believe that at first you won\'t see the pattern (or at least you wouldn\'t have immediately seen it if I hadn\'t mentioned this was the distributive law).<br />\n<br />\nCompare to:<br />\n<br />\n&nbsp;&nbsp;&nbsp; n * (x + y) == n * x + n * y&nbsp;&nbsp;&nbsp; (5a)<br />\n<br />\nNotice how this also uses relative operator priorities. Often mathematicians write this even more compact:<br />\n<br />\n&nbsp;&nbsp;&nbsp; n(x+y) == nx + ny&nbsp;&nbsp;&nbsp; (5b)<br />\n<br />\nbut alas, that currently goes beyond the capacities of Python\'s parser.<br />\n<br />\nAnother very powerful aspect of operator notation is that it is convenient to apply them to objects of different types. For example, laws (1) through (5) also work when x, y and z are same-size vectors and n is a scalar (substituting a vector of zeros for the literal "0"), and also if they are matrices (again, n has to be a scalar).<br />\n<br />\nAnd you can do this with objects in many different domains. 
For example, the above laws (1) through (5) apply to functions too (n being a scalar again).<br />\n<br />\nBy choosing the operators wisely, mathematicians can employ their visual brain to help them do math better: they\'ll discover new interesting laws sooner because sometimes the symbols on the blackboard just jump at you and suggest a path to an elusive proof.<br />\n<br />\nNow, programming isn\'t exactly the same activity as math, but we all know that Readability Counts, and this is where operator overloading in Python comes in. Once you\'ve internalized the simple properties which operators tend to have, using + for string or list concatenation becomes more readable than a pure OO notation, and (2) and (3) above explain (in part) why that is.<br />\n<br />\nOf course, it\'s definitely possible to overdo this -- then you get Perl. But I think that the folks who point out "there is already a way to do this" are missing the point that it really is easier to grasp the meaning of this:<br />\n<br />\n&nbsp;&nbsp;&nbsp; d = d1 + d2<br />\n<br />\ncompared to this:<br />\n<br />\n&nbsp;&nbsp;&nbsp; d = d1.copy()<br />\n&nbsp;&nbsp;&nbsp; d.update(d2)&nbsp;&nbsp;&nbsp; # CORRECTED: This line was previously wrong<br />\n<br />\nand it is not just a matter of fewer lines of code: the first form allows us to use our visual processing to help us see the meaning quicker -- and without distracting other parts of our brain (which might already be occupied by keeping track of the meaning of d1 and d2, for example).<br />\n<br />\nOf course, everything comes at a price. You have to learn the operators, and you have to learn their properties when applied to different object types. (This is true in math too -- for numbers, x*y == y*x, but this property does not apply to functions or matrices; OTOH x+y == y+x applies to all, as does the associative law.)<br />\n<br />\n"But what about performance?" I hear you ask. Good question. IMO, readability comes first, performance second. 
And in the basic example (d = d1 + d2) there is no performance loss compared to the two-line version using update, and a clear win in readability. I can think of many situations where performance difference is irrelevant but readability is of utmost importance, and for me this is the default assumption (even at Dropbox -- our most performance critical code has already been rewritten in ugly Python or in Go). For the few cases where performance concerns are paramount, it\'s easy to transform the operator version to something else -- *once you\'ve confirmed it\'s needed* (probably by profiling).\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2019-03-15T10:58:00-07:00\'>10:58 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=775339472173253922\' onclick=\'\'>\nNo comments:\n    </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a 
href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=775339472173253922&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n          </div></div>\n        \n\n          <div class="date-outer">\n        \n<h2 class=\'date-header\'><span>Monday, November 26, 2018</span></h2>\n\n          <div class="date-posts">\n        \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'2471146972433715807\' itemprop=\'postId\'/>\n<a name=\'2471146972433715807\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html\'>What to do with your computer science career</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-2471146972433715807\' itemprop=\'description articleBody\'>\nI regularly receive questions from students in the field of computer science looking for career advice.<br />\n<br />\nHere\'s an answer I wrote to one of them. It\'s not comprehensive or anything, but I thought people might find it interesting.<br />\n<br />\n[A question about whether to choose a 9-5 job or be an entrepreneur]<br />\n<br />\nThe question about "9-5" vs. 
"entrepreneur" is a complex one -- not everybody can be a successful entrepreneur (who would do the work? :-) and not everybody has the temperament for it. For me personally it was never an option -- there are vast parts of management and entrepreneurship that I wouldn\'t enjoy doing, such as hiring (I hate interviewing and am bad at it) and firing (too emotionally draining -- even just giving negative feedback is hard for me). Pitching ideas to investors is another thing that I\'d rather do without.<br />\n<br />\nIf any of that resonates with you, you may be better off not opting for entrepreneurship -- the kind of 9-5 software development jobs I have had are actually (mostly) very rewarding: I get to write software that gets used by hundreds or thousands of other developers (or millions in the case of Python), and those other developers in turn use my software to produce product that get uses by hundreds of thousands or, indeed hundreds of millions of users. Not every 9-5 job is the same! For me personally, I don\'t like the product stuff (since usually that means it\'s products I have no interest in using myself), but "your mileage may vary" (as they say in the US). Just try to do better than an entry-level web development job;&nbsp; that particular field (editing HTML and CSS) is likely to be automated away, and would feel repetitive to me.<br />\n<br />\n[A question about whether AI would make human software developers redundant (not about what I think of the field of AI as a career choice)]<br />\n<br />\nRegarding AI, I\'m not worried at all. The field is focused on automating boring, repetitive tasks like driving a car or recognizing faces, which humans can learn to do easily but find boring if they have to do it all the time. 
The field of software engineering (which includes the field of AI) is never boring, since as soon as a task is repetitive, you automate it, and you start solving new problems.\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2018-11-26T09:13:00-08:00\'>9:13 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=2471146972433715807\' onclick=\'\'>\nNo comments:\n    </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=2471146972433715807&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span 
class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n          </div></div>\n        \n\n          <div class="date-outer">\n        \n<h2 class=\'date-header\'><span>Saturday, July 23, 2016</span></h2>\n\n          <div class="date-posts">\n        \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'2468107226962512288\' itemprop=\'postId\'/>\n<a name=\'2468107226962512288\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html\'>About spammers and comments</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-2468107226962512288\' itemprop=\'description articleBody\'>\nI\'m turning off commenting for my blogs. While I\'ve enjoyed some feedback, the time wasted to moderate spam posts just isn\'t worth it. Thank you, spammers! 
:-(\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-07-23T14:11:00-07:00\'>2:11 PM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=2468107226962512288\' onclick=\'\'>\nNo comments:\n    </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=2468107226962512288&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n          
</div></div>\n        \n\n          <div class="date-outer">\n        \n<h2 class=\'date-header\'><span>Wednesday, May 18, 2016</span></h2>\n\n          <div class="date-posts">\n        \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'4387175608679924841\' itemprop=\'postId\'/>\n<a name=\'4387175608679924841\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/05/union-syntax.html\'>Union syntax</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-4387175608679924841\' itemprop=\'description articleBody\'>\n<h2>\nUnion syntax</h2>\n<blockquote class="tr_bq">\n<i>(I\'m trying to do this as a quick post in response to some questions I received on this topic. I realize this will probably reopen the whole discussion about the best syntax for types, but sorry folks, PEP 484 was accepted nearly a year ago, after many months of discussions and hundreds of messages. It\'s unlikely that any idea you can think of here would be new. This post just explains the rationale of one particular decision and tries to put it in some context.)</i></blockquote>\nI\'ve heard some grumbling about the union syntax in <a href="https://www.python.org/dev/peps/pep-0484/">PEP 484</a>: Union[X, Y, Z] (where X, Y and Z are arbitrary type expressions). In the past people have suggested X|Y|Z for this, or (X, Y, Z) or {X, Y, Z}. Why did we go with the admittedly clunkier Union[X, Y, Z]?<br />\n<br />\nFirst of all, despite all the attention drawn to it, unions are actually a pretty minor feature, and you shouldn\'t be using them much. 
So you also shouldn\'t care that much.<br />\n<h3>\nWhy not X|Y|Z?</h3>\nThis won\'t fly because we want compatibility with versions of Python 3 that were already frozen (see below). We want to be able to express e.g. a union of int and str, which under this notation would be written as int|str. But for that to fly we\'d have to modify the builtin \'type\' class to implement __or__ -- and that wouldn\'t fly on already-frozen Python versions. Supporting X|Y only for types (like List) imported from the typing module and some other notation for builtin types would only sow confusion. So X|Y|Z is out.<br />\n<h3>\nWhy not {X, Y, Z}?</h3>\nThat\'s the set with elements X, Y and Z, using the builtin set notation. We can usefully consider types to be sets of values, and this makes a union a set of values too (that\'s why it\'s called union :-).<br />\n<br />\nHowever, {X, Y, Z} confuses the set of <i>types</i> with the set of <i>values</i>, which I consider a mortal sin. This would just cause endless confusion.<br />\n<br />\nThis notation would also confuse things when taking the union of several classes that overlap, e.g. if we have classes B and C, where C inherits from B, then the union of B and C is just B. But the builtin set doesn\'t see it that way. In contrast, the X|Y notation could actually solve this (since in principle we could overload __or__ to do whatever we want), and the Union[] operator ("functor"?) from PEP 484 indeed solves this -- in this example Union[B, C] returns the (non-union) type B, both in the type checker and at runtime.<br />\n<h3>\nWhy not (X, Y, Z)?</h3>\nThat\'s the tuple (X, Y, Z). It has the same disadvantages as {X, Y, Z}, but at least it has the advantage of being similar to how unions are expressed as arguments to isinstance(), for example isinstance(x, (int, str, list)) or isinstance(x, (Sequence, Mapping)). (Similarly the except clause: try: ... 
/ except (KeyError, IndexError): ...)<br />\n<br />\nAnother problem with tuples is that the tuple syntax is already overloaded in so many ways that it would be confused with other uses even more easily. One particular confusion would be other generic types, for which we\'d still want to use square brackets. (You can\'t really beat Iterable[int] for clarity if you have an iterable of integers. :-) Suppose you have a sequence of values that could be integers or strings. In PEP 484 notation we write this as Sequence[Union[int, str]]. Using the tuple notation we\'d want to write this as Sequence[(int, str)]. But it turns out that the __getitem__ overload on the metaclass can\'t tell the difference between Sequence[(int, str)] and Sequence[int, str] -- and we would like to reject the latter as a mistake since Sequence[] is a generic class over a single parameter. (An example of a generic class over two parameters would be Mapping[K, V].) Disambiguating all this would place us on very thin ice indeed.<br />\n<br />\nThe nail in this idea\'s coffin is the competing idea of using (X, Y, Z) to indicate a tuple with three items, with respective types, X, Y and Z. At first sight this seems an even better use of the tuple syntax than unions would be, and tuples are way more common than unions. But it runs afoul of the same problems with Foo[(X, Y)] vs. Foo[X, Y]. (Also, there would be no easy way to describe what PEP 484 calls Tuple[X, ...], i.e. a variable-length tuple with uniform item type X.)<br />\n<h3>\nPS. Why support old Python 3 versions?</h3>\nThe reason for supporting older versions is adoption. Only a relatively small crowd of early adopters can upgrade to the latest Python version as soon as it\'s out; the rest of us are stuck on older versions (even Python 2.7!). 
<br />\n<br />\nSo for PEP 484 and the typing module, we wanted to support 3.2 and up -- we chose 3.2 because it\'s the newest Python 3 supported by some older but still popular Ubuntu and Debian distributions. (Also, 3.0 and 3.1 were too immature at their time of release to ever have a large following.)<br />\n<br />\nThere\'s a typing package that you can install easily using pip, and this defines all sorts of useful things for typing, from Any and Union to generic versions of List and Sequence. But such a package can\'t modify existing builtins like int or list.<br />\n<br />\n(Eventually we also added Python 2.7 support, using type comments for function signatures.)\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/05/union-syntax.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/05/union-syntax.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-05-18T11:55:00-07:00\'>11:55 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=4387175608679924841\' onclick=\'\'>\nNo comments:\n    </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span 
class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=4387175608679924841&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'8854185106045973213\' itemprop=\'postId\'/>\n<a name=\'8854185106045973213\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html\'>Adding type annotations for fspath</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-8854185106045973213\' itemprop=\'description articleBody\'>\n<div>\n<h1 class="ace-copy-paste-skip-this-tag">\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Type annotations for fspath</span></h1>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Python 3.6 will have a new </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" 
data-target-href="http://www.pixelmonkey.org/2013/04/11/python-double-under-double-wonder" href="http://www.pixelmonkey.org/2013/04/11/python-double-under-double-wonder" rel="noreferrer nofollow" target="_blank">dunder protocol</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which should be supported by classes that represent filesystem paths. Example of such classes are the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">pathlib.Path</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> family and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;(returned by </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">You can read more about this protocol in the brand new </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0519/" href="https://www.python.org/dev/peps/pep-0519/" rel="noreferrer nofollow" target="_blank">PEP 519</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">. In this blog post I&#8217;m going to discuss how we would add type annotations for these additions to the standard library.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">I&#8217;m making frequent use of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , a quite magical type variable predefined in the typing module. 
If you&#8217;re not familiar with it, I recommend reading my </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" rel="noreferrer nofollow" target="_blank">blog post about </a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" rel="noreferrer nofollow" target="_blank">AnyStr</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
You may also want to read up on </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0484/#generics" href="https://www.python.org/dev/peps/pep-0484/#generics" rel="noreferrer nofollow" target="_blank">generics in PEP 484</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> (or read </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy.readthedocs.io/en/latest/generics.html" href="http://mypy.readthedocs.io/en/latest/generics.html" rel="noreferrer nofollow" target="_blank">mypy&#8217;s docs on the subject</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">).</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Adding os.scandir() to the stubs for os.py</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">For practice, let&#8217;s see if we can add something to the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" rel="noreferrer nofollow" target="_blank">stub file for os.py</a></span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">. As of this writing there&#8217;s no </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed" href="https://github.com/python/typeshed" rel="noreferrer nofollow" target="_blank">typeshed</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> information for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="https://docs.python.org/3/library/os.html" href="https://docs.python.org/3/library/os.html" rel="noreferrer nofollow" target="_blank">os.scandir()</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which I think is a shame. I think the following will do nicely. Note how we only define </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> for Python versions &gt;= 3.5. 
(Mypy </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/mypy/issues/698" href="https://github.com/python/mypy/issues/698" rel="noreferrer nofollow" target="_blank">doesn&#8217;t support this yet</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, but it will soon, and the example here still works &#8212; it just doesn&#8217;t realize </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;is only available in Python 3.5.) This could be added to the end of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">stdlib/3/os/__init__.pyi</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import Generic, AnyStr, overload, Iterator</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">if sys.version_info </span></code><code 
class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span>= (3, 5):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; class DirEntry(</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-42889384956">Generic</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">[AnyStr]):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; name = ... &nbsp;# type: AnyStr</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; path = ... 
&nbsp;# type: AnyStr</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; def inode(self) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> int: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; def is_dir(self, *, follow_symlinks: bool = ...) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> bool: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; def is_file(self, *, follow_symlinks: bool = ...) 
-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> bool: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; def is_symlink(self) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> bool: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; def stat(self, *, follow_symlinks: bool = ...) 
-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> stat_result: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">&nbsp; &nbsp; @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">&nbsp; &nbsp; def scandir() -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">&nbsp; &nbsp; @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">&nbsp; &nbsp; def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-21653656371 thread-53340393283">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Deconstructing this a bit, we see a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0484/#generics" href="https://www.python.org/dev/peps/pep-0484/#generics" rel="noreferrer nofollow" target="_blank">generic class</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> (that&#8217;s what the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Generic[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;base 
class means) and an overloaded function. &nbsp;The </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> definition uses </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">@overload</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> because it can also be called without arguments. </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">We could also write it as follows; it&#8217;ll work either way:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">&nbsp; &nbsp; @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">&nbsp; &nbsp; def scandir(</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-72143937476">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> = ...) 
-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-86865838424">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Either way 
there really are three ways to call </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class=""> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, all three returning an iterable of DirEntry objects:</span></div>\n<div>\n<br /></div>\n<ul class="listtype-bullet listindent1 list-bullet1">\n<li><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir() -</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[str]]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp;</span></li>\n<li><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir(str) -</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[str]]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp;</span></li>\n<li><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir(bytes) -</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[bytes]]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp;</span></li>\n</ul>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Adding os.fspath()</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Next I&#8217;ll show how to add </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.fspath()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> and how to add support for the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;protocol to </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" 
data-target-href="https://www.python.org/dev/peps/pep-0519/" href="https://www.python.org/dev/peps/pep-0519/" rel="noreferrer nofollow" target="_blank">PEP 519</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> defines a simple ABC (</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://docs.python.org/3/library/abc.html" href="https://docs.python.org/3/library/abc.html" rel="noreferrer nofollow" target="_blank">abstract base class</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">), </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , with one method, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
We need to add this to the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" rel="noreferrer nofollow" target="_blank">stub for </a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" rel="noreferrer nofollow" target="_blank">os.py</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , as follows:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">class PathLike(</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-84113787329">Generic[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; @abstractmethod</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; def __fspath__(self) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> AnyStr: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">That&#8217;s really all there is to it (except for the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">sys.version_info</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;check, which I&#8217;ll leave out here since it doesn&#8217;t really work yet). Next we define </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.fspath()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which wraps this protocol. It&#8217;s slightly more complicated than just calling its argument&#8217;s </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;method, because it also handles strings and bytes. 
So here it is:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def fspath(path: PathLike[AnyStr]) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> AnyStr: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def fspath(path: AnyStr) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> AnyStr: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Easy enough! Next is to update the definition of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . That&#8217;s easy too &#8212; in fact we only need to make it inherit from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , the rest is the same as the definition I gave above:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">class DirEntry(PathLike[AnyStr], Generic[AnyStr]):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; # Everything else unchanged!</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The only slightly complicated bit here is the extra base class </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Generic[AnyStr]</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . This seems redundant, and in fact PEP 484 says we can leave it off, but mypy doesn&#8217;t support that yet, and it&#8217;s quite harmless &#8212; this just rubs into mypy&#8217;s face that this is a generic class of one type variable (the by-now famous </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Finally we need to make a similar change to the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" rel="noreferrer nofollow" target="_blank">stub for </a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" rel="noreferrer nofollow" target="_blank">pathlib.py</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
Again, all we need to do is to make </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PurePath</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;inherit from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , like so:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from os import PathLike</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">class PurePath(PathLike[str]):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp;</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-86168840759"> # Everything else unchanged!</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">However, here we don&#8217;t add 
</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Generic</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , because this is not a generic class! It inherits from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which is quite un-generic, since it&#8217;s </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z i"><i>specialized</i></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> for just </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Note that we don&#8217;t actually have to define the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z 
inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;method in these stubs &#8212; we&#8217;re not supposed to call them directly, and stubs don&#8217;t provide implementations, only interfaces.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Putting it all together, we see that it&#8217;s quite elegant:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">for a in os.scandir(\'.\'):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; b = os.fspath(a)</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; # Here, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-459852399">the</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> typechecker will know that the type of b is str!</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The derivation that </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> has type </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;is not too complicated: first, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.scandir(\'.\')</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;has a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;argument, so it returns an iterator of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;objects parameterized with </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which we write as </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . Passing this </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;to </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.fspath()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;then takes the first of that function&#8217;s two overloads (the one with </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ), since it doesn&#8217;t match the second one ( </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;doesn&#8217;t inherit from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , because 
it&#8217;s neither a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;nor </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ). Further the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> type variable in </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is solved to stand for just </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , because </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[str]</span><span class=""> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp;inherits from </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . This is the specialized version of what the code says: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;inherits from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Okay, so maybe that last paragraph was intermediate or advanced. And maybe it could be expanded. Maybe I&#8217;ll write another blog about how type inference works, but there&#8217;s a lot on that topic, and other authors have probably already written better introductory material about generics (in other languages, though).</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Making things accept PathLike</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">There&#8217;s a bit of cleanup work that I&#8217;ve left out. 
PEP 519 says that many stdlib functions that currently take strings for pathnames will be modified to also accept </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . For example, here&#8217;s how the signatures for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;would change:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">def scandir() -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-21653656371">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def scandir(path: PathLike[AnyStr]) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The first two entries are unchanged; I&#8217;ve just added a third overload. (Note that the alternative way of defining </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> would require more changes &#8212; an indication that this way is more natural.)</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">I also tried doing this with a union:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def scandir() </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-21653656371">Union[AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">, PathLike[AnyStr]]) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;</span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">But I couldn&#8217;t get this to work, so the extra overload is probably the best we can do. 
Quite a few functions will require a similar treatment, sometimes introducing overloading where none exists today (but that shouldn&#8217;t hurt anything).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">A note about </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">pathlib</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> : since it only deals with strings, its methods (the ones that PEP 519 says should be changed anyway) should use </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;rather than </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Acknowledgments</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">(Thanks for comments on the draft to Stephen Turnbull, Koos Zevenhoven, Eth</span><span class="author-d-4z65zz66zl57z75zyiz66zfr2fz87zwz89znuiz90zz78zoz72zz87zhgh7z71zz88zz77zfz66zquz87zq3xz82zcz82zq5caz88z9">a</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">n Furman, and Brett Cannon.)</span></div>\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-05-18T07:06:00-07:00\'>7:06 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=8854185106045973213\' onclick=\'\'>\n3 comments:\n    </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=8854185106045973213&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span 
class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n          </div></div>\n        \n\n          <div class="date-outer">\n        \n<h2 class=\'date-header\'><span>Tuesday, May 17, 2016</span></h2>\n\n          <div class="date-posts">\n        \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'1468618515324597653\' itemprop=\'postId\'/>\n<a name=\'1468618515324597653\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html\'>The AnyStr type variable</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-1468618515324597653\' itemprop=\'description articleBody\'>\n<div>\n<h1 class="ace-copy-paste-skip-this-tag">\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The AnyStr type variable </span></h1>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">I was drafting a blog post on how to add type annotations for the new </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;protocol (</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" 
data-target-href="https://www.python.org/dev/peps/pep-0519/" href="https://www.python.org/dev/peps/pep-0519/" rel="noreferrer nofollow" target="_blank">PEP 519</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">) when I realized that I should write a separate post about </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . So here it is.</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">A simple function on strings</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Let&#8217;s write a function that surrounds a string in parentheses. 
We&#8217;ll put it in a file named </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">demo.py</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> :</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; return \'(\' + s + \')\'</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">It works, too:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;&gt;&gt; from demo import parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;&gt;&gt; print(parenthesize(\'hola\'))</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">(hola)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Of course, if you pass it something that&#8217;s not a string it will fail:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;&gt;&gt; parenthesize(42)</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Traceback (most recent call last):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; File "demo.py", line 1, in <module></module></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; File "demo.py", line 2, in parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">TypeError: Can\'t convert \'int\' object to str implicitly</span></code></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Adding type annotations</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Using 
</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0484/" href="https://www.python.org/dev/peps/pep-0484/" rel="noreferrer nofollow" target="_blank">PEP 484</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> type annotations we can clarify our little function&#8217;s signature:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: str) -&gt; str:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; return \'(\' + s + \')\'</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Nothing to it, right? Even if you&#8217;ve never heard of PEP 484 before you can guess what this means. (Note that PEP 484 also says that the runtime behavior is unchanged. 
The calls I showed above will still have exactly the same effect, including the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">TypeError</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> raised by </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize(42)</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .)</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Polymorphic functions</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Now suppose this is actually part of a networking app and we need to be able to parenthesize byte strings as well as text strings. 
Here&#8217;s how you&#8217;d implement that:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; return \'(\' + s + \')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; elif isinstance(s, bytes):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; return b\'(\' + s + b\')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; else:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; 
&nbsp; &nbsp; raise TypeError(f"That\'s not a string, it\'s a {type(s)}") &nbsp;# See </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0498/" href="https://www.python.org/dev/peps/pep-0498/" rel="noreferrer nofollow" target="_blank">PEP 498</a></span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">With a fancy word we call that a polymorphic function. How do you write a signature for such a function? For the answer we have to dive a little deeper into PEP 484. It defines a nifty operator named </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Union</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;that lets us state that a type can be either this or that (or something else). 
In our case, it&#8217;s either </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , so we can write it like this:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import Union</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: Union[str, bytes]) -&gt; Union[str, bytes]:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; # Etc.</span></code></div>\n<div>\n<br /></div>\n<div>\n<span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Now let&#8217;s write a little main program with a bug, to show off the type checker:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from demo import parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">a = parenthesize(\'hello\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">b = parenthesize(b\'hola\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">c = a + b &nbsp;### bug here<-- bug="" span=""></--></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">print(c)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">When we try to run this, the two </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;calls work fine (yay polymorphism!) but we get a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">TypeError</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> on the last line:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ python3 main.py&nbsp;</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Traceback (most recent call last):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; File "main.py", line 5, in <module></module></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; c = a + b &nbsp;### bug here<-- bug="" span=""></--></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">TypeError: Can\'t convert \'bytes\' object to str implicitly</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The reason should be pretty obvious: in Python 3 you can&#8217;t mix bytes and str objects. And when we type-check this program using </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy-lang.org/" href="http://mypy-lang.org/" rel="noreferrer nofollow" target="_blank">mypy</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> we indeed get a type error:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ mypy main.py&nbsp;</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">main.py:5: error: Unsupported operand types for + (likely involving Union)</span></code></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Debugging the bug</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">So let&#8217;s try a program without a bug:</span></div>\n<div>\n<br /></div>\n<div>\n<code 
class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from demo import parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">a = parenthesize(\'hello\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">b = parenthesize(\'hola\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">c = a + b &nbsp;### bug here<-- bug="" no="" span=""></--></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">print(c)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Run it and it works great:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ python3 main.py</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" 
spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">(hello)(hola)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">So the type checker should be happy too, right?</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ mypy main.py</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">main.py:5: error: Unsupported operand types for + (likely involving Union)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Whoops! The same error. What happened? 
Of course, I set you up, so I can explain something about type checking.</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The trouble with </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z s"><s>tribbles</s></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> unions</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The type checker takes the signature at face value, so that when checking the call, it infers the type </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Union[str, bytes]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;for every call to </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , regardless of what the arguments are. 
This is because, for most functions of even modest complexity, a type checker doesn&#8217;t understand enough about what&#8217;s going on in the function body, so it just has to believe the types in the signature (even though in this particular case it would probably be easy enough to do better).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">In our test program the types of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">a</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;are both inferred to be exactly what </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;claims to return, i.e., both variables have the type </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Union[str, bytes]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
The type checker then analyzes the expression </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">a + b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , and for this i</span><span class="author-d-z89zz72zz79zvhpz67zz83z9z66zz78zxz122z1xz74zu4z83z4myz73zkiz71zdz77zz71zz65zz79z4iz79ziosz75zz85zreqz69z">t</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> discovers a problem: if </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">a</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is either str or bytes, and so is </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , then the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">+</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;operator may be invoked on any of these combinations of types: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str + str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str + bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes + str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes + bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . But only the first and the last are valid! In Python 3, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str + bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes + str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;are invalid operations.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Aside: Even in Python 2, those two are suspect: since while </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">\'x\' + 
u\'y\'</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;indeed works (returning </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">u\'xy\'</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ), other combinations will raise UnicodeDecodeError, e.g.:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&gt;&gt;&gt;\'Franç\' + u\'ois\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Traceback (most recent call last):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; File "<stdin>", line 1, in <module></module></stdin></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">UnicodeDecodeError: \'ascii\' codec can\'t decode byte 0xc3 in position 4:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">ordinal not in 
range(128)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Anyway, the type checker doesn&#8217;t like this business, and it rejects operations on Unions where some combinations are invalid. What can we do instead?</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Function overloading</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">One option would be function overloading. PEP 484 defines a magical decorator, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">@overload</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which lets us get around this problem. 
We could write something like this:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: str) -&gt; str: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: bytes) -&gt; bytes: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">This tells the type checker that if the argument is a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , the return value is also a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , and similarly for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . Unfortunately </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">@overload</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;is only allowed in </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy.readthedocs.io/en/latest/basics.html#library-stubs-and-the-typeshed-repo" href="http://mypy.readthedocs.io/en/latest/basics.html#library-stubs-and-the-typeshed-repo" rel="noreferrer nofollow" target="_blank">stub files</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, which are a kind of interface definition files that show a type checker the signatures of a module&#8217;s contents without giving the implementation.</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Type 
variables</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Fortunately there&#8217;s an even better way, using type variables. This is how it goes:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import TypeVar</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">S = TypeVar(\'S\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: S) -&gt; S:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; return \'(\' + s + \')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; elif isinstance(s, bytes):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; return b\'(\' + s + b\')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; else:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("That\'s not a string, dude! It\'s a %s" % type(s))</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Well&#8230; Almost. 
Our main.py program (unchanged from above) now gets a clean bill of health, but when we type-check this version we get errors on both </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">return</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;lines:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">demo.py: note: In function "parenthesize":</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">demo.py:7: error: Incompatible return value type: expected S`-1, got builtins.str</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">demo.py:9: error: Incompatible return value type: expected S`-1, got builtins.bytes</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">This is a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/mypy/issues/1539" href="https://github.com/python/mypy/issues/1539" rel="noreferrer nofollow" target="_blank">bit hard to fathom</a></span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, but the fix is what I was leading up to anyway, so I&#8217;ll reveal it now:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import TypeVar</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">S = TypeVar(\'S\', str, bytes)</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: S) -&gt; S:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; return \'(\' + s + \')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; elif isinstance(s, bytes):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; return b\'(\' + s + b\')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; else:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("That\'s not a string, dude! 
It\'s a %s" % type(s))</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The only changed line is this one:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">S = TypeVar(\'S\', str, bytes)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">This notation is called a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy.readthedocs.io/en/latest/generics.html#type-variables-with-value-restriction" href="http://mypy.readthedocs.io/en/latest/generics.html#type-variables-with-value-restriction" rel="noreferrer nofollow" target="_blank">type variable with value restriction</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . Yes, it&#8217;s mouthful; we sometimes also call it a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z i"><i>constrained type variable</i></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">. 
</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">S</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is a type variable restricted to a set of types. It also has the advantage of telling the type checker that types other than </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;are not acceptable. 
Without that, a call like this would have been considered valid:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ini" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">x = parenthesize(42)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">because the original type variable (without the restrictions) doesn\'t tell mypy that this is a bad idea.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">In fact, this particular use case (a type variable constrained to str or bytes) is so commonly needed that it\'s predefined in the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">typing</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> module, and all we have to do is import it:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import AnyStr</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: AnyStr) -&gt; AnyStr:</span></code></div>\n<div>\n<code class="listtype-code 
listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">&nbsp; &nbsp; # Etc. -- trust me, it works!</span></code></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Real-world use of AnyStr</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">In fact, this is how many polymorphic functions in the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.path</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;modules are defined. 
For example, in the stub for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.py</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;we find definitions like </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi#L236" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi#L236" rel="noreferrer nofollow" target="_blank">the following</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def link(src: AnyStr, link_name: AnyStr) -&gt; None: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">and also </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/path.pyi#L57" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/path.pyi#L57" rel="noreferrer nofollow" target="_blank">this</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">:</span></div>\n<div>\n<br /></div>\n<div>\n<code 
class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def split(path: AnyStr) -&gt; Tuple[AnyStr, AnyStr]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">These show us a bit more of the power of type variables: the signature for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">link()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;indicates that either both arguments must be </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;or both must be </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ; </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">split()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;demonstrates that the type variable may also occur in more complex constructs: splitting a </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> returns a tuple of two </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> objects, while splitting </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> returns a tuple of two </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> &nbsp;objects.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">That&#8217;s all I wanted to share about </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
Thanks for comments on the draft to Stephen Turnbull, Koos Zevenhoven, Ethan Furman, and Brett Cannon.</span></div>\n<div>\n<br /></div>\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-05-17T09:53:00-07:00\'>9:53 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=1468618515324597653\' onclick=\'\'>\n5 comments:\n    </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=1468618515324597653&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div 
class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n        </div></div>\n      \n</div>\n<div class=\'blog-pager\' id=\'blog-pager\'>\n<span id=\'blog-pager-older-link\'>\n<a class=\'blog-pager-older-link\' href=\'http://neopythonic.blogspot.com/search?updated-max=2016-05-17T09:53:00-07:00&amp;max-results=7\' id=\'Blog1_blog-pager-older-link\' title=\'Older Posts\'>Older Posts</a>\n</span>\n<a class=\'home-link\' href=\'http://neopythonic.blogspot.com/\'>Home</a>\n</div>\n<div class=\'clear\'></div>\n<div class=\'blog-feeds\'>\n<div class=\'feed-links\'>\nSubscribe to:\n<a class=\'feed-link\' href=\'http://neopythonic.blogspot.com/feeds/posts/default\' target=\'_blank\' type=\'application/atom+xml\'>Posts (Atom)</a>\n</div>\n</div>\n</div></div>\n</div>\n<div id=\'sidebar-wrapper\'>\n<div class=\'sidebar section\' id=\'header\'><div class=\'widget Header\' data-version=\'1\' id=\'Header1\'>\n<div id=\'header-inner\'>\n<div class=\'titlewrapper\'>\n<h1 class=\'title\'>\nNeopythonic\n</h1>\n</div>\n<div class=\'descriptionwrapper\'>\n<p class=\'description\'><span>Ramblings through technology, politics, culture and philosophy by the creator of the Python programming language.</span></p>\n</div>\n</div>\n</div></div>\n<div class=\'sidebar section\' id=\'sidebar\'><div class=\'widget Followers\' data-version=\'1\' id=\'Followers1\'>\n<h2 class=\'title\'>Followers</h2>\n<div class=\'widget-content\'>\n<div id=\'Followers1-wrapper\'>\n<div style=\'margin-right:2px;\'>\n<div><script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>\n<div id="followers-iframe-container"></div>\n<script type="text/javascript">\n    window.followersIframe = null;\n    function followersIframeOpen(url) {\n      gapi.load("gapi.iframes", function() {\n        if (gapi.iframes && gapi.iframes.getContext) {\n          window.followersIframe = gapi.iframes.getContext().openChild({\n            url: 
url,\n            where: document.getElementById("followers-iframe-container"),\n            messageHandlersFilter: gapi.iframes.CROSS_ORIGIN_IFRAMES_FILTER,\n            messageHandlers: {\n              \'_ready\': function(obj) {\n                window.followersIframe.getIframeEl().height = obj.height;\n              },\n              \'reset\': function() {\n                window.followersIframe.close();\n                followersIframeOpen("https://www.blogger.com/followers.g?blogID\\x3d4195135246107166251\\x26colors\\x3dCgt0cmFuc3BhcmVudBILdHJhbnNwYXJlbnQaByMzMzMzMzMiByM0NDg4ODgqByNGRkZGRkYyByMwMDAwMDA6ByMzMzMzMzNCByM0NDg4ODhKByMwMDAwMDBSByM0NDg4ODhaC3RyYW5zcGFyZW50\\x26pageSize\\x3d21\\x26origin\\x3dhttp://neopythonic.blogspot.com/");\n              },\n              \'open\': function(url) {\n                window.followersIframe.close();\n                followersIframeOpen(url);\n              },\n              \'blogger-ping\': function() {\n              }\n            }\n          });\n        }\n      });\n    }\n    followersIframeOpen("https://www.blogger.com/followers.g?blogID\\x3d4195135246107166251\\x26colors\\x3dCgt0cmFuc3BhcmVudBILdHJhbnNwYXJlbnQaByMzMzMzMzMiByM0NDg4ODgqByNGRkZGRkYyByMwMDAwMDA6ByMzMzMzMzNCByM0NDg4ODhKByMwMDAwMDBSByM0NDg4ODhaC3RyYW5zcGFyZW50\\x26pageSize\\x3d21\\x26origin\\x3dhttp://neopythonic.blogspot.com/");\n  </script></div>\n</div>\n</div>\n<div class=\'clear\'></div>\n<span class=\'widget-item-control\'>\n<span class=\'item-control blog-admin\'>\n<a class=\'quickedit\' href=\'//www.blogger.com/rearrange?blogID=4195135246107166251&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar\' onclick=\'return _WidgetManager._PopupConfig(document.getElementById("Followers1"));\' rel=\'nofollow\' target=\'configFollowers1\' title=\'Edit\'>\n<img alt=\'\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_wrench_allbkg.png\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div 
class=\'clear\'></div>\n</div>\n</div><div class=\'widget BlogArchive\' data-version=\'1\' id=\'BlogArchive1\'>\n<h2>Blog Archive</h2>\n<div class=\'widget-content\'>\n<div id=\'ArchiveList\'>\n<div id=\'BlogArchive1_ArchiveList\'>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate expanded\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy toggle-open\'>\n\n        &#9660;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2019/\'>\n2019\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate expanded\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy toggle-open\'>\n\n        &#9660;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2019/03/\'>\nMarch\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n<ul class=\'posts\'>\n<li><a href=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\'>Why operators are useful</a></li>\n</ul>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2018/\'>\n2018\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2018/11/\'>\nNovember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' 
href=\'http://neopythonic.blogspot.com/2016/\'>\n2016\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(5)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/07/\'>\nJuly\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/05/\'>\nMay\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(3)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/04/\'>\nApril\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2013/\'>\n2013\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2013/10/\'>\nOctober\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        
&#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/\'>\n2011\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(5)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/08/\'>\nAugust\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/07/\'>\nJuly\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/06/\'>\nJune\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/01/\'>\nJanuary\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/\'>\n2009\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(16)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' 
href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/12/\'>\nDecember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/11/\'>\nNovember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/09/\'>\nSeptember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/07/\'>\nJuly\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/06/\'>\nJune\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(3)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/05/\'>\nMay\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul 
class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/04/\'>\nApril\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(4)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/03/\'>\nMarch\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/01/\'>\nJanuary\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/\'>\n2008\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(14)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/12/\'>\nDecember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' 
href=\'http://neopythonic.blogspot.com/2008/11/\'>\nNovember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(5)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n        &#9658;&#160;\n      \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/10/\'>\nOctober\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(7)</span>\n</li>\n</ul>\n</li>\n</ul>\n</div>\n</div>\n<div class=\'clear\'></div>\n<span class=\'widget-item-control\'>\n<span class=\'item-control blog-admin\'>\n<a class=\'quickedit\' href=\'//www.blogger.com/rearrange?blogID=4195135246107166251&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar\' onclick=\'return _WidgetManager._PopupConfig(document.getElementById("BlogArchive1"));\' rel=\'nofollow\' target=\'configBlogArchive1\' title=\'Edit\'>\n<img alt=\'\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_wrench_allbkg.png\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'clear\'></div>\n</div>\n</div><div class=\'widget Profile\' data-version=\'1\' id=\'Profile1\'>\n<h2>About Me</h2>\n<div class=\'widget-content\'>\n<a href=\'https://www.blogger.com/profile/12821714508588242516\'><img alt=\'My photo\' class=\'profile-img\' height=\'80\' src=\'//2.bp.blogspot.com/_FG9t5W1SJ14/SO0aRdEpTAI/AAAAAAAACvw/sQy2btDo2DI/S220-s80/IMG_2192.jpg\' width=\'53\'/></a>\n<dl class=\'profile-datablock\'>\n<dt class=\'profile-data\'>\n<a class=\'profile-name-link g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' style=\'background-image: url(//www.blogger.com/img/logo-16.png);\'>\nGuido van Rossum\n</a>\n</dt>\n<dd class=\'profile-textblock\'>Python\'s BDFL</dd>\n</dl>\n<a class=\'profile-link\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\'>View my complete profile</a>\n<div class=\'clear\'></div>\n<span 
class=\'widget-item-control\'>\n<span class=\'item-control blog-admin\'>\n<a class=\'quickedit\' href=\'//www.blogger.com/rearrange?blogID=4195135246107166251&widgetType=Profile&widgetId=Profile1&action=editWidget&sectionId=sidebar\' onclick=\'return _WidgetManager._PopupConfig(document.getElementById("Profile1"));\' rel=\'nofollow\' target=\'configProfile1\' title=\'Edit\'>\n<img alt=\'\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_wrench_allbkg.png\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'clear\'></div>\n</div>\n</div></div>\n</div>\n<!-- spacer for skins that want sidebar and main to be the same height-->\n<div class=\'clear\'>&#160;</div>\n</div>\n<!-- end content-wrapper -->\n<div id=\'footer-wrapper\'>\n<div class=\'footer no-items section\' id=\'footer\'></div>\n</div>\n</div></div>\n<!-- end outer-wrapper -->\n<script src=\'https://apis.google.com/js/plusone.js\' type=\'text/javascript\'></script>\n\n<script type="text/javascript" src="https://www.blogger.com/static/v1/widgets/3236635003-widgets.js"></script>\n<script type=\'text/javascript\'>\nwindow[\'__wavt\'] = \'AOuZoY5yNKA5W7NfDL5jC8sgYKdhWMg9Vw:1582878948016\';_WidgetManager._Init(\'//www.blogger.com/rearrange?blogID\\x3d4195135246107166251\',\'//neopythonic.blogspot.com/\',\'4195135246107166251\');\n_WidgetManager._SetDataContext([{\'name\': \'blog\', \'data\': {\'blogId\': \'4195135246107166251\', \'title\': \'Neopythonic\', \'url\': \'http://neopythonic.blogspot.com/\', \'canonicalUrl\': \'http://neopythonic.blogspot.com/\', \'homepageUrl\': \'http://neopythonic.blogspot.com/\', \'searchUrl\': \'http://neopythonic.blogspot.com/search\', \'canonicalHomepageUrl\': \'http://neopythonic.blogspot.com/\', \'blogspotFaviconUrl\': \'http://neopythonic.blogspot.com/favicon.ico\', \'bloggerUrl\': \'https://www.blogger.com\', \'hasCustomDomain\': false, \'httpsEnabled\': true, \'enabledCommentProfileImages\': true, \'gPlusViewType\': \'FILTERED_POSTMOD\', \'adultContent\': 
false, \'analyticsAccountNumber\': \'\', \'encoding\': \'UTF-8\', \'locale\': \'en\', \'localeUnderscoreDelimited\': \'en\', \'languageDirection\': \'ltr\', \'isPrivate\': false, \'isMobile\': false, \'isMobileRequest\': false, \'mobileClass\': \'\', \'isPrivateBlog\': false, \'feedLinks\': \'\\x3clink rel\\x3d\\x22alternate\\x22 type\\x3d\\x22application/atom+xml\\x22 title\\x3d\\x22Neopythonic - Atom\\x22 href\\x3d\\x22http://neopythonic.blogspot.com/feeds/posts/default\\x22 /\\x3e\\n\\x3clink rel\\x3d\\x22alternate\\x22 type\\x3d\\x22application/rss+xml\\x22 title\\x3d\\x22Neopythonic - RSS\\x22 href\\x3d\\x22http://neopythonic.blogspot.com/feeds/posts/default?alt\\x3drss\\x22 /\\x3e\\n\\x3clink rel\\x3d\\x22service.post\\x22 type\\x3d\\x22application/atom+xml\\x22 title\\x3d\\x22Neopythonic - Atom\\x22 href\\x3d\\x22https://www.blogger.com/feeds/4195135246107166251/posts/default\\x22 /\\x3e\\n\', \'meTag\': \'\\x3clink rel\\x3d\\x22me\\x22 href\\x3d\\x22https://www.blogger.com/profile/12821714508588242516\\x22 /\\x3e\\n\', \'adsenseHostId\': \'ca-host-pub-1556223355139109\', \'adsenseHasAds\': false, \'view\': \'\', \'dynamicViewsCommentsSrc\': \'//www.blogblog.com/dynamicviews/4224c15c4e7c9321/js/comments.js\', \'dynamicViewsScriptSrc\': \'//www.blogblog.com/dynamicviews/752be96649ffb269\', \'plusOneApiSrc\': \'https://apis.google.com/js/plusone.js\', \'disableGComments\': true, \'sharing\': {\'platforms\': [{\'name\': \'Get link\', \'key\': \'link\', \'shareMessage\': \'Get link\', \'target\': \'\'}, {\'name\': \'Facebook\', \'key\': \'facebook\', \'shareMessage\': \'Share to Facebook\', \'target\': \'facebook\'}, {\'name\': \'BlogThis!\', \'key\': \'blogThis\', \'shareMessage\': \'BlogThis!\', \'target\': \'blog\'}, {\'name\': \'Twitter\', \'key\': \'twitter\', \'shareMessage\': \'Share to Twitter\', \'target\': \'twitter\'}, {\'name\': \'Pinterest\', \'key\': \'pinterest\', \'shareMessage\': \'Share to Pinterest\', \'target\': \'pinterest\'}, {\'name\': 
\'Email\', \'key\': \'email\', \'shareMessage\': \'Email\', \'target\': \'email\'}], \'disableGooglePlus\': true, \'googlePlusShareButtonWidth\': 300, \'googlePlusBootstrap\': \'\\x3cscript type\\x3d\\x22text/javascript\\x22\\x3ewindow.___gcfg \\x3d {\\x27lang\\x27: \\x27en\\x27};\\x3c/script\\x3e\'}, \'hasCustomJumpLinkMessage\': false, \'jumpLinkMessage\': \'Read more\', \'pageType\': \'index\', \'pageName\': \'\', \'pageTitle\': \'Neopythonic\'}}, {\'name\': \'features\', \'data\': {\'sharing_get_link_dialog\': \'true\', \'sharing_native\': \'false\'}}, {\'name\': \'messages\', \'data\': {\'edit\': \'Edit\', \'linkCopiedToClipboard\': \'Link copied to clipboard!\', \'ok\': \'Ok\', \'postLink\': \'Post Link\'}}, {\'name\': \'template\', \'data\': {\'name\': \'custom\', \'localizedName\': \'Custom\', \'isResponsive\': false, \'isAlternateRendering\': false, \'isCustom\': true}}, {\'name\': \'view\', \'data\': {\'classic\': {\'name\': \'classic\', \'url\': \'?view\\x3dclassic\'}, \'flipcard\': {\'name\': \'flipcard\', \'url\': \'?view\\x3dflipcard\'}, \'magazine\': {\'name\': \'magazine\', \'url\': \'?view\\x3dmagazine\'}, \'mosaic\': {\'name\': \'mosaic\', \'url\': \'?view\\x3dmosaic\'}, \'sidebar\': {\'name\': \'sidebar\', \'url\': \'?view\\x3dsidebar\'}, \'snapshot\': {\'name\': \'snapshot\', \'url\': \'?view\\x3dsnapshot\'}, \'timeslide\': {\'name\': \'timeslide\', \'url\': \'?view\\x3dtimeslide\'}, \'isMobile\': false, \'title\': \'Neopythonic\', \'description\': \'Ramblings through technology, politics, culture and philosophy by the creator of the Python programming language.\', \'url\': \'http://neopythonic.blogspot.com/\', \'type\': \'feed\', \'isSingleItem\': false, \'isMultipleItems\': true, \'isError\': false, \'isPage\': false, \'isPost\': false, \'isHomepage\': true, \'isArchive\': false, \'isLabelSearch\': false}}]);\n_WidgetManager._RegisterWidget(\'_NavbarView\', new _WidgetInfo(\'Navbar1\', \'navbar\', document.getElementById(\'Navbar1\'), {}, 
\'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_BlogView\', new _WidgetInfo(\'Blog1\', \'main\', document.getElementById(\'Blog1\'), {\'cmtInteractionsEnabled\': false, \'lightboxEnabled\': true, \'lightboxModuleUrl\': \'https://www.blogger.com/static/v1/jsbin/577060686-lbx.js\', \'lightboxCssUrl\': \'https://www.blogger.com/static/v1/v-css/368954415-lightbox_bundle.css\'}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_HeaderView\', new _WidgetInfo(\'Header1\', \'header\', document.getElementById(\'Header1\'), {}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_FollowersView\', new _WidgetInfo(\'Followers1\', \'sidebar\', document.getElementById(\'Followers1\'), {}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_BlogArchiveView\', new _WidgetInfo(\'BlogArchive1\', \'sidebar\', document.getElementById(\'BlogArchive1\'), {\'languageDirection\': \'ltr\', \'loadingMessage\': \'Loading\\x26hellip;\'}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_ProfileView\', new _WidgetInfo(\'Profile1\', \'sidebar\', document.getElementById(\'Profile1\'), {}, \'displayModeFull\'));\n</script>\n</body>\n</html>'

We have several options to get the titles:

  • Using string splits
  • Using regular expressions
  • Using an HTML parsing package

Let's try to find the titles using each method:

1.1 Parsing using String Splits

In [2]:
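html = s
# Each post title sits in an <h3 class='post-title entry-title' ...> element,
# so splitting on that marker yields one chunk per post (after the page header).
html_parts = html.split("h3 class='post-title entry-title")

def get_title(html):
    # Keep only the part of the chunk that precedes the closing </h3> tag.
    h = html.split("</h3>")[0]
    print("After first split:\n %s\n" % h)
    # The title follows the second "'>" (the end of the <a href='...'> tag).
    h = h.split("'>")[2]
    print("After second split:\n %s\n" % h)
    return h.replace("</a>", "").strip()


# Skip html_parts[0], which holds everything before the first title.
titles = [get_title(i) for i in html_parts[1:]]
titles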
    
After first split:
 ' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html'>Why operators are useful</a>


After second split:
 Why operators are useful</a>


After first split:
 ' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html'>What to do with your computer science career</a>


After second split:
 What to do with your computer science career</a>


After first split:
 ' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html'>About spammers and comments</a>


After second split:
 About spammers and comments</a>


After first split:
 ' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2016/05/union-syntax.html'>Union syntax</a>


After second split:
 Union syntax</a>


After first split:
 ' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html'>Adding type annotations for fspath</a>


After second split:
 Adding type annotations for fspath</a>


After first split:
 ' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html'>The AnyStr type variable</a>


After second split:
 The AnyStr type variable</a>


Out[2]:
['Why operators are useful',
 'What to do with your computer science career',
 'About spammers and comments',
 'Union syntax',
 'Adding type annotations for fspath',
 'The AnyStr type variable']
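
The split approach above depends on the exact position of each "'>" in the page. As a rough sketch of a slightly more defensive variant, the code below uses str.partition to take the text between two markers and skips headings that contain no link. SAMPLE_HTML, between, and titles_by_split are illustrative names, and the sample markup is a hand-written stand-in for the blog page, not data fetched from it.

```python
# Hypothetical sample that mimics the blog's markup; it is not fetched from the site.
SAMPLE_HTML = """
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/first.html'>First post</a>
</h3>
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/second.html'>Second post</a>
</h3>
"""

MARKER = "h3 class='post-title entry-title"


def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    inner, found_end, _ = rest.partition(end)
    return inner if found_end else None


def titles_by_split(html):
    titles = []
    # Everything before the first marker is page header, so skip chunk 0.
    for chunk in html.split(MARKER)[1:]:
        link = between(chunk, "<a ", "</a>")  # e.g. "href='...'>First post"
        if link is None:
            continue  # heading without a link: nothing to extract
        # The title is whatever follows the end of the opening <a ...> tag.
        titles.append(link.partition(">")[2].strip())
    return titles


print(titles_by_split(SAMPLE_HTML))
```

This is still string matching, so it breaks as soon as the site changes its quoting or attribute order; it only fails more quietly than indexing into a split result.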

1.2 Parsing using Regular Expressions

In [3]:
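import re
# Match the <h3 ...> opening tag, skip the <a ...> tag, and capture the link text.
r = re.compile(r"<h3 class='post-title entry-title'.*?>.*?>(.*?)</a></h3>")
# Strip line breaks first: "." does not match "\n" by default, and the page puts
# a newline between </a> and </h3>.
r.findall(html.replace("\r","").replace("\n",""))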
Out[3]:
['Why operators are useful',
 'What to do with your computer science career',
 'About spammers and comments',
 'Union syntax',
 'Adding type annotations for fspath',
 'The AnyStr type variable']
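
As an alternative to stripping line breaks before matching, a pattern can tolerate whitespace itself. The following is a minimal sketch under that assumption: it uses re.DOTALL and named groups to capture both the title and its URL. SAMPLE_HTML, TITLE_RE, and titles_by_regex are illustrative names, and the sample markup is hand-written rather than taken from the live page.

```python
import re

# Hypothetical sample that mimics the blog's markup; it is not fetched from the site.
SAMPLE_HTML = """
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/first.html'>First post</a>
</h3>
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/second.html'>Second post</a>
</h3>
"""

TITLE_RE = re.compile(
    r"<h3 class='post-title entry-title'[^>]*>\s*"           # opening <h3 ...> tag
    r"<a href='(?P<url>[^']*)'>(?P<title>.*?)</a>\s*</h3>",  # link, text, closing tags
    re.DOTALL,  # let "." match newlines inside the title text
)


def titles_by_regex(html):
    """Return (title, url) pairs for every post heading found in `html`."""
    return [(m.group("title").strip(), m.group("url"))
            for m in TITLE_RE.finditer(html)]


for title, url in titles_by_regex(SAMPLE_HTML):
    print(title, "->", url)
```

Named groups make it easier to see which part of the pattern produces which value, at the cost of a longer expression.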

1.3 Parsing using BeautifulSoup

In [4]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
l = soup.find_all('h3', attrs={'class': 'post-title entry-title'})
l
Out[4]:
[<h3 class="post-title entry-title" itemprop="name">
 <a href="http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html">Why operators are useful</a>
 </h3>, <h3 class="post-title entry-title" itemprop="name">
 <a href="http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html">What to do with your computer science career</a>
 </h3>, <h3 class="post-title entry-title" itemprop="name">
 <a href="http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html">About spammers and comments</a>
 </h3>, <h3 class="post-title entry-title" itemprop="name">
 <a href="http://neopythonic.blogspot.com/2016/05/union-syntax.html">Union syntax</a>
 </h3>, <h3 class="post-title entry-title" itemprop="name">
 <a href="http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html">Adding type annotations for fspath</a>
 </h3>, <h3 class="post-title entry-title" itemprop="name">
 <a href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html">The AnyStr type variable</a>
 </h3>]
In [5]:
# Extract the text of each title element
[t.text.strip() for t in l]
Out[5]:
['Why operators are useful',
 'What to do with your computer science career',
 'About spammers and comments',
 'Union syntax',
 'Adding type annotations for fspath',
 'The AnyStr type variable']
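
The 'html.parser' argument passed to BeautifulSoup refers to the parser in Python's standard library. To give a feel for what a parsing package does with the tag stream, here is a small sketch that uses that standard-library parser directly. PostTitleParser and SAMPLE_HTML are illustrative names, and the sample markup is hand-written, not fetched from the blog.

```python
from html.parser import HTMLParser

# Hypothetical sample that mimics the blog's markup; it is not fetched from the site.
SAMPLE_HTML = """
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/first.html'>First post</a>
</h3>
<p>Body text that should be ignored.</p>
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://example.com/second.html'>Second post</a>
</h3>
"""


class PostTitleParser(HTMLParser):
    """Collect the text inside <h3> elements whose class list has 'post-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "post-title" in classes:
            self._in_title = True
            self._parts = []

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self.titles.append("".join(self._parts).strip())
            self._in_title = False


parser = PostTitleParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)
```

Because the parser works on tags and attributes rather than raw characters, it is indifferent to quoting style and line breaks, which is the main reason to prefer a parsing package over splits or regular expressions.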

2. Collecting Data using APIs

One straightforward way to collect data is to use APIs. In the following example, we will use the Wikipedia Python package, which wraps the MediaWiki API. First, we install the Wikipedia package. For visualization, we will also install the NetworkX package.

In [6]:
!pip install wikipedia
!pip install networkx
Collecting wikipedia
Requirement already satisfied: requests<3.0.0,>=2.0.0 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from wikipedia) (2.22.0)
Requirement already satisfied: beautifulsoup4 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from wikipedia) (4.8.0)
Requirement already satisfied: certifi>=2017.4.17 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (1.24.2)
Requirement already satisfied: idna<2.9,>=2.5 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (3.0.4)
Requirement already satisfied: soupsieve>=1.2 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from beautifulsoup4->wikipedia) (1.9.3)
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
Requirement already satisfied: networkx in /anaconda3/envs/massivedata/lib/python3.6/site-packages (2.3)
Requirement already satisfied: decorator>=4.3.0 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from networkx) (4.4.0)
In [7]:
import wikipedia
w = wikipedia.page("Machine Learning")
w.summary
Out[7]:
'Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.\nMachine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.\n\n'
In [8]:
w.links[:20]
Out[8]:
['ACM Computing Classification System',
 'ACM Computing Surveys',
 'ADALINE',
 'AT&T Labs',
 'Action selection',
 'Active learning (machine learning)',
 'Adaptive website',
 'Affective computing',
 'Alan Turing',
 'Algorithm',
 'Algorithm design',
 'Algorithmic bias',
 'Algorithmic efficiency',
 'Amazon Machine Learning',
 'Analysis of algorithms',
 'Angoss',
 'Anomaly detection',
 'Apache Mahout',
 'Apache Spark',
 'Apache SystemML']
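
To make "wrapping the MediaWiki API" more concrete, the sketch below builds the kind of query URL that asks the API for a page's links and pulls the link titles out of a JSON response. It is an illustration, not the Wikipedia package's actual implementation: build_links_url and extract_link_titles are hypothetical helper names, and the toy response only assumes the default JSON layout (query, then pages keyed by page id, then links). No network request is made here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_links_url(title, limit=20):
    """Build a MediaWiki query URL that lists links on the page `title`."""
    params = {
        "action": "query",   # use the query module
        "prop": "links",     # ask for the page's outgoing links
        "titles": title,
        "pllimit": limit,    # maximum number of links to return
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_link_titles(response):
    """Collect link titles from a decoded JSON response (assumed layout)."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        titles.extend(link["title"] for link in page.get("links", []))
    return titles


# Hand-written toy response with the assumed layout; not real API output.
toy_response = {
    "query": {
        "pages": {
            "123": {
                "title": "Machine learning",
                "links": [{"ns": 0, "title": "ADALINE"},
                          {"ns": 0, "title": "Algorithm"}],
            }
        }
    }
}

print(build_links_url("Machine learning"))
print(extract_link_titles(toy_response))
```

In practice the URL would be passed to requests.get and the result decoded with .json(); a wrapper package hides those steps, along with paging through long result lists.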

Let's build a graph in which each vertex is a Wikipedia page linked from the Machine Learning article, and each directed link goes from a page to another page in the set that it references.

In [9]:
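vertices = set(w.links[:50])
links = [("Machine Learning", v) for v in vertices]

for v in vertices:
    try:
        # Use a separate name so the original page object `w` is not overwritten
        page = wikipedia.page(v)
        for v2 in page.links:
            if v2 in vertices:
                links.append((v, v2))
    except Exception as e:
        # Some titles cannot be resolved (for example, ambiguous ones); report and continue
        print(e)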
        
/anaconda3/envs/massivedata/lib/python3.6/site-packages/wikipedia/wikipedia.py:389: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 389 of the file /anaconda3/envs/massivedata/lib/python3.6/site-packages/wikipedia/wikipedia.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  lis = BeautifulSoup(html).find_all('li')
"adeline" may refer to: 
Adeline (given name)
Yves-Marie Adeline
Adeline, Illinois
Adeline Records
Adeline Software International
Ballade pour Adeline
Portrait of Mary Adeline Williams
"Adeline" (song)
Adeleorina
Cyclone Adeline
Pépinières Arboretum Adeline
Adeline (rocket)
Sweet Adeline (disambiguation)
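
The edge-building loop can be tried without any network access by replacing the page lookup with a function over toy data. The sketch below follows the same logic under that assumption: build_link_edges, TOY_LINKS, and toy_get_links are hypothetical names, and a missing key stands in for a page that cannot be resolved, such as the ambiguous title reported above.

```python
def build_link_edges(root, get_links, limit=50):
    """Return (edges, skipped) for the pages linked from `root`.

    `get_links(title)` must return a list of linked titles, or raise
    LookupError when the title cannot be resolved.
    """
    vertices = set(get_links(root)[:limit])
    edges = {(root, v) for v in vertices}
    skipped = []
    for v in sorted(vertices):  # sorted only to make the run reproducible
        try:
            targets = get_links(v)
        except LookupError as err:
            skipped.append((v, str(err)))  # remember the failure and move on
            continue
        # Keep only links that point at another page in the vertex set.
        edges.update((v, t) for t in targets if t in vertices)
    return edges, skipped


# Toy link table; "C" is deliberately missing to simulate an unresolvable page.
TOY_LINKS = {
    "Root": ["A", "B", "C"],
    "A": ["B", "Root"],
    "B": ["A", "C", "Z"],
}


def toy_get_links(title):
    return TOY_LINKS[title]  # raises KeyError (a LookupError) for "C"


edges, skipped = build_link_edges("Root", toy_get_links)
print(sorted(edges))
print(skipped)
```

Collecting edges in a set avoids duplicates up front, and returning the skipped titles makes it easy to see how many pages were dropped instead of scanning printed error text.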

Let's draw the graph:

In [10]:
import networkx as nx
%matplotlib inline 
g = nx.DiGraph()
g.add_edges_from(set(links))
nx.info(g)
Out[10]:
'Name: \nType: DiGraph\nNumber of nodes: 51\nNumber of edges: 281\nAverage in degree:   5.5098\nAverage out degree:   5.5098'
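
The summary numbers can be checked by hand. Every directed edge adds one to the out-degree of its source and one to the in-degree of its target, so both averages equal the number of edges divided by the number of nodes: 281 / 51 ≈ 5.5098. The sketch below computes the same summary from a plain edge list; degree_summary is an illustrative helper, not a NetworkX function, and the toy edges are made up.

```python
def degree_summary(edges):
    """Summarize a directed edge list: node count, edge count, average degrees."""
    unique_edges = set(edges)  # duplicates collapse, as in a DiGraph
    nodes = set()
    in_deg, out_deg = {}, {}
    for src, dst in unique_edges:
        nodes.update((src, dst))
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
    n = len(nodes)
    return {
        "nodes": n,
        "edges": len(unique_edges),
        "avg_in": sum(in_deg.values()) / n if n else 0.0,
        "avg_out": sum(out_deg.values()) / n if n else 0.0,
    }


toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]  # one duplicate
print(degree_summary(toy_edges))

# The figures reported for the Wikipedia graph above:
print(round(281 / 51, 4))
```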
In [11]:
import matplotlib.pyplot as plt
plt.figure(3,figsize=(14,14))
nx.draw_kamada_kawai(g, with_labels=True)
/anaconda3/envs/massivedata/lib/python3.6/site-packages/networkx/drawing/nx_pylab.py:579: MatplotlibDeprecationWarning: 
The iterable function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use np.iterable instead.
  if not cb.iterable(width):
/anaconda3/envs/massivedata/lib/python3.6/site-packages/networkx/drawing/nx_pylab.py:676: MatplotlibDeprecationWarning: 
The iterable function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use np.iterable instead.
  if cb.iterable(node_size):  # many node sizes

We are going to talk more about graphs in the next lectures.

3. NumPy, the Basics

In this section, we give some examples of how to work with NumPy and show the advantages of NumPy's array (ndarray) over Python's built-in list object.
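
As a first taste of the difference, the short sketch below contrasts arithmetic on a list with arithmetic on an ndarray. The variable names and values are made up for illustration.

```python
import numpy as np

prices = [1.5, 2.0, 3.25]

# With a list, "*" repeats the sequence; element-wise math needs a loop.
repeated = prices * 2
doubled_list = [p * 2 for p in prices]

# With an ndarray, arithmetic is applied to every element at once.
arr = np.array(prices)
doubled_arr = arr * 2

print(repeated)       # the list, twice in a row
print(doubled_list)   # each price doubled, via a comprehension
print(doubled_arr)    # each price doubled, via vectorized arithmetic
print(arr.dtype, arr.mean())
```

All elements of an ndarray share a single dtype, which is what allows NumPy to store them compactly and run such operations in compiled code instead of a Python-level loop.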

In [42]:
import numpy as np
In [43]:
??np
In [44]:
dir(np)
Out[44]:
['ALLOW_THREADS',
 'AxisError',
 'BUFSIZE',
 'CLIP',
 'ComplexWarning',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'MachAr',
 'ModuleDeprecationWarning',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'RankWarning',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'Tester',
 'TooHardError',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'VisibleDeprecationWarning',
 'WRAP',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__doc__',
 '__file__',
 '__git_revision__',
 '__loader__',
 '__mkl_version__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_distributor_init',
 '_globals',
 '_mat',
 '_pytesttester',
 'abs',
 'absolute',
 'absolute_import',
 'add',
 'add_docstring',
 'add_newdoc',
 'add_newdoc_ufunc',
 'alen',
 'all',
 'allclose',
 'alltrue',
 'amax',
 'amin',
 'angle',
 'any',
 'append',
 'apply_along_axis',
 'apply_over_axes',
 'arange',
 'arccos',
 'arccosh',
 'arcsin',
 'arcsinh',
 'arctan',
 'arctan2',
 'arctanh',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'argwhere',
 'around',
 'array',
 'array2string',
 'array_equal',
 'array_equiv',
 'array_repr',
 'array_split',
 'array_str',
 'asanyarray',
 'asarray',
 'asarray_chkfinite',
 'ascontiguousarray',
 'asfarray',
 'asfortranarray',
 'asmatrix',
 'asscalar',
 'atleast_1d',
 'atleast_2d',
 'atleast_3d',
 'average',
 'bartlett',
 'base_repr',
 'binary_repr',
 'bincount',
 'bitwise_and',
 'bitwise_not',
 'bitwise_or',
 'bitwise_xor',
 'blackman',
 'block',
 'bmat',
 'bool',
 'bool8',
 'bool_',
 'broadcast',
 'broadcast_arrays',
 'broadcast_to',
 'busday_count',
 'busday_offset',
 'busdaycalendar',
 'byte',
 'byte_bounds',
 'bytes0',
 'bytes_',
 'c_',
 'can_cast',
 'cast',
 'cbrt',
 'cdouble',
 'ceil',
 'cfloat',
 'char',
 'character',
 'chararray',
 'choose',
 'clip',
 'clongdouble',
 'clongfloat',
 'column_stack',
 'common_type',
 'compare_chararrays',
 'compat',
 'complex',
 'complex128',
 'complex256',
 'complex64',
 'complex_',
 'complexfloating',
 'compress',
 'concatenate',
 'conj',
 'conjugate',
 'convolve',
 'copy',
 'copysign',
 'copyto',
 'core',
 'corrcoef',
 'correlate',
 'cos',
 'cosh',
 'count_nonzero',
 'cov',
 'cross',
 'csingle',
 'ctypeslib',
 'cumprod',
 'cumproduct',
 'cumsum',
 'datetime64',
 'datetime_as_string',
 'datetime_data',
 'deg2rad',
 'degrees',
 'delete',
 'deprecate',
 'deprecate_with_doc',
 'diag',
 'diag_indices',
 'diag_indices_from',
 'diagflat',
 'diagonal',
 'diff',
 'digitize',
 'disp',
 'divide',
 'division',
 'divmod',
 'dot',
 'double',
 'dsplit',
 'dstack',
 'dtype',
 'dual',
 'e',
 'ediff1d',
 'einsum',
 'einsum_path',
 'emath',
 'empty',
 'empty_like',
 'equal',
 'errstate',
 'euler_gamma',
 'exp',
 'exp2',
 'expand_dims',
 'expm1',
 'extract',
 'eye',
 'fabs',
 'fastCopyAndTranspose',
 'fft',
 'fill_diagonal',
 'find_common_type',
 'finfo',
 'fix',
 'flatiter',
 'flatnonzero',
 'flexible',
 'flip',
 'fliplr',
 'flipud',
 'float',
 'float128',
 'float16',
 'float32',
 'float64',
 'float_',
 'float_power',
 'floating',
 'floor',
 'floor_divide',
 'fmax',
 'fmin',
 'fmod',
 'format_float_positional',
 'format_float_scientific',
 'format_parser',
 'frexp',
 'frombuffer',
 'fromfile',
 'fromfunction',
 'fromiter',
 'frompyfunc',
 'fromregex',
 'fromstring',
 'full',
 'full_like',
 'fv',
 'gcd',
 'generic',
 'genfromtxt',
 'geomspace',
 'get_array_wrap',
 'get_include',
 'get_printoptions',
 'getbufsize',
 'geterr',
 'geterrcall',
 'geterrobj',
 'gradient',
 'greater',
 'greater_equal',
 'half',
 'hamming',
 'hanning',
 'heaviside',
 'histogram',
 'histogram2d',
 'histogram_bin_edges',
 'histogramdd',
 'hsplit',
 'hstack',
 'hypot',
 'i0',
 'identity',
 'iinfo',
 'imag',
 'in1d',
 'index_exp',
 'indices',
 'inexact',
 'inf',
 'info',
 'infty',
 'inner',
 'insert',
 'int',
 'int0',
 'int16',
 'int32',
 'int64',
 'int8',
 'int_',
 'int_asbuffer',
 'intc',
 'integer',
 'interp',
 'intersect1d',
 'intp',
 'invert',
 'ipmt',
 'irr',
 'is_busday',
 'isclose',
 'iscomplex',
 'iscomplexobj',
 'isfinite',
 'isfortran',
 'isin',
 'isinf',
 'isnan',
 'isnat',
 'isneginf',
 'isposinf',
 'isreal',
 'isrealobj',
 'isscalar',
 'issctype',
 'issubclass_',
 'issubdtype',
 'issubsctype',
 'iterable',
 'ix_',
 'kaiser',
 'kron',
 'lcm',
 'ldexp',
 'left_shift',
 'less',
 'less_equal',
 'lexsort',
 'lib',
 'linalg',
 'linspace',
 'little_endian',
 'load',
 'loads',
 'loadtxt',
 'log',
 'log10',
 'log1p',
 'log2',
 'logaddexp',
 'logaddexp2',
 'logical_and',
 'logical_not',
 'logical_or',
 'logical_xor',
 'logspace',
 'long',
 'longcomplex',
 'longdouble',
 'longfloat',
 'longlong',
 'lookfor',
 'ma',
 'mafromtxt',
 'mask_indices',
 'mat',
 'math',
 'matmul',
 'matrix',
 'matrixlib',
 'max',
 'maximum',
 'maximum_sctype',
 'may_share_memory',
 'mean',
 'median',
 'memmap',
 'meshgrid',
 'mgrid',
 'min',
 'min_scalar_type',
 'minimum',
 'mintypecode',
 'mirr',
 'mkl',
 'mod',
 'modf',
 'moveaxis',
 'msort',
 'multiply',
 'nan',
 'nan_to_num',
 'nanargmax',
 'nanargmin',
 'nancumprod',
 'nancumsum',
 'nanmax',
 'nanmean',
 'nanmedian',
 'nanmin',
 'nanpercentile',
 'nanprod',
 'nanquantile',
 'nanstd',
 'nansum',
 'nanvar',
 'nbytes',
 'ndarray',
 'ndenumerate',
 'ndfromtxt',
 'ndim',
 'ndindex',
 'nditer',
 'negative',
 'nested_iters',
 'newaxis',
 'nextafter',
 'nonzero',
 'not_equal',
 'nper',
 'npv',
 'numarray',
 'number',
 'obj2sctype',
 'object',
 'object0',
 'object_',
 'ogrid',
 'oldnumeric',
 'ones',
 'ones_like',
 'os',
 'outer',
 'packbits',
 'pad',
 'partition',
 'percentile',
 'pi',
 'piecewise',
 'place',
 'pmt',
 'poly',
 'poly1d',
 'polyadd',
 'polyder',
 'polydiv',
 'polyfit',
 'polyint',
 'polymul',
 'polynomial',
 'polysub',
 'polyval',
 'positive',
 'power',
 'ppmt',
 'print_function',
 'printoptions',
 'prod',
 'product',
 'promote_types',
 'ptp',
 'put',
 'put_along_axis',
 'putmask',
 'pv',
 'quantile',
 'r_',
 'rad2deg',
 'radians',
 'random',
 'rank',
 'rate',
 'ravel',
 'ravel_multi_index',
 'real',
 'real_if_close',
 'rec',
 'recarray',
 'recfromcsv',
 'recfromtxt',
 'reciprocal',
 'record',
 'remainder',
 'repeat',
 'require',
 'reshape',
 'resize',
 'result_type',
 'right_shift',
 'rint',
 'roll',
 'rollaxis',
 'roots',
 'rot90',
 'round',
 'round_',
 'row_stack',
 's_',
 'safe_eval',
 'save',
 'savetxt',
 'savez',
 'savez_compressed',
 'sctype2char',
 'sctypeDict',
 'sctypeNA',
 'sctypes',
 'searchsorted',
 'select',
 'set_numeric_ops',
 'set_printoptions',
 'set_string_function',
 'setbufsize',
 'setdiff1d',
 'seterr',
 'seterrcall',
 'seterrobj',
 'setxor1d',
 'shape',
 'shares_memory',
 'short',
 'show_config',
 'sign',
 'signbit',
 'signedinteger',
 'sin',
 'sinc',
 'single',
 'singlecomplex',
 'sinh',
 'size',
 'sometrue',
 'sort',
 'sort_complex',
 'source',
 'spacing',
 'split',
 'sqrt',
 'square',
 'squeeze',
 'stack',
 'std',
 'str',
 'str0',
 'str_',
 'string_',
 'subtract',
 'sum',
 'swapaxes',
 'sys',
 'take',
 'take_along_axis',
 'tan',
 'tanh',
 'tensordot',
 'test',
 'testing',
 'tile',
 'timedelta64',
 'trace',
 'tracemalloc_domain',
 'transpose',
 'trapz',
 'tri',
 'tril',
 'tril_indices',
 'tril_indices_from',
 'trim_zeros',
 'triu',
 'triu_indices',
 'triu_indices_from',
 'true_divide',
 'trunc',
 'typeDict',
 'typeNA',
 'typecodes',
 'typename',
 'ubyte',
 'ufunc',
 'uint',
 'uint0',
 'uint16',
 'uint32',
 'uint64',
 'uint8',
 'uintc',
 'uintp',
 'ulonglong',
 'unicode',
 'unicode_',
 'union1d',
 'unique',
 'unpackbits',
 'unravel_index',
 'unsignedinteger',
 'unwrap',
 'ushort',
 'vander',
 'var',
 'vdot',
 'vectorize',
 'version',
 'void',
 'void0',
 'vsplit',
 'vstack',
 'warnings',
 'where',
 'who',
 'zeros',
 'zeros_like']

As can be seen above, NumPy comes with extensive functionality. One of its big advantages is that it runs fast on large numerical arrays. Let's see some examples:

In [45]:
import random
l = list(range(10000000))
random.shuffle(l)
In [46]:
a = np.array(l) # a NumPy array can also be created directly with the np.arange function
                # a large random array can be created with np.random.randint(1,10000000, size=10000000)

print(type(a))
print(a)
<class 'numpy.ndarray'>
[9258905 4967159  351391 ... 4316321 7690808 2859538]
In [47]:
import sys
print(sys.getsizeof(l))
print(a.nbytes) 
90000112
80000000
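
One caveat about this measurement: sys.getsizeof(l) reports only the list object and its internal array of references, not the int objects those references point to, so the comparison above understates the gap. The following is a rough sketch of a fuller estimate, under the assumption that summing the size of every element is an acceptable upper bound (CPython shares some small ints, so the true figure is slightly lower). The names container_bytes and element_bytes are only illustrative.

```python
import sys
import numpy as np

n = 100_000
l = list(range(n))
a = np.array(l)

# sys.getsizeof(l) covers only the list object and its array of references.
container_bytes = sys.getsizeof(l)
# Adding the size of every int object gives a rough upper bound for the list.
element_bytes = sum(sys.getsizeof(x) for x in l)

print("list container only:      ", container_bytes)
print("list container + elements:", container_bytes + element_bytes)
print("ndarray data buffer:      ", a.nbytes)  # itemsize * number of elements
```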

We can see that the NumPy ndarray object consumes less memory than the list. Let's see other NumPy advantages:

In [48]:
%timeit sum(l) / float(len(l))
%timeit np.average(a)
353 ms ± 9.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.89 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [49]:
%timeit max(l)
%timeit np.max(a)
1.16 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
9.5 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As we can see, NumPy's average and max functions are considerably faster than their built-in counterparts for lists (by a factor of roughly 40 for the average and roughly 120 for the maximum in the runs above). Additionally, NumPy lets us do numerical operations on whole arrays in a very easy and straightforward way. Let's add two lists element by element:

In [50]:
l1 = list(range(1000000))
random.shuffle(l1)
l2 = list(range(1000000))
random.shuffle(l2)
a1 = np.array(l1)
a2 = np.array(l2)
In [51]:
# adding two lists element by element (the regular '+' operator on lists concatenates them)
l = []
for i in range(1000000):
    l.append(l1[i] + l2[i])
l[:10]
Out[51]:
[1673707,
 810930,
 1568831,
 617571,
 845777,
 1407915,
 1140404,
 1556716,
 1072977,
 863667]
In [52]:
a = a1 + a2
a[:10]
Out[52]:
array([1673707,  810930, 1568831,  617571,  845777, 1407915, 1140404,
       1556716, 1072977,  863667])
In [53]:
%timeit l = [l1[i] + l2[i] for i in range(1000000)]
%timeit a = a1 + a2
340 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.24 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

We can see that adding two arrays with NumPy is much simpler and faster (about 1.24 ms versus 340 ms in the runs above). Using NumPy, we can also do other array arithmetic:

In [54]:
print(a1-a2)
print(a1*a2)
print(a1[:10]/a2[:10])
print(np.concatenate([a1,a2]))
[ 159285  300100  155847 ...   31803 -150837  364097]
[693980852656 141886863725 609235604788 ... 815314174954 735169043488
 287055317388]
[1.21035748 2.17495057 1.22059273 0.02574788 0.92570423 2.04359434
 0.66429856 0.84085795 7.09763405 0.13727336]
[916496 555515 862339 ... 887186 936149 383811]
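
Because the arrays above are large and shuffled, the results are hard to check by eye. The same operations on two tiny arrays make the element-wise behavior visible; this is only an illustrative sketch with made-up values.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Arithmetic operators pair up the items at the same position.
print(x + y)   # 11, 22, 33, 44
print(x * y)   # 10, 40, 90, 160

# A scalar operand is applied to every element (broadcasting).
print(x * 2)   # 2, 4, 6, 8
print(y / 10)  # 1.0, 2.0, 3.0, 4.0 (true division returns floats)

# Joining arrays end to end, which is what '+' does for lists, is a separate function.
print(np.concatenate([x, y]))
```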

Let's look at other useful features of the NumPy ndarray object, starting with element types:

In [55]:
a1 = np.array([1,2,3,4,5], dtype = np.int32) # array of integer
print("Array type %s" % a1.dtype)
a2 = np.array([1,2,3,4,5], dtype = np.float64) # array of floats
print("Array type %s" % a2.dtype)
Array type int32
Array type float64
In [56]:
a3 = a2.astype('float128') # extended precision; float128 is platform-dependent and is not available in every NumPy build
print("Array type %s" % a3.dtype)
Array type float128
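
The dtype determines both how values are stored and how much memory each element takes. A small sketch with made-up values shows what astype does when converting floats to integers, and the per-element size of a few common types:

```python
import numpy as np

floats = np.array([1.7, 2.2, 3.9])
ints = floats.astype(np.int32)  # the fractional part is dropped, not rounded
print(ints)                     # 1, 2, 3

# itemsize is the number of bytes used per element
for dt in (np.int8, np.int32, np.float64):
    print(np.dtype(dt).name, np.dtype(dt).itemsize)
```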

We can also create a two-dimensional array (or, more generally, an n-dimensional array):

In [57]:
a4 = np.array([[1,2,3,4],[5,6,7,8]])
a4[1] # access the second row
Out[57]:
array([5, 6, 7, 8])
In [58]:
a4[1,1] # access a specific value
Out[58]:
6
In [59]:
a4[:1,2:3] # slicing keeps both dimensions: first row, third column
Out[59]:
array([[3]])
In [60]:
a4.reshape((1,8))
Out[60]:
array([[1, 2, 3, 4, 5, 6, 7, 8]])
In [61]:
a4.reshape((4,2))
Out[61]:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
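
Two related details are worth a short sketch: reshape returns a new array object and leaves the shape of the original untouched, and one dimension can be given as -1 so that NumPy infers it from the number of elements. The variable names below are only illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

flat = m.reshape(-1)       # -1 lets NumPy infer the length: shape (8,)
tall = m.reshape((4, -1))  # 4 rows, number of columns inferred: shape (4, 2)
second_col = m[:, 1]       # every row, column index 1

print(flat.shape, tall.shape, m.shape)  # m itself keeps its shape (2, 4)
print(second_col)
```
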
In [62]:
import numpy as np
a5 = np.array(['Jhon', "Marry", "William", "BoB"])
a5[a5 == "Marry"]
Out[62]:
array(['Marry'], dtype='<U7')
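
The expression a5 == "Marry" produces an array of booleans, and indexing with it keeps only the positions that are True. The same idea works for numeric conditions; here is a minimal sketch with made-up scores:

```python
import numpy as np

scores = np.array([55, 82, 98, 70, 91])

mask = scores >= 80   # array of True/False values, one per element
print(mask)
print(scores[mask])   # only the elements where mask is True

# Combine conditions with & (and) and | (or); the parentheses are required.
in_range = scores[(scores >= 80) & (scores < 95)]
print(in_range)
```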

It is important to remember that, unlike slicing a regular Python list, slicing an array does not create a copy; it returns a view on the original data.

In [63]:
l = list(range(10))
a = np.array(l)
sub_l = l[2:-1]
sub_a = a[2:-1]
In [64]:
sub_l
Out[64]:
[2, 3, 4, 5, 6, 7, 8]
In [65]:
sub_a
Out[65]:
array([2, 3, 4, 5, 6, 7, 8])
In [66]:
sub_l[0] = 100
print(sub_l)
print(l)
[100, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [67]:
sub_a[0] = 100
print(sub_a)
print(a)
[100   3   4   5   6   7   8]
[  0   1 100   3   4   5   6   7   8   9]
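
When an independent subarray is needed, the slice can be copied explicitly. The sketch below uses the copy method and np.shares_memory to make the difference between a view and a copy visible:

```python
import numpy as np

a = np.arange(10)
view = a[2:5]         # a slice shares memory with a
copy = a[2:5].copy()  # an explicit copy owns its own data

copy[0] = 100         # does not touch a
print(a)              # still 0..9
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```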

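In the slicing example above, changing a value in the subarray sub_a also changed the original array a, whereas changing sub_l left the list l untouched. NumPy also has a Not a Number value (denoted NaN). Any arithmetic that involves NaN yields NaN, so a single NaN is enough to turn the sum of an array into NaN: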

In [68]:
a = np.array([1,np.nan,2,4])
In [69]:
np.sum(a)
Out[69]:
nan

To calculate various statistics that include NaN values, we can use the following functions:

In [70]:
print("Sum = %s" % np.nansum(a))
print("Max = %s" % np.nanmax(a))
print("STD = %s" % np.nanstd(a))
Sum = 7.0
Max = 4.0
STD = 1.247219128924647

Or, we can remove the NaN values from the array:

In [71]:
a = a[np.logical_not(np.isnan(a))] # boolean mask that keeps only the non-NaN values
print("Sum = %s" % np.sum(a))
print("Max = %s" % np.max(a))
print("STD = %s" % np.std(a))
Sum = 7.0
Max = 4.0
STD = 1.247219128924647
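
A few properties of NaN explain why the dedicated functions are needed. The following short sketch reuses the same four values:

```python
import numpy as np

a = np.array([1, np.nan, 2, 4])

print(np.nan == np.nan)  # False: NaN is not equal to anything, itself included
print(np.isnan(a))       # so NaN has to be detected with np.isnan
clean = a[~np.isnan(a)]  # ~ negates a boolean mask, like np.logical_not
print(clean)
print(np.nanmean(a))     # mean of the non-NaN values only
```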

4. Working with Pandas

Let's start with an example. Similar to our work with DBMS and SQL in the last lecture, we want to explore the Netflix Shows dataset, but this time using Pandas. We will start by downloading the dataset from Kaggle (see the first lecture for details):

In [72]:
# Creating a dataset directory
!mkdir ./datasets/netflix

# download the dataset from Kaggle and unzip it 
!kaggle datasets download chasewillden/netflix-shows -p ./datasets/netflix/
!unzip ./datasets/netflix/*.zip  -d ./datasets/netflix
!ls ./datasets/netflix
Downloading netflix-shows.zip to ./datasets/netflix
  0%|                                               | 0.00/10.6k [00:00<?, ?B/s]
100%|██████████████████████████████████████| 10.6k/10.6k [00:00<00:00, 8.98MB/s]
Archive:  ./datasets/netflix/netflix-shows.zip
  inflating: ./datasets/netflix/Netflix Shows.csv  
Netflix Shows.csv netflix-shows.zip
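
The mkdir, unzip, and ls steps rely on a Unix shell. On systems without these commands, the same steps can be done with Python's standard library. The helper extract_all below is a hypothetical name introduced for this sketch, not part of any package, and the download itself is still left to the kaggle command.

```python
import zipfile
from pathlib import Path

def extract_all(zip_path, target_dir):
    """Create target_dir if needed, extract zip_path into it, and list the result."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)                  # like `unzip ... -d target_dir`
    return sorted(p.name for p in target.iterdir())  # like `ls target_dir`

# Example call, using the paths from the shell commands above:
# extract_all("./datasets/netflix/netflix-shows.zip", "./datasets/netflix")
```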

Now, let's load the show data into a DataFrame object. We start by importing Pandas and NumPy:

In [75]:
import pandas as pd
import numpy as np

Then, we load the Netflix Shows CSV file into a Pandas DataFrame object:

In [76]:
df = pd.read_csv('./datasets/netflix/Netflix Shows.csv', encoding='cp437' )
df
Out[76]:
     title                    rating  ratingLevel                                        ratingDescription  release year  user rating score  user rating size
0    White Chicks             PG-13   crude and sexual humor, language and some drug...  80                 2004          82.0               80
1    Lucky Number Slevin      R       strong violence, sexual content and adult lang...  100                2006          NaN                82
2    Grey's Anatomy           TV-14   Parents strongly cautioned. May be unsuitable ...  90                 2016          98.0               80
3    Prison Break             TV-14   Parents strongly cautioned. May be unsuitable ...  90                 2008          98.0               80
4    How I Met Your Mother    TV-PG   Parental guidance suggested. May not be suitab...  70                 2014          94.0               80
...  ...                      ...     ...                                                ...                ...           ...                ...
995  The BFG                  PG      for action/peril, some scary moments and brief...  60                 2016          97.0               80
996  The Secret Life of Pets  PG      for action and some rude humor                     60                 2016          NaN                81
997  Precious Puppies         TV-G    Suitable for all ages.                             35                 2003          NaN                82
998  Beary Tales              TV-G    Suitable for all ages.                             35                 2013          NaN                82
999  Growing Up Wild          G       NaN                                                35                 2016          80.0               80

1000 rows × 7 columns

Let's take a closer look at the user rating score column:

In [77]:
c = df['user rating score']
c
Out[77]:
0      82.0
1       NaN
2      98.0
3      98.0
4      94.0
       ... 
995    97.0
996     NaN
997     NaN
998     NaN
999    80.0
Name: user rating score, Length: 1000, dtype: float64
In [78]:
print("Min = %s" % np.nanmin(c))
print("Max = %s" % np.nanmax(c))
print("Median = %s" % np.nanmedian(c))
print("Average = %s" % np.nanmean(c))
Min = 55.0
Max = 99.0
Median = 88.0
Average = 84.09421487603306
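
The same statistics are also available as methods on the Series itself, and these methods skip NaN values by default. Here is a minimal sketch on a few made-up scores (not the real column):

```python
import numpy as np
import pandas as pd

scores = pd.Series([82.0, np.nan, 98.0, 98.0, 94.0])  # made-up values

# Series statistics ignore NaN unless skipna=False is passed.
print(scores.min(), scores.max(), scores.median(), scores.mean())
print(scores.isna().sum())  # number of missing values
print(scores.describe())    # count, mean, std, min, quartiles, and max in one call
```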

We can create a nice visualization of the user rating score column using the Seaborn package:

In [79]:
!pip install seaborn
In [80]:
import seaborn as sns
c = c.dropna() # removes NaN values
sns.set(style="darkgrid")
sns.distplot(c) # distplot is deprecated since seaborn 0.11; sns.histplot(c, kde=True) is a close replacement there
Out[80]:
<matplotlib.axes._subplots.AxesSubplot at 0x101af7b6a0>

We can also visualize other interesting statistics, such as the histogram of user rating scores for each rating and decade, in just a few lines of code:

In [81]:
import matplotlib.pyplot as plt
df['decade'] = df['release year'] - df['release year'] % 10 # add a decade column
df = df[df['decade'] >= 1980] # select only rows with decade larger or equal 1980
g = sns.FacetGrid(df, row="rating", col="decade", margin_titles=True)
g.map(plt.hist, "user rating score", color="steelblue")
Out[81]:
<seaborn.axisgrid.FacetGrid at 0x101a8a6470>

4.1 Pandas Objects

Pandas has several objects for manipulating data. Let's take a closer look at them. We start with the Series object. The Series object is a generalized NumPy array, which has both data and index values. We can also think of the Series object as a specialized version of Python's dict object that maps index labels to typed values:

In [82]:
s = pd.Series([2,1,2,3,6,7])
s
Out[82]:
0    2
1    1
2    2
3    3
4    6
5    7
dtype: int64
In [83]:
print(s.values)
print(type(s.values))
[2 1 2 3 6 7]
<class 'numpy.ndarray'>
In [84]:
s.index
Out[84]:
RangeIndex(start=0, stop=6, step=1)

We can also use strings as index labels, or create a Series directly from a dict:

In [85]:
s = pd.Series([2,1,2,3,6,7], index=['a','b','c','de','e','f'])
s
Out[85]:
a     2
b     1
c     2
de    3
e     6
f     7
dtype: int64
In [86]:
s['f']
Out[86]:
7
In [87]:
s = pd.Series({'A':1, 'B':2, 'C':4})
s
Out[87]:
A    1
B    2
C    4
dtype: int64
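
The dict analogy can be made concrete with a short sketch: membership tests and iteration work on the index labels, label slices include the end label, and arithmetic between two Series aligns on labels. The second Series t is made up for the illustration.

```python
import pandas as pd

s = pd.Series({'A': 1, 'B': 2, 'C': 4})

print('B' in s)         # membership is tested against the index labels
print(s['A':'B'])       # label slices include the end label
print(dict(s.items()))  # iterate over (label, value) pairs, like a dict

# Arithmetic aligns on labels; a label missing from one side gives NaN.
t = pd.Series({'B': 10, 'C': 20, 'D': 30})
total = s + t
print(total)
```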

The second object we are going to look at is the DataFrame object. We can think of the DataFrame object as a table with both column and row indices (similar to an Excel sheet :-).

In [88]:
df
Out[88]:
title rating ratingLevel ratingDescription release year user rating score user rating size decade
0 White Chicks PG-13 crude and sexual humor, language and some drug... 80 2004 82.0 80 2000
1 Lucky Number Slevin R strong violence, sexual content and adult lang... 100 2006 NaN 82 2000
2 Grey's Anatomy TV-14 Parents strongly cautioned. May be unsuitable ... 90 2016 98.0 80 2010
3 Prison Break TV-14 Parents strongly cautioned. May be unsuitable ... 90 2008 98.0 80 2000
4 How I Met Your Mother TV-PG Parental guidance suggested. May not be suitab... 70 2014 94.0 80 2010
... ... ... ... ... ... ... ... ...
995 The BFG PG for action/peril, some scary moments and brief... 60 2016 97.0 80 2010
996 The Secret Life of Pets PG for action and some rude humor 60 2016 NaN 81 2010
997 Precious Puppies TV-G Suitable for all ages. 35 2003 NaN 82 2000
998 Beary Tales TV-G Suitable for all ages. 35 2013 NaN 82 2010
999 Growing Up Wild G NaN 35 2016 80.0 80 2010

991 rows × 8 columns

In [89]:
print("DataFrame columns %s" % df.columns)
print("DataFrame Index %s" % df.index)
DataFrame columns Index(['title', 'rating', 'ratingLevel', 'ratingDescription', 'release year',
       'user rating score', 'user rating size', 'decade'],
      dtype='object')
DataFrame Index Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
           dtype='int64', length=991)

We can replace the default integer index with the values of the title column:

In [90]:
df.index = df["title"]
In [91]:
type(df["title"])
Out[91]:
pandas.core.series.Series
In [92]:
df['rating']
Out[92]:
title
White Chicks               PG-13
Lucky Number Slevin            R
Grey's Anatomy             TV-14
Prison Break               TV-14
How I Met Your Mother      TV-PG
                           ...  
The BFG                       PG
The Secret Life of Pets       PG
Precious Puppies            TV-G
Beary Tales                 TV-G
Growing Up Wild                G
Name: rating, Length: 991, dtype: object
In [93]:
df['rating']['Sherlock']
Out[93]:
Sherlock    TV-14
Sherlock    TV-14
Sherlock    TV-14
Name: rating, dtype: object
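
The chained lookup df['rating']['Sherlock'] first selects a column and then a label. The loc indexer does both in one step, and iloc selects by position instead of by label. A minimal sketch on a tiny made-up table with a deliberately duplicated label:

```python
import pandas as pd

shows = pd.DataFrame(
    {"rating": ["TV-14", "TV-MA", "TV-14"], "release year": [2016, 2013, 2017]},
    index=["Sherlock", "Breaking Bad", "Sherlock"],  # made-up rows, duplicate label on purpose
)

print(shows.loc["Breaking Bad", "rating"])  # unique label: a single value
print(shows.loc["Sherlock", "rating"])      # duplicated label: a Series with one entry per row
print(shows.iloc[0])                        # position-based: the first row
```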

We can create new DataFrame objects by selecting columns and rows:

In [94]:
#selecting specific columns
df2 = df[['title','rating', 'release year']]
df2
Out[94]:
title rating release year
title
White Chicks White Chicks PG-13 2004
Lucky Number Slevin Lucky Number Slevin R 2006
Grey's Anatomy Grey's Anatomy TV-14 2016
Prison Break Prison Break TV-14 2008
How I Met Your Mother How I Met Your Mother TV-PG 2014
... ... ... ...
The BFG The BFG PG 2016
The Secret Life of Pets The Secret Life of Pets PG 2016
Precious Puppies Precious Puppies TV-G 2003
Beary Tales Beary Tales TV-G 2013
Growing Up Wild Growing Up Wild G 2016

991 rows × 3 columns

In [95]:
# selecting the first 10 rows
df.iloc[:10]
Out[95]:
title rating ratingLevel ratingDescription release year user rating score user rating size decade
title
White Chicks White Chicks PG-13 crude and sexual humor, language and some drug... 80 2004 82.0 80 2000
Lucky Number Slevin Lucky Number Slevin R strong violence, sexual content and adult lang... 100 2006 NaN 82 2000
Grey's Anatomy Grey's Anatomy TV-14 Parents strongly cautioned. May be unsuitable ... 90 2016 98.0 80 2010
Prison Break Prison Break TV-14 Parents strongly cautioned. May be unsuitable ... 90 2008 98.0 80 2000
How I Met Your Mother How I Met Your Mother TV-PG Parental guidance suggested. May not be suitab... 70 2014 94.0 80 2010
Supernatural Supernatural TV-14 Parents strongly cautioned. May be unsuitable ... 90 2016 95.0 80 2010
Breaking Bad Breaking Bad TV-MA For mature audiences. May not be suitable for... 110 2013 97.0 80 2010
The Vampire Diaries The Vampire Diaries TV-14 Parents strongly cautioned. May be unsuitable ... 90 2017 91.0 80 2010
The Walking Dead The Walking Dead TV-MA For mature audiences. May not be suitable for... 110 2015 98.0 80 2010
Pretty Little Liars Pretty Little Liars TV-14 Parents strongly cautioned. May be unsuitable ... 90 2016 96.0 80 2010
In [96]:
# selecting every other row
df.iloc[::2]
Out[96]:
title rating ratingLevel ratingDescription release year user rating score user rating size decade
title
White Chicks White Chicks PG-13 crude and sexual humor, language and some drug... 80 2004 82.0 80 2000
Grey's Anatomy Grey's Anatomy TV-14 Parents strongly cautioned. May be unsuitable ... 90 2016 98.0 80 2010
How I Met Your Mother How I Met Your Mother TV-PG Parental guidance suggested. May not be suitab... 70 2014 94.0 80 2010
Breaking Bad Breaking Bad TV-MA For mature audiences. May not be suitable for... 110 2013 97.0 80 2010
The Walking Dead The Walking Dead TV-MA For mature audiences. May not be suitable for... 110 2015 98.0 80 2010
... ... ... ... ... ... ... ... ...
Dawn of the Croods Dawn of the Croods TV-Y7 Suitable for children ages 7 and older 41 2017 72.0 80 2010
Wiener Dog Internationals Wiener Dog Internationals G General Audiences. Suitable for all ages. 35 2015 NaN 82 2010
The BFG The BFG PG for action/peril, some scary moments and brief... 60 2016 97.0 80 2010
Precious Puppies Precious Puppies TV-G Suitable for all ages. 35 2003 NaN 82 2000
Growing Up Wild Growing Up Wild G NaN 35 2016 80.0 80 2010

496 rows × 8 columns

In [97]:
df2 = df[['title','rating', 'release year']]

There are duplicate rows; let's remove them:

In [98]:
print("DataFrame's shape (%s,%s)" %df2.shape)
df2 = df2.drop_duplicates()
print("DataFrame's shape (%s,%s) (after remove)" %df2.shape)
DataFrame's shape (991,3)
DataFrame's shape (497,3) (after remove)
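
By default, drop_duplicates treats two rows as duplicates only if all of their columns match. The subset parameter restricts the comparison to selected columns. A small sketch on made-up rows:

```python
import pandas as pd

shows = pd.DataFrame({
    "title": ["Sherlock", "Sherlock", "Sherlock", "ER"],
    "rating": ["TV-14", "TV-14", "TV-14", "TV-14"],
    "release year": [2016, 2016, 2017, 2009],
})  # made-up rows

print(shows.duplicated().tolist())  # True marks a repeat of an earlier row
full = shows.drop_duplicates()                     # compares all columns
by_title = shows.drop_duplicates(subset=["title"])  # compares the title only
print(full.shape, by_title.shape)
```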

We can transpose the DataFrame, swapping its rows and columns:

In [99]:
df2.T
Out[99]:
title White Chicks Lucky Number Slevin Grey's Anatomy Prison Break How I Met Your Mother Supernatural Breaking Bad The Vampire Diaries The Walking Dead Pretty Little Liars ... Flicka 2 H2O: Just Add Water Dolphin Tale Step Dogs Mia and Me Russell Madness Wiener Dog Internationals Pup Star Precious Puppies Beary Tales
title White Chicks Lucky Number Slevin Grey's Anatomy Prison Break How I Met Your Mother Supernatural Breaking Bad The Vampire Diaries The Walking Dead Pretty Little Liars ... Flicka 2 H2O: Just Add Water Dolphin Tale Step Dogs Mia and Me Russell Madness Wiener Dog Internationals Pup Star Precious Puppies Beary Tales
rating PG-13 R TV-14 TV-14 TV-PG TV-14 TV-MA TV-14 TV-MA TV-14 ... PG TV-PG PG TV-PG TV-Y PG G G TV-G TV-G
release year 2004 2006 2016 2008 2014 2016 2013 2017 2015 2016 ... 2010 2009 2011 2013 2015 2015 2015 2016 2003 2013

3 rows × 497 columns

Let's select only the shows that were released after 2016:

In [100]:
df2 = df[df['release year'] > 2016]
df2.shape  # shape is (number of rows, number of columns)
Out[100]:
(100, 8)
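
The expression inside the brackets is a boolean Series, so several conditions can be combined before selecting rows. A sketch on made-up rows (the column names follow the dataset, the values do not):

```python
import pandas as pd

shows = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": ["TV-14", "PG", "TV-14", "TV-MA"],
    "release year": [2017, 2017, 2012, 2016],
})  # made-up rows

recent = shows["release year"] > 2016
tv14 = shows["rating"] == "TV-14"

print(shows[recent & tv14])  # both conditions hold
print(shows[recent | tv14])  # at least one condition holds
print(shows[~recent])        # negation
print(shows[shows["rating"].isin(["PG", "TV-MA"])])  # membership in a list of values
```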

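We can also compute the decade column with the apply function. Here df is itself the result of filtering another DataFrame, which is why Pandas prints a SettingWithCopyWarning; the assignment still takes effect: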

In [101]:
df['decade'] = df['release year'].apply(lambda y: y - y%10)
df
/anaconda3/envs/massivedata/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
Out[101]:
title rating ratingLevel ratingDescription release year user rating score user rating size decade
title
White Chicks White Chicks PG-13 crude and sexual humor, language and some drug... 80 2004 82.0 80 2000
Lucky Number Slevin Lucky Number Slevin R strong violence, sexual content and adult lang... 100 2006 NaN 82 2000
Grey's Anatomy Grey's Anatomy TV-14 Parents strongly cautioned. May be unsuitable ... 90 2016 98.0 80 2010
Prison Break Prison Break TV-14 Parents strongly cautioned. May be unsuitable ... 90 2008 98.0 80 2000
How I Met Your Mother How I Met Your Mother TV-PG Parental guidance suggested. May not be suitab... 70 2014 94.0 80 2010
... ... ... ... ... ... ... ... ...
The BFG The BFG PG for action/peril, some scary moments and brief... 60 2016 97.0 80 2010
The Secret Life of Pets The Secret Life of Pets PG for action and some rude humor 60 2016 NaN 81 2010
Precious Puppies Precious Puppies TV-G Suitable for all ages. 35 2003 NaN 82 2000
Beary Tales Beary Tales TV-G Suitable for all ages. 35 2013 NaN 82 2010
Growing Up Wild Growing Up Wild G NaN 35 2016 80.0 80 2010

991 rows × 8 columns
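
One way to avoid the SettingWithCopyWarning shown above is to call copy on the filtered DataFrame before adding columns to it. The decade can also be computed with column arithmetic instead of a lambda. This is a sketch on made-up years, not the cell the lecture ran:

```python
import pandas as pd

shows = pd.DataFrame({"release year": [1978, 1989, 2004, 2016]})  # made-up values

# .copy() makes the filtered result an independent DataFrame,
# so adding a column to it later does not trigger the warning.
recent = shows[shows["release year"] >= 1980].copy()

# Arithmetic on the whole column at once; no lambda needed.
recent["decade"] = recent["release year"] - recent["release year"] % 10
# Equivalent: recent["release year"] // 10 * 10

print(recent)
```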

4.2 Using Groupby

Similar to SQL, Pandas has groupby functionality. Namely, we can use the groupby function to create a DataFrameGroupBy object, which splits the rows into groups that share the same value in the given column:

In [102]:
df3 = df[['user rating score','decade', "release year"]]
df3 = df3.dropna()
g = df3.groupby("decade")
g
Out[102]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x101c8c6c18>

We can access each group directly using the get_group function:

In [103]:
g.get_group(2010)
Out[103]:
user rating score decade release year
title
Grey's Anatomy 98.0 2010 2016
How I Met Your Mother 94.0 2010 2014
Supernatural 95.0 2010 2016
Breaking Bad 97.0 2010 2013
The Vampire Diaries 91.0 2010 2017
... ... ... ...
Growing Up Wild 80.0 2010 2016
Paddington 70.0 2010 2014
Dawn of the Croods 72.0 2010 2017
The BFG 97.0 2010 2016
Growing Up Wild 80.0 2010 2016

423 rows × 3 columns

Or iterate over the groups:

In [104]:
d = {}
for decade, decade_df in g: # each iteration yields (group key, sub-DataFrame)
    d[decade] = decade_df
d[2010]
Out[104]:
user rating score decade release year
title
Grey's Anatomy 98.0 2010 2016
How I Met Your Mother 94.0 2010 2014
Supernatural 95.0 2010 2016
Breaking Bad 97.0 2010 2013
The Vampire Diaries 91.0 2010 2017
... ... ... ...
Growing Up Wild 80.0 2010 2016
Paddington 70.0 2010 2014
Dawn of the Croods 72.0 2010 2017
The BFG 97.0 2010 2016
Growing Up Wild 80.0 2010 2016

423 rows × 3 columns

In [105]:
g.max()
Out[105]:
        user rating score  release year
decade
1980                 80.0          1989
1990                 96.0          1999
2000                 98.0          2009
2010                 99.0          2017
In [106]:
g.describe()
Out[106]:
user rating score
        count       mean        std   min   25%   50%   75%   max
decade
1980     10.0  66.800000   6.088240  58.0  67.0  67.0  68.0  80.0
1990     73.0  79.808219   9.974492  64.0  71.0  81.0  86.0  96.0
2000     94.0  83.010638  13.907911  55.0  74.0  89.0  95.0  98.0
2010    423.0  85.638298  11.873709  55.0  79.0  91.0  95.5  99.0

release year
        count         mean       std     min     25%     50%      75%     max
decade
1980     10.0  1985.300000  2.451757  1982.0  1983.0  1986.0  1986.75  1989.0
1990     73.0  1996.164384  2.608819  1990.0  1994.0  1997.0  1998.00  1999.0
2000     94.0  2004.978723  2.873605  2000.0  2003.0  2006.0  2007.00  2009.0
2010    423.0  2015.200946  1.667873  2010.0  2015.0  2016.0  2016.00  2017.0
In [107]:
g.plot( y='user rating score')
Out[107]:
decade
1980    AxesSubplot(0.125,0.125;0.775x0.755)
1990    AxesSubplot(0.125,0.125;0.775x0.755)
2000    AxesSubplot(0.125,0.125;0.775x0.755)
2010    AxesSubplot(0.125,0.125;0.775x0.755)
dtype: object
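
When only a few statistics per group are needed, the agg function computes exactly those instead of the full describe table. A minimal sketch on made-up scores:

```python
import pandas as pd

scores = pd.DataFrame({
    "decade": [1990, 1990, 2000, 2010, 2010, 2010],
    "user rating score": [80.0, 90.0, 70.0, 95.0, 85.0, 99.0],
})  # made-up values

# One row per decade, one column per requested statistic.
summary = scores.groupby("decade")["user rating score"].agg(["count", "mean", "max"])
print(summary)
```
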
In [108]:
pd.__version__
Out[108]:
'0.25.1'

5. Real-World Examples

5.1 Emmy Awards Dataset

Let's download the Primetime Emmy Awards dataset and try to discover some interesting insights:

In [121]:
# Creating a dataset directory
!mkdir ./datasets/emmy

# download the dataset from Kaggle and unzip it
!kaggle datasets download pmagda/primetime-emmy-awards  -p ./datasets/emmy
!unzip ./datasets/emmy/*.zip  -d ./datasets/emmy
!ls ./datasets/emmy
Downloading primetime-emmy-awards.zip to ./datasets/emmy
100%|█████████████████████████████████████████| 235k/235k [00:00<00:00, 812kB/s]
100%|█████████████████████████████████████████| 235k/235k [00:00<00:00, 809kB/s]
Archive:  ./datasets/emmy/primetime-emmy-awards.zip
  inflating: ./datasets/emmy/emmy-awards-1949-2017.csv  
emmy-awards-1949-2017.csv primetime-emmy-awards.zip
In [122]:
import pandas as pd
%matplotlib inline 
df = pd.read_csv("./datasets/emmy/emmy-awards-1949-2017.csv")
df
Out[122]:
year category winner nominee detail
0 1949 Technical Award 1 Charles Mesak/Don Lee TV for Phasefader - In R... NaN
1 1949 Best Film Made For Television 0 Christopher Columbus CBS
2 1949 Best Film Made For Television 0 Hollywood Brevities CBS
3 1949 Best Film Made For Television 0 It Could Happen To You CBS
4 1949 Best Film Made For Television 0 Tell Tale Heart CBS
... ... ... ... ... ...
19234 2017 Outstanding Production Design For Variety, Non... 0 Bill Nye Saves The World - "Earth Is A Hot Mess" Netflix
19235 2017 Outstanding Production Design For Variety, Non... 0 Drunk History - "Hamilton" Comedy Central
19236 2017 Outstanding Production Design For Variety, Non... 0 Portlandia - "Fred's Cell Phone Company" IFC
19237 2017 Outstanding Production Design For Variety, Non... 1 Saturday Night Live - "Host: Alec Baldwin" NBC
19238 2017 Outstanding Production Design For Variety, Non... 0 The Voice - "Live Finale, Part 1" NBC

19239 rows × 5 columns

Let's see how many Emmy Award winners we have in each year:

In [123]:
g = df[['year', 'winner']].groupby('year')
g.get_group(1952)
type(g)
g
Out[123]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x101f805f60>
In [124]:
g.sum().plot(y="winner")
Out[124]:
<matplotlib.axes._subplots.AxesSubplot at 0x101fdce278>
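
Summing works as a count here because the winner column holds 1 for a winner and 0 for a nominee who did not win. On the same grouping, count gives the number of nominations and mean gives the share of winners. A sketch on made-up rows, with illustrative column names for the result:

```python
import pandas as pd

awards = pd.DataFrame({
    "year":   [1949, 1949, 1949, 1950, 1950],
    "winner": [1, 0, 0, 1, 1],
})  # made-up rows

per_year = awards.groupby("year")["winner"].agg(["sum", "count", "mean"])
per_year.columns = ["winners", "nominations", "win_rate"]  # illustrative names
print(per_year)
```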

As we can see, the number of Emmy winners per year has increased over the years. Let's calculate how many winners each category had over the years. First, we count the distinct categories:

In [125]:
print("Number of categories %s" % df['category'].unique().shape[0])
Number of categories 1043

Wow, over the years there were over 1000 unique categories. Let's see which categories have the most winners:

In [126]:
g = df[['category', 'winner']].groupby('category')
g.sum().sort_values(['winner'], ascending=False)
Out[126]:
category                                                                          winner
Outstanding Individual Achievement                                                    75
Outstanding Individual Achievement In Animation                                       67
Outstanding Comedy Series                                                             50
Outstanding Lead Actress In A Drama Series                                            42
Outstanding Lead Actor In A Comedy Series                                             42
...                                                                                  ...
Outstanding Single Performance By A Supporting Actor In Comedy Or Drama Special        0
Best Contribution To Daytime Programming                                               0
Outstanding Achievement In Graphic Design                                              0
Outstanding Single Performance By A Supporting Actress In Comedy Or Drama Series       0
Outstanding Makeup For Miniseries, Movie, Or A Special (Prosthetic)                    0

1042 rows × 1 columns

We can see that for some categories, we have data only on the nominees, but not on the winners. Let's check which nominees have won the most awards:

In [127]:
winners_df = df[df['winner'] != 0]
winners_df
Out[127]:
year category winner nominee detail
0 1949 Technical Award 1 Charles Mesak/Don Lee TV for Phasefader - In R... NaN
5 1949 Best Film Made For Television 1 The Necklace (Your Show Time Series) CBS
11 1949 Most Outstanding Television Personality 1 Shirley Dinsdale NaN
18 1949 Most Popular Television Program 1 Pantomime Quiz Time KTLA
22 1949 Special Award 1 Louis McManus For His Original Design of the Emmy NaN
... ... ... ... ... ...
19218 2017 Outstanding Innovation In Interactive Programming 1 PEARL NaN
19220 2017 Outstanding Music Supervision 1 Big Little Lies - "You Get What You Need" HBO
19228 2017 Outstanding Original Interactive Program 1 The People's House - Inside The White House Wi... Samsung / Oculus
19231 2017 Outstanding Picture Editing For Variety Progra... 1 Last Week Tonight With John Oliver - "F*ck 201... HBO
19237 2017 Outstanding Production Design For Variety, Non... 1 Saturday Night Live - "Host: Alec Baldwin" NBC

4291 rows × 5 columns

In [128]:
winners_df[['nominee', 'winner']].groupby('nominee').count().sort_values(['winner'], ascending=False)
Out[128]:
winner
nominee
Frasier 24
The Daily Show With Jon Stewart 19
ER 18
The Carol Burnett Show 17
STAR TREK: THE NEXT GENERATION 17
... ...
Invasion America 1
Invader Zim 1
Into The Storm 1
Intervention - "Robby" 1
Lucky Strike 1

2556 rows × 1 columns

So, we have some clues about what is in the data and who the all-time Emmy winners are. However, we didn't clean the data before analyzing it, which can lead to inaccurate results. For example, let's look at the nominee names that include the word 'MTV':

In [129]:
df[df['nominee'].apply(lambda n: "MTV" in str(n))] # Another option is to search for MTV in the 'detail' column
Out[129]:
year category winner nominee detail
5856 1985 Outstanding Art Direction For A Variety Or Mus... 0 1st Annual MTV Video Music Awards SYN
8213 1992 Outstanding Individual Achievement In Graphic ... 1 MTV LIQUID TV MTV
8426 1993 Outstanding Technical Direction/Camera/Video F... 0 MTV UNPLUGGED MTV
8451 1993 Outstanding Variety, Music Or Comedy Series 0 MTV UNPLUGGED MTV
8525 1993 Outstanding Individual Achievement In Graphic ... 0 MTV VIDEO MUSIC AWARDS MTV
8648 1993 Outstanding Individual Achievement In Sound Mi... 0 MTV UNPLUGGED MTV
8812 1994 Outstanding Variety, Music Or Comedy Series 0 MTV UNPLUGGED MTV
9155 1995 Outstanding Variety, Music Or Comedy Series 0 MTV UNPLUGGED MTV
9433 1996 Outstanding Choreography 0 1995 MTV Video Music Awards MTV
10786 1999 Outstanding Technical Direction/Camera/Video F... 0 1998 MTV Video Music Awards MTV
11072 2000 Outstanding Animated Program (For Programming ... 0 MTV Downtown MTV
14142 2008 Outstanding Art Direction For A Variety, Music... 0 2007 MTV Video Music Awards MTV
14597 2009 Outstanding Art Direction For Variety, Music O... 1 2008 MTV Video Music Awards MTV
15557 2011 Outstanding Art Direction For Variety, Music O... 1 2010 MTV Video Music Awards MTV

We can observe that some of the nominees, such as 2010 MTV Video Music Awards, include a specific year. Therefore, in the aggregation process we won't accurately count all MTV Video Music Awards nominees. This is a great example of why it is very important to get familiar with and clean the data before analyzing it.
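One possible way to reduce this problem is to normalize the nominee names before aggregating. The following is only a minimal sketch: the helper name normalize_nominee and the regular expression are assumptions made for illustration, and they cover only the patterns visible in the table above (a leading four-digit year, a leading "1st Annual", and differences in letter case). Note that the rule is lossy: a title that legitimately starts with a year would be shortened too.

```python
import re

# Hypothetical pattern: a leading year such as "2010 " or an ordinal
# such as "1st Annual " at the start of the nominee name.
PREFIX_RE = re.compile(
    r"^\s*(?:(?:19|20)\d{2}|\d+(?:st|nd|rd|th)\s+Annual)\s+",
    re.IGNORECASE,
)

def normalize_nominee(name):
    """Strip a leading year/ordinal and unify the letter case."""
    cleaned = PREFIX_RE.sub("", str(name).strip())
    return cleaned.upper()

examples = [
    "2010 MTV Video Music Awards",
    "1st Annual MTV Video Music Awards",
    "MTV VIDEO MUSIC AWARDS",
    "MTV UNPLUGGED",
]
normalized = [normalize_nominee(n) for n in examples]
print(normalized)
```

With the DataFrame above, the same helper could be applied with df['nominee'].apply(normalize_nominee) and the result stored in a new column, so that the original names stay available for inspection.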

Further Reading:

5.2 Lord of the Rings Dataset

In this example, we will use the Lord of the Rings dataset, which includes character data as well as script data. First, let's download the dataset:

In [130]:
# Creating a dataset directory
!mkdir ./datasets/ltor

# download the dataset from Kaggle and unzip it
!kaggle datasets download paultimothymooney/lord-of-the-rings-data  -p ./datasets/ltor
!unzip ./datasets/ltor/*.zip  -d ./datasets/ltor
!ls ./datasets/ltor
Downloading lord-of-the-rings-data.zip to ./datasets/ltor
100%|███████████████████████████████████████| 88.0k/88.0k [00:00<00:00, 559kB/s]
100%|███████████████████████████████████████| 88.0k/88.0k [00:00<00:00, 556kB/s]
Archive:  ./datasets/ltor/lord-of-the-rings-data.zip
  inflating: ./datasets/ltor/lotr_characters.csv  
  inflating: ./datasets/ltor/lotr_scripts.csv  
lord-of-the-rings-data.zip lotr_scripts.csv
lotr_characters.csv
In [131]:
import re 
import numpy as np
import pandas as pd
import networkx as nx

df = pd.read_csv("./datasets/ltor/lotr_characters.csv")
df = df.fillna('')
df
Out[131]:
birth death gender hair height name race realm spouse
0 Female Adanel Men Belemir
1 TA 2978 February 26 ,3019 Male Dark (book) Light brown (movie) Boromir Men
2 March ,3019 Male Lagduf Orcs
3 TA 280 TA 515 Male Tarcil Men Arnor Unnamed wife
4 Male Fire-drake of Gondolin Dragon
... ... ... ... ... ... ... ... ... ...
906 Mid ,First Age FA 495 Female Aerin Men Brodda
907 Aerandir
908 YT during the ,Noontide of Valinor FA 455 Male Golden Aegnor Elves Loved ,Andreth but remained unmarried
909 TA 2917 TA 3010 Male Adrahil II Men Unnamed wife
910 Before ,TA 1944 Late ,Third Age Male Adrahil I Men

911 rows × 9 columns

Let's count the number of characters of each race and gender:

In [132]:
g = df.groupby(['race', 'gender'])
# g.size() returns a Series; convert it to a DataFrame so we can sort by the counts
g.size().to_frame(name='counts').sort_values('counts', ascending=False)
Out[132]:
counts
race gender
Men Male 334
137
Hobbits Male 89
Elves Male 76
Men Female 51
... ... ...
Dragon Male 1
Black Uruk Male 1
Hobbits male 1
Eagle Male 1
Wolfhound Male 1

61 rows × 1 columns
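The output above contains both "Hobbits Male" and "Hobbits male", so the same group is counted twice because of inconsistent capitalization. A minimal sketch of one way to handle this is shown below on a small toy DataFrame (the rows are made up for illustration). The helper name normalize_gender and the "Unknown" label for empty values are assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy data that imitates the inconsistencies seen in the real table
toy = pd.DataFrame({
    "race": ["Hobbits", "Hobbits", "Hobbits", "Men", "Men"],
    "gender": ["Male", "male", " Male", "Female", ""],
})

def normalize_gender(series):
    """Trim whitespace, unify capitalization, and label empty values."""
    cleaned = series.astype(str).str.strip().str.capitalize()
    # "Unknown" is a hypothetical label for missing gender values
    return cleaned.replace("", "Unknown")

toy["gender_norm"] = normalize_gender(toy["gender"])

counts = (
    toy.groupby(["race", "gender_norm"])
       .size()
       .to_frame(name="counts")
       .sort_values("counts", ascending=False)
)
print(counts)
```

After normalization, the three Hobbit rows fall into a single (Hobbits, Male) group instead of being split by spelling.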

Let's draw the spouse network. First, let's clean the data and create a DataFrame object that holds all the couples' data:

In [133]:
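import re

# Matches entries such as "Loved ,Andreth but remained unmarried"
LOVED_RE = re.compile(r"Loved ,(\w+)")

def normalize_spouse(s):
    # Treat empty or unnamed spouses as missing.
    # Note: other placeholders, such as 'None known', are not filtered here.
    if s in {'', 'Unnamed wife', 'None'}:
        return None
    if "Loved" not in s:
        return s
    # Keep only the name that follows "Loved ,"
    m = LOVED_RE.search(s)
    return m.group(1) if m else None

df['norm_spouse'] = df['spouse'].apply(normalize_spouse)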
    
links_df = df[['name', 'norm_spouse', 'race', 'gender']]
links_df = links_df.dropna()
links_df
Out[133]:
name norm_spouse race gender
0 Adanel Belemir Men Female
7 Angrod Eldalótë Elves Male
10 Ar-Pharazôn Tar-Míriel Men Male
12 Ar-Gimilzôr Inzilbêth Men Male
21 Tarannon Falastur Berúthiel Men Male
... ... ... ... ...
897 Ailinel Orchaldor Men Female
898 Almarian Tar-Meneldur Men Female
899 Algund None known Men Male
906 Aerin Brodda Men Female
908 Aegnor Andreth Elves Male

228 rows × 4 columns

In [134]:
links_df2 = pd.merge(links_df, links_df, left_on="name", right_on="norm_spouse")
links_df2
Out[134]:
name_x norm_spouse_x race_x gender_x name_y norm_spouse_y race_y gender_y
0 Adanel Belemir Men Female Belemir Adanel Men Male
1 Angrod Eldalótë Elves Male Eldalótë Angrod Elves Female
2 Ar-Pharazôn Tar-Míriel Men Male Tar-Míriel Ar-Pharazôn Men Female
3 Ar-Gimilzôr Inzilbêth Men Male Inzilbêth Ar-Gimilzôr Men Female
4 Tarannon Falastur Berúthiel Men Male Berúthiel Tarannon Falastur Men Female
... ... ... ... ... ... ... ... ...
122 Amroth Nimrodel Elves Male Nimrodel Amroth Elves Female
123 Amarië Finrod Elves Female Finrod Amarië Elves Male
124 Ailinel Orchaldor Men Female Orchaldor Ailinel Men Male
125 Almarian Tar-Meneldur Men Female Tar-Meneldur Almarian Men Male
126 Aegnor Andreth Elves Male Andreth Aegnor Men Female

127 rows × 8 columns
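To make the self-merge easier to follow, here is a minimal sketch on a three-row toy table. The names are taken from the output above, but the table itself is constructed only for illustration. The inner join keeps a row only when some other character lists that name as their spouse, so placeholder values such as "None known" drop out, and each couple appears twice (once in each direction).

```python
import pandas as pd

toy_links = pd.DataFrame({
    "name": ["Adanel", "Belemir", "Algund"],
    "norm_spouse": ["Belemir", "Adanel", "None known"],
})

# Join every character to the row of the character who lists them as spouse
pairs = pd.merge(toy_links, toy_links, left_on="name", right_on="norm_spouse")
print(pairs[["name_x", "name_y"]])

# Each couple shows up in both directions; an unordered set removes the duplicates
unique_couples = {frozenset(p) for p in zip(pairs["name_x"], pairs["name_y"])}
print(unique_couples)
```

The duplicated direction is harmless for an undirected graph, where the edge between two nodes is stored only once regardless of how many times it is added.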

In [135]:
links_df2 = links_df2[["name_x","race_x", "gender_x", "name_y", "race_y", "gender_y"]]
links_df2
Out[135]:
name_x race_x gender_x name_y race_y gender_y
0 Adanel Men Female Belemir Men Male
1 Angrod Elves Male Eldalótë Elves Female
2 Ar-Pharazôn Men Male Tar-Míriel Men Female
3 Ar-Gimilzôr Men Male Inzilbêth Men Female
4 Tarannon Falastur Men Male Berúthiel Men Female
... ... ... ... ... ... ...
122 Amroth Elves Male Nimrodel Elves Female
123 Amarië Elves Female Finrod Elves Male
124 Ailinel Men Female Orchaldor Men Male
125 Almarian Men Female Tar-Meneldur Men Male
126 Aegnor Elves Male Andreth Men Female

127 rows × 6 columns

Next, let's create a NetworkX graph object:

In [136]:
g = nx.Graph()

# Add both spouses as nodes, storing race and gender as node attributes
for _, r in links_df2.iterrows():
    g.add_node(r['name_x'], race=r["race_x"], gender=r["gender_x"])
    g.add_node(r['name_y'], race=r["race_y"], gender=r["gender_y"])

# Build the edge list once, outside the loop: one edge per couple
edges_list = list(zip(links_df2['name_x'], links_df2['name_y']))
g.add_edges_from(edges_list)
In [137]:
import matplotlib.pyplot as plt

# Color each node by gender: blue for male, red for female, green otherwise
gender_colors = {"Male": "blue", "Female": "red"}
node_colors = [gender_colors.get(attrs['gender'], 'green')
               for _, attrs in g.nodes(data=True)]

plt.figure(3,figsize=(14,14))
nx.draw_spring(g, with_labels=True, node_color=node_colors)

We can also draw the vertices with different shapes according to each character's race. We will show how to do this later in the course.
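In the meantime, here is a rough sketch of one possible approach, not necessarily the one used later in the course. NetworkX's draw_networkx_nodes accepts a single node_shape per call, so the idea is to group the nodes by race and draw each group separately. The helper names group_nodes_by_race and draw_by_race, the list of marker shapes, and the layout seed are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle

# Matplotlib marker codes; the choice and order are arbitrary
SHAPES = ["o", "s", "^", "v", "D", "p", "h", "8"]

def group_nodes_by_race(nodes_with_attrs):
    """Map each race to the list of nodes of that race.

    nodes_with_attrs is an iterable of (node, attribute_dict) pairs,
    for example g.nodes(data=True).
    """
    groups = defaultdict(list)
    for node, attrs in nodes_with_attrs:
        groups[attrs.get("race", "")].append(node)
    return dict(groups)

def draw_by_race(g):
    """Draw graph g with one marker shape per race."""
    # Imported here so the grouping helper works without the plotting stack
    import matplotlib.pyplot as plt
    import networkx as nx

    pos = nx.spring_layout(g, seed=42)  # fixed seed for a repeatable layout
    groups = group_nodes_by_race(g.nodes(data=True))
    plt.figure(figsize=(14, 14))
    # Shapes repeat if there are more races than markers
    for (race, nodes), shape in zip(sorted(groups.items()), cycle(SHAPES)):
        nx.draw_networkx_nodes(g, pos, nodelist=nodes,
                               node_shape=shape, label=race)
    nx.draw_networkx_edges(g, pos)
    nx.draw_networkx_labels(g, pos)
    plt.legend()
    plt.show()

sample = [("Aegnor", {"race": "Elves"}),
          ("Andreth", {"race": "Men"}),
          ("Finrod", {"race": "Elves"})]
print(group_nodes_by_race(sample))
```

With the graph built above, calling draw_by_race(g) would draw the spouse network with a separate marker for each race.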

Further Reading

There are some great tutorials on interesting real world datasets. Here are some examples:
