Let's write code that can easily get the post titles from Guido van Rossum's blog, Neopythonic.
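As a rough sketch of the extraction step itself (not necessarily the approach this walkthrough ends up using): in the raw page dump further down, every post title sits in an `<h3 class='post-title entry-title'>` element that wraps a link, while headings inside post bodies are plain `<h3>` elements. Assuming that markup holds for the whole page, the standard-library `html.parser` module can collect the text of just those headings. `TitleParser`, `get_titles` and `sample` are hypothetical names introduced for illustration, and `sample` is a hand-trimmed imitation of the page's markup so the sketch runs without network access.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside every <h3> whose class list contains 'post-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []        # finished titles, in page order
        self._in_title = False  # True while inside a post-title <h3>
        self._chunks = []       # text fragments of the title being read

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class may be missing.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "post-title" in classes:
            self._in_title = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self.titles.append("".join(self._chunks).strip())
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)


def get_titles(html):
    """Return the post titles found in an HTML string (hypothetical helper)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hand-trimmed imitation of the blog's markup (assumption: real posts look like this).
sample = """
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html'>Why operators are useful</a>
</h3>
<div class='post-body entry-content'><h3>Why not X|Y|Z?</h3></div>
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://neopythonic.blogspot.com/2016/05/union-syntax.html'>Union syntax</a>
</h3>
"""

print(get_titles(sample))
```

On the real page, the same helper would be called as `get_titles(s)` once the HTML has been downloaded into `s`. Matching on the `post-title` class rather than on every `<h3>` is what keeps in-post headings such as "Why not X|Y|Z?" out of the result.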
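import requests

# Front page of Guido van Rossum's blog, Neopythonic
u = "http://neopythonic.blogspot.com/"
# Download the page and decode the raw bytes as UTF-8 (the charset the page declares)
s = requests.get(u).content.decode('utf-8')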
s
'<!DOCTYPE html>\n<html dir=\'ltr\' xmlns=\'http://www.w3.org/1999/xhtml\' xmlns:b=\'http://www.google.com/2005/gml/b\' xmlns:data=\'http://www.google.com/2005/gml/data\' xmlns:expr=\'http://www.google.com/2005/gml/expr\'>\n<head>\n<link href=\'https://www.blogger.com/static/v1/widgets/2549344219-widget_css_bundle.css\' rel=\'stylesheet\' type=\'text/css\'/>\n<link href=\'http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css\' rel=\'stylesheet\' type=\'text/css\'/>\n<script src=\'http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js\' type=\'text/javascript\'></script>\n<meta content=\'text/html; charset=UTF-8\' http-equiv=\'Content-Type\'/>\n<meta content=\'blogger\' name=\'generator\'/>\n<link href=\'http://neopythonic.blogspot.com/favicon.ico\' rel=\'icon\' type=\'image/x-icon\'/>\n<link href=\'http://neopythonic.blogspot.com/\' rel=\'canonical\'/>\n<link rel="alternate" type="application/atom+xml" title="Neopythonic - Atom" href="http://neopythonic.blogspot.com/feeds/posts/default" />\n<link rel="alternate" type="application/rss+xml" title="Neopythonic - RSS" href="http://neopythonic.blogspot.com/feeds/posts/default?alt=rss" />\n<link rel="service.post" type="application/atom+xml" title="Neopythonic - Atom" href="https://www.blogger.com/feeds/4195135246107166251/posts/default" />\n<link rel="me" href="https://www.blogger.com/profile/12821714508588242516" />\n<!--Can\'t find substitution for tag [blog.ieCssRetrofitLinks]-->\n<meta content=\'http://neopythonic.blogspot.com/\' property=\'og:url\'/>\n<meta content=\'Neopythonic\' property=\'og:title\'/>\n<meta content=\'Ramblings through technology, politics, culture and philosophy by the creator of the Python programming language.\' property=\'og:description\'/>\n<!--[if IE]> <script> (function() { var html5 = ("abbr,article,aside,audio,canvas,datalist,details," + "figure,footer,header,hgroup,mark,menu,meter,nav,output," + "progress,section,time,video").split(\',\'); for (var i = 0; 
i < html5.length; i++) { document.createElement(html5[i]); } try { document.execCommand(\'BackgroundImageCache\', false, true); } catch(e) {} })(); </script> <![endif]-->\n<title>Neopythonic</title>\n<style id=\'page-skin-1\' type=\'text/css\'><!--\n/*\n-----------------------------------------------\nBlogger Template Style\nName: Dots\nDate: 24 Feb 2004\nUpdated by: Blogger Team\n----------------------------------------------- */\n\nbody {\nmargin: 0px 0px 0px 0px;\nbackground:#fff url("https://resources.blogblog.com/blogblog/data/dots/bg_dots.gif");\nbackground-position: 50% 31px;\ntext-align:center;\nfont:x-small Verdana, Arial, Sans-serif;\ncolor:#333333;\nfont-size/* */:/**/small;\nfont-size: /**/small;\n}\n/* Page Structure\n----------------------------------------------- */\n#outer-wrapper {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_3dots.gif") no-repeat 250px 50px;\nwidth:700px;\nmargin:0 auto;\ntext-align:left;\nfont:normal normal 100% Verdana,Arial,Sans-Serif;\n}\n#header-wrapper {\ndisplay: none;\n}\n#main-wrapper {\nwidth:450px;\nfloat:right;\npadding:100px 0 20px;\nfont-size:85%;\nword-wrap: break-word; /* fix for long text breaking sidebar float in IE */\noverflow: hidden; /* fix for long non-text content breaking IE sidebar float */\n}\n#main {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_dots2.gif") -100px -100px;\npadding:20px 10px 15px;\n}\n#sidebar-wrapper {\nwidth:200px;\nfloat:left;\nfont-size:85%;\npadding-bottom:20px;\nword-wrap: break-word; /* fix for long text breaking sidebar float in IE */\noverflow: hidden; /* fix for long non-text content breaking IE sidebar float */\n}\n#sidebar {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_dots2.gif") 150px -50px;\npadding:40px 10px 15px;\nwidth:200px;\nwidth/* */:/**/180px;\nwidth: /**/180px;\n}\n/* Title & Description\n----------------------------------------------- */\n.Header h1 {\nmargin:0 0 .5em;\nline-height: 
1.4em;\nfont: normal normal 250% Georgia,Serif;\ncolor: #335533;\n}\n.Header h1 a {\ncolor:#335533;\ntext-decoration:none;\n}\n.Header .description {\nmargin:0 0 1.75em;\ncolor: #999966;\nfont: normal normal 100% Verdana, Arial, Sans-Serif;\n}\n/* Links\n----------------------------------------------- */\na:link {\ncolor:#448888;\n}\na:visited {\ncolor:#888855;\n}\na:hover {\ncolor:#888855;\n}\na img {\nborder-width:0;\n}\n/* Posts\n----------------------------------------------- */\nh2.date-header {\nmargin:0 0 .75em;\npadding-bottom:.35em;\nborder-bottom:1px dotted #99bb99;\ntext-transform:uppercase;\nletter-spacing:.3em;\ncolor: #666633;\nfont: normal normal 95% Georgia, Serif;\n}\n.post {\nmargin:0 0 2.5em;\n}\n.post h3 {\nmargin:.25em 0;\nline-height: 1.4em;\nfont: normal normal 100% Georgia,Serif;\nfont-size: 130%;\nfont-weight: bold;\ncolor:#999966;\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bg_post_title_left.gif") no-repeat left .25em;\npadding:0 0 1px 45px;\n}\n.post h3 a {\ntext-decoration:none;\ncolor: #999966;\n}\n.post h3 a:hover {\ncolor: #333333;\n}\n.post-body {\nmargin:0 0 .75em;\nline-height:1.6em;\n}\n.post-body blockquote {\nline-height:1.3em;\n}\n.post-footer {\nmargin:0;\n}\n.uncustomized-post-template .post-footer {\ntext-align: right;\n}\n.uncustomized-post-template .post-author,\n.uncustomized-post-template .post-timestamp {\ndisplay: block;\nfloat: left;\nmargin-right: 4px;\ntext-align: left;\n}\n.post-author, .post-timestamp {\ncolor:#999966;\n}\na.comment-link {\n/* IE5.0/Win doesn\'t apply padding to inline elements,\nso we hide these two declarations from it */\nbackground/* */:/**/url("https://resources.blogblog.com/blogblog/data/dots/icon_comment_left.gif") no-repeat left .25em;\npadding-left:15px;\n}\nhtml>body a.comment-link {\n/* Respecified, for IE5/Mac\'s benefit */\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/icon_comment_left.gif") no-repeat left 
.25em;\npadding-left:15px;\n}\n.post img, table.tr-caption-container {\nmargin:0 0 5px 0;\npadding:4px;\nborder:1px solid #99bb99;\n}\n.tr-caption-container img {\nborder: none;\nmargin: 0;\npadding: 0;\n}\n.feed-links {\nclear: both;\nline-height: 2.5em;\n}\n#blog-pager-newer-link {\nfloat: left;\n}\n#blog-pager-older-link {\nfloat: right;\n}\n#blog-pager {\ntext-align: center;\n}\n/* Comments\n----------------------------------------------- */\n#comments {\nmargin:0;\n}\n#comments h4 {\nmargin:0 0 10px;\nborder-top:1px dotted #99bb99;\npadding-top:.5em;\nline-height: 1.4em;\nfont: bold 110% Georgia,Serif;\ncolor:#333;\n}\n#comments-block {\nline-height:1.6em;\n}\n.comment-author {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/icon_comment_left.gif") no-repeat 2px .35em;\nmargin:.5em 0 0;\npadding-top: 0;\npadding-bottom:0;\npadding-left:20px;\npadding-right:20px;\nfont-weight:bold;\n}\n.comment-body {\nmargin:0;\npadding-top: 0;\npadding-bottom:0;\npadding-left:20px;\npadding-right:20px;\n}\n.comment-body p {\nmargin:0 0 .5em;\n}\n.comment-footer {\nmargin:0 0 .5em;\npadding:0 0 .75em 20px;\npadding-top: 0;\npadding-bottom:.75em;\npadding-left:20px;\npadding-right:0;\ncolor:#996;\n}\n.comment-footer a:link {\ncolor:#996;\n}\n.deleted-comment {\nfont-style:italic;\ncolor:gray;\n}\n/* More Sidebar Content\n----------------------------------------------- */\n.sidebar h2 {\nmargin:2em 0 .75em;\npadding-bottom:.35em;\nborder-bottom:1px dotted #99bb99;\nline-height: 1.4em;\nfont-size: 95%;\nfont: normal normal 100% Georgia,Serif;\ntext-transform:uppercase;\nletter-spacing:.3em;\ncolor:#666633;\n}\n.sidebar p {\nmargin:0 0 .75em;\nline-height:1.6em;\n}\n.sidebar ul {\nlist-style:none;\nmargin:.5em 0;\npadding:0 0px;\n}\n.sidebar .widget {\nmargin: .5em 0 1em;\npadding: 0 0px;\nline-height: 1.5em;\n}\n.main .widget {\npadding-bottom: 1em;\n}\n.sidebar ul li {\nbackground:url("https://resources.blogblog.com/blogblog/data/dots/bullet.gif") no-repeat 
3px .45em;\nmargin:0;\npadding-top: 0;\npadding-bottom:5px;\npadding-left:15px;\npadding-right:0;\n}\n.sidebar p {\nmargin:0 0 .6em;\n}\n/* Profile\n----------------------------------------------- */\n.profile-datablock {\nmargin: 0 0 1em;\n}\n.profile-img {\nfloat: left;\nmargin-top: 0;\nmargin-bottom:5px;\nmargin-left:0;\nmargin-right:8px;\nborder: 4px solid #cc9;\n}\n.profile-data {\nmargin: 0;\nline-height: 1.5em;\n}\n.profile-textblock {\nclear: left;\nmargin-left: 0;\n}\n/* Footer\n----------------------------------------------- */\n#footer {\nclear:both;\npadding:15px 0 0;\n}\n#footer p {\nmargin:0;\n}\n/* Page structure tweaks for layout editor wireframe */\nbody#layout #sidebar, body#layout #main,\nbody#layout #main-wrapper,\nbody#layout #outer-wrapper,\nbody#layout #sidebar-wrapper {\npadding: 0;\n}\nbody#layout #sidebar, body#layout #sidebar-wrapper {\npadding: 0;\nwidth: 240px;\n}\n\n--></style>\n<link href=\'https://www.blogger.com/dyn-css/authorization.css?targetBlogID=4195135246107166251&zx=3404a461-9e33-4ad1-8b2b-080af63b1f5a\' media=\'none\' onload=\'if(media!='all')media='all'\' rel=\'stylesheet\'/><noscript><link href=\'https://www.blogger.com/dyn-css/authorization.css?targetBlogID=4195135246107166251&zx=3404a461-9e33-4ad1-8b2b-080af63b1f5a\' rel=\'stylesheet\'/></noscript>\n\n</head>\n<body onload=\'prettyPrint()\'>\n<div class=\'navbar section\' id=\'navbar\'><div class=\'widget Navbar\' data-version=\'1\' id=\'Navbar1\'><script type="text/javascript">\n function setAttributeOnload(object, attribute, val) {\n if(window.addEventListener) {\n window.addEventListener(\'load\',\n function(){ object[attribute] = val; }, false);\n } else {\n window.attachEvent(\'onload\', function(){ object[attribute] = val; });\n }\n }\n </script>\n<div id="navbar-iframe-container"></div>\n<script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>\n<script type="text/javascript">\n gapi.load("gapi.iframes:gapi.iframes.style.bubble", 
function() {\n if (gapi.iframes && gapi.iframes.getContext) {\n gapi.iframes.getContext().openChild({\n url: \'https://www.blogger.com/navbar.g?targetBlogID\\x3d4195135246107166251\\x26blogName\\x3dNeopythonic\\x26publishMode\\x3dPUBLISH_MODE_BLOGSPOT\\x26navbarType\\x3dBLUE\\x26layoutType\\x3dLAYOUTS\\x26searchRoot\\x3dhttps://neopythonic.blogspot.com/search\\x26blogLocale\\x3den\\x26v\\x3d2\\x26homepageUrl\\x3dhttp://neopythonic.blogspot.com/\\x26vt\\x3d5369531368964104765\',\n where: document.getElementById("navbar-iframe-container"),\n id: "navbar-iframe"\n });\n }\n });\n </script><script type="text/javascript">\n(function() {\nvar script = document.createElement(\'script\');\nscript.type = \'text/javascript\';\nscript.src = \'//pagead2.googlesyndication.com/pagead/js/google_top_exp.js\';\nvar head = document.getElementsByTagName(\'head\')[0];\nif (head) {\nhead.appendChild(script);\n}})();\n</script>\n</div></div>\n<div id=\'outer-wrapper\'><div id=\'wrap2\'>\n<!-- skip links for text browsers -->\n<span id=\'skiplinks\' style=\'display:none;\'>\n<a href=\'#main\'>skip to main </a> |\n <a href=\'#sidebar\'>skip to sidebar</a>\n</span>\n<div id=\'content-wrapper\'>\n<div id=\'crosscol-wrapper\' style=\'text-align:center\'>\n<div class=\'crosscol no-items section\' id=\'crosscol\'></div>\n</div>\n<div id=\'main-wrapper\'>\n<div class=\'main section\' id=\'main\'><div class=\'widget Blog\' data-version=\'1\' id=\'Blog1\'>\n<div class=\'blog-posts hfeed\'>\n\n <div class="date-outer">\n \n<h2 class=\'date-header\'><span>Friday, March 15, 2019</span></h2>\n\n <div class="date-posts">\n \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'775339472173253922\' itemprop=\'postId\'/>\n<a name=\'775339472173253922\'></a>\n<h3 class=\'post-title entry-title\' 
itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\'>Why operators are useful</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-775339472173253922\' itemprop=\'description articleBody\'>\nThis is something I posted on python-ideas, but I think it\'s interesting to a wider audience.<br />\n<br />\nThere\'s been a lot of discussion recently about an operator to merge two dicts.<br />\n<br />\nIt prompted me to think about the reason (some) people like operators, and a discussion I had with my mentor Lambert Meertens over 30 years ago came to mind.<br />\n<br />\nFor mathematicians, operators are essential to how they think. Take a simple operation like adding two numbers, and try exploring some of its behavior.<br />\n<br />\n add(x, y) == add(y, x) (1)<br />\n<br />\nEquation (1) expresses the law that addition is commutative. It\'s usually written using an operator, which makes it more concise:<br />\n<br />\n x + y == y + x (1a)<br />\n<br />\nThat feels like a minor gain.<br />\n<br />\nNow consider the associative law:<br />\n<br />\n add(x, add(y, z)) == add(add(x, y), z) (2)<br />\n<br />\nEquation (2) can be rewritten using operators:<br />\n<br />\n x + (y + z) == (x + y) + z (2a)<br />\n<br />\nThis is much less confusing than (2), and leads to the observation that the parentheses are redundant, so now we can write<br />\n<br />\n x + y + z (3)<br />\n<br />\nwithout ambiguity (it doesn\'t matter whether the + operator binds tighter to the left or to the right).<br />\n<br />\nMany other laws are also written more easily using operators. 
Here\'s one more example, about the identity element of addition:<br />\n<br />\n add(x, 0) == add(0, x) == x (4)<br />\n<br />\ncompare to<br />\n<br />\n x + 0 == 0 + x == x (4a)<br />\n<br />\nThe general idea here is that once you\'ve learned this simple notation, equations written using them are easier to *manipulate* than equations written using functional notation -- it is as if our brains grasp the operators using different brain machinery, and this is more efficient.<br />\n<br />\nI think that the fact that formulas written using operators are more easily processed *visually* has something to do with it: they engage the brain\'s visual processing machinery, which operates largely subconsciously, and tells the conscious part what it sees (e.g. "chair" rather than "pieces of wood joined together"). The functional notation must take a different path through our brain, which is less subconscious (it\'s related to reading and understanding what you read, which is learned/trained at a much later age than visual processing).<br />\n<br />\nThe power of visual processing really becomes apparent when you combine multiple operators. For example, consider the distributive law:<br />\n<br />\n mul(n, add(x, y)) == add(mul(n, x), mul(n, y)) (5)<br />\n<br />\nThat was painful to write, and I believe that at first you won\'t see the pattern (or at least you wouldn\'t have immediately seen it if I hadn\'t mentioned this was the distributive law).<br />\n<br />\nCompare to:<br />\n<br />\n n * (x + y) == n * x + n * y (5a)<br />\n<br />\nNotice how this also uses relative operator priorities. Often mathematicians write this even more compact:<br />\n<br />\n n(x+y) == nx + ny (5b)<br />\n<br />\nbut alas, that currently goes beyond the capacities of Python\'s parser.<br />\n<br />\nAnother very powerful aspect of operator notation is that it is convenient to apply them to objects of different types. 
For example, laws (1) through (5) also work when x, y and z are same-size vectors and n is a scalar (substituting a vector of zeros for the literal "0"), and also if they are matrices (again, n has to be a scalar).<br />\n<br />\nAnd you can do this with objects in many different domains. For example, the above laws (1) through (5) apply to functions too (n being a scalar again).<br />\n<br />\nBy choosing the operators wisely, mathematicians can employ their visual brain to help them do math better: they\'ll discover new interesting laws sooner because sometimes the symbols on the blackboard just jump at you and suggest a path to an elusive proof.<br />\n<br />\nNow, programming isn\'t exactly the same activity as math, but we all know that Readability Counts, and this is where operator overloading in Python comes in. Once you\'ve internalized the simple properties which operators tend to have, using + for string or list concatenation becomes more readable than a pure OO notation, and (2) and (3) above explain (in part) why that is.<br />\n<br />\nOf course, it\'s definitely possible to overdo this -- then you get Perl. But I think that the folks who point out "there is already a way to do this" are missing the point that it really is easier to grasp the meaning of this:<br />\n<br />\n d = d1 + d2<br />\n<br />\ncompared to this:<br />\n<br />\n d = d1.copy()<br />\n d.update(d2)<br />\n<br />\nand it is not just a matter of fewer lines of code: the first form allows us to use our visual processing to help us see the meaning quicker -- and without distracting other parts of our brain (which might already be occupied by keeping track of the meaning of d1 and d2, for example).<br />\n<br />\nOf course, everything comes at a price. You have to learn the operators, and you have to learn their properties when applied to different object types. 
(This is true in math too -- for numbers, x*y == y*x, but this property does not apply to functions or matrices; OTOH x+y == y+x applies to all, as does the associative law.)<br />\n<br />\n"But what about performance?" I hear you ask. Good question. IMO, readability comes first, performance second. And in the basic example (d = d1 + d2) there is no performance loss compared to the two-line version using update, and a clear win in readability. I can think of many situations where performance difference is irrelevant but readability is of utmost importance, and for me this is the default assumption (even at Dropbox -- our most performance critical code has already been rewritten in ugly Python or in Go). For the few cases where performance concerns are paramount, it\'s easy to transform the operator version to something else -- *once you\'ve confirmed it\'s needed* (probably by profiling).\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2019-03-15T10:58:00-07:00\'>10:58 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' 
href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=775339472173253922\' onclick=\'\'>\nNo comments:\n </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=775339472173253922&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n </div></div>\n \n\n <div class="date-outer">\n \n<h2 class=\'date-header\'><span>Monday, November 26, 2018</span></h2>\n\n <div class="date-posts">\n \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'2471146972433715807\' itemprop=\'postId\'/>\n<a name=\'2471146972433715807\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html\'>What to do with your computer science career</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-2471146972433715807\' itemprop=\'description articleBody\'>\nI regularly receive questions from students in the field of computer science looking for career advice.<br />\n<br />\nHere\'s an answer I wrote to one of them. 
It\'s not comprehensive or anything, but I thought people might find it interesting.<br />\n<br />\n[A question about whether to choose a 9-5 job or be an entrepreneur]<br />\n<br />\nThe question about "9-5" vs. "entrepreneur" is a complex one -- not everybody can be a successful entrepreneur (who would do the work? :-) and not everybody has the temperament for it. For me personally it was never an option -- there are vast parts of management and entrepreneurship that I wouldn\'t enjoy doing, such as hiring (I hate interviewing and am bad at it) and firing (too emotionally draining -- even just giving negative feedback is hard for me). Pitching ideas to investors is another thing that I\'d rather do without.<br />\n<br />\nIf any of that resonates with you, you may be better off not opting for entrepreneurship -- the kind of 9-5 software development jobs I have had are actually (mostly) very rewarding: I get to write software that gets used by hundreds or thousands of other developers (or millions in the case of Python), and those other developers in turn use my software to produce product that get uses by hundreds of thousands or, indeed hundreds of millions of users. Not every 9-5 job is the same! For me personally, I don\'t like the product stuff (since usually that means it\'s products I have no interest in using myself), but "your mileage may vary" (as they say in the US). Just try to do better than an entry-level web development job; that particular field (editing HTML and CSS) is likely to be automated away, and would feel repetitive to me.<br />\n<br />\n[A question about whether AI would make human software developers redundant (not about what I think of the field of AI as a career choice)]<br />\n<br />\nRegarding AI, I\'m not worried at all. The field is focused on automating boring, repetitive tasks like driving a car or recognizing faces, which humans can learn to do easily but find boring if they have to do it all the time. 
The field of software engineering (which includes the field of AI) is never boring, since as soon as a task is repetitive, you automate it, and you start solving new problems.\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2018-11-26T09:13:00-08:00\'>9:13 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=2471146972433715807\' onclick=\'\'>\nNo comments:\n </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=2471146972433715807&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span 
class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n </div></div>\n \n\n <div class="date-outer">\n \n<h2 class=\'date-header\'><span>Saturday, July 23, 2016</span></h2>\n\n <div class="date-posts">\n \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'2468107226962512288\' itemprop=\'postId\'/>\n<a name=\'2468107226962512288\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html\'>About spammers and comments</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-2468107226962512288\' itemprop=\'description articleBody\'>\nI\'m turning off commenting for my blogs. While I\'ve enjoyed some feedback, the time wasted to moderate spam posts just isn\'t worth it. Thank you, spammers! 
:-(\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-07-23T14:11:00-07:00\'>2:11 PM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=2468107226962512288\' onclick=\'\'>\nNo comments:\n </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=2468107226962512288&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n </div></div>\n \n\n <div 
class="date-outer">\n \n<h2 class=\'date-header\'><span>Wednesday, May 18, 2016</span></h2>\n\n <div class="date-posts">\n \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'4387175608679924841\' itemprop=\'postId\'/>\n<a name=\'4387175608679924841\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/05/union-syntax.html\'>Union syntax</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-4387175608679924841\' itemprop=\'description articleBody\'>\n<h2>\nUnion syntax</h2>\n<blockquote class="tr_bq">\n<i>(I\'m trying to do this as a quick post in response to some questions I received on this topic. I realize this will probably reopen the whole discussion about the best syntax for types, but sorry folks, PEP 484 was accepted nearly a year ago, after many months of discussions and hundreds of messages. It\'s unlikely that any idea you can think of here would be new. This post just explains the rationale of one particular decision and tries to put it in some context.)</i></blockquote>\nI\'ve heard some grumbling about the union syntax in <a href="https://www.python.org/dev/peps/pep-0484/">PEP 484</a>: Union[X, Y, Z] (where X, Y and Z are arbitrary type expressions). In the past people have suggested X|Y|Z for this, or (X, Y, Z) or {X, Y, Z}. Why did we go with the admittedly clunkier Union[X, Y, Z]?<br />\n<br />\nFirst of all, despite all the attention drawn to it, unions are actually a pretty minor feature, and you shouldn\'t be using them much. 
So you also shouldn\'t care that much.<br />\n<h3>\nWhy not X|Y|Z?</h3>\nThis won\'t fly because we want compatibility with versions of Python 3 that were already frozen (see below). We want to be able to express e.g. a union of int and str, which under this notation would be written as int|str. But for that to fly we\'d have to modify the builtin \'type\' class to implement __or__ -- and that wouldn\'t fly on already-frozen Python versions. Supporting X|Y only for types (like List) imported from the typing module and some other notation for builtin types would only sow confusion. So X|Y|Z is out.<br />\n<h3>\nWhy not {X, Y, Z}?</h3>\nThat\'s the set with elements X, Y and Z, using the builtin set notation. We can usefully consider types to be sets of values, and this makes a union a set of values too (that\'s why it\'s called union :-).<br />\n<br />\nHowever, {X, Y, Z} confuses the set of <i>types</i> with the set of <i>values</i>, which I consider a mortal sin. This would just cause endless confusion.<br />\n<br />\nThis notation would also confuse things when taking the union of several classes that overlap, e.g. if we have classes B and C, where C inherits from B, then the union of B and C is just B. But the builtin set doesn\'t see it that way. In contrast, the X|Y notation could actually solve this (since in principle we could overload __or__ to do whatever we want), and the Union[] operator ("functor"?) from PEP 484 indeed solves this -- in this example Union[B, C] returns the (non-union) type B, both in the type checker and at runtime.<br />\n<h3>\nWhy not (X, Y, Z)?</h3>\nThat\'s the tuple (X, Y, Z). It has the same disadvantages as {X, Y, Z}, but at least it has the advantage of being similar to how unions are expressed as arguments to isinstance(), for example isinstance(x, (int, str, list)) or isinstance(x, (Sequence, Mapping)). (Similarly the except clause: try: ... 
/ except (KeyError, IndexError): ...)<br />\n<br />\nAnother problem with tuples is that the tuple syntax is already overloaded in so many ways that it would be confused with other uses even more easily. One particular confusion would be other generic types, for which we\'d still want to use square brackets. (You can\'t really beat Iterable[int] for clarity if you have an iterable of integers. :-) Suppose you have a sequence of values that could be integers or strings. In PEP 484 notation we write this as Sequence[Union[int, str]]. Using the tuple notation we\'d want to write this as Sequence[(int, str)]. But it turns out that the __getitem__ overload on the metaclass can\'t tell the difference between Sequence[(int, str)] and Sequence[int, str] -- and we would like to reject the latter as a mistake since Sequence[] is a generic class over a single parameter. (An example of a generic class over two parameters would be Mapping[K, V].) Disambiguating all this would place us on very thin ice indeed.<br />\n<br />\nThe nail in this idea\'s coffin is the competing idea of using (X, Y, Z) to indicate a tuple with three items, with respective types, X, Y and Z. At first sight this seems an even better use of the tuple syntax than unions would be, and tuples are way more common than unions. But it runs afoul of the same problems with Foo[(X, Y)] vs. Foo[X, Y]. (Also, there would be no easy way to describe what PEP 484 calls Tuple[X, ...], i.e. a variable-length tuple with uniform item type X.)<br />\n<h3>\nPS. Why support old Python 3 versions?</h3>\nThe reason for supporting older versions is adoption. Only a relatively small crowd of early adopters can upgrade to the latest Python version as soon as it\'s out; the rest of us are stuck on older versions (even Python 2.7!). 
<br />\n<br />\nSo for PEP 484 and the typing module, we wanted to support 3.2 and up -- we chose 3.2 because it\'s the newest Python 3 supported by some older but still popular Ubuntu and Debian distributions. (Also, 3.0 and 3.1 were too immature at their time of release to ever have a large following.)<br />\n<br />\nThere\'s a typing package that you can install easily using pip, and this defines all sorts of useful things for typing, from Any and Union to generic versions of List and Sequence. But such a package can\'t modify existing builtins like int or list.<br />\n<br />\n(Eventually we also added Python 2.7 support, using type comments for function signatures.)\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/05/union-syntax.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/05/union-syntax.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-05-18T11:55:00-07:00\'>11:55 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=4387175608679924841\' onclick=\'\'>\nNo comments:\n </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control 
blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=4387175608679924841&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'8854185106045973213\' itemprop=\'postId\'/>\n<a name=\'8854185106045973213\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html\'>Adding type annotations for fspath</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-8854185106045973213\' itemprop=\'description articleBody\'>\n<div>\n<h1 class="ace-copy-paste-skip-this-tag">\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Type annotations for fspath</span></h1>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Python 3.6 will have a new </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" 
data-target-href="http://www.pixelmonkey.org/2013/04/11/python-double-under-double-wonder" href="http://www.pixelmonkey.org/2013/04/11/python-double-under-double-wonder" rel="noreferrer nofollow" target="_blank">dunder protocol</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which should be supported by classes that represent filesystem paths. Example of such classes are the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">pathlib.Path</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> family and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> (returned by </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">You can read more about this protocol in the brand new </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0519/" href="https://www.python.org/dev/peps/pep-0519/" rel="noreferrer nofollow" target="_blank">PEP 519</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">. In this blog post I’m going to discuss how we would add type annotations for these additions to the standard library.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">I’m making frequent use of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , a quite magical type variable predefined in the typing module. 
If you’re not familiar with it, I recommend reading my </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" rel="noreferrer nofollow" target="_blank">blog post about </a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html" rel="noreferrer nofollow" target="_blank">AnyStr</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . You may also want to read up on </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0484/#generics" href="https://www.python.org/dev/peps/pep-0484/#generics" rel="noreferrer nofollow" target="_blank">generics in PEP 484</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> (or read </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy.readthedocs.io/en/latest/generics.html" href="http://mypy.readthedocs.io/en/latest/generics.html" rel="noreferrer nofollow" target="_blank">mypy’s docs on the subject</a></span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">).</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Adding os.scandir() to the stubs for os.py</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">For practice, let’s see if we can add something to the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" rel="noreferrer nofollow" target="_blank">stub file for os.py</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">. 
As of this writing there’s no </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed" href="https://github.com/python/typeshed" rel="noreferrer nofollow" target="_blank">typeshed</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> information for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="https://docs.python.org/3/library/os.html" href="https://docs.python.org/3/library/os.html" rel="noreferrer nofollow" target="_blank">os.scandir()</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which I think is a shame. I think the following will do nicely. Note how we only define </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> for Python versions >= 3.5. 
(Mypy </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/mypy/issues/698" href="https://github.com/python/mypy/issues/698" rel="noreferrer nofollow" target="_blank">doesn’t support this yet</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, but it will soon, and the example here still works — it just doesn’t realize </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is only available in Python 3.5.) This could be added to the end of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">stdlib/3/os/__init__.pyi</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import Generic, AnyStr, overload, Iterator</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">if sys.version_info </span></code><code class="listtype-code listindent1 
list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span>= (3, 5):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> class DirEntry(</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-42889384956">Generic</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">[AnyStr]):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> name = ... # type: AnyStr</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> path = ... 
# type: AnyStr</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def inode(self) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> int: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def is_dir(self, *, follow_symlinks: bool = ...) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> bool: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def is_file(self, *, follow_symlinks: bool = ...) 
-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> bool: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def is_symlink(self) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> bool: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def stat(self, *, follow_symlinks: bool = ...) 
-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> stat_result: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"> @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"> def scandir() -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"> @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"> 
def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-21653656371 thread-53340393283">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Deconstructing this a bit, we see a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0484/#generics" href="https://www.python.org/dev/peps/pep-0484/#generics" rel="noreferrer nofollow" target="_blank">generic class</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> (that’s what the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Generic[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> base class means) and an overloaded function. 
The </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> definition uses </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">@overload</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> because it can also be called without arguments. </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">We could also write it as follows; it’ll work either way:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"> @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"> def scandir(</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-72143937476">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> = ...) 
-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> @overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-86865838424">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Either way there really are three ways to 
call </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class=""> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, all three returning an iterable of DirEntry objects:</span></div>\n<div>\n<br /></div>\n<ul class="listtype-bullet listindent1 list-bullet1">\n<li><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir() -</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[str]]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> </span></li>\n<li><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir(str) -</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[str]]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> </span></li>\n<li><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir(bytes) -</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[bytes]]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> </span></li>\n</ul>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Adding os.fspath()</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Next I’ll show how to add </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.fspath()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> and how to add support for the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> protocol to </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" 
data-target-href="https://www.python.org/dev/peps/pep-0519/" href="https://www.python.org/dev/peps/pep-0519/" rel="noreferrer nofollow" target="_blank">PEP 519</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> defines a simple ABC (</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://docs.python.org/3/library/abc.html" href="https://docs.python.org/3/library/abc.html" rel="noreferrer nofollow" target="_blank">abstract base class</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">), </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , with one method, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
We need to add this to the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" rel="noreferrer nofollow" target="_blank">stub for </a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi" rel="noreferrer nofollow" target="_blank">os.py</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , as follows:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">class PathLike(</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-84113787329">Generic[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> @abstractmethod</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> def __fspath__(self) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> AnyStr: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">That’s really all there is to it (except for the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">sys.version_info</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> check, which I’ll leave out here since it doesn’t really work yet). Next we define </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.fspath()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which wraps this protocol. It’s slightly more complicated than just calling its argument’s </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> method, because it also handles strings and bytes. 
So here it is:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def fspath(path: PathLike[AnyStr]) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> AnyStr: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def fspath(path: AnyStr) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> AnyStr: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Easy enough! Next is to update the definition of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . That’s easy too — in fact we only need to make it inherit from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , the rest is the same as the definition I gave above:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">class DirEntry(PathLike[AnyStr], Generic[AnyStr]):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> # Everything else unchanged!</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The only slightly complicated bit here is the extra base class </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Generic[AnyStr]</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . This seems redundant, and in fact PEP 484 says we can leave it off, but mypy doesn’t support that yet, and it’s quite harmless — this just rubs into mypy’s face that this is a generic class of one type variable (the by-now famous </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Finally we need to make a similar change to the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" rel="noreferrer nofollow" target="_blank">stub for </a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" href="https://github.com/python/typeshed/blob/master/stdlib/3.4/pathlib.pyi" rel="noreferrer nofollow" target="_blank">pathlib.py</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
Again, all we need to do is to make </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PurePath</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> inherit from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , like so:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from os import PathLike</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">class PurePath(PathLike[str]):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-86168840759"> # Everything else unchanged!</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">However, here we don’t add </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Generic</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , because this is not a generic class! It inherits from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which is quite un-generic, since it’s </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z i"><i>specialized</i></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> for just </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Note that we don’t actually have to define the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> method in these stubs — we’re not supposed to call them directly, and stubs don’t provide implementations, only interfaces.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Putting it all together, we see that it’s quite elegant:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">for a in os.scandir(\'.\'):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> b = os.fspath(a)</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> # Here, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-459852399">the</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> typechecker will know that the type of b is str!</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The derivation that </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">b</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> has type </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is not too complicated: first, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.scandir(\'.\')</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> has a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> argument, so it returns an iterator of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> objects parameterized with </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which we write as </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[str]</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . Passing this </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> to </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.fspath()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> then takes the first of that function’s two overloads (the one with </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ), since it doesn’t match the second one ( </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> doesn’t inherit from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , because it’s neither a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> nor </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ). Further the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> type variable in </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is solved to stand for just </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , because </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[str]</span><span class=""> </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> inherits from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . This is the specialized version of what the code says: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">DirEntry[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> inherits from </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Okay, so maybe that last paragraph was intermediate or advanced. And maybe it could be expanded. Maybe I’ll write another blog about how type inference works, but there’s a lot on that topic, and other authors have probably already written better introductory material about generics (in other languages, though).</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Making things accept PathLike</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">There’s a bit of cleanup work that I’ve left out. 
PEP 519 says that many stdlib functions that currently take strings for pathnames will be modified to also accept </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . For example, here’s how the signatures for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> would change:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">def scandir() -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[str]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z 
thread-53340393283">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-21653656371">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def scandir(path: PathLike[AnyStr]) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[AnyStr]]: 
...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The first two entries are unchanged; I’ve just added a third overload. (Note that the alternative way of defining </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">scandir()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> would require more changes — an indication that this way is more natural.)</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">I also tried doing this with a union:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def scandir() </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">-</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[str]]: 
...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">def scandir(path: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-21653656371">Union[AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283">, PathLike[AnyStr]]) -</span></code><code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z thread-53340393283"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">></span> Iterator[DirEntry[AnyStr]]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">But I couldn’t get this to work, so the extra overload is probably the best we can do. 
Quite a few functions will require a similar treatment, sometimes introducing overloading where none exists today (but that shouldn’t hurt anything).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">A note about </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">pathlib</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> : since it only deals with strings, its methods (the ones that PEP 519 says should be changed anyway) should use </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[str]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> rather than </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">PathLike[AnyStr]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Acknowledgments</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">(Thanks for comments on the draft to Stephen Turnbull, Koos Zevenhoven, Eth</span><span class="author-d-4z65zz66zl57z75zyiz66zfr2fz87zwz89znuiz90zz78zoz72zz87zhgh7z71zz88zz77zfz66zquz87zq3xz82zcz82zq5caz88z9">a</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">n Furman, and Brett Cannon.)</span></div>\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-05-18T07:06:00-07:00\'>7:06 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=8854185106045973213\' onclick=\'\'>\n3 comments:\n </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=8854185106045973213&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span 
class=\'post-labels\'>\n</span>\n</div>\n<div class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n </div></div>\n \n\n <div class="date-outer">\n \n<h2 class=\'date-header\'><span>Tuesday, May 17, 2016</span></h2>\n\n <div class="date-posts">\n \n<div class=\'post-outer\'>\n<div class=\'post hentry uncustomized-post-template\' itemprop=\'blogPost\' itemscope=\'itemscope\' itemtype=\'http://schema.org/BlogPosting\'>\n<meta content=\'4195135246107166251\' itemprop=\'blogId\'/>\n<meta content=\'1468618515324597653\' itemprop=\'postId\'/>\n<a name=\'1468618515324597653\'></a>\n<h3 class=\'post-title entry-title\' itemprop=\'name\'>\n<a href=\'http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html\'>The AnyStr type variable</a>\n</h3>\n<div class=\'post-header\'>\n<div class=\'post-header-line-1\'></div>\n</div>\n<div class=\'post-body entry-content\' id=\'post-body-1468618515324597653\' itemprop=\'description articleBody\'>\n<div>\n<h1 class="ace-copy-paste-skip-this-tag">\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The AnyStr type variable </span></h1>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">I was drafting a blog post on how to add type annotations for the new </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">__fspath__()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> protocol (</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" 
data-target-href="https://www.python.org/dev/peps/pep-0519/" href="https://www.python.org/dev/peps/pep-0519/" rel="noreferrer nofollow" target="_blank">PEP 519</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">) when I realized that I should write a separate post about </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . So here it is.</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">A simple function on strings</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Let’s write a function that surrounds a string in parentheses. 
We’ll put it in a file named </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">demo.py</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> :</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return \'(\' + s + \')\'</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">It works, too:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">>>> from demo import parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">>>> print(parenthesize(\'hola\'))</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">(hola)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Of course, if you pass it something that’s not a string it will fail:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">>>> parenthesize(42)</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Traceback (most recent call last):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> File "demo.py", line 1, in <module></module></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> File "demo.py", line 2, in parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">TypeError: Can\'t convert \'int\' object to str implicitly</span></code></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Adding type annotations</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Using </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0484/" href="https://www.python.org/dev/peps/pep-0484/" rel="noreferrer nofollow" target="_blank">PEP 484</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> type annotations we can clarify our little function’s signature:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: str) -> str:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return \'(\' + s + \')\'</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Nothing to it, right? Even if you’ve never heard of PEP 484 before you can guess what this means. (Note that PEP 484 also says that the runtime behavior is unchanged. 
The calls I showed above will still have exactly the same effect, including the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">TypeError</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> raised by </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize(42)</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> .)</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Polymorphic functions</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Now suppose this is actually part of a networking app and we need to be able to parenthesize byte strings as well as text strings. 
Here’s how you’d implement that:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return \'(\' + s + \')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> elif isinstance(s, bytes):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return b\'(\' + s + b\')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> else:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> raise TypeError(f"That\'s not a string, it\'s a {type(s)}") # See </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://www.python.org/dev/peps/pep-0498/" href="https://www.python.org/dev/peps/pep-0498/" rel="noreferrer nofollow" target="_blank">PEP 498</a></span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">With a fancy word we call that a polymorphic function. How do you write a signature for such a function? For the answer we have to dive a little deeper into PEP 484. It defines a nifty operator named </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Union</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> that lets us state that a type can be either this or that (or something else). 
In our case, it’s either </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , so we can write it like this:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import Union</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: Union[str, bytes]) -> Union[str, bytes]:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> # Etc.</span></code></div>\n<div>\n<br /></div>\n<div>\n<span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Now let’s write a little main program with a bug, to show off the type checker:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from demo import parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">a = parenthesize(\'hello\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">b = parenthesize(b\'hola\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">c = a + b ### bug here<-- bug="" span=""></--></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">print(c)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">When we try to run this, the two </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> calls work fine (yay polymorphism!) but we get a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">TypeError</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> on the last line:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ python3 main.py </span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Traceback (most recent call last):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> File "main.py", line 5, in <module></module></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> c = a + b ### bug here<-- bug="" span=""></--></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">TypeError: Can\'t convert \'bytes\' object to str implicitly</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The reason should be pretty obvious: in Python 3 you can’t mix bytes and str objects. And when we type-check this program using </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy-lang.org/" href="http://mypy-lang.org/" rel="noreferrer nofollow" target="_blank">mypy</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> we indeed get a type error:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ mypy main.py </span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">main.py:5: error: Unsupported operand types for + (likely involving Union)</span></code></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Debugging the bug</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">So let’s try a program without a bug:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code 
listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from demo import parenthesize</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">a = parenthesize(\'hello\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">b = parenthesize(\'hola\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">c = a + b ### bug here<-- bug="" no="" span=""></--></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">print(c)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Run it and it works great:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ python3 main.py</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">(hello)(hola)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">So the type checker should be happy too, right?</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">$ mypy main.py</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">main.py:5: error: Unsupported operand types for + (likely involving Union)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Whoops! The same error. What happened? 
Of course, I set you up, so I can explain something about type checking.</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The trouble with </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z s"><s>tribbles</s></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> unions</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The type checker takes the signature at face value, so that when checking the call, it infers the type </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Union[str, bytes]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> for every call to </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , regardless of what the arguments are. 
This is because, for most functions of even modest complexity, a type checker doesn’t understand enough about what’s going on in the function body, so it just has to believe the types in the signature (even though in this particular case it would probably be easy enough to do better).</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">In our test program the types of </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">a</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> are both inferred to be exactly what </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">parenthesize()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> claims to return, i.e., both variables have the type </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">Union[str, bytes]</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
The type checker then analyzes the expression </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">a + b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , and for this i</span><span class="author-d-z89zz72zz79zvhpz67zz83z9z66zz78zxz122z1xz74zu4z83z4myz73zkiz71zdz77zz71zz65zz79z4iz79ziosz75zz85zreqz69z">t</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> discovers a problem: if </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">a</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is either str or bytes, and so is </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">b</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , then the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">+</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> operator may be invoked on any of these combinations of types: </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str + str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , </span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str + bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes + str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes + bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . But only the first and the last are valid! In Python 3, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str + bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes + str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> are invalid operations.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Aside: Even in Python 2, those two are suspect: since while </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">\'x\' + 
u\'y\'</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> indeed works (returning </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">u\'xy\'</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ), other combinations will raise UnicodeDecodeError, e.g.:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">>>>\'Franç\' + u\'ois\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Traceback (most recent call last):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> File "<stdin>", line 1, in <module></module></stdin></span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">UnicodeDecodeError: \'ascii\' codec can\'t decode byte 0xc3 in position 4:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ruby" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">ordinal not in range(128)</span></code></div>\n<div>\n<br 
/></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Anyway, the type checker doesn’t like this business, and it rejects operations on Unions where some combinations are invalid. What can we do instead?</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Function overloading</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">One option would be function overloading. PEP 484 defines a magical decorator, </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">@overload</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , which lets us get around this problem. 
We could write something like this:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: str) -> str: ...</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">@overload</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: bytes) -> bytes: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">This tells the type checker that if the argument is a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , the return value is also a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> , and similarly for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . Unfortunately </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">@overload</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is only allowed in </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy.readthedocs.io/en/latest/basics.html#library-stubs-and-the-typeshed-repo" href="http://mypy.readthedocs.io/en/latest/basics.html#library-stubs-and-the-typeshed-repo" rel="noreferrer nofollow" target="_blank">stub files</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, which are a kind of interface definition files that show a type checker the signatures of a module’s contents without giving the implementation.</span></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Type 
variables</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Fortunately there’s an even better way, using type variables. This is how it goes:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import TypeVar</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">S = TypeVar(\'S\')</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: S) -> S:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return \'(\' + s + \')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> elif isinstance(s, bytes):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return b\'(\' + s + b\')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> else:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> raise TypeError("That\'s not a string, dude! It\'s a %s" % type(s))</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Well… Almost. 
Our main.py program (unchanged from above) now gets a clean bill of health, but when we type-check this version we get errors on both </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">return</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> lines:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">demo.py: note: In function "parenthesize":</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">demo.py:7: error: Incompatible return value type: expected S`-1, got builtins.str</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">demo.py:9: error: Incompatible return value type: expected S`-1, got builtins.bytes</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">This is a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/mypy/issues/1539" href="https://github.com/python/mypy/issues/1539" rel="noreferrer nofollow" target="_blank">bit hard to fathom</a></span><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">, but the fix is what I was leading up to anyway, so I’ll reveal it now:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import TypeVar</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">S = TypeVar(\'S\', str, bytes)</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: S) -> S:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> if isinstance(s, str):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return \'(\' + s + \')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span 
class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> elif isinstance(s, bytes):</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> return b\'(\' + s + b\')\'</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> else:</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> raise TypeError("That\'s not a string, dude! It\'s a %s" % type(s))</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">The only changed line is this one:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-bash" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">S = TypeVar(\'S\', str, bytes)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">This notation is called a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="http://mypy.readthedocs.io/en/latest/generics.html#type-variables-with-value-restriction" 
href="http://mypy.readthedocs.io/en/latest/generics.html#type-variables-with-value-restriction" rel="noreferrer nofollow" target="_blank">type variable with value restriction</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . Yes, it’s mouthful; we sometimes also call it a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z i"><i>constrained type variable</i></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">. </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">S</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> is a type variable restricted to a set of types. It also has the advantage of telling the type checker that types other than </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> or </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> are not acceptable. 
Without that, a call like this would have been considered valid:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-ini" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">x = parenthesize(42)</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">because the original type variable (without the restrictions) doesn\'t tell mypy that this is a bad idea.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">In fact, this particular use case (a type variable constrained to str or bytes) is so commonly needed that it\'s predefined in the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">typing</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> module, and all we have to do is import it:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">from typing import AnyStr</span></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1" spellcheck="false"><br /></code></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def parenthesize(s: AnyStr) -> AnyStr:</span></code></div>\n<div>\n<code class="listtype-code listindent1 
list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> # Etc. -- trust me, it works!</span></code></div>\n<div>\n<h2>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">Real-world use of AnyStr</span></h2>\n</div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">In fact, this is how many polymorphic functions in the </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> and </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.path</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> modules are defined. 
For example, in the stub for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">os.py</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> we find definitions like </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi#L236" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/__init__.pyi#L236" rel="noreferrer nofollow" target="_blank">the following</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">:</span></div>\n<div>\n<br /></div>\n<div>\n<code class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def link(src: AnyStr, link_name: AnyStr) -> None: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">and also </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z attrlink url"><a class="attrlink" data-target-href="https://github.com/python/typeshed/blob/master/stdlib/3/os/path.pyi#L57" href="https://github.com/python/typeshed/blob/master/stdlib/3/os/path.pyi#L57" rel="noreferrer nofollow" target="_blank">this</a></span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">:</span></div>\n<div>\n<br /></div>\n<div>\n<code 
class="listtype-code listindent1 list-code1 lang-python" spellcheck="false"><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">def split(path: AnyStr) -> Tuple[AnyStr, AnyStr]: ...</span></code></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">These show us a bit more of the power of type variables: the signature for </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">link()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> indicates that either both arguments must be </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> or both must be </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> ; </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">split()</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> demonstrates that the type variable may also occur in more complex constructs: splitting a </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z 
inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> returns a tuple of two </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">str</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> objects, while splitting </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> returns a tuple of two </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">bytes</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> objects.</span></div>\n<div>\n<br /></div>\n<div>\n<span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z">That’s all I wanted to share about </span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z inline-code">AnyStr</span><span class="author-d-16z86ztz122z98z81zz82zz85zunv3z82zpqfnlaklz69zehdlvnz73zz65zz81zz79zz73zpz76z22z66ztsz89zz122zz73zz122zfz83z"> . 
Thanks for comments on the draft to Stephen Turnbull, Koos Zevenhoven, Ethan Furman, and Brett Cannon.</span></div>\n<div>\n<br /></div>\n<div style=\'clear: both;\'></div>\n</div>\n<div class=\'post-footer\'>\n<div class=\'post-footer-line post-footer-line-1\'>\n<span class=\'post-author vcard\'>\nPosted by\n<span class=\'fn\' itemprop=\'author\' itemscope=\'itemscope\' itemtype=\'http://schema.org/Person\'>\n<meta content=\'https://www.blogger.com/profile/12821714508588242516\' itemprop=\'url\'/>\n<a class=\'g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' title=\'author profile\'>\n<span itemprop=\'name\'>Guido van Rossum</span>\n</a>\n</span>\n</span>\n<span class=\'post-timestamp\'>\nat\n<meta content=\'http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html\' itemprop=\'url\'/>\n<a class=\'timestamp-link\' href=\'http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html\' rel=\'bookmark\' title=\'permanent link\'><abbr class=\'published\' itemprop=\'datePublished\' title=\'2016-05-17T09:53:00-07:00\'>9:53 AM</abbr></a>\n</span>\n<span class=\'reaction-buttons\'>\n</span>\n<span class=\'post-comment-link\'>\n<a class=\'comment-link\' href=\'https://www.blogger.com/comment.g?blogID=4195135246107166251&postID=1468618515324597653\' onclick=\'\'>\n5 comments:\n </a>\n</span>\n<span class=\'post-backlinks post-comment-link\'>\n</span>\n<span class=\'post-icons\'>\n<span class=\'item-control blog-admin pid-1774424698\'>\n<a href=\'https://www.blogger.com/post-edit.g?blogID=4195135246107166251&postID=1468618515324597653&from=pencil\' title=\'Edit Post\'>\n<img alt=\'\' class=\'icon-action\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_edit_allbkg.gif\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'post-share-buttons goog-inline-block\'>\n</div>\n</div>\n<div class=\'post-footer-line post-footer-line-2\'>\n<span class=\'post-labels\'>\n</span>\n</div>\n<div 
class=\'post-footer-line post-footer-line-3\'>\n<span class=\'post-location\'>\n</span>\n</div>\n</div>\n</div>\n</div>\n\n </div></div>\n \n</div>\n<div class=\'blog-pager\' id=\'blog-pager\'>\n<span id=\'blog-pager-older-link\'>\n<a class=\'blog-pager-older-link\' href=\'http://neopythonic.blogspot.com/search?updated-max=2016-05-17T09:53:00-07:00&max-results=7\' id=\'Blog1_blog-pager-older-link\' title=\'Older Posts\'>Older Posts</a>\n</span>\n<a class=\'home-link\' href=\'http://neopythonic.blogspot.com/\'>Home</a>\n</div>\n<div class=\'clear\'></div>\n<div class=\'blog-feeds\'>\n<div class=\'feed-links\'>\nSubscribe to:\n<a class=\'feed-link\' href=\'http://neopythonic.blogspot.com/feeds/posts/default\' target=\'_blank\' type=\'application/atom+xml\'>Posts (Atom)</a>\n</div>\n</div>\n</div></div>\n</div>\n<div id=\'sidebar-wrapper\'>\n<div class=\'sidebar section\' id=\'header\'><div class=\'widget Header\' data-version=\'1\' id=\'Header1\'>\n<div id=\'header-inner\'>\n<div class=\'titlewrapper\'>\n<h1 class=\'title\'>\nNeopythonic\n</h1>\n</div>\n<div class=\'descriptionwrapper\'>\n<p class=\'description\'><span>Ramblings through technology, politics, culture and philosophy by the creator of the Python programming language.</span></p>\n</div>\n</div>\n</div></div>\n<div class=\'sidebar section\' id=\'sidebar\'><div class=\'widget Followers\' data-version=\'1\' id=\'Followers1\'>\n<h2 class=\'title\'>Followers</h2>\n<div class=\'widget-content\'>\n<div id=\'Followers1-wrapper\'>\n<div style=\'margin-right:2px;\'>\n<div><script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>\n<div id="followers-iframe-container"></div>\n<script type="text/javascript">\n window.followersIframe = null;\n function followersIframeOpen(url) {\n gapi.load("gapi.iframes", function() {\n if (gapi.iframes && gapi.iframes.getContext) {\n window.followersIframe = gapi.iframes.getContext().openChild({\n url: url,\n where: 
document.getElementById("followers-iframe-container"),\n messageHandlersFilter: gapi.iframes.CROSS_ORIGIN_IFRAMES_FILTER,\n messageHandlers: {\n \'_ready\': function(obj) {\n window.followersIframe.getIframeEl().height = obj.height;\n },\n \'reset\': function() {\n window.followersIframe.close();\n followersIframeOpen("https://www.blogger.com/followers.g?blogID\\x3d4195135246107166251\\x26colors\\x3dCgt0cmFuc3BhcmVudBILdHJhbnNwYXJlbnQaByMzMzMzMzMiByM0NDg4ODgqByNGRkZGRkYyByMwMDAwMDA6ByMzMzMzMzNCByM0NDg4ODhKByMwMDAwMDBSByM0NDg4ODhaC3RyYW5zcGFyZW50\\x26pageSize\\x3d21\\x26origin\\x3dhttp://neopythonic.blogspot.com/");\n },\n \'open\': function(url) {\n window.followersIframe.close();\n followersIframeOpen(url);\n },\n \'blogger-ping\': function() {\n }\n }\n });\n }\n });\n }\n followersIframeOpen("https://www.blogger.com/followers.g?blogID\\x3d4195135246107166251\\x26colors\\x3dCgt0cmFuc3BhcmVudBILdHJhbnNwYXJlbnQaByMzMzMzMzMiByM0NDg4ODgqByNGRkZGRkYyByMwMDAwMDA6ByMzMzMzMzNCByM0NDg4ODhKByMwMDAwMDBSByM0NDg4ODhaC3RyYW5zcGFyZW50\\x26pageSize\\x3d21\\x26origin\\x3dhttp://neopythonic.blogspot.com/");\n </script></div>\n</div>\n</div>\n<div class=\'clear\'></div>\n<span class=\'widget-item-control\'>\n<span class=\'item-control blog-admin\'>\n<a class=\'quickedit\' href=\'//www.blogger.com/rearrange?blogID=4195135246107166251&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar\' onclick=\'return _WidgetManager._PopupConfig(document.getElementById("Followers1"));\' rel=\'nofollow\' target=\'configFollowers1\' title=\'Edit\'>\n<img alt=\'\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_wrench_allbkg.png\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'clear\'></div>\n</div>\n</div><div class=\'widget BlogArchive\' data-version=\'1\' id=\'BlogArchive1\'>\n<h2>Blog Archive</h2>\n<div class=\'widget-content\'>\n<div id=\'ArchiveList\'>\n<div id=\'BlogArchive1_ArchiveList\'>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate 
expanded\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy toggle-open\'>\n\n ▼ \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2019/\'>\n2019\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate expanded\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy toggle-open\'>\n\n ▼ \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2019/03/\'>\nMarch\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n<ul class=\'posts\'>\n<li><a href=\'http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html\'>Why operators are useful</a></li>\n</ul>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2018/\'>\n2018\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2018/11/\'>\nNovember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/\'>\n2016\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(5)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/07/\'>\nJuly\n</a>\n<span class=\'post-count\' 
dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/05/\'>\nMay\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(3)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2016/04/\'>\nApril\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2013/\'>\n2013\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2013/10/\'>\nOctober\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/\'>\n2011\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(5)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/08/\'>\nAugust\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li 
class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/07/\'>\nJuly\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/06/\'>\nJune\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2011/01/\'>\nJanuary\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/\'>\n2009\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(16)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/12/\'>\nDecember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/11/\'>\nNovember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' 
href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/09/\'>\nSeptember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/07/\'>\nJuly\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/06/\'>\nJune\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(3)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/05/\'>\nMay\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/04/\'>\nApril\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(4)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/03/\'>\nMarch\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(1)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n 
\n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2009/01/\'>\nJanuary\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/\'>\n2008\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(14)</span>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/12/\'>\nDecember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(2)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/11/\'>\nNovember\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(5)</span>\n</li>\n</ul>\n<ul class=\'hierarchy\'>\n<li class=\'archivedate collapsed\'>\n<a class=\'toggle\' href=\'javascript:void(0)\'>\n<span class=\'zippy\'>\n\n ► \n \n</span>\n</a>\n<a class=\'post-count-link\' href=\'http://neopythonic.blogspot.com/2008/10/\'>\nOctober\n</a>\n<span class=\'post-count\' dir=\'ltr\'>(7)</span>\n</li>\n</ul>\n</li>\n</ul>\n</div>\n</div>\n<div class=\'clear\'></div>\n<span class=\'widget-item-control\'>\n<span class=\'item-control blog-admin\'>\n<a class=\'quickedit\' href=\'//www.blogger.com/rearrange?blogID=4195135246107166251&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget§ionId=sidebar\' onclick=\'return _WidgetManager._PopupConfig(document.getElementById("BlogArchive1"));\' rel=\'nofollow\' target=\'configBlogArchive1\' title=\'Edit\'>\n<img alt=\'\' height=\'18\' 
src=\'https://resources.blogblog.com/img/icon18_wrench_allbkg.png\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'clear\'></div>\n</div>\n</div><div class=\'widget Profile\' data-version=\'1\' id=\'Profile1\'>\n<h2>About Me</h2>\n<div class=\'widget-content\'>\n<a href=\'https://www.blogger.com/profile/12821714508588242516\'><img alt=\'My photo\' class=\'profile-img\' height=\'80\' src=\'//2.bp.blogspot.com/_FG9t5W1SJ14/SO0aRdEpTAI/AAAAAAAACvw/sQy2btDo2DI/S220-s80/IMG_2192.jpg\' width=\'53\'/></a>\n<dl class=\'profile-datablock\'>\n<dt class=\'profile-data\'>\n<a class=\'profile-name-link g-profile\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\' style=\'background-image: url(//www.blogger.com/img/logo-16.png);\'>\nGuido van Rossum\n</a>\n</dt>\n<dd class=\'profile-textblock\'>Python\'s BDFL</dd>\n</dl>\n<a class=\'profile-link\' href=\'https://www.blogger.com/profile/12821714508588242516\' rel=\'author\'>View my complete profile</a>\n<div class=\'clear\'></div>\n<span class=\'widget-item-control\'>\n<span class=\'item-control blog-admin\'>\n<a class=\'quickedit\' href=\'//www.blogger.com/rearrange?blogID=4195135246107166251&widgetType=Profile&widgetId=Profile1&action=editWidget§ionId=sidebar\' onclick=\'return _WidgetManager._PopupConfig(document.getElementById("Profile1"));\' rel=\'nofollow\' target=\'configProfile1\' title=\'Edit\'>\n<img alt=\'\' height=\'18\' src=\'https://resources.blogblog.com/img/icon18_wrench_allbkg.png\' width=\'18\'/>\n</a>\n</span>\n</span>\n<div class=\'clear\'></div>\n</div>\n</div></div>\n</div>\n<!-- spacer for skins that want sidebar and main to be the same height-->\n<div class=\'clear\'> </div>\n</div>\n<!-- end content-wrapper -->\n<div id=\'footer-wrapper\'>\n<div class=\'footer no-items section\' id=\'footer\'></div>\n</div>\n</div></div>\n<!-- end outer-wrapper -->\n<script src=\'https://apis.google.com/js/plusone.js\' type=\'text/javascript\'></script>\n\n<script type="text/javascript" 
src="https://www.blogger.com/static/v1/widgets/3236635003-widgets.js"></script>\n<script type=\'text/javascript\'>\nwindow[\'__wavt\'] = \'AOuZoY5yNKA5W7NfDL5jC8sgYKdhWMg9Vw:1582878948016\';_WidgetManager._Init(\'//www.blogger.com/rearrange?blogID\\x3d4195135246107166251\',\'//neopythonic.blogspot.com/\',\'4195135246107166251\');\n_WidgetManager._SetDataContext([{\'name\': \'blog\', \'data\': {\'blogId\': \'4195135246107166251\', \'title\': \'Neopythonic\', \'url\': \'http://neopythonic.blogspot.com/\', \'canonicalUrl\': \'http://neopythonic.blogspot.com/\', \'homepageUrl\': \'http://neopythonic.blogspot.com/\', \'searchUrl\': \'http://neopythonic.blogspot.com/search\', \'canonicalHomepageUrl\': \'http://neopythonic.blogspot.com/\', \'blogspotFaviconUrl\': \'http://neopythonic.blogspot.com/favicon.ico\', \'bloggerUrl\': \'https://www.blogger.com\', \'hasCustomDomain\': false, \'httpsEnabled\': true, \'enabledCommentProfileImages\': true, \'gPlusViewType\': \'FILTERED_POSTMOD\', \'adultContent\': false, \'analyticsAccountNumber\': \'\', \'encoding\': \'UTF-8\', \'locale\': \'en\', \'localeUnderscoreDelimited\': \'en\', \'languageDirection\': \'ltr\', \'isPrivate\': false, \'isMobile\': false, \'isMobileRequest\': false, \'mobileClass\': \'\', \'isPrivateBlog\': false, \'feedLinks\': \'\\x3clink rel\\x3d\\x22alternate\\x22 type\\x3d\\x22application/atom+xml\\x22 title\\x3d\\x22Neopythonic - Atom\\x22 href\\x3d\\x22http://neopythonic.blogspot.com/feeds/posts/default\\x22 /\\x3e\\n\\x3clink rel\\x3d\\x22alternate\\x22 type\\x3d\\x22application/rss+xml\\x22 title\\x3d\\x22Neopythonic - RSS\\x22 href\\x3d\\x22http://neopythonic.blogspot.com/feeds/posts/default?alt\\x3drss\\x22 /\\x3e\\n\\x3clink rel\\x3d\\x22service.post\\x22 type\\x3d\\x22application/atom+xml\\x22 title\\x3d\\x22Neopythonic - Atom\\x22 href\\x3d\\x22https://www.blogger.com/feeds/4195135246107166251/posts/default\\x22 /\\x3e\\n\', \'meTag\': \'\\x3clink rel\\x3d\\x22me\\x22 
href\\x3d\\x22https://www.blogger.com/profile/12821714508588242516\\x22 /\\x3e\\n\', \'adsenseHostId\': \'ca-host-pub-1556223355139109\', \'adsenseHasAds\': false, \'view\': \'\', \'dynamicViewsCommentsSrc\': \'//www.blogblog.com/dynamicviews/4224c15c4e7c9321/js/comments.js\', \'dynamicViewsScriptSrc\': \'//www.blogblog.com/dynamicviews/752be96649ffb269\', \'plusOneApiSrc\': \'https://apis.google.com/js/plusone.js\', \'disableGComments\': true, \'sharing\': {\'platforms\': [{\'name\': \'Get link\', \'key\': \'link\', \'shareMessage\': \'Get link\', \'target\': \'\'}, {\'name\': \'Facebook\', \'key\': \'facebook\', \'shareMessage\': \'Share to Facebook\', \'target\': \'facebook\'}, {\'name\': \'BlogThis!\', \'key\': \'blogThis\', \'shareMessage\': \'BlogThis!\', \'target\': \'blog\'}, {\'name\': \'Twitter\', \'key\': \'twitter\', \'shareMessage\': \'Share to Twitter\', \'target\': \'twitter\'}, {\'name\': \'Pinterest\', \'key\': \'pinterest\', \'shareMessage\': \'Share to Pinterest\', \'target\': \'pinterest\'}, {\'name\': \'Email\', \'key\': \'email\', \'shareMessage\': \'Email\', \'target\': \'email\'}], \'disableGooglePlus\': true, \'googlePlusShareButtonWidth\': 300, \'googlePlusBootstrap\': \'\\x3cscript type\\x3d\\x22text/javascript\\x22\\x3ewindow.___gcfg \\x3d {\\x27lang\\x27: \\x27en\\x27};\\x3c/script\\x3e\'}, \'hasCustomJumpLinkMessage\': false, \'jumpLinkMessage\': \'Read more\', \'pageType\': \'index\', \'pageName\': \'\', \'pageTitle\': \'Neopythonic\'}}, {\'name\': \'features\', \'data\': {\'sharing_get_link_dialog\': \'true\', \'sharing_native\': \'false\'}}, {\'name\': \'messages\', \'data\': {\'edit\': \'Edit\', \'linkCopiedToClipboard\': \'Link copied to clipboard!\', \'ok\': \'Ok\', \'postLink\': \'Post Link\'}}, {\'name\': \'template\', \'data\': {\'name\': \'custom\', \'localizedName\': \'Custom\', \'isResponsive\': false, \'isAlternateRendering\': false, \'isCustom\': true}}, {\'name\': \'view\', \'data\': {\'classic\': {\'name\': \'classic\', 
\'url\': \'?view\\x3dclassic\'}, \'flipcard\': {\'name\': \'flipcard\', \'url\': \'?view\\x3dflipcard\'}, \'magazine\': {\'name\': \'magazine\', \'url\': \'?view\\x3dmagazine\'}, \'mosaic\': {\'name\': \'mosaic\', \'url\': \'?view\\x3dmosaic\'}, \'sidebar\': {\'name\': \'sidebar\', \'url\': \'?view\\x3dsidebar\'}, \'snapshot\': {\'name\': \'snapshot\', \'url\': \'?view\\x3dsnapshot\'}, \'timeslide\': {\'name\': \'timeslide\', \'url\': \'?view\\x3dtimeslide\'}, \'isMobile\': false, \'title\': \'Neopythonic\', \'description\': \'Ramblings through technology, politics, culture and philosophy by the creator of the Python programming language.\', \'url\': \'http://neopythonic.blogspot.com/\', \'type\': \'feed\', \'isSingleItem\': false, \'isMultipleItems\': true, \'isError\': false, \'isPage\': false, \'isPost\': false, \'isHomepage\': true, \'isArchive\': false, \'isLabelSearch\': false}}]);\n_WidgetManager._RegisterWidget(\'_NavbarView\', new _WidgetInfo(\'Navbar1\', \'navbar\', document.getElementById(\'Navbar1\'), {}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_BlogView\', new _WidgetInfo(\'Blog1\', \'main\', document.getElementById(\'Blog1\'), {\'cmtInteractionsEnabled\': false, \'lightboxEnabled\': true, \'lightboxModuleUrl\': \'https://www.blogger.com/static/v1/jsbin/577060686-lbx.js\', \'lightboxCssUrl\': \'https://www.blogger.com/static/v1/v-css/368954415-lightbox_bundle.css\'}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_HeaderView\', new _WidgetInfo(\'Header1\', \'header\', document.getElementById(\'Header1\'), {}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_FollowersView\', new _WidgetInfo(\'Followers1\', \'sidebar\', document.getElementById(\'Followers1\'), {}, \'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_BlogArchiveView\', new _WidgetInfo(\'BlogArchive1\', \'sidebar\', document.getElementById(\'BlogArchive1\'), {\'languageDirection\': \'ltr\', \'loadingMessage\': \'Loading\\x26hellip;\'}, 
\'displayModeFull\'));\n_WidgetManager._RegisterWidget(\'_ProfileView\', new _WidgetInfo(\'Profile1\', \'sidebar\', document.getElementById(\'Profile1\'), {}, \'displayModeFull\'));\n</script>\n</body>\n</html>'
We have several options to get the titles: plain string methods such as split, regular expressions, or an HTML parser such as BeautifulSoup.
Let's try to find the titles using each method:
html = s
html_parts = html.split("h3 class='post-title entry-title")
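def get_title(html):
    # Keep only the part of the block before the closing </h3> tag
    h = html.split("</h3>")[0]
    print("After first split:\n %s\n" % h)
    # The third piece after splitting on '> starts with the title text
    h = h.split("'>")[2]
    print("After second split:\n %s\n" % h)
    # Drop the closing </a> tag and the surrounding whitespace
    return h.replace("</a>", "").strip()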
l = [get_title(i) for i in html_parts[1:]]
l
After first split: ' itemprop='name'> <a href='http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html'>Why operators are useful</a> After second split: Why operators are useful</a> After first split: ' itemprop='name'> <a href='http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html'>What to do with your computer science career</a> After second split: What to do with your computer science career</a> After first split: ' itemprop='name'> <a href='http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html'>About spammers and comments</a> After second split: About spammers and comments</a> After first split: ' itemprop='name'> <a href='http://neopythonic.blogspot.com/2016/05/union-syntax.html'>Union syntax</a> After second split: Union syntax</a> After first split: ' itemprop='name'> <a href='http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html'>Adding type annotations for fspath</a> After second split: Adding type annotations for fspath</a> After first split: ' itemprop='name'> <a href='http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html'>The AnyStr type variable</a> After second split: The AnyStr type variable</a>
['Why operators are useful', 'What to do with your computer science career', 'About spammers and comments', 'Union syntax', 'Adding type annotations for fspath', 'The AnyStr type variable']
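The split-based approach above can be packaged as a small reusable helper. The following is only a minimal sketch: the function name text_between and the sample snippet are illustrative assumptions and are not part of the original notebook.

```python
def text_between(text, start, end):
    """Return every substring of `text` found between `start` and `end`."""
    results = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        i += len(start)
        j = text.find(end, i)
        if j == -1:
            break
        results.append(text[i:j].strip())
        pos = j + len(end)
    return results


# Hypothetical snippet that mimics the structure of the blog's post headers
sample = (
    "<h3 class='post-title entry-title' itemprop='name'>\n"
    "<a href='http://example.com/a.html'>First post</a>\n</h3>\n"
    "<h3 class='post-title entry-title' itemprop='name'>\n"
    "<a href='http://example.com/b.html'>Second post</a>\n</h3>\n"
)

# Each header block lies between the opening marker and </h3>;
# the title lies between the end of the <a ...> tag and </a>.
blocks = text_between(sample, "<h3 class='post-title entry-title'", "</h3>")
titles = [text_between(b, ".html'>", "</a>")[0] for b in blocks]
print(titles)
```

Like any string-based method, this breaks as soon as the page changes its quoting or attribute order, which is why the more robust methods that follow are usually preferable.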
import re
r = re.compile(r"<h3 class='post-title entry-title'.*?>.*?>(.*?)</a><\/h3>")
r.findall(html.replace("\r","").replace("\n",""))
['Why operators are useful', 'What to do with your computer science career', 'About spammers and comments', 'Union syntax', 'Adding type annotations for fspath', 'The AnyStr type variable']
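As a variation on the regular expression above, the pattern can be made tolerant of line breaks instead of stripping them from the HTML first. This is a sketch under assumptions: the sample snippet is made up, and the pattern is one possible formulation rather than the notebook's own.

```python
import re

# Hypothetical snippet that mimics the structure of the blog's post headers
sample = (
    "<h3 class='post-title entry-title' itemprop='name'>\n"
    "<a href='http://example.com/a.html'>First post</a>\n</h3>\n"
    "<h3 class='post-title entry-title' itemprop='name'>\n"
    "<a href='http://example.com/b.html'>Second post</a>\n</h3>\n"
)

# [^>]* skips the remaining attributes of a tag, \s* absorbs newlines between tags,
# and re.DOTALL lets "." match a newline should a title span several lines.
pattern = re.compile(
    r"<h3 class='post-title entry-title'[^>]*>\s*<a [^>]*>(.*?)</a>\s*</h3>",
    re.DOTALL,
)

regex_titles = [t.strip() for t in pattern.findall(sample)]
print(regex_titles)
```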
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
l = soup.find_all('h3', attrs={'class': 'post-title entry-title'})
l
[<h3 class="post-title entry-title" itemprop="name"> <a href="http://neopythonic.blogspot.com/2019/03/why-operators-are-useful.html">Why operators are useful</a> </h3>, <h3 class="post-title entry-title" itemprop="name"> <a href="http://neopythonic.blogspot.com/2018/11/what-do-do-with-your-computer-science.html">What to do with your computer science career</a> </h3>, <h3 class="post-title entry-title" itemprop="name"> <a href="http://neopythonic.blogspot.com/2016/07/about-spammers-and-comments.html">About spammers and comments</a> </h3>, <h3 class="post-title entry-title" itemprop="name"> <a href="http://neopythonic.blogspot.com/2016/05/union-syntax.html">Union syntax</a> </h3>, <h3 class="post-title entry-title" itemprop="name"> <a href="http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.html">Adding type annotations for fspath</a> </h3>, <h3 class="post-title entry-title" itemprop="name"> <a href="http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html">The AnyStr type variable</a> </h3>]
# Get the title text of each <h3> element
[t.text.strip() for t in l]
['Why operators are useful', 'What to do with your computer science career', 'About spammers and comments', 'Union syntax', 'Adding type annotations for fspath', 'The AnyStr type variable']
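To see what an HTML parser does behind the scenes, the same extraction can be sketched with the standard library's html.parser module instead of BeautifulSoup. This is a swapped-in technique for illustration only; the class name TitleParser and the sample snippet are assumptions.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <h3> tags whose class list contains 'post-title'."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "post-title" in classes:
            self.in_title = True
            self._buffer = []

    def handle_data(self, data):
        if self.in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_title:
            self.titles.append("".join(self._buffer).strip())
            self.in_title = False


# Hypothetical snippet that mimics the structure of the blog's post headers
sample = (
    "<h3 class='post-title entry-title' itemprop='name'>\n"
    "<a href='http://example.com/a.html'>First post</a>\n</h3>\n"
    "<h3 class='post-title entry-title' itemprop='name'>\n"
    "<a href='http://example.com/b.html'>Second post</a>\n</h3>\n"
)

parser = TitleParser()
parser.feed(sample)
print(parser.titles)
```

The parser reacts to tags and text as events, so it does not depend on quoting style or attribute order the way the split and regex versions do.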
One straightforward way to collect data is to use APIs. In the following example, we will use the Wikipedia Python package, which wraps the MediaWiki API. First, we install the Wikipedia package. For visualization, we will also install the NetworkX package.
!pip install wikipedia
!pip install networkx
Collecting wikipedia Requirement already satisfied: requests<3.0.0,>=2.0.0 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from wikipedia) (2.22.0) Requirement already satisfied: beautifulsoup4 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from wikipedia) (4.8.0) Requirement already satisfied: certifi>=2017.4.17 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2019.9.11) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (1.24.2) Requirement already satisfied: idna<2.9,>=2.5 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2.8) Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from requests<3.0.0,>=2.0.0->wikipedia) (3.0.4) Requirement already satisfied: soupsieve>=1.2 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from beautifulsoup4->wikipedia) (1.9.3) Installing collected packages: wikipedia Successfully installed wikipedia-1.4.0 Requirement already satisfied: networkx in /anaconda3/envs/massivedata/lib/python3.6/site-packages (2.3) Requirement already satisfied: decorator>=4.3.0 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from networkx) (4.4.0)
import wikipedia
w = wikipedia.page("Machine Learning")
w.summary
'Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.\nMachine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.\n\n'
w.links[:20]
['ACM Computing Classification System', 'ACM Computing Surveys', 'ADALINE', 'AT&T Labs', 'Action selection', 'Active learning (machine learning)', 'Adaptive website', 'Affective computing', 'Alan Turing', 'Algorithm', 'Algorithm design', 'Algorithmic bias', 'Algorithmic efficiency', 'Amazon Machine Learning', 'Analysis of algorithms', 'Angoss', 'Anomaly detection', 'Apache Mahout', 'Apache Spark', 'Apache SystemML']
Let's build a graph in which each vertex is a Wikipedia page linked from the Machine Learning page, and each directed link goes from a page to another page in the set that it references.
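Before querying Wikipedia, the construction can be tried offline on a toy dictionary that maps each page title to the titles it links to. This is a minimal sketch: build_links and toy_links are hypothetical names and data, standing in for the page lookups done through the wikipedia package.

```python
def build_links(root, links_of, limit=50):
    """Return directed edges among the first `limit` pages linked from `root`."""
    neighbours = links_of.get(root, [])[:limit]
    vertices = set(neighbours)
    # The root page points to each of its neighbours
    edges = [(root, v) for v in neighbours]
    for v in sorted(vertices):
        for v2 in links_of.get(v, []):
            # Keep a link only if its target is also one of the chosen vertices
            if v2 in vertices:
                edges.append((v, v2))
    return edges


# Made-up link data (not real Wikipedia content)
toy_links = {
    "Machine Learning": ["Algorithm", "Alan Turing", "Anomaly detection"],
    "Algorithm": ["Alan Turing", "Sorting"],
    "Alan Turing": ["Algorithm"],
    "Anomaly detection": [],
}

edges = build_links("Machine Learning", toy_links)
for source, target in edges:
    print(source, "->", target)
```

Links to pages outside the vertex set, such as "Sorting" here, are dropped, which keeps the graph limited to the neighbourhood of the starting page.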
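vertices = set(w.links[:50])
# Connect the Machine Learning page to each of its first 50 linked pages
links = [("Machine Learning", l) for l in vertices]
for v in vertices:
    try:
        # Fetch the linked page and keep only the links that point back into our vertex set
        page = wikipedia.page(v)
        for v2 in page.links:
            if v2 in vertices:
                links.append((v, v2))
    except Exception as e:
        # Errors such as disambiguation pages are printed and skipped
        print(e)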
/anaconda3/envs/massivedata/lib/python3.6/site-packages/wikipedia/wikipedia.py:389: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 389 of the file /anaconda3/envs/massivedata/lib/python3.6/site-packages/wikipedia/wikipedia.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor. lis = BeautifulSoup(html).find_all('li')
"adeline" may refer to: Adeline (given name) Yves-Marie Adeline Adeline, Illinois Adeline Records Adeline Software International Ballade pour Adeline Portrait of Mary Adeline Williams "Adeline" (song) Adeleorina Cyclone Adeline Pépinières Arboretum Adeline Adeline (rocket) Sweet Adeline (disambiguation)
Let's draw the graph:
import networkx as nx
%matplotlib inline
g = nx.DiGraph()
g.add_edges_from(set(links))
nx.info(g)
'Name: \nType: DiGraph\nNumber of nodes: 51\nNumber of edges: 281\nAverage in degree: 5.5098\nAverage out degree: 5.5098'
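The numbers in this summary can be reproduced by hand. Every directed edge adds one to the out-degree of its source and one to the in-degree of its target, so both averages equal the number of edges divided by the number of nodes. The sketch below uses a hypothetical helper, digraph_stats, on a made-up edge list and then checks the arithmetic for the 51 nodes and 281 edges reported above.

```python
def digraph_stats(edges):
    """Return (node count, edge count, average in/out degree) for a directed edge list."""
    unique_edges = set(edges)  # duplicates collapse, as they do in a DiGraph
    nodes = {n for edge in unique_edges for n in edge}
    average_degree = len(unique_edges) / len(nodes)
    return len(nodes), len(unique_edges), average_degree


# Made-up edges; ("a", "c") appears twice on purpose
example_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("a", "c")]
print(digraph_stats(example_edges))

# Average in/out degree for the graph above: 281 edges over 51 nodes
print(round(281 / 51, 4))
```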
import matplotlib.pyplot as plt
plt.figure(3,figsize=(14,14))
nx.draw_kamada_kawai(g, with_labels=True)
/anaconda3/envs/massivedata/lib/python3.6/site-packages/networkx/drawing/nx_pylab.py:579: MatplotlibDeprecationWarning: The iterable function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use np.iterable instead. if not cb.iterable(width): /anaconda3/envs/massivedata/lib/python3.6/site-packages/networkx/drawing/nx_pylab.py:676: MatplotlibDeprecationWarning: The iterable function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use np.iterable instead. if cb.iterable(node_size): # many node sizes
We will talk more about graphs in upcoming lectures.
In this section, we give some examples of how to work with NumPy and show the advantages of NumPy's array (ndarray) over Python's list object.
import numpy as np
??np
dir(np)
['ALLOW_THREADS', 'AxisError', 'BUFSIZE', 'CLIP', 'ComplexWarning', 'DataSource', 'ERR_CALL', 'ERR_DEFAULT', 'ERR_IGNORE', 'ERR_LOG', 'ERR_PRINT', 'ERR_RAISE', 'ERR_WARN', 'FLOATING_POINT_SUPPORT', 'FPE_DIVIDEBYZERO', 'FPE_INVALID', 'FPE_OVERFLOW', 'FPE_UNDERFLOW', 'False_', 'Inf', 'Infinity', 'MAXDIMS', 'MAY_SHARE_BOUNDS', 'MAY_SHARE_EXACT', 'MachAr', 'ModuleDeprecationWarning', 'NAN', 'NINF', 'NZERO', 'NaN', 'PINF', 'PZERO', 'RAISE', 'RankWarning', 'SHIFT_DIVIDEBYZERO', 'SHIFT_INVALID', 'SHIFT_OVERFLOW', 'SHIFT_UNDERFLOW', 'ScalarType', 'Tester', 'TooHardError', 'True_', 'UFUNC_BUFSIZE_DEFAULT', 'UFUNC_PYVALS_NAME', 'VisibleDeprecationWarning', 'WRAP', '_NoValue', '_UFUNC_API', '__NUMPY_SETUP__', '__all__', '__builtins__', '__cached__', '__config__', '__doc__', '__file__', '__git_revision__', '__loader__', '__mkl_version__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_add_newdoc_ufunc', '_distributor_init', '_globals', '_mat', '_pytesttester', 'abs', 'absolute', 'absolute_import', 'add', 'add_docstring', 'add_newdoc', 'add_newdoc_ufunc', 'alen', 'all', 'allclose', 'alltrue', 'amax', 'amin', 'angle', 'any', 'append', 'apply_along_axis', 'apply_over_axes', 'arange', 'arccos', 'arccosh', 'arcsin', 'arcsinh', 'arctan', 'arctan2', 'arctanh', 'argmax', 'argmin', 'argpartition', 'argsort', 'argwhere', 'around', 'array', 'array2string', 'array_equal', 'array_equiv', 'array_repr', 'array_split', 'array_str', 'asanyarray', 'asarray', 'asarray_chkfinite', 'ascontiguousarray', 'asfarray', 'asfortranarray', 'asmatrix', 'asscalar', 'atleast_1d', 'atleast_2d', 'atleast_3d', 'average', 'bartlett', 'base_repr', 'binary_repr', 'bincount', 'bitwise_and', 'bitwise_not', 'bitwise_or', 'bitwise_xor', 'blackman', 'block', 'bmat', 'bool', 'bool8', 'bool_', 'broadcast', 'broadcast_arrays', 'broadcast_to', 'busday_count', 'busday_offset', 'busdaycalendar', 'byte', 'byte_bounds', 'bytes0', 'bytes_', 'c_', 'can_cast', 'cast', 'cbrt', 'cdouble', 'ceil', 'cfloat', 
'char', 'character', 'chararray', 'choose', 'clip', 'clongdouble', 'clongfloat', 'column_stack', 'common_type', 'compare_chararrays', 'compat', 'complex', 'complex128', 'complex256', 'complex64', 'complex_', 'complexfloating', 'compress', 'concatenate', 'conj', 'conjugate', 'convolve', 'copy', 'copysign', 'copyto', 'core', 'corrcoef', 'correlate', 'cos', 'cosh', 'count_nonzero', 'cov', 'cross', 'csingle', 'ctypeslib', 'cumprod', 'cumproduct', 'cumsum', 'datetime64', 'datetime_as_string', 'datetime_data', 'deg2rad', 'degrees', 'delete', 'deprecate', 'deprecate_with_doc', 'diag', 'diag_indices', 'diag_indices_from', 'diagflat', 'diagonal', 'diff', 'digitize', 'disp', 'divide', 'division', 'divmod', 'dot', 'double', 'dsplit', 'dstack', 'dtype', 'dual', 'e', 'ediff1d', 'einsum', 'einsum_path', 'emath', 'empty', 'empty_like', 'equal', 'errstate', 'euler_gamma', 'exp', 'exp2', 'expand_dims', 'expm1', 'extract', 'eye', 'fabs', 'fastCopyAndTranspose', 'fft', 'fill_diagonal', 'find_common_type', 'finfo', 'fix', 'flatiter', 'flatnonzero', 'flexible', 'flip', 'fliplr', 'flipud', 'float', 'float128', 'float16', 'float32', 'float64', 'float_', 'float_power', 'floating', 'floor', 'floor_divide', 'fmax', 'fmin', 'fmod', 'format_float_positional', 'format_float_scientific', 'format_parser', 'frexp', 'frombuffer', 'fromfile', 'fromfunction', 'fromiter', 'frompyfunc', 'fromregex', 'fromstring', 'full', 'full_like', 'fv', 'gcd', 'generic', 'genfromtxt', 'geomspace', 'get_array_wrap', 'get_include', 'get_printoptions', 'getbufsize', 'geterr', 'geterrcall', 'geterrobj', 'gradient', 'greater', 'greater_equal', 'half', 'hamming', 'hanning', 'heaviside', 'histogram', 'histogram2d', 'histogram_bin_edges', 'histogramdd', 'hsplit', 'hstack', 'hypot', 'i0', 'identity', 'iinfo', 'imag', 'in1d', 'index_exp', 'indices', 'inexact', 'inf', 'info', 'infty', 'inner', 'insert', 'int', 'int0', 'int16', 'int32', 'int64', 'int8', 'int_', 'int_asbuffer', 'intc', 'integer', 'interp', 'intersect1d', 
'intp', 'invert', 'ipmt', 'irr', 'is_busday', 'isclose', 'iscomplex', 'iscomplexobj', 'isfinite', 'isfortran', 'isin', 'isinf', 'isnan', 'isnat', 'isneginf', 'isposinf', 'isreal', 'isrealobj', 'isscalar', 'issctype', 'issubclass_', 'issubdtype', 'issubsctype', 'iterable', 'ix_', 'kaiser', 'kron', 'lcm', 'ldexp', 'left_shift', 'less', 'less_equal', 'lexsort', 'lib', 'linalg', 'linspace', 'little_endian', 'load', 'loads', 'loadtxt', 'log', 'log10', 'log1p', 'log2', 'logaddexp', 'logaddexp2', 'logical_and', 'logical_not', 'logical_or', 'logical_xor', 'logspace', 'long', 'longcomplex', 'longdouble', 'longfloat', 'longlong', 'lookfor', 'ma', 'mafromtxt', 'mask_indices', 'mat', 'math', 'matmul', 'matrix', 'matrixlib', 'max', 'maximum', 'maximum_sctype', 'may_share_memory', 'mean', 'median', 'memmap', 'meshgrid', 'mgrid', 'min', 'min_scalar_type', 'minimum', 'mintypecode', 'mirr', 'mkl', 'mod', 'modf', 'moveaxis', 'msort', 'multiply', 'nan', 'nan_to_num', 'nanargmax', 'nanargmin', 'nancumprod', 'nancumsum', 'nanmax', 'nanmean', 'nanmedian', 'nanmin', 'nanpercentile', 'nanprod', 'nanquantile', 'nanstd', 'nansum', 'nanvar', 'nbytes', 'ndarray', 'ndenumerate', 'ndfromtxt', 'ndim', 'ndindex', 'nditer', 'negative', 'nested_iters', 'newaxis', 'nextafter', 'nonzero', 'not_equal', 'nper', 'npv', 'numarray', 'number', 'obj2sctype', 'object', 'object0', 'object_', 'ogrid', 'oldnumeric', 'ones', 'ones_like', 'os', 'outer', 'packbits', 'pad', 'partition', 'percentile', 'pi', 'piecewise', 'place', 'pmt', 'poly', 'poly1d', 'polyadd', 'polyder', 'polydiv', 'polyfit', 'polyint', 'polymul', 'polynomial', 'polysub', 'polyval', 'positive', 'power', 'ppmt', 'print_function', 'printoptions', 'prod', 'product', 'promote_types', 'ptp', 'put', 'put_along_axis', 'putmask', 'pv', 'quantile', 'r_', 'rad2deg', 'radians', 'random', 'rank', 'rate', 'ravel', 'ravel_multi_index', 'real', 'real_if_close', 'rec', 'recarray', 'recfromcsv', 'recfromtxt', 'reciprocal', 'record', 'remainder', 'repeat', 
'require', 'reshape', 'resize', 'result_type', 'right_shift', 'rint', 'roll', 'rollaxis', 'roots', 'rot90', 'round', 'round_', 'row_stack', 's_', 'safe_eval', 'save', 'savetxt', 'savez', 'savez_compressed', 'sctype2char', 'sctypeDict', 'sctypeNA', 'sctypes', 'searchsorted', 'select', 'set_numeric_ops', 'set_printoptions', 'set_string_function', 'setbufsize', 'setdiff1d', 'seterr', 'seterrcall', 'seterrobj', 'setxor1d', 'shape', 'shares_memory', 'short', 'show_config', 'sign', 'signbit', 'signedinteger', 'sin', 'sinc', 'single', 'singlecomplex', 'sinh', 'size', 'sometrue', 'sort', 'sort_complex', 'source', 'spacing', 'split', 'sqrt', 'square', 'squeeze', 'stack', 'std', 'str', 'str0', 'str_', 'string_', 'subtract', 'sum', 'swapaxes', 'sys', 'take', 'take_along_axis', 'tan', 'tanh', 'tensordot', 'test', 'testing', 'tile', 'timedelta64', 'trace', 'tracemalloc_domain', 'transpose', 'trapz', 'tri', 'tril', 'tril_indices', 'tril_indices_from', 'trim_zeros', 'triu', 'triu_indices', 'triu_indices_from', 'true_divide', 'trunc', 'typeDict', 'typeNA', 'typecodes', 'typename', 'ubyte', 'ufunc', 'uint', 'uint0', 'uint16', 'uint32', 'uint64', 'uint8', 'uintc', 'uintp', 'ulonglong', 'unicode', 'unicode_', 'union1d', 'unique', 'unpackbits', 'unravel_index', 'unsignedinteger', 'unwrap', 'ushort', 'vander', 'var', 'vdot', 'vectorize', 'version', 'void', 'void0', 'vsplit', 'vstack', 'warnings', 'where', 'who', 'zeros', 'zeros_like']
As the listing above shows, NumPy comes with extensive functionality. One of its big advantages is speed on large numerical arrays. Let's see some examples:
import random
l = list(range(10000000))
random.shuffle(l)
a = np.array(l) # we can also create a NumPy array directly with the np.arange function
# we can also create a large random array with np.random.randint(1, 10000000, size=10000000)
print(type(a))
print(a)
<class 'numpy.ndarray'>
[9258905 4967159 351391 ... 4316321 7690808 2859538]
import sys
print(sys.getsizeof(l))
print(a.nbytes)
90000112
80000000
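To put a number on the full footprint of the list, the sketch below also adds up the size of the integer objects the list refers to. It is a rough estimate under the assumption that every element is a distinct int object, and it uses a smaller n than the lecture so that it runs quickly; the exact byte counts depend on the platform and Python version.

```python
import sys
import numpy as np

n = 100_000  # smaller than the 10,000,000 used above, only to keep the sketch fast
l = list(range(n))
a = np.array(l)

# sys.getsizeof(l) covers the list object itself: a header plus one reference per element
container_bytes = sys.getsizeof(l)
# the int objects live elsewhere in memory and have to be counted separately
element_bytes = sum(sys.getsizeof(x) for x in l)
list_total_bytes = container_bytes + element_bytes

# a.nbytes is the size of the raw data buffer: number of elements * bytes per element
print(container_bytes, list_total_bytes, a.nbytes)
```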
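We can see that the NumPy ndarray object consumes less memory. Note that sys.getsizeof(l) measures only the list object itself (its array of references), not the integer objects it refers to, so the list's real footprint is even larger. Let's see other NumPy advantages: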
%timeit sum(l) / float(len(l))
%timeit np.average(a)
353 ms ± 9.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.89 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit max(l)
%timeit np.max(a)
1.16 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
9.5 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In this run, NumPy's average and max functions are roughly 40 and 120 times faster than their built-in list counterparts. Additionally, NumPy lets us do element-wise numerical operations in a very easy and straightforward way. Let's add two lists element by element, first with plain Python and then with NumPy:
l1 = list(range(1000000))
random.shuffle(l1)
l2 = list(range(1000000))
random.shuffle(l2)
a1 = np.array(l1)
a2 = np.array(l2)
# adding two lists element-wise (the regular '+' operator on lists concatenates them)
l = []
for i in range(1000000):
    l.append(l1[i] + l2[i])
l[:10]
[1673707, 810930, 1568831, 617571, 845777, 1407915, 1140404, 1556716, 1072977, 863667]
a = a1 + a2
a[:10]
array([1673707, 810930, 1568831, 617571, 845777, 1407915, 1140404, 1556716, 1072977, 863667])
%timeit l = [l1[i] + l2[i] for i in range(1000000)]
%timeit a = a1 + a2
340 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.24 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
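The %timeit magic is only available inside IPython/Jupyter. As a rough sketch of how the same comparison could be run from a plain Python script, the standard timeit module can be used; the array size and repeat counts below are arbitrary choices, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size for the sketch
l1, l2 = list(range(n)), list(range(n, 0, -1))
a1, a2 = np.array(l1), np.array(l2)

def add_lists():
    # element-wise addition in pure Python
    return [x + y for x, y in zip(l1, l2)]

def add_arrays():
    # element-wise addition executed inside NumPy
    return a1 + a2

# run each function 10 times per measurement, take the best of 3 measurements
t_list = min(timeit.repeat(add_lists, number=10, repeat=3)) / 10
t_array = min(timeit.repeat(add_arrays, number=10, repeat=3)) / 10
print("list: %.6f s per call, array: %.6f s per call" % (t_list, t_array))
```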
We can see that adding two arrays with NumPy is both much simpler and much faster (roughly 270 times faster in this run). Using NumPy, we can also do other array arithmetic:
print(a1-a2)
print(a1*a2)
print(a1[:10]/a2[:10])
print(np.concatenate([a1,a2]))
[ 159285 300100 155847 ... 31803 -150837 364097]
[693980852656 141886863725 609235604788 ... 815314174954 735169043488 287055317388]
[1.21035748 2.17495057 1.22059273 0.02574788 0.92570423 2.04359434 0.66429856 0.84085795 7.09763405 0.13727336]
[916496 555515 862339 ... 887186 936149 383811]
Let's look at other useful features of the NumPy ndarray object, starting with explicit data types:
a1 = np.array([1,2,3,4,5], dtype = np.int32) # array of 32-bit integers
print("Array type %s" % a1.dtype)
a2 = np.array([1,2,3,4,5], dtype = np.float64) # array of 64-bit floats
print("Array type %s" % a2.dtype)
Array type int32
Array type float64
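a3 = a2.astype('float128') # extended precision; this dtype is platform-dependent (e.g. not available in Windows builds of NumPy)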
print("Array type %s" % a3.dtype)
Array type float128
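The dtype also determines how many bytes each element occupies, and astype returns a new array rather than changing the original. A small sketch of both points, using the same five values as above (the 1.9 and -1.9 values are just illustrative):

```python
import numpy as np

a_int = np.array([1, 2, 3, 4, 5], dtype=np.int32)
a_float = a_int.astype(np.float64)  # new array; a_int keeps its dtype

print(a_int.itemsize, a_int.nbytes)      # 4 20  (bytes per element, total bytes)
print(a_float.itemsize, a_float.nbytes)  # 8 40

# converting floats to integers drops the fractional part (truncation toward zero)
truncated = np.array([1.9, -1.9]).astype(np.int32)
print(truncated.tolist())  # [1, -1]
```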
We can also create a two-dimensional array (or, more generally, an n-dimensional array):
a4 = np.array([[1,2,3,4],[5,6,7,8]])
a4[1] # access the second row
array([5, 6, 7, 8])
a4[1,1] # access a specific value
6
a4[:1,2:3] # rows before index 1, columns from index 2 up to (not including) 3
array([[3]])
a4.reshape((1,8))
array([[1, 2, 3, 4, 5, 6, 7, 8]])
a4.reshape((4,2))
array([[1, 2], [3, 4], [5, 6], [7, 8]])
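When reshaping, one dimension can be given as -1 and NumPy infers it from the number of elements. A minimal sketch with the same a4 array:

```python
import numpy as np

a4 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(a4.shape)  # (2, 4)

flat = a4.reshape(-1)       # -1 means: infer this dimension from the array size
tall = a4.reshape((4, -1))  # 4 rows, the number of columns is inferred

print(flat.shape)  # (8,)
print(tall.shape)  # (4, 2)
```

Note that reshape returns a new array object; a4 itself keeps its (2, 4) shape.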
import numpy as np
a5 = np.array(['Jhon', "Marry", "William", "BoB"])
a5[a5 == "Marry"]
array(['Marry'], dtype='<U7')
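The expression a5 == "Marry" first builds a boolean array (a mask), and indexing with that mask keeps only the elements where it is True. The sketch below makes the intermediate mask visible and combines two conditions with the | operator; the name-length condition is only an illustration.

```python
import numpy as np

a5 = np.array(['Jhon', 'Marry', 'William', 'BoB'])

mask = a5 == 'Marry'  # element-wise comparison gives a boolean array
print(mask.tolist())  # [False, True, False, False]

# conditions are combined element-wise with & (and), | (or) and ~ (not);
# each condition needs its own parentheses
lengths = np.array([len(name) for name in a5])
selected = a5[(lengths > 4) | (a5 == 'BoB')]
print(selected.tolist())  # ['Marry', 'William', 'BoB']
```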
It is important to remember that, unlike slicing a regular Python list, slicing an array does not create a copy: it returns a view onto the same underlying data.
l = list(range(10))
a = np.array(l)
sub_l = l[2:-1]
sub_a = a[2:-1]
sub_l
[2, 3, 4, 5, 6, 7, 8]
sub_a
array([2, 3, 4, 5, 6, 7, 8])
sub_l[0] = 100
print(sub_l)
print(l)
[100, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sub_a[0] = 100
print(sub_a)
print(a)
[100 3 4 5 6 7 8]
[ 0 1 100 3 4 5 6 7 8 9]
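If an independent subarray is needed, we can ask for a copy explicitly. The following sketch contrasts a slice with a copied slice and uses np.shares_memory to check whether two arrays refer to the same data:

```python
import numpy as np

a = np.arange(10)
view = a[2:-1]         # basic slicing: shares a's data
copy = a[2:-1].copy()  # explicit copy: independent data

view[0] = 100  # also changes a[2]
copy[1] = -1   # leaves a untouched

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False
print(a.tolist())  # [0, 1, 100, 3, 4, 5, 6, 7, 8, 9]
```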
In the slicing example above, changing the values of the subarray sub_a also changed the original array a, whereas the list slice sub_l was an independent copy. NumPy also has a Not a Number value (denoted NaN), which propagates through calculations such as sum:
a = np.array([1,np.nan,2,4])
np.sum(a)
nan
To calculate statistics on an array that contains NaN values while ignoring those values, we can use the following functions:
print("Sum = %s" % np.nansum(a))
print("Max = %s" % np.nanmax(a))
print("STD = %s" % np.nanstd(a))
Sum = 7.0
Max = 4.0
STD = 1.247219128924647
Or, we can remove the NaN values from the array:
a = a[np.logical_not(np.isnan(a))]
print("Sum = %s" % np.sum(a))
print("Max = %s" % np.max(a))
print("STD = %s" % np.std(a))
Sum = 7.0
Max = 4.0
STD = 1.247219128924647
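One detail worth keeping in mind is that NaN never compares equal to anything, including itself, which is why np.isnan is used to find NaN values rather than ==. A short sketch:

```python
import numpy as np

a = np.array([1, np.nan, 2, 4])

print(np.nan == np.nan)  # False: NaN is not equal to itself
print((a == np.nan).any())  # False: == cannot be used to find NaN values

mask = np.isnan(a)  # boolean mask marking the NaN positions
cleaned = a[~mask]  # ~ negates the mask, same effect as np.logical_not
print(int(mask.sum()), cleaned.tolist())  # 1 [1.0, 2.0, 4.0]
```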
Let's start with an example. Similar to our work with DBMS and SQL in the last lecture, we want to explore the Netflix Shows Dataset, but this time using Pandas. We will start by downloading the dataset from Kaggle (see the first lecture for details):
# Creating a dataset directory (-p also creates missing parent directories)
!mkdir -p ./datasets/netflix
# download the dataset from Kaggle and unzip it
!kaggle datasets download chasewillden/netflix-shows -p ./datasets/netflix/
!unzip ./datasets/netflix/*.zip -d ./datasets/netflix
!ls ./datasets/netflix
Downloading netflix-shows.zip to ./datasets/netflix
100%|██████████████████████████████████████| 10.6k/10.6k [00:00<00:00, 8.98MB/s]
Archive: ./datasets/netflix/netflix-shows.zip
inflating: ./datasets/netflix/Netflix Shows.csv
Netflix Shows.csv netflix-shows.zip
Now, let's import Pandas and NumPy:
import pandas as pd
import numpy as np
Then, we load the Netflix Shows CSV file into a Pandas DataFrame object:
df = pd.read_csv('./datasets/netflix/Netflix Shows.csv', encoding='cp437' )
df
 | title | rating | ratingLevel | ratingDescription | release year | user rating score | user rating size |
---|---|---|---|---|---|---|---|
0 | White Chicks | PG-13 | crude and sexual humor, language and some drug... | 80 | 2004 | 82.0 | 80 |
1 | Lucky Number Slevin | R | strong violence, sexual content and adult lang... | 100 | 2006 | NaN | 82 |
2 | Grey's Anatomy | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 98.0 | 80 |
3 | Prison Break | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2008 | 98.0 | 80 |
4 | How I Met Your Mother | TV-PG | Parental guidance suggested. May not be suitab... | 70 | 2014 | 94.0 | 80 |
... | ... | ... | ... | ... | ... | ... | ... |
995 | The BFG | PG | for action/peril, some scary moments and brief... | 60 | 2016 | 97.0 | 80 |
996 | The Secret Life of Pets | PG | for action and some rude humor | 60 | 2016 | NaN | 81 |
997 | Precious Puppies | TV-G | Suitable for all ages. | 35 | 2003 | NaN | 82 |
998 | Beary Tales | TV-G | Suitable for all ages. | 35 | 2013 | NaN | 82 |
999 | Growing Up Wild | G | NaN | 35 | 2016 | 80.0 | 80 |
1000 rows × 7 columns
Let's take a closer look at the user rating score column:
c = df['user rating score']
c
0      82.0
1       NaN
2      98.0
3      98.0
4      94.0
       ...
995    97.0
996     NaN
997     NaN
998     NaN
999    80.0
Name: user rating score, Length: 1000, dtype: float64
print("Min = %s" % np.nanmin(c))
print("Max = %s" % np.nanmax(c))
print("Median = %s" % np.nanmedian(c))
print("Average = %s" % np.nanmean(c))
Min = 55.0
Max = 99.0
Median = 88.0
Average = 84.09421487603306
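The same statistics are also available as methods of the Series itself, and these methods skip NaN values by default. The sketch below uses only the first five values shown above rather than the full column, so its numbers differ from the output above.

```python
import numpy as np
import pandas as pd

# first five values of the column shown above (not the full dataset)
c = pd.Series([82.0, np.nan, 98.0, 98.0, 94.0], name='user rating score')

print(c.min(), c.max(), c.median(), c.mean())  # 82.0 98.0 96.0 93.0
print(c.count(), c.isna().sum())  # 4 non-missing values, 1 missing value
print(c.mean(skipna=False))  # nan: NaN is only propagated when we ask for it
```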
We can create a nice visualization of the user rating score column using the Seaborn package:
!pip install seaborn
Requirement already satisfied: seaborn in /anaconda3/envs/massivedata/lib/python3.6/site-packages (0.9.0)
Requirement already satisfied: pandas>=0.15.2 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from seaborn) (0.25.1)
Requirement already satisfied: numpy>=1.9.3 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from seaborn) (1.17.2)
Requirement already satisfied: matplotlib>=1.4.3 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from seaborn) (3.1.1)
Requirement already satisfied: scipy>=0.14.0 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from seaborn) (1.3.1)
Requirement already satisfied: python-dateutil>=2.6.1 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from pandas>=0.15.2->seaborn) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from pandas>=0.15.2->seaborn) (2019.3)
Requirement already satisfied: cycler>=0.10 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from matplotlib>=1.4.3->seaborn) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from matplotlib>=1.4.3->seaborn) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from matplotlib>=1.4.3->seaborn) (2.4.2)
Requirement already satisfied: six>=1.5 in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas>=0.15.2->seaborn) (1.12.0)
Requirement already satisfied: setuptools in /anaconda3/envs/massivedata/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib>=1.4.3->seaborn) (41.4.0)
import seaborn as sns
c = c[np.logical_not(np.isnan(c))] # removes NaN
sns.set(style="darkgrid")
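sns.distplot(c) # works in seaborn 0.9.0 used here; deprecated since seaborn 0.11, where sns.histplot(c, kde=True) is the replacement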
<matplotlib.axes._subplots.AxesSubplot at 0x101af7b6a0>
We can also visualize other interesting statistics, such as the histogram of user rating scores for each rating and decade, in just a few lines of code:
import matplotlib.pyplot as plt
df['decade'] = df['release year'] - df['release year'] % 10 # add a decade column
df = df[df['decade'] >= 1980] # select only rows whose decade is 1980 or later
g = sns.FacetGrid(df, row="rating", col="decade", margin_titles=True)
g.map(plt.hist, "user rating score", color="steelblue")
<seaborn.axisgrid.FacetGrid at 0x101a8a6470>
Pandas has several objects for manipulating data. Let's take a closer look at them. We start with the Series object. The Series object is a generalized NumPy array, which has both data and index values. We can think of the Series object as a more efficient, typed version of Python's dict object that maps index labels to values:
s = pd.Series([2,1,2,3,6,7])
s
0    2
1    1
2    2
3    3
4    6
5    7
dtype: int64
print(s.values)
print(type(s.values))
[2 1 2 3 6 7]
<class 'numpy.ndarray'>
s.index
RangeIndex(start=0, stop=6, step=1)
We can also use strings as index labels:
s = pd.Series([2,1,2,3,6,7], index=['a','b','c','de','e','f'])
s
a     2
b     1
c     2
de    3
e     6
f     7
dtype: int64
s['f']
7
s = pd.Series({'A':1, 'B':2, 'C':4})
s
A    1
B    2
C    4
dtype: int64
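The dict analogy can be taken one step further. A Series supports access by label and by position, and arithmetic between two Series is aligned on the index labels. In the sketch below, the second Series t is a made-up example.

```python
import pandas as pd

s = pd.Series({'A': 1, 'B': 2, 'C': 4})

print(s.loc['B'])   # 2, access by label
print(s.iloc[-1])   # 4, access by position
print('A' in s)     # True, membership is checked against the index labels

# arithmetic is aligned on labels; labels present in only one Series give NaN
t = pd.Series({'B': 10, 'C': 20, 'D': 30})  # hypothetical second Series
total = s + t
print(total)
```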
The second object we are going to look at is the DataFrame object. We can think of the DataFrame object as a table with both column and row indices (similar to an Excel sheet :-).
df
 | title | rating | ratingLevel | ratingDescription | release year | user rating score | user rating size | decade |
---|---|---|---|---|---|---|---|---|
0 | White Chicks | PG-13 | crude and sexual humor, language and some drug... | 80 | 2004 | 82.0 | 80 | 2000 |
1 | Lucky Number Slevin | R | strong violence, sexual content and adult lang... | 100 | 2006 | NaN | 82 | 2000 |
2 | Grey's Anatomy | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 98.0 | 80 | 2010 |
3 | Prison Break | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2008 | 98.0 | 80 | 2000 |
4 | How I Met Your Mother | TV-PG | Parental guidance suggested. May not be suitab... | 70 | 2014 | 94.0 | 80 | 2010 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
995 | The BFG | PG | for action/peril, some scary moments and brief... | 60 | 2016 | 97.0 | 80 | 2010 |
996 | The Secret Life of Pets | PG | for action and some rude humor | 60 | 2016 | NaN | 81 | 2010 |
997 | Precious Puppies | TV-G | Suitable for all ages. | 35 | 2003 | NaN | 82 | 2000 |
998 | Beary Tales | TV-G | Suitable for all ages. | 35 | 2013 | NaN | 82 | 2010 |
999 | Growing Up Wild | G | NaN | 35 | 2016 | 80.0 | 80 | 2010 |
991 rows × 8 columns
print("DataFrame columns %s" % df.columns)
print("DataFrame Index %s" % df.index)
DataFrame columns Index(['title', 'rating', 'ratingLevel', 'ratingDescription', 'release year', 'user rating score', 'user rating size', 'decade'], dtype='object')
DataFrame Index Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=991)
We can replace the default integer index with the values of the title column:
df.index = df["title"]
type(df["title"])
pandas.core.series.Series
df['rating']
title
White Chicks               PG-13
Lucky Number Slevin            R
Grey's Anatomy             TV-14
Prison Break               TV-14
How I Met Your Mother      TV-PG
                           ...
The BFG                       PG
The Secret Life of Pets       PG
Precious Puppies            TV-G
Beary Tales                 TV-G
Growing Up Wild                G
Name: rating, Length: 991, dtype: object
df['rating']['Sherlock']
Sherlock    TV-14
Sherlock    TV-14
Sherlock    TV-14
Name: rating, dtype: object
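The lookup above returns three values because index labels do not have to be unique. The following sketch, built on a tiny made-up DataFrame, shows how to check for duplicate labels and how the result type of a lookup depends on them; it uses .loc, which selects the row and the column in a single step.

```python
import pandas as pd

# small made-up table with a repeated index label
df_small = pd.DataFrame(
    {'rating': ['TV-14', 'TV-14', 'PG'], 'release year': [2016, 2016, 2016]},
    index=['Sherlock', 'Sherlock', 'The BFG'])

print(df_small.index.is_unique)  # False

sherlock = df_small.loc['Sherlock', 'rating']  # repeated label: a Series
bfg = df_small.loc['The BFG', 'rating']        # unique label: a single value
print(type(sherlock).__name__, len(sherlock))  # Series 2
print(bfg)                                     # PG
```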
We can create new DataFrame objects by selecting columns and rows:
#selecting specific columns
df2 = df[['title','rating', 'release year']]
df2
 | title | rating | release year |
---|---|---|---|
title | | | |
White Chicks | White Chicks | PG-13 | 2004 |
Lucky Number Slevin | Lucky Number Slevin | R | 2006 |
Grey's Anatomy | Grey's Anatomy | TV-14 | 2016 |
Prison Break | Prison Break | TV-14 | 2008 |
How I Met Your Mother | How I Met Your Mother | TV-PG | 2014 |
... | ... | ... | ... |
The BFG | The BFG | PG | 2016 |
The Secret Life of Pets | The Secret Life of Pets | PG | 2016 |
Precious Puppies | Precious Puppies | TV-G | 2003 |
Beary Tales | Beary Tales | TV-G | 2013 |
Growing Up Wild | Growing Up Wild | G | 2016 |
991 rows × 3 columns
# selecting the first 10 rows
df.iloc[:10]
 | title | rating | ratingLevel | ratingDescription | release year | user rating score | user rating size | decade |
---|---|---|---|---|---|---|---|---|
title | | | | | | | | |
White Chicks | White Chicks | PG-13 | crude and sexual humor, language and some drug... | 80 | 2004 | 82.0 | 80 | 2000 |
Lucky Number Slevin | Lucky Number Slevin | R | strong violence, sexual content and adult lang... | 100 | 2006 | NaN | 82 | 2000 |
Grey's Anatomy | Grey's Anatomy | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 98.0 | 80 | 2010 |
Prison Break | Prison Break | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2008 | 98.0 | 80 | 2000 |
How I Met Your Mother | How I Met Your Mother | TV-PG | Parental guidance suggested. May not be suitab... | 70 | 2014 | 94.0 | 80 | 2010 |
Supernatural | Supernatural | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 95.0 | 80 | 2010 |
Breaking Bad | Breaking Bad | TV-MA | For mature audiences. May not be suitable for... | 110 | 2013 | 97.0 | 80 | 2010 |
The Vampire Diaries | The Vampire Diaries | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2017 | 91.0 | 80 | 2010 |
The Walking Dead | The Walking Dead | TV-MA | For mature audiences. May not be suitable for... | 110 | 2015 | 98.0 | 80 | 2010 |
Pretty Little Liars | Pretty Little Liars | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 96.0 | 80 | 2010 |
# selecting every other row
df.iloc[::2]
 | title | rating | ratingLevel | ratingDescription | release year | user rating score | user rating size | decade |
---|---|---|---|---|---|---|---|---|
title | | | | | | | | |
White Chicks | White Chicks | PG-13 | crude and sexual humor, language and some drug... | 80 | 2004 | 82.0 | 80 | 2000 |
Grey's Anatomy | Grey's Anatomy | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 98.0 | 80 | 2010 |
How I Met Your Mother | How I Met Your Mother | TV-PG | Parental guidance suggested. May not be suitab... | 70 | 2014 | 94.0 | 80 | 2010 |
Breaking Bad | Breaking Bad | TV-MA | For mature audiences. May not be suitable for... | 110 | 2013 | 97.0 | 80 | 2010 |
The Walking Dead | The Walking Dead | TV-MA | For mature audiences. May not be suitable for... | 110 | 2015 | 98.0 | 80 | 2010 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Dawn of the Croods | Dawn of the Croods | TV-Y7 | Suitable for children ages 7 and older | 41 | 2017 | 72.0 | 80 | 2010 |
Wiener Dog Internationals | Wiener Dog Internationals | G | General Audiences. Suitable for all ages. | 35 | 2015 | NaN | 82 | 2010 |
The BFG | The BFG | PG | for action/peril, some scary moments and brief... | 60 | 2016 | 97.0 | 80 | 2010 |
Precious Puppies | Precious Puppies | TV-G | Suitable for all ages. | 35 | 2003 | NaN | 82 | 2000 |
Growing Up Wild | Growing Up Wild | G | NaN | 35 | 2016 | 80.0 | 80 | 2010 |
496 rows × 8 columns
df2 = df[['title','rating', 'release year']]
There are duplicate rows, so let's remove them:
print("DataFrame's shape (%s,%s)" %df2.shape)
df2 = df2.drop_duplicates()
print("DataFrame's shape (%s,%s) (after remove)" %df2.shape)
DataFrame's shape (991,3)
DataFrame's shape (497,3) (after remove)
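By default, drop_duplicates compares all columns and keeps the first occurrence of each row. A small sketch on made-up rows shows how duplicated can be used to inspect the duplicates first, and how the subset and keep parameters narrow the comparison:

```python
import pandas as pd

# made-up rows for illustration
df_small = pd.DataFrame({
    'title': ['Sherlock', 'Sherlock', 'The BFG', 'Sherlock'],
    'rating': ['TV-14', 'TV-14', 'PG', 'TV-14'],
    'release year': [2016, 2016, 2016, 2017]})

# True marks rows that repeat an earlier row in every column
print(df_small.duplicated().tolist())  # [False, True, False, False]

exact = df_small.drop_duplicates()  # drops fully identical rows
by_title = df_small.drop_duplicates(subset=['title'], keep='first')  # one row per title
print(exact.shape, by_title.shape)  # (3, 3) (2, 3)
```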
We can transpose the DataFrame, swapping its rows and columns:
df2.T
title | White Chicks | Lucky Number Slevin | Grey's Anatomy | Prison Break | How I Met Your Mother | Supernatural | Breaking Bad | The Vampire Diaries | The Walking Dead | Pretty Little Liars | ... | Flicka 2 | H2O: Just Add Water | Dolphin Tale | Step Dogs | Mia and Me | Russell Madness | Wiener Dog Internationals | Pup Star | Precious Puppies | Beary Tales |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
title | White Chicks | Lucky Number Slevin | Grey's Anatomy | Prison Break | How I Met Your Mother | Supernatural | Breaking Bad | The Vampire Diaries | The Walking Dead | Pretty Little Liars | ... | Flicka 2 | H2O: Just Add Water | Dolphin Tale | Step Dogs | Mia and Me | Russell Madness | Wiener Dog Internationals | Pup Star | Precious Puppies | Beary Tales |
rating | PG-13 | R | TV-14 | TV-14 | TV-PG | TV-14 | TV-MA | TV-14 | TV-MA | TV-14 | ... | PG | TV-PG | PG | TV-PG | TV-Y | PG | G | G | TV-G | TV-G |
release year | 2004 | 2006 | 2016 | 2008 | 2014 | 2016 | 2013 | 2017 | 2015 | 2016 | ... | 2010 | 2009 | 2011 | 2013 | 2015 | 2015 | 2015 | 2016 | 2003 | 2013 |
3 rows × 497 columns
Let's select only movies that were released after 2016:
df2 = df[df['release year'] > 2016]
df2.shape # shape is a (number of rows, number of columns) tuple
(100, 8)
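Row selection uses a boolean mask, so several conditions can be combined. The sketch below uses made-up rows; the column names follow the Netflix table, and the specific conditions are only examples.

```python
import pandas as pd

# made-up rows; titles are placeholders
df_small = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'rating': ['PG', 'TV-14', 'PG', 'R'],
    'release year': [2015, 2017, 2017, 2016]})

recent = df_small[df_small['release year'] > 2016]

# combine conditions with & and |, wrapping each one in parentheses
recent_pg = df_small[(df_small['release year'] > 2016) & (df_small['rating'] == 'PG')]

# isin tests membership in a list of values
pg_or_r = df_small[df_small['rating'].isin(['PG', 'R'])]

print(recent['title'].tolist())     # ['B', 'C']
print(recent_pg['title'].tolist())  # ['C']
print(pg_or_r['title'].tolist())    # ['A', 'C', 'D']
```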
We can also compute the decade column with the apply function (this recomputes the column we added earlier):
# df was produced earlier by filtering another DataFrame, so pandas emits a SettingWithCopyWarning here;
# filtering with df = df[...].copy() avoids it
df['decade'] = df['release year'].apply(lambda y: y - y%10)
df
/anaconda3/envs/massivedata/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
"""Entry point for launching an IPython kernel.
 | title | rating | ratingLevel | ratingDescription | release year | user rating score | user rating size | decade |
---|---|---|---|---|---|---|---|---|
title | | | | | | | | |
White Chicks | White Chicks | PG-13 | crude and sexual humor, language and some drug... | 80 | 2004 | 82.0 | 80 | 2000 |
Lucky Number Slevin | Lucky Number Slevin | R | strong violence, sexual content and adult lang... | 100 | 2006 | NaN | 82 | 2000 |
Grey's Anatomy | Grey's Anatomy | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 98.0 | 80 | 2010 |
Prison Break | Prison Break | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2008 | 98.0 | 80 | 2000 |
How I Met Your Mother | How I Met Your Mother | TV-PG | Parental guidance suggested. May not be suitab... | 70 | 2014 | 94.0 | 80 | 2010 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
The BFG | The BFG | PG | for action/peril, some scary moments and brief... | 60 | 2016 | 97.0 | 80 | 2010 |
The Secret Life of Pets | The Secret Life of Pets | PG | for action and some rude humor | 60 | 2016 | NaN | 81 | 2010 |
Precious Puppies | Precious Puppies | TV-G | Suitable for all ages. | 35 | 2003 | NaN | 82 | 2000 |
Beary Tales | Beary Tales | TV-G | Suitable for all ages. | 35 | 2013 | NaN | 82 | 2010 |
Growing Up Wild | Growing Up Wild | G | NaN | 35 | 2016 | 80.0 | 80 | 2010 |
991 rows × 8 columns
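A SettingWithCopyWarning typically appears when a column is assigned on a DataFrame that was itself obtained by filtering another one. One common way to avoid it, sketched below on made-up rows, is to take an explicit copy when filtering. The sketch also computes the decade with vectorized integer arithmetic instead of apply; both give the same values.

```python
import pandas as pd

# made-up rows; only the 'release year' column name follows the Netflix table
df_all = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'release year': [1975, 1989, 2004, 2017]})

# .copy() makes the filtered result an independent DataFrame
df_recent = df_all[df_all['release year'] >= 1980].copy()

# vectorized decade: integer-divide by 10, then multiply by 10
df_recent['decade'] = df_recent['release year'] // 10 * 10

# same result as the apply-based version
via_apply = df_recent['release year'].apply(lambda y: y - y % 10)
print(df_recent['decade'].tolist())  # [1980, 2000, 2010]
```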
Similar to SQL, Pandas also has groupby functionality. Namely, we can use the groupby function to create a DataFrameGroupBy object:
df3 = df[['user rating score','decade', "release year"]]
df3 = df3.dropna()
g = df3.groupby("decade")
g
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x101c8c6c18>
We can access each group directly using the get_group function:
g.get_group(2010)
 | user rating score | decade | release year |
---|---|---|---|
title | | | |
Grey's Anatomy | 98.0 | 2010 | 2016 |
How I Met Your Mother | 94.0 | 2010 | 2014 |
Supernatural | 95.0 | 2010 | 2016 |
Breaking Bad | 97.0 | 2010 | 2013 |
The Vampire Diaries | 91.0 | 2010 | 2017 |
... | ... | ... | ... |
Growing Up Wild | 80.0 | 2010 | 2016 |
Paddington | 70.0 | 2010 | 2014 |
Dawn of the Croods | 72.0 | 2010 | 2017 |
The BFG | 97.0 | 2010 | 2016 |
Growing Up Wild | 80.0 | 2010 | 2016 |
423 rows × 3 columns
Or iterate over the groups:
d = {}
for y, year_df in g:
    d[y] = year_df # y is the group key (the decade), year_df holds the rows of that group
d[2010]
 | user rating score | decade | release year |
---|---|---|---|
title | | | |
Grey's Anatomy | 98.0 | 2010 | 2016 |
How I Met Your Mother | 94.0 | 2010 | 2014 |
Supernatural | 95.0 | 2010 | 2016 |
Breaking Bad | 97.0 | 2010 | 2013 |
The Vampire Diaries | 91.0 | 2010 | 2017 |
... | ... | ... | ... |
Growing Up Wild | 80.0 | 2010 | 2016 |
Paddington | 70.0 | 2010 | 2014 |
Dawn of the Croods | 72.0 | 2010 | 2017 |
The BFG | 97.0 | 2010 | 2016 |
Growing Up Wild | 80.0 | 2010 | 2016 |
423 rows × 3 columns
g.max()
 | user rating score | release year |
---|---|---|
decade | | |
1980 | 80.0 | 1989 |
1990 | 96.0 | 1999 |
2000 | 98.0 | 2009 |
2010 | 99.0 | 2017 |
g.describe()
 | user rating score | | | | | | | | release year | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | count | mean | std | min | 25% | 50% | 75% | max | count | mean | std | min | 25% | 50% | 75% | max |
decade | | | | | | | | | | | | | | | | |
1980 | 10.0 | 66.800000 | 6.088240 | 58.0 | 67.0 | 67.0 | 68.0 | 80.0 | 10.0 | 1985.300000 | 2.451757 | 1982.0 | 1983.0 | 1986.0 | 1986.75 | 1989.0 |
1990 | 73.0 | 79.808219 | 9.974492 | 64.0 | 71.0 | 81.0 | 86.0 | 96.0 | 73.0 | 1996.164384 | 2.608819 | 1990.0 | 1994.0 | 1997.0 | 1998.00 | 1999.0 |
2000 | 94.0 | 83.010638 | 13.907911 | 55.0 | 74.0 | 89.0 | 95.0 | 98.0 | 94.0 | 2004.978723 | 2.873605 | 2000.0 | 2003.0 | 2006.0 | 2007.00 | 2009.0 |
2010 | 423.0 | 85.638298 | 11.873709 | 55.0 | 79.0 | 91.0 | 95.5 | 99.0 | 423.0 | 2015.200946 | 1.667873 | 2010.0 | 2015.0 | 2016.0 | 2016.00 | 2017.0 |
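When only a few statistics are needed, the agg function lets us name them explicitly instead of computing everything that describe returns. A minimal sketch on made-up scores, using the same column names as above:

```python
import pandas as pd

# made-up scores for two decades
df_small = pd.DataFrame({
    'decade': [2000, 2000, 2010, 2010, 2010],
    'user rating score': [82.0, 98.0, 94.0, 95.0, 97.0]})

# select one column of the grouped object and request specific aggregations
summary = df_small.groupby('decade')['user rating score'].agg(['count', 'mean', 'max'])
print(summary)
```

The result is a DataFrame indexed by decade with one column per requested aggregation.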
g.plot( y='user rating score')
decade
1980    AxesSubplot(0.125,0.125;0.775x0.755)
1990    AxesSubplot(0.125,0.125;0.775x0.755)
2000    AxesSubplot(0.125,0.125;0.775x0.755)
2010    AxesSubplot(0.125,0.125;0.775x0.755)
dtype: object
pd.__version__
'0.25.1'
Let's download the Primetime Emmy Awards Dataset and try to discover some interesting insights:
# Creating a dataset directory
!mkdir -p ./datasets/emmy
# download the dataset from Kaggle and unzip it
!kaggle datasets download pmagda/primetime-emmy-awards -p ./datasets/emmy
!unzip ./datasets/emmy/*.zip -d ./datasets/emmy
!ls ./datasets/emmy
Downloading primetime-emmy-awards.zip to ./datasets/emmy
100%|█████████████████████████████████████████| 235k/235k [00:00<00:00, 809kB/s]
Archive: ./datasets/emmy/primetime-emmy-awards.zip
inflating: ./datasets/emmy/emmy-awards-1949-2017.csv
emmy-awards-1949-2017.csv primetime-emmy-awards.zip
import pandas as pd
%matplotlib inline
df = pd.read_csv("./datasets/emmy/emmy-awards-1949-2017.csv")
df
 | year | category | winner | nominee | detail |
---|---|---|---|---|---|
0 | 1949 | Technical Award | 1 | Charles Mesak/Don Lee TV for Phasefader - In R... | NaN |
1 | 1949 | Best Film Made For Television | 0 | Christopher Columbus | CBS |
2 | 1949 | Best Film Made For Television | 0 | Hollywood Brevities | CBS |
3 | 1949 | Best Film Made For Television | 0 | It Could Happen To You | CBS |
4 | 1949 | Best Film Made For Television | 0 | Tell Tale Heart | CBS |
... | ... | ... | ... | ... | ... |
19234 | 2017 | Outstanding Production Design For Variety, Non... | 0 | Bill Nye Saves The World - "Earth Is A Hot Mess" | Netflix |
19235 | 2017 | Outstanding Production Design For Variety, Non... | 0 | Drunk History - "Hamilton" | Comedy Central |
19236 | 2017 | Outstanding Production Design For Variety, Non... | 0 | Portlandia - "Fred's Cell Phone Company" | IFC |
19237 | 2017 | Outstanding Production Design For Variety, Non... | 1 | Saturday Night Live - "Host: Alec Baldwin" | NBC |
19238 | 2017 | Outstanding Production Design For Variety, Non... | 0 | The Voice - "Live Finale, Part 1" | NBC |
19239 rows × 5 columns
Let's see how many Emmy Award winners there were each year:
g = df[['year', 'winner']].groupby('year')
g.get_group(1952)
type(g)
g
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x101f805f60>
g.sum().plot(y="winner")
<matplotlib.axes._subplots.AxesSubplot at 0x101fdce278>
As we can see, the number of Emmy winners per year has increased over time. Let's calculate how many winners each category has had over the years:
print("Number of categories %s" % df['category'].unique().shape[0])
Number of categories 1043
Wow, over the years there were more than 1,000 unique categories. Let's see which categories have the most winners:
g = df[['category', 'winner']].groupby('category')
g.sum().sort_values(['winner'], ascending=False)
 | winner |
---|---|
category | |
Outstanding Individual Achievement | 75 |
Outstanding Individual Achievement In Animation | 67 |
Outstanding Comedy Series | 50 |
Outstanding Lead Actress In A Drama Series | 42 |
Outstanding Lead Actor In A Comedy Series | 42 |
... | ... |
Outstanding Single Performance By A Supporting Actor In Comedy Or Drama Special | 0 |
Best Contribution To Daytime Programming | 0 |
Outstanding Achievement In Graphic Design | 0 |
Outstanding Single Performance By A Supporting Actress In Comedy Or Drama Series | 0 |
Outstanding Makeup For Miniseries, Movie, Or A Special (Prosthetic) | 0 |
1042 rows × 1 columns
We can see that for some categories, we have data only on the nominees, but not on the winners. Let's check which nominee has won the most awards:
winners_df = df[df['winner'] != 0]
winners_df
 | year | category | winner | nominee | detail |
---|---|---|---|---|---|
0 | 1949 | Technical Award | 1 | Charles Mesak/Don Lee TV for Phasefader - In R... | NaN |
5 | 1949 | Best Film Made For Television | 1 | The Necklace (Your Show Time Series) | CBS |
11 | 1949 | Most Outstanding Television Personality | 1 | Shirley Dinsdale | NaN |
18 | 1949 | Most Popular Television Program | 1 | Pantomime Quiz Time | KTLA |
22 | 1949 | Special Award | 1 | Louis McManus For His Original Design of the Emmy | NaN |
... | ... | ... | ... | ... | ... |
19218 | 2017 | Outstanding Innovation In Interactive Programming | 1 | PEARL | NaN |
19220 | 2017 | Outstanding Music Supervision | 1 | Big Little Lies - "You Get What You Need" | HBO |
19228 | 2017 | Outstanding Original Interactive Program | 1 | The People's House - Inside The White House Wi... | Samsung / Oculus |
19231 | 2017 | Outstanding Picture Editing For Variety Progra... | 1 | Last Week Tonight With John Oliver - "F*ck 201... | HBO |
19237 | 2017 | Outstanding Production Design For Variety, Non... | 1 | Saturday Night Live - "Host: Alec Baldwin" | NBC |
4291 rows × 5 columns
winners_df[['nominee', 'winner']].groupby('nominee').count().sort_values(['winner'], ascending=False)
 | winner |
---|---|
nominee | |
Frasier | 24 |
The Daily Show With Jon Stewart | 19 |
ER | 18 |
The Carol Burnett Show | 17 |
STAR TREK: THE NEXT GENERATION | 17 |
... | ... |
Invasion America | 1 |
Invader Zim | 1 |
Into The Storm | 1 |
Intervention - "Robby" | 1 |
Lucky Strike | 1 |
2556 rows × 1 columns
So, we have some clues about what is in the data and who the all-time Emmy winners are. However, we didn't clean the data before analyzing it, which can lead to inaccurate results. For example, let's look at nominee names that include the word 'MTV':
df[df['nominee'].apply(lambda n: "MTV" in str(n))] # Another option is to search for MTV in the 'detail' column
 | year | category | winner | nominee | detail |
---|---|---|---|---|---|
5856 | 1985 | Outstanding Art Direction For A Variety Or Mus... | 0 | 1st Annual MTV Video Music Awards | SYN |
8213 | 1992 | Outstanding Individual Achievement In Graphic ... | 1 | MTV LIQUID TV | MTV |
8426 | 1993 | Outstanding Technical Direction/Camera/Video F... | 0 | MTV UNPLUGGED | MTV |
8451 | 1993 | Outstanding Variety, Music Or Comedy Series | 0 | MTV UNPLUGGED | MTV |
8525 | 1993 | Outstanding Individual Achievement In Graphic ... | 0 | MTV VIDEO MUSIC AWARDS | MTV |
8648 | 1993 | Outstanding Individual Achievement In Sound Mi... | 0 | MTV UNPLUGGED | MTV |
8812 | 1994 | Outstanding Variety, Music Or Comedy Series | 0 | MTV UNPLUGGED | MTV |
9155 | 1995 | Outstanding Variety, Music Or Comedy Series | 0 | MTV UNPLUGGED | MTV |
9433 | 1996 | Outstanding Choreography | 0 | 1995 MTV Video Music Awards | MTV |
10786 | 1999 | Outstanding Technical Direction/Camera/Video F... | 0 | 1998 MTV Video Music Awards | MTV |
11072 | 2000 | Outstanding Animated Program (For Programming ... | 0 | MTV Downtown | MTV |
14142 | 2008 | Outstanding Art Direction For A Variety, Music... | 0 | 2007 MTV Video Music Awards | MTV |
14597 | 2009 | Outstanding Art Direction For Variety, Music O... | 1 | 2008 MTV Video Music Awards | MTV |
15557 | 2011 | Outstanding Art Direction For Variety, Music O... | 1 | 2010 MTV Video Music Awards | MTV |
We can observe that some of the nominees, such as 2010 MTV Video Music Awards, include a specific year. Therefore, in the aggregation process we won't accurately count all MTV Video Music Awards nominees. This is a great example of why it is very important to get familiar with and clean the data before analyzing it.
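As a rough illustration of what such cleaning could look like, the sketch below normalizes a few of the nominee names listed above before counting them. The normalize_nominee helper and its two rules (lowercasing, and stripping a leading year or an "Nth Annual" prefix) are assumptions made for this example, not a complete cleaning procedure for the dataset.

```python
import re
import pandas as pd

# a few nominee names taken from the table above
nominees = pd.Series([
    '1st Annual MTV Video Music Awards',
    'MTV VIDEO MUSIC AWARDS',
    '1995 MTV Video Music Awards',
    '2010 MTV Video Music Awards',
    'MTV UNPLUGGED',
])

def normalize_nominee(name):
    """Hypothetical cleaning rule: unify case and drop edition prefixes."""
    name = str(name).strip().lower()
    name = re.sub(r'^\d{4}\s+', '', name)                      # leading year, e.g. "2010 "
    name = re.sub(r'^\d+(st|nd|rd|th)\s+annual\s+', '', name)  # e.g. "1st annual "
    return name

counts = nominees.apply(normalize_nominee).value_counts()
print(counts)
```

After normalization, the four differently written MTV Video Music Awards entries are counted under one name.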
In this example, we will use the Lord of the Rings Datasets, which include character data, as well as script data. First, let's download the dataset:
# Creating a dataset directory
!mkdir -p ./datasets/ltor
# download the dataset from Kaggle and unzip it
!kaggle datasets download paultimothymooney/lord-of-the-rings-data -p ./datasets/ltor
!unzip ./datasets/ltor/*.zip -d ./datasets/ltor
!ls ./datasets/ltor
Downloading lord-of-the-rings-data.zip to ./datasets/ltor
100%|███████████████████████████████████████| 88.0k/88.0k [00:00<00:00, 556kB/s]
Archive: ./datasets/ltor/lord-of-the-rings-data.zip
inflating: ./datasets/ltor/lotr_characters.csv
inflating: ./datasets/ltor/lotr_scripts.csv
lord-of-the-rings-data.zip lotr_scripts.csv lotr_characters.csv
import re
import numpy as np
import pandas as pd
import networkx as nx
df = pd.read_csv("./datasets/ltor/lotr_characters.csv")
df = df.fillna('')
df
 | birth | death | gender | hair | height | name | race | realm | spouse |
---|---|---|---|---|---|---|---|---|---|
0 | | | Female | | | Adanel | Men | | Belemir |
1 | TA 2978 | February 26 ,3019 | Male | Dark (book) Light brown (movie) | | Boromir | Men | | |
2 | | March ,3019 | Male | | | Lagduf | Orcs | | |
3 | TA 280 | TA 515 | Male | | | Tarcil | Men | Arnor | Unnamed wife |
4 | | | Male | | | Fire-drake of Gondolin | Dragon | | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
906 | Mid ,First Age | FA 495 | Female | | | Aerin | Men | | Brodda |
907 | | | | | | Aerandir | | | |
908 | YT during the ,Noontide of Valinor | FA 455 | Male | Golden | | Aegnor | Elves | | Loved ,Andreth but remained unmarried |
909 | TA 2917 | TA 3010 | Male | | | Adrahil II | Men | | Unnamed wife |
910 | Before ,TA 1944 | Late ,Third Age | Male | | | Adrahil I | Men | | |
911 rows × 9 columns
Let's count the number of characters of each race and gender:
g = df.groupby(['race', 'gender'])
# g.size() returns a Series; convert it to a DataFrame and sort by count, descending
g.size().to_frame(name='counts').sort_values('counts', ascending=False)
race | gender | counts |
---|---|---|
Men | Male | 334 |
 | | 137 |
Hobbits | Male | 89 |
Elves | Male | 76 |
Men | Female | 51 |
... | ... | ... |
Dragon | Male | 1 |
Black Uruk | Male | 1 |
Hobbits | male | 1 |
Eagle | Male | 1 |
Wolfhound | Male | 1 |
61 rows × 1 columns
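The counts above show two data-quality issues: a `Hobbits | male` row that is separate from `Hobbits | Male`, and 137 characters with neither race nor gender. A minimal sketch of normalizing the gender labels before grouping is shown below. The rows of the toy frame are made up for illustration and are not rows from the dataset.

```python
import pandas as pd

# Toy data (made up for illustration) with inconsistent gender labels
toy = pd.DataFrame({
    "race": ["Hobbits", "Hobbits", "Hobbits", "Men"],
    "gender": ["Male", "male", "Female", " Male"],
})

# Strip surrounding whitespace and capitalize, so "male" and "Male" fall in the same group
toy["gender"] = toy["gender"].str.strip().str.capitalize()

counts = toy.groupby(["race", "gender"]).size().to_frame(name="counts")
print(counts)
```

Applied to the real `gender` column, the same two string operations would be expected to merge the lowercase `male` row into `Male`.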
Let's draw the spouse network. First, let's clean the data and create a DataFrame object in which all the couples' data is organized:
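import re

def normalize_spouse(s):
    # Treat empty and placeholder values as "no known spouse"
    if s in {'', 'Unnamed wife', 'None'}:
        return None
    if "Loved" not in s:
        return s
    # Entries such as "Loved ,Andreth but remained unmarried": keep only the name
    matches = re.findall(r"Loved ,(\w+)", s)
    if len(matches) > 0:
        return matches[0]
    return None

df['norm_spouse'] = df['spouse'].apply(normalize_spouse)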
links_df = df[['name', 'norm_spouse', 'race', 'gender']]
links_df = links_df.dropna()
links_df
 | name | norm_spouse | race | gender |
---|---|---|---|---|
0 | Adanel | Belemir | Men | Female |
7 | Angrod | Eldalótë | Elves | Male |
10 | Ar-Pharazôn | Tar-Míriel | Men | Male |
12 | Ar-Gimilzôr | Inzilbêth | Men | Male |
21 | Tarannon Falastur | Berúthiel | Men | Male |
... | ... | ... | ... | ... |
897 | Ailinel | Orchaldor | Men | Female |
898 | Almarian | Tar-Meneldur | Men | Female |
899 | Algund | None known | Men | Male |
906 | Aerin | Brodda | Men | Female |
908 | Aegnor | Andreth | Elves | Male |
228 rows × 4 columns
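Note that row 899 in the table above still has `None known` as a spouse, because that placeholder is not in the set checked by the cleaning function. A hedged sketch of a stricter variant follows. The function name `normalize_spouse_strict` and the extra `None known` placeholder are assumptions added for illustration; the real data may contain other placeholders, and using this variant would change the row counts shown here.

```python
import re

# Placeholder values treated as "no known spouse" ("None known" is an assumed addition)
PLACEHOLDERS = {"", "Unnamed wife", "None", "None known"}

def normalize_spouse_strict(s):
    s = s.strip()
    if s in PLACEHOLDERS:
        return None
    if "Loved" not in s:
        return s
    # e.g. "Loved ,Andreth but remained unmarried" -> "Andreth"
    match = re.search(r"Loved ,(\w+)", s)
    return match.group(1) if match else None

print(normalize_spouse_strict("None known"))
print(normalize_spouse_strict("Loved ,Andreth but remained unmarried"))
print(normalize_spouse_strict("Belemir"))
```

Inspecting `df['spouse'].value_counts()` is one way to look for further placeholder values before settling on the set.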
links_df2 = pd.merge(links_df, links_df, left_on="name", right_on="norm_spouse")
links_df2
 | name_x | norm_spouse_x | race_x | gender_x | name_y | norm_spouse_y | race_y | gender_y |
---|---|---|---|---|---|---|---|---|
0 | Adanel | Belemir | Men | Female | Belemir | Adanel | Men | Male |
1 | Angrod | Eldalótë | Elves | Male | Eldalótë | Angrod | Elves | Female |
2 | Ar-Pharazôn | Tar-Míriel | Men | Male | Tar-Míriel | Ar-Pharazôn | Men | Female |
3 | Ar-Gimilzôr | Inzilbêth | Men | Male | Inzilbêth | Ar-Gimilzôr | Men | Female |
4 | Tarannon Falastur | Berúthiel | Men | Male | Berúthiel | Tarannon Falastur | Men | Female |
... | ... | ... | ... | ... | ... | ... | ... | ... |
122 | Amroth | Nimrodel | Elves | Male | Nimrodel | Amroth | Elves | Female |
123 | Amarië | Finrod | Elves | Female | Finrod | Amarië | Elves | Male |
124 | Ailinel | Orchaldor | Men | Female | Orchaldor | Ailinel | Men | Male |
125 | Almarian | Tar-Meneldur | Men | Female | Tar-Meneldur | Almarian | Men | Male |
126 | Aegnor | Andreth | Elves | Male | Andreth | Aegnor | Men | Female |
127 rows × 8 columns
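The self-merge above matches each character with the row that names them as a spouse. A couple recorded from both sides therefore appears twice, once in each direction, while a character whom nobody lists as a spouse is dropped. A minimal sketch on a toy frame illustrates this; the names A to D are made up.

```python
import pandas as pd

# Toy links (made up): A and B list each other, C lists D, who has no row of their own
toy_links = pd.DataFrame({
    "name": ["A", "B", "C"],
    "norm_spouse": ["B", "A", "D"],
})

pairs = pd.merge(toy_links, toy_links, left_on="name", right_on="norm_spouse")
# A-B and B-A both appear; C is dropped because no row lists C as a spouse
print(pairs[["name_x", "name_y"]])

# Collapse the two directions into one unordered couple
couples = {frozenset(p) for p in zip(pairs["name_x"], pairs["name_y"])}
print(len(couples))
```

The duplicated direction is harmless for an undirected `nx.Graph`, which stores the edge between two nodes only once.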
links_df2 = links_df2[["name_x","race_x", "gender_x", "name_y", "race_y", "gender_y"]]
links_df2
 | name_x | race_x | gender_x | name_y | race_y | gender_y |
---|---|---|---|---|---|---|
0 | Adanel | Men | Female | Belemir | Men | Male |
1 | Angrod | Elves | Male | Eldalótë | Elves | Female |
2 | Ar-Pharazôn | Men | Male | Tar-Míriel | Men | Female |
3 | Ar-Gimilzôr | Men | Male | Inzilbêth | Men | Female |
4 | Tarannon Falastur | Men | Male | Berúthiel | Men | Female |
... | ... | ... | ... | ... | ... | ... |
122 | Amroth | Elves | Male | Nimrodel | Elves | Female |
123 | Amarië | Elves | Female | Finrod | Elves | Male |
124 | Ailinel | Men | Female | Orchaldor | Men | Male |
125 | Almarian | Men | Female | Tar-Meneldur | Men | Male |
126 | Aegnor | Elves | Male | Andreth | Men | Female |
127 rows × 6 columns
Next, let's create a NetworkX graph object:
g = nx.Graph()
# Add both partners of every couple as nodes, with race and gender as attributes
for i, r in links_df2.iterrows():
    g.add_node(r['name_x'], race=r["race_x"], gender=r["gender_x"])
    g.add_node(r['name_y'], race=r["race_y"], gender=r["gender_y"])
# Connect each couple with an edge
edges_list = [(r['name_x'], r['name_y']) for i, r in links_df2.iterrows()]
g.add_edges_from(edges_list)
import matplotlib.pyplot as plt
node_colors = []
# Color each node by gender
for node in g.nodes(data=True):
    if node[1]['gender'] == "Male":
        node_colors.append('blue')
    elif node[1]['gender'] == "Female":
        node_colors.append('red')
    else:
        node_colors.append('green')
plt.figure(3, figsize=(14, 14))
nx.draw_spring(g, with_labels=True, node_color=node_colors)
We can also draw the vertices with different shapes according to each character's race. We will show how to do this later in the course.
There are some great tutorials on interesting real world datasets. Here are some examples: