Installation
Start using MediaWikiAPI in less than 5 minutes! MediaWikiAPI is compatible with Python 3 and Python 2.7. If you are looking for the full developer API, see MediaWikiAPI Documentation.
Begin by installing MediaWikiAPI:
$ pip install mediawikiapi
Alternatively, you can install it from the source code on GitHub.
Quickstart
Now let’s use MediaWikiAPI. First, import the package and create a MediaWikiAPI instance.
To search and to get suggestions, call the corresponding methods, search and suggest:
>>> from mediawikiapi import MediaWikiAPI
>>> mediawikiapi = MediaWikiAPI()
>>> mediawikiapi.search("Barack")
[u'Barak (given name)', u'Barack Obama', u'Barack (brandy)', u'Presidency of Barack Obama', u'Family of Barack Obama', u'First inauguration of Barack Obama', u'Barack Obama presidential campaign, 2008', u'Barack Obama, Sr.', u'Barack Obama citizenship conspiracy theories', u'Presidential transition of Barack Obama']
>>> mediawikiapi.suggest("Barak Obama") # returns the suggested Wikipedia title for a query or None
u'Barack Obama'
We can also get fewer or more results by using the results kwarg:
>>> mediawikiapi.search("Ford", results=3)
[u'Ford Motor Company', u'Gerald Ford', u'Henry Ford']
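When a query is misspelled, search may return nothing useful while suggest still offers a corrected title. The sketch below combines the two calls; search_with_fallback is a hypothetical helper, and it only assumes that api provides search(query, results=...) and suggest(query) as documented in the API section, so a stand-in object can replace a live MediaWikiAPI instance when testing.

```python
def search_with_fallback(api, query, results=10):
    """Search for query; if nothing is found, retry once with the suggested title.

    Hypothetical helper: `api` is assumed to behave like a MediaWikiAPI
    instance, i.e. to provide search(query, results=...) and suggest(query).
    """
    hits = api.search(query, results=results)
    if hits:
        return hits
    suggestion = api.suggest(query)  # a string, or None when there is no suggestion
    if suggestion is None or suggestion == query:
        return []
    # Second and final attempt, using the corrected title.
    return api.search(suggestion, results=results)
```

With a live client, search_with_fallback(MediaWikiAPI(), "Barak Obama", results=3) would send real requests to Wikipedia.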
To get the summary of an article, use mediawikiapi.summary:
>>> mediawikiapi.summary("GitHub")
u'...2011, GitHub was the most popular open source code repository site.\nGitHub Inc. was founded in 2008 and is based in San Francisco, California.\nIn July 2012, the company received $100 million in Series A funding, primarily from Andreessen Horowitz.'
>>> mediawikiapi.summary("Apple III", sentences=1)
u'The Apple III (often rendered as Apple ///) is a business-oriented personal computer produced and released by Apple Computer that was intended as the successor to the Apple II series, but largely considered a failure in the market. '
mediawikiapi.page enables you to load and access data from full Wikipedia pages. Initialize it with a page title (keep in mind that a title pointing to a disambiguation page raises a DisambiguationError), and then access most properties using property methods:
>>> ny = mediawikiapi.page("New York (state)")
>>> ny.title
u'New York (state)'
>>> ny.url
u'http://en.wikipedia.org/wiki/New_York_(state)'
>>> ny.content
u'New York is a state in the northeastern United States. New York was one of the original thir'...
>>> ny.images[0]
u'http://upload.wikimedia.org/wikipedia/commons/9/91/New_York_quarter%2C_reverse_side%2C_2001.jpg'
>>> ny.links[0]
u'1790 United States Census'
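The properties above can be gathered into a plain dict, for example to cache or serialize them. A minimal sketch with a hypothetical helper, page_digest; it assumes only that api.page(title) returns an object exposing the title, url, images and links properties shown above.

```python
def page_digest(api, title, max_links=5):
    """Collect a few WikipediaPage properties into a plain dict.

    Hypothetical helper: `api` is assumed to provide page(title) returning an
    object with the title, url, images and links properties. On a real page
    object, accessing a property may trigger a network request.
    """
    page = api.page(title)
    images = page.images
    return {
        "title": page.title,
        "url": page.url,
        "first_image": images[0] if images else None,  # pages without images give None
        "links": list(page.links[:max_links]),
    }
```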
To change the language of the Wikipedia you are accessing, set mediawikiapi.config.language.
Remember to search for page titles in the language that you have set, not English:
>>> mediawikiapi.config.language = "fr"
>>> print(mediawikiapi.summary("Francois Hollande"))
François Hollande, né le 12 août 1954 à Rouen, en Seine-Maritime, est un homme d'État français. Il est président de la République française depuis le 15 mai 2012...
To get a list of all possible language prefixes, try mediawikiapi.languages().
For more details and configuration options, see the API section.
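Since version 1.1 (see the Changelog), each MediaWikiAPI instance carries its own Config, so one program can query several language editions side by side instead of repeatedly switching config.language. A minimal sketch, assuming the constructors documented in the API section, Config(language=...) and MediaWikiAPI(config=...); the classes are passed in as parameters only so that the hypothetical helper clients_by_language can be tried without network access.

```python
def clients_by_language(api_cls, config_cls, languages):
    """Build one API client per language prefix, each with its own config.

    Hypothetical helper; with the real package it would be called as
    clients_by_language(MediaWikiAPI, Config, ["en", "fr"]).
    """
    return {lang: api_cls(config=config_cls(language=lang)) for lang in languages}
```

With real classes, clients["fr"].summary("Francois Hollande") should then query the French Wikipedia while clients["en"] keeps using English.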
Changelog
Version 1.1.3
Version 1.1
- Breaking change: add the MediaWikiAPI class, so you can now have more than one API access point with different configurations (Config instances). The MediaWikiAPI class contains all the mediawikiapi functions from version 1.0. A Config instance can be passed as a parameter during initialization.
- Support Python 2
- Rename the Configuration class to Config and add a language field
- Config().get_api_url now accepts a language parameter
- Add a timeout for requests, set by the timeout field of the Config class (in seconds)
- Make the pageprops of a Wikipedia page accessible (PR #147 from the @goldsmith repo)
- Fix suggestions, issue #108 (PR #131 from the @goldsmith repo)
- Fix a problem with hidden files in the article (PR #132 from the @goldsmith repo)
- DisambiguationError now contains information about the title and URL (PR #92 from the @goldsmith repo)
- Fix an issue where a pageid request that resolves to a redirect raises an error (PR #165)
Version 1.0
- Fork Wikipedia
- Add language validation for mediawikiapi.set_lang
- Add the lang_title method to WikipediaPage
- Reuse the same requests session
- Fix an installation error related to the version
- Fix WikipediaPage.sections
- Fix mock data
- Refactoring: separate the Language and Configuration classes
API
MediaWikiAPI Documentation
Here you can find the full developer API for the MediaWikiAPI project.
Contents:
Classes
class mediawikiapi.MediaWikiAPI(config=None)
category_members(title=None, pageid=None, cmlimit=10, cmtype='page')
Get a list of page titles belonging to a category.
Keyword arguments:
- title - category title. Cannot be used together with “pageid”
- pageid - page id of category page. Cannot be used together with “title”
- cmlimit - the maximum number of titles to return
- cmtype - which type of page to include. (“page”, “subcat”, or “file”)
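category_members returns only one level of a category tree. A sketch of a recursive walk that also follows subcategories via cmtype="subcat"; walk_category is a hypothetical helper, and it assumes that the subcategory titles returned by category_members can be passed straight back as title (check whether your wiki expects a "Category:" prefix).

```python
def walk_category(api, title, max_depth=1, cmlimit=50, _seen=None):
    """Collect page titles from a category and, up to max_depth levels, its subcategories.

    Hypothetical helper built on category_members; it assumes that the titles
    returned with cmtype="subcat" can be passed back as the title argument.
    """
    seen = set() if _seen is None else _seen
    if title in seen:  # guard against categories that contain each other
        return []
    seen.add(title)
    pages = list(api.category_members(title=title, cmlimit=cmlimit, cmtype="page"))
    if max_depth > 0:
        for sub in api.category_members(title=title, cmlimit=cmlimit, cmtype="subcat"):
            pages.extend(walk_category(api, sub, max_depth - 1, cmlimit, seen))
    return list(dict.fromkeys(pages))  # drop duplicates, keep first-seen order
```

The seen set stops infinite recursion on cyclic category graphs, and max_depth bounds the number of requests.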
geosearch(latitude, longitude, title=None, results=10, radius=1000)
Perform a Wikipedia geo search around the given latitude and longitude using the HTTP API described in http://www.mediawiki.org/wiki/Extension:GeoData
Arguments:
- latitude (float or decimal.Decimal)
- longitude (float or decimal.Decimal)
Keyword arguments:
- title - The title of an article to search for
- results - the maximum number of results returned
- radius - Search radius in meters. The value must be between 10 and 10000
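Because geosearch only accepts a radius between 10 and 10000 meters, a caller-supplied radius can be clamped before the request. A minimal sketch with the hypothetical helper nearby_titles; converting through str() keeps the decimal.Decimal coordinates equal to the digits the caller wrote rather than their binary float expansion.

```python
from decimal import Decimal

MIN_RADIUS_M = 10     # lower bound documented for geosearch
MAX_RADIUS_M = 10000  # upper bound documented for geosearch


def nearby_titles(api, latitude, longitude, radius_m=1000, results=10):
    """Run geosearch with the radius clamped into the documented 10..10000 m range.

    Hypothetical helper: `api` is assumed to provide the geosearch method above.
    """
    radius = max(MIN_RADIUS_M, min(MAX_RADIUS_M, int(radius_m)))
    lat = Decimal(str(latitude))
    lon = Decimal(str(longitude))
    return api.geosearch(lat, lon, results=results, radius=radius)
```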
languages()
List all the currently supported language prefixes (usually ISO language codes).
A prefix can be assigned to config.language (for example mediawikiapi.config.language) to change the MediaWiki that requests are sent to.
Returns: dict of <prefix>: <local_lang_name> pairs. To get just a list of prefixes, use list(mediawikiapi.languages().keys()).
page(title=None, pageid=None, auto_suggest=True, redirect=True, preload=False)
Get a WikipediaPage object for the page with the given title or pageid (the two are mutually exclusive).
Keyword arguments:
- title - the title of the page to load
- pageid - the numeric pageid of the page to load
- auto_suggest - let Wikipedia find a valid page title for the query
- redirect - allow redirection without raising RedirectError
- preload - load content, summary, images, references, and links during initialization
Attention!
Using auto_suggest may return a different page than the one you searched for.
For example:
page("The Squires (disambiguation)", auto_suggest=True) returns the page titled Squires (disambiguation)
page("The Squires (disambiguation)", auto_suggest=False) returns the page titled The Squires (disambiguation)
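Since auto_suggest can silently return a different page, as in the Squires example above, it can be useful to compare the requested and returned titles. A sketch with the hypothetical helper page_with_title_check, assuming only page(title, auto_suggest=...) and the title property documented here:

```python
def page_with_title_check(api, title, auto_suggest=True):
    """Load a page and report whether the returned title matches the request.

    Hypothetical helper: with auto_suggest=True the returned page may differ
    from the requested title, so callers can decide whether to trust it or
    retry with auto_suggest=False.
    """
    page = api.page(title, auto_suggest=auto_suggest)
    return page, page.title == title
```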
random(pages=1)
Get a list of random Wikipedia article titles.
Note
Random only gets articles from namespace 0, meaning no Category, User talk, or other meta-Wikipedia pages.
Keyword arguments:
- pages - the number of random pages returned (max of 10)
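random returns at most 10 titles per call, so larger samples have to be fetched in batches. A sketch with the hypothetical helper random_titles; the bare-string check is a defensive assumption (the wikipedia library this project forks returns a single string when pages=1), not documented MediaWikiAPI behavior. Batches are independent, so the same title may appear more than once.

```python
def random_titles(api, count, batch_size=10):
    """Fetch `count` random titles in batches of at most batch_size per call."""
    titles = []
    while len(titles) < count:
        batch = api.random(pages=min(batch_size, count - len(titles)))
        if isinstance(batch, str):  # assumption: a single title might come back bare
            batch = [batch]
        if not batch:  # stop instead of looping forever if nothing is returned
            break
        titles.extend(batch)
    return titles
```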
search(query, results=10, suggestion=False)
Perform a Wikipedia search for query.
Keyword arguments:
- results - the maximum number of results returned
- suggestion - if True, return results and suggestion (if any) in a tuple
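With suggestion=True, search returns a (results, suggestion) tuple instead of a plain list. A sketch of unpacking it into a "Did you mean" hint; did_you_mean is a hypothetical helper name.

```python
def did_you_mean(api, query, results=10):
    """Return (titles, hint), where hint is a 'Did you mean' string or None."""
    titles, suggestion = api.search(query, results=results, suggestion=True)
    if suggestion and suggestion != query:
        return titles, "Did you mean: %s?" % suggestion
    return titles, None
```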
suggest(query)
Get a Wikipedia search suggestion for query. Returns a string, or None if no suggestion was found.
summary(title, sentences=0, chars=0, auto_suggest=True, redirect=True)
Plain text summary of the page.
Note
This is a convenience wrapper: auto_suggest and redirect are enabled by default.
Keyword arguments:
- sentences - if set, return only this many sentences from the beginning of the summary (no more than 10)
- chars - if set, return only the first chars characters (the actual text returned may be slightly longer)
- auto_suggest - let Wikipedia find a valid page title for the query
- redirect - allow redirection without raising RedirectError
class mediawikiapi.WikipediaPage(title=None, pageid=None, redirect=True, preload=False, original_title='', request=None)
Contains data from a Wikipedia page. Uses property methods to filter data from the raw HTML.
__init__(title=None, pageid=None, redirect=True, preload=False, original_title='', request=None)
Initialize self. See help(type(self)) for the accurate signature.
backlinks
List of pages that link to this page.
backlinks_ids
List of IDs of pages that link to this page.
Note
It is not guaranteed that the backlinks_ids list contains all backlinks. Sometimes the pageid is missing and only the title is available; as a result, len(backlinks_ids) <= len(backlinks).
categories
List of categories of the page.
content
Plain text content of the page, excluding images, tables, and other data.
coordinates
Tuple of Decimals in the form (lat, lon), or None.
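Because coordinates is either a (lat, lon) tuple of Decimals or None, code that compares two pages has to handle the None case. A sketch using the haversine great-circle formula (a standard technique, not part of MediaWikiAPI) and an assumed mean Earth radius of 6371.0088 km; distance_km is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def distance_km(coords_a, coords_b):
    """Great-circle distance between two (lat, lon) pairs, or None if either is missing."""
    if coords_a is None or coords_b is None:
        return None
    # float() accepts the Decimal values returned by the coordinates property.
    lat1, lon1 = (math.radians(float(v)) for v in coords_a)
    lat2, lon2 = (math.radians(float(v)) for v in coords_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

Typical use would be distance_km(page_a.coordinates, page_b.coordinates) for two loaded pages.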
images
List of URLs of images on the page.
lang_title(lang_code)
Get the title of the page in the language given by lang_code. Returns None if the language code or title isn’t found; otherwise returns the title as a string. Raises LanguageException if the language doesn’t exist.
links
List of titles of Wikipedia pages linked from the page.
Note
Only includes articles from namespace 0, meaning no Category, User talk, or other meta-Wikipedia pages.
parent_id
Revision ID of the parent version of the current revision of this page. See revision_id for more information.
references
List of URLs of external links on the page. May include external links within the page that aren’t technically cited anywhere.
revision_id
Revision ID of the page.
The revision ID is a number that uniquely identifies the current version of the page. It can be used to create the permalink or for other direct API calls. See Help:Page history for more information.
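Given revision_id and parent_id, permanent and diff links can be built by hand. A minimal sketch, assuming the page lives on https://<language>.wikipedia.org and using MediaWiki's index.php?oldid=... and index.php?diff=...&oldid=... URL forms; the helper names permalink and diff_link are hypothetical, and urllib.parse makes this Python 3 only.

```python
from urllib.parse import urlencode


def permalink(language, revision_id):
    """Permanent link to one revision (assumes https://<language>.wikipedia.org)."""
    return "https://%s.wikipedia.org/w/index.php?%s" % (language, urlencode({"oldid": revision_id}))


def diff_link(language, revision_id, parent_id):
    """Link to the diff between a revision and its parent revision."""
    query = urlencode({"diff": revision_id, "oldid": parent_id})
    return "https://%s.wikipedia.org/w/index.php?%s" % (language, query)
```

For a loaded page, permalink("en", page.revision_id) and diff_link("en", page.revision_id, page.parent_id) would give the two links.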
section(section_title)
Get the plain text content of a section from self.sections. Returns None if section_title isn’t found; otherwise returns a whitespace-stripped string.
This is a convenience method that wraps self.content.
Warning
Calling section on a section that has subheadings will NOT return the full text of all of the subsections. It only gets the text between section_title and the next subheading, which is often empty.
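To get a section together with all of its subsections, the plain text content can be cut at the next heading of the same or a higher level. A sketch, assuming that content marks headings with MediaWiki-style lines such as == Title == and === Subtitle === (the level is the number of = signs); verify this against your own pages first. section_with_subsections is a hypothetical helper, not part of MediaWikiAPI.

```python
import re

# A heading line: two or more "=", the title, then the same run of "=".
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_with_subsections(content, section_title):
    """Return the text of section_title including its subsections, or None if absent."""
    headings = list(HEADING.finditer(content))
    for i, match in enumerate(headings):
        if match.group(2) != section_title:
            continue
        level = len(match.group(1))
        end = len(content)
        # Stop at the next heading that is not deeper than this one.
        for later in headings[i + 1:]:
            if len(later.group(1)) <= level:
                end = later.start()
                break
        return content[match.end():end].strip()
    return None
```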
sections
List of section titles from the table of contents on the page.
summary
Plain text summary of the page.
class mediawikiapi.Language(language=None)
Language used in MediaWiki. If language is not defined, English is used.
language
Return the language.
class mediawikiapi.Config(language=None, user_agent=None, rate_limit=None)
Contains global configuration.
__init__(language=None, user_agent=None, rate_limit=None)
Initialize self. See help(type(self)) for the accurate signature.
get_api_url(language=None)
Return the API URL for the specified language.
Arguments:
- language - (string or Language instance) the language to use
language
Return the current global language.
Exceptions
Global Wikipedia exception and warning classes.
exception mediawikiapi.exceptions.HTTPTimeoutError(query)
Exception raised when a request to the MediaWiki servers times out.
exception mediawikiapi.exceptions.LanguageError(language)
Exception raised when a language prefix that is not available is set.
exception mediawikiapi.exceptions.MediaWikiAPIException(error)
Base Wikipedia exception class.
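Network calls can fail transiently with HTTPTimeoutError. A minimal retry sketch with exponential back-off; call_with_retry is a hypothetical helper that is not part of MediaWikiAPI. It is Python 3 only (keyword-only arguments), and the exception class to retry on is passed in so that the function does not depend on the package.

```python
import time


def call_with_retry(func, retry_on, *args, retries=3, delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential back-off on retry_on errors.

    Hypothetical helper: retry_on is an exception class (or tuple of classes),
    e.g. mediawikiapi.exceptions.HTTPTimeoutError.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(delay * 2 ** attempt)  # wait delay, 2*delay, 4*delay, ...
```

For example, after from mediawikiapi.exceptions import HTTPTimeoutError, a call could look like call_with_retry(mediawikiapi.summary, HTTPTimeoutError, "GitHub", sentences=1). Other exceptions, such as a DisambiguationError, propagate immediately.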