Source

The common keys required for all sources are:

  • "name" - String - The name of the source presented in weyd's links page
  • "language" - Array of String - Not currently processed by weyd; it is included because other scraper packages use it. It will be used in the future, so it is wise to add it now. Once implemented, weyd will use the ISO standard language codes.
  • "domains" - Array of String - Not currently processed by weyd. In the future, each entry will be substituted for the domain name in "base_url" so that backup domains can be searched if the primary domain is down. Entries should be plain FQDNs, without "https://" and without port numbers.
  • "base_url" - String - The base URL that is prepended to search requests. This must be HTTPS and must contain an FQDN for a server with a valid SSL certificate issued by a trusted CA.
  • "search_url_format_episode" - JSON Object - This must be present even if the source being scraped does not contain TV shows.
    • "string_format" - String - Contains a Java String.format() template for String replacement. If the source being scraped does not contain TV shows, or you don't want weyd to search for TV shows, then leave this as an empty string.
    • "replacement" - Array of String - This must contain the same number of items as "string_format" has format specifiers. The items must be listed in the order in which they are substituted, and each must match the data type of its specifier (for example, an "_int" value for %d).
      • Possible values:
        • "title" - Replaced by "The Title Exactly How It Is" - Spaces get replaced with + to URL encode
        • "title_lower" - Replaced by "the title lowercase with spaces" - Spaces get replaced with + to URL encode
        • "title_lower_dash" - Replaced by "the-title-lowercase-with-dashes-for-spaces"
        • "year_text" - Replaced by the String version of year - "2001"
        • "year_int" - Replaced by the Integer version of year - 2015
        • "season_text" - Replaced by the String version of season - "1", "15"
        • "season_int" - Replaced by the Integer version of season - 1, 5, 20
        • "episode_text" - Replaced by the String version of episode - "3", "12"
        • "episode_int" - Replaced by the Integer version of episode - 2, 4, 8
  • "search_url_format_movie" - JSON Object - This must be present even if the source being scraped does not contain Movies.
    • "string_format" - String - Contains a Java String.format() template for String replacement. If the source being scraped does not contain Movies, or you don't want weyd to search for Movies, then leave this as an empty string.
    • "replacement" - Array of String - This must contain the same number of items as "string_format" has format specifiers. The items must be listed in the order in which they are substituted, and each must match the data type of its specifier (for example, an "_int" value for %d).
      • Possible values:
        • "title" - Replaced by "The Title Exactly How It Is" - Spaces get replaced with + to URL encode
        • "title_lower" - Replaced by "the title lowercase with spaces" - Spaces get replaced with + to URL encode
        • "title_lower_dash" - Replaced by "the-title-lowercase-with-dashes-for-spaces"
        • "year_text" - Replaced by the String version of year - "2001"
        • "year_int" - Replaced by the Integer version of year - 2015
  • "search_url_format_season_pack" - JSON Object (optional - only available if "is_torrent": true) - The presence of this JSON Object indicates that this source can be searched for Series Packs. Be careful with this option because it will slow down your script's searches. It's best to limit this to API sources and HTML sources that have "links_on_first_page": true.
    • "string_format" - String - Contains a Java String.format() template for String replacement. Reserve exactly one %s, placed last, for the position where weyd will insert various combinations of the title for searching Series Packs. Do not include any %d for year, season, or episode.
    • "replacement" - Array of String - This must contain one item for each format specifier in "string_format" other than the last %s, which weyd fills in itself with various combinations of the title for searching Series Packs. The items must be listed in the order in which they are substituted, and each must match the data type of its specifier. Do not include any year, season, or episode replacements.
      • Possible values:
        • "title" - Replaced by "The Title Exactly How It Is" - Spaces get replaced with + to URL encode
        • "title_lower" - Replaced by "the title lowercase with spaces" - Spaces get replaced with + to URL encode
        • "title_lower_dash" - Replaced by "the-title-lowercase-with-dashes-for-spaces"
  • "is_torrent" - Boolean - Whether this source contains torrents. If this is true, "is_direct" is ignored.
  • "is_direct" - Boolean - Whether this source contains direct downloads.
  • "name_delete_filter" - Array of String - Each String in this array is removed from the scraped title.
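
The pairing rules above (one replacement per format specifier, in order, with matching data types) can be checked before a source is loaded. The following is a minimal sketch in Python, not weyd's own code: the function name check_format, the token table, and the simplified specifier pattern are assumptions made for illustration. It relies on the fact that Java's String.format() rejects a String argument for %d, while %s accepts any value.

```python
import re

# Matches the specifier shapes used on this page (%s, %d, %02d, %04d, %.1s).
# This is a simplification and does not cover every java.util.Formatter feature.
SPECIFIER = re.compile(r"%[-0-9.]*([sd])")

# Assumed mapping from replacement token to the kind of value it yields:
# "s" for String values, "d" for Integer values.
TOKEN_KIND = {
    "title": "s", "title_lower": "s", "title_lower_dash": "s",
    "year_text": "s", "season_text": "s", "episode_text": "s",
    "year_int": "d", "season_int": "d", "episode_int": "d",
}


def check_format(string_format, replacement):
    """Return a list of problems; an empty list means the pair looks consistent."""
    if string_format == "":
        return []  # an empty template means this search type is not used
    problems = []
    kinds = SPECIFIER.findall(string_format)
    if len(kinds) != len(replacement):
        problems.append(
            f"{len(kinds)} specifiers but {len(replacement)} replacements"
        )
    for position, (kind, token) in enumerate(zip(kinds, replacement)):
        if token not in TOKEN_KIND:
            problems.append(f"position {position}: unknown token {token!r}")
        elif kind == "d" and TOKEN_KIND[token] != "d":
            problems.append(
                f"position {position}: %d needs an *_int token, got {token!r}"
            )
    return problems
```

This check is meant for the episode and movie templates. A season pack template reserves its final %s for weyd, so its "replacement" array is expected to be one item shorter than the specifier count.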

Example

{
  "name": "weyd",
  "language": ["en"],
  "domains": ["weyd.app"],
  "base_url": "https://weyd.app",
  "search_url_format_episode": {
    "string_format": "/search?query=%s+s%02de%02d&type=tv",
    "replacement": [
      "title_lower",
      "season_int",
      "episode_int"
    ]
  },
  "search_url_format_movie": {
    "string_format": "/search?query=%s+%04d&type=movie",
    "replacement": [
      "title_lower",
      "year_int"
    ]
  },
  "search_url_format_season_pack": {
    "string_format": "/%.1s/%s",
    "replacement": [
      "title_lower"
    ]
  },
  "is_torrent": true,
  "is_direct": false
}
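
To see what the example above produces, the episode and season pack templates can be expanded by hand. This sketch uses Python's % operator, which treats the specifiers used here (%s, %02d, %.1s) the same way as Java's String.format(). The helper names, the sample title, and the pack_query value standing in for the title combination that weyd inserts are all hypothetical.

```python
def token_values(title, season=None, episode=None):
    """Assumed token table, based on the "Possible values" lists on this page."""
    lower = title.lower()
    values = {
        "title": title.replace(" ", "+"),
        "title_lower": lower.replace(" ", "+"),
        "title_lower_dash": lower.replace(" ", "-"),
    }
    for name, number in (("season", season), ("episode", episode)):
        if number is not None:
            values[name + "_int"] = number
            values[name + "_text"] = str(number)
    return values


def build_url(base_url, search_format, values, extra=()):
    """Fill string_format with the listed replacements, in order.

    `extra` (hypothetical) holds trailing arguments that are not listed in
    "replacement", such as the final %s of a season pack template.
    """
    args = tuple(values[token] for token in search_format["replacement"])
    return base_url + search_format["string_format"] % (args + tuple(extra))


base_url = "https://weyd.app"
episode_format = {
    "string_format": "/search?query=%s+s%02de%02d&type=tv",
    "replacement": ["title_lower", "season_int", "episode_int"],
}
season_pack_format = {
    "string_format": "/%.1s/%s",
    "replacement": ["title_lower"],
}

values = token_values("Some Show", season=1, episode=3)

episode_url = build_url(base_url, episode_format, values)
print(episode_url)  # https://weyd.app/search?query=some+show+s01e03&type=tv

# Stand-in for one of the title combinations that weyd would insert itself.
pack_query = values["title_lower"]
pack_url = build_url(base_url, season_pack_format, values, extra=(pack_query,))
print(pack_url)  # https://weyd.app/s/some+show
```

Note how %.1s keeps only the first character of "title_lower", which suits sites that group their listings by initial letter.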

Source types

There are two types of sources, and each has a different layout. A source can be either an API that returns JSON or an HTML-based website.

Regardless of which type you're trying to access, all URLs must be HTTPS to an FQDN with a valid SSL certificate. Raw IP addresses cannot be used; weyd will not be able to access the links.
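
A quick pre-flight check can catch the most common violations of this rule. The sketch below is an illustration only, with a hypothetical function name; it inspects the URL's scheme and host using Python's standard library and does not verify the server's certificate.

```python
import ipaddress
from urllib.parse import urlsplit


def base_url_problems(base_url):
    """Return a list of rule violations found in a source's base URL."""
    parts = urlsplit(base_url)
    problems = []
    if parts.scheme != "https":
        problems.append("scheme must be https")
    host = parts.hostname or ""
    try:
        # ip_address() accepts IPv4 and IPv6 literals and raises ValueError otherwise.
        ipaddress.ip_address(host)
        problems.append("host is a raw IP address, not an FQDN")
    except ValueError:
        if "." not in host:
            problems.append("host does not look like an FQDN")
    return problems


print(base_url_problems("https://weyd.app"))           # []
print(base_url_problems("https://192.168.1.10:8080"))  # ['host is a raw IP address, not an FQDN']
```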

To mark a source as an API, you must include a JSON Object with the key "api":

{
  "api": {
    "key": "value",
    "key": "value",
    "key": "value"
  }
}

If this key ("api") exists, all other directives for finding values on the page are ignored.

Without the "api" key, the source is handled as an HTML-based website by default.
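
The rule amounts to a single key lookup. As a minimal sketch (the function name and the sample source names are hypothetical, and this is not weyd's own code):

```python
import json


def source_kind(source_json):
    """Classify a source definition as "api" or "html" by the presence of the "api" key."""
    source = json.loads(source_json)
    return "api" if "api" in source else "html"


print(source_kind('{"name": "example-api", "api": {"key": "value"}}'))  # api
print(source_kind('{"name": "example-site"}'))                          # html
```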
