Pagination / How to deal with numerous pages (of movies)


Post by jujuuj on Wed 31 May - 19:18

Hello,

I am opening this new thread to propose different ways to manage pagination in makelists.

Video sites generally have plenty of movies, sorted by category, and each category usually spans more than one page. So when we build a makelist, we need a regex that deals with pages, so that we can reach all the movies (and not only the first page).

There are plenty of ways to do it, so this thread collects different examples and solutions for pagination, starting with the simplest, classic solutions and finishing with the newest.

Re: Pagination / How to deal with numerous pages (of movies)

Post by adrianhn on Fri 9 Jun - 19:35

The simplest way I use is to make a text file containing the numbers 1 to 15, which I just call from a regex:


Code:


 <link>$doregex[makelist3]</link>

 
<regex>
  <name>makelist3</name>
  <listrepeat><![CDATA[
      <title>Pagina [makelist3.param1]</title>
      <link>$doregex[makelist2]</link>
<thumbnail>http://adryanlist.org/adryan/img/adryflix.jpg</thumbnail>
  ]]></listrepeat>
  <expres><![CDATA[paginado:"(.*?)";]]></expres>
  <page>http://adryanlist.org/text_with_numbers_1_to_15.txt</page>
  <cookieJar></cookieJar>
</regex>

<regex>
<name>makelist2</name>
<listrepeat><![CDATA[
<title>[COLOR skyblue] [makelist2.param2]  [COLOR lightblue] idioma:[makelist2.param4][/COLOR][/COLOR] </title>
<link>$doregex[makelist]</link>
<thumbnail>[makelist2.param3]</thumbnail>
<fanart>[makelist2.param3]</fanart>
]]></listrepeat>
<expres>...</expres>
<page>http://www.PaginadePeliculas.com/page/[makelist3.param1]</page>
 </regex>
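
The text file behind `<page>` is not shown in the post. Judging from the `<expres>` in makelist3, each entry probably looks like `paginado:"1";`. Here is a minimal Python sketch, under that assumption, that generates such content and checks that the expression picks out the numbers. The helper name `build_page_index` is hypothetical and not part of the add-on.

```python
import re

def build_page_index(last_page):
    """Return text with one paginado:"N"; entry per page, from 1 to last_page."""
    return "\n".join('paginado:"%d";' % n for n in range(1, last_page + 1))

# Content that could be saved as the text file referenced in <page>
text = build_page_index(15)

# Same expression as in makelist3: each capture becomes [makelist3.param1]
pages = re.findall(r'paginado:"(.*?)";', text)
print(pages[0], pages[-1], len(pages))
```

Changing the argument of `build_page_index` is then enough to offer more or fewer pages.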



error

Post by JonnyB on Fri 9 Jun - 23:31

For a manual list there is no need for an external page; something like this will do:

Code:

<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>Page [makelist.param1]</title>
<link>_</link>
]]></listrepeat>
<expres>'(.*?)'</expres>
<page>'1''2''3''4''5''6'</page>
</regex>
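
Typing the quoted numbers by hand gets tedious beyond a few pages. As a small sketch (the helper name `inline_page` is made up for illustration), the `<page>` content can be generated and checked against the same `'(.*?)'` expression:

```python
import re

def inline_page(last_page):
    """Build the quoted list '1''2'...'N' to paste into the <page> tag."""
    return "".join("'%d'" % n for n in range(1, last_page + 1))

page = inline_page(6)

# Same expression as in the makelist above: one capture per page number
numbers = re.findall(r"'(.*?)'", page)
print(page)
print(numbers)
```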


Re: Pagination / How to deal with numerous pages (of movies)

Post by adrianhn on Fri 9 Jun - 23:36

Thanks for that. I learn something every day.


Re: Pagination / How to deal with numerous pages (of movies)

Post by jujuuj on Sun 11 Jun - 12:15

JonnyB wrote: For a manual list there is no need for an external page; something like this will do:

Code:

<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>Page [makelist.param1]</title>
<link>_</link>
]]></listrepeat>
<expres>'(.*?)'</expres>
<page>'1''2''3''4''5''6'</page>
</regex>

Exactly!
Or you can directly PASTE (part of) the page source into the <page> tag.

Manually Write the number of last page in a py function

Post by jujuuj on Sun 11 Jun - 12:28

So, if we know that we have 15 pages (i.e. the last page is number 15), we can do this:

Code:

<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>Page [makelist.param1]</title>
<link>_</link>
]]></listrepeat>
<expres> (.*?),</expres>
<page>$doregex[get-list-page]</page>
</regex>

<regex> <!-- just a virtual quantity of pages I decide   = 15 -->
<name>get-list-page</name>                                            
<expres><![CDATA[#$pyFunction
def GetLSProData(page_data,Cookie_Jar,m):
 liste = list(range(17))
 return liste
]]></expres>
<page></page>
</regex>

This creates a list of 17 elements, like this:
0, 1, 2, 3, 4, 5, 6, ... 15, 16

As there is no space before the first number (0), the expres will not grab it.
As there is no comma after the last number (16), the expres will not grab it either.

So we get our list of numbers from 1 to 15: that is our [makelist.param1].

(And if you want 50 pages, just replace the 17 above with 52.)
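
The trick can be checked outside Kodi. This sketch assumes the add-on hands the text form of the returned list to the next regex (which is what the explanation above implies). The function name `visible_pages` is hypothetical.

```python
import re

def visible_pages(n):
    """Apply the ' (.*?),' expres to the text form of list(range(n))."""
    text = str(list(range(n)))          # '[0, 1, 2, ..., n-1]'
    return re.findall(r" (.*?),", text)

pages = visible_pages(17)
print(pages[0], pages[-1], len(pages))
```

The first number follows `[` rather than a space and the last one is followed by `]` rather than a comma, so both drop out. For N pages, the value to pass to range() is N + 2.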

Automatic search of the number of the last page

Post by jujuuj on Sat 24 Jun - 12:10

In this example, I don't want to write the number of the last page by hand (as it is different for every category).

In the easiest situation (it can happen on small sites or TV sites), direct links to all the pages (1 2 3 4 5 6) appear in the web page and in the source code, so you just use an <expres>>(.*?)&lt;/a></expres> to grab all the numbers and build your links directly from that list. Let's leave this case aside: it is too easy.

In all other situations, you do NOT have a direct link to every page number: you will have links to pages 1 2 3 ... and to the last one (for example). So the idea is:
1 get the number-of-last-page
2 build a list from 1 to that number-of-last-page
3 ... so that we have links to all the pages in our "kodi-makelist"

So, basically, I need a "regex" able to find and record (= grab) this number-of-last-page.

Usually, in the source code you will find a line like this:

<a href="/films/page-2">2</a><a href="/films/page-3">3</a><a href="/films/page-4">4</a><a href="/films/page-5">5</a><div class="nokta">...</div><a href="/films/page-71">71</a></div><a class="sonraki-sayfa" href="/films/page-2">suivante »</a> </div> </div>

As you can see, the source code "says" which number is the last page (here, 71), and it always "says" it.
So the idea is to grab that number-of-last-page [and then create a list from 1 to that number-of-last-page].

Sometimes, in this line you can see  <a href="/films/page-71">Last</a>
In that case a very simple expres will do the job:   <expres>a href="\/films\/page-(\d+)">Last</expres>
(Capture with (\d+) rather than (.*?): a lazy (.*?) can start at an earlier link and swallow several links before it reaches the end of the pattern.)

Sometimes you don't see the word "Last", but the last page is the last number of the list. For example:
blah 1 blah 2 blah 3 >...</div><a href="/films/page-71">71</a></div></div></div>
So you can grab the last number of this line, and it will be the last page.
In the expres, the idea is to use the final </a></div></div></div>, which will be unique, to locate our number:
<expres>a href="\/films\/page-.*?">(\d+)&lt;\/a>&lt;\/div>&lt;\/div>&lt;\/div></expres>
or (if you prefer)
<expres><![CDATA[a href="\/films\/page-.*?">(\d+)<\/a><\/div><\/div><\/div>]]></expres>

This is not always possible, as sometimes there are other links after the last page number in the source code.
A basic example is when the last link in the list is >Next<  (= siguiente, suivante). But even in that case, you can find the last-page-number just before the >Next<. Example:
<a href="/films/page-2">2</a><a href="/films/page-3">3</a><a href="/films/page-4">4</a><a href="/films/page-5">5</a><div class="nokta">...</div><a href="/films/page-71">71</a></div><a class="sonraki-sayfa" href="/films/page-2">Next</a> </div> </div>
The interesting part is:   >71</a></div><a class="sonraki-sayfa" href="/films/page-2">Next<
The corresponding expres: <expres><![CDATA[>(\d+)</a></div><[^>]+>Next<]]></expres>
(I can't explain everything here, but [^>]+ works like .*? except that it can NOT contain any > inside.)

In harder cases, it can be easier to grab all the page numbers and then keep only the MAX (which is the last page number). Equally, you can grab all the page numbers, SORT them (from min to max) and keep the last one. This can be done when using Python to build the item (examples will come later).
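
Both ideas (anchoring on >Next<, and keeping the maximum) can be tried in plain Python on the sample line quoted in this post. This is only a sketch: the function names are made up, and the patterns use `(\d+)` so that only digits are captured. A real site will need its own patterns.

```python
import re

# Sample pagination line from the post
SAMPLE = ('<a href="/films/page-2">2</a><a href="/films/page-3">3</a>'
          '<a href="/films/page-4">4</a><a href="/films/page-5">5</a>'
          '<div class="nokta">...</div><a href="/films/page-71">71</a></div>'
          '<a class="sonraki-sayfa" href="/films/page-2">Next</a> </div> </div>')

def last_page_before_next(html):
    """Number shown just before the 'Next' link, or None if not found."""
    match = re.search(r'>(\d+)</a></div><[^>]+>Next<', html)
    return int(match.group(1)) if match else None

def last_page_by_max(html):
    """Highest page number among all pagination links, or None if none."""
    numbers = [int(n) for n in re.findall(r'href="/films/page-(\d+)"', html)]
    return max(numbers) if numbers else None

print(last_page_before_next(SAMPLE), last_page_by_max(SAMPLE))
```

The MAX variant does not care where the last page sits in the line, which makes it the more forgiving of the two when the layout changes.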


Last edited by jujuuj on Sat 24 Jun - 12:32; edited 1 time in total

=

Post by jujuuj on Sat 24 Jun - 12:21

NOW THAT WE HAVE the "number of last page"

We have to build a list of page numbers (from 1 to "number-of-last-page"), in order to build, later, our links to all the pages in a makelist.

That will be the next step.
We will build it as in the "manual" example above, except that we won't write any number by hand in the code: we replace the manual step with our automatic search for the last page number, and then we reuse the automatic creation of the list of pages (as in the "manual" case).
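
As a rough sketch of that next step, once the last page number is known, the list of pages (or of page URLs) is one line of Python. All names below are hypothetical, and the base URL is a placeholder.

```python
def page_numbers(last_page, first_page=1):
    """Page numbers from first_page to last_page, both included."""
    return list(range(first_page, last_page + 1))

def padded_range(last_page):
    """List to return when the makelist uses the ' (.*?),' expres:
    it needs one extra element at each end (0 and last_page + 1)."""
    return list(range(last_page + 2))

def page_urls(base_url, last_page):
    """One URL per page, built by appending the page number to base_url."""
    return [base_url + str(n) for n in page_numbers(last_page)]

urls = page_urls('http://example.com/films/page-', 3)
print(urls)
```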

Re: Pagination / How to deal with numerous pages (of movies)

Post by jujuuj on Sun 27 Aug - 15:07

Here you decide the number of the last page you want:
Code:

<item>
<title>NewPCT1 [color=undefined]Estrenos de Cine - Screeners TORRENT [/color]  (Quasar)  
 >> newpct1 Completo = 33 paginas   (ESP)</title>
<link>$doregex[makelist2]</link>
<thumbnail>http://www.newpct1.com/pct1/library/content/template/images/newpct1.jpg</thumbnail>       <!--thumb del sitio  web-->
 
<regex> <!-- showing page-numbers -->
<name>makelist2</name>
<listrepeat><![CDATA[
<title>Estrenos de Cine -  Pagina [makelist2.param1].</title> <!-- or, if mklist 1, title: [makelist.param2], page [makelist2.param1] -->
<link>$doregex[makelist3]</link>
<referer></referer>
<thumbnail></thumbnail>
]]></listrepeat>
<expres> (.*?),</expres>
<page>$doregex[get-list-page]</page>
<cookieJar></cookieJar>
</regex>

<regex> <!-- just a virtual quantity of pages I decide   = 33 -->
<name>get-list-page</name>                                            
<expres><![CDATA[#$pyFunction
def GetLSProData(page_data,Cookie_Jar,m):

 liste = list(range(35))
 return liste
]]></expres>
<page></page>
</regex>

<regex>
<name>makelist3</name>
<listrepeat><![CDATA[
   <title> [color=undefined][makelist3.param2] [/color]</title>
     <link>plugin://plugin.video.quasar/play?uri=$doregex[link-torrent]</link>
        <regex>
     <name>link-torrent</name>
   
     <expres>"(http.*?descargar-torrent.*?)"</expres>
     <page>[makelist3.param1]</page>
     <referer>http://www.newpct1.com/estrenos-de-cine/</referer>
     </regex>
   <referer></referer>
   <thumbnail>http://www.newpct1.com/pictures/f/[makelist3.param3]</thumbnail>
]]></listrepeat>
<expres><a\shref="(http:\/\/www.newpct1.com\/pelicula\/(.*?)\/)"\stitle.*?<img\ssrc="http:\/\/www.newpct1.com\/pictures\/f\/thumbs\/(.*?)"\s</expres>
<page>http://www.newpct1.com/estrenos-de-cine/pg/[makelist2.param1]</page>
<referer>http://www.newpct1.com/estrenos-de-cine/</referer>
<cookieJar></cookieJar>
</regex>
</item>



Here you take all the pages at once (but it can take a very long time to load):

Code:


 
 
 <item>
 <title>V.0.3B  TEST  http://www.elitetorrent.net   ESP   Género/Multipaginas [FULL] (+resumén)  [QUASAR</title>
 <thumbnail>http://www.elitetorrent.net/images/logo_elite.png</thumbnail>
        <link>$doregex[makelistORDEN]</link>
 
        <regex>
        <name>makelistORDEN</name>  <!-- sort order -->
        <listrepeat><![CDATA[
        <title>[makelistORDEN.param2]</title>
 <link>$doregex[makelist0]</link>
        ]]></listrepeat>
 <expres>"([^"]+)*" \(=([^\)]+)\)  </expres>
 <page>"" (=Fecha de entrada)  "/orden:valoracion" (=Valoración)  "/orden:popularidad" (=Popularidad)  "" (= )  "" (=INFO : Algunas listas demoran muchisimo en cargar !)  "" (=Las versiones 1.x  son mas rapidas ...)  </page>
 <referer></referer>
 </regex>

        <regex>
        <name>makelist0</name>  <!-- categories -->
        <listrepeat><![CDATA[
        <title>[makelist0.param2]</title>
 <link>$doregex[makelist]</link>
        ]]></listrepeat>
 <expres><![CDATA[<a href="\/(categoria\/[^"]+)">([^<]+)</a>]]></expres>
        <page>http://www.elitetorrent.net</page>
 <referer></referer>
 </regex>

<regex>
<name>makelist</name> <!-- movies (all pages) -->
 <listrepeat><![CDATA[
  <title>• [makelist.param3]  •  [COLOR grey]$doregex[resume][/COLOR] </title>
  <info> </info>
  <thumbnail>http://www.elitetorrent.net/[makelist.param2]</thumbnail>
  <link>plugin://plugin.video.quasar/play?uri=$doregex[link]</link>
]]></listrepeat>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 
################################################################## FORMULARY #################################################################
 starturl = 'http://www.elitetorrent.net/'     ### usually, ~ 'http://website.com/category/' ###
 categorylink = '[makelist0.param1]' ### check the number of the param ###
 centerurl = '[makelistORDEN.param1]/pag:'   ### usually, ~ '/page/' ###
 firstpage = 1  ### probably = 1 ###
 pagenumbersregex = 'href="[^"]+\D(\d{1,3})\/*"[^>]*>\w{0,9}<'   ### This one may work ... To adapt ###
 dataregex = '<a href="\/(torrent\/[^"]+)"><img src="([^"]+)" border="0" title="([^"]+)"'      ### To adapt ###
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0', 'Referer': 'http://www.elitetorrent.net'}      ### To adapt ###
##############################################################################################################################################

############################## GENERIC PART #####################################
 fullurl = starturl + categorylink + centerurl + str(firstpage)
 source = requests.get(fullurl, headers=headers).text
 data = []
 try:
    # highest page number found among the pagination links of the first page
    last = sorted(map(int, re.findall(pagenumbersregex, source)))[-1]
 except IndexError:
    last = 30   # fallback when no page number is found
 while firstpage <= last:
    try:
        fullurl = starturl + categorylink + centerurl + str(firstpage)
        source = requests.get(fullurl, headers=headers).text
        IsThatDataDifferent = re.findall(dataregex, source)
        # only add the page when it brings something other than what we already have
        if IsThatDataDifferent != data:
            data += IsThatDataDifferent
    except Exception:
        pass   # a page that fails to load is skipped
    firstpage += 1   # always advance, so a failing page cannot loop forever
 return data
#################################################################################
]]></expres><page></page>
<referer>http://www.elitetorrent.net/</referer>
  <cookieJar>$doregex[createCFCookie]</cookieJar>
  </regex>
 
 <regex>
        <name>resume</name>
 <expres>(?s)o: (\d\d\d\d).*?psis:([^"]+)"</expres>
        <page>http://www.elitetorrent.net/[makelist.param1]</page>
        <referer>http://www.elitetorrent.net/</referer>
        <cookieJar>open[elitetorrent.lwp]</cookieJar>
        </regex>
 
 <regex>
        <name>magnet</name>
        <expres>"(magnet:.*?)"</expres>
        <page>http://www.elitetorrent.net/[makelist.param1]</page>
        <referer>http://www.elitetorrent.net/</referer>
        <cookieJar>open[elitetorrent.lwp]</cookieJar>
        </regex>
        <regex>
        <name>link</name>
        <expres>$pyFunction:urllib.quote_plus('$doregex[magnet]')</expres>
        <page></page>
        </regex>
        <regex>
        <name>createCFCookie</name>
        <expres></expres>
        <page>$pyFunction:cloudflare.createCookie('http://www.elitetorrent.net/',Cookie_Jar,'Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1')</page>
        <cookieJar></cookieJar>
        </regex>
        <regex>
        <name>savecookie</name>
        <expres></expres>
        <page></page>
        <cookieJar>save[elitetorrent.lwp]</cookieJar>
        </regex>
        </item>  
 
 
 
 
 


I have to go further ...

But it is too long to explain everything, so check more examples:

Code:




This one is multi-page, but it will NOT take all the pages, just the first 25.

I mean that it will show one big list of movies, containing the first 25 pages of the category on the corresponding website.

It has been built to be as generic as possible.


        
<item>
<title>NEW! 4.32. m500    http://papystreaming.org/   [Max= 500 derniers films] </title>
<thumbnail>https://img2.picup.co/rs8/http://img2.picup.co/upload/images/CFNX.png</thumbnail>
<link>$doregex[makelist0]</link>

<regex>
<name>makelist0</name> <!-- categories list -->
<listrepeat><![CDATA[
   <title>[COLOR yellow] [makelist0.param2][/COLOR]</title>
   <link>$doregex[makelist]</link>
]]></listrepeat>
<expres>href=".*?\/category\/(.*?)\/".+?<i>(.*?)<</expres>
<page>http://papystreaming.org/</page>
<agent>Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1</agent>
<cookieJar></cookieJar>
</regex>

<regex>
    <name>makelist</name> <!-- movies (all pages) -->
 <listrepeat><![CDATA[
   <title>[COLOR dodgerblue] [makelist.param2][/COLOR]</title>
   <link>$doregex[play]</link>
   <thumbnail>[makelist.param3]</thumbnail>
]]></listrepeat>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 
################################################################## FORMULARY #################################################################
 starturl = 'http://papystreaming.org/category/'     ### usually, ~ 'http://website.com/category/' ###
 categorylink = '[makelist0.param1]' ### check the number of the param ###
 centerurl = '/page/'      ### something like  '/page/' ###
 firstpage = 1 ### probably  = 1 ###
 endurl = '' ### can be empty ###    ### endurl without final / (SLASH) ###
#
 dataregex = 'poster" href="([^"]+)" title="([^"]+)">\n<img src="([^"]+)"'      ### To adapt ###
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0', 'Referer': ''}      ### To adapt ###
##############################################################################################################################################

############################## GENERIC PART #####################################
 fullurl = starturl + categorylink + centerurl + str(firstpage) + endurl #
 source = requests.get(fullurl, headers= headers).text #
 data = [] #
#
 last = 25 #
 while firstpage <= last: #
    try: #
        fullurl = starturl + categorylink + centerurl + str(firstpage) + endurl #
        source = requests.get(fullurl, headers=headers).text #
        data += re.findall(dataregex, source) #
    except Exception: #
        pass # a page that fails to load is skipped
    firstpage += 1 # always advance, so a failing page cannot loop forever
 return data #
##########################################################by#twogun#and#jujuuj###
]]></expres><page></page>
</regex>

<regex>
<name>play</name>
<expres><![CDATA[#$pyFunction
import re,requests
def GetLSProData(page_data,Cookie_Jar,m):
  link = 'http:' + re.findall('link":"(\W+player[^"]+)',page_data)[0].replace('\\','')
  source=requests.get(link,headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1','Referer':'[makelist.param1]','Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8','Connection':'keep-alive'}).text
  return re.findall('src":"([^"]+)',source)[0].replace('\\','')
]]></expres>
<page>[makelist.param1]</page>
<referer>http://papystreaming.org/category/[makelist0.param1]</referer>
<agent>Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1</agent>
<cookieJar></cookieJar>
</regex>
</item>
        


Items are here for examples, and may be outdated ...

Code:



<item>
<title> [COLOR yellow]C PAS BIEN  torrent  FR
  Films à l'affiche [/COLOR]</title>
<link>$doregex[makelist2]</link>
<thumbnail>https://www.tvaddons.ag/kodi-addons/cache/images/529c5630b6355079548bfb80a30214_icon.png</thumbnail>     

<regex>                           <!-- showing page-numbers -->
<name>makelist2</name>
<listrepeat><![CDATA[
<title>A l'Affiche.  Page [makelist2.param1].</title>           <!-- or, if mklist 1: [makelist.param2], page [makelist2.param1] -->
<link>$doregex[makelist3]</link>
<referer></referer>
<thumbnail></thumbnail>
]]></listrepeat>
<expres> (.*?),</expres>
<page>$doregex[get-list-page]</page>
<cookieJar></cookieJar>
</regex>

<regex>                           <!-- just a virtual quantity of pages I decide  = 50 -->
<name>get-list-page</name>                                           
<expres><![CDATA[#$pyFunction
def GetLSProData(page_data,Cookie_Jar,m):
 liste = list(range(52))
 return liste
]]></expres>
<page></page>
</regex>

<regex>
<name>makelist3</name>
<listrepeat><![CDATA[
    <title>[COLOR skyblue][makelist3.param4] [/COLOR]  </title>
      <link>plugin://plugin.video.quasar/play?uri=$doregex[link-torrent]</link>
      <info>[makelist3.param2]</info>
      <regex>
      <name>link-torrent</name>
      <expres>"(.*?\.torrent)"</expres>
      <page>[makelist3.param3]</page>
      <referer>http://www.cpasbien-torrents.fr/films</referer>
      </regex>
    <referer></referer>
    <thumbnail>[makelist3.param1]</thumbnail>
]]></listrepeat>
<expres>border-2">\n.*?img src="(.*?jpg)".*?\n.*?\n.*?\n.*?text">(.*?)&lt;\/p>\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n.*?\n*?\n.*?h5>&lt;a href="(.*?)" class="full-link border-2" title="(.*?)"></expres>
<page>http://www.cpasbien-torrents.fr/page/[makelist2.param1]</page>
<referer>http://www.cpasbien-torrents.fr/</referer>
<cookieJar></cookieJar>
</regex>
</item>
 

MULTI-PAGES Makelist

Post by jujuuj on Sun 27 Aug - 15:21

INFO  for  Reverse  Engineering

Code:


 ***********************************************
 ***********************************************
I would like £$π to understand me when I write ...
<regex>
<name>makelist</name>
<listrepeat><=!=[=C=D=A=T=A=[
<title> [makelist.param2] </title>
<link>https://www.tvaddons.ag/forums/$doregex[next-regex]</link>
<thumbnail></thumbnail>
]=]=></listrepeat>
<expres>a href="(.*?)" title="(.*?)"</expres>
<page>http://www.anyvideosite.zzz/categZ/page-1/</page>
<page>http://www.anyvideosite.zzz/categZ/page-2/</page>
<page>http://www.anyvideosite.zzz/categZ/page-3/</page>
<page>http://www.anyvideosite.zzz/categZ/page-4/</page>
<page>http://www.anyvideosite.zzz/categZ/page-5/</page>




1/ 2gun

<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title> [makelist.param2] </title>
<link> [makelist.param1] </link>
<thumbnail>http:[makelist.param2]</thumbnail>
]]></listrepeat>
<expres>a href="(.*?)" title="(.*?)"</expres>
<page>$doregex[data]</page>
</regex>

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 pn = 1
 data = []

 while pn <= 5:
    page = 'http://www.anyvideosite.zzz/categZ/page-' + str(pn)
    source = requests.get(page, headers=headers).text
    data += re.findall('a href="(.*?)" title="(.*?)"', source)
    pn += 1
 return data
]]></expres>
</regex>





2/ 2gun
One way to get around this would be to use the try statement and have the loop exit if it fails.
This example sets a high loop limit (you suggested 500; the code below uses 100) and uses "try:" and "except:" to break out of the loop when an error occurs.
Added note: while pn > 0: (or while True:) would mean no limit, since pn starts at 1 and only goes up in value.

Code:

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 pn = 1
 data = []

 while pn <= 100:
    try:
        page = 'http://www.anyvideosite.zzz/categZ/page-' + str(pn)
        source = requests.get(page, headers=headers).text
        data += re.findall('a href="(.*?)" title="(.*?)"', source)
        pn += 1
    except Exception:
        break   # stop at the first page that fails to load
 return data
]]></expres>
</regex>
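
The same stop-on-failure loop can be tested without any network access by passing the download step in as a function. This is only a sketch: `fetch` is a hypothetical stand-in for `requests.get(url, headers=headers).text`, and the fake site below is invented for the demonstration. The loop also stops when a page returns no item, since some sites answer with an empty page instead of an error.

```python
import re

def collect_until_failure(fetch, base_url, item_regex, max_pages=100):
    """Read page 1, 2, 3... and stop at the first failing or empty page."""
    data = []
    for pn in range(1, max_pages + 1):
        try:
            source = fetch(base_url + str(pn))
        except Exception:
            break                      # the page could not be loaded
        found = re.findall(item_regex, source)
        if not found:
            break                      # nothing on this page: assume we are past the end
        data += found
    return data

# Invented two-page site standing in for real downloads
fake_site = {
    'http://www.anyvideosite.zzz/categZ/page-1': '<a href="/m/1" title="One">',
    'http://www.anyvideosite.zzz/categZ/page-2': '<a href="/m/2" title="Two">',
}

def fake_fetch(url):
    return fake_site[url]              # raises KeyError for a missing page

movies = collect_until_failure(fake_fetch,
                               'http://www.anyvideosite.zzz/categZ/page-',
                               r'a href="(.*?)" title="(.*?)"')
print(movies)
```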



3/   2gun   A cleaner method would be to use a regex to define a variable equal to the total number of pages, then use that variable to define the number of times to loop.

Example: using a regex to define the number of pages to read.
Note: the code in these examples will not run unless you replace the page data and regex with valid data.

Code:

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 page = 'http://www.anyvideosite.zzz/categz/'
 source = requests.get(page, headers=headers).text
 count = re.findall('last page = "(\d+)"', source)[0]
 pn = 1
 data = []
 
 while pn <= int(count):
    page = 'http://www.anyvideosite.zzz/categZ/page-' + str(pn)
    source = requests.get(page, headers=headers).text
    data += re.findall('a href="(.*?)" title="(.*?)"', source)
    pn += 1
 return data
]]></expres>
</regex>




4/ ju
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title> [makelist.param2] </title>
<link> [makelist.param1] </link>
<thumbnail>NA</thumbnail>
]]></listrepeat>
<expres>"(http://.*?.mp4)" title"(.*?)"</expres>
<page>$doregex[data]</page>
</regex>

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 sn = 1
 data = []

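 # the four sites are fixed, so they are defined once, before the loop
 site1 = 'http://S1'
 site2 = 'http://S2blabla'
 site3 = 'http://Sthree'
 site4 = 'http://SCUATRO'
 sites = [site1, site2, site3, site4]
 while sn <= 4:
    # sn counts from 1, list indexes start at 0
    source = requests.get(sites[sn - 1], headers=headers).text
    data += re.findall('a href="(http:\/\/.*?.mp4)" title="(.*?)"', source)
    sn += 1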
 return data
]]></expres>
</regex>



5/ ju
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title> [makelist.param2] </title>
<link> [makelist.param1] </link>
<thumbnail>NA</thumbnail>
]]></listrepeat>
<expres>"(http://.*?.mp4)" title"(.*?)"</expres>
<page>$doregex[data]</page>
</regex>

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 sn = 1
 dataall = []
 data1 = []
 data2 = []
 data3 = []
 data4 = []
 
 # the four sites are fixed, so they are defined once, before the loop
 site1 = 'http://S1'
 site2 = 'http://S2blabla'
 site3 = 'http://Sthree'
 site4 = 'http://SCUATRO'
 sites = [site1, site2, site3, site4]
 while sn <= 4:
    # sn counts from 1, list indexes start at 0
    source = requests.get(sites[sn - 1], headers=headers).text
    data1 += re.findall('a href="(http:\/\/.*?.mp4)" title="(.*?)"', source)
    data2 += re.findall('a href="(http:\/\/.*?.mp4)" >"(.*?)"<', source)
    data3 += re.findall('a href="(http:\/\/.*?.mp4)".*?title="(.*?)"', source)
    data4 += re.findall('a href="(http:\/\/.*?.mp4)" titulo="(.*?)"', source)
    sn += 1
 dataall += data1 + data2 + data3 + data4
 return dataall
]]></expres>
</regex>


6/  streamwatcher
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title> [makelist.param2] </title>
<link> [makelist.param1] </link>
<thumbnail>NA</thumbnail>
]]></listrepeat>
<expres><![CDATA[#$pyFunction
import re, requests, HTMLParser
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 sn = 1
 data = []

 while sn <= 4:
    site1 = 'http://S1'
    site2 = 'http://S2blabla'
    site3 = 'http://Sthree'
    site4 = 'http://SCUATRO'
    # HTMLParser objects have no site1..site4 attributes, so the URL is picked from the local variables
    site = [site1, site2, site3, site4][sn - 1]
    source = requests.get(site, headers=headers).text
    data += re.findall('a href="(http:\/\/.*?.mp4)" title="(.*?)"', source)
    sn += 1
 return data
]]></expres>
<page></page>
</regex>


7/ (1express)  twogun
<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 sn = 1
 data = []
 site = 'http://S1', 'http://S2blabla', 'http://Sthree', 'http://SCUATRO'
 while sn <= 4:
    source = requests.get(site[sn-1], headers=headers).text
    data += re.findall('a href="(http:\/\/.*?.mp4)" title="(.*?)"', source)
    sn += 1
 return data
]]></expres>
</regex>


8/ gujal
A slightly cleaned version of twogun's code that allows more sites rather than hard coding to 4.
Code:
<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 data = []
 sites = ['http://S1', 'http://S2blabla', 'http://Sthree', 'http://SCUATRO']
 for site in sites:
    source = requests.get(site, headers=headers).text
    data += re.findall('a href="(http:\/\/.*?.mp4)" title="(.*?)"', source)
 return data
]]></expres>
</regex>






86/    
[code]
<!--  MULTI-SITE (one-webpage-per-site) / One-COLLECTIVE-EXPRES  -->
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title> [makelist.param2] </title>
<link> [makelist.param1] </link>
<thumbnail>NA</thumbnail>
]]></listrepeat>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 data = []
 sites = ['http://S1', 'http://S2blabla', 'http://Sthree', 'http://SCUATRO']
 for site in sites:
    source = requests.get(site, headers=headers).text
    data += re.findall('a href="(http:\/\/.*?.mp4)" title="(.*?)"', source)
 return data
]]></expres>
<page></page>
</regex>
[/code]

GJ geeks !



 <!--  MULTI-PAGE (of onevideosite.com/categZ/page...) -->







9/ (various expres) twogun

semi-working code:
Code:

<item>
<title>test</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param2]</title>
<link>[makelist.param1]</link>
<info>[makelist.param1]</info>
]]></listrepeat>
<expres>u'([^']+)', u'([^']+)'</expres>
<page>$doregex[data]</page>
</regex>

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 sn = 1
 data = []

 while sn <= 4:
    site = 'http://pastebin.com/raw/YA1vwh0g', 'http://pastebin.com/raw/sF5Sjbq8', 'http://pastebin.com/raw/GAMsGVcu','http://pastebin.com/raw/KdYPRFRU'
    reg = '(http:\/\/.*?.mp4)" title="(.*?)"', 'a href="(http:\/\/.*?.mp4)" >"(.*?)"<', 'a href="(http:\/\/.*?.mp4)".*?title="(.*?)"', 'a href="(http:\/\/.*?.mp4)" titulo="(.*?)"'
    source = requests.get(site[sn - 1], headers=headers).text
    data += re.findall(reg[sn - 1], source)
    sn += 1
 return data
]]></expres>
<page></page>
</regex>
</item>

I hope this has been helpful.





98 /

<!-- Rough attempt at building MULTI-SITE / MULTI-EXPRES, but in "Gujal's way" -->
<item>
<title>test</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param2]</title>
<link>[makelist.param1]</link>
<info>[makelist.param1]</info>
]]></listrepeat>
<expres>u'([^']+)', u'([^']+)'</expres>
<page>$doregex[data]</page>
</regex>

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 data = []
 sites = ['http://S1', 'http://S2blabla', 'http://Sthree', 'http://SCUATRO']
 regs = ['(http:\/\/.*?.mp4)" title="(.*?)"', 'a href="(http:\/\/.*?.mp4)" >"(.*?)"<', 'a href="(http:\/\/.*?.mp4)".*?title="(.*?)"', 'a href="(http:\/\/.*?.mp4)" titulo="(.*?)"']
 for site in sites:
    source = requests.get(site, headers=headers).text
    for reg in regs:
        data += re.findall(reg, source)
 return data
]]></expres>
<page></page>
</regex>
</item>

Problem: I think this code will test every expres on every site, which is not exactly the idea.
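To make the problem concrete, here is a minimal sketch of the alternative: each site is stored together with its own expres, so every expres is applied to exactly one page. The `PAGES` dictionary and the `collect` helper are hypothetical stand-ins invented for this demo (they replace `requests.get` so it runs without a network); post 11/ further down applies the same pairing idea with real requests.

```python
import re

# Hypothetical stand-ins for downloaded pages (no network needed for the demo).
PAGES = {
    'http://S1': '<a href="http://S1/a.mp4" title="Movie A">',
    'http://S2blabla': '<a href="http://S2/b.mp4" titulo="Movie B">',
}

# One expres per site, stored side by side so they cannot get out of step.
SITE_REGS = [
    ('http://S1', r'a href="(http://.*?\.mp4)" title="(.*?)"'),
    ('http://S2blabla', r'a href="(http://.*?\.mp4)" titulo="(.*?)"'),
]

def collect(site_regs, fetch):
    """Apply each expres only to the page of its own site."""
    data = []
    for site, reg in site_regs:
        data += re.findall(reg, fetch(site))
    return data

result = collect(SITE_REGS, PAGES.get)
print(result)
```

With the nested loops above, 4 sites and 4 expres mean 16 searches; with pairs it is only 4, and an expres written for one site can never pick up stray links on another.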


Question: I don't really understand the u' ' and the , in the first expres.
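A possible explanation, as a sketch: `re.findall` with two groups returns a list of tuples, and I am assuming (judging from the expres itself) that the add-on turns that list into text before the makelist expres is applied. On Python 2, which Kodi add-ons of this period run on as far as I know, unicode strings are printed as `u'...'`, and the `, ` is simply the separator between the two elements of each tuple. The sample data below is made up for the demo.

```python
import re

# What a data function returns: re.findall with two groups gives a list of tuples.
data = [('http://S1/a.mp4', 'Movie A'), ('http://S1/b.mp4', 'Movie B')]

# Assumption: the add-on turns that list into text before applying the makelist expres.
text = str(data)
print(text)

# Python 2 prints unicode strings as u'...'; Python 3 drops the u.
# An optional u keeps the expres working on both.
pairs = re.findall(r"u?'([^']+)', u?'([^']+)'", text)
print(pairs)
```

One thing to watch: a title containing an apostrophe is printed with double quotes instead, so this expres would skip it.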
 
 









11 / 2gun
<!--  MULTI-SITE (one-webpage-per-site) + MULTI-EXPRES (parameters must be in same order) -->
<item>
<title>test</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param2]</title>
<link>[makelist.param1]</link>
<info>[makelist.param1]</info>
]]></listrepeat>
<expres>u'([^']+)', u'([^']+)'</expres>
<page>$doregex[data]</page>
</regex>

<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
 data = []
 sites = [('http://pastebin.com/raw/YA1vwh0g', '(http:\/\/.*?.mp4)" title="(.*?)"'), ('http://pastebin.com/raw/sF5Sjbq8', 'a href="(http:\/\/.*?.mp4)" >"(.*?)"<'), ('http://pastebin.com/raw/GAMsGVcu', 'a href="(http:\/\/.*?.mp4)".*?title="(.*?)"'), ('http://pastebin.com/raw/KdYPRFRU', 'a href="(http:\/\/.*?.mp4)" titulo="(.*?)"')]

 for site, reg in sites:
    source = requests.get(site, headers=headers).text
    data += re.findall(reg, source)
 return data
]]></expres>
<page></page>
</regex>
</item>


////////////





21 /


<!--  1-SITE (1-category)  /  JOIN-all-PAGES  /  1-EXPRES    Clean BUT check "last-page" -->
<item>
<title> 1-SITE  /  JOIN-PAGES    (data-in-regex[data])</title>
<link>$doregex[makelist]</link>
    <regex>
    <name>makelist</name>
    <listrepeat><![CDATA[
    <title> [makelist.param2] </title>
    <link> [makelist.param1] </link>
    <thumbnail>http:[makelist.param2]</thumbnail>
    ]]></listrepeat>
    <expres>a href="(.*?)" title="(.*?)"</expres>
    <page>$doregex[data]</page>
    </regex>
<regex>
<name>data</name>
<expres><![CDATA[#$pyFunction
import re, requests
def GetLSProData(page_data,Cookie_Jar,m):
 headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
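 page = 'http://www.anyvideosite.zzz/categZ/'
 source = requests.get(page, headers=headers).text
 # Fall back to a single page when the "last page" marker is missing.
 found = re.findall(r'last page = "(\d+)"', source)
 count = int(found[0]) if found else 1
 pn = 1
 data = []
 while pn <= count: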
    page = 'http://www.anyvideosite.zzz/categZ/page-' + str(pn)
    source = requests.get(page, headers=headers).text
    data += re.findall('a href="(.*?)" title="(.*?)"', source)
    pn += 1
 return data
]]></expres>
</regex>
</item>
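About the "check last-page" warning: a minimal sketch of how the last-page lookup can be isolated and tested on its own, without opening the site. The helper names `last_page` and `page_urls` are made up for this example, and the `last page = "..."` marker is only the placeholder pattern used above; a real site will need its own expres.

```python
import re

def last_page(source, default=1):
    """Return the page count announced in source, or default if none is found."""
    found = re.findall(r'last page = "(\d+)"', source)
    return int(found[0]) if found else default

def page_urls(base, count):
    """Build the page-1 .. page-count URLs for one category."""
    return [base + str(pn) for pn in range(1, count + 1)]

base = 'http://www.anyvideosite.zzz/categZ/page-'
print(page_urls(base, last_page('... last page = "3" ...')))
print(page_urls(base, last_page('no marker here')))
```

If the marker is not found, the list falls back to the first page only instead of crashing the whole makelist.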








  