Je possède un Hébergement Web (en fait vous êtes actuellement dessus…) chez OVH. N'étant pas vraiment satisfait des outils de statistiques proposés : Urchin n'est plus maintenu, OVHcloud Web Statistics qui est encore jeune et Awstats que je trouve vraiment bien mais qui ne propose de voir l'activité que sur une journée.
C'est la raison pour laquelle j'ai décidé de voir comment télécharger les fichiers de logs afin de pouvoir les exploiter manuellement.
Nous ne pouvons pas utiliser les identifiants principaux pour récupérer les journaux, nous devons donc créer un compte spécial dédié à cela depuis l'espace client OVHCloud Web.
Et voila nous avons maintenant toutes les informations nécessaires pour pouvoir télécharger nos journaux.
[user@host ~]$ USR=ovhlogsuser
[user@host ~]$ PASS=Myverycomplexpassw0rD
[user@host ~]$ URL=https://log.clusterXXX.hosting.ovh.net/shebangthedolphins.net/
[user@host ~]$ DOMAIN=$(awk -F '/' '{ print $4 }' <<< $URL)
[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ wget --http-user="$USR" --http-password="$PASS" -A *gz -r -nd ""$URL"/logs/logs-10-2020/"
[user@host ~]$ ls -lh
total 596K
-rw------- 1 std std 55 10 déc. 2019 robots.txt.tmp
-rw-r--r-- 1 std std 24K 2 oct. 02:56 shebangthedolphins.net-01-10-2020.log.gz
-rw-r--r-- 1 std std 17K 3 oct. 02:19 shebangthedolphins.net-02-10-2020.log.gz
[…]
-rw-r--r-- 1 std std 14K 31 oct. 06:08 shebangthedolphins.net-30-10-2020.log.gz
-rw-r--r-- 1 std std 52K 1 nov. 06:08 shebangthedolphins.net-31-10-2020.log.gz
[user@host ~]$ wget --http-user="$USR" --http-password="$PASS" "$URL"/logs/logs-$(/bin/date --date='1 days ago' '+%m-%Y')/"$DOMAIN"-$(/bin/date --date='1 days ago' '+%d-%m-%Y').log.gz
[user@host ~]$ ls -lh
total 20K
-rw-r--r-- 1 std std 18K 25 nov. 06:29 shebangthedolphins.net-24-11-2020.log.gz
[user@host ~]$ perl-rename -v 's/(.*)-(\d\d)-(\d\d)-(\d\d\d\d)(.*)/$4-$3-$2-$1$5/' *gz
[user@host ~]$ ls -lh
total 596K
-rw-r--r-- 1 std std 24K 2 oct. 02:56 2020-10-01-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 17K 3 oct. 02:19 2020-10-02-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 14K 4 oct. 02:32 2020-10-03-shebangthedolphins.net.log.gz
[user@host ~]$ wget --http-user="$USR" --http-password="$PASS" "$URL"/logs/logs-$(/bin/date --date='1 days ago' '+%m-%Y')/"$DOMAIN"-$(/bin/date --date='1 days ago' '+%d-%m-%Y').log.gz -O /tmp/$(/bin/date --date='1 days ago' '+%Y-%m-%d')-"$DOMAIN".log.gz
[user@host ~]$ ls -lh /tmp/*gz
-rw-r--r-- 1 std std 18K 25 nov. 06:29 /tmp/2020-11-24-shebangthedolphins.net.log.gz
[user@host ~]$ for DAY in $(seq 1 30); do wget --http-user="$USR" --http-password="$PASS" "$URL"/logs/logs-$(/bin/date --date=''$DAY' days ago' '+%m-%Y')/"$DOMAIN"-$(/bin/date --date=''$DAY' days ago' '+%d-%m-%Y').log.gz -O /tmp/$(/bin/date --date=''$DAY' days ago' '+%Y-%m-%d')-"$DOMAIN".log.gz; done
[user@host ~]$ ls -lh /tmp/*gz
-rw-r--r-- 1 std std 17K 26 oct. 06:12 /tmp/2020-10-25-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 14K 27 oct. 06:44 /tmp/2020-10-26-shebangthedolphins.net.log.gz
[...]
-rw-r--r-- 1 std std 18K 24 nov. 06:38 /tmp/2020-11-23-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 18K 25 nov. 06:29 /tmp/2020-11-24-shebangthedolphins.net.log.gz
PS C:\Users\std> $user = "ovhlogsuser"
PS C:\Users\std> $pass = "Myverycomplexpassw0rD"
PS C:\Users\std> $secpasswd = ConvertTo-SecureString $pass -AsPlainText -Force
PS C:\Users\std> $credential = New-Object System.Management.Automation.PSCredential($user, $secpasswd)
PS C:\Users\std> $domain = "shebangthedolphins.net"
PS C:\Users\std> $url = "https://log.clusterXXX.hosting.ovh.net/$domain/"
PS C:\Users\std> Invoke-WebRequest -Credential $credential -Uri ("$url" + "logs/logs-" + $((Get-Date).AddDays(-1).ToString("MM-yyyy")) + "/$domain" + "-" + $((Get-Date).AddDays(-1).ToString("dd-MM-yyyy")) + ".log.gz") -OutFile "$((Get-Date).AddDays(-1).ToString("yyyy-MM-dd"))-$domain.log"
PS C:\Users\std> dir
Directory: C:\Users\std
ode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 05/12/2020 15:45 360238 2020-12-04-shebangthedolphins.net.log
PS C:\Users\std> 1..30 | ForEach-Object { Invoke-WebRequest -Credential $credential -Uri ("$url" + "logs/logs-" + $((Get-Date).AddDays(-"$_").ToString("MM-yyyy")) + "/$domain" + "-" + $((Get-Date).AddDays(-"$_").ToString("dd-MM-yyyy")) + ".log.gz") -OutFile "$((Get-Date).AddDays(-"$_").ToString("yyyy-MM-dd"))-$domain.log" }
Maintenant que nous avons téléchargé des fichiers journaux, voyons voir avec quelques exemples comment en extraire des informations.
[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ zgrep -viE 'Bytespider|Trident|bot|404|GET \/ HTTP|BingPreview|Seekport Crawler' 2020-11-24-shebangthedolphins.net.log.gz | grep html | awk '{ print $1" "$11 }' | sort | uniq | awk '{ print $2 }' | sort | uniq -c | sort -n | tr -s "[ ]" | sed 's/^ //' | grep "$DOMAIN"
1 "http://shebangthedolphins.net/backup_burp.html"
1 "http://shebangthedolphins.net/prog_autoit_backup.html"
1 "http://shebangthedolphins.net/prog_powershell_kesc.html"
1 "http://shebangthedolphins.net/vpn_ipsec_06linux-to-linux_tunnel-x509.html"
1 "https://shebangthedolphins.net/fr/windows_grouppolicy_execute_powershell_script.html"
1 "https://shebangthedolphins.net/gnulinux_courier.html"
1 "https://shebangthedolphins.net/gnulinux_vnc_remotedesktop.html"
1 "https://shebangthedolphins.net/vpn_openvpn_windows_server.html"
1 "https://shebangthedolphins.net/windows_icacls.html"
1 "http://www.shebangthedolphins.net/vpn_ipsec_03linux-to-windows_transport-psk.html"
2 "https://shebangthedolphins.net/windows_mssql_alwayson.html"
3 "https://shebangthedolphins.net/fr/vpn_openvpn_buster.html"
7 "https://shebangthedolphins.net/ubiquiti_ssh_commands.html"
[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ for i in *.log.gz; do echo "------------"; echo "$i"; zgrep -viE 'Bytespider|Trident|bot|404|GET \/ HTTP|BingPreview|Seekport Crawler' "$i" | grep html | awk '{ print $1" "$11 }' | grep "$DOMAIN" | sort | uniq | awk '{ print $2 }' | wc -l; done
------------
2020-11-19-shebangthedolphins.net.log.gz
19
------------
2020-11-20-shebangthedolphins.net.log.gz
24
------------
2020-11-21-shebangthedolphins.net.log.gz
8
------------
2020-11-22-shebangthedolphins.net.log.gz
16
------------
2020-11-23-shebangthedolphins.net.log.gz
15
------------
2020-11-24-shebangthedolphins.net.log.gz
13
[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ YEAR=2020
[user@host ~]$ for i in $(seq -w 1 12); do echo "------------"; echo "$YEAR-$i"; zgrep -viE 'Bytespider|Trident|bot|404|GET \/ HTTP|BingPreview|Seekport Crawler' $YEAR-"$i"*.log.gz | grep html | awk '{ print $1" "$11 }' | grep "$DOMAIN" | sort | uniq | awk '{ print $2 }' | wc -l; done 2>/dev/null
------------
2020-01
101
------------
2020-02
73
------------
2020-03
92
------------
2020-04
91
------------
2020-05
87
------------
2020-06
73
------------
2020-07
81
------------
2020-08
97
------------
2020-09
135
[user@host ~]$ zgrep "html HTTP.*200.*[0-9]\{4\} \"\(https://www.google\|https://www.bing\|https://www.qwant\|https://duckduckgo\)" 2021-08-20-shebangthedolphins.net.log.gz | grep html | awk '{ print $1" "$7 }' | sort -n | uniq | awk '{ print $2 }' | sort | uniq -c | sort -n
[…]
8 /windows_grouppolicy_manage_searchbox.html
9 /windows_icacls.html
10 /vpn_openvpn_bullseye.html
11 /gnulinux_vnc_remotedesktop.html
11 /windows_rds_mfa.html
12 /fr/vpn_openvpn_windows_server.html
14 /windows_grouppolicy_shutdown.html
15 /gnulinux_nftables_examples.html
25 /vpn_openvpn_windows_server.html
72 /ubiquiti_ssh_commands.html
[user@host ~]$ zcat *.log.gz | grep "html HTTP.*200.*[0-9]\{4\} \"\(https://www.google\|https://www.bing\|https://www.qwant\|https://duckduckgo\)" | grep html | awk '{ print $1" "$7 }' | sort -n | uniq | awk '{ print $2 }' | sort | uniq -c | sort -n
[…]
11 /gnulinux_vnc_remotedesktop.html
11 /openbsd_packetfilter.html
12 /fr/gnulinux_nftables_examples.html
15 /windows_imule.html
20 /backup_burp.html
36 /fr/vpn_openvpn_buster.html
38 /vpn_openvpn_windows_server.html
112 /ubiquiti_ssh_commands.html
[user@host ~]$ YEAR=2021
[user@host ~]$ for i in $(seq -w 1 12); do echo "------------"; echo "$YEAR-$i"; zcat $YEAR-"$i"*.log.gz | grep "html HTTP.*200.*[0-9]\{4\} \"\(https://www.google\|https://www.bing\|https://www.qwant\|https://duckduckgo\)" | grep html | awk '{ print $1" "$7 }' | sort -n | uniq | wc -l; done 2>/dev/null
------------
2021-01
1311
------------
2021-02
1650
------------
2021-03
2566
------------
2021-04
3511
------------
2021-05
5452
------------
2021-06
6922
------------
2021-07
6437
------------
2021-08
4788
Des scripts faits maison que j'utilise pour voir rapidement l'évolution des résultats des pages consultées depuis les moteurs de recherche.
#! /bin/sh
for LOGS in *.log.gz; do
echo "-----------------------------"
echo "LOGS : $LOGS"
for i in www.google www.bing www.qwant duckduckgo; do
RESULT=$(zgrep "html HTTP.*200.*[0-9]\{4\} \"https://"$i"" $LOGS | wc -l)
echo "$i = $RESULT"
done
done
-----------------------------
LOGS : 2020-11-21-shebangthedolphins.net.log.gz
www.google = 6
www.bing = 1
www.qwant = 0
duckduckgo = 4
-----------------------------
LOGS : 2020-11-22-shebangthedolphins.net.log.gz
www.google = 10
www.bing = 2
www.qwant = 0
duckduckgo = 6
-----------------------------
LOGS : 2020-11-23-shebangthedolphins.net.log.gz
www.google = 7
www.bing = 6
www.qwant = 2
duckduckgo = 1
#! /bin/sh
for LOGS in "$1"*.log.gz; do
TOTAL=0
echo "-----------------------------"
echo "LOGS : $LOGS"
for i in www.google www.bing www.qwant duckduckgo www.ecosia.org; do
RESULT=$(zgrep "html HTTP.*200.*[0-9]\{4\} \"https://"$i"" $LOGS | wc -l)
echo "$i = $RESULT"
TOTAL=$(($TOTAL+$RESULT))
done
echo "TOTAL $(date -d $(awk -F'-' '{ print $1"-"$2"-"$3 }' <<< $LOG) '+%A') : $TOTAL"
done
[user@host ~]$ sh ./scripts.sh 2021-02-1
-----------------------------
LOGS : 2021-02-10-shebangthedolphins.net.log.gz
www.google = 31
www.bing = 15
www.qwant = 1
duckduckgo = 23
www.ecosia.org = 0
TOTAL mercredi : 70
-----------------------------
LOGS : 2021-02-11-shebangthedolphins.net.log.gz
www.google = 38
www.bing = 11
www.qwant = 2
duckduckgo = 24
www.ecosia.org = 0
TOTAL jeudi : 75
-----------------------------
#! /bin/sh
# Role : Extract ovh logs stats
# Author : http://shebangthedolphins.net/
pages=false
csv=false
usage()
{
echo "usage: ./std_ovh.sh YYYY-MM-DD"
echo "[-c] : export total stats to /tmp/stats.csv file"
echo "[-p <url>|<XX most viewed pages>] : export specific <url> or XX most viewed urls to /tmp/stats.csv file"
echo "ex : ./std_ovh.sh 2021-03"
echo "ex : ./std_ovh.sh 2021-03 -c"
echo "ex : ./std_ovh.sh 2021-03 -p vpn_openvpn_windows_server.html"
echo "ex : ./std_ovh.sh 2021-03 -p 10"
exit 3
}
case "$1" in
*)
LOGS=$1
shift # Remove the first argument (wich will be for example 2021-09-)
while getopts "p:ch" OPTNAME; do
case "$OPTNAME" in
p)
ARGP=${OPTARG}
pages=true
;;
c)
csv=true
;;
h)
usage
;;
*)
usage
;;
esac
done
esac
# show help if no arguments or -c AND -p are set
if [[ ( -z "$LOGS" ) || ( $pages == "true" && $csv == "true" ) ]] ; then
usage
fi
# create /tmp/stats.csv header
if $csv; then echo "date,google,bing,qwant,ddg,ecosia" > /tmp/stats.csv; fi
if $pages; then
if [[ "$ARGP" =~ ^[0-9]+$ ]]; then
HEAD=$ARGP
HTML="html"
else
HTML=$ARGP
HEAD=1
fi
for i in $(zgrep "$HTML HTTP.*200.*[0-9]\{4\} \"https://\(www.google\|www.bing\|www.qwant\|duckduckgo\|www.ecosia.org\)" $LOGS*.log.gz | sed 's/.*GET \(.*\) HTTP.*/\1/' | sort | uniq -c | sort -rn | head -n $HEAD | awk '{ print $2 }');
do
csv_header=$csv_header,$i
done
csv_header="date"$csv_header
echo "$csv_header" > /tmp/stats.csv
TMPFILE=mktemp #create mktemp file and put path inside TMPFILE variable. This file is used to improve perf (we store logs needed only in it).
fi
for LOGS in $(ls -1 $LOGS*.log.gz); do
if $pages
then
csv_data=$(date -d $(awk -F'-' '{ print $1"-"$2"-"$3 }' <<< $LOGS) '+%Y.%m.%d')
zgrep "html HTTP.*200.*[0-9]\{4\} \"https://\(www.google\|www.bing\|www.qwant\|duckduckgo\|www.ecosia.org\)" $LOGS > $TMPFILE #put interesting results inside TMPFILE
for i in $(sed 's/,/\n/g' /tmp/stats.csv | grep "html")
do
csv_data=$csv_data","$(zgrep "$i HTTP.*200.*[0-9]\{4\} \"https://\(www.google\|www.bing\|www.qwant\|duckduckgo\|www.ecosia.org\)" $TMPFILE | wc -l)
done
echo "$csv_data" >> /tmp/stats.csv
else
TOTAL=0
echo "-----------------------------"
echo "LOGS : $LOGS"
CSV=$(awk -F"-" '{ print $1"-"$2"-"$3 }' <<< $LOGS)
for i in www.google www.bing www.qwant duckduckgo www.ecosia.org; do
RESULT=$(zgrep "html HTTP.*200.*[0-9]\{4\} \"https://"$i"" $LOGS | wc -l)
echo "$i = $RESULT"
TOTAL=$(($TOTAL+$RESULT))
if $csv; then
CSV=$CSV,"$RESULT"
fi
done
echo "TOTAL $(date -d $(awk -F'-' '{ print $1"-"$2"-"$3 }' <<< $LOGS) '+%A %d %b %Y') : $TOTAL"
if $csv; then echo "$CSV" >> /tmp/stats.csv; fi
fi
done
if $pages; then rm $TMPFILE; fi #remove TMPFILE
[user@host ~]$ sh ./std_ovh.sh 2021- -c
[user@host ~]$ tail /tmp/stats.csv
date,google,bing,qwant,ddg,ecosia
2021-03-30,82,10,1,38,0
2021-03-31,87,26,5,32,0
[…]
2021-04-07,70,17,5,20,1
2021-04-08,71,19,5,29,0
[user@host ~]$ sh ./std_ovh.sh 2021-09 -p 3
[user@host ~]$ tail /tmp/stats.csv
date,/ubiquiti_ssh_commands.html,/vpn_openvpn_bullseye.html,/vpn_openvpn_windows_server.html
2021.09.01,89,28,39
2021.09.02,109,19,41
[…]
2021.09.06,82,33,44
2021.09.07,94,37,29
[user@host ~]$ sh ./std_ovh.sh 2021-09 -p vpn_openvpn_bullseye.html
[user@host ~]$ tail /tmp/stats.csv
date,/vpn_openvpn_bullseye.html
2021.09.01,28
2021.09.02,19
[…]
2021.09.06,33
2021.09.07,37
PS C:\ > $domain = "shebangthedolphins.net"
PS C:\ > Select-String .\2020-11-05-shebangthedolphins.net.log -NotMatch -Pattern "Bytespider","Trident","bot","404","GET / HTTP","BingPreview","Seekport Crawler" | Select-String -Pattern "html" | %{"{0} {1}" -f $_.Line.ToString().Split(' ')[0],$_.Line.ToString().Split(' ')[10]} | Select-String -Pattern "$domain.*html" | Sort-Object | Get-Unique | %{"{0}" -f $_.Line.ToString().Split(' ')[1]} | group -NoElement | Sort-Object Count | %{"{0} {1}" -f $_.Count, $_.Name }
1 "https://shebangthedolphins.net/fr/prog_introduction.html"
1 "https://shebangthedolphins.net/fr/prog_sh_check_snmp_synology.html"
1 "http://shebangthedolphins.net/openbsd_network_interfaces.html"
1 "https://shebangthedolphins.net/fr/menu.html"
1 "https://shebangthedolphins.net/fr/windows_commandes.html"
1 "https://shebangthedolphins.net/fr/windows_run_powershell_taskschd.html"
1 "http://shebangthedolphins.net/prog_sh_check_snmp_synology.html"
1 "https://shebangthedolphins.net/fr/windows_grouppolicy_reset.html"
1 "https://shebangthedolphins.net/fr/windows_grouppolicy_update_policy.html"
1 "https://shebangthedolphins.net/virtualization_kvm_windows10.html"
1 "https://shebangthedolphins.net/windows_event_on_usb.html"
1 "https://shebangthedolphins.net/prog_powershell_movenetfiles.html"
1 "http://shebangthedolphins.net/prog_autoit_backup.html"
1 "https://shebangthedolphins.net/fr/vpn_openvpn_buster.html"
1 "http://shebangthedolphins.net/prog_powershell_kesc.html"
1 "https://shebangthedolphins.net/index.html"
1 "http://shebangthedolphins.net/gnulinux_courier.html"
4 "https://shebangthedolphins.net/ubiquiti_ssh_commands.html"
6 "https://shebangthedolphins.net/vpn_openvpn_windows_server.html"
Contact :