Various Notes

LSP V2
- Currently active articles from after the redesign of the LSP website.
- Many articles were removed, as were all previous attachments, mugshots, etc.
- Articles in the current (V2) version of the LSP site are dated from 2020-03-13 (Y-m-d) to the present.
LSP V1
- Archived articles from the previous incarnation of the LSP website, which was removed around 2023-06-05 (Y-m-d).
- These include articles from 2008-01-01 to 2023-06-04.
- The attachments for these articles are accessible.
AG V2
- Current articles starting around 2024-05-22.
- The website changed, so retrieval had to be refactored. The tables were split into v1 and v2.
AG V1
- All archived articles from the Louisiana Attorney General's website.
- Attachments that could still be downloaded are available for these articles.
Statistics
- The statistics aren't perfect.
- For example, the website's built-in page search only searches article TITLES, whereas the statistics pages search the full article contents. The two produce mismatched counts because article titles and contents differ.
- One reason for these offsets is that some articles merely mention a keyword.
- Another example of an incorrect offset is under the LSP Covid category, for '1' - 'arrests'.
- Another example, on LSP V2, is when LSP reuses the same article to post an update. The URL and content change, so this database ends up storing two copies of what appears to be a single article on the LSP website.
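The title-only versus full-content mismatch described above can be illustrated with a small sketch. This is not the code behind the statistics pages: the `Article` type, the `count_matches` helper, and the sample articles are hypothetical, and whole-word, case-insensitive matching is an assumed rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical minimal record; the real tables have more columns.
    title: str
    content: str


def count_matches(articles, keyword, field):
    """Count articles whose `field` ("title" or "content") contains `keyword`
    as a whole word, ignoring case (an assumed matching rule)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sum(1 for a in articles if pattern.search(getattr(a, field)))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the database.
    sample = [
        Article("Trooper arrests man for DWI",
                "Troopers made several arrests during the operation."),
        Article("Road closure update",
                "No arrests were made; the road reopened."),
        Article("Holiday safety reminder",
                "Troopers remind drivers to slow down."),
    ]
    print(count_matches(sample, "arrests", "title"))    # prints 1
    print(count_matches(sample, "arrests", "content"))  # prints 2
```

The second sample article also shows the "merely mentions a keyword" case: it is counted by a content search even though it is not about an arrest.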
Extras
- Issues are corrected within the data as they are found, for example: (1) LSP forgetting to label the troop category, or (2) a wrong or incorrectly formatted troop category.
- Data from the original news sites is kept as-is when retrieved, which means certain issues won't be fixed. Some articles have odd formatting because the author overused colors, inline styles, etc. Some of these issues are corrected on page load, while others keep the odd formatting. Either way, the original data remains in the database.
- Another issue that won't be corrected is, for example, articles in the AG's V2 tables that contain content from previous articles. This isn't an error on this end: that is how the data was posted to the news sites, and the intent is to preserve it exactly as posted. This means keeping those errors even when they skew the statistics.
- On view_all pages, the table column headers can be clicked to sort. Because the full dataset (without content) is loaded locally on page load, sorting may take some time on slower machines, especially for LSP V1.
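The AG V2 records that contain content from other articles are kept as posted, but they could still be flagged for review with a rough check like the one below. It is a sketch under stated assumptions, not part of this database: content is assumed to be plain text (HTML already stripped), an article is flagged when another article's full normalized text appears inside it, and `normalize`, `find_embedded`, and the `min_length` threshold are hypothetical. The check compares every pair, which is fine for a one-off review.

```python
import re


def normalize(text):
    # Collapse whitespace runs and lowercase so that layout differences
    # between two copies of the same text do not hide a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def find_embedded(articles, min_length=200):
    """Return sorted (container_id, embedded_id) pairs where one article's
    normalized content appears inside another article's normalized content.

    `articles` maps a record id to plain-text content. `min_length` is an
    assumed threshold that skips short texts likely to match by coincidence.
    """
    normalized = {rid: normalize(text) for rid, text in articles.items()}
    pairs = []
    for inner_id, inner in normalized.items():
        if len(inner) < min_length:
            continue
        for outer_id, outer in normalized.items():
            if outer_id != inner_id and inner in outer:
                pairs.append((outer_id, inner_id))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical texts standing in for records 10 and 24.
    story = "Alpha story text. " * 20
    demo = {10: story, 24: "Intro paragraph. " + story, 30: "Unrelated text. " * 20}
    print(find_embedded(demo))  # prints [(24, 10)]
```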

Article Specific Notes

id record_id object_id note
1 9414 t_lsp_articles_v1 This record was posted around 2023-04-02 and was given the wrong post date of 2023-04-22.
2 9684 t_ag_articles_v1 The article on the AG's website is missing the final person in the list of arrestees due to the author's error. The final person should be displayed with this information: 'Jalen Anthony Walker (B/M, DOB 04/08/1992) – 2 counts felony carnal knowledge of juvenile, indecent behavior with juvenile, computer aided solicitation of a minor'. The source used to verify this error is: https://www.wwltv.com/article/news/crime/63-arrested-across-louisiana-for-child-pornography-abuse/289-dfe537ba-dbf2-402f-b982-28cdd41c0899.
3 4911 t_lsp_articles_v1 This article does not have any attachments on the LSP site, despite the article's claim that a photo was attached.
4 150 t_lsp_articles_v2 The article on the LSP site does not include the bail amount. The bail amount of $70,400 was found in the following news site's coverage of the same story: https://www.kalb.com/2023/06/07/glenmora-man-arrested-500-counts-child-porn.
5 1069 t_lsp_articles_v2 Article was missing troop category. Troop L.
6 1057 t_lsp_articles_v2 Article was missing troop category. Troop D.
7 2 t_lsp_articles_v2 Article was missing troop category. Troop E.
8 1064 t_lsp_articles_v2 Article was missing troop category. Troop D.
9 1065 t_lsp_articles_v2 Article was missing troop category. Troop D.
10 126 t_lsp_articles_v2 Article was missing troop category. Troop B.
11 260 t_lsp_articles_v2 Article category improperly set to 'Statewide News Releases'. Corrected in this database.
12 261 t_lsp_articles_v2 Article category improperly set to 'Statewide News Releases'. Corrected in this database.
13 1094 t_lsp_articles_v2 Article was missing troop category. Troop A.
14 1103 t_lsp_articles_v2 This article is an update/replacement of article 1098 on the LSP website, with new content and a new URL. As a result, it makes the total article count here 1 higher than the number of articles still on the website.
15 10755 t_ag_articles_v1 There were at one time two images, presumably mugshots. However, by the time of scraping, those links were dead.
16 13174 t_ag_articles_v1 This article has only a heading without any content, which matches how it appeared on the AG website. Edit: the article was removed from the AG's website shortly after the scrape.
17 13175 t_ag_articles_v1 This is the first article under the new Attorney General, Liz Murrill. There is also a link to a PDF in the article body. However, the link is not a direct link but a redirect through a marketing affiliate company (most likely so they can track how many people click on the attachment). Therefore, the attachment was retrieved manually. A possible fix is to follow every link and test for wanted file extensions after the redirect, but that isn't implemented at the moment.
18 1181 t_lsp_articles_v2 Article was missing troop category. Troop A.
19 1217 t_lsp_articles_v2 The publish date provided by LSP is mistyped; the correct publish date is 2024-03-14. The category is also missing, but reading the article shows it to be a headquarters news release (Statewide).
20 13226 t_ag_articles_v1 The two attachments within the article have broken links that cause DNS errors; the linked files don't exist. The attachments were supposed to be images.
21 1217 t_lsp_articles_v2 The Statewide arrest stats are one lower because the author used the date 0001-01-1. The article still appears in the total counts. This applies to v_lsp_articles_sex_arrests_by_year_by_troop_v2.
22 24 t_ag_articles_v2 Record 24 is an example of a larger issue originating in the source material on the AG's website, discovered while reviewing the statistics: some articles contain content from entirely different articles. For example, article 24 also contains the contents of article 10. Multiple records have this issue.
23 1556 t_lsp_articles_v2 Article was missing troop category. Troop L.
24 1757 t_lsp_articles_v2 Article was missing troop category. Troop L.
25 1763 t_lsp_articles_v2 Article was missing troop category. Troop NOLA.
26 1799 t_lsp_articles_v2 Article was missing troop category. Troop B.
27 1891 t_lsp_articles_v2 Article was missing title header within article. Troop E.
28 327 t_ag_articles_v2 An attachment for this article is an empty file and has been saved as such. Filename: AZCDHO6KJJGM3BQMTZX4NV3VM4.avif.
29 1923 t_lsp_articles_v2 Article was missing troop category. Troop E.
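Note 17 (record 13175) describes a possible fix that isn't implemented: follow every link and test for wanted file extensions after the redirect. A minimal standard-library sketch of that idea follows. The extension and content-type lists, the helper names, and the choice to also accept a matching `Content-Type` header are assumptions. `urllib.request.urlopen` follows ordinary HTTP redirects; a tracking link that redirects via JavaScript or a meta refresh would not be resolved this way.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Assumed lists of attachment types worth keeping.
WANTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".avif"}
WANTED_CONTENT_TYPES = {
    "application/pdf", "image/jpeg", "image/png", "image/gif", "image/avif",
}


def resolve_link(url, timeout=10.0):
    """Open `url`, following HTTP redirects, and return
    (final_url, content_type) without reading the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # geturl() is the URL actually reached after any redirects.
        return response.geturl(), response.headers.get_content_type()


def is_wanted_attachment(final_url, content_type=None):
    """True if the final URL's path ends in a wanted extension or the
    server reports a wanted content type."""
    extension = os.path.splitext(urlparse(final_url).path)[1].lower()
    return extension in WANTED_EXTENSIONS or content_type in WANTED_CONTENT_TYPES


def find_attachments(links):
    """Resolve each link and keep the final URLs that look like attachments."""
    found = []
    for link in links:
        try:
            final_url, content_type = resolve_link(link)
        except (OSError, ValueError):
            # URLError/HTTPError and timeouts are OSError subclasses, and
            # ValueError covers malformed URLs. Dead links, such as the DNS
            # failures in note 20, are skipped instead of stopping the scrape.
            continue
        if is_wanted_attachment(final_url, content_type):
            found.append(final_url)
    return found
```

Calling `find_attachments` on the links extracted from an article body would return only the final URLs that look like files; downloading them would still be a separate step.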